improve: add Python 3.12 support (#3033) (#3047)
### Summary

Closes #2959. Updates the dependencies and CI to add support for Python
3.12.

The MongoDB ingest tests were disabled because jobs like [this
one](https://github.com/Unstructured-IO/unstructured/actions/runs/9133383127/job/25116767333)
fail due to issues with the `bson` package. `bson` is a dependency
of the AstraDB connector, but `pymongo` does not work when the
third-party `bson` package is installed from `pip`. This issue is
documented by MongoDB
[here](https://pymongo.readthedocs.io/en/stable/installation.html). Spun
off #3049 to resolve this. The issue seems unrelated to Python 3.12,
though it is unclear why it did not surface previously.
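
A minimal sketch of how one might detect the conflicting install locally, using only the standard library. The helper names `installed_version` and `has_bson_conflict` are hypothetical and are not part of `unstructured` or `pymongo`; the check only reports whether both distributions are present and does not attempt to fix anything.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_bson_conflict() -> bool:
    """True when both `pymongo` and the standalone `bson` distribution are installed.

    Hypothetical helper: pymongo ships its own `bson` module, so a separately
    installed `bson` distribution is the situation described above.
    """
    return (
        installed_version("pymongo") is not None
        and installed_version("bson") is not None
    )


if __name__ == "__main__":
    if has_bson_conflict():
        print("Both pymongo and the standalone bson distribution are installed.")
    else:
        print("No pymongo/bson conflict detected.")
```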

Disables the `argilla` tests because `argilla` does not yet support
Python 3.12. We can add the `argilla` tests back in once the PR
referenced below is merged. You can still use the `stage_for_argilla`
function if you're on `python<3.12` and you install `argilla` yourself.
- argilla-io/argilla#4837
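
A hedged sketch of how calling code could guard the optional `argilla` path under the constraint described above. The function `argilla_usable` and its parameters are hypothetical illustrations, not part of `unstructured`; it only checks the interpreter version and whether a module can be found.

```python
import importlib.util
import sys


def argilla_usable(version_info=sys.version_info, module_name="argilla") -> bool:
    """Return True if the interpreter is older than 3.12 and the module is importable.

    Hypothetical helper: `version_info` and `module_name` are parameters only so
    the check can be exercised without depending on the local environment.
    """
    if tuple(version_info[:2]) >= (3, 12):
        return False
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if argilla_usable():
        print("argilla is available; stage_for_argilla can be used.")
    else:
        print("argilla is unavailable here; skip the argilla staging step.")
```

With a guard like this, code that calls `stage_for_argilla` can fall back gracefully on Python 3.12 instead of failing at import time.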

---------

Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
MthwRobinson and nicoloboschi committed May 19, 2024
1 parent 76831f1 commit d760801
Showing 45 changed files with 166 additions and 155 deletions.
5 changes: 5 additions & 0 deletions .github/actions/base-cache/action.yml
@@ -29,9 +29,14 @@ runs:
if: steps.virtualenv-cache-restore.outputs.cache-hit != 'true'
shell: bash
run: |
python${{ inputs.python-version }} -m pip install --upgrade virtualenv
python${{ inputs.python-version }} -m venv .venv
source .venv/bin/activate
[ ! -d "$NLTK_DATA" ] && mkdir "$NLTK_DATA"
if [ "${{ inputs.python-version == '3.12' }}" == "true" ]; then
python -m ensurepip --upgrade
python -m pip install --upgrade setuptools
fi
make install-ci
- name: Save Cache
if: steps.virtualenv-cache-restore.outputs.cache-hit != 'true'
10 changes: 5 additions & 5 deletions .github/workflows/ci.yml
@@ -16,7 +16,7 @@ jobs:
setup:
strategy:
matrix:
python-version: ["3.9","3.10","3.11"]
python-version: ["3.9","3.10","3.11", "3.12"]
runs-on: ubuntu-latest
env:
NLTK_DATA: ${{ github.workspace }}/nltk_data
@@ -30,7 +30,7 @@ jobs:
check-deps:
strategy:
matrix:
python-version: ["3.9","3.10","3.11"]
python-version: ["3.9","3.10","3.11", "3.12"]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@@ -44,7 +44,7 @@ jobs:
check-extras:
strategy:
matrix:
python-version: [ "3.9","3.10","3.11" ]
python-version: [ "3.9","3.10","3.11","3.12" ]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@@ -98,7 +98,7 @@ jobs:
test_unit:
strategy:
matrix:
python-version: ["3.9","3.10","3.11"]
python-version: ["3.9","3.10","3.11", "3.12"]
runs-on: ubuntu-latest
env:
NLTK_DATA: ${{ github.workspace }}/nltk_data
@@ -161,7 +161,7 @@ jobs:
source .venv/bin/activate
sudo apt-get update
sudo apt-get install -y poppler-utils
make install-pandoc
make install-pandoc install-test
sudo add-apt-repository -y ppa:alex-p/tesseract-ocr5
sudo apt-get update
sudo apt-get install -y tesseract-ocr tesseract-ocr-kor
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,11 @@
## 0.14.1-dev0

* **Add support for Python 3.12**. `unstructured` now works with Python 3.12!

### Features

### Fixes

## 0.14.0

### BREAKING CHANGES
6 changes: 2 additions & 4 deletions Makefile
@@ -47,12 +47,10 @@ install-test:
# NOTE(yao) - CI seem to always install tesseract to test so it would make sense to also require
# pytesseract installation into the virtual env for testing
python3 -m pip install unstructured.pytesseract -c requirements/deps/constraints.txt
python3 -m pip install argilla -c requirements/deps/constraints.txt
# python3 -m pip install argilla==1.28.0 -c requirements/deps/constraints.txt
# NOTE(robinson) - Installing weaviate-client separately here because the requests
# version conflicts with label_studio_sdk
python3 -m pip install weaviate-client -c requirements/deps/constraints.txt
# TODO (yao): find out if how to constrain argilla properly without causing conflicts
python3 -m pip install argilla

.PHONY: install-dev
install-dev:
@@ -439,7 +437,7 @@ version-sync:

.PHONY: check-coverage
check-coverage:
coverage report --fail-under=95
coverage report --fail-under=90

## check-deps: check consistency of dependencies
.PHONY: check-deps
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -120,5 +120,5 @@ urllib3==1.26.18
# -c ././deps/constraints.txt
# -c ./base.txt
# requests
zipp==3.18.1
zipp==3.18.2
# via importlib-metadata
10 changes: 7 additions & 3 deletions requirements/base.txt
@@ -53,7 +53,9 @@ mypy-extensions==1.0.0
nltk==3.8.1
# via -r ./base.in
numpy==1.26.4
# via -r ./base.in
# via
# -c ././deps/constraints.txt
# -r ./base.in
packaging==23.2
# via
# -c ././deps/constraints.txt
@@ -67,7 +69,7 @@ python-magic==0.4.27
# via -r ./base.in
rapidfuzz==3.9.0
# via -r ./base.in
regex==2024.5.10
regex==2024.5.15
# via nltk
requests==2.31.0
# via
@@ -104,4 +106,6 @@ urllib3==1.26.18
# requests
# unstructured-client
wrapt==1.16.0
# via -r ./base.in
# via
# -c ././deps/constraints.txt
# -r ./base.in
2 changes: 1 addition & 1 deletion requirements/build.txt
@@ -120,5 +120,5 @@ urllib3==1.26.18
# -c ././deps/constraints.txt
# -c ./base.txt
# requests
zipp==3.18.1
zipp==3.18.2
# via importlib-metadata
9 changes: 8 additions & 1 deletion requirements/deps/constraints.txt
@@ -13,7 +13,7 @@ wheel>=0.38.1
certifi>=2023.7.22
# From pycocotools in local-inference
pyparsing<3.1.0
scipy<1.11.0
scipy<1.11.4
IPython<8.13
# NOTE(alan) Pinned to avoid error that occurs with 2.4.3:
# AttributeError: 'ResourcePath' object has no attribute 'collection'
@@ -54,3 +54,10 @@ botocore<1.34.52

# NOTE(jennings): pinned due to later versions not supporting api_key_auth in UnstructuredClient
unstructured-client<=0.18.0

fsspec==2024.5.0

# python 3.12 support
numpy>=1.26.0
wrapt>=1.14.0

16 changes: 11 additions & 5 deletions requirements/dev.txt
@@ -25,6 +25,7 @@ async-lru==2.0.4
# via jupyterlab
attrs==23.2.0
# via
# -c ./test.txt
# jsonschema
# referencing
babel==2.15.0
@@ -140,11 +141,14 @@ jsonpointer==2.4
# via jsonschema
jsonschema[format-nongpl]==4.22.0
# via
# -c ./test.txt
# jupyter-events
# jupyterlab-server
# nbformat
jsonschema-specifications==2023.12.1
# via jsonschema
# via
# -c ./test.txt
# jsonschema
jupyter==1.0.0
# via -r ./dev.in
jupyter-client==8.6.1
@@ -181,7 +185,7 @@ jupyter-server==2.14.0
# notebook-shim
jupyter-server-terminals==0.5.3
# via jupyter-server
jupyterlab==4.1.8
jupyterlab==4.2.0
# via notebook
jupyterlab-pygments==0.3.0
# via nbconvert
@@ -216,7 +220,7 @@ nest-asyncio==1.6.0
# via ipykernel
nodeenv==1.8.0
# via pre-commit
notebook==7.1.3
notebook==7.2.0
# via jupyter
notebook-shim==0.2.4
# via
@@ -307,6 +311,7 @@ qtpy==2.4.1
# via qtconsole
referencing==0.35.1
# via
# -c ./test.txt
# jsonschema
# jsonschema-specifications
# jupyter-events
@@ -325,6 +330,7 @@ rfc3986-validator==0.1.1
# jupyter-events
rpds-py==0.18.1
# via
# -c ./test.txt
# jsonschema
# referencing
send2trash==1.8.3
@@ -400,7 +406,7 @@ urllib3==1.26.18
# -c ./base.txt
# -c ./test.txt
# requests
virtualenv==20.26.1
virtualenv==20.26.2
# via pre-commit
wcwidth==0.2.13
# via prompt-toolkit
@@ -418,7 +424,7 @@ wheel==0.43.0
# pip-tools
widgetsnbextension==4.0.10
# via ipywidgets
zipp==3.18.1
zipp==3.18.2
# via importlib-metadata

# The following packages are considered to be unsafe in a requirements file:
1 change: 1 addition & 0 deletions requirements/extra-csv.txt
@@ -6,6 +6,7 @@
#
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# pandas
pandas==2.2.2
2 changes: 1 addition & 1 deletion requirements/extra-markdown.txt
@@ -8,5 +8,5 @@ importlib-metadata==7.1.0
# via markdown
markdown==3.6
# via -r ./extra-markdown.in
zipp==3.18.1
zipp==3.18.2
# via importlib-metadata
9 changes: 5 additions & 4 deletions requirements/extra-paddleocr.txt
@@ -8,7 +8,7 @@ attrdict==2.0.1
# via unstructured-paddleocr
babel==2.15.0
# via flask-babel
bce-python-sdk==0.9.9
bce-python-sdk==0.9.10
# via visualdl
blinker==1.8.2
# via flask
@@ -31,7 +31,7 @@ contourpy==1.2.1
# via matplotlib
cssselect==1.2.0
# via premailer
cssutils==2.10.2
cssutils==2.11.0
# via premailer
cycler==0.12.1
# via matplotlib
@@ -95,6 +95,7 @@ networkx==3.2.1
# via scikit-image
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# contourpy
# imageio
@@ -182,7 +183,7 @@ scikit-image==0.22.0
# via
# imgaug
# unstructured-paddleocr
scipy==1.10.1
scipy==1.11.3
# via
# -c ././deps/constraints.txt
# imgaug
@@ -218,7 +219,7 @@ visualdl==2.5.3
# via unstructured-paddleocr
werkzeug==3.0.3
# via flask
zipp==3.18.1
zipp==3.18.2
# via
# importlib-metadata
# importlib-resources
17 changes: 10 additions & 7 deletions requirements/extra-pdf-image.txt
@@ -41,8 +41,9 @@ flatbuffers==24.3.25
# via onnxruntime
fonttools==4.51.0
# via matplotlib
fsspec==2024.3.1
fsspec==2024.5.0
# via
# -c ././deps/constraints.txt
# huggingface-hub
# torch
google-api-core[grpc]==2.19.0
@@ -101,6 +102,7 @@ networkx==3.2.1
# via torch
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# contourpy
# layoutparser
@@ -119,7 +121,7 @@ onnx==1.16.0
# via
# -r ./extra-pdf-image.in
# unstructured-inference
onnxruntime==1.17.3
onnxruntime==1.18.0
# via unstructured-inference
opencv-python==4.8.0.76
# via
@@ -222,7 +224,7 @@ rapidfuzz==3.9.0
# via
# -c ./base.txt
# unstructured-inference
regex==2024.5.10
regex==2024.5.15
# via
# -c ./base.txt
# transformers
@@ -238,7 +240,7 @@ safetensors==0.4.3
# via
# timm
# transformers
scipy==1.10.1
scipy==1.11.3
# via
# -c ././deps/constraints.txt
# layoutparser
@@ -250,7 +252,7 @@ sympy==1.12
# via
# onnxruntime
# torch
timm==0.9.16
timm==1.0.3
# via effdet
tokenizers==0.19.1
# via transformers
@@ -272,7 +274,7 @@ tqdm==4.66.4
# huggingface-hub
# iopath
# transformers
transformers==4.40.2
transformers==4.41.0
# via unstructured-inference
typing-extensions==4.11.0
# via
@@ -296,7 +298,8 @@ urllib3==1.26.18
# requests
wrapt==1.16.0
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# deprecated
zipp==3.18.1
zipp==3.18.2
# via importlib-resources
1 change: 1 addition & 0 deletions requirements/extra-xlsx.txt
@@ -10,6 +10,7 @@ networkx==3.2.1
# via -r ./extra-xlsx.in
numpy==1.26.4
# via
# -c ././deps/constraints.txt
# -c ./base.txt
# pandas
openpyxl==3.1.2