Update Docker images to use Python 3.12 #3051

Open
MthwRobinson opened this issue May 20, 2024 · 2 comments
Labels
packaging Issues with building and installing `unstructured`

Comments

@MthwRobinson

Currently the AMD image uses Python 3.11 and the ARM image uses Python 3.10. Since we support Python 3.12 as of #3047, we can now update these containers to use Python 3.12 instead. This will keep us on the latest version and reduce the risk that our build will break if Python 3.11 is dropped from wolfi-base:latest.

@MthwRobinson MthwRobinson added the packaging Issues with building and installing `unstructured` label May 20, 2024

MthwRobinson commented May 21, 2024

Tried this but, as seen in this job, it says unstructured_inference is not installed for some reason.

python3.12 -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')":
1.070 [nltk_data] Downloading package punkt to /home/nonroot/nltk_data...
1.122 [nltk_data]   Unzipping tokenizers/punkt.zip.
2.206 [nltk_data] Downloading package averaged_perceptron_tagger to
2.206 [nltk_data]     /home/nonroot/nltk_data...
2.228 [nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
2.481 Traceback (most recent call last):
2.481   File "<string>", line 1, in <module>
2.481   File "/app/unstructured/partition/model_init.py", line 3, in <module>
2.481     from unstructured_inference.models.base import get_model
2.481 ModuleNotFoundError: No module named 'unstructured_inference'
------
Dockerfile-amd64:36
--------------------
  35 |     
  36 | >>> RUN python3.12 -c "import nltk; nltk.download('punkt')" && \
  37 | >>>   python3.12 -c "import nltk; nltk.download('averaged_perceptron_tagger')" && \
  38 | >>>   python3.12 -c "from unstructured.partition.model_init import initialize; initialize()" && \
  39 | >>>   python3.12 -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"
  40 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c python3.12 -c \"import nltk; nltk.download('punkt')\" &&   python3.12 -c \"import nltk; nltk.download('averaged_perceptron_tagger')\" &&   python3.12 -c \"from unstructured.partition.model_init import initialize; initialize()\" &&   python3.12 -c \"from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')\"" did not complete successfully: exit code: 1
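
The traceback above only surfaces at the third command of the `RUN` step, after the NLTK downloads have already run. As a rough sketch (not something taken from the Dockerfile in this repo), a small pre-flight check can report up front which top-level packages the chosen interpreter cannot see. The helper names `missing_modules` and `require` are hypothetical; only `importlib.util.find_spec` from the standard library is used.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Some broken or half-initialised packages raise instead of
            # returning None; treat them as missing as well.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


def require(names):
    """Raise a single, readable error that lists every missing module."""
    absent = missing_modules(names)
    if absent:
        raise RuntimeError(
            "not installed for this interpreter: " + ", ".join(absent)
        )
```

Calling `require(["nltk", "unstructured_inference"])` with `python3.12` before the model initialization would fail immediately with the package name, which helps tell "installed under a different Python version" apart from "install step failed".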

@MthwRobinson

Looks like the issue may be pycocotools, whose wheel fails to build:

  Building wheel for pycocotools (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pycocotools (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-312
      creating build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/coco.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/mask.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      copying pycocotools/__init__.py -> build/lib.linux-x86_64-cpython-312/pycocotools
      running build_ext
      /tmp/pip-build-env-4ptm5umq/overlay/lib/python3.12/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /tmp/pip-install-02k6iv2j/pycocotools_341f2d6f5e184f7499e7fde6f3c47217/pycocotools/_mask.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      Compiling pycocotools/_mask.pyx because it changed.
      [1/1] Cythonizing pycocotools/_mask.pyx
      building 'pycocotools._mask' extension
      creating build/temp.linux-x86_64-cpython-312
      creating build/temp.linux-x86_64-cpython-312/common
      creating build/temp.linux-x86_64-cpython-312/pycocotools
      x86_64-pc-linux-gnu-gcc -fno-strict-overflow -DNDEBUG -g -O3 -Wall -O2 -Wall -fomit-frame-pointer -march=x86-64-v2 -mtune=broadwell -O2 -Wall -fomit-frame-pointer -march=x86-64-v2 -mtune=broadwell -fPIC -I/tmp/pip-build-env-4ptm5umq/overlay/lib/python3.12/site-packages/numpy/core/include -I./common -I/usr/include/python3.12 -c ./common/maskApi.c -o build/temp.linux-x86_64-cpython-312/./common/maskApi.o -Wno-cpp -Wno-unused-function -std=c99
      ./common/maskApi.c:8:10: fatal error: math.h: No such file or directory
          8 | #include <math.h>
            |          ^~~~~~~~
      compilation terminated.
      error: command '/usr/bin/x86_64-pc-linux-gnu-gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pycocotools
Failed to build pycocotools
ERROR: Could not build wheels for pycocotools, which is required to install pyproject.toml-based projects
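
The line that matters in the output above is the compiler's `fatal error: math.h: No such file or directory`, which suggests the C library development headers are not present in the image (the thread does not say which package would provide them). As a minimal sketch for triaging logs like this one, the following pulls the missing header names out of a build log. The function name `missing_headers` is hypothetical, and the pattern assumes GCC's usual wording for this error.

```python
import re

# GCC reports a missing include as:
#   <file>:<line>:<col>: fatal error: <header>: No such file or directory
_MISSING_HEADER = re.compile(r"fatal error: (\S+): No such file or directory")


def missing_headers(log_text):
    """Return the distinct header names reported as missing, in log order."""
    seen = []
    for match in _MISSING_HEADER.finditer(log_text):
        header = match.group(1)
        if header not in seen:
            seen.append(header)
    return seen
```

Run against the pip output above, this would pick out `math.h`, pointing at the build environment and not at `pycocotools` itself or at pip.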

MthwRobinson added a commit to Unstructured-IO/unstructured-inference that referenced this issue May 22, 2024
…r extras (#350)

### Summary

First step in resolving
Unstructured-IO/unstructured#3051. Per [this
comment](Unstructured-IO/unstructured#3051 (comment)),
we were having trouble running `unstructured` in the Python 3.12
`wolfi-base` container due to issues related to `pycocotools`, which is
only used for the legacy `detectron2` model from `layoutparser`. Since
we've replaced this with `detectron2onnx`, this PR removes the
`layoutparser` extra dependencies that caused issues with Python 3.12.

The `layoutparser` base dependency is still required because we use
layout objects from that library. It's likely we could remove these in a
future iteration.

Temporarily disabled the ingest tests, because they seem to have been
broken for the past six months. The last commit on which they passed was
[this
one](0f0c2be).
Opened #352 to re-enable them.

### Testing

If CI passes we should be good to go.