bug/missing english_words.txt in PyPi release 0.13.1 #2855

Closed
rskew opened this issue Apr 5, 2024 · 3 comments · Fixed by #2857
Labels
bug Something isn't working

Comments

@rskew

rskew commented Apr 5, 2024

Describe the bug
PyPI release 0.13.1 is missing the file unstructured/nlp/english-words.txt, which causes a crash at import time.

To Reproduce

$ python -m venv venv
$ source venv/bin/activate
$ pip install unstructured==0.13.1
$ python -c "from unstructured.partition import text_type"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/rowan/demo/venv/lib/python3.11/site-packages/unstructured/nlp/english_words.py", line 12, in <module>
    with open(ENGLISH_WORDS_FILE) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/rowan/demo/venv/lib/python3.11/site-packages/unstructured/nlp/english-words.txt'
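
The traceback shows the import failing because a data file is absent from the installed package. A minimal sketch of how one might check for the file without importing the package (and so without triggering the crash) is below. The helper name `missing_package_files` is hypothetical and not part of unstructured; the sketch assumes the package lives in the environment's `purelib` directory as reported by `sysconfig`.

```python
from pathlib import Path
import sysconfig


def missing_package_files(site_packages: Path, relative_paths: list[str]) -> list[str]:
    """Return the entries of relative_paths that are not files under site_packages.

    Hypothetical helper for illustration only; it inspects the file system
    and never imports the package being checked.
    """
    return [rel for rel in relative_paths if not (site_packages / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the active environment's pure-Python install directory
    # is where pip placed the unstructured package.
    site_packages = Path(sysconfig.get_paths()["purelib"])
    missing = missing_package_files(
        site_packages,
        ["unstructured/nlp/english-words.txt"],  # path taken from the traceback
    )
    print("missing files:", missing)
```

Run inside the affected virtual environment, an empty list would suggest the data file shipped with the installed release; a non-empty list would be consistent with the error above.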

Expected behavior
The import succeeds without crashing.

Screenshots

Environment Info

$ python scripts/collect_env.py
OS version:  Linux-6.1.64-x86_64-with-glibc2.38
Python version:  3.11.6
unstructured version:  0.13.1
unstructured-inference is not installed
pytesseract is not installed
Torch is not installed
Detectron2 is not installed

[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
PaddleOCR is not installed
Libmagic version: file-5.45
magic file from /nix/store/4if234lnkhfkindsr9m62s4b4lh3iynf-file-5.45/share/misc/magic
LibreOffice version:  LibreOffice 7.5.7.1 50(Build:1)

Additional context

@rskew rskew added the bug Something isn't working label Apr 5, 2024
@meet5398

meet5398 commented Apr 5, 2024

The english-words.txt file has been missing for the last 3 hours.

@sidfinster

Thanks for the prompt fix, came here to file a bug as well!

@cragwolfe
Contributor

Confirmed that

python -c "from unstructured.partition import text_type"

works now in 0.13.2. I was also able to reproduce the issue in 0.13.1. Thanks for reporting, @rskew!
