common.document_parser fails to parse PDF, object has no attribute 'pageCount' #295

Open
neofob opened this issue Nov 7, 2023 · 0 comments

neofob commented Nov 7, 2023

Summary: The current requirements file, dev_tools/requirements/parse-requirements.txt, cannot be installed by pip, so the environment needed to run common.document_parser cannot be set up from it. See issue #294. By installing packages one at a time until common.document_parser could run, I arrived at a working list of pip packages.

See the attached requirements.txt for a tentative pip requirements file.

Even with those packages installed, common.document_parser fails to parse PDF documents, with the error "object has no attribute 'pageCount'".
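A possible cause, stated here as an assumption and not verified against this repository: the PDF library being installed may be a newer PyMuPDF release, where the camelCase attribute `pageCount` has been superseded by `page_count`. If that is the case, a tolerant lookup along these lines might work around it. `get_page_count` is a hypothetical helper name, not something that exists in the code base.

```python
def get_page_count(doc):
    """Return the number of pages in an opened document object.

    Assumption: `doc` exposes either `page_count` (newer spelling) or
    `pageCount` (older spelling), or at least supports len().
    """
    for name in ("page_count", "pageCount"):
        value = getattr(doc, name, None)
        if value is not None:
            return value
    # Fall back to len() for objects that expose neither attribute.
    return len(doc)
```

The alternative would be pinning the PDF library to an older version in the requirements file. Which of the two fits better depends on what else the parser relies on.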

2023-11-07 17:05:59,186 - [INFO] - Finished Processing: <ForkProcess name='ForkPoolWorker-1' parent=952070 started daemon> - Filename: AFI 36-129_USAFE-AFAFRICASUP.pdf
2023-11-07 17:05:59,186 - [INFO] - Processing: <ForkProcess name='ForkPoolWorker-1' parent=952070 started daemon> - Filename: STP 10-92M15-SM-TG.pdf
running policy_analyics.parse on /mnt/extra/gamechanger-download-smallset/STP 10-92M15-SM-TG.pdf
ERROR in policy_analytics.parse: cannot open broken document
2023-11-07 17:05:59,187 - [INFO] - Finished Processing: <ForkProcess name='ForkPoolWorker-1' parent=952070 started daemon> - Filename: STP 10-92M15-SM-TG.pdf
Current Time = 17:05:59
2023-11-07 17:05:59,188 - [INFO] - Documents parsed (or attempted): 6

real	0m9.808s
user	0m9.351s
sys	0m2.399s

References:

CC: @drospond @amaruca @amaruca141
