Current build version won't ingest PDFs #616

Open
clearsitedesigns opened this issue Oct 27, 2023 · 3 comments

Comments

@clearsitedesigns

I'm not sure what changed here, but I've tried many PDFs and none of them will ingest. The system hangs during ingestion. This happens on both PC and Mac. It even happens with the demo Orca PDF, which doesn't ingest either. If I convert a PDF to raw text, the text file ingests without problems.
Did something change in the document loader for PDFs?

@Grunt-prog

Hi @clearsitedesigns, do you have an older (downgraded) version of localGPT?

@clearsitedesigns
Author

Hi, I don't have a downgraded version; I'm using a fairly recent one. I can see that there was a file extension issue, and it looks like that was the problem. It should be easy enough to fix in the code.

@M56413893478

I have the previous version, and I added an extra line in the contstraints.py file to handle the uppercase extension (.PDF). That worked fine.
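For anyone hitting the same uppercase-extension problem, an alternative to adding one mapping entry per spelling is to normalize the extension before the lookup. The following is only a minimal sketch: `LOADERS` and `pick_loader` are hypothetical names used for illustration, not localGPT's actual mapping or API, and the string values stand in for real loader classes.

```python
from pathlib import Path

# Hypothetical stand-in for an extension-to-loader mapping.
# The values would normally be loader classes; strings keep the sketch self-contained.
LOADERS = {
    ".pdf": "pdf_loader",
    ".txt": "text_loader",
}


def pick_loader(file_path):
    """Return the loader registered for the file's extension, ignoring case.

    Returns None when no loader is registered for that extension.
    """
    ext = Path(file_path).suffix.lower()  # ".PDF" and ".Pdf" both become ".pdf"
    return LOADERS.get(ext)


print(pick_loader("LeaseAgreement.PDF"))
print(pick_loader("notes.txt"))
print(pick_loader("image.png"))
```

Lower-casing at lookup time means the mapping only needs one entry per file type, whatever casing the files on disk use.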

In the new version, I am not able to ingest PDF files because the loader references the pdf2image and pytesseract modules, which are not found in my environment:

E:\localGPT/SOURCE_DOCUMENTS\7301TopangaLeaseAgreement-TSP.pdf loading error:
No module named 'pdf2image'
E:\localGPT/SOURCE_DOCUMENTS\7301TopangaLeaseAgreement-TSP.pdf loaded.
E:\localGPT/SOURCE_DOCUMENTS\7301TopangaLeaseAgreement-TSP.pdf loading error:
No module named 'pytesseract'
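The errors above indicate that the two Python modules cannot be imported in the environment running the ingest. As a rough sketch (the helper name `missing_modules` is made up for illustration and is not part of localGPT), the standard library can report which of them are absent before starting an ingest run:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the error messages above.
missing = missing_modules(["pdf2image", "pytesseract"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If either module is reported missing, installing the packages of the same name (for example `pip install pdf2image pytesseract`) should clear the import errors. Note that pdf2image also relies on the Poppler utilities and pytesseract on the Tesseract OCR binary, so those may need to be installed separately.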
