pdfminer can't parse characters outside the ASCII encoding #136
Comments
Sorry, I don't understand the title in relation to the problem statement. I read this as "pdfminer can't parse characters outside the ASCII encoding". Am I misunderstanding?
It's actually more complicated than that. The lines producing the '?' are in Parsr/server/src/input/pdfminer/pdfminer.ts, lines 196 to 205 in 6586953.
Related Issues:
An interesting related patent: https://patents.google.com/patent/US20060288281
This is a general problem with the extractors used, rather than something specific to Parsr.
Attached is a document in Spanish showing that pdfminer can't process Latin characters like:
á, é, í, ó, ú, ñ, etc...
caixa-one-page-spanish.pdf
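
For anyone trying to picture the symptom, here is a minimal Python sketch of one way accented characters can end up as '?': forcing text through ASCII with replacement. This is only an illustration under that assumption. It is not taken from pdfminer or Parsr, and the helper names `to_ascii_lossy` and `non_ascii_chars` are hypothetical. The real cause in this issue may well differ.

```python
# Illustration only: a lossy ASCII round trip, NOT pdfminer's or Parsr's code path.
sample = "á, é, í, ó, ú, ñ"


def to_ascii_lossy(text):
    """Encode to ASCII, substituting '?' for every character ASCII cannot represent."""
    return text.encode("ascii", errors="replace").decode("ascii")


def non_ascii_chars(text):
    """Return the distinct non-ASCII characters, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return seen


# Each accented letter is replaced by a single '?'.
print(to_ascii_lossy(sample))
# Lists the characters that would be lost in such a conversion.
print(non_ascii_chars(sample))
```

Running `non_ascii_chars` over text extracted by some other tool could help confirm which characters a document actually contains before comparing against the output with '?'.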