Gujarati, Hindi and Sanskrit Language OCR not working #583
Comments
Thank you for reporting the issue!
In order to make this work, I need to include Could you please provide original writing of the language name for
@ciur
PR for adding the above-mentioned languages. The change will be available in the 3.0.3 release. Note that you will need to build your image as before. However, when you start Papermerge, don't forget to add the PAPERMERGE__OCR__DEFAULT_LANGUAGE variable so that imported documents are OCRed in the "default OCR" language. In the screenshot you uploaded to the ticket, you can see that the document was OCRed with the OCR language set to "German" (the deu code corresponds to German). That is why those strange characters appear.
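As a rough illustration of the variable mentioned above, here is a minimal Python sketch that builds the environment entry for a chosen default OCR language. The helper name `default_language_env` and the lookup table are hypothetical and not part of Papermerge. The sketch assumes the variable takes a Tesseract language code, as the `deu` example suggests; `guj`, `hin` and `san` are the Tesseract codes for Gujarati, Hindi and Sanskrit.

```python
# Tesseract language codes for the languages discussed in this issue.
TESSERACT_CODES = {
    "German": "deu",
    "Gujarati": "guj",
    "Hindi": "hin",
    "Sanskrit": "san",
}


def default_language_env(language: str) -> dict[str, str]:
    """Return the environment entry selecting the default OCR language.

    Hypothetical helper: assumes PAPERMERGE__OCR__DEFAULT_LANGUAGE
    expects a Tesseract language code such as "hin".
    """
    code = TESSERACT_CODES[language]  # raises KeyError for unknown names
    return {"PAPERMERGE__OCR__DEFAULT_LANGUAGE": code}


if __name__ == "__main__":
    # Print in KEY=value form, e.g. for a file passed to `docker run --env-file`.
    for key, value in default_language_env("Hindi").items():
        print(f"{key}={value}")
```

The printed line can be placed in an env file or passed with `-e` when starting the container, so that imported documents are not OCRed as German by default.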
Here is a screenshot of the working app (as mentioned above, this will be part of 3.0.3):
Description of Issue
After building Papermerge with Gujarati, Hindi and Sanskrit Language support, when you upload and run OCR on files, it churns out OCR text which is not correct. I think the tesseract-ocr is consistent with the text output it gives for the file, but it seems like papermerge does not have fonts or Character Sets to display the translations in the OCR text language.
Build Details
Dockerfile to add tesseract-ocr to papermerge
Info: