Gujarati, Hindi and Sanskrit Language OCR not working #583

@vikithakar

Screenshot from 2024-01-22 18-03-00

Description of Issue

After building Papermerge with Gujarati, Hindi, and Sanskrit language support, uploading a file and running OCR on it produces OCR text that is not correct. I think tesseract-ocr itself gives consistent text output for the file, but Papermerge does not seem to have the fonts or character sets needed to display the text in the OCR language.
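One way to narrow this down is to check which Unicode script the extracted text actually contains. The sketch below is only an illustration, not part of Papermerge or tesseract: `script_of` and `script_histogram` are hypothetical helper names, and the only facts it relies on are the Unicode block ranges for Devanagari (used for Hindi and Sanskrit) and Gujarati.

```python
from collections import Counter


def script_of(ch):
    """Classify one character by Unicode block (hypothetical helper)."""
    cp = ord(ch)
    if 0x0900 <= cp <= 0x097F:
        return "Devanagari"  # block used for Hindi and Sanskrit
    if 0x0A80 <= cp <= 0x0AFF:
        return "Gujarati"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Other"


def script_histogram(text):
    """Count non-whitespace characters per script in the given text."""
    return Counter(script_of(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Paste OCR text copied from Papermerge here; this sample is two
    # Gujarati letters followed by three Latin letters.
    sample = "\u0aa8\u0aae abc"
    for script, count in script_histogram(sample).most_common():
        print(script, count)
```

If text copied out of Papermerge is mostly Gujarati or Devanagari code points but looks wrong on screen, a display or font problem seems more likely. If it is mostly Latin or other characters, the OCR run may not have used the intended language at all.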

Build Details

Dockerfile to add the tesseract-ocr language packs to Papermerge:

FROM papermerge/papermerge:3.0.2
RUN apt-get update && apt-get install -y tesseract-ocr-hin tesseract-ocr-guj tesseract-ocr-san
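To confirm that a build like this actually installed the language packs, the output of `tesseract --list-langs` can be checked inside the container. The following is a minimal sketch under stated assumptions: `parse_langs`, `missing_langs`, and `installed_output` are hypothetical helper names, and the header line format is assumed to start with "List of" as recent tesseract versions print it.

```python
import subprocess

# Language codes matching the packages in the Dockerfile above.
REQUIRED = {"hin", "guj", "san"}


def parse_langs(output):
    """Return the set of language codes in `tesseract --list-langs` output."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # Assumption: the header line starts with "List of"; the rest are codes.
    return {ln for ln in lines if not ln.startswith("List of")}


def missing_langs(output, required=REQUIRED):
    """Return required language codes that are absent, sorted."""
    return sorted(set(required) - parse_langs(output))


def installed_output():
    """Run the real CLI; tesseract must be on PATH (e.g. in the container)."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some older versions print the list to stderr, so combine both streams.
    return result.stdout + result.stderr


if __name__ == "__main__":
    print("missing:", missing_langs(installed_output()))
```

An empty "missing" list only shows that the packs are installed. Whether Papermerge selects one of these languages for a given document is a separate question that this check does not answer.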

Info:

  • Papermerge Version 3.0.2
