This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

QNAP Docker Setting to detect only eng? #692

Open
tanderson1992 opened this issue Jul 26, 2020 · 0 comments

tanderson1992 commented Jul 26, 2020

I set up paperless following the Docker instructions. After install it worked fine on a few PDFs until I got to my vehicle registration. The document is entirely in English, but it seems to be detected as cat/ca, which is not installed. Is there a setting to force the software to use only English, or to skip OCR instead of failing to process? I see this in the 0.3.3 changelog but don't see where to set the default language: "Timezone, items per page, and default language are now all configurable..." I have "PAPERLESS_OCR_LANGUAGES=" [set to blank] in the yml file used to install paperless.

Here's a snippet of the error. I can provide full logs if that would help, but I think the issue is that it's somehow detecting another language and trying to OCR in that language, even though I've specified not to OCR in any language other than English.

Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm
[pgm_pipe @ 0x558c05dd90c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x558c05ddac40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x558c05ddac40] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for eng
Parsing for cat
Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-1kv2atz2/convert-0000.pnm -> /tmp/paperless/paperless-1kv2atz2/convert-0000.unpaper.pnm
[pgm_pipe @ 0x55dd25c170c0] [pgm_pipe @ 0x55ccf30aa0c0] Stream #0: not enough frames to estimate rate; consider increasing probesize
Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55dd25c18c40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55dd25c18c40] Encoder did not produce proper pts, making some up.
[image2 @ 0x55ccf30abc40] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55ccf30abc40] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for eng
Parsing for cat
PARSE FAILURE for /consume/Registration.pdf: The guessed language (ca) is not available in this instance of Tesseract.
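
To make the request concrete, here is a minimal Python sketch of the fallback behavior being asked for: if the guessed language has no installed Tesseract data, use English instead of failing. This is not paperless's actual code; `choose_ocr_language`, `ISO_TO_TESSERACT`, and the `installed` set are hypothetical names assumed purely for illustration.

```python
# Hypothetical mapping from two-letter guesses (e.g. "ca") to
# three-letter Tesseract language codes (e.g. "cat"). Illustrative only.
ISO_TO_TESSERACT = {"en": "eng", "ca": "cat", "de": "deu"}


def choose_ocr_language(guessed, installed, default="eng"):
    """Return the Tesseract code to OCR with.

    Falls back to `default` when the guess is missing, unknown,
    or not among the installed language packs.
    """
    candidate = ISO_TO_TESSERACT.get(guessed)
    if candidate in installed:
        return candidate
    return default


# The situation from the log above: the guess is "ca", only "eng" is installed.
print(choose_ocr_language("ca", {"eng"}))
```

With this kind of fallback, a wrong guess on an English document would still be processed with the English data rather than producing a parse failure.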