Additionally installed OCR language is rejected by web UI backend #571

Closed
lehnerpat opened this issue Dec 31, 2023 · 3 comments
Labels: bug, confirmed, medium

@lehnerpat

Description
After installing an additional OCR language (for example, Japanese) as described in the docs, the additional language can be used in OCR by setting it as the default, but it cannot be used from the web UI because the backend rejects it as an invalid value.

Expected
Additionally installed languages should be usable from web UI, just like the default languages.

Actual
The additional language shows up in the language selection dropdown for running OCR:
[Screenshot: OCR language selection dropdown]

But when you click "Start", the backend responds with a 422 error saying the additional language is not an allowed value for the enum.

Additionally, the UI completely ignores this error and doesn't show any error message :(

Full error payload:

{
    "detail": [
        {
            "type": "enum",
            "loc": [
                "body",
                "lang"
            ],
            "msg": "Input should be 'deu','fra','eng','ita','spa','por' or 'ron'",
            "input": "jpn",
            "ctx": {
                "expected": "'deu','fra','eng','ita','spa','por' or 'ron'"
            }
        }
    ]
}
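
Since the UI currently swallows this response, a client could at least surface the entries under `detail`. Below is a minimal sketch in Python, assuming the payload shape shown above. The function name `format_validation_errors` is hypothetical and not part of Papermerge.

```python
def format_validation_errors(payload: dict) -> list[str]:
    """Turn a 422 validation payload into human-readable messages."""
    messages = []
    for err in payload.get("detail", []):
        # "loc" is a path such as ["body", "lang"]; join it for display
        location = ".".join(str(part) for part in err.get("loc", []))
        msg = err.get("msg", "invalid value")
        messages.append(f"{location}: {msg} (got {err.get('input')!r})")
    return messages


# Trimmed copy of the payload from this report ("ctx" omitted)
payload = {
    "detail": [
        {
            "type": "enum",
            "loc": ["body", "lang"],
            "msg": "Input should be 'deu','fra','eng','ita','spa','por' or 'ron'",
            "input": "jpn",
        }
    ]
}

for line in format_validation_errors(payload):
    print(line)
```

A frontend would do the equivalent in its own language; the point is only that `loc`, `msg` and `input` carry enough information for a useful error message.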

Browser console screenshot:
[Screenshot: browser console]

Info:

  • OS: macOS Sonoma 14.1.2 (23B92), Architecture: Intel (x86_64)
  • Browser: Safari 17.1.2 (19616.2.9.11.12)
  • Database: SQLite
  • Papermerge Version: 3.0

More info about setup:

  • Using custom docker image with Japanese language package for tesseract installed, following instructions: https://docs.papermerge.io/3.0/setup/add-ocr-langs/

    • Dockerfile:

      FROM papermerge/papermerge:3.0
      
      # add Japanese OCR language
      RUN apt install tesseract-ocr-jpn
    • Built with: docker build -t mypaper:3.0 -f Dockerfile .

  • Using Docker Compose, following instructions: https://docs.papermerge.io/3.0/setup/docker-compose/

    • Changed image to use my custom one (mypaper:3.0)
    • Changed username and password
    • Set additional env var PAPERMERGE__OCR__DEFAULT_LANGUAGE: jpn
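
To rule out the image itself, one could check that Tesseract inside the container actually lists `jpn`. Below is a rough sketch that parses the output of `tesseract --list-langs`. The helper names are hypothetical, and the header line format is an assumption that may differ between Tesseract versions.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    codes = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header
        if not line or line.lower().startswith("list of available languages"):
            continue
        codes.add(line)
    return codes


def installed_langs() -> set[str]:
    """Run tesseract and return the set of installed language codes."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_list_langs(result.stdout)


# Sample output (assumed format) so the parser can be tried without tesseract
sample = 'List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (3):\neng\njpn\nosd\n'
print(sorted(parse_list_langs(sample)))
```

If `jpn` appears in that set, the language pack is installed and the rejection comes from the application's own validation rather than from Tesseract.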
lehnerpat added the bug label on Dec 31, 2023
@ciur
Owner

ciur commented Dec 31, 2023

Thank you for the well-structured bug report!

The issue happens because currently the language codes are hardcoded:

  1. in the backend
  2. in the UI
  3. and here

The fix would be to, well, just extend the current set of hardcoded values with another batch of languages (incl. Japanese).
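
As an illustration of that kind of fix, here is a minimal sketch of a hardcoded language enum extended with one more code. The class and function names (`OCRLang`, `parse_lang`) are hypothetical and this is not Papermerge's actual code; the first seven codes are the ones listed in the error message above.

```python
from enum import Enum


class OCRLang(str, Enum):
    # Codes taken from the 422 error message in this issue
    deu = "deu"
    fra = "fra"
    eng = "eng"
    ita = "ita"
    spa = "spa"
    por = "por"
    ron = "ron"
    # Newly added code; further languages would be appended the same way
    jpn = "jpn"


def parse_lang(value: str) -> OCRLang:
    """Return the matching enum member or raise a descriptive ValueError."""
    try:
        return OCRLang(value)
    except ValueError:
        allowed = ", ".join(lang.value for lang in OCRLang)
        raise ValueError(
            f"unsupported OCR language {value!r}; allowed: {allowed}"
        ) from None


print(parse_lang("jpn"))
```

With the extra member in place, `"jpn"` validates like any of the original codes, while unknown codes still fail with a message that lists the allowed values.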

@ciur
Owner

ciur commented Jan 12, 2024

PR#300 to include extra language codes (incl. Japanese)

The pull request was merged and will be available as part of the Papermerge 3.0.1 release.

@ciur
Owner

ciur commented Jan 25, 2024

Fixed in 3.0.2

@ciur ciur closed this as completed Jan 25, 2024