Language settings for GUI Batch Tool Conversion are faulty #3869

Closed
mgutt opened this issue Dec 4, 2019 · 2 comments

Comments

mgutt commented Dec 4, 2019

It's a huge time-saver to use the batch conversion tool to OCR forced subtitles, but there is no setting for the input language, so the output contains many spelling errors:
[Screenshot: 2019-12-04 14_00_32]

I tried selecting "tr" as the language under "Fix common errors", but it does not seem to work?!
[Screenshot: 2019-12-04 14_42_51]

EDIT: OK, "my" fault. I had never used Turkish with the manual OCR method, so the Turkish language and dictionary were not installed. Maybe the "Fix common errors" dropdown should be displayed as follows to make this clearer for the user:

-Auto-
aa (not installed)
...
de
en
...
tr (not installed)
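
A minimal sketch of the dropdown labeling proposed above, assuming the full list of language codes and the set of installed dictionaries are already known. The function `build_language_options` and its inputs are hypothetical illustrations, not part of Subtitle Edit:

```python
def build_language_options(all_codes, installed_codes):
    """Hypothetical helper: return dropdown labels with '-Auto-' first,
    then each language code in sorted order, suffixed with
    ' (not installed)' when its dictionary is missing."""
    options = ["-Auto-"]
    for code in sorted(all_codes):
        if code in installed_codes:
            options.append(code)
        else:
            options.append(f"{code} (not installed)")
    return options


if __name__ == "__main__":
    # Hypothetical example: only the German and English dictionaries are installed.
    for label in build_language_options(["tr", "en", "de", "aa"], {"de", "en"}):
        print(label)
```

Marking missing languages in the label itself (rather than hiding them) keeps every option visible while telling the user up front why a selection might not take effect.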

And in the overview there should be an additional column "Language":
File name | Size | Format | Status | Language

That way I could see which language "-Auto-" has detected. (P.S. How does "-Auto-" work?)

And I think "Fix common errors" should be renamed to "Fix common OCR errors" to make it clear that this setting includes the language selection.

Maybe the OCR engine in use should be part of this setting, too. After all, I'm left wondering whether the batch tool uses Tesseract 4.1.0 or something else.

EDIT2: Hmm. It seems that the language selected under "Fix common errors" does not influence the OCR language. I used "de" to batch convert a German subtitle, but another language was used, since the output lacks the German umlauts:

[Screenshot: 2019-12-04 14_31_14]

EDIT3: OK. I used the manual OCR tool, changed everything back to German, and successfully converted a German subtitle. Then I opened the batch tool, set the language under "Fix common errors" to "tr", and converted the German subtitle again. The result is still correct. This means the batch tool does not respect the language selected under "Fix common errors"; instead, it uses the language that was last used by the manual OCR tool.

@niksedk
Member

niksedk commented Dec 4, 2019

OCR in batch convert uses the last-used OCR language; choosing a language in the batch UI is not supported at the moment.

@xylographe
Member

> And in the overview there should be an additional column "Language":
> File name | Size | Format | Status | Language

Agreed, this would be a useful enhancement.
