Language settings for GUI Batch Tool Conversion are faulty #3869

mgutt · 2019-12-04T13:40:21Z

It's a huge time saver to use the batch conversion tool to OCR forced subtitles, but there is no setting for the input language, so it returns many spelling errors.

I tried under "Fix common errors" to select "tr" as language, but it does not seem to work?!

EDIT: Ok, "my" fault. I had never used Turkish through the manual OCR method, so the Turkish language and dictionary were not installed. Maybe the "Fix common errors" dropdown should be displayed as follows to make this clearer for the user:

-Auto-
aa (not installed)
...
de
en
...
tr (not installed)
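
The labeling idea above could be sketched roughly as follows. This is only an illustration in Python, not Subtitle Edit's actual code; the function name `dropdown_labels` and the example set of installed languages are hypothetical.

```python
def dropdown_labels(all_languages, installed):
    """Build dropdown entries, marking languages whose dictionary is missing.

    Hypothetical helper: `all_languages` is every language code the dropdown
    offers, `installed` is the set of codes that have a dictionary installed.
    """
    labels = ["-Auto-"]  # the auto-detect entry always comes first
    for code in sorted(all_languages):
        suffix = "" if code in installed else " (not installed)"
        labels.append(code + suffix)
    return labels


# Example: only German and English dictionaries are installed (assumed).
for label in dropdown_labels(["tr", "de", "aa", "en"], installed={"de", "en"}):
    print(label)
```

With these assumed inputs, "aa" and "tr" get the "(not installed)" suffix while "de" and "en" are listed as-is.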

And in the overview there should be an additional column "Language":
File name | Size | Format | Status | Language

That way I could see which language "-Auto-" has detected. (P.S. How does "-Auto-" work?)

And I think "Fix common errors" should be renamed to "Fix common OCR errors" to make it clear that this setting includes the language selection.

Maybe the OCR engine used should be part of this setting, too, because in the end I ask myself whether the batch tool uses Tesseract 4.1.0 or something else.

EDIT2: Hmm. It seems that the language selected under "Fix common errors" does not influence the OCR language used. I selected "de" to batch convert a German subtitle, but another language was used, as the output does not contain the German umlauts.

EDIT3: Ok. I used the manual OCR tool and changed everything back to German, then successfully converted a German subtitle. Then I opened the batch tool, set the language under "Fix common errors" to "tr" and converted the German subtitle again. The result is still correct. This means the batch tool does not respect the language selected under "Fix common errors". Instead, it uses the language that was last used by the manual OCR tool.


niksedk · 2019-12-04T15:30:47Z

OCR in batch convert uses the last-used OCR language. Choosing a language in the batch UI is not supported at the moment.

xylographe · 2019-12-04T18:02:20Z

And in the overview there should be an additional column "Language":
File name | Size | Format | Status | Language

Agreed, this would be a useful enhancement.

niksedk mentioned this issue Apr 8, 2020

Specify Language & Tesseract Version for convert via CLI #4096

Closed

niksedk closed this as completed Aug 28, 2022
