tesseract.tessedit_char_whitelist not working with umlauten #24

Closed
HansHammel opened this issue May 31, 2015 · 1 comment

Comments

@HansHammel

German "Umlaute" (ö,ä,ü,ß,Ö,Ä,Ü) seem to be ignored by the tessedit_char_whitelist option.

@WolfgangFellger
Contributor

That is pretty much our use case, and it's working here... Can you give a complete example?

Shots in the dark: Are you using the 'deu' language file? Apart from that, I could imagine a hiccup if your source file is not UTF-8 encoded; please check that.

Edit: Just tried again with our application, setting tessedit_char_whitelist = 'ßÄÖÜ' does give me a nice set of gibberish containing those characters.
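
To rule out the encoding hiccup suggested above, here is a minimal sketch (Python 3, standard library only) that checks whether a whitelist string is valid UTF-8 before it is handed to the OCR wrapper. The helper name `whitelist_as_utf8` is hypothetical and not part of this project; the sketch deliberately does not call the wrapper itself, since the exact setter is not shown in this thread.

```python
# Hypothetical helper, not part of this project's API: it only validates
# the encoding of the whitelist before you pass it on.
def whitelist_as_utf8(whitelist):
    """Return the whitelist as UTF-8 bytes.

    Accepts either a text string or a byte string. A byte string that is
    not valid UTF-8 (for example, Latin-1 encoded umlauts) raises ValueError.
    """
    if isinstance(whitelist, bytes):
        try:
            whitelist.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError("whitelist is not valid UTF-8") from exc
        return whitelist
    return whitelist.encode("utf-8")


umlauts = "ßÄÖÜ"

# Each of these characters takes two bytes in UTF-8.
utf8_bytes = whitelist_as_utf8(umlauts)

# The same characters in Latin-1 take one byte each and are not valid UTF-8.
latin1_bytes = umlauts.encode("latin-1")
try:
    whitelist_as_utf8(latin1_bytes)
    latin1_rejected = False
except ValueError:
    latin1_rejected = True
```

If the check fails for your whitelist, the source file or the string was probably saved in a legacy encoding such as Latin-1, which would be consistent with the umlauts appearing to be ignored.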
