Tesseract OSD retraining for other undetected scripts that are available in the script directory #18

Open
omesh-sharma opened this issue Oct 30, 2020 · 0 comments

Comments

@omesh-sharma

Hi,

I am using Tesseract OCR to extract text from images.



I would appreciate your suggestions on the following points.


  • How can I retrain the osd.traineddata file to add Ethiopic and other scripts? The current osd.traineddata file cannot detect some scripts (e.g. Ethiopic, Gujarati, Gurmukhi), even though script files for them are available in the script directory.


  • Which is more accurate for text extraction: the language traineddata files or the script traineddata files?


  • Does using a script traineddata file instead of a language traineddata file make any difference in text extraction accuracy?
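
For context, here is a minimal Python sketch of how the OSD result and the script models relate in the workflow described above. It does not retrain osd.traineddata; it only parses the text that `tesseract <image> stdout --psm 0` is assumed to print and picks a model name for the second pass. The helper names (`parse_osd`, `choose_model`, `ocr_command`), the `min_conf` threshold, and the `eng` fallback are hypothetical, and the `script/<Name>` form assumes the script files sit in a `script` subdirectory of the tessdata directory.

```python
def parse_osd(text):
    """Parse 'key: value' lines from Tesseract's OSD (--psm 0) output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no 'key: value' pair
            fields[key.strip()] = value.strip()
    return fields


def choose_model(osd_fields, fallback="eng", min_conf=1.0):
    """Return 'script/<Name>' when OSD reports a script with enough confidence.

    Both `fallback` and `min_conf` are illustrative assumptions, not
    Tesseract defaults.
    """
    script = osd_fields.get("Script")
    try:
        conf = float(osd_fields.get("Script confidence", "0"))
    except ValueError:
        conf = 0.0
    if script and conf >= min_conf:
        return "script/" + script
    return fallback


def ocr_command(image, model):
    """Build (but do not run) the command line for the recognition pass."""
    return ["tesseract", image, "stdout", "-l", model]
```

If OSD cannot report a script such as Ethiopic, a sketch like this would fall back to whichever model is passed as `fallback`, so the script model (for example `script/Ethiopic`) would have to be selected manually.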


Please share your comments and suggestions on the points above, based on your experience with Tesseract.
It would be very helpful for my final-year project.


Contact: sharmaomesh0@gmail.com

Thanks and regards,
Omesh Sharma
