tesseract-traineddata-script

A batch script that generates Tesseract trained data from images.

1. Install the tesseract 3.05.01 (released on June 1, 2017) pre-built binary package: https://github.com/tesseract-ocr/tesseract/wiki/Downloads

The download page is https://github.com/tesseract-ocr/tesseract/wiki

Binaries for Windows:

4.0.0: https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#400-alpha-for-windows

3.5.1: https://github.com/parrot-office/tesseract/releases/tag/3.5.1 (3rd party, @parrot-office)

Install jTessBoxEditor-2.0. (Don't use jTessBoxEditorFX-2.0, which is still not stable at this moment.)

Refer to: http://vietocr.sourceforge.net/training.html

Download URL: https://sourceforge.net/projects/vietocr/files/jTessBoxEditor/

jTessBoxEditor is a box editor and trainer for Tesseract OCR, providing editing of box data of both Tesseract 2.0x and 3.0x formats and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7 or later.

jTessBoxEditorFX is jTessBoxEditor rewritten in JavaFX to address the existing issue of rendering complex scripts in Java Swing. It requires JRE 8u40 or later. (jTessBoxEditorFX is not stable right now; don't use it.)

Generate a .box file from each .tif file before running generateTrainedData.bat, for example:

tesseract.exe test2.tif test2 batch.nochop makebox

Edit the .box file using jTessBoxEditor: just open the .tif file in jTessBoxEditor.
Copy generateTrainedData.bat to the folder which contains the images and .tif files.
Run the script with the font name, for example:

generateTrainedData.bat test2
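
When a folder holds several .tif files, the two commands shown above can be generated for each of them. The following is only a minimal Python sketch, not part of this repository. The helper names (`makebox_command`, `training_command`, `plan`) are hypothetical, and it assumes that the font name passed to generateTrainedData.bat equals the .tif base name, as in the `test2` example.

```python
from pathlib import Path


def makebox_command(tif_path, tesseract="tesseract.exe"):
    """Build the makebox command for one .tif, mirroring the README example."""
    tif = Path(tif_path)
    # The output base is the .tif name without its extension (test2.tif -> test2).
    return [tesseract, tif.name, tif.stem, "batch.nochop", "makebox"]


def training_command(tif_path, script="generateTrainedData.bat"):
    """Build the script call; assumes the font name equals the .tif base name."""
    return [script, Path(tif_path).stem]


def plan(folder):
    """Return (makebox, training) command pairs for every .tif in a folder."""
    tifs = sorted(Path(folder).glob("*.tif"))
    return [(makebox_command(p), training_command(p)) for p in tifs]


if __name__ == "__main__":
    # Print the commands instead of running them, so the .box file can be
    # corrected in jTessBoxEditor between the two steps.
    for makebox, train in plan("."):
        print(" ".join(makebox))
        print("  (edit the .box file in jTessBoxEditor, then run:)")
        print(" ".join(train))
```

The sketch only prints the commands, because the manual box-editing step has to happen between makebox and the training script.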

===================================================

Issues:

If jTessBoxEditor cannot recognize one page of a multi-page .tif, open the .box file in a text editor, copy some lines from a page that was recognized, and change the last number on each copied line (the page number) to the number of the page that was not recognized.
Running mftraining.exe prints "Warning: No shape table file present: shapetable".

Possible solution: run shapeclustering before mftraining.exe?
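
For the first issue, the copy-and-renumber step can also be done with a few lines of Python instead of by hand. This is a rough sketch under one assumption: each box line has the Tesseract 3.x layout `symbol left bottom right top page`, with pages counted from 0. The function name `retarget_box_lines` is hypothetical.

```python
def retarget_box_lines(lines, page):
    """Copy box lines, replacing the trailing page field with `page`."""
    result = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        # Split off only the last field; the symbol and coordinates stay as-is.
        head, last = stripped.rsplit(None, 1)
        int(last)  # raises ValueError if the last field is not a page number
        result.append(f"{head} {page}")
    return result


if __name__ == "__main__":
    copied = ["T 36 92 53 115 0", "e 55 92 66 109 0"]
    for new_line in retarget_box_lines(copied, 2):
        print(new_line)
```

Append the returned lines to the .box file, then reopen the .tif in jTessBoxEditor and move the copied boxes onto the characters of that page.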
