When you're developing an OCR app with Tesseract and getting low accuracy, it's often hard to determine whether the issue is the model (traineddata) or the image pre-processing. Of course, you can dump the pre-processed image to check that it's correctly binarized, but this takes time if you want to compute an accuracy score on thousands of images. To make your life easier, this repo contains a command-line application for Windows to test the accuracy.

This app is very easy to use:

add your images in tesseractMICR/apps/images
run tesseractMICR/apps/tesseract_recognizer.bat
the predictions will be in tesseractMICR/apps/ocr.txt
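
Once the predictions are written, an accuracy score can be computed by comparing them to your own ground truth. The sketch below is not part of this repo: it assumes a hypothetical layout where each line of the predictions file is `image_name<TAB>recognized_text` (check the actual layout of ocr.txt before using it), and the function names `parse_predictions`, `levenshtein` and `score` are made up for illustration.

```python
def parse_predictions(text):
    """Parse 'image_name<TAB>recognized_text' lines (hypothetical layout)."""
    result = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("\t", 1)
        result[name.strip()] = value.strip()
    return result


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def score(predictions, ground_truth):
    """Return line-level and character-level accuracy over the ground truth."""
    exact = 0
    errors = 0
    total_chars = 0
    for name, expected in ground_truth.items():
        predicted = predictions.get(name, "")  # missing prediction counts as empty
        exact += predicted == expected
        errors += levenshtein(predicted, expected)
        total_chars += len(expected)
    return {
        "line_accuracy": exact / len(ground_truth) if ground_truth else 0.0,
        "char_accuracy": max(0.0, 1.0 - errors / total_chars) if total_chars else 0.0,
    }
```

Line accuracy counts only exact matches, which is usually what matters for a MICR line; character accuracy helps tell "almost right" models apart from ones that fail badly.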

This app will:

detect MICR E-13B lines from anywhere on the image
extract the lines, de-skew and de-slant them
binarize the lines
use Tesseract for recognition
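
To give an idea of what the binarization step does, here is a minimal pure-Python sketch using Otsu's global threshold. This README does not say which binarization method the app uses, so Otsu is only a common stand-in, and the names `otsu_threshold` and `binarize` are hypothetical. The input is assumed to be a grayscale image stored as rows of integers in 0..255.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # sum of intensities in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2  # unnormalized between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray_rows):
    """Map each pixel to 0 (dark) or 255 (bright) using Otsu's threshold."""
    flat = [p for row in gray_rows for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in gray_rows]
```

A global threshold like this works on evenly lit scans; unevenly lit photos usually need a local (adaptive) method instead.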

You can edit tesseractMICR/apps/tesseract_recognizer.bat to change the path to the images or tessdata folders.

REM Usage: tesseract_recognizer.exe path_to_images_folder path_to_tessdata_folder
REM path_to_images_folder -> relative or absolute path to folder containing the images to process
REM path_to_tessdata_folder -> relative or absolute path to folder containing *.traineddata files
REM example: tesseract_recognizer.exe ./images ../tessdata_fast
REM another example: tesseract_recognizer.exe ./images ../tessdata_best

tesseract_recognizer.exe ./images ../tessdata_fast

The charset used in tesseractMICR/apps/ocr.txt is given by the mapping image (e13b_mapping.jpg in this repo).

This application is GPGPU accelerated using OpenCL. Make sure to update your drivers.

The accuracy

This was developed as an internal R&D project and never went to production, as we ended up using TensorFlow.

Even as a PoC (Proof-Of-Concept), it's already more accurate than all the commercial products we've tested: LEADTOOLS, Accusoft, Recogniform and ABBYY. The repo contains a command-line application to compare the accuracy (see above).

You can check our state-of-the-art implementation based on TensorFlow at https://www.doubango.org/webapps/micr/

Getting help

To get help, please check our discussion group or Twitter account.


DoubangoTelecom/tesseractMICR
