Template based form extractor OCR. Train your own character and alphabet OCR.

HarendraKumarSingh/form-extractor-ocr

form-extractor-ocr

Template-based form extractor OCR. It extracts handwritten text from a scanned bank form image (or a scanned copy of any form) using template matching, individual box extraction, and OCR. You can train your own character and alphabet OCR with pytesseract.
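The box extraction step can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes the scan has already been aligned to the template, and the `TEMPLATE` field names, the box coordinates, and the helper names `crop` and `extract_boxes` are all hypothetical. A nested list stands in for the image so the sketch runs without any imaging library.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Image = List[List[int]]          # rows of grayscale pixel values

# Hypothetical template: each form field maps to the character boxes
# printed on the blank form. Real coordinates would come from the template.
TEMPLATE: Dict[str, List[Box]] = {
    "name": [(0, 0, 2, 2), (2, 0, 2, 2)],
    "account_no": [(0, 2, 2, 2)],
}


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by one character box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def extract_boxes(image: Image, template: Dict[str, List[Box]]) -> Dict[str, List[Image]]:
    """Crop every character box of every field from an aligned scan."""
    return {
        field: [crop(image, box) for box in boxes]
        for field, boxes in template.items()
    }


# Tiny 4x4 stand-in for an aligned scanned form.
scan: Image = [[r * 4 + c for c in range(4)] for r in range(4)]
crops = extract_boxes(scan, TEMPLATE)
```

Each cropped box would then be passed to the OCR model one character at a time.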

Input scanned form snippet:

Tagged output snippet:

In the prediction output, a red tag marks a confidence score below 80% and a blue tag marks a confidence score of 80% or higher.
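The tagging rule can be written as a small helper. This is a sketch under assumptions: the function names `tag_color` and `tag_predictions` are hypothetical, confidence values are assumed to be percentages from 0 to 100, and the colors are returned as plain names rather than drawn on the image.

```python
from typing import List, Tuple

CONFIDENCE_THRESHOLD = 80.0  # percent, as stated in the README


def tag_color(confidence: float, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return 'blue' for confidence >= threshold, otherwise 'red'."""
    return "blue" if confidence >= threshold else "red"


def tag_predictions(predictions: List[Tuple[str, float]]) -> List[Tuple[str, float, str]]:
    """Attach a tag color to each (character, confidence) prediction."""
    return [(char, conf, tag_color(conf)) for char, conf in predictions]


# Hypothetical predictions for three character boxes.
tagged = tag_predictions([("A", 93.5), ("7", 80.0), ("q", 41.2)])
```

Keeping the threshold as a parameter makes it easy to tighten or relax the cutoff once the model is retrained on more data.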


NOTE: The current model was trained on only 10 images of each handwritten character (a-z and A-Z) and 10 images of each handwritten digit (0-9), which is why the prediction accuracy is poor. Training on more data should improve the predictions. For training, use handwritten character or digit images like those in data/crop_image/*.
