form-extractor-ocr

Template-based form extractor OCR. Extract handwritten text from a scanned bank form image (or a scanned copy of any form) using template matching, individual box extraction and OCR. Train your own handwritten letter and digit OCR with pytesseract.
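
A minimal sketch of the first two stages, assuming grayscale NumPy arrays. match_template locates a fixed landmark of the blank form (for example a printed field label) by zero-mean normalized cross-correlation, the same score OpenCV's cv2.matchTemplate computes with cv2.TM_CCOEFF_NORMED. extract_boxes then crops evenly spaced character boxes at offsets measured once, relative to that landmark, on the blank template. The function names, the layout parameters and the even-spacing assumption are illustrative and not taken from this repository.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_template(image: np.ndarray, template: np.ndarray) -> tuple[int, int, float]:
    """Return (row, col, score) of the best zero-mean normalized cross-correlation match."""
    image = image.astype(np.float64)
    t = template.astype(np.float64)
    t = t - t.mean()
    t_norm = np.sqrt((t ** 2).sum())
    # Every template-sized window of the image: shape (H-h+1, W-w+1, h, w).
    windows = sliding_window_view(image, template.shape)
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (w * t).sum(axis=(-2, -1))
    den = np.sqrt((w ** 2).sum(axis=(-2, -1))) * t_norm
    # Flat windows (zero variance) get a score of 0 instead of dividing by zero.
    scores = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    r, c = np.unravel_index(np.argmax(scores), scores.shape)
    return int(r), int(c), float(scores[r, c])


def extract_boxes(image, anchor, first_box_offset, box_size, count, gap=0):
    """Crop `count` equally spaced boxes in one row, positioned relative to the matched anchor.

    first_box_offset, box_size and gap are hypothetical per-form layout values
    (in pixels) that you would measure on the blank form template.
    """
    r0, c0 = anchor
    dr, dc = first_box_offset
    bh, bw = box_size
    boxes = []
    for i in range(count):
        top = r0 + dr
        left = c0 + dc + i * (bw + gap)
        boxes.append(image[top:top + bh, left:left + bw])
    return boxes
```

On full-page scans cv2.matchTemplate is far faster and uses less memory; the NumPy version is only meant to show what the match score measures.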

Input scanned form snippet:

Tagged output snippet:

In the prediction output, red tags mark predictions with a confidence score below 80% and blue tags mark predictions with a confidence score of 80% or higher.
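
A sketch of the colour rule, assuming the OCR result is a dict shaped like the one pytesseract.image_to_data returns with output_type=pytesseract.Output.DICT (parallel lists under "text", "conf", "left", "top", "width" and "height", with confidences on a 0 to 100 scale). The function name and return layout are hypothetical; only the 80% threshold and the red/blue convention come from this README.

```python
# BGR colours, the channel order OpenCV drawing functions use by default.
RED = (0, 0, 255)
BLUE = (255, 0, 0)


def tag_predictions(data: dict, threshold: float = 80.0) -> list[dict]:
    """Attach a tag colour to every recognised word in an image_to_data-style dict."""
    tags = []
    for i, text in enumerate(data["text"]):
        # Depending on the pytesseract/Tesseract version, conf may be a str, int or float.
        conf = float(data["conf"][i])
        # Tesseract reports conf == -1 for layout rows (page, block, line) that hold no word.
        if conf < 0 or not str(text).strip():
            continue
        tags.append({
            "text": text,
            "conf": conf,
            "box": (data["left"][i], data["top"][i], data["width"][i], data["height"][i]),
            "color": BLUE if conf >= threshold else RED,
        })
    return tags
```

With Tesseract installed, data could come from pytesseract.image_to_data(box_image, config="--psm 10", output_type=pytesseract.Output.DICT), where page segmentation mode 10 treats each cropped box as a single character, and each tag could be drawn with cv2.rectangle(img, (x, y), (x + w, y + h), tag["color"], 2).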

NOTE: The current model was trained on only 10 images of each handwritten letter (a-z and A-Z) and 10 images of each handwritten digit (0-9), which is why its prediction accuracy is poor; training on more data should improve the predictions. For training, use handwritten character or digit images like those in data/crop_image/*.
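
The note implies 62 classes (26 lowercase letters, 26 uppercase letters and 10 digits), so 10 images each is 620 crops in total. The sketch below assumes you keep a mapping from each crop image path to its transcription. It reports classes below a minimum sample count and writes one <image stem>.gt.txt transcription file next to each crop, which is the image/ground-truth pairing the tesstrain tooling for training Tesseract models expects. Whether this repository trains through tesstrain is an assumption, and both function names are hypothetical.

```python
import string
from collections import Counter
from pathlib import Path

# The 62 classes from the note: a-z, A-Z and 0-9.
EXPECTED_CLASSES = string.ascii_lowercase + string.ascii_uppercase + string.digits


def coverage_report(labels: dict[str, str], min_per_class: int = 10) -> dict[str, int]:
    """Map every expected class with fewer than min_per_class crops to its current count."""
    counts = Counter(labels.values())  # missing classes count as 0
    return {c: counts[c] for c in EXPECTED_CLASSES if counts[c] < min_per_class}


def write_ground_truth(labels: dict[str, str]) -> list[Path]:
    """Write <image stem>.gt.txt next to each crop, holding that crop's transcription."""
    written = []
    for image_path, text in sorted(labels.items()):
        image = Path(image_path)
        gt_path = image.parent / (image.stem + ".gt.txt")
        gt_path.write_text(text + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

A trained model ends up as a .traineddata file, which pytesseract can then select through the lang argument of functions such as pytesseract.image_to_string.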

Repository: HarendraKumarSingh/form-extractor-ocr
