A Python service that uses pytesseract, a wrapper for Google Tesseract, to optically read images and return the detected fields and values through a Flask API. The API accepts a Base64-encoded image as input, processes it with Tesseract, and returns the detected text. The project focuses solely on detecting text (fields and values) in government ID cards such as driving licences, PAN cards, and Aadhaar cards (samples for each are in the Images folder, including the image, its Base64 encoding, and a screenshot of the resulting extracted text).
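A minimal sketch of what such a Flask endpoint could look like; the route name `/ocr`, the JSON field `image`, and the response shape are assumptions for illustration, not the repo's actual interface in app.py:

```python
# minimal_ocr_api.py -- illustrative sketch, not the repo's app.py
import base64
import io

from flask import Flask, jsonify, request
from PIL import Image
import pytesseract

app = Flask(__name__)

@app.route("/ocr", methods=["POST"])  # route name is an assumption
def ocr():
    data_url = request.json.get("image", "")
    # Strip the "data:image/png;base64," prefix before decoding
    encoded = data_url.split(",", 1)[-1]
    image = Image.open(io.BytesIO(base64.b64decode(encoded)))
    # Run Tesseract on the decoded image and return the raw text
    text = pytesseract.image_to_string(image)
    return jsonify({"text": text})

if __name__ == "__main__":
    app.run(debug=True)
```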
- Python 3
- tesseract-ocr
- Pytesseract
- Pillow
- Flask (web framework)
- OpenCV
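One way to install the Python dependencies (package names assumed from the list above; exact versions may differ from what the repo expects):

$ pip3 install pytesseract Pillow Flask opencv-python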
- Clone or download the repo and move into the repo folder.
- Make sure all the requirements/installations listed above are met (activating a virtual environment is suggested).
- Run the app.py script:
$ python3 app.py
- Now encode the image to Base64 (you can use a Base64 image encoder) or try the sample Base64 data URLs present in the .txt files in the Images folder of the repo.
- A valid Base64 input ("data:image/png;base64," + the output from the encoder, for PNG images) will be processed and the extracted text will be displayed (see the sample image below); a sketch of calling the API from Python is shown after this list.
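As a usage sketch, here is one way to call the running service from Python; the endpoint URL, port, image path, and JSON field names are assumptions and should be matched to the actual routes in app.py:

```python
# call_ocr_api.py -- illustrative client sketch
import base64

import requests

# Encode a local image as a data URL (PNG assumed; path is hypothetical)
with open("Images/sample.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")
payload = {"image": "data:image/png;base64," + encoded}

# POST to the running Flask service (default Flask port assumed)
response = requests.post("http://127.0.0.1:5000/ocr", json=payload)
print(response.json())
```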
If you're using Ubuntu 16.04 or an earlier version, the commands in the official documentation install Tesseract 3 by default. I followed the following link to install Tesseract 4.x and the corresponding version of Leptonica (1.74 or higher). See also the link on adding languages to Tesseract 4.x. You can also install it through a PPA:
$ sudo add-apt-repository ppa:alex-p/tesseract-ocr
$ sudo apt-get update
$ sudo apt install tesseract-ocr
- OpenCV import error (solution): ImportError: /opt/ros/kinetic/lib/python2.7/dist-packages/cv2.so: undefined symbol: PyCObject_Type (a common workaround is sketched below)
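This error typically occurs when a ROS Kinetic installation puts its Python 2.7 build of cv2 ahead of the pip-installed OpenCV on the module search path. A commonly used workaround, assuming the ROS path shown in the error message, is to drop that path before importing cv2:

```python
import sys

# Remove ROS Kinetic's Python 2.7 cv2 from the search path (path taken
# from the error message above) so the pip-installed OpenCV is used instead.
ros_cv2_path = "/opt/ros/kinetic/lib/python2.7/dist-packages"
if ros_cv2_path in sys.path:
    sys.path.remove(ros_cv2_path)

import cv2
```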
- Base64 image encoder
- beginners guide to tesseract ocr using python
- PyTesseract: Simple Python Optical Character Recognition (Github link)
- ocr with tesseract opencv and python (Includes preprocessing of image using opencv)
- opencv ocr and text recognition with tesseract
- The Tesseract model can be fine-tuned and trained to specialise in a specific dataset/domain (like government IDs here).
- The service can detect more than one language present in the image with small changes to the code.
- Further improve the aesthetics and appearance of the template and its interaction with the API (for example, segregating fields and values from the text output for better display in our case; a parsing sketch follows below).
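A minimal sketch of what that field/value segregation could look like, assuming the OCR output contains colon-separated lines such as "Name: JOHN DOE"; real ID-card text is noisier and would need card-specific rules:

```python
# parse_fields.py -- illustrative sketch of splitting OCR text into fields/values
def segregate_fields(ocr_text: str) -> dict:
    """Split colon-separated lines of OCR output into a field -> value mapping."""
    fields = {}
    for line in ocr_text.splitlines():
        if ":" in line:
            field, value = line.split(":", 1)
            fields[field.strip()] = value.strip()
    return fields

sample = "Name: JOHN DOE\nDOB: 01/01/1990"  # hypothetical OCR output
print(segregate_fields(sample))  # {'Name': 'JOHN DOE', 'DOB': '01/01/1990'}
```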