EXTRACT

Introduction

EXTRACT is an optical character recognition (OCR) engine for various operating systems. It extracts text from an image and converts it to plain text.

This model is a very primitive form of the original Google Tesseract.

Modules/Library REQUIREMENTS:

os
numpy
PIL
sys
keras
cropyble
cv2
shutil
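
Before running the script, it can help to confirm that every module in the list above is importable. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical. Note that os, sys and shutil ship with Python, while PIL and cv2 are normally installed through the Pillow and opencv-python packages.

```python
import importlib.util

# Import names exactly as listed above.
REQUIRED_MODULES = ["os", "numpy", "PIL", "sys", "keras", "cropyble", "cv2", "shutil"]


def find_missing(modules):
    """Return the import names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```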

Features

a) Extracts text from input image

b) Works on lowercase, uppercase, numeric and special characters.

c) Saves the output in output.txt so that the extracted text can be searched.
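
Because the result is written to output.txt, it can be searched with a few lines of Python. This is only an illustrative sketch: the helper `find_lines` and the query string are assumptions, and it presumes output.txt is plain UTF-8 text in the current directory.

```python
from pathlib import Path


def find_lines(path, query):
    """Return (line_number, line) pairs whose text contains query, ignoring case."""
    matches = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if query.lower() in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches


if __name__ == "__main__":
    output = Path("output.txt")
    if output.exists():
        # "hello" is a placeholder query; replace it with the text to look for.
        for number, line in find_lines(output, "hello"):
            print(f"{number}: {line}")
```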

How To Run the script:

NOTE1:- The trained model is not provided, so the very first time run the script as it is so that the model gets trained. Once the model is trained, comment out 'Train_Model()' and then run the script for all further use.
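
As an alternative to commenting out the call by hand, the training step could be skipped automatically whenever a saved model already exists. The sketch below is an assumption about how this might be done and is not how tesseract.py works: `load_or_train`, `train_fn` and `load_fn` are hypothetical names, and the path of the saved model is not stated in this README.

```python
import os


def load_or_train(model_path, train_fn, load_fn):
    """Train only when no saved model exists at model_path, then load it."""
    if not os.path.exists(model_path):
        # train_fn is expected to write the trained model to model_path.
        train_fn(model_path)
    return load_fn(model_path)


if __name__ == "__main__":
    import tempfile

    # Stand-in functions so the sketch runs without Keras installed.
    def fake_train(path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("weights")

    def fake_load(path):
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "model.bin")
        print(load_or_train(path, fake_train, fake_load))  # trains, then loads
        print(load_or_train(path, fake_train, fake_load))  # loads only
```

With Keras, `load_fn` could be `keras.models.load_model`, and `train_fn` could wrap Train_Model(), provided it saves the model to the given path.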

Run the script in your terminal with 'python3 tesseract.py'. The input image is:

output is (the predicted result is at the bottom):

The input image can contain any number of words, for example:

output is:

Contributors

Akarsh Malik
Angad Ripudaman Singh Bajwa

Future Work

To add characters of your own:

a) Add them to both the train and test datasets.

b) Change the output size of the softmax layer in the Train_Model function to the total number of trained characters.

c) Re-train the model.

d) Test your image.
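
The softmax layer needs one output unit per distinct trained character, so the number to use can be computed from the character set instead of being counted by hand. The sketch below is illustrative only: the base character set (ASCII letters, digits and punctuation) is an assumption, since this README does not list the exact characters the model is trained on, and the helper names are hypothetical.

```python
import string

# Assumption: the base set covers ASCII letters, digits and punctuation.
BASE_CHARACTERS = (
    string.ascii_lowercase
    + string.ascii_uppercase
    + string.digits
    + string.punctuation
)


def build_label_map(characters):
    """Map each distinct character to a class index, keeping first-seen order."""
    unique = list(dict.fromkeys(characters))
    return {char: index for index, char in enumerate(unique)}


def softmax_units(characters):
    """Number of output units the softmax layer needs for these characters."""
    return len(build_label_map(characters))


if __name__ == "__main__":
    extended = BASE_CHARACTERS + "€"  # example of one newly added character
    print("Base classes:", softmax_units(BASE_CHARACTERS))
    print("Classes after adding a character:", softmax_units(extended))
    # In Keras the final layer would then be created with this many units,
    # e.g. Dense(softmax_units(extended), activation="softmax").
```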

