Document-Scanner

Document scanning can be broken down into some distinct and simple steps.

Thresholding the image to remove noise.
Applying edge detection (Canny).
Finding the contours in the image that represent the document we want to scan.
Applying a perspective transform to obtain a top-down, 90-degree view of the image, just as if we had scanned the document.
Finally, applying thresholding to give the piece of paper a clean black-and-white look.
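
The perspective-transform step needs the four document corners in a consistent order, plus a target size for the flattened page. The following is a minimal sketch of that bookkeeping in plain Python, assuming the contour step has already produced four (x, y) corner points. The function names `order_points` and `output_size` and the sample coordinates are illustrative assumptions, not necessarily what transform.py does.

```python
import math


def order_points(pts):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    # The top-left corner has the smallest x + y, the bottom-right the largest.
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    # The top-right corner has the largest x - y, the bottom-left the smallest.
    tr = max(pts, key=lambda p: p[0] - p[1])
    bl = min(pts, key=lambda p: p[0] - p[1])
    return [tl, tr, br, bl]


def output_size(ordered):
    """Pick the width and height of the flattened page from the longer opposite edges."""
    tl, tr, br, bl = ordered
    width = max(math.dist(br, bl), math.dist(tr, tl))
    height = max(math.dist(tr, br), math.dist(tl, bl))
    return int(round(width)), int(round(height))


# Hypothetical corners of a document contour, in arbitrary order.
corners = [(420, 80), (60, 100), (40, 620), (450, 600)]
ordered = order_points(corners)
width, height = output_size(ordered)
print(ordered, width, height)
```

The ordered corners and the target size are the inputs a perspective-warp routine would typically need to map the document onto an upright rectangle.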

OCR

Extracts text from the scanned image and displays it.

NLP

Next, the extracted text can be processed further with NLP techniques.

STEPS

Clone this repository.
pip install -r requirements.txt
python scanner.py --image <image_name>
python ocr.py --image <image_name>
Run the Flask server for a practical application.
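
Both scripts above take the input file through an `--image` flag. As a rough sketch of how such a flag is commonly read with `argparse` (the helper name `parse_args`, the description text, and the sample path are assumptions for illustration, not the repository's actual code):

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --image is the only argument in this sketch."""
    parser = argparse.ArgumentParser(description="Scan a document from a photo.")
    parser.add_argument("--image", required=True, help="path to the input image")
    return parser.parse_args(argv)


# Hypothetical invocation, equivalent to: python scanner.py --image images/sample.jpg
args = parse_args(["--image", "images/sample.jpg"])
print(args.image)
```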

NOTE:

For better results:

The image should be taken as 'dark against bright background' or 'bright against dark background'.
The image size should be (2500-3500)*(2000-3000) pixels.
There should be background visible around all four edges of the document in the image.

Following these points gives better results than the default. NOTE: as a last resort, the user will be able to crop the document manually if not satisfied with the output (yet to be implemented).
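
The size recommendation above can be checked before scanning. This is a small sketch that assumes the two ranges refer to the longer and shorter side of the image in pixels, which the note does not state explicitly. The function name `size_in_recommended_range` is hypothetical.

```python
def size_in_recommended_range(width, height):
    """Return True if the image dimensions fall inside the recommended ranges."""
    # Assumption: 2500-3500 applies to the longer side, 2000-3000 to the shorter one.
    long_side, short_side = max(width, height), min(width, height)
    return 2500 <= long_side <= 3500 and 2000 <= short_side <= 3000


print(size_in_recommended_range(3000, 2400))
print(size_in_recommended_range(1280, 720))
```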

Results

More results are in the images and output folders.

Sparsh-Bansal/Document-Scanner

Folders and files

Latest commit

History

Repository files navigation

Document-Scanner

OCR

NLP

STEPS

NOTE:

Results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages