OCR_FOR_PDFS

Optical Character Recognition for Scanned Documents

The program extracts text from a scanned document supplied as a PDF, regardless of the document's length.
The code uses Tesseract OCR to recognize the text and OpenCV to preprocess the page images, which are generated from the PDF by the pdf2image module.
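
The flow described above (PDF to page images, preprocessing, then recognition) can be sketched roughly as follows. This is a minimal illustration, not the contents of OCR_FOR_PDFS.py: the function names, the 300 DPI setting, and the choice of grayscale conversion plus Otsu thresholding are all assumptions. It requires pdf2image (with Poppler), opencv-python, numpy, and pytesseract (with the Tesseract binary) to be installed.

```python
from typing import Callable


def render_pages(pdf_path: str, dpi: int = 300) -> list:
    """Render every page of the PDF to a PIL image (assumed 300 DPI)."""
    from pdf2image import convert_from_path  # lazy import: needs Poppler
    return convert_from_path(pdf_path, dpi=dpi)


def preprocess(page):
    """Grayscale + Otsu binarization; one common choice, assumed here."""
    import cv2
    import numpy as np
    gray = cv2.cvtColor(np.array(page), cv2.COLOR_RGB2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def recognize(image) -> str:
    """Run Tesseract on a single preprocessed page image."""
    import pytesseract
    return pytesseract.image_to_string(image)


def ocr_pdf(
    pdf_path: str,
    render: Callable = render_pages,
    prep: Callable = preprocess,
    ocr: Callable = recognize,
) -> str:
    """Process pages one by one, so the page count does not matter."""
    texts = [ocr(prep(page)) for page in render(pdf_path)]
    return "\n".join(texts)
```

Calling `ocr_pdf("scan.pdf")` (a hypothetical file name) would return the text of all pages joined by newlines. The three steps are passed in as parameters so that any one of them can be swapped out or tested in isolation.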

The accuracy of the OCR can be improved by:

Preprocessing the page images with OpenCV before recognition.
Running a spell check on the extracted text, which can also improve how it reads.
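
As a rough illustration of the spell-check idea, the sketch below replaces each unknown word with its closest match from a word list. It uses `difflib` from the Python standard library as a stand-in for a real spell checker; the function name `correct_text`, the word list, and the 0.8 similarity cutoff are assumptions made for this example and are not part of the repository.

```python
import difflib
import re


def correct_text(text: str, vocabulary, cutoff: float = 0.8) -> str:
    """Replace words missing from `vocabulary` with their closest match."""
    vocab = sorted({w.lower() for w in vocabulary})
    known = set(vocab)

    def fix(match):
        word = match.group(0)
        lower = word.lower()
        if lower in known:
            return word  # already a known word: keep it unchanged
        candidates = difflib.get_close_matches(lower, vocab, n=1, cutoff=cutoff)
        if not candidates:
            return word  # nothing similar enough: leave the word alone
        best = candidates[0]
        # Preserve a leading capital letter from the OCR output.
        return best.capitalize() if word[0].isupper() else best

    # Only alphabetic runs are checked; digits and punctuation pass through.
    return re.sub(r"[A-Za-z]+", fix, text)


words = ["the", "document", "is", "scanned"]  # hypothetical word list
print(correct_text("The documemt is scanned", words))
```

A conservative cutoff keeps the checker from "correcting" names and technical terms that are simply absent from the word list, at the cost of missing heavier OCR errors.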

Hawk453/OCR_FOR_PDFS
