Optical Character Recognition for Scanned Documents

Hawk453/OCR_FOR_PDFS

OCR_FOR_PDFS

Optical Character Recognition for Scanned Documents

The program extracts text from a scanned document supplied as a PDF, regardless of the document's length.
It uses Tesseract OCR to recognize the text and OpenCV to preprocess the page images, which are generated from the PDF by the pdf2image module.
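
The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `pdf_to_text` and `join_pages`, the `dpi` and `lang` defaults, and the use of the `pytesseract` wrapper around Tesseract are all assumptions. It also assumes Poppler (needed by pdf2image) and the Tesseract binary are installed.

```python
def join_pages(page_texts):
    """Join per-page OCR output into one string, one blank line between pages."""
    return "\n\n".join(text.strip() for text in page_texts)


def pdf_to_text(pdf_path, dpi=300, lang="eng"):
    """OCR every page of a scanned PDF and return the combined text.

    Hypothetical helper: name and defaults are illustrative only.
    """
    # Third-party imports are local so join_pages can be used without them.
    import cv2
    import numpy as np
    import pytesseract
    from pdf2image import convert_from_path

    page_texts = []
    # convert_from_path renders each PDF page to a PIL image, so the loop
    # handles documents of any length one page at a time.
    for page in convert_from_path(str(pdf_path), dpi=dpi):
        rgb = np.array(page.convert("RGB"))
        # Convert to single-channel grayscale before handing the page to Tesseract.
        gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
        page_texts.append(pytesseract.image_to_string(gray, lang=lang))
    return join_pages(page_texts)
```

With a scanned file at a hypothetical path such as `scan.pdf`, calling `pdf_to_text("scan.pdf")` would return the recognized text of all pages as a single string.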

The accuracy of the OCR can be improved by:

  • Preprocessing the image with OpenCV before recognition.
  • Running a spell check on the extracted text, which can also improve its flow.
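
Both ideas can be sketched in a few lines. The choices here are assumptions made for illustration, not what the repository necessarily does: `preprocess` uses a median blur followed by Otsu thresholding, and `correct_spelling` is a hypothetical stand-in for a spell checker built on the standard library's `difflib` and a caller-supplied word list.

```python
import difflib
import re


def preprocess(gray):
    """Denoise and binarize a grayscale page image (uint8 NumPy array)."""
    import cv2  # local import: only this step needs OpenCV

    # A small median filter removes salt-and-pepper scanner noise.
    denoised = cv2.medianBlur(gray, 3)
    # Otsu's method picks the global black/white threshold automatically.
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return binary


def correct_spelling(text, vocabulary, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary entry, if any.

    Replacements come back in lowercase; words already in the vocabulary
    and words with no close match are left untouched.
    """
    vocab = {word.lower() for word in vocabulary}
    candidates = sorted(vocab)

    def fix(match):
        word = match.group(0)
        if word.lower() in vocab:
            return word
        close = difflib.get_close_matches(
            word.lower(), candidates, n=1, cutoff=cutoff
        )
        return close[0] if close else word

    return re.sub(r"[A-Za-z]+", fix, text)
```

The binarized image from `preprocess` would be passed to Tesseract in place of the raw page, and `correct_spelling` would be applied to the text it returns. A real dictionary or a dedicated spell-checking library would likely do better than this simple similarity match.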
