Extracts text from scanned PDF documents using OCR in Python

HassanSajjad229/OCR-on-scanned-PDF-PYTHON

OCR-on-scanned-PDF-PYTHON

This project extracts text from scanned PDF documents using OCR in Python. It uses the pytesseract, opencv and pdf2image libraries. Follow these steps to extract the text:

1. Convert the PDF file to images.
2. Rotate the images by a designated angle so that the text can be extracted.
3. Use the width, height, top and right coordinates to crop the part of the image needed for extraction.
4. Perform OCR on the images to extract the text.
5. Save the extracted text to a file.
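The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual script: the function names `clamp_crop_box` and `ocr_scanned_pdf` are hypothetical, the crop rectangle is assumed to be given as (left, top, width, height) in pixels, and rotation is assumed to be a multiple of 90 degrees so that `cv2.rotate` is enough. It also assumes that Poppler (needed by pdf2image) and the Tesseract binary (needed by pytesseract) are installed.

```python
from pathlib import Path


def clamp_crop_box(img_w, img_h, left, top, width, height):
    """Return (x0, y0, x1, y1) for a crop rectangle, clamped to the image bounds.

    Hypothetical helper; assumes a (left, top, width, height) convention.
    """
    x0 = max(0, min(left, img_w))
    y0 = max(0, min(top, img_h))
    x1 = max(x0, min(left + width, img_w))
    y1 = max(y0, min(top + height, img_h))
    return x0, y0, x1, y1


def ocr_scanned_pdf(pdf_path, out_path, rotate_code=None, crop=None, dpi=300):
    """Hypothetical pipeline: convert, rotate, crop, OCR, save.

    rotate_code: an OpenCV constant such as cv2.ROTATE_90_CLOCKWISE, or None.
    crop: (left, top, width, height) in pixels, or None for the whole page.
    """
    # Imported lazily so the pure helper above works without these packages.
    import cv2
    import numpy as np
    import pytesseract
    from pdf2image import convert_from_path

    texts = []
    # Step 1: render each PDF page to a PIL image.
    for page in convert_from_path(str(pdf_path), dpi=dpi):
        img = np.array(page)  # array of pixel rows, RGB channel order

        # Step 2: rotate by a multiple of 90 degrees (assumption).
        if rotate_code is not None:
            img = cv2.rotate(img, rotate_code)

        # Step 3: crop the region of interest with array slicing.
        if crop is not None:
            h, w = img.shape[:2]
            x0, y0, x1, y1 = clamp_crop_box(w, h, *crop)
            img = img[y0:y1, x0:x1]
            if img.size == 0:
                raise ValueError("crop rectangle lies outside the page")

        # Step 4: run Tesseract on the (rotated, cropped) page.
        texts.append(pytesseract.image_to_string(img))

    # Step 5: save the text of all pages to a single file.
    Path(out_path).write_text("\n".join(texts), encoding="utf-8")
    return texts
```

With placeholder file names, a call might look like `ocr_scanned_pdf("scan.pdf", "output.txt", rotate_code=cv2.ROTATE_90_CLOCKWISE, crop=(100, 200, 800, 400))`. For rotation angles that are not multiples of 90 degrees, `cv2.getRotationMatrix2D` together with `cv2.warpAffine` would be the usual alternative.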
