OCR-on-scanned-PDF-PYTHON

Text is extracted from scanned PDF documents using OCR in Python. The pytesseract, OpenCV, and pdf2image libraries are used. The following steps extract the text:

1. Convert the PDF file to images.
2. Rotate the images by a designated angle so that the text can be extracted.
3. Use the width, height, top, and right coordinates to crop the part of the image needed for extraction.
4. Perform OCR on the images to extract the text.
5. Save the extracted text to a file.
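The steps above can be sketched in a single script. This is a minimal illustration, not the code from the repository's own files: the function names (`clamp_box`, `rotate`, `ocr_pdf`), the parameter names, and the choice to describe the crop region as `(left, top, width, height)` are all assumptions made for the example. It also assumes the Tesseract and Poppler binaries are installed, since pytesseract and pdf2image depend on them.

```python
from pathlib import Path


def clamp_box(left, top, width, height, img_w, img_h):
    """Return (x0, y0, x1, y1) for a crop region, clamped to the image bounds.

    Hypothetical helper: the (left, top, width, height) convention is an
    assumption, not necessarily the one used in the repository.
    """
    x0 = max(0, min(left, img_w))
    y0 = max(0, min(top, img_h))
    x1 = max(x0, min(left + width, img_w))
    y1 = max(y0, min(top + height, img_h))
    return x0, y0, x1, y1


def rotate(image, angle_deg):
    """Rotate a NumPy image counter-clockwise, enlarging the canvas to fit."""
    import cv2  # imported lazily so clamp_box works without OpenCV installed

    h, w = image.shape[:2]
    center = (w / 2, h / 2)
    matrix = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    cos, sin = abs(matrix[0, 0]), abs(matrix[0, 1])
    new_w = int(h * sin + w * cos)
    new_h = int(h * cos + w * sin)
    # Shift the transform so the rotated page is centred on the new canvas.
    matrix[0, 2] += new_w / 2 - center[0]
    matrix[1, 2] += new_h / 2 - center[1]
    # Fill the exposed corners with white so they do not confuse the OCR.
    return cv2.warpAffine(image, matrix, (new_w, new_h),
                          borderValue=(255, 255, 255))


def ocr_pdf(pdf_path, out_path, angle_deg=0, box=None):
    """Run the five steps on every page and return the per-page text."""
    import numpy as np
    import pytesseract
    from pdf2image import convert_from_path

    pages = convert_from_path(str(pdf_path))          # step 1: PDF -> images
    texts = []
    for page in pages:
        img = np.array(page)                          # PIL image -> RGB array
        if angle_deg:
            img = rotate(img, angle_deg)              # step 2: rotate
        if box is not None:
            h, w = img.shape[:2]
            x0, y0, x1, y1 = clamp_box(*box, w, h)
            img = img[y0:y1, x0:x1]                   # step 3: crop
        texts.append(pytesseract.image_to_string(img))  # step 4: OCR
    # step 5: save the extracted text
    Path(out_path).write_text("\n".join(texts), encoding="utf-8")
    return texts
```

A call such as `ocr_pdf("sample.pdf", "extract-text.txt", angle_deg=90, box=(100, 200, 800, 400))` would process the sample file listed in the repository; the angle and box values here are placeholders and need to be chosen for the actual scan.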

Repository files:

README.md
cropping.py
extract-text.txt
ocr.py
readpdf.py
rotateimage.py
sample.pdf

HassanSajjad229/OCR-on-scanned-PDF-PYTHON
