This repository has been archived by the owner on Nov 25, 2022. It is now read-only.

Classifies PDF documents (scanned images) by looking for sets of words unique to each document type.


carlo98/Image_pdf_classifier


Image_pdf_classifier

Use case: classify four different types of documents that were scanned and saved as PDF. Each file can be rotated by 0°, 90°, 180° or 270°.

This solution extracts an image from the PDF file and uses a CNN (model_1.h5) to rotate the newly created image upright. It then extracts the text with pytesseract and searches it for some of the most important words of each document type.
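The orientation step can be illustrated with a minimal sketch. It is not the repository's code: it assumes the CNN returns one score per orientation in the order 0°, 90°, 180°, 270°, and that each angle is the clockwise rotation of the scan. The names `ANGLES`, `predicted_angle`, `rotate_cw` and `upright` are hypothetical, and a plain nested list stands in for the image so the example runs without extra dependencies.

```python
# Assumed (hypothetical) class order of the orientation model's output.
ANGLES = (0, 90, 180, 270)


def predicted_angle(scores):
    """Return the angle with the highest score (argmax over four classes)."""
    if len(scores) != len(ANGLES):
        raise ValueError("expected one score per orientation")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ANGLES[best]


def rotate_cw(image):
    """Rotate a row-major 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def upright(image, scores):
    """Undo the predicted clockwise rotation by completing the full turn."""
    turns = ((360 - predicted_angle(scores)) // 90) % 4
    for _ in range(turns):
        image = rotate_cw(image)
    return image


# A 2x2 "page" scanned with a 90 degree clockwise rotation.
page = [[1, 2], [3, 4]]
scanned = rotate_cw(page)
restored = upright(scanned, [0.1, 0.7, 0.1, 0.1])
```

In a real pipeline the same idea would apply to the rendered page image before it is passed to pytesseract; the rotation direction has to match whatever convention the model was trained with.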

The words used to discriminate between document types can be set in Classifier/docClassifier.py.
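As a rough illustration of keyword-based classification, the sketch below counts how many words from each set appear in the OCR text and picks the best match. The labels, the word sets and the `classify` function are hypothetical placeholders, not the contents of Classifier/docClassifier.py, and the scoring rule (highest overlap wins) is an assumption.

```python
import re

# Hypothetical keyword sets; the real ones live in Classifier/docClassifier.py.
KEYWORDS = {
    "invoice": {"invoice", "total", "vat"},
    "contract": {"contract", "party", "clause"},
    "payslip": {"salary", "employee", "payroll"},
    "receipt": {"receipt", "cash", "change"},
}


def classify(text, keywords=KEYWORDS):
    """Return the label whose word set overlaps the text most, or None."""
    # Lowercase and split into alphabetic tokens so matching ignores case
    # and punctuation left over from OCR.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {label: len(words & tokens) for label, words in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


label = classify("INVOICE no. 12, Total due incl. VAT")
```

Returning `None` when no keyword matches keeps unreadable scans from being forced into one of the four types.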
