
Euno257/Image-Text-Summarizer


Image-Text-Summarizer

This tool is mainly useful when reading novels, stories, or any other text made up of long paragraphs. The reader takes a picture of a paragraph and passes the image to the tool, which extracts the text and produces a summary of that paragraph. In short, it makes reading easier and saves the reader time.

Technologies Used

  • Optical Character Recognition (OCR)
  • Natural Language Processing (NLP)

Create a file named ocr.py in the project folder.

Set Up Virtual Environment

$ virtualenv venv --python=python3.6

$ source venv/bin/activate

Install dependencies

  • Pillow: pip3 install Pillow
  • Pytesseract: pip3 install pytesseract
  • OpenCV: pip3 install opencv-python
  • NLTK: pip3 install nltk

Run ocr.py

python3 ocr.py --image images/story1.jpg > story.txt

story.txt

This file contains all the text extracted from the image story1.jpg by OCR with pytesseract.
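The README does not show the contents of ocr.py, so the following is only a minimal sketch of what such a script could look like, assuming the `--image` flag from the command above. The function names `build_parser`, `extract_text`, and `main` are illustrative, and the grayscale plus Otsu threshold preprocessing is an assumption, not necessarily what the project does. pytesseract is a wrapper, so the Tesseract OCR engine itself must also be installed on the system.

```python
import argparse
import sys


def build_parser():
    """Build the command-line parser (hypothetical helper, not from the project)."""
    parser = argparse.ArgumentParser(description="Print the text found in an image.")
    parser.add_argument("--image", required=True, help="path to the input image")
    return parser


def extract_text(image_path):
    """Return the text recognized in the image at image_path."""
    # Imported here so the argument parser works before the OCR stack is installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(image_path)

    # Assumed preprocessing: grayscale, then Otsu binarization to sharpen the text.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    return pytesseract.image_to_string(binary)


def main(argv):
    if not argv:
        # No arguments: show usage instead of failing.
        build_parser().print_help()
        return
    args = build_parser().parse_args(argv)
    print(extract_text(args.image))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the recognized text is printed to standard output, redirecting it with `> story.txt` as shown above saves it to a file.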

Create a new file named summarize.py.

summarize.py

This file uses Python's NLTK library to remove stop words and punctuation, along with NLTK's word and sentence tokenizers.
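The README names the NLTK pieces used but not the summarization method itself. The sketch below assumes a simple frequency-based extractive approach: score each sentence by how often its content words appear in the whole text, then keep the top few. The names `rank_sentences` and `summarize` and the default of three sentences are hypothetical. It also assumes the NLTK data for stop words and the Punkt tokenizer has been downloaded once with `nltk.download` (the exact package names can vary between NLTK versions).

```python
import sys
from collections import Counter


def rank_sentences(sentences, tokenized, stop_words, n=3):
    """Return the n highest-scoring sentences, kept in their original order.

    sentences: list of sentence strings
    tokenized: list of word-token lists, one per sentence
    """
    def is_content(token):
        # Drop stop words and tokens with no letters or digits (punctuation).
        return token not in stop_words and any(ch.isalnum() for ch in token)

    content = [[w.lower() for w in words if is_content(w.lower())]
               for words in tokenized]

    # Word frequencies over the whole text.
    freq = Counter(w for words in content for w in words)

    # A sentence's score is the summed frequency of its content words.
    scores = [sum(freq[w] for w in words) for words in content]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


def summarize(text, n=3):
    """Summarize text with NLTK tokenizers and the English stop-word list."""
    # NLTK is imported here so rank_sentences stays usable without it.
    from nltk.corpus import stopwords
    from nltk.tokenize import sent_tokenize, word_tokenize

    sentences = sent_tokenize(text)
    tokenized = [word_tokenize(s) for s in sentences]
    stop_words = set(stopwords.words("english"))
    return " ".join(rank_sentences(sentences, tokenized, stop_words, n))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        print(summarize(f.read()))
```

Summing frequencies favors longer sentences; dividing each score by the sentence's content-word count is one common variation if that bias is unwanted.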

Run summarize.py

python3 summarize.py story.txt > summary.txt

summary.txt

This file contains the summary of the text file story.txt.

About

Extracts text from an image and outputs a summary.
