The primary goal is to enable blind users to analyze text content through audio output.
We applied the following methodology to achieve this goal (illustrative sketches of each step follow the list):
- Detect and extract the page by applying a four-point perspective transform.
- Identify multiple columns in the page by applying morphological transforms (erosion + dilation).
- Crop the column images and pass them sequentially into pytesseract OCR to get the text output.
- Convert the extracted text to speech.
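
A minimal sketch of the four-point transform step, assuming the four page corners have already been located (e.g. via contour detection, which is omitted here). The function names and the corner-ordering logic are illustrative, not taken from the repository.

```python
import cv2
import numpy as np

def order_points(pts):
    # Order corners as: top-left, top-right, bottom-right, bottom-left.
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]   # top-left has the smallest x + y
    rect[2] = pts[np.argmax(s)]   # bottom-right has the largest x + y
    d = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(d)]   # top-right has the smallest y - x
    rect[3] = pts[np.argmax(d)]   # bottom-left has the largest y - x
    return rect

def four_point_transform(image, pts):
    # Warp the quadrilateral defined by pts into a flat, top-down page image.
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype="float32")
    M = cv2.getPerspectiveTransform(rect, dst)
    return cv2.warpPerspective(image, M, (width, height))
```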
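A sketch of the column-detection step via erosion + dilation, assuming `warped` is the deskewed page from the previous step. The kernel sizes and iteration counts are assumptions that would need tuning for real page layouts; `find_column_boxes` is a hypothetical helper name.

```python
import cv2
import numpy as np

def find_column_boxes(warped):
    gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    # Invert so text is white on black for the morphology steps.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # Erode to remove small specks, then dilate so text belonging to the same
    # column merges into one connected blob.
    binary = cv2.erode(binary, np.ones((3, 3), np.uint8), iterations=1)
    merged = cv2.dilate(binary, np.ones((15, 9), np.uint8), iterations=3)
    # OpenCV 4.x return signature: (contours, hierarchy).
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    # Sort the bounding boxes left-to-right so columns are read in order.
    return sorted(boxes, key=lambda b: b[0])
```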
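A sketch of cropping each detected column and passing it sequentially through pytesseract, assuming `warped` and the sorted `boxes` come from the previous steps.

```python
import cv2
import pytesseract

def ocr_columns(warped, boxes):
    texts = []
    for (x, y, w, h) in boxes:
        crop = warped[y:y + h, x:x + w]
        # Convert BGR (OpenCV) to RGB before OCR.
        rgb = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)
        texts.append(pytesseract.image_to_string(rgb))
    return "\n".join(texts)
```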
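A minimal text-to-speech sketch. The original does not name a TTS library; pyttsx3 is used here purely as an illustrative, offline option.

```python
import pyttsx3

def speak(text):
    # Read the recognized text aloud through the system speech engine.
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```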
Example:
- A test image is provided as half.jpg.
- A sequence of cropped output images is provided in the folder "Crop Outputs"; it demonstrates the accuracy of boundary detection and cropping, as well as the sequencing of the images.
Demo video: https://www.youtube.com/watch?v=CcR5tph-pm4