A simple implementation in C++ and Python.
-
The files of interest are detect.cpp, classify.py, and test.py. The flow is a little unusual: control jumps back and forth between detect.cpp and test.py.
-
The entire pipeline runs in detect.cpp, except for the classification step, which lives in classify.py and test.py.
-
I have already trained an SVC classifier on training images from the ICDAR dataset; it is stored in svc.pkl. It classifies HOG (Histogram of Oriented Gradients) descriptors of candidate regions to decide whether they contain text.
-
The OpenCV HOG descriptor is used with the parameters below:
HOGDescriptor hog( Size(dim, dim), Size(4, 4), Size(2, 2), Size(2, 2), 2 );
Alternatively, you can write your own classifier and change test.py so that it properly classifies the text regions passed to it.
- OpenCV 2.4
- scikit-learn
- CMake
- Detects potential text regions using Maximally Stable Extremal Regions (MSER)
- Filters the regions using the classifier
- Removes redundant boxes (one inside the other)
- Combines nearby boxes
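The last two post-processing steps can be sketched in plain Python. This is a minimal sketch, not the repo's actual C++ implementation; boxes are `(x, y, w, h)` tuples and the helper names and `gap` parameter are hypothetical:

```python
def contains(outer, inner):
    """True if box `inner` lies entirely inside box `outer` (boxes are (x, y, w, h))."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def remove_redundant(boxes):
    """Drop any box that is fully contained inside another box."""
    return [b for b in boxes
            if not any(a != b and contains(a, b) for a in boxes)]

def merge_nearby(boxes, gap=5):
    """Greedily merge boxes whose rectangles, expanded by `gap` pixels, overlap."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                ax, ay, aw, ah = boxes[i]
                bx, by, bw, bh = boxes[j]
                # Expand each box by `gap` and test for overlap.
                if (ax - gap < bx + bw and bx - gap < ax + aw and
                        ay - gap < by + bh and by - gap < ay + ah):
                    # Replace the pair with their bounding rectangle.
                    x, y = min(ax, bx), min(ay, by)
                    w = max(ax + aw, bx + bw) - x
                    h = max(ay + ah, by + bh) - y
                    boxes[j] = (x, y, w, h)
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes
```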
You can send the detected text boxes to any text-recognition library, such as Tesseract, to extract the text inside them.
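Handing the boxes to an OCR engine amounts to cropping each detected rectangle out of the image. A minimal NumPy sketch (the `crop_boxes` helper is hypothetical; the OCR call itself, e.g. `pytesseract.image_to_string`, is left out):

```python
import numpy as np

def crop_boxes(image, boxes):
    """Crop each detected (x, y, w, h) box out of `image` (an H x W numpy array).
    Each crop can then be passed to an OCR engine such as Tesseract."""
    crops = []
    h_img, w_img = image.shape[:2]
    for x, y, w, h in boxes:
        # Clamp the box to the image bounds before slicing.
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(w_img, x + w), min(h_img, y + h)
        crops.append(image[y0:y1, x0:x1])
    return crops
```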
-
You need your own text dataset. Go to the ICDAR website and download their latest training dataset. It comes with a text file that lists, for each image, the corner points of the rectangles where text is present. A handy function, getPoints, is included in detect.cpp; it accepts a line of that text file and returns a Point variable.
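For reference, a Python analogue of that parsing step might look like the sketch below. It assumes a comma-separated `x1,y1,x2,y2,...` line, which is how some ICDAR releases store the rectangles; check your download's format and the actual getPoints in detect.cpp, since field layouts vary between ICDAR editions:

```python
def get_points(line):
    """Parse one ground-truth line of the form "x1,y1,x2,y2,..." into a list of
    (x, y) corner tuples. Hypothetical Python analogue of getPoints in detect.cpp."""
    values = [int(v) for v in line.strip().split(",") if v.strip().lstrip("-").isdigit()]
    # Pair consecutive values into (x, y) points.
    return list(zip(values[0::2], values[1::2]))
```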
-
Now that you have the text regions, train the classifier on them using any feature descriptor. I used HOG.
-
These are the positive regions. For negative regions, take the same dataset and pick random rectangles from the images as negative samples. Make sure your data is not skewed: keep the positive and negative counts roughly balanced.
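Sampling random negative rectangles while avoiding the labelled text regions can be sketched like this. The helper names, fixed sample size, and seeded RNG are all assumptions for the example, not part of the repo:

```python
import random

def overlaps(a, b):
    """True if (x, y, w, h) boxes a and b intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def sample_negatives(img_w, img_h, positives, n, size=32, rng=None):
    """Pick n random size x size rectangles that avoid every positive text box,
    for use as negative training samples."""
    rng = rng or random.Random(0)  # seeded for reproducibility in this sketch
    negatives = []
    while len(negatives) < n:
        x = rng.randint(0, img_w - size)
        y = rng.randint(0, img_h - size)
        box = (x, y, size, size)
        # Keep only rectangles that do not touch any labelled text region.
        if not any(overlaps(box, p) for p in positives):
            negatives.append(box)
    return negatives
```

Note this rejection-sampling loop assumes the positives do not cover most of the image; otherwise it may take a long time to find enough clear rectangles.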
-
Now that you have features (positive and negative), train any classifier on them and save it as a pickle.
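The train-and-pickle step might look like the sketch below. The synthetic feature vectors are stand-ins for your real HOG features (load those from positive.txt and negative.txt instead); the file name svc.pkl matches what this repo expects:

```python
import pickle

import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for real HOG feature vectors: positives cluster around
# 1.0, negatives around 0.0. Replace with your positive.txt / negative.txt data.
rng = np.random.RandomState(0)
pos = rng.normal(1.0, 0.1, size=(100, 36))
neg = rng.normal(0.0, 0.1, size=(100, 36))
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)

clf = SVC(kernel="linear")
clf.fit(X, y)

# Save the trained classifier so classify.py / test.py can load it.
with open("svc.pkl", "wb") as f:
    pickle.dump(clf, f)

# Loading it back works the same way:
with open("svc.pkl", "rb") as f:
    loaded = pickle.load(f)
```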
-
To extract positive and negative features, two functions are included: readFilesPostive() and readFilesNegative(). They accept the location of the images, compute HOG features on them, and save all the features in two files, positive.txt and negative.txt. You will need to adapt them to your data.