
Frequently Asked Good Questions FAQ


Frequently asked good questions about the OpenCV custom HOG training code

Questions regarding the code in the repository

  • Question: I have training images of a size different from 64 x 128 px and get an error like Error: Image 'neg/xxx.png' dimensions (320 x 240) do not match HOG window size (64 x 128)!
  • Answer: The default training image size is 64 x 128 px, which is also the size used in the underlying paper by Dalal and Triggs. If you want to use other image dimensions, you have to adjust the HOG parameters such as the training / detection window size (e.g. put hog.winSize = Size(320, 240); and the other parameters somewhere after HOGDescriptor hog;). This is untested and has strong side effects, since winStride, blockSize, cellSize, etc. depend on the window size, so the program may return unusable results or even crash. To get started quickly, I recommend resizing / cropping your images to 64 x 128 px; you can easily automate this with ImageMagick / IrfanView or directly in OpenCV, as in the sketch below. Note that the objects you want to detect need to be at roughly the same location and scale in the training images, so that the training algorithm is able to abstract the common underlying feature patterns.
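
    A minimal sketch of both options, assuming the OpenCV C++ API; the file names and the 320 x 240 example window size are placeholders, not part of the repository code:

    ```cpp
    // Minimal sketch: either adapt the HOG window to your image size, or
    // resize the training image to the default 64 x 128 px window.
    // File names and the 320 x 240 size are placeholders.
    #include <opencv2/opencv.hpp>

    int main() {
        // Option A (untested, see above): adjust the detection window size.
        // blockSize, blockStride, cellSize etc. must still fit the new window.
        cv::HOGDescriptor hog;
        hog.winSize = cv::Size(320, 240);

        // Option B (recommended to get started): resize a sample to 64 x 128 px.
        cv::Mat img = cv::imread("neg/xxx.png");
        if (!img.empty()) {
            cv::Mat resized;
            cv::resize(img, resized, cv::Size(64, 128));
            cv::imwrite("neg_resized/xxx.png", resized);
        }
        return 0;
    }
    ```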

  • Question: I get a Debug Assertion Failure, Expression: _CrtIsValidHeapPointer(pUserData). What can I do?
  • Answer: This seems to be caused by a change in your tool chain (such as the compiler) when libraries compiled with the old and the new compiler are mixed (OpenCV libs, SVM libs, standard libs, etc.). Try a make clean on the project. If you compiled OpenCV from source before updating e.g. the compiler, try cleaning and recompiling OpenCV as well.

  • Question: How do I compile and run the code on an operating system other than Linux?
  • Answer: The provided code should run on Windows with minor modifications, such as adding the dirent header (dirent.h) to the "Microsoft Visual Studio 9.0\VC\include" folder if you develop with Visual Studio, as I have been told. However, I do not provide Windows code, because I wrote this program in an educational scope, where Windows is not suited since it hides too much of the processes / functionality that I needed to understand thoroughly (especially the Kinect-related functionality).

  • Question: Can you provide the code required to use libSVM instead of SVMlight?
  • Answer: I will include the code in this repository once I have had the time to refactor and test it. For now, just drop me an e-mail and I will send you the code and instructions to use at your own risk.

General questions on training

  • Question: How many training samples are required to get 'decent' detection results?
  • Answer: The number of required training samples depends on several factors. On the one hand, it depends on the complexity of the problem to train / the underlying data. The learning algorithm tries to abstract / generalize by determining an inherent scheme in the data; therefore, the easier it is to find and distinguish such an underlying pattern, the fewer training samples are required to achieve decent results. On the other hand, it depends on the quality requirements (such as the precision-recall ratio, ROC, FP-TP ratio, etc.). Learning algorithms often reach an acceptable recognition rate quickly, but the closer the results get to the actual teacher-given output (classes), the longer the last bit takes to learn, because the training algorithms learn / improve through the training error.

  • Question: Where can I get some training sample images?
  • Answer: You can obtain the positive training set of the original HOG paper "Histograms of Oriented Gradients for Human Detection" by Dalal and Triggs (INRIA person dataset, see http://pascal.inrialpes.fr/data/human/ ). As the negative training set, I used randomly cropped areas of images without persons in them, as in the sketch below.
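
    A small sketch of how such negative windows could be cropped, assuming the OpenCV C++ API; the paths and the number of crops per image are placeholders:

    ```cpp
    // Sketch: cut random 64 x 128 px windows out of person-free images to use
    // as negative samples. Paths and the crop count are placeholders.
    #include <opencv2/opencv.hpp>
    #include <cstdlib>
    #include <string>

    int main() {
        const cv::Size winSize(64, 128);
        cv::Mat img = cv::imread("neg_full/background.png");
        if (img.empty() || img.cols < winSize.width || img.rows < winSize.height)
            return 1;

        for (int i = 0; i < 10; ++i) { // e.g. 10 random crops per source image
            int x = std::rand() % (img.cols - winSize.width + 1);
            int y = std::rand() % (img.rows - winSize.height + 1);
            cv::Mat crop = img(cv::Rect(x, y, winSize.width, winSize.height)).clone();
            cv::imwrite("neg/crop_" + std::to_string(i) + ".png", crop);
        }
        return 0;
    }
    ```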

  • Question: Why are Haar or Haar-like features assumed to be better than HOG features for a face detection problem?
  • Answer: I think it is difficult to say which method is best in general; in my opinion, it strongly depends on the requirements which method suits best. Haar-like features with a Boosting classifier are a good choice to match patterns of combined bright and dark areas (like a face in grayscale, see the images at http://www.cognotics.com/opencv/servo_2007_series/part_2/sidebar.html, Figure 4). HOG features are sensitive to distributions of localized oriented gradients, therefore HOG is well suited to detect shapes. HOG may also be used for face detection, but I think it is currently primarily used for person detection.
    1. I assume that Haar-like features may be able to better resemble the intrinsic face features (like eyes and nose, which appear prominent / rich in contrast in Haar feature space) due to the sensitivity of the Haar-like features.
    2. Haar-like features are very cheap to compute (using integral images), and the Boosting cascade rejects most non-face regions early, which keeps detection fast.

  • Question: How does the choice between an SVM and a cascade classifier depend on the kind of features I use? In simple terms, why Haar features with a cascade classifier, and why HOG / SIFT features with an SVM?
  • Answer: You may change the underlying classifier / learning algorithm within certain limits. I recently read a paper where SVMs were used as weak Boosting classifiers (Boosting SVMs), so it really depends on what you are trying to achieve and what the data to be classified looks like. E.g. if you only have a 1 and a 0 to classify, a simple single http://en.wikipedia.org/wiki/Perceptron may be a suitable choice. The combination of a Boosting cascade classifier and Haar-like features has been shown to be a feasible solution, one of the reasons being the point made in the previous question under 2., so that face detection can run on mobile phone / camera / embedded devices with decent speed. The choice of a classifier depends on the features used / the data to classify. From Wikipedia (http://en.wikipedia.org/wiki/Statistical_classification): "Classifier performance depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given problems (a phenomenon that may be explained by the no-free-lunch theorem). Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance. Determining a suitable classifier for a given problem is however still more an art than a science."