This notebook is a comprehensive guide that explains the Histogram of Oriented Gradients (HOG) feature descriptor, its underlying concepts, and its application in object (especially pedestrian) detection. It was created to help you understand how HOG works, what makes a good feature descriptor, and how you can implement it using OpenCV. In addition, the notebook includes detailed code examples along with parameter explanations and a practical demonstration of non-maximum suppression (NMS) to refine detection results.
- Introduction
- What is a Feature Descriptor?
- What Makes a Good Feature?
- HOG Feature Descriptor Explained
- Overview of HOG
- Step-by-Step Process:
- Preprocessing (Image Patch Selection & Resizing)
- Gradient Computation (Magnitude and Orientation)
- Histogram Calculation in 8×8 Cells
- Block Normalization (16×16 Blocks)
- Final Feature Vector Construction
- Code Example
- Pedestrian Detection using OpenCV's HOG Descriptor
- Explanation of key parameters:
- winStride: Controls the step size for the sliding window.
- padding: Adds extra context around each window.
- scale: Determines the scaling factor for the image pyramid.
- Non-Maximum Suppression (NMS) parameters for eliminating overlapping detections.
- Parameter Summary Table
- A combined table explaining each parameter’s function and its impact on detection accuracy and speed.
- How to Run
- Requirements and instructions to run the notebook and video examples.
The notebook begins with an explanation of what a feature descriptor is and why it is important in computer vision. It explains how raw images are transformed into compact feature vectors that capture structural and edge information—ideal for recognition and classification tasks such as pedestrian detection.
The notebook details the HOG method by breaking down the following steps:
-
Preprocessing:
Resizing and normalizing the image patch to a fixed size (e.g., 64×128) ensures consistency. -
Gradient Computation:
Using gradient filters (e.g., Sobel operators) to compute the magnitude and orientation for each pixel. This step highlights the image edges and contours. -
Histogram Calculation:
Dividing the image into small cells (typically 8×8 pixels) and computing a 9-bin histogram of gradient orientations for each cell. Each pixel votes for the corresponding orientation bin based on its gradient magnitude. -
Block Normalization:
Grouping cells into larger blocks (typically 2×2 cells, forming a 16×16 block) and normalizing the concatenated histograms to mitigate the effects of varying illumination. -
Feature Vector Construction:
Concatenating all normalized block histograms into a final high-dimensional feature vector (e.g., 3780 dimensions for a 64×128 patch).
The notebook includes a complete code example for pedestrian detection in video using OpenCV. Key highlights include:
-
HOG Descriptor Initialization:
hog = cv2.HOGDescriptor() hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
-
Detection with Parameters:
The code applieshog.detectMultiScalewith the following parameters:- winStride=(8,8):
Sets the sliding window step size, balancing between detection accuracy and processing speed. - padding=(4,4):
Adds extra pixels to each window to capture important contextual information. - scale=1.01:
Creates a finely detailed image pyramid for detecting pedestrians at various sizes.
- winStride=(8,8):
-
Non-Maximum Suppression:
Converts bounding boxes from (x, y, width, height) to (x1, y1, x2, y2) format and applies NMS to remove redundant overlapping boxes. -
Video Display and Exit Mechanism:
The processed video frames are displayed, and the loop terminates when the 'q' key is pressed.
The full code example is provided in the notebook for you to run and experiment with.
| Parameter | Function | Detection Impact |
|---|---|---|
| winStride | Step size for sliding window movement | Smaller → More windows (higher accuracy, slower speed); Larger → Fewer windows (faster, may miss objects) |
| padding | Extra pixels added around each window | Helps include context; too high may add noise |
| scale | Factor for image resizing in the pyramid | Smaller (close to 1) → More levels (improved sensitivity, slower); Larger → Fewer levels (faster, less sensitive) |
| bounding_boxes | List of detected box coordinates (converted to (x1,y1,x2,y2)) | Required input for NMS |
| probs (weights) | Confidence scores for each detection | Used by NMS to select the most confident detection |
| overlapThresh | Maximum allowed overlap for two boxes before suppression | Lower value → Aggressive suppression; higher value → more boxes kept |
- Requirements:
- Python 3.x
- OpenCV
- NumPy
- imutils
- Usage:
- Install dependencies using
pip install opencv-python numpy imutils. - Run the notebook in Jupyter Notebook or any compatible environment.
- For video detection examples, ensure you have a video file (e.g.,
walking.avi) in the working directory. - Press 'q' during video playback to exit.
- Install dependencies using
This notebook is intended to serve as both an educational resource and a practical tool for exploring HOG-based pedestrian detection. It provides detailed explanations, code examples, and parameter tuning insights to help you better understand and implement HOG in your own computer vision projects.
Feel free to modify the parameters and experiment with different settings to see how they affect both detection accuracy and speed.