# [Histogram of Oriented Gradients (HOG) Explained](https://learnopencv.com/histogram-of-oriented-gradients/)

## 1. What is a Feature Descriptor?
A feature descriptor is a representation of an image (or an image patch) that simplifies the image by extracting useful information while discarding extraneous details. Typically, it converts an image of size _width × height × 3_ (for color channels) into a feature vector of fixed length. For instance, in the case of the HOG descriptor used for pedestrian detection, the input image patch is 64×128×3 and the resulting feature vector has a length of 3780.

The key idea is that while the raw pixel values might be useful for viewing an image, they are not optimal for tasks such as image recognition or object detection. Instead, a feature vector produced by descriptors like HOG captures the structural and edge information of an image, which can then be fed into classifiers (e.g., SVM) for high performance in recognition tasks.

## 2. What Makes a Good Feature?
For classification tasks (such as detecting buttons on shirts or distinguishing them from other circular objects like coins), the features must be:
- **Useful:** They capture the essential characteristics of the object (e.g., the edges of a button, its shape, and structure).
- **Discriminative:** They are able to differentiate between similar objects (e.g., buttons versus car tires) despite potential similarities in shape.

In HOG, the “useful” features are the distribution of the directions of gradients (oriented gradients) within the image. Since gradients highlight areas with abrupt changes in intensity (edges and corners), they effectively encode information about object shape while ignoring uniform regions that contain less useful detail.

## 3. How Does the HOG Feature Descriptor Work?
The process of computing the HOG feature descriptor involves several key steps:

### Step 1: Preprocessing
![](https://cdn-ilclanb.nitrocdn.com/IekjQeaQhaYynZsBcscOhxvktwdZlYmf/assets/images/source/rev-f424eed/learnopencv.com/wp-content/uploads/2016/11/hog-preprocessing.jpg)
- **Image Patch Selection and Resizing:**
  The HOG descriptor for pedestrian detection is computed on a fixed-size patch (typically 64×128). The original image patch can have any size, as long as it is cropped to a fixed aspect ratio (e.g., 1:2) and then resized to 64×128 before further processing. This standardization keeps the descriptor length constant.

- **Gamma Correction (Optional):**
  Although gamma correction may be applied to normalize the brightness and contrast, its impact is often minor and is sometimes omitted.
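As a rough illustration of the cropping step, the sketch below computes the largest centered crop with a 1:2 aspect ratio. The helper name `center_crop_box` is hypothetical and not part of OpenCV; centering the crop is also an assumption, since in practice the crop follows the object of interest.

```python
def center_crop_box(width, height, aspect_w=1, aspect_h=2):
    """Return (x, y, w, h) of the largest centered crop with ratio aspect_w:aspect_h.

    Hypothetical helper for illustration only.
    """
    # Largest integer multiple of the aspect ratio that fits inside the image.
    k = min(width // aspect_w, height // aspect_h)
    crop_w, crop_h = k * aspect_w, k * aspect_h
    # Center the crop inside the image.
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h


box = center_crop_box(100, 300)
print(box)  # (0, 50, 100, 200)
```

The cropped region could then be resized to the detection window with something like `cv2.resize(patch, (64, 128))`, where the size is given as (width, height).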

### Step 2: Calculate Gradient Images
![](https://cdn-ilclanb.nitrocdn.com/IekjQeaQhaYynZsBcscOhxvktwdZlYmf/assets/images/source/rev-f424eed/learnopencv.com/wp-content/uploads/2016/11/gradients.png)
- **Gradient Computation:**
  The image gradients are calculated in both horizontal (x) and vertical (y) directions. This can be done using simple gradient filters or operators like Sobel.
  - **Magnitude and Orientation:**
    For each pixel, the gradient magnitude is computed to measure the strength of the edge, and the gradient orientation (angle) is computed to determine the edge direction.
  These gradients help highlight the edges and contours in the image, which are critical for detecting object shapes.
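The gradient step can be sketched in NumPy as follows. This version uses a plain `[-1, 0, 1]` centered difference, which is an assumption made for illustration; a Sobel operator from OpenCV could be substituted. The function name `gradients` is hypothetical.

```python
import numpy as np


def gradients(gray):
    """Return per-pixel gradient magnitude and unsigned orientation (degrees)."""
    gray = gray.astype(np.float64)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Centered differences; border pixels are left at zero for simplicity.
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Fold the signed angle into [0, 180) to obtain unsigned gradients.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    return magnitude, angle


# Toy image with a vertical edge: dark left half, bright right half.
img = np.zeros((5, 6))
img[:, 3:] = 10.0
mag, ang = gradients(img)
print(mag[2])  # non-zero only in the two columns next to the edge
```

On this toy image the magnitude is non-zero only beside the edge, and the orientation there is 0° because the intensity changes purely along x.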

### Step 3: Compute Histograms in 8×8 Cells
![](https://cdn-ilclanb.nitrocdn.com/IekjQeaQhaYynZsBcscOhxvktwdZlYmf/assets/images/source/rev-f424eed/learnopencv.com/wp-content/uploads/2016/11/hog-cells.png)
- **Dividing the Image:**
  The image is divided into small cells, typically 8×8 pixels each.
- **Building the Histogram:**
  For each cell, a histogram of gradient orientations is computed. This histogram typically has 9 bins covering angles from 0° to 180° (unsigned gradients).
  - Each pixel contributes to a bin based on its gradient angle, and the contribution is weighted by its gradient magnitude.
  - For example, if a pixel has a gradient angle of 80° and a magnitude of 2, it adds a vote of 2 to the corresponding bin.
  - If a pixel's angle falls between two bins, its vote may be split proportionally between them.
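The voting rule above can be sketched in plain Python. The sketch assumes the bins sit at 0°, 20°, ..., 160° and that a vote is split linearly between the two nearest bins, with angles above 160° wrapping around to the 0° bin; `cell_histogram` is a hypothetical helper name.

```python
def cell_histogram(angles, magnitudes, n_bins=9, max_angle=180.0):
    """Magnitude-weighted orientation histogram with votes split between bins."""
    bin_width = max_angle / n_bins  # 20 degrees for 9 bins
    hist = [0.0] * n_bins
    for angle, mag in zip(angles, magnitudes):
        pos = (angle % max_angle) / bin_width  # fractional bin position
        lower = int(pos) % n_bins
        upper = (lower + 1) % n_bins  # the bin after 160 wraps to 0
        frac = pos - int(pos)
        hist[lower] += mag * (1.0 - frac)
        hist[upper] += mag * frac
    return hist


# Three pixels: (80 deg, 2), (10 deg, 4), (165 deg, 85)
hist = cell_histogram([80.0, 10.0, 165.0], [2.0, 4.0, 85.0])
print(hist)
```

Here the 80° pixel puts its full vote of 2 into the 80° bin, the 10° pixel splits 4 evenly between the 0° and 20° bins, and the 165° pixel gives 63.75 to the 160° bin and 21.25 to the 0° bin.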

![](https://cdn-ilclanb.nitrocdn.com/IekjQeaQhaYynZsBcscOhxvktwdZlYmf/assets/images/source/rev-f424eed/learnopencv.com/wp-content/uploads/2016/12/hog-cell-gradients.png)

### Step 4: Block Normalization (16×16 Blocks)
![](https://learnopencv.com/wp-content/uploads/2016/12/hog-16x16-block-normalization.gif)
- **Why Normalize?**
  Gradient magnitudes can vary widely due to changes in illumination. Normalizing the histograms makes the descriptor more robust against lighting variations.
- **Grouping Cells into Blocks:**
  Cells are grouped into larger blocks (typically 2×2 cells, resulting in a 16×16 block).
- **Normalization Process:**
  The concatenated histogram of the block (a 36-dimensional vector for 4 cells with a 9-bin histogram each) is normalized, for example by dividing it by its L2 norm. Because scaling all gradient magnitudes by a constant leaves the normalized vector unchanged, the descriptor becomes largely insensitive to overall changes in contrast and illumination.
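A minimal sketch of L2 normalization, using a short 3-element vector in place of the 36-dimensional block vector for readability. The small `eps` term is an assumption added to avoid division by zero in flat regions; `l2_normalize` is a hypothetical helper name.

```python
import math


def l2_normalize(vector, eps=1e-6):
    """Divide a vector by its L2 norm (eps guards against an all-zero block)."""
    norm = math.sqrt(sum(v * v for v in vector) + eps * eps)
    return [v / norm for v in vector]


block = [128.0, 64.0, 32.0]
brighter = [2.0 * v for v in block]  # same pattern, doubled magnitudes

print([round(v, 2) for v in l2_normalize(block)])     # [0.87, 0.44, 0.22]
print([round(v, 2) for v in l2_normalize(brighter)])  # [0.87, 0.44, 0.22]
```

Doubling every value leaves the normalized vector essentially unchanged, which is the property that makes the descriptor tolerant to lighting changes.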

### Step 5: Construct the Final HOG Feature Vector
- **Concatenation:**
  The normalized vectors from all the blocks are concatenated to form a single, high-dimensional feature vector.
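  For the pedestrian detection example, a 16×16 block moved in 8-pixel steps fits 7 positions horizontally and 15 vertically in the 64×128 patch, so the final feature vector has 7 × 15 × 36 = 3780 dimensions.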
- **Result:**
  This feature vector effectively represents the local shape and structure information in the image patch, making it highly suitable for classification tasks.
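The length of the final vector can be checked with a few lines of arithmetic. The sketch assumes blocks of 2×2 cells that slide in steps of one cell (8 pixels); `hog_descriptor_length` is a hypothetical helper, not an OpenCV function.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8,
                          block_cells=2, block_stride=8, n_bins=9):
    """Number of values in a HOG descriptor for the given layout."""
    block_px = cell * block_cells  # 16 pixels per block side
    blocks_x = (win_w - block_px) // block_stride + 1  # 7 for a 64-wide window
    blocks_y = (win_h - block_px) // block_stride + 1  # 15 for a 128-high window
    values_per_block = block_cells * block_cells * n_bins  # 36
    return blocks_x * blocks_y * values_per_block


length = hog_descriptor_length()
print(length)  # 3780
```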

## 4. Summary
- **HOG as a Feature Descriptor:**
  HOG extracts the distribution of gradient orientations from an image patch, capturing the essential edge and shape information while discarding less useful data.
- **Key Steps Involved:**
  - **Preprocessing:** Resize the image patch to a fixed size while maintaining a uniform aspect ratio.
  - **Gradient Computation:** Calculate the magnitude and direction of gradients to highlight edges.
  - **Histogram Calculation:** Divide the image into 8×8 cells and compute a 9-bin histogram of gradients for each cell.
  - **Block Normalization:** Group 2×2 cells into 16×16-pixel blocks, concatenate their histograms, and normalize the block vectors to mitigate the effects of varying illumination.
  - **Feature Vector Construction:** Concatenate all normalized block vectors into a single feature vector (e.g., 3780 dimensions for a 64×128 patch).
- **Utility:**
  The resulting HOG feature vector is robust and discriminative, making it effective for tasks like object detection (e.g., pedestrian detection) and image recognition when used with classifiers such as SVM.

This detailed process shows how HOG transforms an image patch into a compact and powerful feature descriptor that is well-suited for various computer vision applications.



In [3]:
import cv2
import numpy as np
from imutils.object_detection import non_max_suppression  # Suppresses overlapping boxes

In [45]:
cap = cv2.VideoCapture(r"C:\Users\alaas\OneDrive\Desktop\practical\Notebooks\Computer Vision\Next-Gen Computer Vision\images\walking.avi")
if not cap.isOpened():
    # Stop this cell without shutting down the notebook kernel.
    raise SystemExit("Error: Could not open video file")

# Create the descriptor once, outside the loop, and load the
# pre-trained SVM detector designed for people detection.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

while True:
    ret, frame = cap.read()
    if not ret:
        print("End of video or error reading frame")
        break

    # Detect people at multiple scales; boxes are returned as (x, y, width, height).
    (bounding_boxes, weights) = hog.detectMultiScale(frame, winStride=(8, 8), padding=(4, 4), scale=1.01)

    # Convert the boxes from (x, y, width, height) to (x1, y1, x2, y2),
    # the format required by Non-Maximum Suppression (NMS).
    bounding_boxes = np.array([[x, y, x + w, y + h] for (x, y, w, h) in bounding_boxes])

    # Get rid of overlapping bounding boxes. You can tweak the overlapThresh value for better results.
    selection = non_max_suppression(bounding_boxes, probs=None, overlapThresh=0.8)

    # Draw the final bounding boxes (cast to plain ints for OpenCV).
    for (x1, y1, x2, y2) in selection:
        cv2.rectangle(frame,
                      (int(x1), int(y1)),
                      (int(x2), int(y2)),
                      (0, 255, 0),
                      4)

    cv2.imshow("Video", frame)

    if cv2.waitKey(30) & 0xFF == ord('q'):
        break  # Exit loop when 'q' is pressed

cap.release()
cv2.destroyAllWindows()


---

## HOG.detectMultiScale Parameters

### winStride
- **What it does:**
  Sets the step size (in pixels) for moving the fixed-size sliding window over the image.
- **Effect on Detection:**
  - **Accuracy:**
    Smaller strides (e.g., (4,4) or (8,8)) mean more overlapping windows, which increases the chance of capturing pedestrians (especially smaller or partially visible ones).
  - **Speed:**
    A larger stride (e.g., (16,16)) evaluates fewer windows, speeding up detection but risking missed detections.
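To get a feel for the speed trade-off, the sketch below counts how many 64×128 windows fit in one image at a single scale for a given stride. It ignores padding and the image pyramid, so it is a simplified estimate rather than OpenCV's exact internal count; `windows_per_level` is a hypothetical helper.

```python
def windows_per_level(img_w, img_h, stride=(8, 8), win=(64, 128)):
    """Count sliding-window positions at one scale (padding ignored)."""
    if img_w < win[0] or img_h < win[1]:
        return 0  # the window does not fit at all
    nx = (img_w - win[0]) // stride[0] + 1
    ny = (img_h - win[1]) // stride[1] + 1
    return nx * ny


for stride in [(4, 4), (8, 8), (16, 16)]:
    print(stride, windows_per_level(640, 480, stride))
# (4, 4) 12905
# (8, 8) 3285
# (16, 16) 851
```

Doubling the stride in both directions cuts the number of windows to roughly a quarter, which is where most of the speed-up comes from.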

### padding
- **What it does:**
  Adds extra pixels (in both x and y directions) around the sliding window.
- **Effect on Detection:**
  - **Accuracy:**
    Extra padding helps capture additional contextual information; for example, if a pedestrian is near the edge of a window, the padded area can help include crucial body parts.
  - **Caveat:**
    Too much padding might bring in irrelevant background information that can confuse the classifier.

### scale
- **What it does:**
  Specifies the factor by which the image is resized at each layer of the image pyramid.
- **Effect on Detection:**
  - **Accuracy:**
    A small scale factor (close to 1.0, e.g., 1.01) creates many pyramid levels so that even slight differences in object size are captured, which improves sensitivity.
  - **Speed:**
    More pyramid levels increase the computational load. A larger scale factor (e.g., 1.05 or 1.1) reduces the number of scales, and thus speeds up detection, but may miss objects whose size falls between two pyramid levels.
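The effect of `scale` on the number of pyramid levels can be estimated with a short loop: keep shrinking the image by the scale factor until the 64×128 window no longer fits. This is a sketch under that assumption only. OpenCV may additionally cap the number of levels, and the `max_levels` argument below is a hypothetical stand-in for such a cap.

```python
def pyramid_levels(img_w, img_h, scale, win=(64, 128), max_levels=None):
    """Count pyramid levels at which the detection window still fits."""
    if scale <= 1.0:
        raise ValueError("scale must be greater than 1")
    levels = 0
    factor = 1.0  # cumulative downscaling factor of the current level
    while img_w / factor >= win[0] and img_h / factor >= win[1]:
        levels += 1
        if max_levels is not None and levels >= max_levels:
            break
        factor *= scale
    return levels


for s in (1.01, 1.05, 1.1):
    print(s, pyramid_levels(640, 480, s))
```

For a 640×480 frame, a factor of 1.01 produces several times as many levels as 1.05 or 1.1, and each level is a full sliding-window pass.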

---

## Non-Maximum Suppression (NMS) Parameters

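After running HOG.detectMultiScale, you typically obtain several overlapping bounding boxes around the same person. NMS collapses each group of overlapping boxes into a single detection, keeping the highest-scoring box when confidence scores are supplied.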

### bounding_boxes
- **What it does:**
  An array (or list) of bounding boxes in the format (x1, y1, x2, y2) that represents the detected regions.
- **Note:**
  Often, you need to convert from (x, y, width, height) to (x1, y1, x2, y2) before applying NMS.

### probs (or weights)
- **What it does:**
  A list of confidence scores for each bounding box detection.
- **Usage:**
  When provided, NMS uses these scores to decide which overlapping box to keep.
- **In our example:**
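  The call passes `probs=None`, so the `weights` returned by `detectMultiScale` are not used and the boxes are not ranked by confidence. To rank by confidence, pass one score per box as `probs`.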

### overlapThresh
- **What it does:**
  Defines the threshold for how much two bounding boxes can overlap before one is suppressed.
- **Effect on Detection:**
  - **Accuracy:**
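    A lower threshold (e.g., 0.3–0.45) suppresses any box whose overlap with an already selected box exceeds that fraction, which removes duplicate detections of the same person. A high threshold such as the 0.8 used in the example suppresses only near-identical boxes, so more duplicates survive.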
  - **Tuning:**
    Adjusting this value helps balance between removing redundant boxes and accidentally suppressing true, nearby detections.
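To make the role of the threshold concrete, here is a minimal greedy NMS in plain Python. It is not the imutils implementation: it measures overlap as intersection over union (IoU), whereas the library's overlap measure may differ, and the names `iou` and `greedy_nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, overlap_thresh=0.3):
    """Keep the best-scoring boxes, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it overlaps no already kept box beyond the threshold.
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]


# Two near-duplicate boxes around one person plus one separate detection.
boxes = [(10, 10, 74, 138), (14, 12, 78, 140), (200, 20, 264, 148)]
scores = [0.9, 0.6, 0.8]

kept_strict = greedy_nms(boxes, scores, overlap_thresh=0.3)
kept_loose = greedy_nms(boxes, scores, overlap_thresh=0.9)
print(len(kept_strict), len(kept_loose))  # 2 3
```

The two near-duplicate boxes have an IoU of about 0.86, so a threshold of 0.3 removes the weaker one while a threshold of 0.9 keeps both.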

---

## Summary Table

| **Parameter**       | **Function**                                                  | **Detection Impact**                                                       |
|---------------------|---------------------------------------------------------------|----------------------------------------------------------------------------|
| **winStride**       | Step size for sliding window movement                         | Smaller → More windows (higher accuracy, slower speed); Larger → Fewer windows (faster, may miss objects)  |
| **padding**         | Extra pixels added around each window                         | Helps include context; too high may add noise                             |
| **scale**           | Factor for image resizing in the pyramid                      | Smaller (close to 1) → More levels (improved sensitivity, slower); Larger → Fewer levels (faster, less sensitive) |
| **bounding_boxes**  | List of detected box coordinates (converted to (x1,y1,x2,y2))   | Required input for NMS                                                    |
| **probs (weights)** | Confidence scores for each detection                           | Used by NMS to select the most confident detection                         |
| **overlapThresh**   | Maximum allowed overlap for two boxes before suppression       | Lower value → Aggressive suppression; higher value → more boxes kept        |
