## Practice: Detector followed by Tracker

1. Use the frames of level 52 in `WiiPlay.mp4` (frame numbers 19400 to 20000) as input.
2. Use <i>cv2.HOGDescriptor()</i> to <b>detect</b> a pedestrian on the first frame (frame number 19400).
3. Try to <b>track</b> the detected pedestrian on the subsequent frames (marked with a <b>red</b> rectangle).
4. Instead of detection followed by tracking, try to detect pedestrians on every frame without tracking (marked with <b>green</b> rectangles).
5. Observe the results and compare the two approaches.
6. Show your output images.
7. Upload your Jupyter code file (*.ipynb)
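For step 5, one hedged way to turn "observe and compare" into a number is to measure how often the tracked (red) box overlaps one of the per-frame detections (green) by intersection-over-union (IoU). The helpers `iou` and `agreement_rate` below are hypothetical names, and the 0.5 overlap threshold is an assumption borrowed from common detection-evaluation practice, not part of the assignment.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes, the format detectMultiScale returns."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (0 if the boxes do not intersect)
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def agreement_rate(tracked_boxes, detected_boxes, threshold=0.5):
    """Fraction of frames whose tracked box matches at least one detected box.

    tracked_boxes:  one (x, y, w, h) or None per frame (red, tracker output).
    detected_boxes: one list of (x, y, w, h) boxes per frame (green, detector output).
    threshold:      assumed IoU needed to count as a match.
    """
    hits = 0
    for tracked, detections in zip(tracked_boxes, detected_boxes):
        if tracked is not None and any(iou(tracked, d) >= threshold for d in detections):
            hits += 1
    return hits / len(tracked_boxes) if tracked_boxes else 0.0
```

A low agreement rate points to frames where either the tracker drifted or the detector missed the pedestrian; inspecting those frames visually is still needed to tell which.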

```python
import cv2
import numpy as np
```

### Video Capture and Pedestrian Detection in OpenCV

The steps below capture video frames and run pedestrian detection on each of them using OpenCV's Histogram of Oriented Gradients (HOG) descriptor together with a linear Support Vector Machine (SVM) classifier.

#### Reading the Video

Open the video file `WiiPlay.mp4` for processing.

```python
cap = cv2.VideoCapture('WiiPlay.mp4')
```

#### Defining Start and End Frames

Specify the range of frames to process. Both ends are included, so the range covers 601 frames.

```python
start_frame = 19400
end_frame = 20000
```
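Because the loop stops only once the counter exceeds `end_frame`, frames 19400 and 20000 are both processed. The small sketch below (hypothetical helper `frame_span`) computes the frame count and the corresponding timestamps; the 30 fps used in the example is an assumption, and the real rate should be taken from `cap.get(cv2.CAP_PROP_FPS)`.

```python
def frame_span(start_frame, end_frame, fps):
    """Frame count of an inclusive range and the timestamps (seconds) of its ends.

    Frame indices are 0-based, so frame n starts at n / fps seconds.
    """
    count = end_frame - start_frame + 1   # +1 because both ends are processed
    start_sec = start_frame / fps
    end_sec = end_frame / fps
    return count, start_sec, end_sec


# Example with an assumed frame rate of 30 fps
count, start_sec, end_sec = frame_span(19400, 20000, 30.0)
```

At an assumed 30 fps this segment starts about 646.7 s into the video and lasts 20 s, which is a quick way to check that the frame numbers really point at level 52.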

#### Setting the Start Frame
Seek to `start_frame` so that the next `cap.read()` returns frame 19400. `CAP_PROP_POS_FRAMES` is the 0-based index of the frame to be decoded next.

```python
cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
```

#### Initializing the HOG Descriptor
HOG turns each 64×128 detection window into a feature vector of gradient-orientation histograms. `setSVMDetector` loads the coefficients of the linear SVM that scores those vectors; `cv2.HOGDescriptor_getDefaultPeopleDetector()` returns OpenCV's pretrained coefficients for detecting upright people.

```python
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
```
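The length of the feature vector follows from the default `cv2.HOGDescriptor()` parameters: a 64×128 window, 16×16 blocks moved in 8-pixel steps, 8×8 cells and 9 orientation bins. The hypothetical helper below reproduces that arithmetic.

```python
def hog_descriptor_length(win=(64, 128), block=(16, 16), block_stride=(8, 8),
                          cell=(8, 8), nbins=9):
    """Length of the HOG feature vector for one detection window (width, height order)."""
    blocks_x = (win[0] - block[0]) // block_stride[0] + 1   # 7 block positions horizontally
    blocks_y = (win[1] - block[1]) // block_stride[1] + 1   # 15 block positions vertically
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])  # 2 x 2 = 4 cells
    return blocks_x * blocks_y * cells_per_block * nbins


length = hog_descriptor_length()  # 7 * 15 * 4 * 9
```

With the defaults this gives 3780, which should agree with `hog.getDescriptorSize()`; the default people detector holds one more coefficient (3781), the SVM bias term.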

#### Processing Each Frame

A `while True:` loop processes one frame per iteration. It exits when the frame after `end_frame` is reached, when `cap.read()` fails to return a frame, or when `q` is pressed.
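The manual counter and the two exit conditions can also be folded into a generator. This is a sketch under the assumption that `cap` has already been positioned at `start_frame`; `frames_in_range` is a hypothetical name, and any object with a `read()` method returning `(ret, frame)` works.

```python
def frames_in_range(cap, start_frame, end_frame):
    """Yield (frame_number, frame) for start_frame..end_frame inclusive.

    Assumes cap is already positioned at start_frame; stops early if read() fails.
    """
    for frame_number in range(start_frame, end_frame + 1):
        ret, frame = cap.read()
        if not ret:
            break  # end of file or decoding error
        yield frame_number, frame
```

The loop body then becomes `for current_frame_number, img in frames_in_range(cap, start_frame, end_frame):`, and only the `q` key check has to remain inside it.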

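#### Reading the Current Frame

```python
ret, img = cap.read()
```

`ret` is `False` when no frame could be read (for example, at the end of the file). Otherwise `img` holds the frame as a NumPy `uint8` array of shape (height, width, 3) in BGR channel order.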
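Because frames are stored in BGR order, they look wrong (red and blue swapped) when passed directly to matplotlib, which is the usual way to show output images inside a notebook; `cv2.imshow` instead opens a separate window, which may not be available when the notebook runs on a remote server. A minimal sketch, with hypothetical helper names and matplotlib assumed to be installed only for the display helper:

```python
import numpy as np


def bgr_to_rgb(img):
    """Reverse the channel axis: OpenCV's BGR order to matplotlib's RGB order."""
    return np.ascontiguousarray(img[..., ::-1])


def show_in_notebook(img, title=None):
    """Display a BGR frame inline in a Jupyter notebook."""
    import matplotlib.pyplot as plt  # imported here so bgr_to_rgb works without matplotlib
    plt.imshow(bgr_to_rgb(img))
    if title:
        plt.title(title)
    plt.axis("off")
    plt.show()
```

`cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` performs the same conversion; the slicing version just avoids another OpenCV call.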

### Detecting Pedestrians

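#### Applying the HOG Descriptor

The `detectMultiScale` method slides the 64×128 detection window over an image pyramid built from the current frame. It returns the bounding boxes `rects` of the detected pedestrians, each as `(x, y, w, h)`, together with their SVM confidence scores `weights`. `winStride=(4, 4)` moves the window 4 pixels at a time, `padding=(8, 8)` adds 8 pixels of padding in x and y for the scan, and `scale=1.05` makes each pyramid level 1.05 times smaller than the previous one. Smaller strides and scales closer to 1 search more finely at the cost of speed.

```python
(rects, weights) = hog.detectMultiScale(img, winStride=(4, 4), padding=(8, 8), scale=1.05)
```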
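To see why `scale` and `winStride` dominate the run time, the sketch below counts pyramid levels and window positions. It is a simplified model written for illustration (hypothetical helpers `pyramid_scales` and `windows_at_level`, padding ignored, plain truncation instead of OpenCV's rounding), so OpenCV's exact numbers may differ slightly.

```python
def pyramid_scales(img_w, img_h, win_w=64, win_h=128, scale=1.05):
    """Downscale factors at which a win_w x win_h window still fits in the image."""
    factors = []
    s = 1.0
    while img_w / s >= win_w and img_h / s >= win_h:
        factors.append(s)
        s *= scale  # each level is `scale` times smaller than the previous one
    return factors


def windows_at_level(img_w, img_h, factor, win=(64, 128), stride=(4, 4)):
    """Number of sliding-window positions on one pyramid level (padding ignored)."""
    w, h = int(img_w / factor), int(img_h / factor)
    return ((w - win[0]) // stride[0] + 1) * ((h - win[1]) // stride[1] + 1)


levels = pyramid_scales(1280, 720)  # a hypothetical 1280x720 frame
total_windows = sum(windows_at_level(1280, 720, f) for f in levels)
```

For a hypothetical 1280×720 frame this model gives 36 pyramid levels; raising `scale` to 1.2 or `winStride` to (8, 8) cuts the number of evaluated windows substantially, which is the usual speed/accuracy knob.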

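#### Mathematical Explanation of HOG + SVM

HOG extracts features by computing gradient orientations in small local regions (cells) of the image, and the SVM classifier then decides from those features whether a detection window contains a pedestrian. With the centred differences $G_x(x,y) = I(x+1,y) - I(x-1,y)$ and $G_y(x,y) = I(x,y+1) - I(x,y-1)$, every pixel gets a gradient magnitude $m$ and an unsigned orientation $\theta$. Each 8×8-pixel cell $c$ accumulates the magnitudes into a histogram $h_c$ over 9 orientation bins $j$, the four cell histograms of each 2×2-cell block $k$ are concatenated into $v_k$ and normalized, and the normalized blocks form the feature vector $\mathbf{x}$ of the window. A linear SVM with weights $\mathbf{w}$ and bias $b$ makes the final decision:

$$
m(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}, \qquad \theta(x,y) = \arctan\frac{G_y(x,y)}{G_x(x,y)} \bmod 180^\circ \newline
h_c(j) = \sum_{(x,y) \in c} m(x,y)\,\mathbf{1}\big[\theta(x,y) \in \text{bin } j\big] \newline
\hat{v}_k = \frac{v_k}{\sqrt{\lVert v_k \rVert_2^2 + \epsilon^2}}, \qquad \mathbf{x} = \text{HOG}(I) = \big[\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_K\big] \newline
\text{SVM}(\text{HOG}(I)) =
\begin{cases}
1 & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0 \text{ (pedestrian)} \\
0 & \text{otherwise}
\end{cases}
$$

This is a simplified form: OpenCV's implementation additionally interpolates each vote between neighbouring bins and cells and uses L2-Hys block normalization (L2 normalization, clipping at 0.2, then renormalizing).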
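The equations above can be sketched in NumPy for a single grayscale window. This follows the simplified form only (hard voting into bins, plain L2 block normalization), so its output will not match `hog.compute` numerically; the function names are hypothetical, and `w` and `b` stand for any weight vector and bias, not OpenCV's trained detector.

```python
import numpy as np


def cell_histograms(gray, cell=8, nbins=9):
    """h_c(j): per-cell histograms of gradient magnitude over unsigned orientation bins."""
    g = gray.astype(np.float64)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]               # G_x, centred difference (borders left at 0)
    gy[1:-1, :] = g[2:, :] - g[:-2, :]               # G_y
    mag = np.hypot(gx, gy)                           # m(x, y)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0     # theta(x, y) in [0, 180)
    bins = np.minimum((ang // (180.0 / nbins)).astype(int), nbins - 1)
    rows, cols = g.shape[0] // cell, g.shape[1] // cell
    hist = np.zeros((rows, cols, nbins))
    for i in range(rows):
        for j in range(cols):
            ys = slice(i * cell, (i + 1) * cell)
            xs = slice(j * cell, (j + 1) * cell)
            # Sum magnitudes of the pixels whose orientation falls into each bin
            hist[i, j] = np.bincount(bins[ys, xs].ravel(),
                                     weights=mag[ys, xs].ravel(), minlength=nbins)
    return hist


def hog_vector(hist, eps=1e-5):
    """x = HOG(I): concatenated L2-normalized 2x2-cell blocks, block stride of one cell."""
    rows, cols, _ = hist.shape
    blocks = []
    for i in range(rows - 1):
        for j in range(cols - 1):
            v = hist[i:i + 2, j:j + 2].ravel()                  # v_k
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))  # v_k hat
    return np.concatenate(blocks)


def svm_decision(x, w, b):
    """Linear SVM rule: 1 (pedestrian) if w^T x + b > 0, else 0."""
    return int(np.dot(w, x) + b > 0)
```

For a 128×64 (rows × columns) window this produces 16×8 cells, 15×7 blocks and a vector of 3780 values, the same length as OpenCV's default descriptor.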

```python
# Open the video file and make sure it could actually be opened
cap = cv2.VideoCapture('WiiPlay.mp4')
if not cap.isOpened():
    raise IOError('Could not open WiiPlay.mp4')

# Define the start and end frame numbers (both inclusive)
start_frame = 19400
end_frame = 20000

# Seek to the start frame and keep our own frame counter
current_frame_number = start_frame
cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

# Initialize the HOG descriptor with OpenCV's pretrained people detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Detection on every frame without tracking (task step 4, green rectangles)
while True:
    # Read the next frame from the video
    ret, img = cap.read()

    # Break the loop if no frame is returned or the end frame has been passed
    if not ret or current_frame_number > end_frame:
        break

    # Detect pedestrians in the current frame
    (rects, weights) = hog.detectMultiScale(img, winStride=(4, 4), padding=(8, 8), scale=1.05)

    # Draw green rectangles around detected pedestrians
    for (x, y, w, h) in rects:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Prepare text showing the current frame number
    text = f'Current_Frame: {current_frame_number}'

    # Get the size of the text for positioning
    text_size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)[0]

    # Calculate text position: 10 pixels from the right edge, 30 pixels from the top edge
    text_x = img.shape[1] - text_size[0] - 10
    text_y = 30

    # Draw the text on the frame
    cv2.putText(img, text, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    # Display the frame
    cv2.imshow('Frame', img)

    # Increment the frame counter
    current_frame_number += 1

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video capture object and close all OpenCV windows
cap.release()
cv2.destroyAllWindows()
```