## Practice: Detector Followed by Tracker

1. Read the input frames from WiiPlay.mp4 for level 52 (frame numbers 19400~20000).
2. Use <i>cv2.HOGDescriptor()</i> to <b>detect</b> pedestrians on the first frame (frame number 19400).
3. Try to <b>track</b> the detected pedestrians on subsequent frames (marked with a <b>red</b> rectangle).
4. Instead of detection followed by tracking, try to detect pedestrians on each frame without tracking (marked with a <b>green</b> rectangle).
5. Observe the results and compare the two approaches.
6. Show your output images.
7. Upload your Jupyter code file (*.ipynb).

In [1]:
import cv2
import numpy as np

### Video Capture and Pedestrian Detection in OpenCV

The process below outlines the steps for capturing video frames and applying pedestrian detection using OpenCV's Histogram of Oriented Gradients (HOG) descriptor and Support Vector Machine (SVM) classifier.

#### Reading the Video

Open the video file `WiiPlay.mp4` for processing.

$$
\text{cap} = \text{cv2.VideoCapture}('WiiPlay.mp4')
$$

#### Defining Start and End Frames

Specify the first and last frame numbers of the range to process.

$$
\text{start\_frame\_number} = 19400 \newline
\text{end\_frame\_number} = 20000
$$

#### Setting the Start Frame
Move the read position to `start_frame_number`, so that the next call to `cap.read()` returns that frame.

$$
\text{cap.set(cv2.CAP\_PROP\_POS\_FRAMES, start\_frame\_number)}
$$
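
The seek-then-read pattern above can be wrapped in a small generator, sketched here. `iter_frames` is a hypothetical helper, not part of OpenCV. It assumes the frame range is inclusive, and it takes the position property as a parameter so that it works with any object exposing `set` and `read` in the way `cv2.VideoCapture` does.

```python
def iter_frames(cap, start, end, pos_prop):
    """Yield (frame_number, frame) for frames start..end (inclusive).

    cap      -- object with set(prop, value) and read(), e.g. cv2.VideoCapture
    pos_prop -- the seek property, e.g. cv2.CAP_PROP_POS_FRAMES
    """
    cap.set(pos_prop, start)           # next read() should return frame `start`
    for index in range(start, end + 1):
        ok, frame = cap.read()
        if not ok:                     # video ended before `end` was reached
            break
        yield index, frame
```

With the notebook's variables, a call might look like `for n, frame in iter_frames(cap, start_frame_number, end_frame_number, cv2.CAP_PROP_POS_FRAMES): ...`. Pairing each frame with its number inside the generator keeps the on-screen frame counter in step with the frame actually shown.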

#### Initializing the HOG Descriptor
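`cv2.HOGDescriptor()` computes Histogram of Oriented Gradients (HOG) features over a sliding detection window. `setSVMDetector` then loads the coefficients of OpenCV's pretrained linear Support Vector Machine (SVM) people detector, which classifies each window as pedestrian or background.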

$$
\text{hog} = \text{cv2.HOGDescriptor}() \newline
\text{hog.setSVMDetector(cv2.HOGDescriptor\_getDefaultPeopleDetector())}
$$
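
As a worked piece of arithmetic, the sketch below computes how many values the HOG feature vector of one detection window contains. The default arguments are the documented defaults of `cv2.HOGDescriptor()` (64x128 window, 16x16 blocks, 8x8 block stride, 8x8 cells, 9 bins); the function name `hog_descriptor_length` is hypothetical and used only for illustration.

```python
def hog_descriptor_length(win=(64, 128), block=(16, 16), stride=(8, 8),
                          cell=(8, 8), n_bins=9):
    """Number of values in the HOG feature vector of one detection window."""
    # Number of block positions along each axis of the window.
    blocks_x = (win[0] - block[0]) // stride[0] + 1
    blocks_y = (win[1] - block[1]) // stride[1] + 1
    # Each block holds several cells, and each cell one orientation histogram.
    cells_per_block = (block[0] // cell[0]) * (block[1] // cell[1])
    return blocks_x * blocks_y * cells_per_block * n_bins


print(hog_descriptor_length())  # 7 * 15 blocks * 4 cells * 9 bins = 3780
```

The default people detector supplies one SVM weight per feature value plus one bias term.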

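#### Processing Each Frame

The loop reads and processes one frame at a time until the end frame is reached or the video runs out of frames.
$$
\text{while True:}
$$

#### Reading the Current Frame

`cap.read()` returns a success flag and the decoded frame; `ret` is `False` once no more frames can be read.

$$
\text{ret}, \text{frame} = \text{cap.read()}
$$

### Detecting Pedestrians

#### Applying HOG Descriptor
The `detectMultiScale` method slides the detection window over the current frame at several scales. It returns one bounding box `(x, y, w, h)` per detected pedestrian, together with the SVM confidence weight of each detection. `winStride` is the step in pixels between neighbouring window positions, `padding` adds a border around the image so that pedestrians near the frame edge can still be detected, and `scale` is the resize factor between successive levels of the image pyramid. The code cell below calls it with `winStride=(8, 8)` and leaves `padding` and `scale` at their defaults, which is faster but scans fewer window positions.

$$
(\text{boxes}, \text{weights}) = \text{hog.detectMultiScale}(\text{frame}, \text{winStride}=(4, 4), \text{padding}=(8, 8), \text{scale}=1.05)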
$$
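
Because every detection comes with a weight, weak detections can be discarded before a tracker is initialized. The following is a minimal sketch, not part of the assignment code: `filter_detections` and the threshold `min_weight=0.5` are hypothetical choices, and the weights are assumed to have been flattened to plain numbers first (for example with `weights.ravel()`).

```python
def filter_detections(rects, weights, min_weight=0.5):
    """Keep detections whose weight is at least min_weight, strongest first.

    rects   -- iterable of (x, y, w, h) boxes
    weights -- iterable of scalar confidence weights, one per box
    """
    kept = [
        (tuple(int(v) for v in rect), float(weight))
        for rect, weight in zip(rects, weights)
        if float(weight) >= min_weight
    ]
    # Sort by confidence so that kept[0] is the most reliable detection.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept
```

The first entry of the returned list is then a reasonable candidate for the box handed to the tracker.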

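#### Mathematical Explanation of HOG + SVM
HOG divides the detection window into small cells and, for each cell $c$, builds a histogram of gradient orientations $\theta(x, y)$ in which every pixel votes with its gradient magnitude. The cell histograms are normalized over overlapping blocks and concatenated into the feature vector $\text{HOG}(I)$. A linear SVM with weight vector $w$ and bias $b$ then decides from that vector whether the window contains a pedestrian.

$$
h_c(k) = \sum_{(x, y) \in c} \lVert \nabla I(x, y) \rVert \, \mathbb{1}\big[\theta(x, y) \in \text{bin } k\big]
$$

$$
\text{SVM}(\text{HOG}(I)) = 
\begin{cases}
1 & \text{if } w^\top \text{HOG}(I) + b > 0 \quad \text{(pedestrian)} \\
0 & \text{otherwise}
\end{cases}
$$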
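
To make the two stages concrete, here is a minimal pure-Python sketch of an orientation histogram for a single cell and of a linear SVM decision. It is a simplification under stated assumptions: central-difference gradients, unsigned orientations in [0, 180) degrees, hard assignment to one bin, and no block normalization, so it does not reproduce OpenCV's exact output. The names `cell_histogram` and `linear_svm_decision` are hypothetical.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations for one cell.

    cell -- 2D list of grayscale intensities; border pixels are skipped
            because central differences need both neighbours.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def linear_svm_decision(features, weights, bias):
    """Return 1 if w . x + b > 0 (pedestrian), otherwise 0."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if score > 0 else 0
```

For an image whose intensity only changes from left to right, every gradient is horizontal, so all votes fall into the first bin.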

In [2]:
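# Open the video and jump to the first frame of the range.
cap = cv2.VideoCapture('WiiPlay.mp4')

start_frame_number = 19400
end_frame_number = 20000
cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame_number)

# HOG descriptor with OpenCV's pretrained linear SVM people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

ret, frame = cap.read()
if not ret:
    cap.release()
    # Raise rather than call exit(), which is intended for interactive shells.
    raise RuntimeError("Failed to read the video")

# Detect pedestrians once, on the first frame only.
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))

# Create one MIL tracker per detection. The trackers are initialized before
# anything is drawn, so the rectangles do not leak into their appearance model.
# cv2.legacy requires the opencv-contrib-python package.
trackers = cv2.legacy.MultiTracker_create()
for box in boxes:
    tracker = cv2.legacy.TrackerMIL_create()
    trackers.add(tracker, frame, tuple(int(v) for v in box))

# Mark the first-frame detections in green and show them.
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('Result', frame)
cv2.waitKey(1)

# The first frame has already been read, so the loop starts at the next one.
frame_count = start_frame_number + 1

while True:
    ret, frame = cap.read()
    if not ret or frame_count > end_frame_number:
        break

    # Run both the trackers and the detector on the clean frame,
    # before any rectangle is drawn on it.
    success, tracked_boxes = trackers.update(frame)
    detected_boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))

    # Tracking result: red rectangles.
    if success:
        for box in tracked_boxes:
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)

    # Per-frame detection result: green rectangles.
    for (x, y, w, h) in detected_boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Show the current frame number in the top-right corner.
    text = f'Current_Frame: {frame_count}'
    text_size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)[0]
    text_x = frame.shape[1] - text_size[0] - 10
    text_y = 30
    cv2.putText(frame, text, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    cv2.imshow('Result', frame)

    # Press 'q' to stop early.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    frame_count += 1

cap.release()
cv2.destroyAllWindows()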

qt.qpa.plugin: Could not find the Qt platform plugin "wayland" in "/home/infor/miniconda3/envs/CV/lib/python3.9/site-packages/cv2/qt/plugins"
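
For step 5, one way to put a number on the difference between the two approaches is the intersection over union (IoU) of a tracked (red) box and a detected (green) box on the same frame. The helper below is a minimal sketch and not part of the assignment code; the name `iou` is hypothetical, and boxes are assumed to be in the `(x, y, w, h)` format used above.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, in [0, 1]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0
```

A value near 1 means the tracker and the detector agree on the pedestrian's position, while a value that falls toward 0 over time suggests that the tracker has drifted or that the detector has lost or mislocated the pedestrian.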
