## Game_1 : "find_this_mii"

### Mathematical Explanation

1. **Read Frame and Select ROI (Region of Interest)**:
   $$
   \text{frame} = \text{cap.read}(\text{frame\_number})
   $$
   $$
   \text{ROI} = \text{selectROI}(\text{frame})
   $$
   Here, the user selects a rectangular region $ROI$ in the frame. Let's denote this region by its coordinates $(x, y, w, h)$, where $x$ and $y$ are the top-left coordinates, and $w$ and $h$ are the width and height of the region.

   The template image $T$ is then:
   $$
   T = \text{frame}[y:y+h, x:x+w]
   $$


2. **Frame Processing Loop**:
   For each frame $F_i$ in the video (where $i$ is the frame index), the loop reads frames in order until frame number 5000:
   $$
   F_i = \text{cap.read}(i)
   $$
   If $i > 5000$, exit the loop.

3. **Apply Gaussian Blur**:
   $$
   F_i^{\text{blur}} = \text{GaussianBlur}(F_i, (5, 5), 0)
   $$
   This step applies a Gaussian filter to the frame to reduce noise.

4. **Template Matching**:
   $$
   \text{res} = \text{matchTemplate}(F_i^{\text{blur}}, T, \text{TM\_CCOEFF\_NORMED})
   $$
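   With `TM_CCOEFF_NORMED`, the `matchTemplate` function computes the normalized correlation coefficient (a mean-subtracted, normalized cross-correlation) between the template $T$ and each template-sized window of the current frame $F_i^{\text{blur}}$. The result is a matrix $\text{res}$ in which each element is the correlation coefficient, in $[-1, 1]$, for the window whose top-left corner is at that point.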
   
   $$
   (\min_{\text{val}}, \max_{\text{val}}, \min_{\text{loc}}, \max_{\text{loc}}) = \text{minMaxLoc}(\text{res})
   $$
   
   This function finds the minimum and maximum values of the result matrix and their locations. We are interested in $\max_{\text{loc}}$, the location of the highest correlation, returned as an $(x, y)$ point.

   Let:
   $$
   \text{top\_left} = \max_{\text{loc}}
   $$
   $$
   \text{bottom\_right} = (\text{top\_left}[0] + w, \text{top\_left}[1] + h)
   $$

In summary, the mathematical operations involve:
- Extracting a template image from a specific region in a frame.
- Applying Gaussian blur to reduce noise.
- Using normalized cross-correlation to find the best match of the template in subsequent frames.
- Identifying the location of the best match and drawing a rectangle around it.
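
As a dependency-free illustration of the template-matching step, the sketch below recomputes the `TM_CCOEFF_NORMED` score for every window of a tiny grayscale image in plain Python. The function names `ccoeff_normed` and `best_match`, the sample image, and the choice to score flat windows as 0 are assumptions made for this example; in the notebook, `cv2.matchTemplate` and `cv2.minMaxLoc` do this work.

```python
import math

def ccoeff_normed(window, template):
    """Correlation coefficient between two equal-size 2D lists of numbers."""
    a = [v for row in window for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a) *
                    sum((q - mean_b) ** 2 for q in b))
    # A flat window or template has no variation to correlate; score it 0 here.
    return num / den if den else 0.0

def best_match(image, template):
    """Slide the template over the image; return (max_val, (x, y)) of the best window."""
    h, w = len(template), len(template[0])
    best_val, best_loc = -2.0, None
    for y in range(len(image) - h + 1):
        for x in range(len(image[0]) - w + 1):
            window = [row[x:x + w] for row in image[y:y + h]]
            score = ccoeff_normed(window, template)
            if score > best_val:
                best_val, best_loc = score, (x, y)
    return best_val, best_loc

# Hypothetical 5x6 image; the template is cut out at x=3, y=2 with w=2, h=2,
# using the same row/column slicing as T = frame[y:y+h, x:x+w].
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 9, 4, 0],
    [0, 0, 0, 2, 7, 0],
    [0, 0, 0, 0, 0, 0],
]
x, y, w, h = 3, 2, 2, 2
template = [row[x:x + w] for row in image[y:y + h]]

max_val, top_left = best_match(image, template)
bottom_right = (top_left[0] + w, top_left[1] + h)
```

The best window is found back at the point the template was cut from, with a score of 1, and `bottom_right` follows from adding the template width and height to `top_left`.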

### Requirements

1A. Input images from video file WiiPlay.mp4 with level 15 (frame number between 4820 and 5000).<br> \
1B. (5pts) Acquire a <b>face template</b> from the first frame (frame number = 4820).<br>\
1C. (10pts) Try to detect the face that matches the template on subsequent frames, draw a <b>red</b> rectangle around the detected face, and show the output images in the <b>"find_this_mii"</b> window.<br>

In [1]:
#game_1 : "find_this_mii"

import cv2
import numpy as np

# Load the video
cap = cv2.VideoCapture('WiiPlay.mp4')

# Set the current video frame to 4820
cap.set(cv2.CAP_PROP_POS_FRAMES, 4820)

# Capture a single frame from the video
ret, frame = cap.read()
if not ret:
    raise RuntimeError('Could not read frame 4820 from WiiPlay.mp4')

# Select a region of interest (ROI) manually from the captured frame for template matching
x, y, w, h = (int(v) for v in cv2.selectROI(frame))
template = frame[y:y+h, x:x+w]

while(cap.isOpened()):
    ret, frame = cap.read()

    # Break the loop if frame cannot be read or the current frame exceeds 5000
    if not ret or cap.get(cv2.CAP_PROP_POS_FRAMES) > 5000:
        break

    # Apply Gaussian Blur to the frame to reduce noise for better template matching
    frame = cv2.GaussianBlur(frame, (5, 5), 0)

    # Perform template matching to find the template in the frame
    res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)

    # Find the location of the template in the frame
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)

    top_left = max_loc
    h, w, _ = template.shape
    bottom_right = (top_left[0] + w, top_left[1] + h)

    # Draw a rectangle around the template in the frame
    cv2.rectangle(frame, top_left, bottom_right, (0, 0, 255), 2)

    # Display the frame with the rectangle around the template
    cv2.imshow('find_this_mii', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!


- [Demo Video of game_1: "find_this_mii"](https://drive.google.com/file/d/1VIsyU3AEM1pPiEQ2tlJHoIHCoZ2YTVhm/view?usp=sharing)

## Game_2 : "find_two_look_alike"

### Steps:
1. **Preprocessing:**
   - Convert the frame to the HSV color space to detect skin colors.
   - Apply a binary mask to isolate skin-colored areas.

2. **Morphological Operations:**
   - Erode and dilate the mask to remove noise.
   - Apply Gaussian blur to smooth the mask.

3. **Contour Detection:**
   - Find contours in the mask.
   - Filter contours based on area and aspect ratio to identify potential faces.

4. **Face Similarity Detection:**
   - Compare each detected face with every other detected face using template matching.
   - Mark pairs of faces that are similar.


### Mathematical Explanation of Face Detection:

#### Convert to HSV:
- Frame $F$ is converted from BGR to HSV.
  $$
  F_{HSV} = \text{HSV}(F)
  $$

#### Binary Mask:
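- Create a binary mask $M$ using threshold values for skin color. A pixel is kept only if all three HSV channels fall within the bounds (`cv2.inRange` marks kept pixels with 255):
  $$
  M(x, y) = \begin{cases}
  255, & \text{if } \text{lower\_skin} \leq F_{HSV}(x, y) \leq \text{upper\_skin} \text{ in every channel} \\
  0, & \text{otherwise}
  \end{cases}
  $$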
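
The per-pixel test behind the mask can be sketched in plain Python. The helper name `in_range` and the sample pixels are made up for illustration; the bounds are the ones the notebook passes to `cv2.inRange`.

```python
def in_range(pixel, lower, upper):
    """Return 255 if every channel of the pixel lies within [lower, upper], else 0."""
    inside = all(lo <= v <= hi for v, lo, hi in zip(pixel, lower, upper))
    return 255 if inside else 0

# Bounds used in the notebook for skin tones (H, S, V).
lower_skin = (0, 48, 80)
upper_skin = (20, 255, 255)

# Hypothetical HSV pixels: one skin-like, one with a blue hue, one too dark.
pixels = [(10, 120, 200), (110, 120, 200), (10, 120, 40)]
mask = [in_range(p, lower_skin, upper_skin) for p in pixels]
```

Only the first pixel passes, since the second fails the hue bound and the third fails the value bound.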

#### Morphological Operations:
- Erode $M$ with a kernel $K$:
  $$
  M' = \text{erode}(M, K)
  $$
- Dilate $M'$ with $K$:
  $$
  M'' = \text{dilate}(M', K)
  $$
- Apply Gaussian blur:
  $$
  M_{smooth} = \text{GaussianBlur}(M'', (3, 3), 0)
  $$

#### Contour Detection:
- Find contours $C$ in $M_{smooth}$:
  $$
  C = \text{contours}(M_{smooth})
  $$
- For each contour $c \in C$:
  - Compute bounding rectangle $(x, y, w, h)$.
  - Compute the contour area $A$ and the aspect ratio $R$ of the bounding rectangle:
    $$
    A = \text{contourArea}(c)
    $$
    $$
    R = \frac{w}{h}
    $$
  - Filter based on area and aspect ratio:
    $$
    \text{if } \text{min\_area} < A < \text{max\_area} \text{ and } \text{aspect\_ratio\_range}[0] < R < \text{aspect\_ratio\_range}[1] \text{, append to faces}
    $$
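
The area and aspect-ratio filter can be tried on its own without OpenCV. In this sketch the function name `filter_faces` and the candidate tuples are hypothetical; each candidate carries a precomputed contour area, which the notebook obtains from `cv2.contourArea`.

```python
def filter_faces(candidates, min_area, max_area, aspect_ratio_range):
    """Keep bounding boxes whose contour area and w/h ratio fall strictly inside the limits.

    Each candidate is (x, y, w, h, area), where area is the contour area.
    """
    faces = []
    for x, y, w, h, area in candidates:
        aspect_ratio = w / float(h)
        if (aspect_ratio_range[0] < aspect_ratio < aspect_ratio_range[1]
                and min_area < area < max_area):
            faces.append((x, y, w, h))
    return faces

# Hypothetical candidates: a plausible face, a tiny speck, and a long thin strip.
candidates = [
    (40, 30, 25, 30, 600.0),
    (5, 5, 4, 4, 12.0),
    (0, 100, 200, 4, 700.0),
]
faces = filter_faces(candidates, min_area=300, max_area=1000,
                     aspect_ratio_range=(0.3, 3))
```

The speck is rejected by the area limit and the strip by the aspect-ratio limit, so only the first box survives.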

### Mathematical Explanation of Face Similarity Matching:

- For each pair of faces $f_i, f_j$ with $i < j$:
  - Extract face regions $f_i$ and $f_j$ from the frame.
  - Resize $f_j$ to match the dimensions of $f_i$:
    $$
    f_{j,\text{resized}} = \text{resize}(f_j, (w_i, h_i))
    $$
  - Convert to grayscale:
    $$
    f_{i,\text{gray}} = \text{gray}(f_i)
    $$
    $$
    f_{j,\text{gray}} = \text{gray}(f_{j,\text{resized}})
    $$
  - Perform template matching:
    $$
    \text{result} = \text{matchTemplate}(f_{i,\text{gray}}, f_{j,\text{gray}}, \text{TM\_CCOEFF\_NORMED})
    $$
    Because both images have the same size, $\text{result}$ holds a single correlation coefficient.
  - Find the maximum correlation value $\text{max\_val}$:
    $$
    \text{max\_val} = \max(\text{result})
    $$
  - If $\text{max\_val} > 0.32$, consider the faces similar and highlight them.
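
The pairwise comparison can be illustrated without OpenCV. The sketch below assumes the faces are already resized to a common size and flattened to intensity lists; `correlation`, `similar_pairs`, and the sample data are hypothetical stand-ins for the notebook's `cv2.matchTemplate` call on equal-size images.

```python
import math
from itertools import combinations

def correlation(a, b):
    """Correlation coefficient of two equal-length intensity lists (flattened faces)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a) *
                    sum((q - mean_b) ** 2 for q in b))
    return num / den if den else 0.0

def similar_pairs(faces, threshold=0.32):
    """Return index pairs (i, j), i < j, whose correlation exceeds the threshold."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(faces), 2)
            if correlation(a, b) > threshold]

# Hypothetical flattened grayscale faces, already resized to the same size.
faces = [
    [10, 200, 30, 180],   # face 0
    [90, 20, 100, 15],    # face 1: roughly the inverse pattern of face 0
    [20, 210, 25, 190],   # face 2: close to face 0
]
pairs = similar_pairs(faces)
```

Each unordered pair is visited once, matching the `i < j` loop in `find_similar_faces`, and only faces 0 and 2 clear the 0.32 threshold.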

### Final Output:
The video frame is displayed with rectangles around detected faces and similar-looking faces highlighted with a different color.

### Requirements

2A. Input images from video file WiiPlay.mp4 with level 8 (frame number between 2180 and 2380).<br>\
2B. (5pts) Detect <b>pedestrians</b> on each frame and draw a <b>green</b> rectangle around your detection.<br>\
2C. (5pts) Detect <b>faces</b> on each frame and draw a <b>blue</b> rectangle around your detection.<br>\
2D. (10pts) Try to find two faces that look like each other, draw a <b>red</b> rectangle around each of the two faces, and show the output images in the <b>"find_two_look_alike"</b> window.<br><br>

In [10]:
#game_2 : "find_two_look_alike"

import cv2
import numpy as np

def detect_faces(frame, lower_skin, upper_skin, min_area, max_area, aspect_ratio_range):
    """
    Detects faces in the given frame based on skin color and filters them by area and aspect ratio.

    Args:
        frame (numpy.ndarray): The image frame in which to detect faces.
        lower_skin (numpy.ndarray): The lower bound of the HSV color range for skin detection.
        upper_skin (numpy.ndarray): The upper bound of the HSV color range for skin detection.
        min_area (int): The minimum area of a detected face.
        max_area (int): The maximum area of a detected face.
        aspect_ratio_range (tuple): The range of acceptable aspect ratios for detected faces.

    Returns:
        tuple: A tuple containing a list of detected faces and the processed frame.
    """

    # Convert the frame to the HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Create a binary mask of the skin
    mask = cv2.inRange(hsv, lower_skin, upper_skin)
    
    # Create a structuring element for morphological operations
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 11))

    # Erode and dilate the mask to remove noise
    mask = cv2.erode(mask, kernel, iterations=2)
    mask = cv2.dilate(mask, kernel, iterations=2)

    # Apply Gaussian blur to smooth the mask
    mask = cv2.GaussianBlur(mask, (3, 3), 0)
    
    # Find contours in the mask
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    faces = []
    for contour in contours:

        # Get the bounding rectangle for each contour
        x, y, w, h = cv2.boundingRect(contour)
        aspect_ratio = w / float(h)
        area = cv2.contourArea(contour)

        # Filter contours based on aspect ratio and area
        if aspect_ratio_range[0] < aspect_ratio < aspect_ratio_range[1] and min_area < area < max_area:
            faces.append((x, y, w, h))

            # Draw a rectangle around the detected face
            cv2.rectangle(frame, (x-10, y-10), (x+w+10, y+h+10), (255, 0, 0), 2)
    
    return faces, frame

def find_similar_faces(faces, frame):
    """
    Find similar-looking faces in the given frame using template matching.

    Args:
        faces (list): A list of detected faces as (x, y, w, h) tuples.
        frame (numpy.ndarray): The image frame in which to find similar faces.

    Returns:
        list: A list of pairs of similar faces as ((x1, y1, w1, h1), (x2, y2, w2, h2)) tuples.
    """
    similar_faces = []

    for i in range(len(faces)):
        (x1, y1, w1, h1) = faces[i]
        face1 = frame[y1:y1+h1, x1:x1+w1]

        for j in range(i + 1, len(faces)):
            (x2, y2, w2, h2) = faces[j]
            face2 = frame[y2:y2+h2, x2:x2+w2]

            # Resize face2 to match the size of face1
            face2_resized = cv2.resize(face2, (w1, h1))

            # Convert faces to grayscale
            face1_gray = cv2.cvtColor(face1, cv2.COLOR_BGR2GRAY)
            face2_gray = cv2.cvtColor(face2_resized, cv2.COLOR_BGR2GRAY)

            # Perform template matching
            result = cv2.matchTemplate(face1_gray, face2_gray, cv2.TM_CCOEFF_NORMED)
            _, max_val, _, _ = cv2.minMaxLoc(result)

            # Check if the faces are similar based on a threshold
            if max_val > 0.32:
                similar_faces.append(((x1, y1, w1, h1), (x2, y2, w2, h2)))

    return similar_faces

cap = cv2.VideoCapture('WiiPlay.mp4')

# Initialize HOG descriptor for people detection
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cv2.startWindowThread()

# Set the range of frames to process
start_frame = 2180
end_frame = 2380

cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

current_frame = start_frame

# Define HSV color range for skin detection
lower_skin = np.array([0, 48, 80], dtype='uint8')
upper_skin = np.array([20, 255, 255], dtype='uint8')
min_area = 300
max_area = 1000
aspect_ratio_range = (0.3, 3)

# Initialize the faces list
faces = []

while True:
    ret, frame = cap.read()

    # Break the loop if the frame is not read correctly or the end frame is reached
    if not ret or current_frame > end_frame:
        break
    
    # Detect people in the frame using HOG
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))
    boxes = np.array([[x, y, x + w, y + h] for (x, y, w, h) in boxes])

    for (xA, yA, xB, yB) in boxes:
        cv2.rectangle(frame, (xA, yA), (xB, yB), (0, 255, 0), 2)

    # Detect faces in the frame
    faces, frame = detect_faces(frame, lower_skin, upper_skin, min_area, max_area, aspect_ratio_range)

    # Find similar faces in the frame
    similar_faces = find_similar_faces(faces, frame)

    for (face1, face2) in similar_faces:
        (x1, y1, w1, h1) = face1
        (x2, y2, w2, h2) = face2
        cv2.rectangle(frame, (x1-10, y1-10), (x1+w1+10, y1+h1+10), (0, 0, 255), 2)
        cv2.rectangle(frame, (x2-10, y2-10), (x2+w2+10, y2+h2+10), (0, 0, 255), 2)

    # Display the frame 
    cv2.imshow('find_two_look_alike', frame)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    current_frame += 1

cap.release()
cv2.destroyAllWindows()

- [Demo Video of game_2: "find_two_look_alike"](https://drive.google.com/file/d/1MxuWQbj5SEjfAsjeWfBkNV5UmdNABRRv/view?usp=sharing)

## Game_3 : "find_the_fastest_character"

### Steps:
1. **Preprocessing:**
   - Load the video and set the frame range to analyze.
   - Initialize the HOG descriptor for detecting people.

2. **Detection and Tracking:**
   - Detect people in the first frame and initialize bounding boxes.
   - Create and initialize a multi-object tracker.

3. **Frame-by-Frame Analysis:**
   - Read each frame and update the tracker.
   - Calculate the speed of each tracked person using the Euclidean distance.
   - Identify and highlight the fastest person in the current frame.

4. **Visualization:**
   - Draw bounding boxes around all detected and tracked people.
   - Display the current frame number on the video.

### Mathematical Explanation of Speed Calculation:

#### Euclidean Distance:
- For each tracked person, the speed is the distance, in pixels per frame, between the top-left corner of the bounding box in the current frame $(x, y)$ and in the previous frame $(\text{prev\_x}, \text{prev\_y})$:
  $$
  \text{speed} = \sqrt{(x - \text{prev\_x})^2 + (y - \text{prev\_y})^2}
  $$

#### Steps:
- Initialize positions:
  - Let $(x, y)$ be the current position.
  - Let $(\text{prev\_x}, \text{prev\_y})$ be the previous position, starting from the boxes detected in the first frame.

- Calculate the speed of each tracked person with the Euclidean distance above, then store the current position as the previous position for the next frame.

- Identify the fastest speed:
  - Track the maximum speed recorded so far.
  - Update the fastest speed, and the box it belongs to, if the largest speed in the current frame is greater.
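
The bookkeeping for one frame can be sketched without a tracker. The function `update_fastest` and the sample boxes are hypothetical; unlike the notebook, which stores the fastest box itself, this sketch records the index of the fastest person.

```python
import math

def update_fastest(prev_positions, boxes, fastest_speed=0.0, fastest_index=None):
    """Compare each box with its previous position and update the running maximum.

    prev_positions and boxes are lists of (x, y, w, h) in the same order.
    Returns (speeds, fastest_speed, fastest_index).
    """
    speeds = []
    for (prev_x, prev_y, _, _), (x, y, _, _) in zip(prev_positions, boxes):
        # Displacement of the top-left corner, in pixels per frame
        speeds.append(math.hypot(x - prev_x, y - prev_y))
    if speeds and max(speeds) > fastest_speed:
        fastest_speed = max(speeds)
        fastest_index = speeds.index(fastest_speed)
    return speeds, fastest_speed, fastest_index

# Hypothetical boxes for two people in consecutive frames.
prev_positions = [(100, 50, 40, 80), (300, 60, 40, 80)]
boxes = [(103, 54, 40, 80), (301, 60, 40, 80)]
speeds, fastest_speed, fastest_index = update_fastest(prev_positions, boxes)
```

The first person moved 3 pixels right and 4 pixels down, a displacement of 5 pixels, so index 0 becomes the fastest so far. Passing the returned `fastest_speed` and `fastest_index` back in on the next frame keeps the running maximum.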

### Final Output:
The video frame is displayed with:
- Bounding boxes around all detected and tracked people.
- The fastest person highlighted with a different color.
- The current frame number displayed on the video.


### Requirements

3A. Input images from video file WiiPlay.mp4 with level 9 (frame number between 2480 and 2600).<br>\
3B. (5pts) <b>Detect</b> faces (or pedestrians) on the first frame and draw a <b>blue</b> rectangle around your detection.<br>\
3C. (10pts) <b>Track</b> faces (or pedestrians) on subsequent frames and draw a <b>green</b> rectangle around your tracking.<br>\
3D. (5pts) Try to find out the fastest character, draw a <b>red</b> rectangle around the fastest character, and show the output images in the <b>"find_the_fastest_character"</b> window.<br><br>

In [8]:
#game_3 : "find_the_fastest_character"

import cv2
import numpy as np

cap = cv2.VideoCapture('WiiPlay.mp4')

# Define the frame range to analyze
start_frame = 2480
end_frame = 2600

# Set the video to start at the specified frame
cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

# Initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

ret, frame = cap.read()
if not ret:
    print("Failed to read the video")
    exit()

# Detect people in the first frame
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))

# Draw bounding boxes around detected people
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Initialize the multi-object tracker
trackers = cv2.legacy.MultiTracker_create()
for box in boxes:
    tracker = cv2.legacy.TrackerMIL_create()
    trackers.add(tracker, frame, tuple(box))

current_frame = start_frame

fastest_speed = 0
fastest_box = None

# Store previous positions of detected people
prev_position = [tuple(box) for box in boxes]

while True:
    ret, frame = cap.read()

    if not ret or current_frame >= end_frame:
        break

    # Update tracker and get updated positions
    success, tracked_boxes = trackers.update(frame)

    if success:
        speeds = []

        # Calculate speed for each tracked person
        for i, box in enumerate(tracked_boxes):
            x, y, w, h = [int(v) for v in box]

            # Calculate speed based on the distance moved
            prev_x, prev_y, _, _ = prev_position[i]

            # Calculate speed as the Euclidean distance between the current and previous positions
            speed = ((x - prev_x) ** 2 + (y - prev_y) ** 2) ** 0.5

            # Append the speed to the list
            speeds.append(speed)

            # Update the previous position
            prev_position[i] = (x, y, w, h)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
        
        # Identify the fastest speed and corresponding person (skip if nothing is tracked)
        if speeds:
            max_speed = max(speeds)
            if max_speed > fastest_speed:
                fastest_speed = max_speed
                fastest_box = tracked_boxes[speeds.index(max_speed)]

    # Draw the boxes, keeping the fastest person's highlight on top
    if fastest_box is not None:

        # Draw bounding boxes for all tracked people
        for box in tracked_boxes:
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

        # Draw the fastest person's box last so the red highlight is not drawn over
        x, y, w, h = [int(v) for v in fastest_box]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    
    # Detect people in the current frame and draw bounding boxes
    detected_boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))
    for (x, y, w, h) in detected_boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Display the current frame number on the video
    text = f'Current_Frame: {current_frame}'
    text_size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)[0]
    text_x = frame.shape[1] - text_size[0] - 10
    text_y = 30
    cv2.putText(frame, text, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    
    # Show the frame
    cv2.imshow('find_the_fastest_character', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    current_frame += 1

cap.release()
cv2.destroyAllWindows()

- [Demo Video of game_3: "find_the_fastest_character"](https://drive.google.com/file/d/1VhatQnO62YvWtV7ovntemcju5bkoZnKt/view?usp=sharing)

## Game_4 : "find_two_odds"

### Steps:
1. **Preprocessing:**
   - Capture frames from the video between specified start and end frames.
   - Convert frames to grayscale and resize them for faster processing.

2. **Optical Flow Calculation:**
   - Use the Farneback method to calculate the optical flow between consecutive frames.
   - Compute the flow vector and its magnitude for each pixel.

3. **Flow Direction Analysis:**
   - Identify points with significant flow magnitude.
   - Calculate the direction of the flow vector at these points.
   - Store points with significant motion along with their flow direction.

4. **Odd Character Detection:**
   - Compute the median direction of all significant flow directions.
   - Calculate the deviation of each point's direction from the median direction.
   - Identify the two points with the largest deviations and highlight them.

5. **Visualization:**
   - Draw arrows on the frame to indicate the direction of motion.
   - Highlight the two most deviated points with red rectangles.
   - Display the frame with annotations and the optical flow magnitude.

### Mathematical Explanation of Optical Flow Calculation:

#### Optical Flow Vector Calculation:
- The optical flow vector at pixel $(x, y)$ is:
  $$
  \text{opt\_flow}(x, y) = \begin{bmatrix}
  u(x, y) \\
  v(x, y)
  \end{bmatrix}
  $$
  where $u(x, y)$ and $v(x, y)$ are the horizontal and vertical components of the flow.

#### Magnitude of Optical Flow:
- The magnitude of the flow vector is:
  $$
  \|\text{opt\_flow}(x, y)\| = \sqrt{u(x, y)^2 + v(x, y)^2}
  $$

### Mathematical Explanation of Flow Direction Analysis:

#### Flow Direction Calculation:
- The direction of the flow vector at pixel $(x, y)$ is:
  $$
  \theta(x, y) = \arctan2(v(x, y), u(x, y))
  $$

#### Identifying Significant Motion:
- Points with flow magnitude greater than a threshold (e.g., 0.4) are considered to have significant motion.

#### Median Direction and Deviation:
- The median direction of significant flow directions is:
  $$
  \theta_{\text{median}} = \text{median}(\theta_1, \theta_2, \ldots, \theta_n)
  $$
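- The deviation of each point's direction from the median direction is:
  $$
  \text{deviation}_i = |\theta_i - \theta_{\text{median}}|
  $$
  Because $\arctan2$ returns angles in $(-\pi, \pi]$, both the median and this plain difference treat directions on either side of $\pm\pi$ as far apart, even though they point almost the same way.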

### Odd Character Detection:
- Identify the two points with the largest deviations:
  $$
  \text{odd\_indices} = \text{argsort}(\text{deviations})[-2:]
  $$
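
Flow directions are angles, so they wrap around at $\pm\pi$. The sketch below is an alternative to the notebook's plain median and absolute difference, not the notebook's method: it swaps in the circular mean direction as the reference and wraps each difference into $[0, \pi]$. The function names and the sample directions are made up for illustration.

```python
import math

def circular_mean(directions):
    """Mean direction of a list of angles in radians, robust to wraparound."""
    return math.atan2(sum(math.sin(t) for t in directions),
                      sum(math.cos(t) for t in directions))

def wrapped_deviation(theta, reference):
    """Smallest absolute angle between two directions, in [0, pi]."""
    diff = theta - reference
    return abs(math.atan2(math.sin(diff), math.cos(diff)))

def two_most_deviant(directions):
    """Indices of the two directions farthest from the circular mean direction."""
    reference = circular_mean(directions)
    deviations = [wrapped_deviation(t, reference) for t in directions]
    order = sorted(range(len(directions)), key=lambda i: deviations[i])
    return order[-2:]

# Hypothetical flow directions in radians: most points move left (near +pi or -pi),
# while the points at indices 2 and 5 move right (near 0).
directions = [3.1, -3.1, 0.1, 3.0, -3.05, -0.05, 3.12]
odd_indices = two_most_deviant(directions)
```

With these sample values the plain median is 0.1, which is the direction of one of the odd points, so a non-wrapped comparison would single out the wrong points. The wrapped version picks indices 2 and 5.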

### Final Output:
The video frame is displayed with arrows indicating the direction of motion, and the two most deviated points highlighted with red rectangles. The optical flow magnitude is also displayed as a separate image for reference.


### Requirements
4A. Input images from video file WiiPlay.mp4 with level 6 (frame number between 1650 and 1800).<br>\
4B. (10pts) Compute and show <b>optical flows</b> on each frame using <b>blue</b> arrows.<br>\
4C. (5pts) Try to detect the two odd characters who face the opposite direction from everyone else, draw a <b>red</b> rectangle around each of the two characters, and show the output images in the <b>"find_two_odds"</b> window.<br><br>

In [12]:
#game_4 : "find_two_odds"

import cv2
import numpy as np

def display_flow(img, flow, stride=10):

    # Get the dimensions of the image
    height, width = img.shape[:2]

    # List to store points with significant optical flow
    odd_characters = []

    # Iterate through the image with a step size of `stride` pixels
    for y in range(0, height, stride):
        for x in range(0, width, stride):

            # Get the flow vector at the current point
            flow_at_point = flow[y, x]

            # Check if the magnitude of flow is significant
            if np.linalg.norm(flow_at_point) > 0.4:

                # Calculate the direction of flow
                direction = np.arctan2(flow_at_point[1], flow_at_point[0])

                # Store the point and its direction
                odd_characters.append((x, y, direction))

                # Start point of the arrow
                pt1 = (x, y)

                # Calculate the arrow offset; flow_at_point is already ordered (dx, dy)
                delta = flow_at_point.astype(np.int32)

                # Scale the arrow for better visibility and draw it on the image
                pt2 = (int(pt1[0] + delta[0] * 2), int(pt1[1] + delta[1] * 2))
                cv2.arrowedLine(img, pt1, pt2, (255, 0, 0), 1, cv2.LINE_AA, 0, 0.1)

    # If there are more than 2 points with significant optical flow
    if len(odd_characters) > 2:

        # Extract the directions of the points
        directions = np.array([c[2] for c in odd_characters])

        # Calculate the median direction
        median_direction = np.median(directions)

        # Calculate the deviation from the median direction
        deviations = np.abs(directions - median_direction)

        # Get the indices of the two most deviated points
        odd_indices = deviations.argsort()[-2:]

        # Draw a red rectangle around the two most deviated points
        for idx in odd_indices:
            x, y, _ = odd_characters[idx]
            cv2.rectangle(img, (x-30, y-30), (x+30, y+30), (0, 0, 255), 3)

    # Calculate the magnitude of the optical flow
    norm_opt_flow = np.linalg.norm(flow, axis=2)

    # Normalize for display
    norm_opt_flow = cv2.normalize(norm_opt_flow, None, 0, 1, cv2.NORM_MINMAX)

    # Display the image with arrows and boxes
    cv2.imshow('find_two_odds', img)

    # Display the optical flow magnitude
    cv2.imshow('optical flow magnitude', norm_opt_flow)

    # Check if the user pressed 'q' to quit
    if cv2.waitKey(100) & 0xFF == ord('q'):
        return 1
    else:
        return 0

cap = cv2.VideoCapture("WiiPlay.mp4")

# Set the frame range to analyze
start_frame = 1650
end_frame = 1800

cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

# Read the first frame
_, prev_frame = cap.read()

# Convert to grayscale
prev_frame = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

# Resize the frame for faster processing
prev_frame = cv2.resize(prev_frame, (0,0), None, 0.5, 0.5)

# Flag to indicate the first frame
first_frame = True

current_frame = start_frame

# Initialize FPS counter
fps = 0

while True:
    status_cap, frame = cap.read()
    if not status_cap or current_frame >= end_frame:
        break

    frame = cv2.resize(frame, (0,0), None, 0.5, 0.5)

    # Start the timer for FPS calculation
    timer = cv2.getTickCount()

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    
    if first_frame:

        # Calculate the initial optical flow using Farneback method
        opt_flow = cv2.calcOpticalFlowFarneback(prev_frame, gray, None, 
                                                pyr_scale=0.4, levels=6, winsize=20, 
                                                iterations=15, poly_n=7, poly_sigma=1.5, 
                                                flags=cv2.OPTFLOW_FARNEBACK_GAUSSIAN)
        # Unset the first frame flag
        first_frame = False
    else:
        # Calculate the optical flow using the previous flow as initial estimate
        opt_flow = cv2.calcOpticalFlowFarneback(prev_frame, gray, opt_flow, 
                                                pyr_scale=0.4, levels=6, winsize=20, 
                                                iterations=15, poly_n=7, poly_sigma=1.5, 
                                                flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
    # Calculate FPS
    fps = cv2.getTickFrequency() / (cv2.getTickCount() - timer)

    # Display FPS on the frame
    cv2.putText(frame, f'FPS: {int(fps)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    
    # Update the previous frame
    prev_frame = np.copy(gray)

    # Display the flow and check if 'q' is pressed  
    if display_flow(frame, opt_flow):
        break

    current_frame += 1

    # Check if 'q' is pressed to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

![Demo of picture game_4](game_4-1.png "Demo of picture game_4 ")
![Demo of picture game_4](game_4-2.png "Demo of picture game_4")

### Requirements

5A. Input continuous images from 'car.mp4'.<br>\
5B. (6pts) For each frame, detect every car using YOLOv8 trained data 'yolov8n.pt'. (mark with red rectangles)<br>\
5C. (6pts) For each car, detect a licence plate using 'license_plate_detector.pt'. (mark with blue rectangle)<br>\
5D. (6pts) For each licence plate, OCR using Tesseract. Print the recognized licence plate number above each detected licence plate. (putText() in green color)<br>\
5E. (12pts) Use whatever you learned this semester to improve the result. Write a simple report on your method and observations.<br><br>

## Game_5 : "Licence Plate Recognition"

### Steps:
1. **Preprocessing:**
   - Capture frames from the video between specified start and end frames.
   - Convert frames to grayscale and apply preprocessing techniques to enhance image quality.

2. **Vehicle Detection:**
   - Use a YOLO model to detect vehicles in each frame.
   - Store the bounding boxes and confidence scores for detected vehicles.

3. **License Plate Detection:**
   - Use another YOLO model to detect license plates within the detected vehicles.
   - Extract the regions corresponding to the license plates.

4. **Optical Character Recognition (OCR):**
   - Preprocess the extracted license plate regions to improve OCR performance.
   - Use Tesseract OCR to extract text from the license plate regions.

5. **Confidence Scoring:**
   - Compute the confidence scores of the detected license plates.
   - Identify the two license plates with the lowest confidence scores and highlight them.

6. **Visualization:**
   - Draw bounding boxes around detected vehicles and license plates.
   - Highlight the two license plates with the lowest confidence scores with red rectangles.
   - Display the frame with annotations and the OCR results.

### Mathematical Explanation of Image Preprocessing:

#### Grayscale Conversion
The color image $I_{\text{rgb}}$ is converted to a grayscale image $I_{\text{gray}}$ using the following formula:

$$ I_{\text{gray}}(x, y) = 0.299 \cdot I_{\text{rgb}}(x, y, R) + 0.587 \cdot I_{\text{rgb}}(x, y, G) + 0.114 \cdot I_{\text{rgb}}(x, y, B) $$

#### Histogram Equalization
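Enhances the contrast of the grayscale image $I_{\text{gray}}$ by remapping each intensity through the cumulative histogram $\text{cdf}$ of the image, where $N$ is the number of pixels and $\text{cdf}_{\min}$ is the smallest non-zero value of $\text{cdf}$:

$$ I_{\text{equalized}}(x, y) = \text{round}\left( \frac{\text{cdf}(I_{\text{gray}}(x, y)) - \text{cdf}_{\min}}{N - \text{cdf}_{\min}} \times 255 \right) $$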
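
Histogram equalization, as performed by `cv2.equalizeHist`, remaps intensities through the cumulative histogram of the image. The following plain-Python sketch shows that idea on a flat list of pixels; the function name `equalize_hist` and the sample patch are assumptions for illustration, and OpenCV's exact rounding may differ.

```python
def equalize_hist(pixels, levels=256):
    """Equalize a flat list of integer intensities in [0, levels - 1] via the cumulative histogram."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # Cumulative histogram: cdf[v] = number of pixels with intensity <= v
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to spread out
        return list(pixels)

    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[v] - cdf_min) * scale) for v in pixels]

# Hypothetical low-contrast patch: all values crowded between 100 and 103.
patch = [100, 100, 101, 101, 102, 102, 103, 103]
equalized = equalize_hist(patch)
```

The four crowded gray levels are spread over the full range, giving `[0, 0, 85, 85, 170, 170, 255, 255]`.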

#### Bilateral Filtering
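Applies a bilateral filter to smooth the image $I_{\text{equalized}}$ while preserving edges. Here $S$ is the neighborhood of $(x, y)$, $\sigma_s$ and $\sigma_r$ control the spatial and intensity falloff, and $W$ is the sum of the exponential weights over $S$, so that the weights add up to 1: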

$$ I_{\text{filtered}}(x, y) = \frac{1}{W} \sum_{(i,j) \in S} I_{\text{equalized}}(i, j) \cdot \exp\left( -\frac{(i-x)^2 + (j-y)^2}{2\sigma_s^2} - \frac{|I_{\text{equalized}}(i, j) - I_{\text{equalized}}(x, y)|^2}{2\sigma_r^2} \right) $$

#### Thresholding (Otsu's Method)
Converts the filtered image $I_{\text{filtered}}$ into a binary image $I_{\text{binary}}$. Otsu's method picks a single global threshold $T_{\text{Otsu}}$ from the image histogram by maximizing the variance between the two resulting classes:

$$ I_{\text{binary}}(x, y) = 
\begin{cases} 
255 & \text{if } I_{\text{filtered}}(x, y) > T_{\text{Otsu}} \\
0 & \text{if } I_{\text{filtered}}(x, y) \leq T_{\text{Otsu}}
\end{cases} $$
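
Otsu's method chooses the threshold by maximizing the between-class variance of the histogram. The sketch below is a minimal plain-Python version for a flat list of pixels; the function name `otsu_threshold` and the sample data are hypothetical, and where several thresholds are equally good OpenCV may pick a different value from the same plateau.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Class 0 holds intensities <= t and class 1 holds intensities > t.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(v * count for v, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight0, sum0 = 0, 0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Between-class variance, up to a constant factor of 1 / total**2
        between_var = weight0 * weight1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Hypothetical plate crop: dark characters (around 40) on a bright background (around 210).
pixels = [38, 40, 42, 41, 39, 208, 210, 212, 211, 209, 210, 207]
t = otsu_threshold(pixels)
binary = [255 if v > t else 0 for v in pixels]
```

The threshold lands between the two clusters, so the five dark pixels map to 0 and the seven bright pixels map to 255.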

#### Morphological Operations
**Dilation**: Expands white regions in $I_{\text{binary}}$ by a structuring element $K$:

$$ I_{\text{dilated}} = I_{\text{binary}} \oplus K $$

**Erosion**: Shrinks white regions in $I_{\text{dilated}}$ by the same structuring element $K$:

$$ I_{\text{eroded}} = I_{\text{dilated}} \ominus K $$

Dilation followed by erosion is a morphological closing: it fills small gaps and holes inside the white regions while keeping their overall size.
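
The effect of dilating and then eroding can be seen on a small binary patch. This is a plain-Python sketch with a 3x3 square structuring element; the helper names and the sample patch are made up for illustration, and pixels outside the image are simply ignored at the border.

```python
def _morph(binary, combine):
    """Apply a 3x3 min or max filter to a 2D list of 0/255 values."""
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the in-bounds 3x3 neighborhood around (x, y)
            neighbors = [binary[j][i]
                         for j in range(max(0, y - 1), min(h, y + 2))
                         for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(combine(neighbors))
        out.append(row)
    return out

def dilate(binary):
    """A pixel becomes white if any neighbor is white."""
    return _morph(binary, max)

def erode(binary):
    """A pixel stays white only if all neighbors are white."""
    return _morph(binary, min)

# Hypothetical 7x7 patch: a white 3x3 block with a one-pixel black hole in its center.
W, B = 255, 0
patch = [[B] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        patch[y][x] = W
patch[3][3] = B

closed = erode(dilate(patch))
```

After the two steps the hole at the center is filled, while the block keeps its original 3x3 extent.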

### Mathematical Explanation of Object Detection:

#### YOLO Model for Cars
The YOLO (You Only Look Once) model processes the frame and provides detections in the form of bounding boxes $(x_1, y_1, x_2, y_2)$, confidence scores, and class IDs.

For each detected object:
$$ \text{score} = \text{confidence score}, \quad \text{class\_id} \in \text{vehicles} $$

#### YOLO Model for License Plates
The YOLO model detects license plates similarly, providing bounding boxes and scores.

For each detected license plate:
$$ \text{score} = \text{confidence score} $$

Crop the region defined by $(x_1, y_1, x_2, y_2)$ from the frame and preprocess it for OCR.


### Mathematical Explanation of Optical Character Recognition (OCR)

#### Tesseract OCR
Processes the preprocessed license plate image to extract text:

$$ \text{license\_plate\_text} = \text{Tesseract}(I_{\text{processed}}) $$

Append the bounding box coordinates and extracted text to the license plate annotations list.

### Annotation Drawing

#### Draw Bounding Boxes and Text
For each annotation $(x_1, y_1, x_2, y_2, \text{text})$:

$$ \text{cv2.rectangle}(I_{\text{frame}}, (x_1, y_1), (x_2, y_2), \text{color}, 2) $$

If text is provided:

$$ \text{cv2.putText}(I_{\text{frame}}, \text{text}, (x_1, y_1 - 10), \text{font}, \text{scale}, \text{color}, \text{thickness}) $$

### Video Processing Loop

#### Read Frames
Read each frame from the video and process only every $n$-th frame (here $n = 5$) to reduce the computation load. A frame is processed when:

$$ \text{frame\_count} \mod 5 = 0 $$
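
As a small illustration of the frame-skipping check, the sketch below lists which frame counters would be processed. The helper `frames_to_process` is hypothetical and assumes the counter starts at 0.

```python
def frames_to_process(total_frames, step=5):
    """Indices of the frames that pass the frame_count % step == 0 check."""
    return [frame_count for frame_count in range(total_frames)
            if frame_count % step == 0]

# For a hypothetical 23-frame clip, only one frame in five is processed.
selected = frames_to_process(23)
```

Here `selected` is `[0, 5, 10, 15, 20]`, so the detectors and OCR run on 5 of the 23 frames.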


### Final Output:
The video frame is displayed with bounding boxes around detected vehicles and license plates. The two license plates with the lowest confidence scores are highlighted with red rectangles. The OCR results are also displayed.

In [5]:
#game_5: Licence Plate Recognition

import cv2
import pytesseract
import numpy as np
from ultralytics import YOLO

# Function to draw bounding boxes and text
def draw_annotations(frame, annotations, color):
    """
    Draws bounding boxes and text on the given frame.

    Args:
        frame (numpy.ndarray): The image frame on which to draw.
        annotations (list): List of annotations, each containing (x1, y1, x2, y2, text).
        color (tuple): Color of the bounding boxes (B, G, R).

    """

    for (x1, y1, x2, y2, text) in annotations:

        # Draw rectangle (bounding box) around detected object
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), color, 2)

        if text:
            # Calculate width and height of the text box
            (w, h), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1.5, 3)

            # Draw a filled rectangle as background, tall enough to cover the text drawn at y1 - 10
            cv2.rectangle(frame, (int(x1), int(y1) - 10 - h), (int(x1) + w, int(y1)), (0, 0, 0), -1)

            # Put the text on top of the background
            cv2.putText(frame, text, (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 3)

# Function to preprocess image for OCR
def preprocess_image(image):
    """
    Preprocesses the image to enhance OCR performance.

    Args:
        image (numpy.ndarray): The image to preprocess.

    Returns:
        numpy.ndarray: The preprocessed binary image.
    """

    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
    # Enhance the contrast of the grayscale image
    enhanced = cv2.equalizeHist(gray)

    # Apply bilateral filter to remove noise and keep edges sharp
    filtered = cv2.bilateralFilter(enhanced, 9, 75, 75)

    # Apply Otsu's thresholding (a global threshold chosen automatically) to get a binary image
    _, binary = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Create a kernel for dilation and erosion
    kernel = np.ones((3, 3), np.uint8)

    # Apply dilation to fill small holes
    binary = cv2.dilate(binary, kernel, iterations=1)

    # Apply erosion to remove noise
    binary = cv2.erode(binary, kernel, iterations=1)

    return binary

# Specify the Tesseract executable path
# On Windows, the path is typically 'C:/Program Files/Tesseract-OCR/tesseract.exe'
pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'

# Load models
car_detector = YOLO('yolov8n.pt')
license_plate_detector = YOLO('license_plate_detector.pt')

# Load video
cap = cv2.VideoCapture('car.mp4')

# Define the list of vehicle class IDs (COCO mapping: 2 = car, 3 = motorcycle, 5 = bus, 7 = truck)
vehicles = [2, 3, 5, 7]

frame_count = 0

# Read frames
while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    frame_count += 1
    
    # Process every nth frame to reduce computation
    if frame_count % 5 != 0:
        continue

    # Detect vehicles
    car_detections = car_detector(frame)[0]
    car_annotations = []

    for detection in car_detections.boxes.data.tolist():

        x1, y1, x2, y2, score, class_id = detection
        if int(class_id) in vehicles:

            # Append car detection details to annotations list
            car_annotations.append((x1, y1, x2, y2, f'Car: {int(score * 100)}%'))

    # Detect license plates in the frame
    license_plate_detections = license_plate_detector(frame)[0]
    license_plate_annotations = []

    for license_plate in license_plate_detections.boxes.data.tolist():
        x1, y1, x2, y2, score, class_id = license_plate

        # Crop the detected license plate region
        license_plate_crop = frame[int(y1):int(y2), int(x1): int(x2), :]

        # Preprocess the cropped license plate image for OCR
        processed_image = preprocess_image(license_plate_crop)

        # Perform OCR on the processed license plate image
        license_plate_text = pytesseract.image_to_string(processed_image, config='--psm 13').strip()

        # Append license plate detection details to annotations list
        license_plate_annotations.append((x1, y1, x2, y2, license_plate_text))
    
    # Red rectangles for cars
    draw_annotations(frame, car_annotations, (0, 0, 255))

    # Blue rectangles for license plates
    draw_annotations(frame, license_plate_annotations, (255, 0, 0))  

    # Resize the frame for display
    frame_resized = cv2.resize(frame, (800, 450))
    
    # Display the annotated frame
    cv2.imshow('Video', frame_resized)

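    # Display the last preprocessed plate image, only if a plate was detected in this frame
    # (otherwise processed_image may be undefined)
    if license_plate_annotations:
        cv2.imshow('License Plate Detection', processed_image)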

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

0: 384x640 22 cars, 1 bus, 2 trucks, 104.9ms
Speed: 7.3ms preprocess, 104.9ms inference, 878.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 license_plates, 97.8ms
Speed: 4.2ms preprocess, 97.8ms inference, 3.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 22 cars, 1 bus, 2 trucks, 145.8ms
Speed: 10.6ms preprocess, 145.8ms inference, 1.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 license_plates, 80.0ms
Speed: 3.4ms preprocess, 80.0ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 24 cars, 1 bus, 2 trucks, 104.8ms
Speed: 5.6ms preprocess, 104.8ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 license_plates, 80.0ms
Speed: 2.6ms preprocess, 80.0ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 23 cars, 1 bus, 2 trucks, 102.1ms
Speed: 9.1ms preprocess, 102.1ms inference, 1.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 licens

6. (5pts) Any comments regarding the final exam? Which steps do you believe you have completed? Which steps bother you?<br> 
7. (5pts) Any suggestions for the teaching assistants to improve this class? Any suggestions for the teacher to improve this class?<br>


### My Answer

6. In the final exam, I fully completed the first question; what remains are the most difficult functions, which have not yet been optimized.

   There are five questions in total. I think the most difficult part of questions two through five is processing the video frames: beyond the computer vision basics such as filtering, noise reduction, edge detection, and morphology, we also have to overcome problems in the footage itself, for example by adjusting saturation and applying related techniques, in order to present the best results.

7. After a semester, I think the TA system of Advanced Computer Vision is working very well, and basically every TA has helped me grasp the key concepts of this class.

   As for suggestions for this class, I think that for the final exam, students could team up to participate in the CVPR Data CV Challenge, which is organized by a well-known computer vision workshop, to test whether they can produce good computer vision work in a limited time.

   
   - [CVPR](https://sites.google.com/view/vdu-cvpr24/home)

## Reference
- [OpenCV Tutorial](https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html)

- [cv2.threshold parameters documentation](https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html?highlight=threshold#threshold)

- [Bilateral Filtering theory](https://people.csail.mit.edu/sparis/publi/2009/fntcgv/Paris_09_Bilateral_filtering.pdf)

- [OpticalFlowFarneback Parameters](https://docs.opencv.org/3.4/dc/d6b/group__video__track.html#ga5d10ebbd59fe09c5f650289ec0ece5af)

- [Template Matching function reference code and theory](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html)

- [OpenCV selectROI function reference code and theory](https://www.geeksforgeeks.org/python-opencv-selectroi-function/)

- [detect_faces function reference code and theory](https://github.com/ageitgey/face_recognition/blob/master/examples/facerec_from_video_file.py)

- [Eroding and Dilating reference code and theory](https://docs.opencv.org/3.4/db/df6/tutorial_erosion_dilatation.html)

- [Histogram Equalization reference code and theory](https://docs.opencv.org/4.x/d5/daf/tutorial_py_histogram_equalization.html)

- [find_similar_faces function theory](https://www.idiap.ch/~marcel/labs/faceverif.php)

- [find_similar_faces function reference code](https://pyimagesearch.com/2019/03/11/liveness-detection-with-opencv/)

- [preprocess_image function reference code](https://stackoverflow.com/questions/70942221/bilateral-filter-error-in-opencv-for-some-images)