## Homework 3
1. Input BGR images from webcam.
2. Detect your face, mouth, and eyes. 
3. Input BGRA images from files "mustache.png" and "hat.png" (hint: cv2.imread("mustache.png", cv2.IMREAD_UNCHANGED) to read 4 channels)
4. Perform <b> Alpha Blending </b> to add mustache and hat on the right position and orientation of your face.
5. The overlaid mustache and hat should be translated, rotated, and scaled according to the movement of your face.
6. Show your output images.
7. Any idea on how to detect face features better? Try it and compare the results. (hint: modules of <i>dlib</i> or <i>mediapipe</i>)
8. (5pts bonus) Open/Close your mouth to toggle mustache on/off.
9. (5pts bonus) Blink your eye to toggle hat on/off.
10. Upload your Jupyter code file (*.ipynb)

## requirements.txt

#### opencv-python==4.9.0.80
#### mediapipe==0.10.13
#### numpy==1.26.4

In [1]:
import cv2
import numpy as np
import mediapipe as mp

In [2]:
def list_available_cameras(max_index):
    """
    Lists the available camera indices.

    Args:
        max_index (int): The maximum index to check for available cameras.

    Returns:
        list: A list of indices where cameras are available.
    """

    # Store the indices of available cameras
    available_cameras = []

    # Iterate through the indices
    for i in range(max_index):
        cap = cv2.VideoCapture(i)

        # Check if the camera is available
        if cap.isOpened():
            # Add the index to the list of available cameras
            available_cameras.append(i)
            cap.release()        

    return available_cameras

max_camera_index = 10 # Maximum index to check for available cameras
available_cameras = list_available_cameras(max_camera_index) # Call the function

[ WARN:0@1.260] global cap_v4l.cpp:997 open VIDEOIO(V4L2:/dev/video1): can't open camera by index
[ERROR:0@1.357] global obsensor_uvc_stream_channel.cpp:159 getStreamChannelGroup Camera index out of range
[ WARN:0@1.444] global cap_v4l.cpp:997 open VIDEOIO(V4L2:/dev/video3): can't open camera by index
[ERROR:0@1.445] global obsensor_uvc_stream_channel.cpp:159 getStreamChannelGroup Camera index out of range
[ WARN:0@1.445] global cap_v4l.cpp:997 open VIDEOIO(V4L2:/dev/video4): can't open camera by index
[ERROR:0@1.446] global obsensor_uvc_stream_channel.cpp:159 getStreamChannelGroup Camera index out of range
[ WARN:0@1.446] global cap_v4l.cpp:997 open VIDEOIO(V4L2:/dev/video5): can't open camera by index
[ERROR:0@1.447] global obsensor_uvc_stream_channel.cpp:159 getStreamChannelGroup Camera index out of range
[ WARN:0@1.447] global cap_v4l.cpp:997 open VIDEOIO(V4L2:/dev/video6): can't open camera by index
[ERROR:0@1.448] global obsensor_uvc_stream_channel.cpp:159 getStreamChannelGroup C

In [3]:
# Print the available cameras

print("Available Cameras:", available_cameras)

Available Cameras: [0, 2]


## Mathematical Explanation of the Resizing Process

### Original Dimensions:
- Let $h$ be the original height of the image.
- Let $w$ be the original width of the image.

### Target Width:
- Let $W_{\text{target}}$ be the desired width of the resized image.

### Scaling Factor:
The scaling factor $S$ is calculated as the ratio of the target width to the original width:
$$
S = \frac{W_{\text{target}}}{w}
$$

### New Dimensions:
To maintain the aspect ratio, both the width and height of the image must be scaled by the same factor $S$.

#### New Width:
The new width $W_{\text{new}}$ will be:
$$
W_{\text{new}} = S \times w = W_{\text{target}}
$$

#### New Height:
The new height $H_{\text{new}}$ will be:
$$
H_{\text{new}} = S \times h
$$
Substituting $S$:
$$
H_{\text{new}} = \left(\frac{W_{\text{target}}}{w}\right) \times h = \frac{W_{\text{target}} \times h}{w}
$$


In [4]:
def resize_overlay(overlay_img, target_width):
    """
    Resizes the overlay image to the target width while maintaining the aspect ratio.

    Args:
        overlay_img (numpy.ndarray): The image to be resized.
        target_width (int): The desired width of the resized image.

    Returns:
        numpy.ndarray: The resized image.
    """

    # Get the height and width of the overlay image
    h, w = overlay_img.shape[:2]

    # Calculate the scaling factor
    scaling_factor = target_width / float(w)

    # Resize the image and return it
    return cv2.resize(overlay_img, None, fx=scaling_factor, fy=scaling_factor, interpolation=cv2.INTER_AREA)

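As a quick check of the resizing formulas, the following minimal sketch computes the new dimensions without touching any image data. The helper name `scaled_size` is hypothetical (it is not part of the notebook), and rounding to whole pixels is an assumption about how the final size is obtained.

```python
def scaled_size(width, height, target_width):
    """Return (new_width, new_height) after aspect-preserving scaling.

    Hypothetical helper for illustration; it mirrors S = W_target / w.
    """
    scaling_factor = target_width / float(width)
    # Both dimensions are multiplied by the same factor S.
    new_width = int(round(scaling_factor * width))
    new_height = int(round(scaling_factor * height))
    return new_width, new_height


# A 400x200 overlay scaled to a target width of 120 pixels.
print(scaled_size(400, 200, 120))  # (120, 60)
```

The width-to-height ratio is 2:1 both before and after scaling, which is what keeps the overlay from looking stretched.
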
## Mathematical Explanation of the Overlay Process

### Positioning:
- Let $(x, y)$ be the position where the top-left corner of the overlay image (`img_overlay`) should be placed on the background image (`img`).

### Bounding Coordinates:
Calculate the coordinates of the region in the background image (`img`) that will be affected by the overlay:
- $y_1 = \max(0, y)$
- $y_2 = \min(\text{height of img}, y + \text{height of img\_overlay})$
- $x_1 = \max(0, x)$
- $x_2 = \min(\text{width of img}, x + \text{width of img\_overlay})$

Calculate the coordinates of the corresponding region in the overlay image (`img_overlay`):
- $y_{1o} = \max(0, -y)$
- $y_{2o} = \min(\text{height of img\_overlay}, \text{height of img} - y)$
- $x_{1o} = \max(0, -x)$
- $x_{2o} = \min(\text{width of img\_overlay}, \text{width of img} - x)$

### Blending Condition:
Ensure that the overlay region is valid (i.e., the regions overlap):
- $y_1 < y_2$
- $x_1 < x_2$
- $y_{1o} < y_{2o}$
- $x_{1o} < x_{2o}$

If any of these conditions are not met, the function returns without modifying the image.
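The bounding-coordinate formulas can be tried on plain numbers. This is a minimal sketch; `clip_overlay_region` is a hypothetical helper that only returns the index ranges and does not blend anything, and the frame and overlay sizes are made up for illustration.

```python
def clip_overlay_region(img_h, img_w, ov_h, ov_w, x, y):
    """Return ((y1, y2, x1, x2), (y1o, y2o, x1o, x2o)), or None if there is no overlap.

    Hypothetical helper mirroring the bounding-coordinate formulas.
    """
    # Region of the background image covered by the overlay
    y1, y2 = max(0, y), min(img_h, y + ov_h)
    x1, x2 = max(0, x), min(img_w, x + ov_w)
    # Matching region inside the overlay image
    y1o, y2o = max(0, -y), min(ov_h, img_h - y)
    x1o, x2o = max(0, -x), min(ov_w, img_w - x)
    if y1 >= y2 or x1 >= x2 or y1o >= y2o or x1o >= x2o:
        return None
    return (y1, y2, x1, x2), (y1o, y2o, x1o, x2o)


# A 100x200 hat placed 30 px above the top edge of a 480x640 frame:
# only the lower 70 rows of the hat are visible.
print(clip_overlay_region(480, 640, 100, 200, x=50, y=-30))
# ((0, 70, 50, 250), (30, 100, 0, 200))

# An overlay placed entirely to the right of the frame does not overlap.
print(clip_overlay_region(480, 640, 100, 200, x=700, y=10))  # None
```

In the first case both regions are 70 rows by 200 columns, so the two slices have matching shapes.
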

### Alpha Blending:
For each color channel $c$, blend the overlay image with the background image using the alpha mask:
$$
img[y_1:y_2, x_1:x_2, c] = \alpha_{mask}[y_{1o}:y_{2o}, x_{1o}:x_{2o}] \cdot img_{overlay}[y_{1o}:y_{2o}, x_{1o}:x_{2o}, c] + (1.0 - \alpha_{mask}[y_{1o}:y_{2o}, x_{1o}:x_{2o}]) \cdot img[y_1:y_2, x_1:x_2, c]
$$
Here, $\alpha_{mask}$ represents the alpha mask values, which control the transparency. Each pixel in the region is a blend of the overlay pixel and the background pixel, weighted by the alpha mask.
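To make the blending formula concrete, here is a small worked example for a single pixel. The function `blend_pixel` and the pixel values are hypothetical and exist only for illustration; the notebook applies the same arithmetic to whole image regions.

```python
def blend_pixel(overlay_px, background_px, alpha):
    """Blend one pixel: alpha * overlay + (1 - alpha) * background, per channel."""
    return tuple(alpha * o + (1.0 - alpha) * b
                 for o, b in zip(overlay_px, background_px))


# alpha = 0.25: the result is 25% overlay and 75% background.
print(blend_pixel((200, 100, 0), (100, 100, 100), 0.25))  # (125.0, 100.0, 75.0)

# alpha = 0 keeps the background; alpha = 1 keeps the overlay.
print(blend_pixel((200, 100, 0), (100, 100, 100), 0.0))   # (100.0, 100.0, 100.0)
print(blend_pixel((200, 100, 0), (100, 100, 100), 1.0))   # (200.0, 100.0, 0.0)
```
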


In [5]:
def overlay_image_alpha(img, img_overlay, pos, alpha_mask):
    """
    Overlays img_overlay on top of img at the position specified by pos and uses alpha_mask
    to blend the images.

    Args:
        img (numpy.ndarray): The background image.
        img_overlay (numpy.ndarray): The image to overlay.
        pos (tuple): The (x, y) position to place the overlay image on the background image.
        alpha_mask (numpy.ndarray): The alpha mask to control transparency.

    Returns:
        None: The function modifies img in place.
    """

    # Unpack the position where the overlay image will be placed
    x, y = pos

    # Determine the coordinates of the region in the background image
    y1, y2 = max(0, y), min(img.shape[0], y + img_overlay.shape[0])
    x1, x2 = max(0, x), min(img.shape[1], x + img_overlay.shape[1])

    # Determine the coordinates of the region in the overlay image.
    y1o, y2o = max(0, -y), min(img_overlay.shape[0], img.shape[0] - y)
    x1o, x2o = max(0, -x), min(img_overlay.shape[1], img.shape[1] - x)

    # If there is no overlap, return without modifying the image.
    if y1 >= y2 or x1 >= x2 or y1o >= y2o or x1o >= x2o:
        return
    
    # Blend the overlay image and the background image.
    for c in range(img.shape[2]):
        # Iterate over each color channel.        
        img[y1:y2, x1:x2, c] = (alpha_mask[y1o:y2o, x1o:x2o] * img_overlay[y1o:y2o, x1o:x2o, c] +
                                (1.0 - alpha_mask[y1o:y2o, x1o:x2o]) * img[y1:y2, x1:x2, c])

## Mathematical Explanation of the Angle Calculation

To calculate the angle between two points $(x_1, y_1)$ and $(x_2, y_2)$ in a 2D plane, we use trigonometry:

### Differences in Coordinates:
- Let $\Delta x$ be the difference in the x-coordinates:
$$
\Delta x = x_2 - x_1
$$
- Let $\Delta y$ be the difference in the y-coordinates:
$$
\Delta y = y_2 - y_1
$$

### Arctangent Function:
The arctangent function, specifically $\text{arctan2}(\Delta y, \Delta x)$, computes the angle $\theta$ between the positive x-axis and the line connecting the two points. The $\text{arctan2}$ function is used instead of the standard $\text{arctan}$ because it takes into account the signs of both $\Delta x$ and $\Delta y$, providing the correct quadrant for the angle.

The angle $\theta$ in radians is given by:
$$
\theta = \text{arctan2}(\Delta y, \Delta x)
$$

### Conversion to Degrees:
To convert the angle from radians to degrees, we use the formula:
$$
\theta_{\text{degrees}} = \theta \times \frac{180}{\pi}
$$

Combining the formulas, the angle in degrees can be calculated as:
$$
\theta_{\text{degrees}} = \text{arctan2}(\Delta y, \Delta x) \times \frac{180}{\pi}
$$

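This process ensures that the calculated angle correctly represents the direction from the first point to the second point. With the y-axis pointing up (the usual mathematical convention), the angle is measured counterclockwise from the positive x-axis; in image coordinates, where y increases downward, the same positive angle appears clockwise on screen.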
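The difference between `arctan2` and a plain arctangent is easiest to see on a few points. The sketch below uses the standard-library `math` module instead of NumPy so it runs on its own; `angle_degrees` is a hypothetical name for illustration.

```python
import math


def angle_degrees(point1, point2):
    """Angle of the vector from point1 to point2, in degrees."""
    dx = point2[0] - point1[0]
    dy = point2[1] - point1[1]
    return math.degrees(math.atan2(dy, dx))


print(angle_degrees((0, 0), (1, 1)))   # approximately 45
print(angle_degrees((0, 0), (-1, 1)))  # approximately 135; arctan(dy / dx) alone would give -45
print(angle_degrees((0, 0), (0, 5)))   # approximately 90; dy / dx would divide by zero here
```
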


In [6]:
def calculate_angle(point1, point2):
    """
    Calculates the angle between two points in degrees.

    Args:
        point1 (tuple): The (x, y) coordinates of the first point.
        point2 (tuple): The (x, y) coordinates of the second point.

    Returns:
        float: The angle between the two points in degrees.
    """

    # Calculate the difference in x-coordinates and y-coordinates
    dx = point2[0] - point1[0]
    dy = point2[1] - point1[1]
    
    # Calculate the angle in radians using arctan2 and convert it to degrees.
    return np.degrees(np.arctan2(dy, dx))

## Mathematical Explanation of Detecting an Open Mouth

To determine whether the mouth is open based on facial landmarks, we use the vertical positions of specific points on the upper and lower lips. Here is a step-by-step mathematical explanation:

### Identify Relevant Landmarks:
- Let $U_1$ and $U_2$ be the y-coordinates of two points on the upper lip.
- Let $L_1$ and $L_2$ be the y-coordinates of two points on the lower lip.

In the provided code, these correspond to:
- $U_1 = \text{landmarks}[13].y$
- $U_2 = \text{landmarks}[312].y$
- $L_1 = \text{landmarks}[14].y$
- $L_2 = \text{landmarks}[317].y$

### Calculate Average Heights:
The average height of the upper lip ($U_{\text{avg}}$):
$$
U_{\text{avg}} = \frac{U_1 + U_2}{2}
$$

The average height of the lower lip ($L_{\text{avg}}$):
$$
L_{\text{avg}} = \frac{L_1 + L_2}{2}
$$

### Compute the Vertical Distance:
The vertical distance between the upper and lower lips ($D$):
$$
D = L_{\text{avg}} - U_{\text{avg}}
$$

### Determine if Mouth is Open:
Define a threshold $T$ (in this case, $T = 0.02$, matching the code). The mouth is considered open if the vertical distance $D$ is greater than the threshold $T$:
$$
\text{Mouth is open if } D > T
$$

This can be mathematically written as:
$$
\text{Mouth is open if } \left(\frac{L_1 + L_2}{2} - \frac{U_1 + U_2}{2}\right) > 0.02
$$

In summary, the function calculates the average vertical positions of specified landmarks on the upper and lower lips and checks if the difference between these averages exceeds a certain threshold. If it does, the function concludes that the mouth is open.
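The mouth test can be tried without a camera by feeding it made-up landmark values. In this sketch, `Landmark`, `mouth_gap`, and `make_face` are hypothetical stand-ins; only the landmark indices (13, 312, 14, 317) and the 0.02 threshold used in `is_mouth_open` come from the notebook.

```python
from collections import namedtuple

# Stand-in for a MediaPipe landmark; only the normalized x and y are modeled.
Landmark = namedtuple("Landmark", ["x", "y"])


def mouth_gap(landmarks):
    """Return D = L_avg - U_avg in normalized image coordinates."""
    upper = (landmarks[13].y + landmarks[312].y) / 2
    lower = (landmarks[14].y + landmarks[317].y) / 2
    return lower - upper


def make_face(upper_y, lower_y):
    # A dict keyed by landmark index, holding only the four lip landmarks.
    return {13: Landmark(0.50, upper_y), 312: Landmark(0.52, upper_y),
            14: Landmark(0.50, lower_y), 317: Landmark(0.52, lower_y)}


closed = make_face(upper_y=0.60, lower_y=0.61)  # gap of about 0.01
opened = make_face(upper_y=0.60, lower_y=0.66)  # gap of about 0.06

print(mouth_gap(closed) > 0.02)  # False
print(mouth_gap(opened) > 0.02)  # True
```

Because the coordinates are normalized by the frame height, the same mouth opening gives a smaller $D$ when the face is farther from the camera, so the threshold may need tuning for a given setup.
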


In [7]:
def is_mouth_open(landmarks):
    """
    Determines if the mouth is open based on the positions of facial landmarks.

    Args:
        landmarks (list): A list of facial landmarks with each landmark having x and y attributes.

    Returns:
        bool: True if the mouth is open, False otherwise.
    """

    # Indices of landmarks corresponding to the upper and lower lips
    upper_lip_indices = [13, 312]
    lower_lip_indices = [14, 317]

    # Calculate the average y-coordinate of the upper lip landmarks.
    upper_lip_height = (landmarks[upper_lip_indices[0]].y + landmarks[upper_lip_indices[1]].y) / 2

    # Calculate the average y-coordinate of the lower lip landmarks.
    lower_lip_height = (landmarks[lower_lip_indices[0]].y + landmarks[lower_lip_indices[1]].y) / 2

    # Determine if the mouth is open by checking if the distance between the upper and lower lips is greater than a threshold.
    return lower_lip_height - upper_lip_height > 0.02

## Mathematical Explanation of Detecting a Closed Eye

To determine whether an eye is closed based on facial landmarks, we use the vertical positions of specific points on the eye. Here is a step-by-step mathematical explanation:

### Identify Relevant Landmarks:
- Let $T$ be the y-coordinate of the top landmark of the eye.
- Let $B$ be the y-coordinate of the bottom landmark of the eye.

In the provided code, these correspond to:
- $T = \text{landmarks}[\text{eye\_top\_index}].y$
- $B = \text{landmarks}[\text{eye\_bottom\_index}].y$

### Compute the Vertical Distance:
The vertical distance between the top and bottom landmarks of the eye ($D$):
$$
D = B - T
$$

### Determine if the Eye is Closed:
Define a threshold $T_h$ (in this case, $T_h = 0.02$, matching the code). The eye is considered closed if the vertical distance $D$ is less than the threshold $T_h$:
$$
\text{Eye is closed if } D < T_h
$$

This can be mathematically written as:
$$
\text{Eye is closed if } (B - T) < 0.02
$$

In summary, the function calculates the vertical distance between the top and bottom landmarks of the eye and checks if this distance is below a certain threshold. If it is, the function concludes that the eye is closed.
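A blink can be simulated as a short sequence of frames. The landmark values below are made up for illustration, and `Landmark` and `eye_opening` are hypothetical stand-ins; the indices 386 and 374 and the 0.02 threshold are the ones used in the notebook.

```python
from collections import namedtuple

# Stand-in for a MediaPipe landmark; only the normalized x and y are modeled.
Landmark = namedtuple("Landmark", ["x", "y"])


def eye_opening(landmarks, eye_top_index, eye_bottom_index):
    """Return D = B - T in normalized image coordinates."""
    return landmarks[eye_bottom_index].y - landmarks[eye_top_index].y


# Three frames of the left eye: open, closed, open again.
frames = [
    {386: Landmark(0.6, 0.400), 374: Landmark(0.6, 0.435)},
    {386: Landmark(0.6, 0.410), 374: Landmark(0.6, 0.420)},
    {386: Landmark(0.6, 0.400), 374: Landmark(0.6, 0.435)},
]

closed_flags = [eye_opening(f, 386, 374) < 0.02 for f in frames]
print(closed_flags)  # [False, True, False]
```
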


In [8]:
def is_eye_closed(landmarks, eye_top_index, eye_bottom_index):
    """
    Determines if an eye is closed based on the positions of facial landmarks.

    Args:
        landmarks (list): A list of facial landmarks with each landmark having x and y attributes.
        eye_top_index (int): The index of the top landmark of the eye.
        eye_bottom_index (int): The index of the bottom landmark of the eye.

    Returns:
        bool: True if the eye is closed, False otherwise.
    """
    
    # Get the y-coordinate of the top landmark of the eye.
    eye_top = landmarks[eye_top_index].y

    # Get the y-coordinate of the bottom landmark of the eye.
    eye_bottom = landmarks[eye_bottom_index].y

    # Determine if the eye is closed by checking if the vertical distance between
    # the top and bottom landmarks is less than a threshold.
    return eye_bottom - eye_top < 0.02

## Mathematical Explanation of the Special Face Overlay Process

The process involves detecting face landmarks, checking eye and mouth states, and toggling overlays (mustache and hat) based on those states. Here's a step-by-step mathematical explanation:

### Face Detection:

#### Bounding Box Calculation:
Let $(x_{rel}, y_{rel}, w_{rel}, h_{rel})$ be the relative coordinates and dimensions of the bounding box as provided by the face detection model. Convert these relative coordinates to absolute pixel coordinates based on the image dimensions $(ih, iw)$:
$$
x = x_{rel} \times iw, \quad y = y_{rel} \times ih
$$
$$
w = w_{rel} \times iw, \quad h = h_{rel} \times ih
$$
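The conversion from relative to pixel coordinates is a pair of multiplications followed by truncation to integers. A minimal sketch, with the hypothetical helper `to_pixel_box` and made-up input values:

```python
def to_pixel_box(x_rel, y_rel, w_rel, h_rel, iw, ih):
    """Convert a relative bounding box to integer pixel values (x, y, w, h)."""
    # Horizontal quantities scale with the frame width, vertical ones with the height.
    return int(x_rel * iw), int(y_rel * ih), int(w_rel * iw), int(h_rel * ih)


# A box starting a quarter of the way across and halfway down a 640x480 frame.
print(to_pixel_box(0.25, 0.5, 0.5, 0.25, iw=640, ih=480))  # (160, 240, 320, 120)
```
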

### Landmark Detection:
Extract the coordinates of specific landmarks (e.g., eye points, mouth points) and convert them to absolute pixel coordinates:
$$
\text{landmark}[i] = (\text{landmarks}[i].x \times iw, \text{landmarks}[i].y \times ih)
$$

### Eye State Detection:
Take the y-coordinates of the top and bottom landmarks of the eye:
$$
T_{\text{eye}} = \text{landmarks}[\text{eye\_top\_index}].y, \quad B_{\text{eye}} = \text{landmarks}[\text{eye\_bottom\_index}].y
$$
Calculate the vertical distance $D_{\text{eye}}$:
$$
D_{\text{eye}} = B_{\text{eye}} - T_{\text{eye}}
$$
Determine if the eye is closed by checking if $D_{\text{eye}}$ is less than a threshold $T_h$:
$$
\text{Eye is closed if } D_{\text{eye}} < 0.02
$$

### Mouth State Detection:
Average the y-coordinates of the upper-lip and lower-lip landmarks:
$$
U_{\text{lip}} = \frac{\text{landmarks}[13].y + \text{landmarks}[312].y}{2}, \quad L_{\text{lip}} = \frac{\text{landmarks}[14].y + \text{landmarks}[317].y}{2}
$$
Calculate the vertical distance $D_{\text{lip}}$:
$$
D_{\text{lip}} = L_{\text{lip}} - U_{\text{lip}}
$$
Determine if the mouth is open by checking if $D_{\text{lip}}$ is greater than a threshold $T_h$:
$$
\text{Mouth is open if } D_{\text{lip}} > 0.02
$$

### Overlay Toggle Logic:
- Toggle the mustache when the mouth state changes from open to closed.
- Toggle the hat when the eye state changes from closed to open.
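The toggle rule is a small edge-triggered state machine: it remembers that a gesture has started and flips the overlay once, when the gesture ends. The sketch below isolates that logic in a hypothetical `EdgeToggle` class (the notebook keeps the same state in plain variables) and runs it on a made-up sequence of per-frame mouth states.

```python
class EdgeToggle:
    """Flip `on` each time the watched state goes from active back to inactive."""

    def __init__(self, on=True):
        self.on = on
        self.was_active = False

    def update(self, active):
        if active and not self.was_active:
            # Gesture started (for example, the mouth has just opened).
            self.was_active = True
        elif not active and self.was_active:
            # Gesture ended: toggle exactly once and reset the state.
            self.on = not self.on
            self.was_active = False
        return self.on


mustache = EdgeToggle(on=True)
mouth_open_per_frame = [False, True, True, True, False, False, True, False]
print([mustache.update(m) for m in mouth_open_per_frame])
# [True, True, True, True, False, False, False, True]
```

Holding the mouth open for several frames toggles the mustache only once, which is why the state is tracked across frames instead of toggling whenever the mouth is detected as open.
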

### Resizing Overlays:
Calculate the widths of the mustache and hat based on the bounding box width $w$:
$$
\text{mustache\_width} = 0.6 \times w, \quad \text{hat\_width} = w
$$

### Overlay Positioning:
Calculate the positions for the mustache and hat based on the bounding box and landmark positions:
$$
\text{mustache\_pos} = (x + 0.2w, y + 0.5h - 20), \quad \text{hat\_pos} = (x, y - \text{hat\_height})
$$
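Plugging numbers into the sizing and positioning formulas shows where the overlays land. The helper `overlay_positions`, the bounding box, and the hat height below are hypothetical values chosen for illustration; in the notebook the hat height comes from the resized hat image.

```python
def overlay_positions(x, y, w, h, hat_height):
    """Return (mustache_width, mustache_pos, hat_width, hat_pos) for a face box."""
    mustache_width = int(w * 0.6)
    hat_width = int(w * 1)
    # Mustache: inset 20% from the left edge, slightly above the vertical middle.
    mustache_pos = (x + int(w * 0.2), y + int(h * 0.5) - 20)
    # Hat: bottom edge sits on the top edge of the face box.
    hat_pos = (x, y - hat_height)
    return mustache_width, mustache_pos, hat_width, hat_pos


# Face box at (100, 150), 200 px wide and 240 px tall; resized hat is 80 px tall.
print(overlay_positions(100, 150, 200, 240, hat_height=80))
# (120, (140, 250), 200, (100, 70))
```

The mustache is 60% of the face width with a 20% inset on the left, so it is centered horizontally within the box. The fixed 20-pixel offset does not scale with the face size, which may matter when the face is very close to or far from the camera.
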

### Alpha Blending:
Blend the resized and positioned overlays with the frame using the alpha masks to ensure proper transparency:
$$
\text{frame}[y:y+h, x:x+w, c] = \alpha \times \text{overlay}[y_o:y_o+h_o, x_o:x_o+w_o, c] + (1 - \alpha) \times \text{frame}[y:y+h, x:x+w, c]
$$

This blending is performed for each color channel $c$.

### Summary
The script involves detecting the face and facial landmarks, determining the state of the eyes and mouth to toggle overlays, resizing and positioning the overlays appropriately, and blending them onto the original image using alpha masks for transparency. The mathematical operations ensure accurate positioning, resizing, and blending based on the detected landmarks and bounding box dimensions.


In [9]:
# Special solution for bonus challenge

# Initialize MediaPipe Face Detection and Face Mesh
mp_face_detection = mp.solutions.face_detection
face_detection = mp_face_detection.FaceDetection(model_selection=1, min_detection_confidence=0.5)
mp_face_mesh = mp.solutions.face_mesh
face_mesh = mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1, min_detection_confidence=0.5)

# Load mustache and hat images with alpha channels
mustache_img = cv2.imread("mustache.png", cv2.IMREAD_UNCHANGED)
hat_img = cv2.imread("hat.png", cv2.IMREAD_UNCHANGED)

cap = cv2.VideoCapture(0)

# Initialize toggle states and detection states
mustache_on = True
mouth_is_open = False

hat_on = True
hat_is_closed = False

# Indices of the top and bottom landmarks of the eyes
left_eye_top_index = 386
left_eye_bottom_index = 374
right_eye_top_index = 159
right_eye_bottom_index = 145

if not cap.isOpened():
    print("Error: Could not open camera.")
    exit()

# Loop to process video frames
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        print("Error: Could not read frame.")
        continue

    # Convert the frame to RGB format
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Detect face mesh landmarks
    results = face_mesh.process(frame_rgb)

    # Check if face landmarks are detected
    if results.multi_face_landmarks:

        # Iterate over the detected faces
        for face_landmarks in results.multi_face_landmarks:

            # Get the landmarks
            landmarks = face_landmarks.landmark
            # Calculate the angle for mustache rotation
            # (note: the angle is computed here but not yet applied to the overlays below)
            nose_tip = (landmarks[1].x, landmarks[1].y)
            face_center = (landmarks[5].x, landmarks[5].y)
            angle = calculate_angle(nose_tip, face_center)


            # Check if the eye is closed to toggle the hat
            if not hat_is_closed and is_eye_closed(landmarks, left_eye_top_index, left_eye_bottom_index):
                # Eye has closed
                hat_is_closed = True
            elif hat_is_closed and not is_eye_closed(landmarks, left_eye_top_index, left_eye_bottom_index):
                # Eye was closed and is now open, toggle the hat
                hat_on = not hat_on

                # Reset the state
                hat_is_closed = False 
            
            # Check if the mouth is open to toggle the mustache
            if not mouth_is_open and is_mouth_open(landmarks):
                # Mouth has opened
                mouth_is_open = True

            # Check if the mouth is closed to toggle the mustache
            elif mouth_is_open and not is_mouth_open(landmarks):
                # Mouth was open and is now closed, toggle the mustache
                mustache_on = not mustache_on

                # Reset the state
                mouth_is_open = False

    # Detect faces in the frame
    face_detection_results = face_detection.process(frame_rgb)

    if face_detection_results.detections:
        for detection in face_detection_results.detections:

            # Get the bounding box coordinates of the detected face
            bboxC = detection.location_data.relative_bounding_box

            # Get the frame dimensions
            ih, iw, _ = frame.shape

            # Convert the bounding box coordinates to pixel values
            x, y, w, h = int(bboxC.xmin * iw), int(bboxC.ymin * ih), int(bboxC.width * iw), int(bboxC.height * ih)

            # If mustache is toggled on, overlay the mustache
            if mustache_on:
                # Resize the mustache image to fit the face
                mustache_width = int(w * 0.6)

                # Resize the mustache image
                resized_mustache_img = resize_overlay(mustache_img, mustache_width)

                # Extract the alpha channel from the resized mustache image
                resized_mustache_alpha = resized_mustache_img[:, :, 3] / 255.0

                # Remove the alpha channel
                resized_mustache_img = resized_mustache_img[:, :, :3]

                # Adjust the y-coordinate to place the mustache on the upper lip
                mustache_pos = (x + int(w * 0.2), y + int(h * 0.5) - 20)

                # Overlay the mustache
                overlay_image_alpha(frame, resized_mustache_img, mustache_pos, resized_mustache_alpha)
            
            # If hat is toggled on, overlay the hat
            if hat_on:

                # Resize the hat image to fit the face
                hat_width = int(w * 1)

                # Resize the hat image
                resized_hat_img = resize_overlay(hat_img, hat_width)

                # Extract the alpha channel from the resized hat image
                resized_hat_alpha = resized_hat_img[:, :, 3] / 255.0

                # Remove the alpha channel
                resized_hat_img = resized_hat_img[:, :, :3]

                # Adjust the y-coordinate to place the hat above the head
                hat_pos = (x, y - resized_hat_img.shape[0])

                # Overlay the hat
                overlay_image_alpha(frame, resized_hat_img, hat_pos, resized_hat_alpha)
    
    # Display the result    
    cv2.imshow('Result', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

I0000 00:00:1716855767.928676    4958 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1716855767.931288    5119 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 Mesa 24.0.5-1ubuntu1), renderer: Mesa Intel(R) UHD Graphics 620 (KBL GT2)
I0000 00:00:1716855767.943650    4958 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1716855767.944924    5130 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 Mesa 24.0.5-1ubuntu1), renderer: Mesa Intel(R) UHD Graphics 620 (KBL GT2)
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
qt.qpa.plugin: Could not find the Qt platform plugin "wayland" in "/home/infor/miniconda3/envs/CV/lib/python3.9/site-packages/cv2/qt/plugins"


### Result

#### Normal solution

![Normal result](Result.png)


#### 2. Open/Close your mouth to toggle mustache on/off.
![Mouth open](mouth_open.png)
![Mouth close](mouth_close.png)

#### 3. Blink your eye to toggle hat on/off.

<https://drive.google.com/file/d/1O6_QgJU7tw5S52NGcVsT7j8S64rZUqoq/view?usp=drive_link>

## Any idea on how to detect face features better?

I first use FaceMesh to detect my face and overlay the hat and mustache. In the second cell I use only FaceDetection. Finally, I compare the results.

## Result
With the same approach, parameters, algorithm, and alpha values, the FaceMesh version can solve the bonus challenges. The FaceDetection-only version can only place the hat and mustache on my face: it cannot detect the top and bottom of the eye or the upper and lower lips. Why? The next section explains the differences between MediaPipe FaceMesh and FaceDetection.



## Comparison of MediaPipe FaceMesh and FaceDetection

**FaceMesh** and **FaceDetection** are both powerful tools within the MediaPipe framework designed for different purposes related to facial analysis. Here's a comparison of their features, applications, and performance:

### MediaPipe FaceMesh

**Purpose:**
- Provides a detailed 3D mesh of the face with 468 landmarks, offering high precision in capturing facial geometry.

**Key Features:**
- **High Precision Landmarks**: It captures 468 points on the face, making it suitable for applications requiring fine-grained details.
- **3D Output**: Outputs 3D coordinates, allowing for applications involving depth information.
- **Facial Expressions and Features**: Can accurately capture expressions, eyebrow movement, eye blinking, and lip movements.
- **Real-Time Performance**: Optimized for real-time applications with a high frame rate.

**Applications:**
- **Augmented Reality**: Applying virtual makeup, glasses, masks, and other facial accessories.
- **Animation and Avatars**: Driving 3D avatars and animations with detailed facial expressions.
- **Medical and Health**: Analyzing facial symmetry for medical applications, including dental and orthodontic assessments.
- **Facial Recognition**: Enhancing facial recognition systems by providing detailed landmark information.

**Performance:**
- **Complexity**: Higher computational cost due to the large number of landmarks.
- **Accuracy**: High accuracy and precision in landmark detection.
- **Use Cases**: Suitable for applications needing detailed facial geometry and high precision.

### MediaPipe FaceDetection

**Purpose:**
- Provides fast and reliable face detection, identifying the presence and location of faces in images or videos.

**Key Features:**
- **Bounding Box Detection**: Outputs bounding boxes for detected faces.
- **Fast and Efficient**: Optimized for speed and efficiency, making it suitable for real-time applications.
- **Confidence Scores**: Provides confidence scores for detected faces.
- **Simpler Output**: Outputs fewer landmarks compared to FaceMesh (6 key points), focusing on face location and orientation.

**Applications:**
- **Basic Face Detection**: Identifying faces in images and videos for security cameras, social media filters, and photo organization.
- **Initial Step for Further Processing**: Often used as a preprocessing step for other applications like face recognition, emotion detection, or applying FaceMesh for detailed analysis.
- **Real-Time Applications**: Suitable for applications requiring quick face detection without the need for detailed facial landmarks.

**Performance:**
- **Complexity**: Lower computational cost due to simpler detection mechanism.
- **Speed**: Very fast and efficient, suitable for real-time processing.
- **Use Cases**: Ideal for applications needing quick and reliable face detection without the need for detailed facial landmarks.

### Summary

- **FaceMesh** is suited for applications requiring detailed facial geometry and high precision, such as augmented reality, animation, and medical analysis.
- **FaceDetection** is ideal for applications needing fast and efficient face detection, such as security systems, social media filters, and as a preprocessing step for more detailed facial analysis.

In practice, **FaceDetection** can be used to quickly identify faces in a scene, and **FaceMesh** can then be applied to those detected faces for detailed analysis and rendering. This combination allows for both efficiency and precision in various facial analysis applications.

In [10]:
# use FaceDetection to detect the face in the frame

cap = cv2.VideoCapture(0)

mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils
face_detection = mp_face_detection.FaceDetection(model_selection=0, min_detection_confidence=0.5)

# Load mustache and hat images with alpha channels
mustache_img = cv2.imread("mustache.png", cv2.IMREAD_UNCHANGED)
hat_img = cv2.imread("hat.png", cv2.IMREAD_UNCHANGED)

# Initialize toggle states and detection states
mustache_on = True
mouth_is_open = False

hat_on = True
hat_is_closed = False

# Indices of the top and bottom landmarks of the eyes
left_eye_top_index = 386
left_eye_bottom_index = 374
right_eye_top_index = 159
right_eye_bottom_index = 145

if not cap.isOpened():
    print("Error: Could not open camera.")
    exit()

# Loop to process video frames
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        print("Error: Could not read frame.")
        continue

    # Convert the frame to RGB format
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Detect faces in the frame (FaceDetection only; FaceMesh is not used in this cell)
    face_detection_results = face_detection.process(frame_rgb)

    if face_detection_results.detections:
        for detection in face_detection_results.detections:

            # Get the bounding box coordinates of the detected face
            bboxC = detection.location_data.relative_bounding_box

            # Get the frame dimensions
            ih, iw, _ = frame.shape

            # Convert the bounding box coordinates to pixel values
            x, y, w, h = int(bboxC.xmin * iw), int(bboxC.ymin * ih), int(bboxC.width * iw), int(bboxC.height * ih)

            # If mustache is toggled on, overlay the mustache
            if mustache_on:
                # Resize the mustache image to fit the face
                mustache_width = int(w * 0.6)

                # Resize the mustache image
                resized_mustache_img = resize_overlay(mustache_img, mustache_width)

                # Extract the alpha channel from the resized mustache image
                resized_mustache_alpha = resized_mustache_img[:, :, 3] / 255.0

                # Remove the alpha channel
                resized_mustache_img = resized_mustache_img[:, :, :3]

                # Adjust the y-coordinate to place the mustache on the upper lip
                mustache_pos = (x + int(w * 0.2), y + int(h * 0.5) - 20)

                # Overlay the mustache
                overlay_image_alpha(frame, resized_mustache_img, mustache_pos, resized_mustache_alpha)
            
            # If hat is toggled on, overlay the hat
            if hat_on:

                # Resize the hat image to fit the face
                hat_width = int(w * 1)

                # Resize the hat image
                resized_hat_img = resize_overlay(hat_img, hat_width)

                # Extract the alpha channel from the resized hat image
                resized_hat_alpha = resized_hat_img[:, :, 3] / 255.0

                # Remove the alpha channel
                resized_hat_img = resized_hat_img[:, :, :3]

                # Adjust the y-coordinate to place the hat above the head
                hat_pos = (x, y - resized_hat_img.shape[0])

                # Overlay the hat
                overlay_image_alpha(frame, resized_hat_img, hat_pos, resized_hat_alpha)
            
    # Display the result
    cv2.imshow('Result', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


I0000 00:00:1716855787.149405    4958 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1716855787.151012    5237 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 Mesa 24.0.5-1ubuntu1), renderer: Mesa Intel(R) UHD Graphics 620 (KBL GT2)


## Reference
- [Face position](https://chtseng.wordpress.com/2022/03/03/mediapipe_face_mesh%E7%9A%84%E4%BD%BF%E7%94%A8/)

- [Overlay image](https://stackoverflow.com/questions/40895785/using-opencv-to-overlay-transparent-image-onto-another-image)

- [Numpy](https://numpy.org/)

- [MediaPipe](https://ai.google.dev/edge/mediapipe/solutions/guide)

- [FaceMesh](https://i-know-python.com/facemesh-with-mediapipe-and-python/)

- [resize_overlay function reference code and theory](https://answers.opencv.org/question/232691/why-is-setting-frame-height-and-width-prevents-saving-the-webcam-stream-to-disk/)

- [overlay_image_alpha function reference code and theory](https://stackoverflow.com/questions/14063070/overlay-a-smaller-image-on-a-larger-image-python-opencv)

- [calculate_angle function reference code and theory](https://stackoverflow.com/questions/42258637/how-to-know-the-angle-between-two-vectors)

- [is_mouth_open and is_eye_closed function reference code and theory](https://towardsdatascience.com/how-to-detect-mouth-open-for-face-login-84ca834dff3b)
