# **<center><font style="color:rgb(100,109,254)">Module 6: AI Video Director For Automating Multi-Camera Setup</font> </center>**

<center>
    <img src='https://drive.google.com/uc?export=download&id=19tHZtvNS8ot5c9jvsbjPRk2SkI-8_1_Z' width=800> 
</center>
    

## **<font style="color:rgb(134,19,348)"> Module Outline </font>**

The module can be split into the following parts:

- *Lesson 1: Extract Eyes and Nose Keypoints*

- ***Lesson 2:* Create an AI Director for Automating a Multi-Camera Setup in OpenCV** *(This Tutorial)*

- *Lesson 3: Utilize the AI Director for Automating a Multi-Camera Setup in OBS*


**Please note:** these Jupyter Notebooks are not for sharing. Do read the copyright message under the Code License Agreement section, which is in the last cell of this notebook.
-Taha Anwar

Alright, let's get started.

### **<font style="color:rgb(134,19,348)"> Import the Libraries</font>**

First, we will import the required libraries.

In [11]:
import cv2
import numpy as np
import mediapipe as mp
import matplotlib.pyplot as plt
from previous_lesson import detectFacialLandmarks, getFaceKeypoints
from importlib.metadata import version
print(f"Mediapipe version: {version('mediapipe')}, it should be 0.8.9.1")

Mediapipe version: 0.8.10.1, it should be 0.8.9.1


## **<font style="color:rgb(134,19,348)">Initializations</font>**

Next, we will perform all the initializations required to build the application.

### **<font style="color:rgb(134,19,348)">Cameras Indexes List</font>**

First, we will initialize a list containing the indexes of the cameras that we want to use in the application. You need at least two webcams to test this application.

In [12]:
# Initialize a list to store the indexes of the cameras.
CAMERAS_INDEXES = [0, 1]

### **<font style="color:rgb(134,19,348)">Face Landmarks Detection Model</font>**


Next, we will get a reference to the **`mp.solutions.face_mesh`** module and then set up the **`mp.solutions.face_mesh.FaceMesh()`** function with appropriate arguments (one instance for each webcam), as we have been doing in the previous lessons.

In [13]:
# Initialize the mediapipe face mesh class.
mp_face_mesh = mp.solutions.face_mesh

# Initialize a list to store the facemesh functions for different webcam feeds.
facemesh_functions = []

# Create one face landmarks function per camera.
for _ in CAMERAS_INDEXES:
    
    # Setup the face landmarks function for the camera.
    facemesh_functions.append(mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1, 
                                                    refine_landmarks=True, 
                                                    min_detection_confidence=0.5,
                                                    min_tracking_confidence=0.3))

## **<font style="color:rgb(134,19,348)">Create a Function to Calculate Head Pose Score</font>**

Now we will create a function **`getHeadScore()`** that uses the nose and eye landmarks to calculate the horizontal difference between the nose tip and the mid-point of the two eye centers, normalized by the horizontal distance between the eyes. This difference will be the lowest for the camera the person is looking toward, so we will call it the head pose score and use it to automate switching between cameras in real time.

In [14]:
def getHeadScore(keypoints):
    '''
    This function calculates the difference between the nose tip and both eyes mid-point.
    Args:
        keypoints: A tuple containing the nose, left eye center, right eye center landmarks.
    Returns:
        difference_norm: The normalized difference between the nose tip and both eyes mid-point.
    '''
    
    # Get the nose tip, left eye center, and right eye center keypoints.
    nose_tip, left_eye_center, right_eye_center = keypoints
    
    # Get the x-coordinates of the nose tip, left eye center, and right eye center keypoints.
    nose_x, _ = nose_tip
    left_eye_x, _ = left_eye_center
    right_eye_x, _ = right_eye_center
    
    # Calculate the mid-point of the x-coordinates of the left eye center, and right eye center.
    mid_x = (left_eye_x + right_eye_x)/2
    
    # Get the difference between the x-coordinates of the nose tip
    # and mid-point of the left eye center, and right eye center.
    difference = abs(nose_x - mid_x)
    
    # Get the x-coordinate distance between the left eye center, and right eye center.
    eyes_distance_x = abs(left_eye_x - right_eye_x)
            
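    # Guard against division by zero when both eye centers share the same x-coordinate
    # (a degenerate detection); an infinite score is never selected as the minimum.
    if eyes_distance_x == 0:
        return float('inf')

    # Normalize the difference by dividing it by the distance between the left and right eye.
    # This is done so that the difference in distance of different cameras 
    # from the person does not affect the score (difference).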
    difference_norm = difference / eyes_distance_x
    
    # Return the normalized difference.
    return difference_norm
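
To see why the normalization matters, here is a minimal worked sketch of the same calculation on made-up numbers. The helper name `head_score` and all pixel coordinates below are hypothetical and chosen only for illustration; they are not taken from a real frame or from the previous lesson's functions.

```python
def head_score(nose_x, left_eye_x, right_eye_x):
    """Normalized horizontal offset of the nose tip from the eyes' mid-point."""
    mid_x = (left_eye_x + right_eye_x) / 2
    eyes_distance_x = abs(left_eye_x - right_eye_x)
    return abs(nose_x - mid_x) / eyes_distance_x

# Hypothetical x-coordinates (in pixels), assumed for illustration only.
# Face looking straight at the camera: the nose sits exactly between the eyes.
frontal = head_score(nose_x=640, left_eye_x=600, right_eye_x=680)

# Head turned away: the nose shifts 30 px off the mid-point (eyes 80 px apart).
turned = head_score(nose_x=670, left_eye_x=600, right_eye_x=680)

# Same head turn seen from a camera twice as far away: every distance is halved.
far_turned = head_score(nose_x=335, left_eye_x=300, right_eye_x=340)

print(frontal)     # 0.0
print(turned)      # 0.375
print(far_turned)  # 0.375
```

The turned face gets the same score at both distances, which is the point of dividing by the eye distance: scores from cameras placed at different distances stay comparable.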

Now we will use the **`getHeadScore()`** function created above to get the head pose score (difference) for each camera and select the camera the person in the feed is looking toward. Note that every camera feed should contain only one person, and it must be the same person in all feeds, for this application to work properly.

In [15]:
# Initialize a list to store the VideoCapture objects of different webcams.
cameras_readers = []

# Iterate over the indexes of the cameras.
for camera_id, camera_index in enumerate(CAMERAS_INDEXES):
    
    # Append a VideoCapture object into the list.
    cameras_readers.append(cv2.VideoCapture(camera_index))

    # Set the webcam feed width and height.
    cameras_readers[camera_id].set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cameras_readers[camera_id].set(cv2.CAP_PROP_FRAME_HEIGHT, 960)
    
    # Create a named window for resizing purposes.
    cv2.namedWindow(f'Camera {camera_id}', cv2.WINDOW_NORMAL)

# Create a resizable window with a fixed size, so that the selected feed 
# is displayed at the same size regardless of which camera it comes from.
win_name = 'Selected Camera Video'
cv2.namedWindow(win_name, cv2.WINDOW_NORMAL)
cv2.resizeWindow(win_name, 1600, 1000)
   
# Iterate until a termination (break) statement is executed.
while True:
    
    # Initialize a list to store a frame of the webcam towards which user is looking.
    frame_to_show = []
    
    # Initialize a variable to store the minimum score across all the webcam feeds.
    min_score = 1000
    
    # Iterate over the VideoCapture objects. 
    for camera_id, camera_reader in enumerate(cameras_readers):
        
        # Read a frame.
        ok, frame = camera_reader.read()

        # If the frame was not read properly, 
        # skip this camera and move on to the next one.
        if not ok:
            continue

        # Flip the frame horizontally for natural (selfie-view) visualization.
        frame = cv2.flip(frame, 1)
    
        # Perform Face landmarks detection.
        frame, face_landmarks = detectFacialLandmarks(frame, facemesh_functions[camera_id], 
                                                      draw=False, display=False)

        # Check if the Face landmarks in the frame are detected.
        if len(face_landmarks)>0:

            # Get the nose, left eye center, and right eye center landmarks.
            frame, keypoints = getFaceKeypoints(frame, face_landmarks, draw=False, display=False)
            
            # Calculate the difference between the nose tip and both eyes mid-point.
            score = getHeadScore(keypoints)
            
            # Check if the calculated score is less than the minimum score.
            if score < min_score:
                
                # Update the frame (to show) and the minimum score.
                frame_to_show = frame
                min_score = score
        
        # Display the frame of the webcam feed we are iterating upon.
        cv2.imshow(f'Camera {camera_id}', frame)
    
    # Check if the frame (to show) variable has a valid value.
    if len(frame_to_show) > 0:
        
        # Display the frame (with minimum score) of the webcam i.e., towards which user is looking.
        cv2.imshow('Selected Camera Video', frame_to_show)

    # Wait for 1ms. If a key is pressed, retrieve the ASCII code of the key.
    k = cv2.waitKey(1) & 0xFF    

    # Check if 'ESC' is pressed and break the loop.
    if k == 27:
        break
        
# Iterate over the VideoCapture objects. 
for camera_reader in cameras_readers:
    
    # Release the VideoCapture Object.                  
    camera_reader.release()

# Close the windows.
cv2.destroyAllWindows()

Nice! It is working perfectly fine.
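
The selection step inside the loop can also be looked at on its own. The following is a minimal sketch, not the code used above: the helper `select_camera` and the score values are hypothetical, and `None` is assumed to stand for a camera in which no face was detected in the current frame.

```python
def select_camera(scores):
    """Return the id of the camera with the lowest head pose score.

    scores: dict mapping camera id -> score, or None when no face was detected.
    Returns None if no camera produced a score.
    """
    # Keep only the cameras that actually detected a face.
    valid = {camera_id: score for camera_id, score in scores.items() if score is not None}

    # Nothing to select if no camera saw a face.
    if not valid:
        return None

    # The lowest score belongs to the camera the person is looking toward.
    return min(valid, key=valid.get)

# Hypothetical scores, assumed for illustration only.
print(select_camera({0: 0.42, 1: 0.08}))   # 1
print(select_camera({0: 0.30, 1: None}))   # 0
print(select_camera({0: None, 1: None}))   # None
```

Returning `None` when no face is found plays the same role as the empty `frame_to_show` check in the loop: the selected-camera window is simply not updated for that iteration.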

### **<font style="color:rgb(255,140,0)"> Code License Agreement </font>**
```
Copyright (c) 2022 Bleedai.com

Feel free to use this code for your own projects commercial or noncommercial, these projects can be Research-based, just for fun, for-profit, or even Education with the exception that you’re not going to use it for developing a course, book, guide, or any other educational products.

Under *NO CONDITION OR CIRCUMSTANCE* you may use this code for your own paid educational or self-promotional ventures without written consent from Taha Anwar (BleedAI.com).

```
