# **<center><font style="color:rgb(100,109,254)">Module 8: Emotion Recognition + AI Filters</font> </center>**

<center>
    <img src='https://drive.google.com/uc?export=download&id=1ekabh-KWOZhj8UPjf5AbZLzQ767z52_T' width=800> 
    <br/>
    <a href='https://www.shutterstock.com/image-photo/emotion-detected-by-artificial-intelligence-ai-1898196328'>Image Credits</a>
</center>
    

## **<font style="color:rgb(134,19,348)"> Module Outline </font>**

The module can be split into the following parts:

- *Lesson 1: Introduction to Support Vector Machine Theory.*

- *Lesson 2: Train an Emotion Recognition SVM on FER Dataset.*

- ***Lesson 3:* Create your own Dataset for Emotion Recognition.** *(This Tutorial)*

- *Lesson 4: Create AI Filters With Emotion Recognition Based Triggers.*


**Please Note**, these Jupyter Notebooks are not for sharing; do read the Copyright message below the Code License Agreement section which is in the last cell of this notebook.
-Taha Anwar

Alright, let's get started.

### **<font style="color:rgb(134,19,348)"> Import the Libraries</font>**

First, we will import the required libraries.

In [1]:
import os
import cv2
import pickle
import itertools
import numpy as np
import pandas as pd
import mediapipe as mp
from sklearn import svm
from sklearn.model_selection import train_test_split
from previous_lesson import detectFacialLandmarks, predictEmotion
from importlib.metadata import version
print(f"Mediapipe version: {version('mediapipe')}, it should be 0.8.9.1")

Mediapipe version: 0.8.10.1, it should be 0.8.9.1


## **<font style="color:rgb(134,19,348)">Initialize the Face Landmarks Detection Model</font>**

After that, as we had done in the previous lesson, we will initialize the **`mp.solutions.face_mesh`** class and set up the **`mp.solutions.face_mesh.FaceMesh()`** function (for images and videos as well) with appropriate arguments.

In [2]:
# Initialize the mediapipe face mesh class.
mp_face_mesh = mp.solutions.face_mesh

# Setup the face landmarks function for images.
face_mesh_images = mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1,
                                         refine_landmarks=True, min_detection_confidence=0.3)

# Setup the face landmarks function for videos.
face_mesh_videos = mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1,
                                         refine_landmarks=True, min_detection_confidence=0.8, 
                                         min_tracking_confidence=0.6)

## **<font style="color:rgb(134,19,348)">Create a Function to Calculate Size of a Face Part</font>**

In this lesson, instead of passing all the `468` face landmarks to the model (and letting the model figure out the pattern all by itself), we will remove the landmarks that add little value for differentiating expressions (like the nose landmarks) and extract some meaningful info (like the size of the mouth and eyes) from the landmarks before passing them into the model for training.

For this purpose, we will now create a function **`getSize()`** that will utilize the detected landmarks to calculate the size of a face part. To isolate the landmarks of a face part, we will use the frozenset objects (attributes of the **`mp.solutions.face_mesh`** class), which contain the required indexes.

- **`mp_face_mesh.FACEMESH_FACE_OVAL`** contains indexes of face outline.
- **`mp_face_mesh.FACEMESH_LIPS`** contains indexes of lips.
- **`mp_face_mesh.FACEMESH_LEFT_EYE`** contains indexes of left eye.
- **`mp_face_mesh.FACEMESH_RIGHT_EYE`** contains indexes of right eye.
- **`mp_face_mesh.FACEMESH_LEFT_EYEBROW`** contains indexes of left eyebrow.
- **`mp_face_mesh.FACEMESH_RIGHT_EYEBROW`** contains indexes of right eyebrow.

After retrieving the landmarks of the face part, we will simply pass them to the function [**`cv2.boundingRect()`**](https://docs.opencv.org/4.5.3/d3/dc0/group__imgproc__shape.html#ga103fcbda2f540f3ef1c042d6a9b35ac7) to get the width and height of the face part. Given the landmarks, the function **`cv2.boundingRect(landmarks)`** returns the coordinates **(`x1`, `y1`, `width`, `height`)** of a bounding box enclosing the object (face part), but we will only need the **`width`** and **`height`** of the bounding box.
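To make these two steps concrete, here is a minimal sketch that uses plain Python instead of Mediapipe and OpenCV. The connection pairs in `TOY_CONNECTIONS` and the pixel coordinates in `toy_landmarks` are made-up values for illustration, and `bounding_size()` is a hypothetical helper that assumes the inclusive pixel convention (`max - min + 1`) for the size of the box.

```python
import itertools

# Hypothetical stand-in for a Mediapipe connection set: a frozenset of
# (start_index, end_index) pairs, so most indexes appear more than once.
TOY_CONNECTIONS = frozenset([(0, 1), (1, 2), (2, 3), (3, 0)])

# Made-up pixel coordinates (x, y) for five landmarks.
toy_landmarks = [(10, 20), (30, 18), (32, 40), (12, 42), (100, 100)]

def bounding_size(points):
    '''Return the (width, height) of the box enclosing integer pixel points.'''
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Inclusive convention (an assumption): a single pixel has a width and height of 1.
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

# Flatten the pairs and drop the duplicate indexes (sorted for a stable order).
part_indexes = sorted(set(itertools.chain(*TOY_CONNECTIONS)))

# Keep only the landmarks that belong to the face part.
part_landmarks = [toy_landmarks[index] for index in part_indexes]

width, height = bounding_size(part_landmarks)
print(part_indexes, width, height)
```

The fifth landmark is never referenced by a connection pair, so it does not influence the size of the part.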

In [3]:
def getSize(image, face_landmarks, INDEXES):
    '''
    This function calculates the height and width of a face part utilizing its landmarks.
    Args:
        image:          The image of the person whose face part size is to be calculated.
        face_landmarks: The detected face landmarks of the person whose face part size is to 
                        be calculated.
        INDEXES:        The indexes of the face part landmarks, whose size is to be calculated.
    Returns:
        width:                The calculated width of the face part of the face whose landmarks indexes were passed.
        height:               The calculated height of the face part of the face whose landmarks indexes were passed.
        normalized_landmarks: A list containing the normalized landmarks of the face part whose size is calculated.
    '''
    
    # Retrieve the height and width of the image.
    image_height, image_width, _ = image.shape
    
    # Convert the indexes of the landmarks of the face part into a list.
    # Also remove the duplicate indexes and sort them, to keep the order of the landmarks consistent.
    INDEXES_LIST = sorted(set(itertools.chain(*INDEXES)))
    
    # Initialize a list to store the landmarks of the face part.
    landmarks = []
    
    # Initialize a list to store the normalized landmarks of the face part.
    normalized_landmarks = []
        
    # Iterate over the indexes of the landmarks of the face part. 
    for INDEX in INDEXES_LIST:
        
        # Append the landmark into the list.
        landmarks.append(face_landmarks[INDEX])
        
        # Normalize the landmark and append it into the list.
        normalized_landmarks.append((face_landmarks[INDEX][0]/image_width,
                                     face_landmarks[INDEX][1]/image_height))
        
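    # Calculate the width and height of the face part.
    # The landmarks are converted into a 32-bit integer array, as the function does not accept 64-bit integers.
    _, _, width, height = cv2.boundingRect(np.array(landmarks, dtype=np.int32))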
    
    # Return the calculated width, height and the normalized landmarks of the face part.
    return width, height, normalized_landmarks

## **<font style="color:rgb(134,19,348)">Create a Function to Extract Facial Landmarks</font>**

Now we will create a function **`extractKeypoints_v2()`** that will utilize the **`getSize()`** function created above to calculate the size and extract the landmarks of the face parts that add value for differentiating expressions (like the eyes and mouth), along with some other useful info (like the distance between the eyes and eyebrows).

In [4]:
def extractKeypoints_v2(image, face_mesh):
    '''
    This function will extract the Facial Landmarks (after normalization) of different face parts in an image.
    Args:
        image:     The input image of the person whose facial landmarks needs to be extracted.
        face_mesh: The Mediapipe's face landmarks detection function required to perform the landmarks detection.
    Returns:
        extracted_keypoints: An array containing the extracted normalized facial landmarks (x and y coordinates).
    '''
    
    # Perform Face landmarks detection.
    image, face_landmarks = detectFacialLandmarks(image, face_mesh, draw=False, display=False)
    
    # Initialize a list to store the extracted landmarks.
    extracted_keypoints = []
    
    # Check if the Face landmarks in the frame are detected.
    if len(face_landmarks)>0:
        
        # Get the width, height, and the landmarks of the face outline.
        face_width, face_height, face_outline_landmarks = getSize(image, face_landmarks, 
                                                                  mp_face_mesh.FACEMESH_FACE_OVAL)

        # Get the width, height, and the landmarks of the left and right eye.
        left_eye_width, left_eye_height, left_eye_landmarks = getSize(image, face_landmarks, mp_face_mesh.FACEMESH_LEFT_EYE)
        right_eye_width, right_eye_height, right_eye_landmarks = getSize(image, face_landmarks, 
                                                                         mp_face_mesh.FACEMESH_RIGHT_EYE)
        
        # Get the landmarks of the left and right eyebrow.
        _, _, left_eyebrow_landmarks = getSize(image, face_landmarks, mp_face_mesh.FACEMESH_LEFT_EYEBROW)
        _, _, right_eyebrow_landmarks = getSize(image, face_landmarks, mp_face_mesh.FACEMESH_RIGHT_EYEBROW)
        
        # Get the width, height, and the landmarks of the mouth.
        mouth_width, mouth_height, mouth_landmarks = getSize(image, face_landmarks, mp_face_mesh.FACEMESH_LIPS)
        
        # Calculate the center of the left and right eyebrow.
        left_eyebrow_center = np.array(left_eyebrow_landmarks).mean(axis=0)
        right_eyebrow_center = np.array(right_eyebrow_landmarks).mean(axis=0)
        
        # Calculate the center of the left and right eye.
        left_eye_center = np.array(left_eye_landmarks).mean(axis=0)
        right_eye_center = np.array(right_eye_landmarks).mean(axis=0)
        
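        # Calculate the y-coordinate distance from the center of the left and right eyes to the left and right eyebrows respectively.
        # The centers are in normalized coordinates, so the distances are multiplied by the image height to get them in pixels,
        # which is the unit of the face height they are divided by below.
        left_eye_eyebrow_dist =  abs(left_eye_center[1]-left_eyebrow_center[1]) * image.shape[0]
        right_eye_eyebrow_dist =  abs(right_eye_center[1]-right_eyebrow_center[1]) * image.shape[0]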
        
        # Extend the face outline landmarks into the list.
        extracted_keypoints.extend(face_outline_landmarks)
        
        # Extend the left and right eyebrow landmarks into the list.
        extracted_keypoints.extend(left_eyebrow_landmarks)
        extracted_keypoints.extend(right_eyebrow_landmarks)
        
        # Extend the left and right eye landmarks into the list.
        extracted_keypoints.extend(left_eye_landmarks)
        extracted_keypoints.extend(right_eye_landmarks)
        
        # Extend the mouth landmarks into the list.
        extracted_keypoints.extend(mouth_landmarks)
        
        # Extend the different normalized face parts sizes and the distance between eyes and eyebrows into the list.
        extracted_keypoints.extend([(left_eye_width/face_width, left_eye_height/face_height),
                                    (right_eye_width/face_width, right_eye_height/face_height),
                                    (mouth_width/face_width, mouth_height/face_height),
                                    (left_eye_eyebrow_dist/face_height, right_eye_eyebrow_dist/face_height)])
        
    # Convert the list into a float type array.
    extracted_keypoints = np.array(extracted_keypoints, dtype=np.float64)
    
    # Return the extracted normalized facial landmarks.
    return extracted_keypoints
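
The hand-crafted features appended at the end of the function are simple ratios. As a rough sketch with made-up numbers (the coordinates and sizes below are hypothetical, not measured from a real face, and `center()` is a helper introduced only for this example), the eye-to-eyebrow distance and a size ratio can be worked out like this:

```python
def center(points):
    '''Return the mean (x, y) of a list of points.'''
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)

# Hypothetical pixel coordinates of an eye and the eyebrow above it.
eye_points = [(100, 200), (120, 196), (140, 200), (120, 204)]
eyebrow_points = [(95, 170), (120, 160), (145, 170)]

# Hypothetical face and mouth sizes in pixels.
face_width, face_height = 300, 400
mouth_width, mouth_height = 120, 40

# Vertical distance between the two centers, relative to the face height.
eye_eyebrow_dist = abs(center(eye_points)[1] - center(eyebrow_points)[1])
relative_dist = eye_eyebrow_dist / face_height

# Mouth size relative to the face size.
mouth_ratio = (mouth_width / face_width, mouth_height / face_height)

print(round(relative_dist, 4), mouth_ratio)
```

Because both values are divided by the face size, they should stay roughly the same when the person moves closer to or farther from the camera.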

Now that we have all the functions we need to extract the landmarks, we can move on to initializing the parameters, like the expressions that we want our model to predict and the total number of images from which we want to extract the landmarks.

In [5]:
# Specify the path where you want to store the landmarks dataset.
DATASET_DIR = 'Landmarks'

# Check if the directory does not already exist.
if not os.path.exists(DATASET_DIR):
    
    # Create the directory.
    os.mkdir(DATASET_DIR)

# Specify the classes that we are going to work with.
expressions = ['neutral', 'happiness', 'anger', 'surprise']

# Specify the total number of images for which you want to extract the landmarks.
# This must be a multiple of the total number of classes.
total_images = 1200

# Raise an AssertionError exception, if the total number of images is not a multiple of the number of classes.
# This is done to make sure that landmarks are extracted from an equal number of images for each class.
assert total_images%len(expressions) == 0, f'{total_images} must be a multiple of {len(expressions)}'

# Display the success message.
print('Initialization Completed.')

Initialization Completed.


## **<font style="color:rgb(134,19,348)">Data Collection</font>**

Now finally, it's time to start collecting the data. We will utilize the function **`extractKeypoints_v2()`** to get the required landmarks from the specified number of frames/images of a real-time webcam feed for each expression (on which we want to train our SVM) and store the landmarks dataset on the disk.

In [15]:
# Initialize the VideoCapture object to read from the webcam.
camera_video = cv2.VideoCapture(0)
camera_video.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
camera_video.set(cv2.CAP_PROP_FRAME_HEIGHT, 960)

# Initialize a variable to store whether the data collection is stopped by the user.
stop_collection = False

# Iterate over the specified classes.
for expression in expressions:
    
    # Get the directory path inside which we have to store the landmarks for the expression, we are iterating upon.
    class_landmarks_dir = os.path.join(DATASET_DIR, expression)
    
    # Create the directory, if it does not already exist.
    os.makedirs(class_landmarks_dir, exist_ok=True)
    
    # Initialize the index of the image to collect next for the class (expression), we are iterating upon.
    # A while loop is used because decrementing the loop variable of a for loop has no effect in Python.
    image_index = 0
    
    # Iterate until the required number of images are collected for the class.
    while image_index < total_images//len(expressions):
        
        # Read a frame.
        ok, frame = camera_video.read()
        
        # Check if frame is not read properly.
        if not ok:
            
            # Continue to the next iteration to read the next frame, without incrementing the image index.
            continue
            
        # Flip the frame horizontally for natural (selfie-view) visualization.
        frame = cv2.flip(frame, 1)

        # Get the height and width of the frame of the webcam video.
        frame_height, frame_width, _ = frame.shape
        
        # Extract the required face keypoints of the person in the frame.
        extracted_keypoints = extractKeypoints_v2(frame, face_mesh_videos)
        
        # Check if the keypoints were extracted successfully.
        if len(extracted_keypoints) > 0:
        
            # Flatten the extracted keypoints array.
            extracted_keypoints = extracted_keypoints.flatten()
            
            # Write the current image index and the expression, we are iterating upon on the frame.
            cv2.putText(frame, f'{expression.upper()}, Expression Image # {image_index}', (10, frame_height-30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
            
            # Check if the image index is zero i.e., we are on the first image for an expression.
            if image_index == 0: 
                
                # Write the instructions to start collecting data on the frame.
                cv2.putText(frame, f'Press any key to Start Collecting {expression.upper()} Data.', (10, 30), 
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0, 255), 4, cv2.LINE_AA)
               
                # Display the frame.
                cv2.imshow('Data Collection', frame)
                
                # Wait until a key is pressed.
                cv2.waitKey(0)

            # Save the extracted landmarks inside a .npy file.
            np.save(os.path.join(class_landmarks_dir, str(image_index)), extracted_keypoints)
            
            # Increment the image index, as the landmarks of an image are stored successfully.
            image_index += 1
        
        # Display the frame.
        cv2.imshow('Data Collection', frame)

        # Wait for 10ms. If a key is pressed, retrieve the ASCII code of the key.
        k = cv2.waitKey(10) & 0xFF

        # Check if 'ESC' is pressed, then stop the data collection and break the loop.
        if k == 27:
            stop_collection = True
            break
    
    # Check if the data collection is stopped, then break the outer loop as well.
    if stop_collection:
        break
                    
# Release the VideoCapture Object and close the windows.
camera_video.release()
cv2.destroyAllWindows()
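
One detail worth keeping in mind while collecting data is that some frames will be unusable (a failed read, or no face detected). A minimal sketch of a loop that counts only the usable frames is given below; `fake_frames` is a made-up stand-in for the webcam feed, where `None` marks an unusable frame.

```python
# Made-up stand-in for a webcam feed; None marks a frame without a detected face.
fake_frames = iter(['f0', None, 'f1', 'f2', None, 'f3', 'f4'])

samples_needed = 3
collected = []

# Keep reading until enough usable samples are stored.
while len(collected) < samples_needed:
    frame = next(fake_frames)

    # Skip the unusable frame without counting it.
    if frame is None:
        continue

    collected.append(frame)

print(collected)
```

Counting the stored samples (rather than the frames read) keeps the number of samples equal for every expression, which is what the multiple-of-classes check on `total_images` is meant to ensure.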

## **<font style="color:rgb(134,19,348)">Load the Dataset</font>**

Now that we have the landmarks dataset stored on our disk, we can load the dataset anytime we need. We will utilize the [**`numpy.load()`**](https://numpy.org/doc/stable/reference/generated/numpy.load.html#numpy-load) function to serve the purpose.

In [16]:
# Initialize lists to store the landmarks and labels.
landmarks, labels = [], []

# Iterate over the classes. 
for class_index, expression in enumerate(expressions):
    
    # Get the directory path of the expression, we are iterating upon. 
    expression_dir = os.path.join(DATASET_DIR, expression)
    
    # Get the names of the files in which the landmarks are stored.
    landmarks_files = os.listdir(expression_dir)
    
    # Iterate over the files names.
    for file_index, file_name in enumerate(landmarks_files):
        
        # Load the landmarks from a .npy file.
        image_landmarks = np.load(os.path.join(expression_dir, file_name))
        
        # Append the landmarks into the list.
        landmarks.append(image_landmarks)
        
        # Append the label into the list.
        labels.append(expression)

# Display the success message.
print("Data loaded.")        

Data loaded.
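
As a quick check of the storage format, here is a small sketch that saves and reloads one array in a temporary directory. The directory and the array values are made up. Note that `np.save()` appends the `.npy` extension itself when the given file name does not have one, which is why the files were saved earlier with just the image index as the name.

```python
import os
import tempfile
import numpy as np

# Made-up keypoints standing in for one flattened sample.
keypoints = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float64)

with tempfile.TemporaryDirectory() as dataset_dir:

    # Mimic the layout <dataset>/<expression>/<image index>.npy
    class_dir = os.path.join(dataset_dir, 'happiness')
    os.makedirs(class_dir, exist_ok=True)
    np.save(os.path.join(class_dir, '0'), keypoints)

    # Only pick the .npy files, sorted for a reproducible order.
    file_names = sorted(name for name in os.listdir(class_dir) if name.endswith('.npy'))
    loaded = np.load(os.path.join(class_dir, file_names[0]))

print(file_names, loaded)
```

Filtering on the extension is optional, but it protects the loading loop from stray files (for example hidden system files) that may end up inside the dataset directory.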


## **<font style="color:rgb(134,19,348)"> Split the dataset into Train and Test Set</font>**

Now, as you already know from our previous lesson, we need a test set to evaluate our model's performance after training. So now we will split our dataset into train and test subsets randomly using the function [**`sklearn.model_selection.train_test_split()`**](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn-model-selection-train-test-split).

In [17]:
# Split the dataset into random train and test subsets.
train_landmarks, test_landmarks, train_labels, test_labels = train_test_split(landmarks, labels, test_size=0.05)
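
A purely random split does not guarantee that every expression is equally represented in a small test set. scikit-learn's `train_test_split()` accepts a `stratify` argument for this purpose; the sketch below is a hand-rolled, standard-library version of the same idea, shown only to illustrate what stratification does. The function name `stratified_split()` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(samples, labels, test_fraction, seed=0):
    '''Split samples so each class contributes the same fraction to the test set.'''
    rng = random.Random(seed)

    # Group the sample positions by their label.
    positions_by_label = defaultdict(list)
    for position, label in enumerate(labels):
        positions_by_label[label].append(position)

    test_positions = set()
    for positions in positions_by_label.values():
        rng.shuffle(positions)
        # Take the same share of every class, at least one sample each.
        test_count = max(1, round(len(positions) * test_fraction))
        test_positions.update(positions[:test_count])

    train = [(samples[i], labels[i]) for i in range(len(samples)) if i not in test_positions]
    test = [(samples[i], labels[i]) for i in range(len(samples)) if i in test_positions]
    return train, test

# Toy dataset: 20 samples for each of the four expressions.
toy_labels = [name for name in ['neutral', 'happiness', 'anger', 'surprise'] for _ in range(20)]
toy_samples = list(range(len(toy_labels)))

train, test = stratified_split(toy_samples, toy_labels, test_fraction=0.25)
print(Counter(label for _, label in test))
```

With the real dataset, passing `stratify=labels` to `train_test_split()` should have a comparable effect without any extra code.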

## **<font style="color:rgb(134,19,348)">Train the Model</font>**

Now that we have the dataset ready, we can start training our SVM.

In [18]:
# Initializing the SVM Model.
model = svm.SVC(kernel='poly', degree=3, C = 1.0, probability=True)

# Start training the model on the training dataset.
model.fit(train_landmarks, train_labels)
print("Training Completed")

Training Completed


## **<font style="color:rgb(134,19,348)">Evaluate the Model</font>**

Now, after completing the training process, we can pass the test dataset to the model to evaluate its performance, as we had done in the previous lesson.

In [19]:
# Get the mean accuracy on the given test data and labels, and display it.
score = model.score(test_landmarks, test_labels)
print('Accuracy of the Model is {:.2f}%'.format(score*100))

Accuracy of the Model is 100.00%
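
A single accuracy figure hides which expressions get confused with each other. As a hedged sketch (the labels below are made up, not the output of the model above), the per-class accuracy can be computed from the true and predicted labels like this:

```python
from collections import Counter

# Made-up true and predicted labels for eight test samples.
true_labels = ['neutral', 'neutral', 'happiness', 'happiness',
               'anger', 'anger', 'surprise', 'surprise']
predicted_labels = ['neutral', 'anger', 'happiness', 'happiness',
                    'anger', 'anger', 'surprise', 'neutral']

# Count the samples and the correct predictions of every class.
totals = Counter(true_labels)
correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)

per_class_accuracy = {label: correct[label] / totals[label] for label in totals}
print(per_class_accuracy)
```

With the real model, `predicted_labels` would come from `model.predict(test_landmarks)`.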


## **<font style="color:rgb(134,19,348)">Save the Model</font>**

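The model classifies every sample in the test set correctly, so we can now move on to saving the model to our disk. Keep in mind that the test set is small (5% of the data) and was recorded in the same session as the training set, so the accuracy on new faces, lighting, or backgrounds may be lower.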

In [20]:
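# Save the model (create the 'model' directory first, if it does not already exist).
os.makedirs('model', exist_ok=True)
with open('model/face_expression_v2.sav', 'wb') as model_file:
    pickle.dump(model, model_file)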

## **<font style="color:rgb(134,19,348)">Predict Emotions On Real-Time Web-cam Feed</font>**

Now let's see how the trained model will perform on a real-time webcam feed.

In [21]:
# Initialize the VideoCapture object to read from the webcam.
camera_video = cv2.VideoCapture(0)
camera_video.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
camera_video.set(cv2.CAP_PROP_FRAME_HEIGHT, 960)

# Create named window for resizing purposes.
cv2.namedWindow('Emotion Recognition', cv2.WINDOW_NORMAL)

# Load the model from disk.
with open('model/face_expression_v2.sav', 'rb') as model_file:
    loaded_model = pickle.load(model_file)

# Iterate as long as the webcam is opened successfully.
while camera_video.isOpened():
   
    # Read a frame.
    ok, frame = camera_video.read()
    
    # Check if frame is not read properly then continue to the next iteration to read the next frame.
    if not ok:
        continue
        
    # Flip the frame horizontally for natural (selfie-view) visualization.
    frame = cv2.flip(frame, 1)
    
    # Get the height and width of the frame of the webcam video.
    frame_height, frame_width, _ = frame.shape
    
    # Extract the required face keypoints of the person in the frame.
    face_landmarks = extractKeypoints_v2(frame, face_mesh_videos)
    
    # Check if the keypoints were extracted successfully.
    if len(face_landmarks) > 0:
        
        # Predict the face expression of the person inside the frame.
        frame, current_expression = predictEmotion(frame, face_landmarks, loaded_model, threshold=0.8, draw=False, display=False)
        
        # Write the predicted expression of the person on the frame.
        cv2.putText(frame, f'Prediction: {current_expression.upper()}', (10, 30),cv2.FONT_HERSHEY_PLAIN, 2, (0,255,0), 2)
    
    # Display the frame.
    cv2.imshow("Emotion Recognition", frame)
    
    # Wait for 1ms. If a key is pressed, retrieve the ASCII code of the key.
    k = cv2.waitKey(1) & 0xFF
    
    # Check if 'ESC' is pressed and break the loop.
    if(k == 27):
        break
         
# Release the VideoCapture Object and close the windows.
camera_video.release()
cv2.destroyAllWindows()
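
The `threshold=0.8` argument passed to `predictEmotion()` suggests that low-confidence predictions are filtered out. The function itself comes from the previous lesson and is not shown here, so the following is only an assumption about how such a threshold could be applied to class probabilities; `pick_expression()` and the probability values are hypothetical.

```python
def pick_expression(class_names, probabilities, threshold=0.8):
    '''Return the most probable class, or 'unknown' if it is below the threshold.'''
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_index] < threshold:
        return 'unknown'
    return class_names[best_index]

class_names = ['anger', 'happiness', 'neutral', 'surprise']

# A confident prediction passes the threshold.
confident = pick_expression(class_names, [0.05, 0.90, 0.03, 0.02])

# An ambiguous prediction is rejected.
ambiguous = pick_expression(class_names, [0.40, 0.35, 0.15, 0.10])

print(confident, ambiguous)
```

With an `SVC` trained using `probability=True`, such probabilities would be available from `predict_proba()`, ordered like `model.classes_`.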




# Additional comments:
#       - In this lesson, we created our own dataset
#       - The final model predicts the expressions very well on the webcam feed.

Working pretty well! So the process of collecting the whole dataset from scratch was well worth the effort.

### **<font style="color:rgb(255,140,0)"> Code License Agreement </font>**
```
Copyright (c) 2022 Bleedai.com

Feel free to use this code for your own projects commercial or noncommercial, these projects can be Research-based, just for fun, for-profit, or even Education with the exception that you’re not going to use it for developing a course, book, guide, or any other educational products.

Under *NO CONDITION OR CIRCUMSTANCE* you may use this code for your own paid educational or self-promotional ventures without written consent from Taha Anwar (BleedAI.com).

```


