<h1>Translating Realtime Human Facial Expressions to an Emoji through a Trained CNN Algorithm</h1>

<h2>Project Overview</h2>

1. Project Purpose/Description
2. Tool/Environment Setup 
3. Theory Exploration (ML, NN, CNNs)
4. More imports & Data Preprocessing
5. Create Model
6. Compile Model
7. Create your emojis
8. Implement GUI 
9. Testing & Improving Accuracy 
10. Debugging
11. Reflection

<h2><mark>#1</mark> Project Goals/Description</h2>

The goal of this project is to create a model that detects human emotion from a realtime webcam feed and matches the expression with a corresponding emoji. 

For that we use a dataset containing more than 28,700 training images, each already classified into one of these 7 categories: angry, disgust, fear, happy, neutral, sad, and surprise. 

We are going to build a machine learning model, specifically a Convolutional Neural Network (CNN), with Tensorflow, train it on this data to recognize facial expressions, and map each recognized emotion to an emoji. 

> Integrating the model with the frontend should result in a functionality that looks like this!

![](https://i.imgur.com/qZwnblY.png)


<h2><mark>#2</mark> Tool/Environment Setup</h2>

<h3>Some tools/topics covered</h3>

- Language: Python

- Deep Neural Networks (Tensorflow)

- Python Packages (Keras)

<h3>1. VSCode Environment</h3>

(a) Create a new folder in File Explorer and name it *Project Name* in your C Drive (Directly in your OS folder)

(b) Open the folder in VSCODE 

(c) Create a new folder called "src" and, inside it, two new files called "train.py" and "emoji.py". 

(d) Now create 2 subfolders under "src" called "data" and "emojis". 

(e) Navigate to [this dataset](https://www.kaggle.com/datasets/msambare/fer2013) on Kaggle. We will be using this dataset to train our model, so look around and familiarize yourself with what this data is!

(f) Download the dataset and extract it into the "data" folder. You should now see two subfolders, "train" and "test", each containing many pictures, under the "data" folder. 

We will be filling in the emojis folder later. This is all you need to set up for now!


<h3>2. Modules to Install</h3>

- <b>OpenCV</b>: Otherwise known as Open Source Computer Vision. A library that provides a set of tools/functions to process/analyze images and videos 

- <b>Numpy</b>: Python library that allows us to use multi-dimensional arrays to store large datasets and use optimized mathematical functions for data analysis

- <b>Tensorflow</b>: A very useful tool for machine learning. Takes data, builds a model, trains it, and then lets us use the trained model to make predictions!

- <b>Keras</b>: A high-level neural networks API integrated into Tensorflow

- <b>Pillow</b>: Image library that provides the PIL module. Keras uses it to load image files, and emoji.py uses it to display frames in the GUI

Run these commands in the terminal to install them. These packages will later be used when compiling and training the model. 

> FOR WINDOWS 

    pip install opencv-python

    pip install numpy==1.22

    pip install tensorflow==2.12.0 
    
    pip install keras==2.12.0

    pip install pillow


<h2><mark>#3</mark> Theory Exploration: Machine Learning & Neural Networks </h2>

<h3>Machine Learning</h3>

![](https://i.imgur.com/w8bT2HJ.png)

- The term <mark>machine learning</mark> has become a buzzword among people who are interested in or knowledgeable about the tech world. But what really is it? 

- To put it simply, machine learning is like <b>teaching a computer to learn things by itself</b>. Just like how a child is able to recognize what a dog is after many experiences of seeing or playing with a dog, if we show a computer lots of pictures of animals and tell it which animal is which, the computer will learn to recognize those animals by itself when given new pictures. 

- Thus, machine learning is a way for computers to detect patterns and make predictions based on data rather than being explicitly programmed to do a certain task. 

<h3>Neural Networks</h3>

![](https://i.imgur.com/3bORFz5.png)

- A <mark>Neural Network</mark> is a type of machine learning model. It has 3 main types of layers: input, hidden, and output. It is designed <b>to work like a human brain by processing information through layers of connected neurons</b>. 

    - Each neuron receives input, processes it, and then sends an output to the next layer of neurons. 

    - Each layer learns to identify increasingly complex features and patterns, building on the features learned by the previous layers. For example, in an image recognition task the 1st layer might learn to identify simple features such as edges/corners and in the next layer it might learn to identify more complex features such as curves or textures that are made up of these simple features. 

- The picture above shows an example of a <mark>Deep Neural Network</mark>, which is a neural network with multiple hidden layers. These hidden layers are where most of the computations are made to identify patterns in the data and make predictions. In general, the more hidden layers a network has, the more COMPLEX the patterns it can learn to recognize in the input data. 
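
To make the layer-by-layer idea concrete, here is a minimal NumPy sketch of data flowing through a small network. The layer sizes, the random weights, and the helper names `relu` and `forward` are assumptions made for illustration only; a real network learns its weights during training.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and replaces negative values with 0
    return np.maximum(0, x)

def forward(x, layers):
    # Pass the input through each (weights, biases) pair in turn;
    # the output of one layer becomes the input of the next
    for weights, biases in layers:
        x = relu(x @ weights + biases)
    return x

rng = np.random.default_rng(0)

# Hypothetical network: 4 inputs -> 5 hidden -> 5 hidden -> 3 outputs
sizes = [4, 5, 5, 3]
layers = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

sample = np.array([0.2, 0.5, 0.1, 0.9])
output = forward(sample, layers)
print(output.shape)
```

Each pass through the loop is one layer: a weighted sum of the previous layer's outputs followed by an activation.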

<h3>Why are we using a Deep Convolutional Neural Network (CNN)? </h3>

![](https://i.imgur.com/3RO81Ua.png)

- A <mark>Convolutional Neural Network</mark> is a type of neural network that is <b>IDEAL for image classification</b> because it is specifically designed to recognize patterns and features within images 
    - Usually after the convolutional layers, there are <b>pooling layers</b> that look at small areas of the image, and then take the max/avg value in that area. This reduces the number of pixels in the image while keeping the most important info about the features for pattern recognition! 
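
The two building blocks described above can be sketched in plain NumPy. This is an illustration under simple assumptions (a single-channel image, no padding, a hand-written edge filter), not how Keras implements these layers internally. The names `convolve2d` and `max_pool2d` are made up for this example.

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image with no padding.
    # (Like deep learning libraries, this does not flip the kernel.)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    # Keep the largest value in each non-overlapping size x size block
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written 3x3 filter that responds to vertical edges
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = convolve2d(image, kernel)   # 4x4 map, strongest at the edge
pooled = max_pool2d(feature_map)          # 2x2 map after pooling

print(feature_map)
print(pooled)
```

The feature map is largest where the window straddles the dark-to-bright boundary, and pooling halves each dimension while keeping that strong response.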



<h2><mark>#4</mark> More imports & Data Preprocessing </h2>

Navigate to the <b>train.py file</b>

<h3>1. Import Packages </h3>

> 
    import numpy as np 
    from tensorflow import keras                                    
    from keras.models import Sequential, load_model                   
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D
    from keras.optimizers import Adam
    from keras.layers import MaxPooling2D
    from keras.preprocessing.image import ImageDataGenerator

We will learn about what these functions do soon~

----------------------------------------

<b>Before we can even start making our model, we need to pre-process our data. We will be rescaling the images, converting them to grayscale, and resizing them so they are compatible with NN training</b>

<h3>2. Train data </h3>

#Define the directories where the training/testing data is located
>
    train_dir = 'data/train'
    value_dir = 'data/test'

#Divides image pixel values by 255 to scale down pixel values to normalized range between 0 and 1 for NN training

    train_datagen = ImageDataGenerator(rescale=1./255)
    value_datagen = ImageDataGenerator(rescale=1./255)


#Loads images from train_dir
>  
    train_generator = train_datagen.flow_from_directory(
        train_dir,

#Resize images to 48 x 48 pixels 
>
        target_size = (48, 48),

#Number of images processed in each batch 
>
        batch_size = 64,

#Convert images to grayscale (reduce dimensionality of input data from RGB to intensity)
>
        color_mode = "grayscale",

#Labels for images are categorical values (Ex. Happy, Sad, Surprised, etc)
>
        class_mode = 'categorical'
        )
#Same process for test data    
>    
    value_generator = value_datagen.flow_from_directory(
        value_dir,
        target_size = (48, 48),
        batch_size = 64,
        color_mode = "grayscale",
        class_mode = 'categorical'
        )
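
To see what the rescaling and the `'categorical'` class mode amount to, here is a small NumPy sketch. It assumes the class folders are indexed in alphabetical order, which is how `flow_from_directory` assigns labels. The helpers `rescale` and `one_hot` and the tiny fake image are hypothetical and only illustrate the idea.

```python
import numpy as np

# Folder names in alphabetical order, so "happy" gets index 3
class_names = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def rescale(pixels):
    # Same idea as rescale=1./255: map 0..255 to 0..1
    return pixels.astype("float32") / 255.0

def one_hot(label, num_classes):
    # Categorical labels are vectors with a single 1 at the class index
    vec = np.zeros(num_classes, dtype="float32")
    vec[label] = 1.0
    return vec

fake_image = np.array([[0, 128], [255, 64]], dtype=np.uint8)
scaled = rescale(fake_image)
happy = one_hot(class_names.index("happy"), len(class_names))

print(scaled)
print(happy)
```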



<h2><mark>#5</mark> Create Model </h2>

Continue in the train.py file. 

We can now start building our Convolutional Neural Network layer by layer using the sequential model.

In order to create an <b>accurate</b> model, we are going to implement many convolutional layers to detect complex patterns, pooling layers to downsample the data, regularization techniques to prevent overfitting, flatten layers to prepare for fully connected layers, and dense layers to prepare for classification using activation functions.
<hr>

>
    emotion_model = Sequential()

#Adding 2 convolutional layers that are responsible for detecting local patterns in the input data
#1st layer has 32 filters of size 3x3 pixels and applies the ReLU activation function, 2nd is the same except has 64 filters that allows the model to extract more complex patterns
> 
    emotion_model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(48,48,1)))
    emotion_model.add(Conv2D(64, kernel_size = (3,3), activation = 'relu'))

#Pooling layers: Downsample data to look at small areas of the image (reduces spatial dimensions while retaining important features)
>
    emotion_model.add(MaxPooling2D(pool_size=(2,2)))

#Regularization: Randomly sets input units to 0 during training to prevent overfitting (learning noise rather than the actual signal)
>
    emotion_model.add(Dropout(0.25))

#More convolutional/pooling layers and regularization
>
    emotion_model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
    emotion_model.add(MaxPooling2D(pool_size=(2,2)))
    emotion_model.add(Conv2D(128, kernel_size=(3,3), activation = 'relu'))
    emotion_model.add(MaxPooling2D(pool_size=(2,2)))
    emotion_model.add(Dropout(0.25))

#Reshapes output from previous layers into 1D vector to prepare for the fully connected layers
>
    emotion_model.add(Flatten())

#1st dense layer: 1024 neurons fully connected layer
>
    emotion_model.add(Dense(1024, activation='relu'))
    emotion_model.add(Dropout(0.5))

#2nd dense layer: 7 neurons which represents the # of possible output classes (emotions)
#Uses softmax activation function to convert final layer's raw predicted values into a probability distribution over the different classes for classification
>
    emotion_model.add(Dense(7, activation='softmax'))
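
It can help to trace how the image size changes through the layers above. The sketch below does the arithmetic in plain Python, assuming the Conv2D default of no padding and a stride of 1. The helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel=3):
    # With no padding, a 3x3 convolution shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling halves each side (rounding down)
    return size // pool

# Spatial layers of the model, in order (Dropout does not change the shape)
order = ["conv", "conv", "pool", "conv", "pool", "conv", "pool"]

size = 48
trace = []
for layer in order:
    size = conv_out(size) if layer == "conv" else pool_out(size)
    trace.append(size)

# The last convolutional layer has 128 filters
flattened = size * size * 128

print(trace)      # [46, 44, 22, 20, 10, 8, 4]
print(flattened)  # 2048
```

So the Flatten layer hands a vector of 2048 values to the first dense layer.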

<h2><mark>#6</mark> Compile the Model </h2>


#Prepare model for training by defining how it will measure loss, update its weights, and evaluate its prediction performance
>
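    #learning_rate replaces the deprecated lr argument; the decay argument is left out because the Adam optimizer in Keras 2.11 and later does not accept it
    emotion_model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.0001), metrics=['accuracy'])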
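
As a rough picture of what the `categorical_crossentropy` loss measures, here is a plain-Python sketch for a single image. The probability values are made up, and the function is a simplified stand-in for the Keras loss rather than its actual implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Only the true class contributes: loss = -log(probability given to it)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# One-hot label for "happy" (index 3)
y_true = [0, 0, 0, 1, 0, 0, 0]

# Two hypothetical predictions for the same image
confident = [0.01, 0.01, 0.01, 0.94, 0.01, 0.01, 0.01]
unsure = [0.10, 0.10, 0.10, 0.40, 0.10, 0.10, 0.10]

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident, loss_unsure)
```

The confident, correct prediction has the smaller loss, which is the direction training pushes the model in.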

#Train the model using a generator
>
    emotion_model_info = emotion_model.fit(
        train_generator,

        #Number of batches processed in each epoch
        steps_per_epoch =28709 // 64,

        #Number of times the entire dataset is passed through the model for training
        epochs=50,

        validation_data=value_generator,

        #Number of batches to be processed for validation in each epoch
        validation_steps=7178 // 64 
    )
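
The step counts above come from integer division of the image counts by the batch size. A quick check of the arithmetic, using the same numbers as the training call:

```python
train_images = 28709
test_images = 7178
batch_size = 64

# Number of full batches that fit into each set
steps_per_epoch = train_images // batch_size
validation_steps = test_images // batch_size

# Images left over that do not fill a complete batch
train_leftover = train_images % batch_size

print(steps_per_epoch)   # 448
print(validation_steps)  # 112
print(train_leftover)    # 37
```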

#Save learned parameters
>
    emotion_model.save_weights('model.h5')

<hr>

<b>Congratulations! All the code is in place, and you are now ready to compile and train your model!</b>

- Go into your terminal and cd into the src folder
- Run the line below and watch your neural network slowly but surely work through all 50 epochs! 
>
    python train.py

- The output should look something like this. Watch the loss decrease and the accuracy increase!

Start: 
>
![](https://i.imgur.com/0WzdMVU.png)

End: 
>
![](https://i.imgur.com/SkgIxW5.png)

<h2><mark>#7</mark> Fun! Create your personalized emojis :) </h2>

<b>Now it is time to design the emojis that will match our realtime human facial detection</b>

- Follow this [link](https://getavataaars.com/?accessoriesType=Blank&avatarStyle=Transparent&clotheColor=Red&clotheType=BlazerSweater&eyeType=Default&hairColor=Black&mouthType=Default&topType=LongHairNotTooLong) and create 7 separate emojis, one for each of the 7 emotions
- Save the images as angry.png, disgusted.png, fearful.png, happy.png, neutral.png, sad.png, surprised.png and place them in the "emojis" folder

Here are some examples for inspo:

Happy

![](https://i.imgur.com/eLvInGM.png)

Surprised

![](https://i.imgur.com/Kxz21Ci.png)

Disgusted

![](https://i.imgur.com/nVGEWFB.png)


<h2><mark>#8</mark> Implement GUI </h2>

Now we will work in the <b>emoji.py</b> file

<h3>1. Import Packages</h3>

>
    import tkinter as tk 
    from tkinter import * 
    import cv2 
    from PIL import Image, ImageTk
    import os
    from cv2 import CAP_V4L2
    import numpy as np
    from keras.models import Sequential 
    from keras.layers import Dense, Dropout, Flatten 
    from keras.layers import Conv2D
    from keras.layers import MaxPooling2D
    import threading 
    import time

<h3>2. Copy Code </h3>

Copy the model definition from the train.py file, then add the last line to load the weights saved during training
>

    emotion_model = Sequential()

    emotion_model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(48,48,1)))
    emotion_model.add(Conv2D(64, kernel_size = (3,3), activation = 'relu'))

    emotion_model.add(MaxPooling2D(pool_size=(2,2)))

    emotion_model.add(Dropout(0.25))

    emotion_model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
    emotion_model.add(MaxPooling2D(pool_size=(2,2)))
    emotion_model.add(Conv2D(128, kernel_size=(3,3), activation = 'relu'))
    emotion_model.add(MaxPooling2D(pool_size=(2,2)))
    emotion_model.add(Dropout(0.25))

    emotion_model.add(Flatten())

    emotion_model.add(Dense(1024, activation='relu'))
    emotion_model.add(Dropout(0.5))

    emotion_model.add(Dense(7, activation='softmax'))
    emotion_model.load_weights('model.h5')


<h3>3. Create dictionaries to later access emotion text and emojis</h3>

> 

    #Disable use of OpenCL in OpenCV
    cv2.ocl.setUseOpenCL(False)

    #Create dictionary of emotions 
    emotion_dict = {
        0: "   Angry   ", 
        1: "   Disgusted   ", 
        2: "   Fearful   ", 
        3: "   Happy   ", 
        4: "   Neutral   ", 
        5: "   Sad   ", 
        6: "   Surprised   "}

    #Generate path
    cur_path = os.path.dirname(os.path.abspath(__file__))

    #Navigate from current path into emojis folder and pick corresponding emotion
    emoji_dist = {
        0: cur_path+"/emojis/angry.png",
        1: cur_path+"/emojis/disgusted.png",
        2: cur_path+"/emojis/fearful.png",
        3: cur_path+"/emojis/happy.png",
        4: cur_path+"/emojis/neutral.png",
        5: cur_path+"/emojis/sad.png",
        6: cur_path+"/emojis/surprised.png",
    }
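
An alternative way to build the same kind of path dictionary is `os.path.join`, which picks the right separator for the operating system. The sketch below is only an illustration: the helpers `build_emoji_paths` and `missing_files` are hypothetical, and the file names are the ones from step #7.

```python
import os

emoji_names = ["angry", "disgusted", "fearful", "happy", "neutral", "sad", "surprised"]

def build_emoji_paths(base_dir, names):
    # Map each emotion index to <base_dir>/emojis/<name>.png
    return {
        index: os.path.join(base_dir, "emojis", name + ".png")
        for index, name in enumerate(names)
    }

def missing_files(paths):
    # List any emoji images that have not been saved yet
    return [path for path in paths.values() if not os.path.isfile(path)]

paths = build_emoji_paths(os.getcwd(), emoji_names)
print(paths[3])
print(missing_files(paths))
```

Running a check like `missing_files` before starting the GUI gives a clearer message than a failed image read later on.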

<h3> 4. Initialize variables and arrays </h3>

>
    #Stores the last captured video frame
    global last_frame1
    #Initializes array of 0s that will hold the frame's RGB values 
    last_frame1 = np.zeros((480, 640, 3), dtype=np.uint8)
    global cap1 
    #Initialize a list with a single element (index from emotion dictionary) 
    show_text = [0] 
    #Default emoji index (neutral)
    show_text[0] = 4
    #Event to synchronize subject and avatar threads
    switch_thread_event = threading.Event()
    #Event to signal the threads when to stop execution
    stop_event = threading.Event()

    # Debug counters
    subject_count = 0 
    avatar_count = 0

<h3> 5. Create function to capture and read subject in frame</h3>

> 

    #Function to capture video frames from webcam and detect emotions on the subject's face
    def show_subject():
        global subject_count
        global last_frame1

        #Open webcam once, before the loop, so it is not reopened for every frame
        cap1 = cv2.VideoCapture(0)

        if not cap1.isOpened():
            print("Can't find the camera")
        else:
            print("Opened Camera")

        #Haarcascade classifier detects face in frame using pretrained info about facial features
        #Raw string (r'...') stops Python from treating the backslashes in the path as escape characters
        bounding_box = cv2.CascadeClassifier(r'C:\Emojify\data\haarcascade_frontalface_default.xml')

        #While program still running
        while not stop_event.is_set():
            # Wait for the switch thread event to be set
            if not switch_thread_event.wait(5):
                print("Subject Timeout occurred!")
                break

            # frame1 holds the captured video frame, flag1 reports whether the read succeeded
            flag1, frame1 = cap1.read()
            print("flag1", flag1)

            #If frame is not returned, skip processing and try again
            if not flag1:
                print("Major error! Frame is not returned!")
                time.sleep(0.3)
                continue

            #Resize frame for faster processing
            frame1 = cv2.resize(frame1, (600,500))

            #Converted to grayscale for better face detection accuracy
            gray_frame = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

            #Note: Adjust scaleFactor and minNeighbors for prediction accuracy 
            #detectMultiScale function returns the coordinates and dimensions of the detected faces as rectangles
            num_faces = bounding_box.detectMultiScale(gray_frame, scaleFactor = 1.1, minNeighbors=7)

            #For each detected face, a rectangle is drawn around it on the frame
            for (x, y, w, h) in num_faces: 
                cv2.rectangle(frame1, (x, y-50), (x+w, y+h+10), (255, 0, 0), 2)
                roi_gray_frame = gray_frame[y:y+h, x:x+w]
                #Region of interest resized to the expected input size for the emotion recognition model
                #Face image converted into a numpy array of shape (1, 48, 48, 1)
                cropped_img = np.expand_dims(np.expand_dims(cv2.resize(roi_gray_frame, (48, 48)), -1), 0)
                #Face image is passed through the pre-trained emotion recognition model 
                #Predict function returns a probability distribution over different emotion classes
                prediction = emotion_model.predict(cropped_img)
                #Index of the highest probability in the prediction array is calculated to determine the predicted emotion class
                maxindex = int(np.argmax(prediction))
                #Corresponding emotion label is retrieved from the emotion_dict dictionary & displayed in window
                cv2.putText(frame1, emotion_dict[maxindex], (x+40, y-60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)

                show_text[0]=maxindex

                #For debugging
                current_time_ms = time.time_ns() // 10**6
                subject_count = subject_count + 1 
                print("Current time for subject", current_time_ms, show_text[0], subject_count)

            #Frame captured successfully, so convert it for display
            last_frame1 = frame1.copy()
            pic = cv2.cvtColor(last_frame1, cv2.COLOR_BGR2RGB)
            img = Image.fromarray(pic)
            #Represents image element in GUI
            imgtk = ImageTk.PhotoImage(image=img)
            #Updates displayed image
            lmain.imgtk = imgtk
            lmain.configure(image=imgtk)

            #Once the frame is processed, pause for a bit
            time.sleep(0.3)
            #Update window
            root.update()

            # Reset the switch_thread_event
            switch_thread_event.clear()
            switch_thread_event.set()

        # After the loop, release the webcam so other programs can use it
        cap1.release()
        # Destroy all the windows
        cv2.destroyAllWindows()
        print('webcam released')

        print("Subject thread is finished")
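
The steps between cropping the face and choosing a label can be tried on their own with NumPy. The sketch below uses a fake gray face and a made-up prediction instead of a real model. One assumption to note: training rescaled pixels by 1/255, while the function above passes raw pixel values, so the division here is shown as an option to experiment with, not as the tutorial's method. The helpers `to_model_input` and `top_emotion` are hypothetical.

```python
import numpy as np

emotion_labels = ["Angry", "Disgusted", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]

def to_model_input(face_48x48):
    # (48, 48) -> (1, 48, 48, 1): add a channel axis, then a batch axis
    batch = np.expand_dims(np.expand_dims(face_48x48, -1), 0)
    # Assumption: apply the same 1/255 scaling that was used for training
    return batch.astype("float32") / 255.0

def top_emotion(prediction):
    # Pick the class with the highest probability
    index = int(np.argmax(prediction))
    return index, emotion_labels[index]

# Fake inputs standing in for a cropped face and a model output
face = np.full((48, 48), 128, dtype=np.uint8)
model_input = to_model_input(face)
fake_prediction = np.array([[0.05, 0.02, 0.08, 0.60, 0.15, 0.05, 0.05]])

index, label = top_emotion(fake_prediction)
print(model_input.shape, index, label)
```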

<h3> 6. Create avatar function to sync subject emotions with displayed emoji</h3>

>

    def show_avatar():
        global avatar_count

        while not stop_event.is_set():
            # Wait for the switch_thread_event to be set
            if not switch_thread_event.wait(5):
                print("Avatar Timeout occurred!")
                break

            #More debugging
            emoji_index = show_text[0]
            avatar_count = avatar_count + 1 
            current_time_ms = time.time_ns() // 10**6
            print("Current time for avatar", current_time_ms, emoji_index, avatar_count)

            #Load the emoji image that matches the latest detected emotion
            frame2 = cv2.imread(emoji_dist[emoji_index])
            pic2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2RGB)
            img2 = Image.fromarray(pic2)
            imgtk2 = ImageTk.PhotoImage(image=img2)
            lmain2.imgtk2 = imgtk2
            #Update the emotion text and the emoji shown in the GUI
            lmain3.configure(text = emotion_dict[emoji_index], font = ('arial', 45, 'bold'))
            lmain2.configure(image=imgtk2)

            time.sleep(0.3)
            root.update()

            # Reset the switch_thread_event
            switch_thread_event.clear()
            switch_thread_event.set()

        print("Avatar thread is finished")

<h3> 7. Wrapper functions to call stop and switch thread functions</h3>

>
        
    def stop_threads():
        global stop_event
        global switch_thread_event
        #Stop events, clear threads
        stop_event.set()
        switch_thread_event.clear()

    def wrapper_quit():
        stop_threads()
        print("After stop camera thread ", stop_event.is_set(), switch_thread_event.is_set())
        #Close GUI window
        root.destroy()
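
The stop mechanism relies on `threading.Event`. Here is a small standalone sketch of the same pattern using only the standard library. The `worker` function and the timings are made up for the demo and are not part of emoji.py.

```python
import threading
import time

stop_event = threading.Event()
ticks = []

def worker():
    # Keep working until another thread sets the stop event
    while not stop_event.is_set():
        ticks.append(time.time())
        # wait() returns early once the event is set, otherwise after the timeout
        stop_event.wait(0.01)

thread = threading.Thread(target=worker)
thread.start()

time.sleep(0.05)      # let the worker run for a moment
stop_event.set()      # ask the worker to stop
thread.join(timeout=1)

# wait() on an event that is never set returns False after the timeout
timed_out = not threading.Event().wait(0.01)

print(len(ticks), thread.is_alive(), timed_out)
```

This is the same idea as the `switch_thread_event.wait(5)` checks above: a `False` return means the timeout passed without the event being set.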

<h3> 8. Main block: use Tkinter to display the frontend by placing and packing labels</h3>

>

    #Only executed if the script is run directly 
    if __name__ == '__main__':
        frame_number = 0
        root = tk.Tk()

        #Create labels to contain images/video
        #Human video
        lmain = tk.Label(master = root, padx = 50, bd = 10)
        #Emoji
        lmain2 = tk.Label(master = root, bd = 25)
        #Text label that shows the name of the detected emotion
        lmain3 = tk.Label(master=root, bd = 20, fg = "#CDCDCD", bg = 'blue', font=("Arial", 30))

        #Packing and placing in location 
        lmain.pack(side=LEFT)
        lmain.place(x = 30, y = 100)
        lmain3.pack()
        lmain3.place(x = 1000, y = 600)
        lmain2.pack(side=RIGHT)
        lmain2.place(x = 700, y = 100)

        root.title("Translating Realtime Human Facial Expressions to an Emoji using a Trained CNN")
        root.geometry("1400x900+100+10")
        root['bg'] = 'black'
        switch_thread_event.set() 
        subject_thread = threading.Thread(target = show_subject)
        avatar_thread = threading.Thread(target = show_avatar)
        #Quit button: when pressed, the function specified by the command parameter is executed
        exitButton = Button(root, text = 'Quit', fg = "red", command = wrapper_quit, font = ('arial', 30, 'bold'))
        exitButton.pack(side = TOP)

        subject_thread.start()
        avatar_thread.start()

        print("Before main loop")
        root.mainloop()





<h2><mark>#9</mark> Testing & Improving Accuracy</h2>

<b> Do a happy dance! You have now officially coded the base to start testing your final product!</b>

- Run python emoji.py in the terminal (from the src folder) and watch the magic happen!

Some advice about <mark>improving accuracy</mark>
- Subject: Wear plain clothes (avoid graphics), Hair out of face, One person in frame at a time, ONLY use face to change expression (face detection XML file doesn't take into account hands or any other body parts)
- Adjust scaleFactor and minNeighbors in emoji.py 
- Consider adding more diverse photos to the data set and retraining the model 

NOTES about the model
1. Accuracy of certain facial expressions 
- Happy (Curve of mouth, Eyes are smaller) - GOOD 
- Surprised (Form an O with mouth, big eyes) - GOOD 
- Neutral (Flat line of mouth, eyebrows even, eyes not big) - OKAY 
- Sad (Mouth curved down, smaller eyes) - OKAY 
- Anger (exaggerated frustrated eyebrows and mouth curved downward) - OKAY
- Disgusted - BAD, RARELY DETECTED 
- Fearful (Wide eyes, mouth open) - BAD, EASILY CONFUSED WITH SURPRISED
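
One way to turn notes like these into numbers is a confusion matrix, which counts how often each true emotion is predicted as each class. The sketch below uses a handful of made-up labels purely to show the mechanics. The values are not results from this model, and the helper names are hypothetical.

```python
import numpy as np

labels = ["Angry", "Disgusted", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows are the true class, columns are the predicted class
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

def per_class_recall(matrix):
    # Fraction of each true class that was predicted correctly
    totals = matrix.sum(axis=1)
    return np.divide(matrix.diagonal(), totals,
                     out=np.zeros(len(totals)), where=totals > 0)

# Made-up example data (class indices), for illustration only
y_true = [3, 3, 3, 6, 6, 2, 2, 2, 1]
y_pred = [3, 3, 3, 6, 6, 6, 6, 2, 4]

matrix = confusion_matrix(y_true, y_pred, len(labels))
recall = per_class_recall(matrix)

for name, value in zip(labels, recall):
    print(name, round(float(value), 2))
```

In this toy data, two of the three "Fearful" samples land in the "Surprised" column, which is the kind of confusion described in the notes above.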



Extra notes

- Dense layer: Each neuron in the layer is connected to EVERY neuron in the previous layer
- Dropout layer: Randomly deactivates neurons to prevent a model from becoming too specialized, where it performs extremely well on the training data but fails to generalize and make accurate predictions on new, unseen data 
- Flatten layer: Takes complex, structured data (images) and makes it simpler by converting it into a flat, 1D array. Useful when transitioning from convolutional/pooling layers to subsequent layers that process the data as a simpler, linear sequence. Easier for NN to learn patterns and make predictions 
- Keras: Sequential vs Functional
    - Sequential (Create model layer by layer)
    - Functional (A layer can connect to any layer, much more complex)
- CNN: Simple application of a filter to an input that results in an activation. Certain inputs and thresholds. When the input meets those thresholds there is an activation 
- Certain type of input --> repeats itself --> feature map forms
- Activation function: mathematical function that determines whether the neuron should be activated based on the input it receives. Introduces non-linearity to the network, allowing it to learn and model complex relationships between input data and output predictions 
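
The two activation functions used in this project, ReLU and softmax, are short enough to write out in NumPy. This is a sketch of the math with made-up input values, not the Keras implementation.

```python
import numpy as np

def relu(x):
    # Negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, x)

def softmax(x):
    # Subtract the max for numerical stability, then normalize to sum to 1
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()

raw = np.array([-2.0, 0.0, 3.0])
activated = relu(raw)

scores = np.array([1.0, 2.0, 3.0])
probs = softmax(scores)

print(activated)
print(probs, probs.sum())
```

ReLU is what the hidden layers use, while softmax turns the 7 raw scores of the final layer into probabilities that add up to 1.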
