# **Facial Emotion Detection**

## **Problem Definition**

**The context:**<br> *Why is this problem important to solve?*<br><br>
Affective computing, or Emotion AI, is the study and development of technologies that read human emotions by analyzing body gestures, facial expressions, voice tone, and similar cues, and that react accordingly.
Facial Emotion Recognition (FER) is critical in the field of human-machine interaction. Recent research has suggested that roughly 50% of the communication of sentiment takes place through facial expressions and other visual cues. Hence, training a model to identify facial emotions accurately is an essential step towards the development of emotionally intelligent behavior in machines with AI capabilities.
Applications of automatic facial expression recognition that require an understanding of human behavior include healthcare, brand exposure, customer service, and travel recommendations.

<br><br>



**The objectives:**<br> *What is the intended goal?*<br><br>

This project aims to use Deep Learning and Artificial Intelligence techniques to create a computer vision model that can accurately detect facial emotions. The model should perform multi-class classification on images of facial expressions, assigning each expression to its associated emotion.


<br> <br>

**The key questions:** *What are the key questions that need to be answered?*<br> <br>

Accurate Facial Emotion Recognition by computer vision models remains challenging because of the heterogeneity of human faces, poses, and naturalistic conditions. The key question throughout the whole experimentation is therefore how to build a model with high accuracy.


<br> <br>

**The problem formulation:** *What are we trying to solve using data science?*
<br> <br>

Deep learning models such as CNNs have shown potential for accurately identifying emotions, owing to their computational efficiency and feature extraction capabilities. These properties make them well suited to image classification.
We aim to conduct experiments that explore different ways of optimizing the convolutional neural network in order to improve its accuracy. For example, we will try different optimization algorithms and tune learning rate schedulers. Prior work has reported that thorough tuning of the model and training hyperparameters can achieve state-of-the-art results.





## **About the dataset**

The data set consists of 3 folders, i.e., 'test', 'train', and 'validation'. 
Each of these folders has four subfolders:

**‘happy’**: Images of people who have happy facial expressions.<br>
**‘sad’**: Images of people with sad or upset facial expressions.<br>
**‘surprise’**: Images of people who have shocked or surprised facial expressions.<br>
**‘neutral’**: Images of people showing no prominent emotion in their facial expression at all.<br>




## **Mounting the Drive**

**NOTE:**  Please use Google Colab from your browser for this notebook. **Google.colab is NOT a library that can be downloaded locally on your device.**

In [None]:
# Mounting the drive
from google.colab import drive
drive.mount('/content/drive')

## **Importing the Libraries**

In [None]:
import zipfile
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import os

# Importing Deep Learning Libraries

from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Input, Dropout, GlobalAveragePooling2D, Flatten, Conv2D, BatchNormalization, Activation, MaxPooling2D, LeakyReLU
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam, SGD, RMSprop

### **Let us load the data**

In [None]:
# Storing the path of the data file from the Google drive
path = '/content/drive/MyDrive/content/Facial_emotion_images.zip'

# The data is provided as a zip file so we need to extract the files from the zip file
with zipfile.ZipFile(path, 'r') as zip_ref:
    zip_ref.extractall()

In [None]:
picture_size = 48
folder_path = "Facial_emotion_images/"
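
Before visualizing anything, it can help to confirm that the extracted archive matches the layout described in "About the dataset". The sketch below is a minimal example under that assumption; the helper name `count_images` is hypothetical and is not part of the notebook.

```python
import os

def count_images(root):
    """Return {split: {class_name: number_of_files}} for a split/class folder tree."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue  # skip stray files next to the split folders
        counts[split] = {}
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if os.path.isdir(class_dir):
                counts[split][class_name] = len(os.listdir(class_dir))
    return counts
```

Calling `count_images(folder_path)` after extraction should list the three splits, each with the four emotion subfolders, if the archive is laid out as described.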

## **Visualizing our Classes**

Let's look at our classes. 

**Write down your observation for each class. What do you think can be a unique feature of each emotion, that separates it from the remaining classes?**

### **Happy**

In [None]:
expression = 'happy'

plt.figure(figsize= (8,8))
for i in range(1, 10, 1):
    plt.subplot(3, 3, i)

    img = load_img(folder_path + "train/" + expression + "/" +
                  os.listdir(folder_path + "train/" + expression)[i], target_size = (picture_size, picture_size))
    plt.imshow(img)   

plt.show()

**Observations and Insights:**
The muscles around the eyes tighten, "crow's feet" wrinkles appear around the eyes, the cheeks are raised, and the lip corners are raised diagonally.

### **Sad**

In [None]:
# Write your code to visualize images from the class 'sad'.

expression = 'sad'

plt.figure(figsize= (8,8))
for i in range(1, 10, 1):
    plt.subplot(3, 3, i)

    img = load_img(folder_path + "train/" + expression + "/" +
                  os.listdir(folder_path + "train/" + expression)[i], target_size = (picture_size, picture_size))
    plt.imshow(img)   

plt.show()

**Observations and Insights:**
 Drooping eyelids, downcast eyes, lowered lip corners, and slanting inner eyebrows have an arresting effect on observers. However, the social functions of sad expressions are not well understood.

### **Neutral**

In [None]:
# Write your code to visualize images from the class 'neutral'.

expression = 'neutral'

plt.figure(figsize= (8,8))
for i in range(1, 10, 1):
    plt.subplot(3, 3, i)

    img = load_img(folder_path + "train/" + expression + "/" +
                  os.listdir(folder_path + "train/" + expression)[i], target_size = (picture_size, picture_size))
    plt.imshow(img)   

plt.show()

**Observations and Insights:**
A neutral face is a blank expression that implies a lack of perceptible emotion. Most of the time an emotionless face is defined by straight-lined mouths, unfocused eyes, and slack cheeks. Though it communicates negativity to some, others see it as a reflection of calmness.

### **Surprised**

In [None]:
expression = 'surprise'

plt.figure(figsize= (8,8))
for i in range(1, 10, 1):
    plt.subplot(3, 3, i)

    img = load_img(folder_path + "train/" + expression + "/" +
                  os.listdir(folder_path + "train/" + expression)[i], target_size = (picture_size, picture_size))
    plt.imshow(img)   

plt.show()

**Observations and Insights:**
In surprise, our eyes are wide open, eyebrows are raised, and jaws drop open.

## **Checking Distribution of Classes**

In [None]:
# Getting count of images in each folder within our training path
num_happy = len(os.listdir(folder_path + "train/happy"))
print("Number of images in the class 'happy':", num_happy)


# Write the code to get the number of training images from the class 'sad'.
num_sad = len(os.listdir(folder_path+ "train/sad"))
print("Number of images in the class 'sad':", num_sad)

# Write the code to get the number of training images from the class 'neutral'.
num_neutral = len(os.listdir(folder_path+ "train/neutral"))
print("Number of images in the class 'neutral':", num_neutral)

# Write the code to get the number of training images from the class 'surprise'.
num_surprise = len(os.listdir(folder_path+ "train/surprise"))
print("Number of images in the class 'surprise':", num_surprise)

In [None]:
# Code to plot histogram
plt.figure(figsize = (10, 5))

data = {'Happy': num_happy, 'Sad': num_sad, 'Neutral': num_neutral, 'Surprise' : num_surprise}

df = pd.Series(data)

plt.bar(range(len(df)), df.values, align = 'center')

plt.xticks(range(len(df)), df.index.values, size = 'small')

plt.show()

**Observations and Insights:**

Checking the distribution of the four classes, we observe that "Happy", "Sad", and "Neutral" are roughly evenly represented, with approximately 4000 samples each. The "Surprise" class is somewhat smaller, with around 3000 samples. The imbalance is therefore not particularly serious, since the distribution across the four categories is not drastically uneven.
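
If the mild imbalance ever needed correcting, one common option is to weight classes inversely to their frequency. The sketch below is only an illustration: the counts are rounded from the approximate figures in the observation, not the exact folder counts, and the helper `inverse_frequency_weights` is a hypothetical name.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * count) for name, count in class_counts.items()}

# Illustrative counts only, rounded from the approximate figures quoted in the text
example_counts = {"happy": 4000, "sad": 4000, "neutral": 4000, "surprise": 3000}
weights = inverse_frequency_weights(example_counts)
print(weights)
```

With these rounded counts the smallest class receives a weight of 1.25 and the others about 0.94, which is a gentle correction. Keras `fit` accepts a `class_weight` dictionary keyed by class index, so the names would have to be mapped to indices first.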

## **Data Augmentation: Creating our Data Loaders**

In this section, we are creating data loaders that we will use as inputs to our Neural Network. A sample of the required code has been given with respect to the training data. Please create the data loaders for validation and test set accordingly.

**You have two options for the color_mode. You can set it to color_mode = 'rgb' or color_mode = 'grayscale'. You will need to try out both and see for yourself which one gives better performance.**

In [None]:
batch_size  = 32
img_size = 48

datagen_train = ImageDataGenerator(horizontal_flip = True,
                                    brightness_range = (0., 2.),
                                    rescale = 1./255,
                                    shear_range = 0.3)

train_set = datagen_train.flow_from_directory(folder_path + "train",
                                              target_size = (img_size, img_size),
                                              color_mode = 'grayscale',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)


datagen_validation = ImageDataGenerator(rescale=1./255)

validation_set = datagen_validation.flow_from_directory(folder_path + "validation",
                                              target_size = (img_size, img_size),
                                              color_mode = 'grayscale',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)

datagen_test = ImageDataGenerator(rescale=1./255)


test_set = datagen_test.flow_from_directory(folder_path + "test",
                                              target_size = (img_size, img_size),
                                              color_mode = 'grayscale',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)
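
To make two of the generator options concrete, the sketch below reproduces the arithmetic of `rescale` and the effect of a horizontal flip on a single array with NumPy. It is an illustration on a randomly generated, hypothetical image, not the internal implementation of `ImageDataGenerator`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 48x48 grayscale image with pixel values in [0, 255]
image = rng.integers(0, 256, size=(48, 48, 1)).astype("float32")

rescaled = image * (1.0 / 255)   # same arithmetic as rescale = 1./255
flipped = rescaled[:, ::-1, :]   # mirror left-right, as horizontal_flip may do

print(rescaled.min(), rescaled.max())
```

One point worth checking when tuning the augmentation: a `brightness_range` whose lower bound is 0 may produce almost completely black training images, so a narrower range could be worth trying.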

## **Model Building**

**Think About It:**
* *Are Convolutional Neural Networks the right approach? Should we have gone with Artificial Neural Networks instead?*

ANNs and CNNs are both deep learning models.
For image datasets, convolutional layers work better than fully connected layers because their filters are local and share weights across positions, so the same feature is detected wherever it appears in the image. Combined with pooling, this gives approximate translation invariance.
A fully connected ANN has no such filters, which means it could treat the same feature at different positions as different things.

<br><br>
*  *What are the advantages of CNNs over ANNs and are they applicable here?*


The greatest benefit of a CNN is that each filter can be thought of as a detector for one particular kind of information. Because the filter slides over the whole image, it extracts that information from every region rather than being restricted to a particular one. This applies here, since expression cues such as raised lip corners or widened eyes can appear at slightly different positions from face to face.
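
The idea of a filter acting as a position-independent detector can be sketched with plain NumPy. The function `correlate2d_valid` and the 2x2 "feature" below are hypothetical illustrations: the same kernel is slid over two images containing the same patch at different positions, and the location of the strongest response moves with the patch.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

kernel = np.ones((2, 2))       # detector for a 2x2 bright patch

image_a = np.zeros((8, 8))
image_a[1:3, 1:3] = 1.0        # feature near the top-left corner
image_b = np.zeros((8, 8))
image_b[4:6, 5:7] = 1.0        # same feature, shifted down 3 and right 4

response_a = correlate2d_valid(image_a, kernel)
response_b = correlate2d_valid(image_b, kernel)
peak_a = np.unravel_index(np.argmax(response_a), response_a.shape)
peak_b = np.unravel_index(np.argmax(response_b), response_b.shape)
print(peak_a, peak_b)
```

The peak response has the same value in both cases and only its position changes, which is the behavior a fully connected layer does not provide by construction.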



### **Creating the Base Neural Network**

Our Base Neural network will be a fairly simple model architecture.

* We want our Base Neural Network architecture to have 3 convolutional blocks.
* Each convolutional block must contain one Conv2D layer followed by a maxpooling layer and one Dropout layer. We can play around with the dropout ratio.
* Add first Conv2D layer with **64 filters** and a **kernel size of 2**. Use the 'same' padding and provide the **input_shape = (48, 48, 3) if you are using 'rgb' color mode in your dataloader or else input shape = (48, 48, 1) if you're using 'grayscale' colormode**. Use **'relu' activation**.
* Add MaxPooling2D layer with **pool size = 2**.
* Add a Dropout layer with a dropout ratio of 0.2.
* Add a second Conv2D layer with **32 filters** and a **kernel size of 2**. Use the **'same' padding** and **'relu' activation.**
* Follow this up with a similar Maxpooling2D layer like above and a Dropout layer with 0.2 Dropout ratio to complete your second Convolutional Block.
* Add a third Conv2D layer with **32 filters** and a **kernel size of 2**. Use the **'same' padding** and **'relu' activation.** Once again, follow it up with a Maxpooling2D layer and a Dropout layer to complete your third Convolutional block.
* After adding your convolutional blocks, add your Flatten layer.
* Add your first Dense layer with **512 neurons**. Use **'relu' activation function**.
* Add a Dropout layer with dropout ratio of 0.4.
* Add your final Dense Layer with 4 neurons and **'softmax' activation function**
* Print your model summary
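
The specification above fixes the number of trainable parameters, so the count can be worked out by hand and compared with the model summary. The sketch below assumes the grayscale input shape (48, 48, 1); the helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

size, channels = 48, 1          # grayscale input, an assumption of this sketch
total = 0
for filters in (64, 32, 32):    # the three convolutional blocks
    total += conv2d_params(2, channels, filters)
    channels = filters
    size //= 2                  # 'same' padding keeps the size; 2x2 pooling halves it

flat_units = size * size * channels   # input width of the first Dense layer
total += dense_params(flat_units, 512)
total += dense_params(512, 4)
print(flat_units, total)
```

Dropout layers add no parameters. Most of the parameters sit in the first Dense layer, which is typical for small CNNs that end in a Flatten layer.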


In [None]:
# Initializing a Sequential Model
model1 = Sequential()

# Add the first Convolutional block
model1.add(Conv2D(filters=64,kernel_size=(2,2),activation="relu",padding="same",input_shape=(48,48,1)))
model1.add(MaxPooling2D(pool_size=(2,2)))

model1.add(Dropout(0.2))

# Add the second Convolutional block
model1.add(Conv2D(filters=32,kernel_size=(2,2),activation="relu",padding="same"))
model1.add(MaxPooling2D(pool_size=2))
model1.add(Dropout(0.2))

# Add the third Convolutional block
model1.add(Conv2D(filters=32,kernel_size=2,activation="relu",padding="same"))
model1.add(MaxPooling2D(pool_size=2))
model1.add(Dropout(0.2))


# Add the Flatten layer
model1.add(Flatten())

# Add the first Dense layer
model1.add(Dense(512, activation="relu"))
model1.add(Dropout(0.4))

# Add the Final layer
model1.add(Dense(4, activation="softmax"))

model1.summary()

### **Compiling and Training the Model**

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

# With metrics=['accuracy'], the validation metric is logged as 'val_accuracy'
checkpoint = ModelCheckpoint("./model1.h5", monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')

early_stopping = EarlyStopping(monitor = 'val_loss',
                          min_delta = 0,
                          patience = 3,
                          verbose = 1,
                          restore_best_weights = True
                          )

reduce_learningrate = ReduceLROnPlateau(monitor = 'val_loss',
                              factor = 0.2,
                              patience = 3,
                              verbose = 1,
                              min_delta = 0.0001)

callbacks_list = [early_stopping, checkpoint, reduce_learningrate]

epochs = 20
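
The effect of `patience` in early stopping can be sketched without Keras. The function below is a simplified, hypothetical re-implementation of the stopping rule (stop once the monitored loss has failed to improve for `patience` consecutive epochs), applied to made-up validation losses.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (epoch at which training stops, epoch with the best loss), 1-indexed."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: the best value occurs at epoch 3
example_losses = [1.00, 0.90, 0.80, 0.85, 0.82, 0.81, 0.70]
stopped_at, best_at = early_stopping_epoch(example_losses)
print(stopped_at, best_at)
```

In this made-up run, training stops at epoch 6 and never reaches the better loss at epoch 7, which illustrates why a small `patience` can end training prematurely on noisy validation curves.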

In [None]:
# Write your code to compile your model1. Use categorical crossentropy as your loss function, Adam Optimizer with 0.001 learning rate, and set your metrics to 'accuracy'. 
from tensorflow.keras.losses import categorical_crossentropy

model1.compile(loss=categorical_crossentropy, optimizer=Adam(learning_rate=0.001),metrics=["accuracy"])
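
As a reference point for reading the loss curves, the sketch below computes softmax and categorical cross-entropy by hand for a model that assigns equal probability to all four classes. The function names are hypothetical and the code is a NumPy illustration, not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(one_hot, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1.0)    # avoid log(0)
    return float(-np.sum(one_hot * np.log(probs)))

uniform = softmax(np.zeros(4))              # a model with no preference
label = np.array([0.0, 1.0, 0.0, 0.0])      # one-hot label for the second class
baseline_loss = cross_entropy(label, uniform)
print(baseline_loss)
```

The result equals ln(4), roughly 1.386. A training or validation loss near that value suggests the model is doing little better than guessing among the four classes.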






In [None]:
# Write your code to fit your model1. Use train_set as your training data and validation_set as your validation data. Train your model for 20 epochs.
import time

# Note: callbacks_list is not passed to fit here, so all 20 epochs run.
start = time.time()
history = model1.fit(train_set,
                     validation_data=validation_set,
                     epochs=20)
stop = time.time()

# Use a separate name so the time module is not overwritten
elapsed = stop - start
print(f"model1 requires {elapsed:.1f} seconds of training time")

In [None]:
# Plotting the accuracies


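# Derive the x-axis from the recorded history so it always matches the number of epochs run
list_ep = list(range(1, len(history.history['accuracy']) + 1))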

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.legend()

plt.show()

In [None]:
# Plotting the loss

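# Derive the x-axis from the recorded history so it always matches the number of epochs run
list_ep = list(range(1, len(history.history['loss']) + 1))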

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.legend()

plt.show()

**Observations and Insights:**
The training history shows that accuracy gradually increases on both the training and validation sets. However, the overall performance is poor. It is also somewhat unusual that the validation accuracy exceeds the training accuracy.

One possible reason is the way the validation and training data were selected. We don't know whether these two sets were selected randomly. It is generally better to select them randomly from the overall dataset, so that the probability distribution of the validation set closely matches that of the training set.


The other reason could be the effect of the dropout layers, which are designed to reduce overfitting. Dropout is active only during training, so it lowers the measured training accuracy, while it is disabled when the validation set is evaluated. The augmentation applied only to the training images may contribute to the same gap. In this case overfitting is not the problem; rather, the model appears to underfit, possibly because of heavy use of dropout.
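
The asymmetry between training and evaluation can be sketched with a NumPy version of inverted dropout. This is an illustration under simplifying assumptions (a vector of constant activations and a hypothetical `dropout` helper), not the Keras layer itself.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return activations                  # at evaluation time the layer is the identity
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(10_000)

train_out = dropout(activations, rate=0.4, training=True, rng=rng)
eval_out = dropout(activations, rate=0.4, training=False, rng=rng)

dropped_fraction = float(np.mean(train_out == 0))
print(dropped_fraction, train_out.mean(), eval_out.mean())
```

During training roughly 40% of the units are silenced on every batch, so the network is scored while handicapped. At evaluation time all units contribute, which is one way validation accuracy can end up above training accuracy.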

### **Evaluating the Model on the Test Set**

In [None]:
model1.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                                              target_size = (img_size,img_size),
                                                              color_mode = 'grayscale',
                                                              batch_size = 128,
                                                              class_mode = 'categorical',
                                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                                              shuffle = True) 
test_images, test_labels = next(test_set)

# Write the name of your chosen model in the blank
pred = model1.predict(test_images)
pred = np.argmax(pred, axis = 1) 
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

### **Creating the second Convolutional Neural Network**

In the second Neural network, we will add a few more Convolutional blocks. We will also use Batch Normalization layers.

* This time, each Convolutional block will have 1 Conv2D layer, followed by a BatchNormalization, LeakyReLU, and a MaxPooling2D layer. We are not adding any Dropout layer this time.
* Add first Conv2D layer with **256 filters** and a **kernel size of 2**. Use the 'same' padding and provide the **input_shape = (48, 48, 3) if you are using 'rgb' color mode in your dataloader or else input shape = (48, 48, 1) if you're using 'grayscale' colormode**. Use **'relu' activation**.
* Add your BatchNormalization layer followed by a LeakyReLU layer with Leaky ReLU parameter of **0.1**
* Add MaxPooling2D layer with **pool size = 2**.
* Add a second Conv2D layer with **128 filters** and a **kernel size of 2**. Use the **'same' padding** and **'relu' activation.**
* Follow this up with a similar BatchNormalization, LeakyReLU, and MaxPooling2D layer like above to complete your second Convolutional Block.
* Add a third Conv2D layer with **64 filters** and a **kernel size of 2**. Use the **'same' padding** and **'relu' activation.** Once again, follow it up with a BatchNormalization, LeakyReLU, and MaxPooling2D layer to complete your third Convolutional block.
* Add a fourth block, with the Conv2D layer having **32 filters**.
* After adding your convolutional blocks, add your Flatten layer.
* Add your first Dense layer with **512 neurons**. Use **'relu' activation function**.
* Add the second Dense Layer with **128 neurons** and use **'relu' activation** function.
* Add your final Dense Layer with 4 neurons and **'softmax' activation function**
* Print your model summary
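
The two layer types introduced in this architecture can be sketched in NumPy. The version of batch normalization below omits the learnable scale and shift and the running statistics used at inference, and the activations are randomly generated, so it is a simplified illustration rather than the Keras layers.

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """Pass positive values through; scale negative values by alpha."""
    return np.where(x > 0, x, alpha * x)

def batch_norm(x, eps=1e-3):
    """Normalize each feature (column) over the batch; scale and shift omitted."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 4))   # hypothetical activations

normalized = batch_norm(batch)
activated = leaky_relu(normalized)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

Because normalization re-centers the values around zero, about half of them become negative even if the preceding layer used ReLU, and LeakyReLU then keeps a scaled-down version of those negatives instead of discarding them.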

In [None]:
# Creating sequential model
model2 = Sequential()
 
# Add the first Convolutional block
model2.add(Conv2D(filters=256,kernel_size=(2,2),padding="same", activation="relu", input_shape=(48,48,1)))
model2.add(BatchNormalization())
model2.add(LeakyReLU(0.1))
model2.add(MaxPooling2D(pool_size=(2,2)))


# Add the second Convolutional block
model2.add(Conv2D(filters=128,kernel_size=(2,2),padding="same", activation="relu"))
model2.add(BatchNormalization())
model2.add(LeakyReLU(0.1))
model2.add(MaxPooling2D(pool_size=(2,2)))


# Add the third Convolutional block
model2.add(Conv2D(filters=64,kernel_size=(2,2),padding="same", activation="relu"))
model2.add(BatchNormalization())
model2.add(LeakyReLU(0.1))
model2.add(MaxPooling2D(pool_size=(2,2)))


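# Add the fourth Convolutional block (same structure as the previous blocks, per the specification above)
model2.add(Conv2D(filters=32,kernel_size=(2,2),padding="same", activation="relu"))
model2.add(BatchNormalization())
model2.add(LeakyReLU(0.1))
model2.add(MaxPooling2D(pool_size=(2,2)))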

# Add the Flatten layer
model2.add(Flatten())

# Adding the Dense layers
model2.add(Dense(512, activation="relu"))
model2.add(Dense(128, activation="relu"))
model2.add(Dense(4, activation="softmax"))

model2.summary()

### **Compiling and Training the Model**



In [None]:
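from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

# val_loss should be minimized, so the checkpoint keeps the weights with the lowest value
checkpoint = ModelCheckpoint("./model2.h5", monitor='val_loss', verbose = 1, save_best_only = True, mode = 'min')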

early_stopping = EarlyStopping(monitor="val_loss",
                               min_delta=0,
                               patience=3,
                               verbose=1,
                               restore_best_weights=True
                               ) 
# Write your code here. You may play around with the hyperparameters if you wish.

reduce_learningrate = ReduceLROnPlateau(monitor="val_loss",
                                        factor=0.2,
                                        patience=3,
                                        verbose=1,
                                        min_delta=0.0001)

# Write your code here. You may play around with the hyperparameters if you wish.

callbacks_list = [early_stopping, checkpoint, reduce_learningrate]

epochs = 20
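
The learning rate schedule produced by a plateau rule can be simulated in a few lines. The function below is a simplified, hypothetical re-implementation (no cooldown, no minimum learning rate) using the same `factor`, `patience`, and `min_delta` as the callback above, applied to made-up validation losses.

```python
def simulate_reduce_lr(val_losses, initial_lr=0.001, factor=0.2, patience=3, min_delta=1e-4):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor    # shrink the learning rate after a plateau
                wait = 0
        schedule.append(lr)
    return schedule

# Hypothetical validation losses that stop improving after the third epoch
losses = [1.20, 1.10, 1.05, 1.06, 1.05, 1.07, 1.04]
schedule = simulate_reduce_lr(losses)
print(schedule)
```

In this made-up run the rate drops from 0.001 to 0.0002 at epoch 6. Since early stopping above also uses a patience of 3, it may end training at the same epoch, leaving the reduced rate no chance to help; giving early stopping a larger patience than the scheduler is one option to consider.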

In [None]:
# Write your code to compile your model2. Use categorical crossentropy as the loss function, Adam Optimizer with 0.001 learning rate, and set metrics as 'accuracy'. 

from tensorflow.keras.losses import categorical_crossentropy


model2.compile(loss=categorical_crossentropy, optimizer=Adam(learning_rate=0.001), metrics=["accuracy"])

In [None]:
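import time

# Note: callbacks_list is not passed to fit here, so all 20 epochs run.
start = time.time()
history = model2.fit(train_set,
                     validation_data=validation_set,
                     epochs=20)
stop = time.time()

# Use a separate name so the time module is not overwritten
elapsed = stop - start
print(f"model2 requires {elapsed:.1f} seconds of training time")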

In [None]:
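# Derive the x-axis from the recorded history so it always matches the number of epochs run
list_ep = list(range(1, len(history.history['accuracy']) + 1))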

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.legend()

plt.show()

### **Evaluating the Model on the Test Set**

In [None]:
model2.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                                              target_size = (img_size,img_size),
                                                              color_mode = 'grayscale',
                                                              batch_size = 128,
                                                              class_mode = 'categorical',
                                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                                              shuffle = True) 
test_images, test_labels = next(test_set)

# Write the name of your chosen model in the blank
pred = model2.predict(test_images)
pred = np.argmax(pred, axis = 1) 
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


### <u>**Proposed Approach**</u>

- **Potential techniques:** What different techniques should be explored ?<br>

For this image classification problem, we will train supervised machine learning models. Specifically, we intend to use a deep learning CNN architecture to extract features and solve the classification problem.
It is generally hard to collect a large number of images to train CNNs. We can therefore use data augmentation (an image data generator) to enlarge the effective training set with flipped, sheared, and brightness-shifted variants of the original images.
Most importantly, we will try different techniques that can improve the performance of the model. The sigmoid activation function squashes a neuron's output into the range (0, 1); in hidden layers it has largely been replaced by the Rectified Linear Unit (ReLU), which helps speed up training and mitigates vanishing gradients. Pooling reduces the spatial size of the feature maps and helps the model generalize better. Dropout, regularization, and data augmentation are all used to prevent overfitting, which is when the model performs well on the training data but not on new data. Batch normalization stabilizes the distribution of layer inputs, which can help keep gradients from vanishing or exploding and so help the model learn properly.
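
The gradient argument for preferring ReLU over sigmoid can be sketched numerically. The code below is an illustration, assuming a chain of 10 layers and ignoring the weight matrices: it compares the largest possible sigmoid derivative with the ReLU derivative for a positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    return (np.asarray(x) > 0).astype(float)

max_sigmoid_grad = float(sigmoid_grad(0.0))    # the derivative peaks at x = 0
depth = 10                                     # hypothetical number of stacked layers
sigmoid_bound = max_sigmoid_grad ** depth      # upper bound on the product of 10 derivatives
relu_product = float(relu_grad(2.0)) ** depth  # active ReLU units pass gradients unchanged
print(max_sigmoid_grad, sigmoid_bound, relu_product)
```

Even in the best case, ten sigmoid derivatives multiply to less than one millionth, while active ReLU units leave the gradient magnitude untouched.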


<br>

- **Overall solution design:** What is the potential solution design?<br>


In this experiment, we will run two variations of CNN models and compare their performance on the dataset. The experiment will be conducted in 3 stages.
The major difference between the two models is their depth and width. Another difference is how they stabilize training. One is a base CNN model with fewer filters and some Dropout layers to reduce overfitting, while the other is more sophisticated, with more filters in each convolutional layer and one additional Dense layer. That is, the second neural network is deeper and wider, which means it has more parameters to train. Moreover, because the second model is deeper, it adopts Batch Normalization to accelerate and stabilize training. The second model also uses LeakyReLU as a variant of the ReLU activation.
First, the data is pre-processed and then augmented to increase its diversity. Second, two different models are built and compiled with the same optimizer (Adam) and the same loss function (categorical cross-entropy). Third, both models are trained and their results compared.



<br>

- **Measures of success:** What are the key measures of success to compare different techniques?<br>

Aside from plotting accuracy for both the training and validation datasets, success in image classification depends on the misclassification rate on the test data. Here we use the confusion matrix, from which precision, recall, F1-score, and related metrics are derived to determine which model misclassifies less.
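
The relationship between the confusion matrix and the per-class metrics can be sketched with NumPy. The matrix below is hypothetical and is not a result from this notebook; rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical 4x4 confusion matrix (rows: actual, columns: predicted)
cm = np.array([
    [8, 1, 1, 0],
    [2, 6, 2, 0],
    [1, 2, 7, 0],
    [0, 0, 1, 9],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)   # column sums: everything predicted as the class
recall = true_positives / cm.sum(axis=1)      # row sums: everything that truly is the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()
print(precision, recall, f1, accuracy)
```

Precision answers "of the images predicted as this emotion, how many were right", while recall answers "of the images that show this emotion, how many were found". Comparing the two per class shows which emotions a model confuses, which a single accuracy number hides.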


## **Transfer Learning Architectures**

In this section, we will create several Transfer Learning architectures. For the pre-trained models, we will select three popular architectures, namely VGG16, ResNet v2, and EfficientNet. Unlike the previous architectures, which worked on 'grayscale' images, these require 3 input channels. Therefore, we need to create new DataLoaders.

### **Creating our Data Loaders for Transfer Learning Architectures**

In this section, we are creating data loaders that we will use as inputs to our Neural Network. Unlike in Milestone 1, we will have to go with color_mode = 'rgb' as this is the required format for the transfer learning architectures.
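
For images that are stored as grayscale, asking for three channels does not add color information. The sketch below assumes that loading a single-channel image in 'rgb' mode amounts to repeating that channel three times, and shows the resulting shape on a randomly generated, hypothetical image.

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.random((48, 48, 1)).astype("float32")   # hypothetical grayscale image

rgb = np.repeat(gray, 3, axis=-1)                  # copy the single channel three times
print(gray.shape, rgb.shape)
```

Under that assumption the three channels are identical, so the switch to 'rgb' mainly serves to match the input shape expected by the pre-trained networks.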

In [None]:
batch_size  = 32
img_size = 48

datagen_train = ImageDataGenerator(horizontal_flip = True,
                                    brightness_range = (0., 2.),
                                    rescale = 1./255,
                                    shear_range = 0.3)

train_set = datagen_train.flow_from_directory(folder_path + "train",
                                              target_size = (img_size, img_size),
                                              color_mode = 'rgb',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)

datagen_validation = ImageDataGenerator(rescale=1./255)


validation_set = datagen_validation.flow_from_directory(folder_path + "validation",
                                              target_size = (img_size, img_size),
                                              color_mode = 'rgb',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)

datagen_test = ImageDataGenerator(rescale=1./255)

test_set = datagen_test.flow_from_directory(folder_path + "test",
                                              target_size = (img_size, img_size),
                                              color_mode = 'rgb',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)

## **VGG16 Model**

In [None]:
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras import Model

vgg = VGG16(include_top = False, weights = 'imagenet', input_shape = (48, 48, 3))
vgg.summary()

### **Model Building**

* In this model, we will import till the **'block5_pool'** layer of the VGG16 model. You can scroll down in the model summary and look for 'block5_pool'. You can choose any other layer as well.
* Then we will add a Flatten layer, which receives the output of the 'block5_pool' layer as its input.
* We will add a few Dense layers and use 'relu' activation function on them.
* You may use Dropout and BatchNormalization layers as well.
* Then we will add our last dense layer, which must have 4 neurons and a 'softmax' activation function.
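
The size of the tensor leaving 'block5_pool' follows from the input size, since VGG16 has five blocks that each end in a 2x2 max-pooling layer and the last block has 512 channels. The sketch below works this out for a 48x48 input and counts the Dense parameters of a hypothetical head with widths 256, 128, 64, and 4; Dropout and BatchNormalization are left out of the count.

```python
def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

size = 48
for _ in range(5):      # five blocks, each ending in a 2x2 max-pool
    size //= 2          # 48 -> 24 -> 12 -> 6 -> 3 -> 1

flat_units = size * size * 512   # block5 produces 512 channels

# Hypothetical head: Dense(256) -> Dense(128) -> Dense(64) -> Dense(4)
head_params = 0
units = flat_units
for width in (256, 128, 64, 4):
    head_params += dense_params(units, width)
    units = width
print(size, flat_units, head_params)
```

With 48x48 inputs the spatial map shrinks to a single position, so the pre-trained base hands only 512 numbers per image to the classifier. This may be one reason to try taking the output of an earlier block, as the first bullet allows.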

In [None]:
transfer_layer = vgg.get_layer('block5_pool')
vgg.trainable = False
# Count the layers in the VGG16 base
print(len(vgg.layers))

In [None]:
transfer_layer = vgg.get_layer('block5_pool')
vgg.trainable = False

# Add classification layers on top of the frozen VGG16 base

# Flatten the output of the 'block5_pool' layer (the last block of VGG16)
x = Flatten()(transfer_layer.output)



# Adding a Dense layer with 256 neurons
x = Dense(256, activation = 'relu')(x)

# Add a Dense Layer with 128 neurons
x =Dense(128, activation="relu")(x)

# Add a DropOut layer with Drop out ratio of 0.3
x =Dropout(0.3)(x)

# Add a Dense Layer with 64 neurons
x =Dense(64, activation="relu")(x)

# Add a Batch Normalization layer
x =BatchNormalization()(x)

# Adding the final dense layer with 4 neurons and use 'softmax' activation
pred = Dense(4, activation='softmax')(x)

vggmodel = Model(vgg.input, pred) # Initializing the model

### **Compiling and Training the VGG16 Model**

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

# Save the model only when validation loss improves (lower is better)
checkpoint = ModelCheckpoint("./vggmodel.h5", monitor = 'val_loss', verbose = 1, save_best_only = True, mode = 'min')

early_stopping = EarlyStopping(monitor = 'val_loss',
                          min_delta = 0,
                          patience = 3,
                          verbose = 1,
                          restore_best_weights = True
                          )

reduce_learningrate = ReduceLROnPlateau(monitor = 'val_loss',
                              factor = 0.2,
                              patience = 3,
                              verbose = 1,
                              min_delta = 0.0001)

# These callbacks only take effect if passed to fit() via callbacks=callbacks_list
callbacks_list = [early_stopping, checkpoint, reduce_learningrate]

epochs = 20

In [None]:
# Write your code to compile the vggmodel. Use categorical crossentropy as the loss function, Adam Optimizer with 0.001 learning rate, and set metrics to 'accuracy'. 
vggmodel.compile(optimizer=Adam(learning_rate=0.001), loss="categorical_crossentropy", metrics=["accuracy"])

vggmodel.summary()

In [None]:
# Fit the model on train_set, validating on validation_set, for 20 epochs
import time
start = time.time()
history = vggmodel.fit(train_set,
                       validation_data=validation_set,
                       epochs=20)
stop = time.time()

# Use a separate name so the `time` module is not overwritten
elapsed = str(stop - start)

print("VGG16 model requires " + elapsed + " seconds of training time")

In [None]:
list_ep = [i for i in range(1, 21)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.legend()

plt.show()

In [None]:
list_ep = [i for i in range(1, 21)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.legend()

plt.show()

### **Evaluating the VGG16 model**

In [None]:
# Write your code to evaluate model performance on the test set
vggmodel.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                                              target_size = (img_size,img_size),
                                                              color_mode = 'rgb',
                                                              batch_size = 128,
                                                              class_mode = 'categorical',
                                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                                              shuffle = True) 
test_images, test_labels = next(test_set)

# Write the name of your chosen model in the blank
pred = vggmodel.predict(test_images)
pred = np.argmax(pred, axis = 1) 
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


**Observations and Insights:**

Although training and validation accuracy improve consistently as training runs for more epochs, both remain low. The model reaches a test accuracy of 0.57.
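The heatmap in the evaluation cell divides each row of the confusion matrix by its row sum, so each diagonal entry is the recall of one class. A minimal sketch of that normalization in plain Python follows. The counts are made up purely for illustration and are not this notebook's results, and `row_normalize` is a hypothetical helper.

```python
def row_normalize(cm):
    """Divide each row by its sum so the diagonal holds per-class recall."""
    normalized = []
    for row in cm:
        total = sum(row)
        # Guard against a class that has no true samples in the batch
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Hypothetical counts (rows = actual, columns = predicted),
# ordered as happy, sad, neutral, surprise
cm = [
    [24, 3, 4, 1],
    [5, 14, 12, 1],
    [4, 10, 17, 1],
    [2, 1, 2, 27],
]

cmn = row_normalize(cm)
# Overall accuracy is the diagonal total divided by all samples
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
print(cmn[0][0], accuracy)
```

Reading the normalized matrix row by row shows which emotions are confused with each other, which a single accuracy number hides.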

## **ResNet101 Model**

In [None]:
import tensorflow as tf
import tensorflow.keras.applications as ap
from tensorflow.keras import Model

Resnet = ap.ResNet101(include_top = False, weights = "imagenet", input_shape=(48,48,3))
Resnet.summary()

### **Model Building**

In [None]:
transfer_layer_Resnet = Resnet.get_layer('conv5_block3_add')
Resnet.trainable=False
# Count the layers in the ResNet101 base
number = str(len(Resnet.layers))
print("Resnet has " + number + " layers")

In [None]:
transfer_layer_Resnet = Resnet.get_layer('conv5_block3_add')
Resnet.trainable=False

# Add classification layers on top of the frozen ResNet101 base

# Flatten the output of the 'conv5_block3_add' layer
x = Flatten()(transfer_layer_Resnet.output)

# Add a Dense layer with 256 neurons
x = Dense(256, activation = 'relu')(x)

# Add a Dense Layer with 128 neurons
x =Dense(128, activation="relu")(x)

# Add a DropOut layer with Drop out ratio of 0.3
x= Dropout(0.3)(x)

# Add a Dense Layer with 64 neurons
x =Dense(64, activation="relu")(x)

# Add a Batch Normalization layer
x= BatchNormalization()(x)

# Add the final dense layer with 4 neurons and use a 'softmax' activation
pred = Dense(4, activation = 'softmax')(x)

resnetmodel = Model(Resnet.input, pred) # Initializing the model

### **Compiling and Training the Model**

In [None]:
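from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

# 'val_accuracy' is the key Keras logs when the model is compiled with metrics=['accuracy']
checkpoint = ModelCheckpoint("./Resnetmodel.h5", monitor = 'val_accuracy', verbose = 1, save_best_only = True, mode = 'max')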

# Write your code here. You may play around with the hyperparameters if you wish.

early_stopping = EarlyStopping(monitor = 'val_loss',
                          min_delta = 0,
                          patience = 3,
                          verbose = 1,
                          restore_best_weights = True
                          )

# Write your code here. You may play around with the hyperparameters if you wish.

reduce_learningrate =  ReduceLROnPlateau(monitor = 'val_loss',
                              factor = 0.2,
                              patience = 3,
                              verbose = 1,
                              min_delta = 0.0001)


# These callbacks only take effect if passed to fit() via callbacks=callbacks_list
callbacks_list = [early_stopping, checkpoint, reduce_learningrate]

epochs = 20

In [None]:
# Write your code to compile your resnetmodel. Use categorical crossentropy as your loss function, Adam Optimizer with 0.001 learning rate, and set your metrics to 'accuracy'. 

resnetmodel.compile(optimizer=Adam(learning_rate=0.001), loss="categorical_crossentropy", metrics=["accuracy"])

resnetmodel.summary()

In [None]:
# Fit the model on train_set, validating on validation_set, for 20 epochs
import time
start = time.time()
history = resnetmodel.fit(train_set,
                       validation_data=validation_set,
                       epochs=20)
stop = time.time()

# Use a separate name so the `time` module is not overwritten
elapsed = str(stop - start)

print("ResNet101 model requires " + elapsed + " seconds of training time")


In [None]:
list_ep = [i for i in range(1, 21)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.legend()

plt.show()

In [None]:
list_ep = [i for i in range(1, 21)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.legend()

plt.show()

### **Evaluating the ResNet Model**

In [None]:
# Write your code to evaluate model performance on the test set
resnetmodel.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                                              target_size = (img_size,img_size),
                                                              color_mode = 'rgb',
                                                              batch_size = 128,
                                                              class_mode = 'categorical',
                                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                                              shuffle = True) 
test_images, test_labels = next(test_set)

# Write the name of your chosen model in the blank
pred = resnetmodel.predict(test_images)
pred = np.argmax(pred, axis = 1) 
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


## **EfficientNet Model**

In [None]:
import tensorflow as tf
import tensorflow.keras.applications as ap
from tensorflow.keras import Model
EfficientNet = ap.EfficientNetV2B2(include_top=False,weights="imagenet", input_shape= (48, 48, 3))

EfficientNet.summary()

### **Model Building**

**Build your own Architecture on top of the transfer layer. Be sure to have a Flatten layer after your transfer layer and also make sure you have 4 neurons and softmax activation function in your last dense layer**

In [None]:
transfer_layer_EfficientNet = EfficientNet.get_layer('block6e_expand_activation')
EfficientNet.trainable = False

# Flatten the output of the transfer layer
x=Flatten()(transfer_layer_EfficientNet.output)

# Add your Dense layers and/or BatchNormalization and Dropout layers
x=Dense(256, activation="relu")(x)
x=BatchNormalization()(x)

x=Dense(128, activation="relu")(x)
x=BatchNormalization()(x)
# Add your final Dense layer with 4 neurons and softmax activation function.

pred=Dense(4, activation="softmax")(x)

Efficientnetmodel = Model(EfficientNet.input, pred)

### **Compiling and Training the Model**

In [None]:
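from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

# 'val_accuracy' is the key Keras logs when the model is compiled with metrics=['accuracy']
checkpoint = ModelCheckpoint("./Efficientnetmodel.h5", monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')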

early_stopping = EarlyStopping(monitor = 'val_loss',
                          min_delta = 0,
                          patience = 3,
                          verbose = 1,
                          restore_best_weights = True
                          )

reduce_learningrate = ReduceLROnPlateau(monitor = 'val_loss',
                              factor = 0.2,
                              patience = 3,
                              verbose = 1,
                              min_delta = 0.0001)
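# These callbacks only take effect if passed to fit() via callbacks=callbacks_list
callbacks_list = [early_stopping,checkpoint,reduce_learningrate]

# Matches the 20 epochs used in the fit() call and in the plots
epochs = 20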

In [None]:
# Write your code to compile your Efficientnetmodel. Use categorical crossentropy as your loss function, Adam Optimizer with 0.001 learning rate, and set your metrics to 'accuracy'.

Efficientnetmodel.compile(optimizer=Adam(learning_rate=0.001), loss="categorical_crossentropy", metrics=["accuracy"])

Efficientnetmodel.summary()

In [None]:
# Fit the model on train_set, validating on validation_set, for 20 epochs
import time
start = time.time()
history = Efficientnetmodel.fit(train_set,
                       validation_data=validation_set,
                       epochs=20)
stop = time.time()

# Use a separate name so the `time` module is not overwritten
elapsed = str(stop - start)

print("EfficientNet model requires " + elapsed + " seconds of training time")


In [None]:
list_ep = [i for i in range(1, 21)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.legend()

plt.show()

In [None]:
list_ep = [i for i in range(1, 21)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.legend()

plt.show()

### **Evaluating the EfficientNet Model**

In [None]:
# Write your code to evaluate the model performance on the test set

Efficientnetmodel.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                                              target_size = (img_size,img_size),
                                                              color_mode = 'rgb',
                                                              batch_size = 128,
                                                              class_mode = 'categorical',
                                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                                              shuffle = True) 
test_images, test_labels = next(test_set)

# Predict with the EfficientNet model (not the ResNet model from the previous section)
pred = Efficientnetmodel.predict(test_images)
pred = np.argmax(pred, axis = 1) 
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


## **Building a Complex Neural Network Architecture**

In this section, we will build a more complex Convolutional Neural Network Model that has close to as many parameters as we had in our Transfer Learning Models. However, we will have only 1 input channel for our input images.

## **Creating our Data Loaders**

In this section, we create the data loaders that feed the more complex Convolutional Neural Network. We use color_mode = 'grayscale', so each image has a single channel.

In [None]:
batch_size  = 32
img_size = 48

datagen_train = ImageDataGenerator(horizontal_flip = True,
                                    brightness_range = (0., 2.),
                                    rescale = 1./255,
                                    shear_range = 0.3)

train_set = datagen_train.flow_from_directory(folder_path + "train",
                                              target_size = (img_size, img_size),
                                              color_mode = 'grayscale',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)


datagen_validation = ImageDataGenerator(rescale=1./255)

validation_set = datagen_validation.flow_from_directory(folder_path + "validation",
                                              target_size = (img_size, img_size),
                                              color_mode = 'grayscale',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)

datagen_test = ImageDataGenerator(rescale=1./255)


test_set = datagen_test.flow_from_directory(folder_path + "test",
                                              target_size = (img_size, img_size),
                                              color_mode = 'grayscale',
                                              batch_size = batch_size,
                                              class_mode = 'categorical',
                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                              shuffle = True)

### **Model Building**

* In this network, we plan to have 5 Convolutional Blocks
* Add first Conv2D layer with **64 filters** and a **kernel size of 2**. Use the 'same' padding and provide the **input shape = (48, 48, 1)**. Use **'relu' activation**.
* Add your BatchNormalization layer followed by a LeakyReLU layer with a Leaky ReLU parameter of **0.1**.
* Add MaxPooling2D layer with **pool size = 2**.
* Add a Dropout layer with a Dropout Ratio of **0.2**. This completes the first Convolutional block.
* Add a second Conv2D layer with **128 filters** and a **kernel size of 2**. Use the **'same' padding** and **'relu' activation.**
* Follow this up with BatchNormalization, LeakyReLU, MaxPooling2D, and Dropout layers like those above to complete your second Convolutional Block.
* Add a third Conv2D layer with **512 filters** and a **kernel size of 2**. Use the **'same' padding** and **'relu' activation.** Once again, follow it up with BatchNormalization, LeakyReLU, MaxPooling2D, and Dropout layers to complete your third Convolutional block.
* Add a fourth block, with the Conv2D layer having **512 filters**.
* Add the fifth block, having **128 filters**.
* Then add your Flatten layer, followed by your Dense layers.
* Add your first Dense layer with **256 neurons** followed by a BatchNormalization layer, a **'relu'** Activation, and a Dropout layer. This forms your first Fully Connected block
* Add your second Dense layer with **512 neurons**, again followed by a BatchNormalization layer, **relu** activation, and a Dropout layer.
* Add your final Dense layer with 4 neurons.
* Compile your model with the optimizer of your choice.
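To get a feel for where this network's parameters sit, the sketch below counts the Conv2D weights and the size of the flattened feature vector for the layer list above. It assumes, as in the implementation that follows, that only the first three blocks include max-pooling. BatchNormalization parameters are left out, and the helper names are hypothetical.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return (kernel * kernel * in_ch + 1) * out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return (in_units + 1) * out_units


def trace_cnn(kernel=2, size=48, in_ch=1):
    filters = [64, 128, 512, 512, 128]
    # Assumption: blocks 4 and 5 have no pooling, as in the code cell
    pooled = [True, True, True, False, False]
    conv_total = 0
    for out_ch, pool in zip(filters, pooled):
        conv_total += conv2d_params(kernel, in_ch, out_ch)
        in_ch = out_ch
        if pool:
            size //= 2  # a 2x2 max-pool halves height and width
    flat_units = size * size * in_ch
    return conv_total, flat_units


conv_total, flat_units = trace_cnn()
first_dense = dense_params(flat_units, 256)
print(conv_total, flat_units, first_dense)  # 1607232 4608 1179904
```

Under these assumptions the first Dense layer alone holds about 1.18 million parameters, comparable to the roughly 1.61 million in all five convolutions combined.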

In [None]:
no_of_classes = 4
  
model3 = Sequential()

# Add 1st CNN Block
model3.add(Conv2D(64, (2,2), activation='relu', input_shape=(48, 48, 1), padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 2nd CNN Block
model3.add(Conv2D(128, (2,2), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))

# Add 3rd CNN Block
model3.add(Conv2D(512, (2,2), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 4th CNN Block
model3.add(Conv2D(512, (2,2), activation='relu', padding = 'same'))

# Add 5th CNN Block
model3.add(Conv2D(128, (2,2), activation='relu', padding = 'same'))


model3.add(Flatten())

# First fully connected layer
model3.add(Dense(256, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))



# Second fully connected layer
model3.add(Dense(512, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))

model3.add(Dense(no_of_classes, activation = 'softmax'))

### **Compiling and Training the Model**

In [None]:
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger

epochs = 35

steps_per_epoch = train_set.n//train_set.batch_size
validation_steps = validation_set.n//validation_set.batch_size

# The keyword is `mode`, not `model`
checkpoint = ModelCheckpoint("model3.h5", monitor = 'val_accuracy',
                            save_weights_only = True, mode = 'max', verbose = 1)

reduce_lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.1, patience = 2, min_lr = 0.0001 , mode = 'auto')

callbacks = [checkpoint, reduce_lr]

In [None]:
# Write your code to compile your model3. Use categorical crossentropy as the loss function, Adam Optimizer with 0.003 learning rate, and set metrics to 'accuracy'.
model3.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate=0.003), metrics=["accuracy"])

model3.summary()

In [None]:
# Fit the model on train_set, validating on validation_set, for 35 epochs
import time
start = time.time()
history = model3.fit(train_set,
                     validation_data=validation_set,
                     epochs=35)

stop = time.time()
# Use a separate name so the `time` module is not overwritten
elapsed = str(stop - start)

print("model3 requires " + elapsed + " seconds of training time")

In [None]:
list_ep = [i for i in range(1, 36)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.title("Model 3 History")

plt.legend()

plt.show()

In [None]:
list_ep = [i for i in range(1, 36)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.title("Model 3 History")

plt.legend()

plt.show()

### **Evaluating the Model on Test Set**

In [None]:
model3.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                                              target_size = (img_size,img_size),
                                                              color_mode = 'grayscale',
                                                              batch_size = 128,
                                                              class_mode = 'categorical',
                                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                                              shuffle = True) 
test_images, test_labels = next(test_set)

# Write the name of your chosen model in the blank
pred = model3.predict(test_images)
pred = np.argmax(pred, axis = 1) 
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


## **Hyperparameter Tuning**

### **Kernel size (3,3), learning rate 0.003**
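Before comparing results, it helps to see how much larger the network becomes when the kernel grows from (2,2) to (3,3). The sketch below counts only Conv2D weights and biases for the five-layer filter list used in this notebook. The function name `conv_stack_params` is hypothetical.

```python
def conv_stack_params(kernel, in_ch=1, filters=(64, 128, 512, 512, 128)):
    """Total weights plus biases of a stack of square-kernel Conv2D layers."""
    total = 0
    for out_ch in filters:
        total += (kernel * kernel * in_ch + 1) * out_ch
        in_ch = out_ch  # the next layer reads this layer's output channels
    return total


small = conv_stack_params(2)  # (2,2) kernels
large = conv_stack_params(3)  # (3,3) kernels
ratio = large / small
print(small, large, round(ratio, 2))  # 1607232 3614592 2.25
```

The convolutional part grows by roughly 9/4, which is worth keeping in mind when comparing training time and overfitting between the two settings.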

In [None]:
no_of_classes = 4
  
model3 = Sequential()

# Add 1st CNN Block
model3.add(Conv2D(64, (3,3), activation='relu', input_shape=(48, 48, 1), padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 2nd CNN Block
model3.add(Conv2D(128, (3,3), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))

# Add 3rd CNN Block
model3.add(Conv2D(512, (3,3), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 4th CNN Block
model3.add(Conv2D(512, (3,3), activation='relu', padding = 'same'))

# Add 5th CNN Block
model3.add(Conv2D(128, (3,3), activation='relu', padding = 'same'))


model3.add(Flatten())

# First fully connected layer
model3.add(Dense(256, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))



# Second fully connected layer
model3.add(Dense(512, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))

model3.add(Dense(no_of_classes, activation = 'softmax'))

In [None]:
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger

epochs = 35

steps_per_epoch = train_set.n//train_set.batch_size
validation_steps = validation_set.n//validation_set.batch_size

# The keyword is `mode`, not `model`
checkpoint = ModelCheckpoint("model3.h5", monitor = 'val_accuracy',
                            save_weights_only = True, mode = 'max', verbose = 1)

reduce_lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.1, patience = 2, min_lr = 0.0001 , mode = 'auto')

callbacks = [checkpoint, reduce_lr]

In [None]:
# Write your code to compile your model3. Use categorical crossentropy as the loss function, Adam Optimizer with 0.003 learning rate, and set metrics to 'accuracy'.
model3.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate=0.003), metrics=["accuracy"])

model3.summary()


In [None]:
# Fit the model on train_set, validating on validation_set, for 35 epochs
import time
start = time.time()
history = model3.fit(train_set,
                     validation_data=validation_set,
                     epochs=35)

stop = time.time()
# Use a separate name so the `time` module is not overwritten
elapsed = str(stop - start)

print("model3 requires " + elapsed + " seconds of training time")

In [None]:
list_ep = [i for i in range(1, 36)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.title("Model 3 History")

plt.legend()

plt.show()

In [None]:
list_ep = [i for i in range(1, 36)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.title("Model 3 Training History")

plt.legend()

plt.show()

In [None]:
model3.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                                              target_size = (img_size,img_size),
                                                              color_mode = 'grayscale',
                                                              batch_size = 128,
                                                              class_mode = 'categorical',
                                                              classes = ['happy', 'sad', 'neutral', 'surprise'],
                                                              shuffle = True) 
test_images, test_labels = next(test_set)

# Write the name of your chosen model in the blank
pred = model3.predict(test_images)
pred = np.argmax(pred, axis = 1) 
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()


### **Kernel size (3,3), learning rate 0.001**

In [None]:
no_of_classes = 4
  
model3 = Sequential()

# Add 1st CNN Block
model3.add(Conv2D(64, (3,3), activation='relu', input_shape=(48, 48, 1), padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 2nd CNN Block
model3.add(Conv2D(128, (3,3), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))

# Add 3rd CNN Block
model3.add(Conv2D(512, (3,3), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 4th CNN Block
model3.add(Conv2D(512, (3,3), activation='relu', padding = 'same'))

# Add 5th CNN Block
model3.add(Conv2D(128, (3,3), activation='relu', padding = 'same'))


model3.add(Flatten())

# First fully connected layer
model3.add(Dense(256, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))



# Second fully connected layer
model3.add(Dense(512, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))

model3.add(Dense(no_of_classes, activation = 'softmax'))

In [None]:
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger

epochs = 35

steps_per_epoch = train_set.n//train_set.batch_size
validation_steps = validation_set.n//validation_set.batch_size

# The keyword is `mode`, not `model`
checkpoint = ModelCheckpoint("model3.h5", monitor = 'val_accuracy',
                            save_weights_only = True, mode = 'max', verbose = 1)

reduce_lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.1, patience = 2, min_lr = 0.0001 , mode = 'auto')

callbacks = [checkpoint, reduce_lr]

In [None]:
# Compile model3 with categorical crossentropy as the loss function, the Adam optimizer with a 0.001 learning rate, and 'accuracy' as the metric.
model3.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate=0.001), metrics=["accuracy"])

model3.summary()



In [None]:
# Fit the model on train_set, validating on validation_set, for 35 epochs
import time
start = time.time()
history = model3.fit(train_set,
                     validation_data=validation_set,
                     epochs=35)

stop = time.time()
# Use a separate name so the `time` module is not overwritten
elapsed = str(stop - start)

print("model3 requires " + elapsed + " seconds of training time")

In [None]:
list_ep = [i for i in range(1, 36)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.title("Model 3 History")

plt.legend()

plt.show()

In [None]:
list_ep = [i for i in range(1, 36)]

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.title("Model 3 Training History")

plt.legend()

plt.show()

In [None]:
model3.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# Rebuild the test generator; with batch_size = 128, next() returns one batch of up to 128 images
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                            target_size = (img_size, img_size),
                                            color_mode = 'grayscale',
                                            batch_size = 128,
                                            class_mode = 'categorical',
                                            classes = ['happy', 'sad', 'neutral', 'surprise'],
                                            shuffle = True)
test_images, test_labels = next(test_set)

# Predict class probabilities with model3
pred = model3.predict(test_images)
# Convert predicted probabilities and one-hot labels to class indices
pred = np.argmax(pred, axis = 1)
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
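
The heatmap above shows a row-normalized confusion matrix: each row is divided by its sum, so the diagonal entry for a class is that class's recall. The following is a minimal dependency-free sketch of the same computation. The label indices are made up for illustration (they are not results from this notebook), and the helper names `confusion_counts` and `row_normalize` are hypothetical.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Count how often each actual class (row) was predicted as each class (column)."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm


def row_normalize(cm):
    """Divide every row by its sum so that each row adds up to 1."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical class indices: 0 = happy, 1 = sad, 2 = neutral, 3 = surprise
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
y_pred_demo = [0, 0, 0, 2, 1, 1, 2, 2, 2, 2, 2, 1, 3, 3, 3, 0]

cm_demo = confusion_counts(y_true_demo, y_pred_demo, n_classes=4)
cmn_demo = row_normalize(cm_demo)

for row in cmn_demo:
    print(row)
```

In this toy example the second row reads `[0.0, 0.5, 0.5, 0.0]`: half of the "sad" samples were predicted correctly and half were predicted as "neutral".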


### **Kernel size (2,2), learning rate 0.001**

In [None]:
no_of_classes = 4
  
model3 = Sequential()

# Add 1st CNN Block
model3.add(Conv2D(64, (2,2), activation='relu', input_shape=(48, 48, 1), padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 2nd CNN Block
model3.add(Conv2D(128, (2,2), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))

# Add 3rd CNN Block
model3.add(Conv2D(512, (2,2), activation='relu', padding = 'same'))
model3.add(BatchNormalization())
model3.add(LeakyReLU(0.1))
model3.add(MaxPooling2D(pool_size=(2,2)))
model3.add(Dropout(0.2))


# Add 4th CNN Block
model3.add(Conv2D(512, (2,2), activation='relu', padding = 'same'))

# Add 5th CNN Block
model3.add(Conv2D(128, (2,2), activation='relu', padding = 'same'))


model3.add(Flatten())

# First fully connected layer
model3.add(Dense(256, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))



# Second fully connected layer
model3.add(Dense(512, activation="relu"))
model3.add(BatchNormalization())
model3.add(Dropout(0.2))

model3.add(Dense(no_of_classes, activation = 'softmax'))
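
As a cross-check for `model3.summary()`, the output shapes and parameter counts of the convolutional stack above can be worked out by hand. The sketch below assumes stride 1 for every convolution (the Keras default), `padding = 'same'`, and 2x2 max pooling after the first three blocks only, as in the code above. The helper name `conv_params` is introduced here for illustration, and the BatchNormalization layers and the later Dense layers are not counted.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    kernel_h, kernel_w = kernel
    return (kernel_h * kernel_w * in_channels + 1) * filters


size, channels = 48, 1  # 48x48 grayscale input

# (number of filters, whether the block ends with 2x2 max pooling)
blocks = [(64, True), (128, True), (512, True), (512, False), (128, False)]

conv_param_counts = []
for filters, pooled in blocks:
    conv_param_counts.append(conv_params((2, 2), channels, filters))
    channels = filters   # 'same' padding with stride 1 keeps the spatial size
    if pooled:
        size //= 2       # 2x2 pooling halves height and width

flatten_units = size * size * channels
dense1_params = (flatten_units + 1) * 256  # first fully connected layer

print(conv_param_counts)  # [320, 32896, 262656, 1049088, 262272]
print(flatten_units)      # 4608
print(dense1_params)      # 1179904
```

The feature map shrinks from 48x48 to 6x6 across the three pooling steps, so the flattened vector has 6 x 6 x 128 = 4608 entries, and the first Dense layer alone holds more parameters than any single convolution in this stack.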

In [None]:
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger

epochs = 35

steps_per_epoch = train_set.n//train_set.batch_size
validation_steps = validation_set.n//validation_set.batch_size

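# Checkpoint the model weights, monitoring validation accuracy
checkpoint = ModelCheckpoint("model3.h5", monitor = 'val_accuracy',
                            save_weights_only = True, mode = 'max', verbose = 1)

# Multiply the learning rate by 0.1 when val_loss has not improved for 2 epochs, with a floor of 0.0001
reduce_lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.1, patience = 2, min_lr = 0.0001, mode = 'auto')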

callbacks = [checkpoint, reduce_lr]

In [None]:
# Compile model3 with categorical cross-entropy loss, the Adam optimizer (learning rate 0.001), and accuracy as the metric.
model3.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate=0.001), metrics=["accuracy"])

model3.summary()



In [None]:
# Fit model3 on train_set, validating on validation_set, for 35 epochs.
# Note: the `callbacks` list defined above is not passed to fit() here.
import time
start = time.time()
history = model3.fit(train_set,
                     validation_data=validation_set,
                     epochs=35)

stop = time.time()
elapsed = stop - start  # use a separate name so the `time` module is not shadowed

print(f"model3 training took {elapsed:.1f} seconds")

In [None]:
# One x-axis entry per recorded epoch
list_ep = list(range(1, len(history.history['accuracy']) + 1))

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['accuracy'], ls = '--', label = 'accuracy')

plt.plot(list_ep, history.history['val_accuracy'], ls = '--', label = 'val_accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epochs')

plt.title("Model 3 History")

plt.legend()

plt.show()

In [None]:
# One x-axis entry per recorded epoch
list_ep = list(range(1, len(history.history['loss']) + 1))

plt.figure(figsize = (8, 8))

plt.plot(list_ep, history.history['loss'], ls = '--', label = 'loss')

plt.plot(list_ep, history.history['val_loss'], ls = '--', label = 'val_loss')

plt.ylabel('Loss')

plt.xlabel('Epochs')

plt.title("Model 3 Training History")

plt.legend()

plt.show()

In [None]:
# Evaluate model3 on the test set
model3.evaluate(test_set)

In [None]:
# Plot the confusion matrix and generate a classification report for the model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# Rebuild the test generator; with batch_size = 128, next() returns one batch of up to 128 images
test_set = datagen_test.flow_from_directory(folder_path + "test",
                                            target_size = (img_size, img_size),
                                            color_mode = 'grayscale',
                                            batch_size = 128,
                                            class_mode = 'categorical',
                                            classes = ['happy', 'sad', 'neutral', 'surprise'],
                                            shuffle = True)
test_images, test_labels = next(test_set)

# Predict class probabilities with model3
pred = model3.predict(test_images)
# Convert predicted probabilities and one-hot labels to class indices
pred = np.argmax(pred, axis = 1)
y_true = np.argmax(test_labels, axis = 1)

# Printing the classification report
print(classification_report(y_true, pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_true, pred)

cmn = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize = (8, 5))
sns.heatmap(cmn, annot = True,  fmt = '.2f', xticklabels = ['happy', 'sad', 'neutral', 'surprise'], yticklabels = ['happy', 'sad', 'neutral', 'surprise'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
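
`classification_report` derives precision, recall and F1 for each class from the same counts that fill the confusion matrix. The following is a minimal pure-Python sketch of that arithmetic. The 4x4 count matrix is made up for illustration (rows are actual classes, columns are predicted classes; these are not this model's results), and the function name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) for each class of a square count matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        true_positive = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Hypothetical counts: rows = actual, columns = predicted
cm_example = [
    [3, 0, 1, 0],
    [0, 2, 2, 0],
    [0, 1, 3, 0],
    [1, 0, 0, 3],
]
class_names = ['happy', 'sad', 'neutral', 'surprise']

report = per_class_metrics(cm_example)
for name, (precision, recall, f1) in zip(class_names, report):
    print(f"{name:>8}  precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```

Recall matches the diagonal of the row-normalized heatmap, while precision is computed down the columns, which is why a class can have high recall and low precision at the same time.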
