# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

Automated facial expression recognition provides an objective assessment of emotions. Human-based assessment of emotions has many limitations and biases, and automated facial expression technology has been found to deliver better insight into behavior patterns. AI-based emotion detection from facial expressions is useful for automatically measuring consumers’ engagement with content and brands, audience engagement with advertisements, and customer satisfaction in the retail sector, as well as for psychological analysis, law enforcement, and similar applications.

In [3]:
#@title Explanation Video
from IPython.display import HTML

HTML("""<video width="854" height="480" controls>
  <source src="https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Hackathon3b_expression_recognition.mp4" type="video/mp4">
</video>
""")

**Objectives:**

**Stage 4 (15 Marks):** Train a CNN Model and perform Expression Recognition in the EFR Mobile App.

**Stage 5 (5 Marks):** Test for Anti-Face Spoofing on the EFR Mobile App.

## **Stage 4 (15 Marks)**

**(i) Train a CNN Model for Expression Recognition on given Expression data  
(ii) Deploy the Model and Perform Expression Recognition on Team Data through the EFR Mobile App**


---


* Define and train a CNN for expression recognition on the data in the "Expression_data" folder, which is segregated by expression.
* Collect your team data using the EFR application, test your model on it, and optimize the CNN architecture so that it predicts the correct labels for the images.
* Save and download the trained expression model and upload it to the FTP server (refer to the [Filezilla Installation and Configuration document](https://drive.google.com/file/d/19UIKpyVK4r12Dxklo8quQdZQ31PWpiKM/view?usp=drive_link)).

* Update the **“exp_recognition.py”** file in the server. Open the files in the terminal (Command prompt) and provide the code for predicting the expression on the face (Note: To define the architecture of your trained model, you'll need to define it in the file **"exp_recognition_model.py"**).

* Test your model on the mobile app for Expression Recognition and Sequence Expression. Your team can also see your results in your terminal.


* Grading Scheme:
> * Expression Recognition (12M): Awarded if the correct expression class is returned for the face using the mobile app’s “Expression Recognition” functionality
> * Sequence Expression (3M): Awarded for getting three consecutive correct expressions using the mobile app’s “Sequence Expressions” functionality
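For the `exp_recognition.py` step above, the prediction code ultimately has to turn the model's raw outputs into an expression name. The following is a minimal sketch of that mapping, not the contents of the actual server file. It assumes the class folders carry exactly the seven names listed in this notebook, and relies on torchvision's `ImageFolder` numbering class folders in sorted order. `CLASS_NAMES` and `logits_to_expression` are hypothetical names introduced for illustration.

```python
import math

# Assumption: the training folders are named exactly like this. ImageFolder
# assigns indices to class folders in sorted order, so the list is sorted too.
CLASS_NAMES = ["ANGER", "DISGUST", "FEAR", "HAPPINESS", "NEUTRAL", "SADNESS", "SURPRISE"]

def logits_to_expression(logits, class_names=CLASS_NAMES):
    """Map one row of raw model outputs to (label, softmax probability)."""
    if len(logits) != len(class_names):
        raise ValueError("expected one logit per class")
    peak = max(logits)  # Subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    best = max(range(len(logits)), key=lambda i: logits[i])  # argmax
    return class_names[best], exps[best] / total

# Made-up logits for a single face, for illustration only
label, confidence = logits_to_expression([0.1, -1.2, 0.3, 2.5, 0.0, -0.4, 0.9])
print(label, round(confidence, 3))
```

With a PyTorch model, a list like this can be obtained from a single-image batch via `model(x)[0].tolist()`. If your folder names differ, check `train_dataset.classes` and use that list instead.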

**Download the dataset**

In [1]:
#@title Run this cell to download the dataset

from IPython import get_ipython
ipython = get_ipython()

notebook="M3_Hackathon" #name of the notebook

def setup():
    # Download and extract the expression dataset
    ipython.run_line_magic("sx", "wget https://cdn.talentsprint.com/aiml/Experiment_related_data/Expression_data.zip")
    ipython.run_line_magic("sx", "unzip Expression_data.zip")

    # Install matching versions: torchvision 0.20.1 is the release paired with torch 2.5.1
    ipython.run_line_magic("sx", "pip install torch==2.5.1 torchvision==0.20.1")
    ipython.run_line_magic("sx", "pip install opencv-python")
    print("Setup completed successfully")

setup()

Setup completed successfully


**Dataset attributes:**

During the setup you have downloaded the Expression data:

* **Expression_data**: In this folder, the images are segregated by expression
> * Expressions available: ANGER, DISGUST, FEAR, HAPPINESS, NEUTRAL, SADNESS, SURPRISE
> * Each class is organised as one folder
> * There are ~18000 total images in the training data and ~4500 total images in the testing data
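To check the class balance before training, it can help to count the images in each class folder. The sketch below uses only the standard library. The helper name `count_images_per_class` and the extension list are assumptions made for illustration, and the path is the training folder used by the training cell in this notebook.

```python
from pathlib import Path

# Image extensions to count; extend this set if your data uses other formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def count_images_per_class(root):
    """Return {class_folder_name: image_count} for a folder with one subfolder per class."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts  # Nothing to count if the folder is missing
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts

# Training folder used later in this notebook
train_root = "/content/Expression_data/Facial_expression_train"
for name, n in count_images_per_class(train_root).items():
    print(f"{name:<10} {n}")
```

A strongly uneven distribution is worth keeping in mind when reading the overall accuracy, since a model can score reasonably by favouring the largest classes.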

In [2]:
%ls

[0m[01;34mExpression_data[0m/  Expression_data.zip  [01;34m__MACOSX[0m/  [01;34msample_data[0m/


**Imports: All the imports are defined here**



In [3]:
%matplotlib inline
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader,Dataset
import matplotlib.pyplot as plt
import torchvision.utils
import numpy as np
import random
from PIL import Image

from torch.autograd import Variable
import PIL.ImageOps

from torch import optim
import torch.nn.functional as F
import os
import warnings
from time import sleep
import sys
warnings.filterwarnings('ignore')

For the following step, to obtain hints on building a CNN model for face expression, you may refer to this [article](https://drive.google.com/open?id=1P2rpaWW3tOtGGnw4dvtdZ4hjoc8iDNst)

**Define and train a CNN model for expression recognition**

In [4]:

# Define the custom ResNet-50 model
class CustomResNet50(nn.Module):
    def __init__(self, num_classes=7):
        super(CustomResNet50, self).__init__()
        # Load pretrained ResNet-50
        self.model = models.resnet50(pretrained=True)

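        # Replace the first convolutional layer with a 3x3, stride-1 conv suited to 48x48 inputs.
        # Note: this new layer is randomly initialized; the pretrained conv1 weights are discarded.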
        self.model.conv1 = nn.Conv2d(
            in_channels=3,  # Input channels (e.g., RGB images)
            out_channels=64,  # Output channels
            kernel_size=3,  # Smaller kernel size for small images
            stride=1,  # Reduced stride for smaller input size
            padding=1,  # Ensure size consistency
            bias=False
        )

        # Update the fully connected layer for 7 classes
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        return self.model(x)

# Initialize the model
num_classes = 7
model = CustomResNet50(num_classes=num_classes)

# Move the model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Print model summary
from torchsummary import summary
summary(model, input_size=(3, 48, 48))

# Test with dummy input
dummy_input = torch.randn(32, 3, 48, 48).to(device)  # Batch size of 32
output = model(dummy_input)
print(f"Output shape: {output.shape}")  # Expected: [32, 7]


Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 125MB/s]


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 48, 48]           1,728
       BatchNorm2d-2           [-1, 64, 48, 48]             128
              ReLU-3           [-1, 64, 48, 48]               0
         MaxPool2d-4           [-1, 64, 24, 24]               0
            Conv2d-5           [-1, 64, 24, 24]           4,096
       BatchNorm2d-6           [-1, 64, 24, 24]             128
              ReLU-7           [-1, 64, 24, 24]               0
            Conv2d-8           [-1, 64, 24, 24]          36,864
       BatchNorm2d-9           [-1, 64, 24, 24]             128
             ReLU-10           [-1, 64, 24, 24]               0
           Conv2d-11          [-1, 256, 24, 24]          16,384
      BatchNorm2d-12          [-1, 256, 24, 24]             512
           Conv2d-13          [-1, 256, 24, 24]          16,384
      BatchNorm2d-14          [-1, 256,

In [6]:
# Train the CNN model on Expression_data


# Define the data transformations
transform = transforms.Compose([
    #transforms.Grayscale(num_output_channels=1),
    transforms.Resize((48, 48)),
    transforms.ToTensor(),
    #transforms.Normalize((0.5,), (0.5,))
])

# Load the dataset
train_dataset = torchvision.datasets.ImageFolder(root='/content/Expression_data/Facial_expression_train', transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Training loop
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device) # Assuming your model is already defined as 'model'

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
#optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
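total, correct = 0, 0
accuracy = 0.0  # Training accuracy over the most recent 100-step window
num_epochs = 35  # Adjust as needed
model.train()  # Make sure BatchNorm layers run in training mode
for epoch in range(num_epochs):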
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Calculate accuracy
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        if (i + 1) % 100 == 0:
            accuracy = 100 * correct / total
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {loss.item():.4f}, Accuracy: {accuracy:.2f}%')
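            total, correct = 0, 0  # Reset the counters for the next 100-step window
    # Stop early once the most recent 100-step training accuracy reaches 90%
    if accuracy >= 90:
        break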

print("Training finished.")


Epoch [1/35], Step [100/569], Loss: 1.9745, Accuracy: 27.19%
Epoch [1/35], Step [200/569], Loss: 1.6368, Accuracy: 30.38%
Epoch [1/35], Step [300/569], Loss: 1.7845, Accuracy: 33.50%
Epoch [1/35], Step [400/569], Loss: 1.6004, Accuracy: 32.53%
Epoch [1/35], Step [500/569], Loss: 1.5882, Accuracy: 31.94%
Epoch [2/35], Step [100/569], Loss: 1.6802, Accuracy: 31.83%
Epoch [2/35], Step [200/569], Loss: 1.8280, Accuracy: 31.44%
Epoch [2/35], Step [300/569], Loss: 1.5618, Accuracy: 33.53%
Epoch [2/35], Step [400/569], Loss: 1.6883, Accuracy: 32.78%
Epoch [2/35], Step [500/569], Loss: 1.7183, Accuracy: 35.06%
Epoch [3/35], Step [100/569], Loss: 1.5974, Accuracy: 34.75%
Epoch [3/35], Step [200/569], Loss: 1.8277, Accuracy: 33.97%
Epoch [3/35], Step [300/569], Loss: 1.7669, Accuracy: 33.97%
Epoch [3/35], Step [400/569], Loss: 1.5208, Accuracy: 33.00%
Epoch [3/35], Step [500/569], Loss: 1.7767, Accuracy: 36.44%
Epoch [4/35], Step [100/569], Loss: 1.4778, Accuracy: 35.42%
Epoch [4/35], Step [200/

**Test your model and optimize CNN architecture for predicting the labels correctly**

In [7]:
# Save the model
torch.save(model.state_dict(), 'expression_model.pth')

In [None]:
# YOUR CODE HERE for test evaluation


In [8]:
test_dataset = torchvision.datasets.ImageFolder(root='/content/Expression_data/Facial_expression_test', transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)  # No shuffling needed for evaluation

In [9]:

# Define the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Rebuild the architecture and load the saved weights
model = CustomResNet50()  # Must be the same class that was used when the weights were saved
model.load_state_dict(torch.load('expression_model.pth', map_location=device))
model = model.to(device)
model.eval()  # Set the model to evaluation mode

# Testing loop
correct = 0
total = 0
with torch.no_grad():  # Disable gradient calculations during testing
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

test_accuracy = 100 * correct / total
print(f"Test Accuracy: {test_accuracy:.2f}%")

Test Accuracy: 37.62%
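A single accuracy number does not show which expressions are being confused with each other. One way to look deeper is a confusion matrix with per-class recall. The sketch below is plain Python and runs on toy labels that are not results from the model above. The function names are hypothetical helpers introduced for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts samples whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (None if the class never occurs)."""
    recalls = []
    for i, row in enumerate(matrix):
        n = sum(row)
        recalls.append(row[i] / n if n else None)
    return recalls

# Toy labels for illustration only; these are not results from the model above.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(per_class_recall(cm))
```

To use this with the testing loop, collect `labels.tolist()` and `predicted.tolist()` into two lists inside the loop and call `confusion_matrix(..., num_classes=7)` afterwards.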


**Team Data Collection (activate the server first)**

  - (This can be done on the day of the Hackathon once the login username and password are given)

Activate the Server Access
* Open the terminal (Command Prompt)
* Log in over SSH by typing **ssh (username)@aiml-sandbox1.talentsprint.com**, using the login username given to you.

Eg: `ssh b16h3gxx@aiml-sandbox1.talentsprint.com`

  (If it is your first time connecting to the server from this computer, accept the connection by typing "yes".)
* After logging in over SSH, activate your virtual environment with the command **source venv/bin/activate** and press Enter.
* Start the server with the command **sh runserver.sh** and press Enter.
* Ensure the server is active before collecting team data in the mobile app.


**Collect your team data using the EFR Mobile App and fine-tune the CNN for expression data on your team**

Team Data Collection

* Follow the "Mobile_APP_Documentation" to collect expression photos of your team. These will be stored on the server for which you were given login credentials.

[Mobile_APP_Documentation](https://drive.google.com/file/d/1F9SU-BwKViK_eZV2-P3pymvGUILUoVFf/view?usp=drive_link)


**Download your team expression data from the EFR app into your colab notebook using the links provided below**

NOTE: Replace the string "username" with your login username (such as b16h3gxx) in the cell below for expression images.

This data will be useful for testing the CNN trained above.

In [None]:
!wget -nH --recursive --no-parent --reject 'index.*' https://aiml-sandbox.talentsprint.com/expression_detection/username/captured_images_with_Expression/ --cut-dirs=3  -P ./captured_images_with_Expression

In [None]:
%ls

In [None]:
# YOUR CODE HERE for loading the team expression data. Note: Use the same transform that was used for Expression_Data.
# YOUR CODE HERE for the DataLoader
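When the team data is loaded with `ImageFolder`, its label indices only line up with the trained model if both folders produce the same name-to-index mapping. The sketch below mirrors how `ImageFolder` numbers classes (subfolder names in sorted order) so the two folders can be compared before evaluating. It assumes the downloaded team data has one subfolder per expression, which is not confirmed here. The helper names and the `team_root` path are hypothetical and should be adjusted to the actual download location.

```python
from pathlib import Path

def class_to_idx(root):
    """Mirror ImageFolder's labelling: subfolder names in sorted order, numbered from 0."""
    names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: i for i, name in enumerate(names)}

def labels_match(train_root, team_root):
    """True when both folders would yield identical name -> index mappings."""
    return class_to_idx(train_root) == class_to_idx(team_root)

train_root = Path("/content/Expression_data/Facial_expression_train")
team_root = Path("./captured_images_with_Expression")  # Assumed download location
if train_root.is_dir() and team_root.is_dir():
    print("Label mappings match:", labels_match(train_root, team_root))
```

If the mappings differ (for example because of a missing expression folder, different capitalisation, or a stray directory such as `.ipynb_checkpoints`), the reported accuracy on team data will be misleading until the folder names are aligned.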

In [None]:
# YOUR CODE HERE for getting the CNN representation of your team data with expression. Optimize the CNN model for predicting the labels of expressions correctly
# Note: If the CNN Model is not performing as expected, then you can add your Team Data to the Existing Training Data and Re-Train the Model.

**Save your trained model**

* Save the state dictionary of the classifier (use PyTorch only). It will be needed when integrating the model into the mobile app.

 [Hint](https://pytorch.org/tutorials/beginner/saving_loading_models.html)

In [None]:
### YOUR CODE HERE for saving the CNN model

**Download your trained model**
* Given the path of the model file, the following code downloads it through the browser

In [None]:
from google.colab import files
files.download('<model_file_path>')

## **Stage 5 (Anti Face Spoofing): (5 Marks)**


---



The objective of anti face spoofing is to unlock (say) a screen not just with your face image
(which can easily be spoofed with a photograph of you) but also with a switch to the expression
demanded by the Mobile App (which is much harder to mimic).
* **Grading scheme**:
> * **Anti Face Spoofing**: (5M Only if both the cases mentioned below are achieved)
>>* **Unlock**: Correct face + Correct Demanded Expression
>>* **Stay Locked**: Correct face + Incorrect Demanded Expression (as you might imagine there are multiple other such possibilities, which you are free to explore)
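The two graded cases amount to a simple rule: unlock only when both the face check and the demanded-expression check pass. The sketch below illustrates that rule only. It is not the mobile app's implementation, and `should_unlock` and its parameters are hypothetical names.

```python
def should_unlock(face_matches, predicted_expression, demanded_expression):
    """Unlock only when the face is recognised AND the demanded expression is shown."""
    same_expression = (
        predicted_expression.strip().upper() == demanded_expression.strip().upper()
    )
    return bool(face_matches) and same_expression

cases = [
    (True, "HAPPINESS", "HAPPINESS"),   # Correct face, correct expression
    (True, "NEUTRAL", "HAPPINESS"),     # Correct face, wrong expression
    (False, "HAPPINESS", "HAPPINESS"),  # Wrong face, correct expression
]
for face_ok, predicted, demanded in cases:
    state = "unlock" if should_unlock(face_ok, predicted, demanded) else "stay locked"
    print(face_ok, predicted, demanded, "->", state)
```

A static photograph can pass the face check but cannot switch to a freshly demanded expression, which is why the second condition adds protection.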

In [None]:
# Test in your mobile app and see whether it unlocks.