In [1]:
!pip install --quiet opencv-python ultralytics
!pip install --quiet iterative-stratification

In [None]:
import ast
import os
import sys
from pathlib import Path

import cv2
import pandas as pd
import torch
from google.colab import drive
from ultralytics import YOLO

drive.mount("/content/drive")
# Change the path below to match your own directory
%cd 'drive/Othercomputers/My Computer (1)/EmotionTeller-github'
current_dir = os.getcwd()
folder_path = os.path.join(current_dir, 'App/two_step_model')
sys.path.append(folder_path)

from utilsJ import *
from two_step_pipeline import Config, load_detector, load_classifier, run_on_image

# Detecting Group Emotions

Facial expression recognition is a crucial component for improving human–AI interaction and represents a core problem in computer vision, with applications in areas such as image captioning and behavioral analysis.

**Goal:** Detect faces and classify their emotional expressions accurately, particularly in group images.

**Modeling Approaches:**
- Two-step model: detect faces using a detection model followed by a fine-tuned emotion classifier
- Fine-tuned YOLO model: detects face and emotion in a single step
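The control flow of the two-step approach can be sketched as follows. This is only an illustration under stated assumptions: `two_step_predict`, `fake_detector`, and `fake_classifier` are hypothetical stand-ins, not the functions in `two_step_pipeline`, and the "image" is a plain nested list rather than a real photo.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class FacePrediction:
    box: Box
    emotion: str


def two_step_predict(image, detect_faces: Callable, classify_emotion: Callable) -> List[FacePrediction]:
    """Step 1: find face boxes. Step 2: classify each cropped face."""
    predictions = []
    for (x1, y1, x2, y2) in detect_faces(image):
        # Crop the face region (rows y1..y2, columns x1..x2)
        crop = [row[x1:x2] for row in image[y1:y2]]
        predictions.append(FacePrediction((x1, y1, x2, y2), classify_emotion(crop)))
    return predictions


# Hypothetical stand-ins for a real detector and classifier
def fake_detector(image) -> List[Box]:
    return [(0, 0, 4, 4), (4, 4, 8, 8)]


def fake_classifier(crop) -> str:
    return "Happy"


image = [[0] * 8 for _ in range(8)]  # toy 8x8 "image"
results = two_step_predict(image, fake_detector, fake_classifier)
for r in results:
    print(r.box, r.emotion)
```

The single-step YOLO model, by contrast, returns the box and the emotion class from one forward pass, so there is no separate crop-and-classify loop.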









## Data

We drew a total of 221 photos from two datasets: [Human Group Emotions Labelled](https://huggingface.co/datasets/juanbtbx/Human-Group-Emotions-Labelled) (1) and [Emotic](https://github.com/rkosti/emotic) (2). We customized these photos by adding further labels and face positions, and held out 43 of them as a final test set shared by both models.

For the two-step model, in addition to the customized data, we added the [FACES database](https://faces.mpdl.mpg.de/imeji/) (3) and a subset of the [RAF database](http://www.whdeng.cn/RAF/model1.html#dataset) (4).


Below we load datasets (1) and (2), which we use to measure the performance of the models on the test set.

In [7]:
data_root       = Path('Data')                # Folder where all data sources are stored
dataset_path    = data_root/'ImageData'       # List of folders containing images in .jpg format
# Metadata files (.csv) matching the image folders above. We relabelled some of
# the data, so these differ from the original metadata.
data_meta       = ['emotic-relabelled.csv',
                   'hgel-relabelled.csv']
meta_root       = Path('Metadata')            # Folder where the previous .csv files are located
yolo_dir        = Path('YOLO_training')       # Folder where the results and data splitting will take place for YOLO
weights_path    = yolo_dir/'runs'             # Folder containing the fine-tuned YOLO runs and their weights

emo_dic = {'Neutral':0,'Happy':1,'Surprise':2,'Sad':3,'Angry':4,'Fear':5,'Disgust':6}

os.makedirs(yolo_dir, exist_ok=True)

In [2]:
def load_meta(csv_name):
    """Load a metadata CSV, parse its `objects` column, and keep only rows whose image exists."""
    df = pd.read_csv(meta_root/csv_name)
    df['objects'] = df['objects'].apply(ast.literal_eval)

    # Add the path to the image
    df['path_to_img'] = df['file_name'].apply(lambda name: dataset_path/name)

    # Drop rows where the file does not exist (copy to avoid editing a view)
    exists = df['path_to_img'].apply(os.path.exists)
    return df[exists].copy()

df_train = load_meta('train_meta.csv')
df_test = load_meta('test_meta.csv')
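The `objects` column is stored as text in the CSV, which is why it is passed through `ast.literal_eval` before use. Here is a minimal sketch of that step, assuming a hypothetical cell layout (the keys `bbox` and `emotion` are illustrative; the real column's structure is not shown in this notebook):

```python
import ast

# Hypothetical example of what one cell of the `objects` column might look like
raw_cell = "[{'bbox': [10, 20, 50, 60], 'emotion': 'Happy'}]"

# literal_eval turns the string into real Python lists/dicts
objects = ast.literal_eval(raw_cell)
first = objects[0]
print(type(objects).__name__, first['emotion'])

# Unlike eval, literal_eval only accepts literals and rejects function calls
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print("Non-literal input rejected:", rejected)
```

Using `ast.literal_eval` instead of `eval` is the safer choice for metadata files, since it never executes code embedded in a cell.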

## Models



### Model 1 - YOLO

Mention in which notebook training was done and any other details: `YOLOfinetune.ipynb`.

Load results, show at least one example and metrics.

### Model 2 - Two-step model

Mention in which notebook training was done and any other details: `YOLOfinetune.ipynb`.

Load results, show at least one example and metrics (might have to specify the extra metrics one can consider here).

## Evaluation and Results

### YOLOv11 models

Below we compute the metrics for the `yolov11m` model that we fine-tuned. Due to the nature of our predictions, we use an IoU threshold of 0.05. A larger dataset would yield a more robust model, which would allow us to raise this threshold.
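To make the threshold concrete, here is a small sketch of intersection over union (IoU) for two axis-aligned boxes. The helper `box_iou` and the example boxes are illustrative assumptions; how `detect_metrics` computes its matches internally is not shown in this notebook.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Width and height of the overlap region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 100, 100)
prediction = (50, 50, 150, 150)  # shifted by half a box in both directions

iou = box_iou(ground_truth, prediction)
print(f"IoU = {iou:.4f}")
print("Match at threshold 0.05:", iou >= 0.05)
print("Match at threshold 0.50:", iou >= 0.50)
```

In this example the prediction counts as a match under the lenient 0.05 threshold but would be rejected under the more common 0.5 threshold.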

In [181]:
name = 'yolo11m_finetuned4'

medium_finetuned = YOLO(weights_path / name /'weights'/'best.pt')

train_preds = medium_finetuned.predict(list(df_train['path_to_img']),verbose=False)
train_combined = combine_gt_pred(df_train,train_preds,emo_dic)
#del train_preds
torch.cuda.empty_cache() # Important to not run out of GPU memory

test_preds = medium_finetuned.predict(list(df_test['path_to_img']),verbose=False)
test_combined = combine_gt_pred(df_test,test_preds,emo_dic)
del test_preds
torch.cuda.empty_cache() # Important to not run out of GPU memory

In [94]:
detect_metrics(train_combined, test_combined)

Train Metrics:
Precision: 0.7028
Recall: 0.6652
mAP50: 0.4333
mAP50-95: 0.0000

Test Metrics:
Precision: 0.5419
Recall: 0.3702
mAP50: 0.1341
mAP50-95: 0.0000
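For reference, precision and recall follow from counts of true positives, false positives, and false negatives. The sketch below uses made-up counts purely for illustration; it does not reproduce the numbers above, and the way `detect_metrics` decides what counts as a true positive is not shown in this notebook.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 6 correct detections, 2 spurious ones, 4 missed faces
precision, recall = precision_recall(tp=6, fp=2, fn=4)
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
```

A precision higher than recall, as in the test metrics above, means the model misses more faces than it reports spuriously.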


The cell below lets you visualize the predictions of our model.

In [17]:
drop_display(train_combined,test_combined)

Train Set:


Dropdown(description='Select Train Index:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…

Output()


Test Set:


Dropdown(description='Select Test Index:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …

Output()

We now compute the metrics for the `yolov11m-face` model that we fine-tuned.

In [8]:
name = 'yolo11m_face_finetuned2'
medium_face_finetuned = YOLO(weights_path / name /'weights'/'best.pt')

train_face_preds = medium_face_finetuned.predict(list(df_train['path_to_img']),verbose=False, iou=0.5)
train_face_combined = combine_gt_pred(df_train,train_face_preds,emo_dic)
del train_face_preds
torch.cuda.empty_cache() # Important to not run out of GPU memory

test_face_preds = medium_face_finetuned.predict(list(df_test['path_to_img']),verbose=False, iou=0.5)
test_face_combined = combine_gt_pred(df_test,test_face_preds,emo_dic)
del test_face_preds
torch.cuda.empty_cache() # Important to not run out of GPU memory

In [9]:
detect_metrics(train_face_combined, test_face_combined)

Train Metrics:
Precision: 0.5399
Recall: 0.2396
mAP5: 0.1284

Test Metrics:
Precision: 0.5000
Recall: 0.2176
mAP5: 0.1957


The cell below lets you visualize the predictions of our model.

In [10]:
drop_display(train_face_combined,test_face_combined)

Train Set:


Dropdown(description='Select Train Index:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…

Output()


Test Set:


Dropdown(description='Select Test Index:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …

Output()

### Two-step model



In [3]:
cfg = Config(
    detector_path = 'App/two_step_model/BaselineModels/yolo11n-face-best.pt',
    classifier_path= 'App/two_step_model/BaselineModels/best_overall.pt'
)

det_model = load_detector(cfg)
cls_model = load_classifier(cfg)

# Process multiple images
train_image_files = list(df_train['path_to_img'])
test_image_files = list(df_test['path_to_img'])

train_preds_pre = []
for img_path in train_image_files:
    result = run_on_image(str(img_path), det_model, cls_model, cfg)
    train_preds_pre.append(result)
    #print(f"Processed {img_path.name}: {len(result['faces'])} faces")
test_preds_pre = []
for img_path in test_image_files:
    result = run_on_image(str(img_path), det_model, cls_model, cfg)
    test_preds_pre.append(result)

train_preds = two_step_to_yolo(df_train,train_preds_pre)
test_preds = two_step_to_yolo(df_test,test_preds_pre)

# Aggregate results
total_faces = sum(len(r['faces']) for r in train_preds_pre)
print(f"Total faces detected in train: {total_faces}")

# Aggregate results
total_faces = sum(len(r['faces']) for r in test_preds_pre)
print(f"Total faces detected in test: {total_faces}")

train_two_combined = combine_gt_pred(df_train,train_preds,emo_dic)
test_two_combined = combine_gt_pred(df_test,test_preds,emo_dic)

Loading classifier checkpoint from App/two_step_model/BaselineModels/best_overall.pt ...
Classifier ready. arch=resnet18, num_classes=7, classes=['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']
Total faces detected in train: 837
Total faces detected in test: 113


Now we determine the metrics for this model.

In [4]:
detect_metrics(train_two_combined, test_two_combined)

Train Metrics:
Precision: 0.2378
Recall: 0.1806
mAP5: 0.0526

Test Metrics:
Precision: 0.2389
Recall: 0.1031
mAP5: 0.0356


Use the dropdown menus below to visualize the predictions of our model.

In [5]:
drop_display(train_two_combined,test_two_combined)

Train Set:


Dropdown(description='Select Train Index:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…

Output()


Test Set:


Dropdown(description='Select Test Index:', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …

Output()

Note that although the bounding boxes are mostly correct, the identified emotion usually is not.
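One thing worth checking here is label alignment: the classifier log lists its classes alphabetically, while `emo_dic` uses a different order. Whether `two_step_to_yolo` already reconciles the two is not shown in this notebook, so the following is only a sanity-check sketch. The name `cls_to_emo` is hypothetical.

```python
# Class IDs used for the ground truth in this notebook
emo_dic = {'Neutral': 0, 'Happy': 1, 'Surprise': 2, 'Sad': 3,
           'Angry': 4, 'Fear': 5, 'Disgust': 6}

# Class order reported by the classifier checkpoint
classifier_classes = ['Angry', 'Disgust', 'Fear', 'Happy',
                      'Neutral', 'Sad', 'Surprise']

# cls_to_emo[i] is the emo_dic ID for classifier output index i
cls_to_emo = [emo_dic[name] for name in classifier_classes]
print(cls_to_emo)

# Count how many classifier indices coincide with emo_dic IDs without remapping
unchanged = sum(1 for i, emo_id in enumerate(cls_to_emo) if i == emo_id)
print(f"Indices that agree without remapping: {unchanged} of {len(cls_to_emo)}")
```

If the raw classifier indices were compared against `emo_dic` IDs directly, most labels would disagree even for correct predictions, so confirming that a mapping of this kind is applied would rule out one possible source of error.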

### App

## Further improvements