# NLP Project - EFR in English conversations

The slides with the details and requirements of the project are available <a href="https://docs.google.com/presentation/d/1TTN1H3GdnaswGXW63SuSvD4CsI7HB9lkYuwXRMQp2ks/edit?usp=sharing">here</a>. The slides are equivalent to the <a href="https://virtuale.unibo.it/mod/page/view.php?id=1405067">FAQ page</a>.

The official webpage of the challenge is available <a href="https://lcs2.in/SemEval2024-EDiReF/">here</a>.

**EFR (Emotion Flip Reasoning): given a multi-party dialogue, the goal is to identify the trigger utterance(s) responsible for an emotion flip.**

For example: 
<center>
    <img src="./images/example_EFR.jpeg" alt="EFR" />
</center>
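To make the notion of an emotion flip concrete, here is a minimal sketch that scans a dialogue and reports every utterance where a speaker's emotion differs from that speaker's previous one. The function name `find_emotion_flips` and the exact flip definition are assumptions made for illustration, not part of the project code. The speakers and emotions are taken from the `utterance_4` row of the dataset, whose gold trigger vector is `[0.0, 0.0, 1.0, 0.0]`.

```python
from typing import Dict, List, Tuple


def find_emotion_flips(
    speakers: List[str], emotions: List[str]
) -> List[Tuple[int, str, str, str]]:
    """Return (utterance index, speaker, previous emotion, new emotion) per flip.

    Assumption: a flip occurs when a speaker's emotion differs from the
    emotion of that same speaker's most recent earlier utterance.
    """
    last_emotion: Dict[str, str] = {}
    flips = []
    for i, (speaker, emotion) in enumerate(zip(speakers, emotions)):
        previous = last_emotion.get(speaker)
        if previous is not None and previous != emotion:
            flips.append((i, speaker, previous, emotion))
        last_emotion[speaker] = emotion
    return flips


speakers = ["Joey", "Rachel", "Joey", "Rachel"]
emotions = ["surprise", "sadness", "surprise", "fear"]

flips = find_emotion_flips(speakers, emotions)
print(flips)  # [(3, 'Rachel', 'sadness', 'fear')]
```

Here Rachel flips from sadness to fear at index 3, and the dataset marks Joey's utterance at index 2 as the trigger. Finding the flip is the easy part; EFR is about predicting which earlier utterance caused it.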

In [3]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
import os
import torch
from sys import platform
from utilities import *

from models.randomClassifier import RandomClassifier
from models.majorityClassifier import MajorityClassifier


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Setting the device

In [4]:
print(f"PyTorch version: {torch.__version__}")

if platform == "darwin":    # Running on macOS
    
    print(f"Is MPS (Metal Performance Shader) built? {torch.backends.mps.is_built()}")
    print(f"Is MPS available? {torch.backends.mps.is_available()}")
    device = "mps" if torch.backends.mps.is_available() else "cpu"    
else:
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 

print(f"Using device: {device}")


PyTorch version: 2.2.0+cu121
Using device: cuda


In [5]:
if torch.cuda.is_available():    # nvidia-smi is only meaningful when a CUDA GPU is present
    !nvidia-smi

Mon Feb 19 12:01:37 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 4060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P8               3W /  55W |     53MiB /  8188MiB |     15%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

## Defining flags and variables

In [7]:
DATA_DIRECTORY = 'Data'             # Directory containing the dataset
DATASET = 'MELD_efr.json'           # Name of dataset file

DATASET_PATH = os.path.join(DATA_DIRECTORY, DATASET)    # Path of dataset in JSON format 


## Dataset Creation and Splitting

In [8]:
df = pd.read_json(DATASET_PATH)
df.set_index("episode", inplace=True)

df.head()


episode,speakers,emotions,utterances,triggers
utterance_0,"[Chandler, The Interviewer, Chandler, The Inte...","[neutral, neutral, neutral, neutral, surprise]",[also I was the point person on my company's t...,"[0.0, 0.0, 0.0, 1.0, 0.0]"
utterance_1,"[Chandler, The Interviewer, Chandler, The Inte...","[neutral, neutral, neutral, neutral, surprise,...",[also I was the point person on my company's t...,"[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]"
utterance_2,"[Chandler, The Interviewer, Chandler, The Inte...","[neutral, neutral, neutral, neutral, surprise,...",[also I was the point person on my company's t...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ..."
utterance_3,"[Chandler, The Interviewer, Chandler, The Inte...","[neutral, neutral, neutral, neutral, surprise,...",[also I was the point person on my company's t...,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
utterance_4,"[Joey, Rachel, Joey, Rachel]","[surprise, sadness, surprise, fear]",[But then who? The waitress I went out with la...,"[0.0, 0.0, 1.0, 0.0]"


In [9]:
# Replace NaN trigger labels with 0 and count them before and after
nan_count_before = df["triggers"].apply(lambda lst: sum(pd.isna(x) for x in lst)).sum()
df['triggers'] = df['triggers'].apply(replace_nan_with_zero)
nan_count_after = df["triggers"].apply(lambda lst: sum(pd.isna(x) for x in lst)).sum()

print(f"Before: {nan_count_before} NaN values")
print(f"After: {nan_count_after} NaN values")


Before: 9 NaN values
After: 0 NaN values
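The helper `replace_nan_with_zero` comes from `utilities` and its body is not shown here. As a hedged illustration only, the sketch below shows one way such a helper could work on a single trigger list: missing entries (`NaN` or `None`) become `0`, while existing labels, including `1.0`, are kept and cast to `int`. The name `replace_nan_with_zero_sketch` is hypothetical.

```python
import math
from typing import List, Optional


def replace_nan_with_zero_sketch(triggers: List[Optional[float]]) -> List[int]:
    """Map missing labels (NaN or None) to 0 and cast the rest to int.

    Hypothetical stand-in for utilities.replace_nan_with_zero; the real
    helper may behave differently.
    """
    cleaned = []
    for value in triggers:
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        cleaned.append(0 if missing else int(value))
    return cleaned


example = [0.0, float("nan"), 1.0, None, 0.0]
cleaned = replace_nan_with_zero_sketch(example)
print(cleaned)  # [0, 0, 1, 0, 0]
```

A quick sanity check after any such cleaning step is to confirm that the number of `1` labels is unchanged, since only the missing entries should be touched.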


In [10]:
train_df, val_df, test_df = split_dataset(df)
print(f"Size of each dataset:\nTraining: {train_df.shape}\nValidation: {val_df.shape}\nTest: {test_df.shape}")

display(train_df)
display(val_df)
display(test_df)

Size of each dataset:
Training: (3200, 4)
Validation: (400, 4)
Test: (400, 4)


episode,speakers,emotions,utterances,triggers
utterance_512,"[Ross, Chandler, Ross, Chandler, Ross, Chandle...","[neutral, neutral, neutral, joy, fear, joy, su...","[Ok, bye. Well, Monica's not coming, it's jus...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_1748,"[Monica, Phoebe, Phoebe, Phoebe, Monica, Mike,...","[anger, neutral, neutral, joy, neutral, neutra...","[I can't believe it's raining again! Oh, it's ...","[0, 0, 0, 0, 0, 0, 0]"
utterance_193,"[Joey, Wayne, Joey, Joey, Joey, Joey, Joey, Jo...","[joy, anger, surprise, neutral, neutral, fear,...","[Morning! Hey, how's my favorite genius and my...","[0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_1306,"[Joey, Joey, Joey, Joey, Joey, Monica, Joey, J...","[joy, neutral, neutral, neutral, surprise, joy...","[Very funny Ross!, Very life-like and funny., ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_2660,"[Ross, Rachel, Ross, Rachel, Ross, Rachel]","[anger, neutral, anger, disgust, neutral, sadn...","[Y'know, hey! You're the one who ended it, rem...","[0, 0, 0, 0, 0, 0]"
...,...,...,...,...
utterance_91,"[Monica, Chandler, Phoebe, Rachel, Monica, Ros...","[surprise, anger, surprise, joy, neutral, neut...","[Are you insane? I mean Joey, is going to kill...","[0, 0, 0, 0, 0, 0, 0]"
utterance_2427,"[Mr. Posner, Rachel, Joanna, Joanna, Joanna, J...","[joy, joy, neutral, surprise, neutral, joy, ne...","[You have a very impressive resume, Ms. Green....","[0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_3627,"[Phoebe, Phoebe, Phoebe, Phoebe, Monica, Phoeb...","[neutral, neutral, neutral, joy, surprise, joy...","[And there's a country called Argentinaaaa, ...","[0, 0, 0, 0, 0, 0, 0]"
utterance_1,"[Chandler, The Interviewer, Chandler, The Inte...","[neutral, neutral, neutral, neutral, surprise,...",[also I was the point person on my company's t...,"[0, 0, 0, 0, 0, 0, 0]"


episode,speakers,emotions,utterances,triggers
utterance_1613,"[Monica, Richard, Richard's Date, Monica, Rich...","[anger, neutral, neutral, surprise, neutral, n...","[Ow!, Really?! Well, it's just like everyone e...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_1490,"[Phoebe, Rachel, Phoebe, Rachel, Phoebe, Rache...","[neutral, neutral, anger, sadness, surprise, n...","[Are you okay?, I need some milk., Ok, I've go...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_3973,"[Ben, Rachel, Ben, Rachel, Ben, Rachel, Rachel...","[surprise, neutral, joy, surprise, joy, surpri...","[Really? Like how?, Well y'know, we would umm,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_3425,"[Rachel, Ross, Joey, Chandler, Monica, Ross, C...","[anger, surprise, joy, neutral, neutral, anger...",[I'm telling you it's like watching Bambi lear...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_3095,"[Elizabeth, Ross, Elizabeth, Ross, Ross, Eliza...","[joy, surprise, neutral, neutral, neutral, neu...",[Oh please! It was such a big class! You never...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
...,...,...,...,...
utterance_1279,"[Ross, Chandler, Ross, Chandler, Ross, Chandle...","[neutral, surprise, neutral, surprise, joy, jo...","[Hi., Hey. Soaps? Shampoos? Are you really ta...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
utterance_3937,"[Chandler, Monica, Chandler]","[joy, neutral, neutral]","[Well, I feel like a snack!, Do you want some ...","[0, 0, 0]"
utterance_1555,"[Joey, Man, Joey, Joey, Man, Joey, Man, Joey, ...","[neutral, surprise, neutral, neutral, surprise...",[Hi! I'm Dr. Drake Remoray and I have a few ro...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
utterance_1560,"[Phoebe, Rachel, Phoebe, Rachel, Phoebe]","[joy, joy, joy, neutral, disgust]","[Hey! You guys, I'm writing a holiday song for...","[0, 0, 0, 0, 0]"


episode,speakers,emotions,utterances,triggers
utterance_2222,"[Joey, Joey, Joey, Joey, Joey, Joey, Joey, Nur...","[sadness, sadness, sadness, sadness, neutral, ...","[Wait!, Terry!, Please!, Look, I just lost my ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_308,"[Monica, Chandler, Monica, Chandler, Monica, C...","[neutral, neutral, anger, disgust, sadness, ne...","[Chandler, we still haven't gotten an RSVP fro...","[0, 0, 0, 0, 0, 0]"
utterance_870,"[Monica, Rachel, Monica, Rachel, Monica, Ross,...","[anger, surprise, joy, joy, surprise, joy, neu...","[We were on the platform, ready to dance the w...","[0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_1722,"[Phoebe, Joey, Monica, Joey, Monica]","[neutral, neutral, disgust, sadness, neutral]","[Hey, Joey. What's going on?, Clear the tracks...","[0, 0, 0, 0, 0]"
utterance_3748,"[Ross, Rachel, Ross, Rachel]","[sadness, neutral, surprise, anger]","[Ahh, no., Oh., Are you jealous?, Noo, I y'kno...","[0, 0, 0, 0]"
...,...,...,...,...
utterance_589,"[Chandler, Joey, Chandler, Joey]","[neutral, joy, neutral, surprise]","[Well, I just thought it'd make me feel good t...","[0, 0, 0, 0]"
utterance_2466,"[Rachel, Joey, Rachel, Joey, Joey, Joey, Joey,...","[surprise, neutral, joy, joy, neutral, neutral...",[Joey? Could you get that? What are you doing...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
utterance_3084,"[Phoebe, Joey, Phoebe, Joey, Phoebe, Joey, Mon...","[joy, neutral, neutral, surprise, sadness, joy...","[Oh hey! How was your audition?, I'm sorry, do...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
utterance_2283,"[Joey, Ross, Chandler, Chandler, Rachel, Chand...","[disgust, disgust, anger, neutral, neutral, su...","[Do you have any respect for your body?, Don't...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
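The reported sizes (3200 / 400 / 400 out of 4000 dialogues) correspond to an 80/10/10 split, and the row order in the displayed frames suggests the data is shuffled first. `split_dataset` itself lives in `utilities` and is not shown, so the following is only a sketch of one way to get such a split. The function name, the fractions as parameters, and the seed are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_ids_sketch(
    ids: Sequence[str],
    train_frac: float = 0.8,
    val_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle the ids with a fixed seed and cut them into train/val/test.

    Hypothetical illustration; utilities.split_dataset may differ.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


ids = [f"utterance_{i}" for i in range(4000)]
train_ids, val_ids, test_ids = split_ids_sketch(ids)
print(len(train_ids), len(val_ids), len(test_ids))  # 3200 400 400
```

With a DataFrame indexed by `episode`, the resulting id lists could be used as `df.loc[train_ids]`, `df.loc[val_ids]` and `df.loc[test_ids]`.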


## Random Classifier

In [8]:
# TODO: compute metrics

emotions = np.unique([item for sublist in df["emotions"] for item in sublist]) # flattening and taking unique emotions
random_classifier = RandomClassifier(emotions)

predicted_labels = random_classifier.predict(test_df)

# Example print, for the skeptics who doubt that the classifier works
print(predicted_labels[0][0], predicted_labels[1][0])


['joy', 'surprise', 'surprise'] [0, 1, 1]
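The printed output suggests that `predict` returns two parallel collections: one list of emotion labels and one list of binary trigger labels per dialogue. The code of `models.randomClassifier` is not shown, so the class below is only a hypothetical sketch of a baseline with that output shape. It draws every label uniformly at random and takes plain lists of utterances rather than a DataFrame.

```python
import random
from typing import List, Sequence, Tuple


class RandomClassifierSketch:
    """Hypothetical uniform-random baseline; not the project's RandomClassifier."""

    def __init__(self, emotions: Sequence[str], seed: int = 0):
        self.emotions = list(emotions)
        self.rng = random.Random(seed)  # seeded so that runs are repeatable

    def predict(
        self, dialogues: Sequence[Sequence[str]]
    ) -> Tuple[List[List[str]], List[List[int]]]:
        # One random emotion and one random 0/1 trigger per utterance.
        pred_emotions = [
            [self.rng.choice(self.emotions) for _ in dialogue] for dialogue in dialogues
        ]
        pred_triggers = [
            [self.rng.randint(0, 1) for _ in dialogue] for dialogue in dialogues
        ]
        return pred_emotions, pred_triggers


emotions = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
dialogues = [["Hi.", "Hey.", "Bye."], ["Really?", "Yes."]]

clf = RandomClassifierSketch(emotions)
pred_emotions, pred_triggers = clf.predict(dialogues)
print(pred_emotions[0], pred_triggers[0])
```

Fixing the seed keeps the baseline's scores stable across runs, which makes it easier to compare against the other models.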


## Majority Classifier

In [9]:
# TODO: compute metrics

majority_classifier = MajorityClassifier()

majority_classifier.fit(train_df)
predicted_labels = majority_classifier.predict(test_df)

# Example print, for the skeptics who doubt that the classifier works
print(predicted_labels[0][0], predicted_labels[1][0])


['neutral', 'neutral', 'neutral'] [0, 0, 0]
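Metric computation is still marked as a TODO for both baselines. As one possible starting point, the sketch below computes the F1 score of the positive (trigger) class over all utterances, flattened across dialogues. The choice of this metric and the name `trigger_f1` are assumptions, not the project's final evaluation protocol. The gold labels are the trigger vectors of the `utterance_0` and `utterance_4` rows of the dataset.

```python
from typing import Sequence


def trigger_f1(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """F1 of the trigger class (label 1) over all utterances, flattened."""
    flat_gold = [g for dialogue in gold for g in dialogue]
    flat_pred = [p for dialogue in pred for p in dialogue]
    if len(flat_gold) != len(flat_pred):
        raise ValueError("gold and pred must contain the same number of labels")

    tp = sum(1 for g, p in zip(flat_gold, flat_pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(flat_gold, flat_pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(flat_gold, flat_pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct trigger predicted: precision or recall is 0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [[0, 0, 0, 1, 0], [0, 0, 1, 0]]
all_zero_pred = [[0, 0, 0, 0, 0], [0, 0, 0, 0]]  # what an all-zero baseline outputs
other_pred = [[0, 0, 1, 1, 0], [0, 0, 1, 0]]     # two hits, one false alarm

print(f"{trigger_f1(gold, all_zero_pred):.2f}")  # 0.00
print(f"{trigger_f1(gold, other_pred):.2f}")     # 0.80
```

A baseline that always predicts `0` scores zero on this metric no matter how rare triggers are, which is why accuracy alone would give a misleading picture here.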


## BERT Models

In [10]:
# TODO: BERT models