# Banijay Emotions CSV preprocessing

This notebook processes the Robinson22_structure CSV file provided by Banijay. Among other fields, it contains the annotated emotions for each fragment, the episode name, and the start and end times of the fragment. This data will be used to evaluate the performance of our emotion-prediction pipeline.

In [1]:
import pandas as pd 

In [2]:
# load banijay emotions dataframe
data = pd.read_csv('../../data/data_banijay/raw/Robinson22_structure.csv')

In [3]:
# create a subset of the dataframe with the columns describing each fragment
df = data[['Instance name', 'Episode name', 'Act', 'Chapter', 'Segment',
           'Start Time (seconds)', 'End Time (seconds)', 'Emotions']]

In [4]:
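# drop duplicate rows to keep only one row for each fragment of the show;
# .copy() makes cleaned_df an independent DataFrame, so adding columns later
# does not raise a SettingWithCopyWarning
cleaned_df = df.drop_duplicates().copy()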
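
As a minimal sketch of why duplicates can appear only after the column subset is taken: rows that differ in a dropped column become identical once that column is gone. The toy values and the `Note` column below are hypothetical and are not taken from the Banijay file.

```python
import pandas as pd

# Toy data (hypothetical): the first two rows differ only in 'Note',
# a column that is not part of the subset.
raw = pd.DataFrame({
    'Segment': ['Aankomst', 'Aankomst', 'Proef'],
    'Start Time (seconds)': [124, 124, 1031],
    'Emotions': ['Joy', 'Joy', 'Disgust'],
    'Note': ['a', 'b', 'c'],
})

subset = raw[['Segment', 'Start Time (seconds)', 'Emotions']]

# After subsetting, rows 0 and 1 are identical, so one of them is dropped.
# The explicit copy makes it safe to add new columns afterwards.
deduped = subset.drop_duplicates().copy()
deduped['mapped_emotion'] = ['happiness', 'disgust']

print(deduped)
```

Note that `drop_duplicates` keeps the original index labels, which is why the index of the cleaned dataframe has gaps.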

### Map emotions according to the emotion wheel

To compare our predictions with the emotions annotated by Banijay, we map the 'Emotions' column to the six core emotions proposed by Paul Ekman: surprise, happiness, sadness, fear, anger, and disgust.

Not all emotions annotated by Banijay can be clearly mapped to the six core emotions, so we map only those with a clear match. The matching follows the emotion wheel available here: https://adsai.buas.nl/Year2/BlockC/images/emotions_wheel.png

**The following emotions were not mapped:**
- Realization
- Gratitude
- Admiration
- Approval
- Desire
- Anticipation
- Curiosity
- Caring
- Hunger

In [5]:
emotion_words_lookup = {
    'surprise': ['Surprise', 'Amusement', 'Excitement', 'Confusion'],
    'happiness': ['Joy', 'Pride', 'Relief', 'Love', 'Optimism'],
    'sadness': ['Sadness', 'Disappointment', 'Grief', 'Remorse', 'Shame'],
    'fear': ['Fear', 'Nervousness'],
    'anger': ['Anger', 'Disapproval', 'Embarrassment', 'Annoyance'],
    'disgust': ['Disgust']
}
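
The list of unmapped emotions above can be cross-checked against the data. The sketch below collects every annotated label that does not occur in the lookup table. The helper name `find_unmapped` and the sample strings are illustrative assumptions; in the notebook, the input would be the values of the 'Emotions' column.

```python
emotion_words_lookup = {
    'surprise': ['Surprise', 'Amusement', 'Excitement', 'Confusion'],
    'happiness': ['Joy', 'Pride', 'Relief', 'Love', 'Optimism'],
    'sadness': ['Sadness', 'Disappointment', 'Grief', 'Remorse', 'Shame'],
    'fear': ['Fear', 'Nervousness'],
    'anger': ['Anger', 'Disapproval', 'Embarrassment', 'Annoyance'],
    'disgust': ['Disgust'],
}


def find_unmapped(emotion_strings, lookup):
    """Return the sorted annotated labels that have no entry in the lookup (hypothetical helper)."""
    mapped = {label for labels in lookup.values() for label in labels}
    seen = set()
    for value in emotion_strings:
        if not isinstance(value, str):  # skip missing annotations (None / NaN)
            continue
        seen.update(part.strip() for part in value.split(',') if part.strip())
    return sorted(seen - mapped)


# Hypothetical sample; in the notebook this could be cleaned_df['Emotions']
sample = ['Joy, Surprise, Anticipation', 'Disgust', None, 'Joy, Curiosity']
unmapped = find_unmapped(sample, emotion_words_lookup)
print(unmapped)  # ['Anticipation', 'Curiosity']
```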

In [6]:
def map_emotion(emotion_str):
    """
    Map a comma-separated string of annotated emotions to the core emotion categories.
    
    Args:
        emotion_str (str): A string containing emotions separated by ', '.
        
    Returns:
        str or None: The mapped emotion categories separated by ', ' (emotions
        without a match are skipped), or None if the input is missing or
        none of the emotions could be mapped.
    """
    if pd.isna(emotion_str):
        return None
    emotions = emotion_str.split(', ')
    mapped_emotions = []
    for emotion in emotions:
        for key, values in emotion_words_lookup.items():
            if emotion in values:
                mapped_emotions.append(key)
                break
    # return None instead of '' when nothing was mapped, so that dropna() removes the row later
    return ', '.join(mapped_emotions) if mapped_emotions else None

# Apply mapping function to 'Emotions' column
cleaned_df['mapped_emotion'] = cleaned_df['Emotions'].apply(map_emotion)

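
The nested loop in `map_emotion` searches every category list for each label. One possible alternative, sketched here as an illustration and not as the notebook's method, is to invert the lookup once so that each label maps directly to its core emotion. The names `label_to_core` and `map_emotion_fast` are hypothetical, and the dictionary is an excerpt of the notebook's lookup table.

```python
# Excerpt of the notebook's lookup table, repeated to keep the sketch self-contained
emotion_words_lookup = {
    'surprise': ['Surprise', 'Amusement', 'Excitement', 'Confusion'],
    'happiness': ['Joy', 'Pride', 'Relief', 'Love', 'Optimism'],
    'sadness': ['Sadness', 'Disappointment', 'Grief', 'Remorse', 'Shame'],
}

# Invert once: annotated label -> core emotion
label_to_core = {
    label: core
    for core, labels in emotion_words_lookup.items()
    for label in labels
}


def map_emotion_fast(emotion_str):
    """Map annotated labels to core emotions with one dictionary lookup per label."""
    if not isinstance(emotion_str, str):  # covers None and NaN
        return None
    labels = (part.strip() for part in emotion_str.split(','))
    cores = [label_to_core[label] for label in labels if label in label_to_core]
    return ', '.join(cores) if cores else None


print(map_emotion_fast('Joy, Surprise, Anticipation'))  # happiness, surprise
```

Stripping each label also makes the mapping tolerant of inconsistent spacing around the commas.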


In [7]:
def remove_duplicates(emotion_str):
    """
    Remove duplicate emotions from a comma-separated string of emotions.
    
    Args:
        emotion_str (str): A comma-separated string of emotions.
        
    Returns:
        str: A new string with duplicate emotions removed (a set is used, so the original order is not preserved).
    """
    if pd.isna(emotion_str):
        return None
    emotions = emotion_str.split(', ')
    unique_emotions = list(set(emotions))
    return ', '.join(unique_emotions)

# Apply remove_duplicates function to 'mapped_emotion' column
cleaned_df['mapped_emotion'] = cleaned_df['mapped_emotion'].apply(remove_duplicates)

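
Because `remove_duplicates` goes through a `set`, the order of the emotions in the result is arbitrary and can differ between runs. If a stable order is wanted, a sketch using `dict.fromkeys`, which keeps the first occurrence of each key in insertion order, could look like this. The function name `remove_duplicates_ordered` is an assumption for illustration.

```python
def remove_duplicates_ordered(emotion_str):
    """Remove duplicate emotions while keeping the order of first appearance."""
    if not isinstance(emotion_str, str):  # covers None and NaN
        return None
    emotions = [part.strip() for part in emotion_str.split(',') if part.strip()]
    # dict.fromkeys drops repeated keys and preserves insertion order
    return ', '.join(dict.fromkeys(emotions))


print(remove_duplicates_ordered('happiness, surprise, happiness'))  # happiness, surprise
```

With this variant, the output order would follow the annotation order, so it would not match the ordering shown in the preview table of this notebook.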


In [8]:
# remove rows with no emotion value in 'mapped_emotion' column
cleaned_df = cleaned_df.dropna(subset=['mapped_emotion'])
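
One detail worth keeping in mind: `dropna` only removes values that pandas treats as missing (`None`, `NaN`). An empty string counts as a regular value, so if a mapping step ever produces `''` for a fragment without mappable emotions, that row would survive the filter. A small sketch with hypothetical toy data:

```python
import pandas as pd

# Toy data (hypothetical): one mapped row, one empty string, one missing value
toy = pd.DataFrame({
    'Emotions': ['Joy', 'Anticipation', None],
    'mapped_emotion': ['happiness', '', None],
})

# Only the None row is dropped; the empty string is kept
rows_before = len(toy.dropna(subset=['mapped_emotion']))

# Turn empty strings into missing values first, then drop
toy['mapped_emotion'] = toy['mapped_emotion'].mask(toy['mapped_emotion'] == '')
filtered = toy.dropna(subset=['mapped_emotion'])

print(rows_before, len(filtered))  # 2 1
```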

In [9]:
cleaned_df.head(10)

| | Instance name | Episode name | Act | Chapter | Segment | Start Time (seconds) | End Time (seconds) | Emotions | mapped_emotion |
|---|---|---|---|---|---|---|---|---|---|
| 67 | Survivor | 1 | Aparte eilanden | Reality | Aankomst | 124 | 350 | Joy, Surprise, Anticipation | surprise, happiness |
| 864 | Survivor | 1 | Aparte eilanden | Proef | Uitvoering landingsproef | 1031 | 1078 | Joy | happiness |
| 964 | Survivor | 1 | Aparte eilanden | Proef | Uitvoering landingsproef | 1111 | 1178 | Disgust | disgust |
| 1061 | Survivor | 1 | Aparte eilanden | Proef | Uitvoering landingsproef | 1210 | 1256 | Joy, Disappointment | sadness, happiness |
| 1155 | Survivor | 1 | Aparte eilanden | Proef | Uitvoering landingsproef | 1292 | 1353 | Joy | happiness |
| 1721 | Survivor | 1 | Aparte eilanden | Proef | Uitvoering immuniteitsproef | 1981 | 2290 | Fear, Amusement, Optimism, Anticipation, Disap... | sadness, surprise, fear, happiness |
| 1784 | Survivor | 1 | Aparte eilanden | Proef | Uitvoering immuniteitsproef | 2323 | 2951 | Joy, Annoyance, Excitement, Optimism, Pride, A... | surprise, anger, happiness |
| 1831 | Survivor | 1 | Aparte eilanden | Proef | Uitvoering immuniteitsproef | 2975 | 3067 | Joy, Optimism, Pride | happiness |
| 1849 | Survivor | 1 | Aparte eilanden | Proef | Uitslag immuniteitsproef | 3067 | 3233 | Joy, Disappointment | sadness, happiness |
| 1865 | Survivor | 1 | Aparte eilanden | Reality | Aankomst | 3233 | 3265 | Joy | happiness |


In [10]:
cleaned_df.to_csv('../../data/data_banijay/processed/Robinson22_structure_cleaned.csv', index=False)
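
When the exported file is read back for evaluation, the `mapped_emotion` strings need to be split into individual labels again. The sketch below parses a cell into a set and compares it with a predicted set using the Jaccard overlap. Both helper names and the choice of metric are assumptions for illustration; the notebook does not specify how the evaluation is carried out.

```python
def parse_emotions(cell):
    """Turn a 'sadness, happiness' style cell into a frozenset of labels."""
    if not isinstance(cell, str):  # missing value
        return frozenset()
    return frozenset(part.strip() for part in cell.split(',') if part.strip())


def jaccard(annotated, predicted):
    """Size of the intersection divided by the size of the union (1.0 if both are empty)."""
    union = annotated | predicted
    return len(annotated & predicted) / len(union) if union else 1.0


annotated = parse_emotions('sadness, happiness')
predicted = parse_emotions('happiness')  # hypothetical model output
print(jaccard(annotated, predicted))  # 0.5
```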