This notebook improves the emotion detection prompt using the human annotation results.

Specifically, I want to find out:<br>
    1. Which examples work best? <br>
    2. How many examples should the prompt include? <br>

Of the 100 tweets with human annotations, I will use 50 as the training set and 50 as the testing set.

From the 50 training tweets, I will pick 4 (one each for HAP, HAN, LAP, and LAN, i.e. high-arousal positive, high-arousal negative, low-arousal positive, and low-arousal negative) and append them to the prompt to see whether they improve performance. To measure performance, I will run each prompt version on the testing set and pick the one whose predictions are closest to the human annotations.

In [9]:
import pandas as pd
import numpy as np

## Data Preparation

In [3]:
# Import csv as pandas dataframe
annotations = pd.read_csv("../csvs/annotation_results/tweet_rows.csv")
print(len(annotations))



700


In [4]:
annotations.head()

Unnamed: 0,csv_index,tweet_id,tweet,Nervous,Sad,Happy,Calm,Excited,Angry,Relaxed,Fearful,Enthusiastic,Satisfied,Bored,Lonely,Tired,prolific_id,Positive,Negative
0,609,home-conversation-1814505443940585709,"The police didn't restore order, the communit...",2,1,1,1,3,2,1,1,2,1,1,1,1,6637b21240220c2517fffa1a,1.0,3.0
1,309,home-conversation-1814505443940585709,"The police didn't restore order, the communit...",5,3,1,1,1,5,1,4,1,1,1,1,1,67d44a66d5276abe0bf8ff87,2.0,5.0
2,209,home-conversation-1814505443940585709,"The police didn't restore order, the communit...",4,2,1,1,3,2,1,2,1,1,2,3,3,672f0ab96d0a3d857a526df0,1.0,3.0
3,9,home-conversation-1814505443940585709,"The police didn't restore order, the communit...",4,4,1,1,3,4,1,5,1,1,1,1,2,646bbb2b727f660f5ec8f1c9,1.0,4.0
4,109,home-conversation-1814505443940585709,"The police didn't restore order, the communit...",2,2,1,1,1,2,1,2,1,1,2,1,3,661a846a19ee01121d271070,1.0,3.0


In [7]:
# merge the rows by unique tweet_id and get the average of all emotions
# Group by tweet_id and calculate mean of emotion columns
emotion_columns = ['Nervous', 'Sad', 'Happy', 'Calm', 'Excited', 'Angry', 'Relaxed', 
                   'Fearful', 'Enthusiastic', 'Satisfied', 'Bored', 'Lonely', 'Tired', 
                   'Positive', 'Negative']

# Group by tweet_id and calculate mean for emotion columns, keep first occurrence of other columns
merged_annotations = annotations.groupby('tweet_id').agg({
    'tweet': 'first',  # Keep the first occurrence of the tweet text
    **{col: 'mean' for col in emotion_columns}  # Calculate mean for all emotion columns
}).reset_index()

print(f"Original dataset shape: {annotations.shape}")
print(f"Merged dataset shape: {merged_annotations.shape}")
print(f"Number of unique tweets: {merged_annotations['tweet_id'].nunique()}")


Original dataset shape: (700, 19)
Merged dataset shape: (100, 17)
Number of unique tweets: 100
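
The shapes above (700 rows collapsing to 100 tweets) suggest seven ratings per tweet. One way to confirm that every per-tweet mean rests on the same number of annotators is to count distinct `prolific_id` values per tweet. The sketch below uses a small made-up table instead of the real CSV, so the names `toy_annotations`, `annotators_per_tweet`, and `toy_means` are illustrative only.

```python
import pandas as pd

# Toy stand-in for the raw annotation table: two tweets, three annotators each.
toy_annotations = pd.DataFrame({
    "tweet_id": ["t1", "t1", "t1", "t2", "t2", "t2"],
    "prolific_id": ["a", "b", "c", "a", "b", "c"],
    "Happy": [1, 2, 3, 4, 4, 5],
})

# Number of distinct annotators behind each per-tweet mean.
annotators_per_tweet = toy_annotations.groupby("tweet_id")["prolific_id"].nunique()
print(annotators_per_tweet)

# Per-tweet mean rating, computed the same way as in the aggregation above.
toy_means = toy_annotations.groupby("tweet_id")["Happy"].mean()
print(toy_means)
```

If the counts differ across tweets, some means are based on fewer ratings and may be noisier than others.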


In [8]:
merged_annotations.head()

Unnamed: 0,tweet_id,tweet,Nervous,Sad,Happy,Calm,Excited,Angry,Relaxed,Fearful,Enthusiastic,Satisfied,Bored,Lonely,Tired,Positive,Negative
0,home-conversation-1814505443940585709,"The police didn't restore order, the communit...",3.428571,2.857143,1.142857,1.142857,2.0,3.428571,1.142857,3.142857,1.285714,1.428571,1.714286,1.714286,2.428571,1.428571,3.857143
1,home-conversation-1814753557876514991,don't pretend you're a star wars fan if you d...,1.285714,1.142857,2.571429,1.857143,2.714286,1.285714,2.0,1.0,2.714286,2.285714,1.571429,1.285714,1.0,3.142857,1.285714
2,home-conversation-1815516707189895355,I’m here for @Montel_Williams suing all the M...,2.571429,2.714286,1.857143,1.714286,1.571429,2.857143,1.857143,2.428571,2.142857,2.0,1.285714,1.571429,2.142857,2.142857,3.142857
3,home-conversation-1815549693411180688,i got a comment implying i get paid so i have...,2.285714,2.285714,2.285714,2.428571,1.857143,2.428571,2.571429,1.857143,2.285714,2.428571,1.428571,1.285714,1.428571,2.428571,3.0
4,tweet-1814143199251730455,"the millennial silence at the beginning, the a...",1.0,1.0,3.571429,3.0,3.428571,1.285714,2.857143,1.0,3.142857,3.285714,1.714286,1.0,1.0,4.0,1.285714


In [10]:
# Split the merged data frame into training and testing sets
# (50 randomly selected tweets each, with no overlap)
np.random.seed(42)

# Get total number of tweets
total_tweets = len(merged_annotations)

# Randomly select 50 tweets for training; the remaining 50 form the testing set.
# There are exactly 100 tweets, so the second draw only shuffles the leftover indices.
train_indices = np.random.choice(total_tweets, size=50, replace=False)
test_indices = np.random.choice(np.setdiff1d(np.arange(total_tweets), train_indices), size=50, replace=False)

# Create training and testing dataframes
train_df = merged_annotations.iloc[train_indices].reset_index(drop=True)
test_df = merged_annotations.iloc[test_indices].reset_index(drop=True)

print(f"Training set shape: {train_df.shape}")
print(f"Testing set shape: {test_df.shape}")
print(f"Number of unique tweets in training: {train_df['tweet_id'].nunique()}")
print(f"Number of unique tweets in testing: {test_df['tweet_id'].nunique()}")

# Verify no overlap between training and testing sets
train_ids = set(train_df['tweet_id'])
test_ids = set(test_df['tweet_id'])
overlap = train_ids.intersection(test_ids)
print(f"Number of overlapping tweet IDs: {len(overlap)}")


Training set shape: (50, 17)
Testing set shape: (50, 17)
Number of unique tweets in training: 50
Number of unique tweets in testing: 50
Number of overlapping tweet IDs: 0
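
Because the two sets together cover all 100 tweets, the same kind of split can also be written as a single shuffle cut in half. This is only a sketch on a made-up frame (`toy`, `toy_train`, and `toy_test` are hypothetical names), and it will not reproduce the exact split used in this notebook, since it draws from a different random generator.

```python
import numpy as np
import pandas as pd

# Toy stand-in for merged_annotations: 100 rows with made-up IDs.
toy = pd.DataFrame({"tweet_id": [f"tweet-{i}" for i in range(100)]})

# Shuffle the row positions once, then cut the permutation in half.
rng = np.random.default_rng(42)
shuffled = rng.permutation(len(toy))

toy_train = toy.iloc[shuffled[:50]].reset_index(drop=True)
toy_test = toy.iloc[shuffled[50:]].reset_index(drop=True)

# The halves are disjoint by construction.
overlap = set(toy_train["tweet_id"]) & set(toy_test["tweet_id"])
print(len(toy_train), len(toy_test), len(overlap))
```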


In [11]:
# save the training and testing dataframes to csv
train_df.to_csv("../csvs/annotation_results/train_df.csv", index=False)
test_df.to_csv("../csvs/annotation_results/test_df.csv", index=False)

In [22]:
# Round each emotion column of train_df to the nearest integer
emotion_columns = ['Nervous', 'Sad', 'Happy', 'Calm', 'Excited', 'Angry', 'Relaxed', 
                   'Fearful', 'Enthusiastic', 'Satisfied', 'Bored', 'Lonely', 'Tired', 
                   'Positive', 'Negative']

for col in emotion_columns:
    train_df[col] = train_df[col].round().astype(int)
train_df.head()



Unnamed: 0,tweet_id,tweet,Nervous,Sad,Happy,Calm,Excited,Angry,Relaxed,Fearful,...,Satisfied,Bored,Lonely,Tired,Positive,Negative,HAP,HAN,LAP,LAN
0,tweet-1816150260273570272,Honored to sit down with my friend and our gre...,3,2,2,2,2,3,1,2,...,2,2,1,2,2,3,1.857143,2.428571,1.571429,1.714286
1,tweet-1815412948699066825,"Happy Birthday to my best friend, husband and ...",1,1,4,3,3,1,3,1,...,3,2,1,2,4,1,2.952381,1.142857,2.666667,1.333333
2,tweet-1815804097997443496,he's going to get so much bpd pussy off this,2,2,2,1,2,2,2,1,...,2,2,2,1,2,3,2.190476,1.571429,1.52381,1.714286
3,tweet-1815207050579787915,What do Kamala Harris and Pete Buttigeig have ...,2,2,1,3,2,2,3,1,...,2,3,1,1,2,3,1.809524,2.047619,2.285714,1.761905
4,tweet-1815203143556137159,Please help me honor US Army Ranger SGT Robert...,2,3,1,2,1,2,2,2,...,2,2,2,2,3,3,1.428571,2.095238,1.952381,2.666667


In [23]:
# save the train_df_rounded to csv
train_df.to_csv("../csvs/annotation_results/train_df_rounded.csv", index=False)

Compute the HAP, HAN, LAP, and LAN scores

In [21]:
train_df = pd.read_csv("../csvs/annotation_results/train_df.csv")
train_df.head()

Unnamed: 0,tweet_id,tweet,Nervous,Sad,Happy,Calm,Excited,Angry,Relaxed,Fearful,...,Satisfied,Bored,Lonely,Tired,Positive,Negative,HAP,HAN,LAP,LAN
0,tweet-1816150260273570272,Honored to sit down with my friend and our gre...,2.714286,2.0,1.571429,1.571429,1.857143,2.571429,1.428571,2.0,...,1.714286,1.857143,1.285714,1.857143,2.142857,3.142857,1.857143,2.428571,1.571429,1.714286
1,tweet-1815412948699066825,"Happy Birthday to my best friend, husband and ...",1.285714,1.0,3.714286,2.857143,3.0,1.142857,2.571429,1.0,...,2.571429,1.571429,1.428571,1.571429,3.857143,1.428571,2.952381,1.142857,2.666667,1.333333
2,tweet-1815804097997443496,he's going to get so much bpd pussy off this,1.571429,1.714286,2.142857,1.428571,2.0,1.714286,1.571429,1.428571,...,1.571429,1.714286,1.714286,1.428571,1.714286,2.714286,2.190476,1.571429,1.52381,1.714286
3,tweet-1815207050579787915,What do Kamala Harris and Pete Buttigeig have ...,2.285714,1.714286,1.428571,2.714286,1.857143,2.428571,2.571429,1.428571,...,1.571429,2.571429,1.0,1.0,2.428571,2.857143,1.809524,2.047619,2.285714,1.761905
4,tweet-1815203143556137159,Please help me honor US Army Ranger SGT Robert...,1.857143,3.428571,1.0,1.857143,1.285714,2.285714,1.857143,2.142857,...,2.142857,2.142857,2.428571,2.285714,3.0,2.571429,1.428571,2.095238,1.952381,2.666667


In [16]:

# create an emotion_mapping to get HAP, HAN, LAP, LAN
emotion_mapping = {
    "HAP": ["Excited", "Enthusiastic", "Happy"],
    "HAN": ["Angry", "Fearful", "Nervous"],
    "LAP": ["Calm", "Relaxed", "Satisfied"],
    "LAN": ["Sad", "Bored", "Lonely"],
}

# get the HAP, HAN, LAP, LAN columns
for category, emotions in emotion_mapping.items():
    train_df[category] = train_df[emotions].mean(axis=1)

train_df.head()

Unnamed: 0,tweet_id,tweet,Nervous,Sad,Happy,Calm,Excited,Angry,Relaxed,Fearful,...,Satisfied,Bored,Lonely,Tired,Positive,Negative,HAP,HAN,LAP,LAN
0,tweet-1816150260273570272,Honored to sit down with my friend and our gre...,2.714286,2.0,1.571429,1.571429,1.857143,2.571429,1.428571,2.0,...,1.714286,1.857143,1.285714,1.857143,2.142857,3.142857,1.857143,2.428571,1.571429,1.714286
1,tweet-1815412948699066825,"Happy Birthday to my best friend, husband and ...",1.285714,1.0,3.714286,2.857143,3.0,1.142857,2.571429,1.0,...,2.571429,1.571429,1.428571,1.571429,3.857143,1.428571,2.952381,1.142857,2.666667,1.333333
2,tweet-1815804097997443496,he's going to get so much bpd pussy off this,1.571429,1.714286,2.142857,1.428571,2.0,1.714286,1.571429,1.428571,...,1.571429,1.714286,1.714286,1.428571,1.714286,2.714286,2.190476,1.571429,1.52381,1.714286
3,tweet-1815207050579787915,What do Kamala Harris and Pete Buttigeig have ...,2.285714,1.714286,1.428571,2.714286,1.857143,2.428571,2.571429,1.428571,...,1.571429,2.571429,1.0,1.0,2.428571,2.857143,1.809524,2.047619,2.285714,1.761905
4,tweet-1815203143556137159,Please help me honor US Army Ranger SGT Robert...,1.857143,3.428571,1.0,1.857143,1.285714,2.285714,1.857143,2.142857,...,2.142857,2.142857,2.428571,2.285714,3.0,2.571429,1.428571,2.095238,1.952381,2.666667


In [17]:
test_df = pd.read_csv("../csvs/annotation_results/test_df.csv")

# create an emotion_mapping to get HAP, HAN, LAP, LAN
emotion_mapping = {
    "HAP": ["Excited", "Enthusiastic", "Happy"],
    "HAN": ["Angry", "Fearful", "Nervous"],
    "LAP": ["Calm", "Relaxed", "Satisfied"],
    "LAN": ["Sad", "Bored", "Lonely"],
}

# get the HAP, HAN, LAP, LAN columns
for category, emotions in emotion_mapping.items():
    test_df[category] = test_df[emotions].mean(axis=1)

test_df.head()

Unnamed: 0,tweet_id,tweet,Nervous,Sad,Happy,Calm,Excited,Angry,Relaxed,Fearful,...,Satisfied,Bored,Lonely,Tired,Positive,Negative,HAP,HAN,LAP,LAN
0,home-conversation-1815516707189895355,I’m here for @Montel_Williams suing all the M...,2.571429,2.714286,1.857143,1.714286,1.571429,2.857143,1.857143,2.428571,...,2.0,1.285714,1.571429,2.142857,2.142857,3.142857,1.857143,2.619048,1.857143,1.857143
1,tweet-1816578588591718812,overstimulation… too weak to keep their eyes o...,3.0,1.857143,1.428571,1.285714,1.857143,1.857143,1.285714,1.714286,...,1.285714,2.428571,2.142857,2.0,1.714286,3.0,1.666667,2.190476,1.285714,2.142857
2,tweet-1814437570899857804,"If Biden bails, he could still run again in 4 ...",2.0,3.0,1.285714,1.857143,2.0,2.714286,1.714286,2.857143,...,1.0,2.571429,1.285714,3.0,1.714286,2.857143,1.666667,2.52381,1.52381,2.285714
3,tweet-1816311027241476299,They had to redo the balconies here and it’s j...,1.571429,3.571429,1.285714,1.571429,1.428571,2.857143,1.285714,1.428571,...,1.285714,1.285714,1.142857,1.285714,1.714286,3.0,1.52381,1.952381,1.380952,2.0
4,tweet-1815514662206640271,Just a girl not a threat ☀️ 🌕 \n\n#TheLastofUs,1.142857,1.0,2.857143,2.285714,3.428571,1.714286,1.857143,1.428571,...,1.714286,1.142857,1.285714,1.428571,3.571429,1.714286,2.904762,1.428571,1.952381,1.142857
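
The mapping and loop are repeated for the training and testing frames, so they could be wrapped in one helper. The sketch below assumes a hypothetical function name, `add_quadrant_scores`. It checks the arithmetic on one row copied from the merged annotations shown earlier (the @Montel_Williams tweet), whose expected HAN score of 2.619048 appears in the test set output above.

```python
import pandas as pd

EMOTION_MAPPING = {
    "HAP": ["Excited", "Enthusiastic", "Happy"],
    "HAN": ["Angry", "Fearful", "Nervous"],
    "LAP": ["Calm", "Relaxed", "Satisfied"],
    "LAN": ["Sad", "Bored", "Lonely"],
}

def add_quadrant_scores(df, mapping=EMOTION_MAPPING):
    """Return a copy of df with one mean-score column per category."""
    out = df.copy()
    for category, cols in mapping.items():
        out[category] = out[cols].mean(axis=1)
    return out

# Mean ratings for one tweet, copied from the merged_annotations output.
row = pd.DataFrame([{
    "Nervous": 2.571429, "Sad": 2.714286, "Happy": 1.857143,
    "Calm": 1.714286, "Excited": 1.571429, "Angry": 2.857143,
    "Relaxed": 1.857143, "Fearful": 2.428571, "Enthusiastic": 2.142857,
    "Satisfied": 2.0, "Bored": 1.285714, "Lonely": 1.571429,
}])

scored = add_quadrant_scores(row)
print(scored[["HAP", "HAN", "LAP", "LAN"]].round(6))
```

For example, HAN is the mean of Angry, Fearful, and Nervous: (2.857143 + 2.428571 + 2.571429) / 3 ≈ 2.619048.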


In [18]:
print(len(test_df))

50


In [19]:
# save the training and testing with HAP, HAN, LAP, LAN to csv
train_df.to_csv("../csvs/annotation_results/train_df.csv", index=False)
test_df.to_csv("../csvs/annotation_results/test_df.csv", index=False)

Pick examples from train_df

Here I picked 4 examples: the tweets with the highest HAP, HAN, LAP, and LAN scores, respectively.

In [39]:
# Filter rows whose HAP, HAN, LAP, or LAN score exceeds a hand-picked threshold (one category at a time)
# print(train_df[train_df['HAP'] > 4])
# print(len(train_df[train_df['HAN'] > 3.4]))
# print(train_df[train_df['LAP'] > 3.3])
print(train_df[train_df['LAN'] > 2.6])

                    tweet_id  \
4  tweet-1815203143556137159   

                                               tweet  Nervous  Sad  Happy  \
4  Please help me honor US Army Ranger SGT Robert...        2    3      1   

   Calm  Excited  Angry  Relaxed  Fearful  ...  Satisfied  Bored  Lonely  \
4     2        1      2        2        2  ...          2      2       2   

   Tired  Positive  Negative       HAP       HAN       LAP       LAN  
4      2         3         3  1.428571  2.095238  1.952381  2.666667  

[1 rows x 21 columns]
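
The thresholds in the cell above were tuned by hand for each category. A possible alternative that needs no thresholds is `idxmax`, which returns the index label of the highest score in a column. The sketch below uses a toy frame with made-up scores (`toy_train` and `top_ids` are hypothetical names), not the real training data.

```python
import pandas as pd

# Toy stand-in for train_df; the scores are made up for illustration.
toy_train = pd.DataFrame({
    "tweet_id": ["a", "b", "c", "d", "e"],
    "HAP": [4.1, 1.2, 1.5, 1.0, 2.0],
    "HAN": [1.0, 3.6, 1.2, 1.1, 2.0],
    "LAP": [2.0, 1.0, 3.4, 1.2, 2.0],
    "LAN": [1.0, 1.5, 1.1, 2.7, 2.0],
})

categories = ["HAP", "HAN", "LAP", "LAN"]

# For each category, look up the tweet whose score is the maximum.
top_ids = {
    cat: toy_train.loc[toy_train[cat].idxmax(), "tweet_id"]
    for cat in categories
}
print(top_ids)
```

When several tweets tie for the maximum, `idxmax` returns the first one, so ties may still deserve a manual look.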


Now we have train_df and test_df, plus some example tweets to start with.

## Construct Prompt

In [47]:
import importlib
import emotion_detector
importlib.reload(emotion_detector)
from emotion_detector import _build_prompt

In [48]:
# Example tweet and settings
tweet_text = "Just had the most amazing coffee this morning! ☕️ Ready to tackle the day! 💪"
include_image = False
personalized = False
implied = False

# Use the same schema as in gpt_detect_emotion
schema = {
    "Nervous": "<1-5>",
    "Sad": "<1-5>",
    "Happy": "<1-5>",
    "Calm": "<1-5>",
    "Excited": "<1-5>",
    "Aroused": "<1-5>",
    "Angry": "<1-5>",
    "Relaxed": "<1-5>",
    "Fearful": "<1-5>",
    "Enthusiastic": "<1-5>",
    "Still": "<1-5>",
    "Satisfied": "<1-5>",
    "Bored": "<1-5>",
    "Lonely": "<1-5>",
    "explanation": "This tweet is <category_placeholder> because... This tweet contains <emotion_placeholder> emotion because...",
}

# Build the prompt
prompt = _build_prompt(
    tweet_text,
    include_image=include_image,
    personalized=personalized,
    implied=implied,
    schema=schema,
)

# Print the prompt
print(prompt)



    

    
        As an expert annotator specializing in emotions in social media content, your job is to predict what emotions and feelings the input would make a user feel when they read/view it.
        
        Analyze the emotional tone, sentiment, and emotional content directly expressed or implied in the tweet text.
        

    ——

    
    Definitions of emotions:

    Nervous: restless tension, emotion characterized by trembling, feelings of apprehensiveness, or other signs of anxiety or fear.

    Sad: the response to the loss of an object or person to which you are very attached. The prototypical experience is the death of a loved child, parent, or spouse. In sadness there is resignation, but it can turn into anguish in which there is agitation and protest over the loss and then return to sadness again.

    Happy: feelings that are enjoyed, that are sought by the person. A number of quite different enjoyable emotions, each triggered by a different event, involving a dif

We now have a new prompt that includes real examples from the human annotations. Next, we will compare its output with the GPT annotations produced by the previous version of the prompt.
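
The printed prompt is cut off before the examples, and this notebook does not show how `_build_prompt` embeds them. As a rough illustration only, annotated tweets could be rendered into a few-shot block along the following lines. The function `format_examples`, the layout, and the placeholder tweets and ratings are all assumptions, not the code in `emotion_detector`.

```python
import json

def format_examples(examples):
    """Render (tweet, ratings) pairs as a plain-text few-shot block."""
    lines = ["Examples rated by human annotators:"]
    for i, (tweet, ratings) in enumerate(examples, start=1):
        lines.append(f"Example {i}")
        lines.append(f"Tweet: {tweet}")
        lines.append(f"Ratings: {json.dumps(ratings)}")
    return "\n".join(lines)

# Made-up tweets and ratings, for illustration only.
examples = [
    ("Placeholder tweet with a joyful tone", {"Happy": 4, "Excited": 3, "Sad": 1}),
    ("Placeholder tweet with an angry tone", {"Angry": 4, "Nervous": 3, "Happy": 1}),
]

few_shot_block = format_examples(examples)
print(few_shot_block)
```

Serializing the ratings as JSON keeps the examples in the same shape as the schema the model is asked to return.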

## Comparison of the Two Prompts

In [66]:
# load gpt_output
gpt_output = pd.read_csv("../csvs/final_batch_output.csv")

In [67]:
gpt_output.columns

Index(['tweet_id', 'predicted_nervous', 'predicted_sad', 'predicted_happy',
       'predicted_calm', 'predicted_excited', 'predicted_aroused',
       'predicted_angry', 'predicted_relaxed', 'predicted_fearful',
       'predicted_enthusiastic', 'predicted_still', 'predicted_satisfied',
       'predicted_bored', 'predicted_lonely', 'predicted_tired'],
      dtype='object')

In [72]:
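# Check that tweet_id has the same dtype in both data frames before merging
print(test_df['tweet_id'].dtype)
print(gpt_output['tweet_id'].dtype)

int64
int64


In [70]:
# Extract the trailing digits of tweet_id (the numeric ID after the prefix)
gpt_output['tweet_id'] = gpt_output['tweet_id'].str.extract(r'(\d+)$')

In [71]:
# Cast to int64 so the merge key has the same dtype as test_df['tweet_id']
gpt_output['tweet_id'] = gpt_output['tweet_id'].astype('int64')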
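
To make the ID cleanup concrete, here is a small self-contained sketch that applies the same regular expression to two IDs taken from the annotation table above. The variable names `raw_ids` and `numeric_ids` are illustrative. Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Two prefixed IDs as they appear in the annotation CSV.
raw_ids = pd.Series([
    "home-conversation-1814505443940585709",
    "tweet-1814143199251730455",
])

# Keep only the digits at the end of each ID, then cast to int64.
numeric_ids = raw_ids.str.extract(r"(\d+)$", expand=False).astype("int64")
print(numeric_ids.tolist())
```

Note that the `.str` accessor only works on string columns, so rerunning the extraction cell after the column has already been cast to `int64` raises an error.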

In [76]:
# drop duplicates in gpt_output
gpt_output = gpt_output.drop_duplicates(subset=['tweet_id'])

In [73]:
test_df.head()

Unnamed: 0,tweet_id,tweet,Nervous,Sad,Happy,Calm,Excited,Angry,Relaxed,Fearful,...,Bored,Lonely,Tired,Positive,Negative,HAP,HAN,LAP,LAN,tweet_id_numeric
0,1815516707189895355,I’m here for @Montel_Williams suing all the M...,2.571429,2.714286,1.857143,1.714286,1.571429,2.857143,1.857143,2.428571,...,1.285714,1.571429,2.142857,2.142857,3.142857,1.857143,2.619048,1.857143,1.857143,1815516707189895355
1,1816578588591718812,overstimulation… too weak to keep their eyes o...,3.0,1.857143,1.428571,1.285714,1.857143,1.857143,1.285714,1.714286,...,2.428571,2.142857,2.0,1.714286,3.0,1.666667,2.190476,1.285714,2.142857,1816578588591718812
2,1814437570899857804,"If Biden bails, he could still run again in 4 ...",2.0,3.0,1.285714,1.857143,2.0,2.714286,1.714286,2.857143,...,2.571429,1.285714,3.0,1.714286,2.857143,1.666667,2.52381,1.52381,2.285714,1814437570899857804
3,1816311027241476299,They had to redo the balconies here and it’s j...,1.571429,3.571429,1.285714,1.571429,1.428571,2.857143,1.285714,1.428571,...,1.285714,1.142857,1.285714,1.714286,3.0,1.52381,1.952381,1.380952,2.0,1816311027241476299
4,1815514662206640271,Just a girl not a threat ☀️ 🌕 \n\n#TheLastofUs,1.142857,1.0,2.857143,2.285714,3.428571,1.714286,1.857143,1.428571,...,1.142857,1.285714,1.428571,3.571429,1.714286,2.904762,1.428571,1.952381,1.142857,1815514662206640271


In [63]:
test_df.to_csv("../csvs/annotation_results/test_df.csv", index=False)

In [74]:
print(len(test_df))

50


In [77]:
# Merge the gpt_output with the test_df
test_merged = pd.merge(gpt_output, test_df, on='tweet_id', how='inner')
print(len(test_merged))

50
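
The inner merge returns 50 rows here, so every test tweet found a match. If it ever returned fewer, the unmatched tweets would drop out silently. A more defensive variant, sketched below on made-up frames (`preds`, `human`, `checked`, and `missing` are hypothetical names), uses a left merge with the `indicator` and `validate` parameters of `pd.merge` to list the unmatched IDs and to fail loudly on duplicate keys.

```python
import pandas as pd

# Toy stand-ins for gpt_output and test_df.
preds = pd.DataFrame({"tweet_id": [1, 2, 3], "predicted_happy": [4, 1, 2]})
human = pd.DataFrame({"tweet_id": [2, 3, 4], "Happy": [1.3, 2.1, 3.7]})

# validate raises MergeError if tweet_id is duplicated on either side;
# indicator adds a _merge column saying where each row came from.
checked = pd.merge(
    human, preds, on="tweet_id", how="left",
    validate="one_to_one", indicator=True,
)

# Human-annotated tweets that have no prediction.
missing = checked.loc[checked["_merge"] == "left_only", "tweet_id"].tolist()
print(missing)
```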


In [78]:
test_merged.head()

Unnamed: 0,tweet_id,predicted_nervous,predicted_sad,predicted_happy,predicted_calm,predicted_excited,predicted_aroused,predicted_angry,predicted_relaxed,predicted_fearful,...,Bored,Lonely,Tired,Positive,Negative,HAP,HAN,LAP,LAN,tweet_id_numeric
0,1816483288128577671,2,4,1,1,1,2,5,1,2,...,2.714286,2.428571,2.857143,1.714286,3.857143,1.904762,3.190476,1.904762,3.095238,1816483288128577671
1,1816374799813226778,1,1,4,2,3,2,1,2,1,...,2.571429,1.142857,1.285714,2.857143,1.428571,2.285714,1.047619,2.285714,1.571429,1816374799813226778
2,1816528401400021365,1,1,2,1,2,1,4,1,1,...,1.571429,1.428571,1.714286,1.571429,2.714286,1.761905,2.095238,1.428571,1.666667,1816528401400021365
3,1816518235099140473,1,2,4,1,3,2,1,1,1,...,1.428571,1.571429,1.714286,2.857143,1.142857,2.47619,1.428571,2.380952,1.47619,1816518235099140473
4,1816578588591718812,2,2,1,1,2,4,1,1,2,...,2.428571,2.142857,2.0,1.714286,3.0,1.666667,2.190476,1.285714,2.142857,1816578588591718812


In [79]:
test_merged.columns

Index(['tweet_id', 'predicted_nervous', 'predicted_sad', 'predicted_happy',
       'predicted_calm', 'predicted_excited', 'predicted_aroused',
       'predicted_angry', 'predicted_relaxed', 'predicted_fearful',
       'predicted_enthusiastic', 'predicted_still', 'predicted_satisfied',
       'predicted_bored', 'predicted_lonely', 'predicted_tired', 'tweet',
       'Nervous', 'Sad', 'Happy', 'Calm', 'Excited', 'Angry', 'Relaxed',
       'Fearful', 'Enthusiastic', 'Satisfied', 'Bored', 'Lonely', 'Tired',
       'Positive', 'Negative', 'HAP', 'HAN', 'LAP', 'LAN', 'tweet_id_numeric'],
      dtype='object')

In [80]:
test_merged.to_csv("../csvs/annotation_results/improved_prompt_output.csv", index=False)

## Run New Prompt

The test set is now ready. I will run the new prompt on each row's tweet and record the output in new columns named predicted_{emotion}_new.

In [81]:
import pandas as pd
from emotion_detector import gpt_detect_emotion
import json
from tqdm import tqdm  # for progress bar

def process_test_set(test_df):
    # List to store all results
    results = []
    
    # Emotions to track (same keys as the prompt schema)
    emotions = ['Nervous', 'Sad', 'Happy', 'Calm', 'Excited', 'Aroused', 
                'Angry', 'Relaxed', 'Fearful', 'Enthusiastic', 'Still', 
                'Satisfied', 'Bored', 'Lonely']
    
    # Process each tweet
    for idx, row in tqdm(test_df.iterrows(), total=len(test_df)):
        result_dict = {'tweet_id': row['tweet_id']}
        
        try:
            # Call GPT for emotion detection
            gpt_output = gpt_detect_emotion(
                tweet_text=row['tweet'],
                debug=False,  # Set to True if you want to see the prompts
                personalized=False,
                implied=False
            )
            
            # Check if there's an error in the response
            if 'error' in gpt_output:
                print(f"Error processing tweet {row['tweet_id']}: {gpt_output['error']}")
                continue
                
            # Add predictions to result dictionary
            for emotion in emotions:
                result_dict[f'predicted_{emotion}_new'] = gpt_output[emotion]
            
            # Add explanation
            result_dict['predicted_explanation_new'] = gpt_output['explanation']
            
        except Exception as e:
            print(f"Error processing tweet {row['tweet_id']}: {str(e)}")
            continue
        
        results.append(result_dict)
    
    # Convert results to DataFrame
    results_df = pd.DataFrame(results)
    
    # Merge with original test_df
    final_df = pd.merge(test_df, results_df, on='tweet_id', how='left')
    
    return final_df

# Run the processing
try:
    # Process the test set
    processed_df = process_test_set(test_merged)
    
    # Save the results (optional)
    processed_df.to_csv('test_results_with_predictions.csv', index=False)
    
    # Print some statistics
    print("\nProcessing completed!")
    print(f"Original rows: {len(test_merged)}")
    print(f"Processed rows: {len(processed_df)}")
    
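    # Show a sample of the results
    # Note: `emotions` is local to process_test_set, so the name below resolves to the
    # notebook-level variable left over from the emotion_mapping loop (['Sad', 'Bored', 'Lonely'])
    sample_cols = ['tweet_id', 'tweet'] + [f'predicted_{e}_new' for e in emotions[:3]]
    print(processed_df[sample_cols].head())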

except Exception as e:
    print(f"An error occurred: {str(e)}")

100%|██████████| 50/50 [02:29<00:00,  2.99s/it]


Processing completed!
Original rows: 50
Processed rows: 50

Sample of predictions:
              tweet_id                                              tweet  \
0  1816483288128577671  They didn’t hold emergency meeting on how to s...   
1  1816374799813226778  -I want the most French paragraph about clocks...   
2  1816528401400021365  I agree, JD Vance. Lindsey Graham has no stake...   
3  1816518235099140473  Its MY honey \n\n(But ill always share if you ...   
4  1816578588591718812  overstimulation… too weak to keep their eyes o...   

  predicted_Sad_new predicted_Bored_new predicted_Lonely_new  
0                 4                   1                    1  
1                 1                   3                    1  
2                 2                   1                    1  
3                 1                   1                    1  
4                 2                   1                    1  
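
One caveat about the row counts printed above: `process_test_set` ends with a left merge, so the processed frame keeps one row per input tweet even when a call failed and was skipped. Counting missing predictions is a more direct check. The sketch below reproduces the situation with made-up data (`toy_test`, `toy_results`, and `failed_ids` are hypothetical names).

```python
import pandas as pd

toy_test = pd.DataFrame({"tweet_id": [1, 2, 3]})

# Pretend the call for tweet 2 failed and was skipped by `continue`.
toy_results = pd.DataFrame({"tweet_id": [1, 3], "predicted_Happy_new": [4, 2]})

# Same merge as at the end of process_test_set.
toy_final = pd.merge(toy_test, toy_results, on="tweet_id", how="left")

# Rows with no prediction are the tweets that failed.
failed_ids = toy_final.loc[
    toy_final["predicted_Happy_new"].isna(), "tweet_id"
].tolist()
print(len(toy_final), failed_ids)
```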





## Analysis of Alignment

Measure how well each of the two sets of predictions aligns with the human annotations.

In [85]:
processed_df.columns

Index(['tweet_id', 'predicted_nervous', 'predicted_sad', 'predicted_happy',
       'predicted_calm', 'predicted_excited', 'predicted_aroused',
       'predicted_angry', 'predicted_relaxed', 'predicted_fearful',
       'predicted_enthusiastic', 'predicted_still', 'predicted_satisfied',
       'predicted_bored', 'predicted_lonely', 'predicted_tired', 'tweet',
       'Nervous', 'Sad', 'Happy', 'Calm', 'Excited', 'Angry', 'Relaxed',
       'Fearful', 'Enthusiastic', 'Satisfied', 'Bored', 'Lonely', 'Tired',
       'Positive', 'Negative', 'HAP', 'HAN', 'LAP', 'LAN', 'tweet_id_numeric',
       'predicted_Nervous_new', 'predicted_Sad_new', 'predicted_Happy_new',
       'predicted_Calm_new', 'predicted_Excited_new', 'predicted_Aroused_new',
       'predicted_Angry_new', 'predicted_Relaxed_new', 'predicted_Fearful_new',
       'predicted_Enthusiastic_new', 'predicted_Still_new',
       'predicted_Satisfied_new', 'predicted_Bored_new',
       'predicted_Lonely_new', 'predicted_explanation_new'],


In [88]:
# Mean Absolute Error (MAE)
from sklearn.metrics import mean_absolute_error

def compare_annotations(df, round_suffix1='', round_suffix2='_new'):
    emotions = ['Nervous', 'Sad', 'Happy', 'Calm', 'Excited', 'Angry', 
                'Relaxed', 'Fearful', 'Enthusiastic', 'Satisfied', 
                'Bored', 'Lonely']
    
    results = []
    for emotion in emotions:
        # Get human and both GPT predictions
        human = df[emotion]  # original human annotation
        gpt1 = df[f'predicted_{emotion.lower()}']  # first round
        gpt2 = df[f'predicted_{emotion}{round_suffix2}']  # second round
        
        # Calculate MAE for both rounds
        mae1 = mean_absolute_error(human, gpt1)
        mae2 = mean_absolute_error(human, gpt2)
        
        # Calculate correlation coefficients
        corr1 = human.corr(gpt1)
        corr2 = human.corr(gpt2)
        
        results.append({
            'emotion': emotion,
            f'MAE_round{round_suffix1}': mae1,
            f'MAE_round{round_suffix2}': mae2,
            f'correlation_round{round_suffix1}': corr1,
            f'correlation_round{round_suffix2}': corr2
        })
    
    return pd.DataFrame(results)

# Use the function
comparison_df = compare_annotations(processed_df)
print("\nComparison of annotations:")
print(comparison_df)

# Calculate overall metrics (averaged over the 12 emotions)
print("\nOverall metrics:")
print(f"Average MAE Round 1: {comparison_df['MAE_round'].mean():.3f}")
print(f"Average MAE Round 2: {comparison_df['MAE_round_new'].mean():.3f}")
print(f"Average Correlation Round 1: {comparison_df['correlation_round'].mean():.3f}")
print(f"Average Correlation Round 2: {comparison_df['correlation_round_new'].mean():.3f}")


Comparison of annotations:
         emotion  MAE_round  MAE_round_new  correlation_round  \
0        Nervous   0.665714       0.448571           0.659469   
1            Sad   0.662857       0.440000           0.701716   
2          Happy   0.977143       0.625714           0.672756   
3           Calm   0.688571       0.491429           0.379049   
4        Excited   0.945714       0.757143           0.483023   
5          Angry   0.705714       0.517143           0.804289   
6        Relaxed   0.608571       0.480000           0.365252   
7        Fearful   0.548571       0.471429           0.675102   
8   Enthusiastic   1.040000       0.802857           0.567991   
9      Satisfied   0.662857       0.534286           0.599415   
10         Bored   0.740000       0.700000          -0.100782   
11        Lonely   0.362857       0.425714           0.521769   

    correlation_round_new  
0                0.700420  
1                0.776988  
2                0.652604  
3             

The new prompt clearly improves alignment with the human annotations: MAE falls for 11 of the 12 emotions (Lonely is the exception), and the average MAE drops from about 0.72 to about 0.56.  <br>
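
To make the comparison easy to re-check, the sketch below copies the MAE values from the table above into a small frame and counts which emotions improved. The column names `old` and `new` and the variable names are illustrative. The correlation column is left out because its output is truncated above.

```python
import pandas as pd

# MAE values copied from the comparison table (old prompt vs. new prompt).
mae = pd.DataFrame({
    "emotion": ["Nervous", "Sad", "Happy", "Calm", "Excited", "Angry",
                "Relaxed", "Fearful", "Enthusiastic", "Satisfied",
                "Bored", "Lonely"],
    "old": [0.665714, 0.662857, 0.977143, 0.688571, 0.945714, 0.705714,
            0.608571, 0.548571, 1.040000, 0.662857, 0.740000, 0.362857],
    "new": [0.448571, 0.440000, 0.625714, 0.491429, 0.757143, 0.517143,
            0.480000, 0.471429, 0.802857, 0.534286, 0.700000, 0.425714],
})

# Negative change means the new prompt is closer to the human ratings.
mae["change"] = mae["new"] - mae["old"]
improved = mae.loc[mae["change"] < 0, "emotion"].tolist()
worse = mae.loc[mae["change"] > 0, "emotion"].tolist()

print(f"{len(improved)} of {len(mae)} emotions improved; worse: {worse}")
print(mae[["old", "new"]].mean().round(3))
```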

What would be the next step here? How can we improve further?