This notebook compares implied and expressed emotions in 100 tweets randomly sampled from the valid-tweets CSV (`valid_tweets_with_gpt.csv`). <br>
<br>
Goals:
1. Define a prompt that separates implied from expressed emotions in social-media posts. <br>
2. Identify typical examples of tweets whose implied and expressed emotions diverge. <br>

In [2]:
import pandas as pd
import random

In [9]:
import os

# Randomly sample 100 tweets from valid_tweets_with_gpt.csv as test_df and save them to the implied_expressed_test folder

# Load the valid tweets with GPT predictions
valid_tweets_df = pd.read_csv('../csvs/valid_tweets_with_gpt.csv')
# Sample 100 random tweets (fixed seed for reproducibility)
test_df = valid_tweets_df.sample(n=100, random_state=42)
# Create the output folder if it does not exist, then save the sample
os.makedirs('../csvs/implied_expressed_test', exist_ok=True)
test_df.to_csv('../csvs/implied_expressed_test/tweet_samples.csv', index=False)

The 100 samples are now saved in the `implied_expressed_test` folder. <br>
Next, we run a prompt that separates the implied and expressed emotions in each tweet. <br>
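The processing loop reads each model response with `.get(label, 0)`, so a label the model silently omits is indistinguishable from a genuine score of 0 and can distort the gaps. A minimal sketch of a check for that, assuming the response shape the loop expects (an `'explicit'` and an `'implied'` section, each mapping capitalized emotion names to scores, plus an optional `'explanation'`). `find_missing_labels` is a hypothetical helper, and the example response is made up for illustration, not real model output.

```python
from typing import Any

# Emotion labels used in this notebook (mirrors the `emotions` list defined in the next cell).
EMOTIONS = [
    "nervous", "sad", "happy", "calm", "excited", "aroused", "angry",
    "relaxed", "fearful", "enthusiastic", "still", "satisfied", "bored", "lonely",
]


def find_missing_labels(result: dict[str, Any], emotions: list[str] = EMOTIONS) -> dict[str, list[str]]:
    """Return, per section ('explicit' / 'implied'), the emotion labels absent from the response.

    Assumes labels appear capitalized (e.g. 'Nervous'), as the processing loop expects.
    """
    missing: dict[str, list[str]] = {}
    for section in ("explicit", "implied"):
        scores = result.get(section, {})
        missing[section] = [e for e in emotions if e.capitalize() not in scores]
    return missing


# Hypothetical response for illustration only: the explicit section omits most labels.
example = {
    "explicit": {"Nervous": 1, "Sad": 0, "explanation": "..."},
    "implied": {e.capitalize(): 0 for e in EMOTIONS},
}
print(find_missing_labels(example))
```

Running this check on each response before filling `result_row` would flag tweets whose zeros are defaults rather than scores.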

In [6]:
# initialize a DataFrame to store the results
emotions = [
    "nervous", "sad", "happy", "calm", "excited", "aroused", "angry",
    "relaxed", "fearful", "enthusiastic", "still", "satisfied", "bored", "lonely"
]
columns = ['tweet_id'] + [f'implied_{emotion}' for emotion in emotions] + [f'expressed_{emotion}' for emotion in emotions]
print(f"Columns: {columns}")
results_df = pd.DataFrame(columns=columns)

Columns: ['tweet_id', 'implied_nervous', 'implied_sad', 'implied_happy', 'implied_calm', 'implied_excited', 'implied_aroused', 'implied_angry', 'implied_relaxed', 'implied_fearful', 'implied_enthusiastic', 'implied_still', 'implied_satisfied', 'implied_bored', 'implied_lonely', 'expressed_nervous', 'expressed_sad', 'expressed_happy', 'expressed_calm', 'expressed_excited', 'expressed_aroused', 'expressed_angry', 'expressed_relaxed', 'expressed_fearful', 'expressed_enthusiastic', 'expressed_still', 'expressed_satisfied', 'expressed_bored', 'expressed_lonely']


In [12]:
from emotion_detector import gpt_detect_emotion

results_list = []

total = len(test_df)
for i, (_, row) in enumerate(test_df.iterrows(), start=1):
    tweet = row['tweet']
    print(f"\nProcessing {i}/{total} - Tweet ID: {row['tweet_id']}")
    print(f"Tweet: {tweet}")

    # Ask the model for implied and explicit (expressed) emotion scores
    result = gpt_detect_emotion(tweet, implied=True)

    # The parsed JSON response holds one dict per section
    explicit_dict = result['explicit']
    implied_dict = result['implied']

    # Prepare one row of results (labels are capitalized in the response; missing labels default to 0)
    result_row = {'tweet_id': row['tweet_id']}
    for emotion in emotions:
        result_row[f'implied_{emotion}'] = int(implied_dict.get(emotion.capitalize(), 0))
        result_row[f'expressed_{emotion}'] = int(explicit_dict.get(emotion.capitalize(), 0))

    # Add explanations
    result_row['explicit_reason'] = explicit_dict.get('explanation', '')
    result_row['implied_reason'] = implied_dict.get('explanation', '')

    results_list.append(result_row)

# Final dataframe
results_df = pd.DataFrame(results_list)
print(f"\n✅ Completed processing {total} tweets.")


Processing 1/100 - Tweet ID: tweet-1820482399446159577
Tweet: [1985] "The De Lorean, DMC-12, with stainless-steel body and gull-wing doors, is a distinctive sports car, and one of the few 1980 cars predicted to increase in value." (Herald Examiner Collection) https://t.co/bQdN57ZF3f https://t.co/TG8N7jgBbJ

Processing 2/100 - Tweet ID: tweet-1813971527161049132
Tweet: Will Biden’s Presidential Race last the weekend?

Processing 3/100 - Tweet ID: tweet-1818829000598265922
Tweet: 17 human skulls found at suspected shrine in Uganda. https://t.co/lTdGnvPamF

Processing 4/100 - Tweet ID: tweet-1810721614943670469
Tweet: The Wire. 
-End the War on Drugs. Drugs won. 
-The institutions are structured to protect themselves, not you. 
-Those who push back against the systems will suffer. 
-Don’t trust the stats. Stats are manipulated to push personal agendas. 

“This game is rigged, man.” -Bodie https://t.co/Y6rNkhwoNr

Processing 5/100 - Tweet ID: tweet-1816156388684620036
Tweet: #biodiversity

In [13]:
# save the results to a CSV file
results_df.to_csv('../csvs/implied_expressed_test/results.csv', index=False)
print(f"Results saved to ../csvs/implied_expressed_test/results.csv")

Results saved to ../csvs/implied_expressed_test/results.csv


In [14]:
results_df_original = results_df.copy()

In [15]:
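# Per-emotion gap: implied score minus expressed score
# (positive = implied more strongly than expressed, negative = expressed more strongly than implied)
for emotion in emotions:
    results_df[f'{emotion}_gap'] = results_df[f'implied_{emotion}'] - results_df[f'expressed_{emotion}']
# total_gap: sum of the absolute per-emotion gaps
results_df['total_gap'] = results_df[[f'{e}_gap' for e in emotions]].abs().sum(axis=1)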

# save the updated results with gaps 
results_df.to_csv('../csvs/implied_expressed_test/results.csv', index=False)
print(f"Results with gaps saved to ../csvs/implied_expressed_test/results.csv")

Results with gaps saved to ../csvs/implied_expressed_test/results.csv
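The cell above computes `gap_e = implied_e - expressed_e` for each emotion `e` and `total_gap = sum over e of |gap_e|`. The toy example below (made-up scores for two hypothetical tweets and three emotions) shows why the absolute value matters: without it, opposite-signed gaps cancel, so a tweet with a strong split can look like it has none.

```python
import pandas as pd

# Toy scores for two hypothetical tweets and three emotions (illustrative values only).
emotions_demo = ["nervous", "sad", "happy"]
toy = pd.DataFrame({
    "tweet_id": ["t1", "t2"],
    "implied_nervous": [3, 1], "expressed_nervous": [1, 1],
    "implied_sad": [1, 2], "expressed_sad": [3, 2],
    "implied_happy": [1, 4], "expressed_happy": [1, 3],
})

gap_cols = [f"{e}_gap" for e in emotions_demo]
for e in emotions_demo:
    # Positive gap: the emotion is implied more strongly than it is expressed.
    toy[f"{e}_gap"] = toy[f"implied_{e}"] - toy[f"expressed_{e}"]

# Signed sum: t1's +2 (nervous) and -2 (sad) cancel out.
toy["signed_sum"] = toy[gap_cols].sum(axis=1)
# Absolute sum (the notebook's total_gap): t1's split is counted in full.
toy["total_gap"] = toy[gap_cols].abs().sum(axis=1)
print(toy[["tweet_id"] + gap_cols + ["signed_sum", "total_gap"]])
```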


In [17]:
# show the 10 tweets with the largest total gap
top_10_gaps = results_df.nlargest(10, 'total_gap')
print("\nTop 10 tweets with the largest total gap:")
print(top_10_gaps[['tweet_id', 'total_gap'] + [f'{e}_gap' for e in emotions]])


Top 10 tweets with the largest total gap:
                     tweet_id  total_gap  nervous_gap  sad_gap  happy_gap  \
78  tweet-1823817038587998486         14            1        0          2   
8   tweet-1821127975812645307         12            1       -1          1   
59  tweet-1819906912604856689         10            0        1          1   
49  tweet-1819050106114482204          8            0        0          1   
77  tweet-1818098407619358841          8            0       -1          2   
1   tweet-1813971527161049132          7            2        1          0   
22  tweet-1820507630223204371          7            0        1          1   
48  tweet-1812858464638492897          7            2        0          0   
50  tweet-1819868263091991034          7            0        0          2   
85  tweet-1815968390202417246          7            1        0          1   

    calm_gap  excited_gap  aroused_gap  angry_gap  relaxed_gap  fearful_gap  \
78         1            3     
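Ranking by `total_gap` finds individual tweets; for goal 2 it can also help to see which emotions drive the split across the whole sample, and in which direction. A minimal sketch, assuming the `<emotion>_gap` columns computed above. `gap_summary` is a hypothetical helper, and it is demonstrated on made-up gap values rather than on `results_df`.

```python
import pandas as pd


def gap_summary(df: pd.DataFrame, emotions: list[str]) -> pd.DataFrame:
    """Summarize each emotion's `<emotion>_gap` column across tweets.

    mean_abs_gap: average size of the implied/expressed split
    share_implied_higher: fraction of tweets where the gap is > 0
    share_expressed_higher: fraction of tweets where the gap is < 0
    """
    rows = []
    for e in emotions:
        gap = df[f"{e}_gap"]
        rows.append({
            "emotion": e,
            "mean_abs_gap": gap.abs().mean(),
            "share_implied_higher": (gap > 0).mean(),
            "share_expressed_higher": (gap < 0).mean(),
        })
    return pd.DataFrame(rows).sort_values("mean_abs_gap", ascending=False, ignore_index=True)


# Toy gap columns for illustration; in the notebook this would be results_df with `emotions`.
toy_gaps = pd.DataFrame({
    "nervous_gap": [2, 0, 1, 1],
    "sad_gap": [-1, 0, 0, 0],
    "happy_gap": [0, 0, 0, 0],
})
summary = gap_summary(toy_gaps, ["nervous", "sad", "happy"])
print(summary)
```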

In [23]:
# merge the tweet samples with the results
merged_df = pd.merge(test_df, results_df, on='tweet_id', how='left')

# drop any duplicated column names (defensive: merging on 'tweet_id' already keeps a single key column)
merged_df = merged_df.loc[:, ~merged_df.columns.duplicated()]

merged_df.head()

Unnamed: 0,tweet_id_numeric,tweet_id,tweet,predicted_nervous,predicted_sad,predicted_happy,predicted_calm,predicted_excited,predicted_aroused,predicted_angry,...,aroused_gap,angry_gap,relaxed_gap,fearful_gap,enthusiastic_gap,still_gap,satisfied_gap,bored_gap,lonely_gap,total_gap
0,1820482399446159577,tweet-1820482399446159577,"[1985] ""The De Lorean, DMC-12, with stainless-...",1,1,4,2,5,3,1,...,0,0,0,0,1,0,0,0,0,2
1,1813971527161049132,tweet-1813971527161049132,Will Biden’s Presidential Race last the weekend?,2,1,1,1,3,2,1,...,0,0,0,2,1,0,0,0,0,7
2,1818829000598265922,tweet-1818829000598265922,17 human skulls found at suspected shrine in U...,2,4,1,1,1,2,3,...,1,0,0,1,0,0,0,0,0,4
3,1810721614943670469,tweet-1810721614943670469,The Wire. \n-End the War on Drugs. Drugs won. ...,3,4,1,1,2,2,4,...,0,1,0,1,0,0,0,0,1,5
4,1816156388684620036,tweet-1816156388684620036,#biodiversity\nWhat is it?\nWhy is it under th...,1,2,2,1,3,2,1,...,0,0,0,1,1,0,0,0,0,5
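`pd.merge(..., on='tweet_id')` keeps a single `tweet_id` column; overlapping non-key columns would instead get the default `_x`/`_y` suffixes, so the `duplicated()` filter is only a safeguard. What a left merge can hide is a sampled tweet with no result row, or a `tweet_id` that repeats. A minimal sketch of checking both with pandas' `validate` and `indicator` options, using small toy frames in place of `test_df` and `results_df`.

```python
import pandas as pd

# Toy frames standing in for test_df (samples) and results_df (model scores).
samples = pd.DataFrame({"tweet_id": ["a", "b", "c"], "tweet": ["t1", "t2", "t3"]})
scores = pd.DataFrame({"tweet_id": ["a", "b"], "total_gap": [3, 1]})

checked = pd.merge(
    samples, scores, on="tweet_id", how="left",
    validate="one_to_one",  # raises pandas.errors.MergeError if tweet_id repeats on either side
    indicator=True,         # adds a `_merge` column: 'both', 'left_only', or 'right_only'
)
unmatched = checked.loc[checked["_merge"] != "both", "tweet_id"].tolist()
print(f"{len(unmatched)} sampled tweet(s) have no result row: {unmatched}")

# Drop the indicator once checked; only one tweet_id column remains.
checked = checked.drop(columns="_merge")
print(checked.columns.tolist())
```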


In [20]:
# save the results to a CSV file
merged_df.to_csv('../csvs/implied_expressed_test/results.csv', index=False)
print(f"Results saved to ../csvs/implied_expressed_test/results.csv")

Results saved to ../csvs/implied_expressed_test/results.csv


In [None]:
# reorganize predicted, expressed, and implied emotions to be next to each other
ordered_columns = (
    ['tweet_id'] +
    [col for emotion in emotions for col in [
        f'implied_{emotion}', f'expressed_{emotion}', f'predicted_{emotion}'
    ]] +
    ['implied_reason', 'explicit_reason'] +
    [f'{emotion}_gap' for emotion in emotions] +
    ['total_gap']
)
merged_df = merged_df[ordered_columns]
# preview the reorganized columns (saved to CSV in the next cell)
merged_df.head()

SyntaxError: did you forget parentheses around the comprehension target? (3709079882.py, line 3)
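This `SyntaxError` message is what Python reports for a comprehension that yields several bare values, such as `[f'implied_{e}', f'expressed_{e}' for e in emotions]`; the nested form in the cell above is valid, so the error output is likely left over from an earlier version of the cell. A minimal sketch of the same interleaving as a reusable function, plus a check for requested columns that `merged_df` lacks (selecting those would raise a `KeyError`). Both helper names are hypothetical.

```python
def interleaved_columns(emotions: list[str], prefixes: list[str]) -> list[str]:
    """Return every prefixed name for the first emotion, then for the second, and so on."""
    # Flat nested comprehension: the outer loop is over emotions, the inner over prefixes.
    # Writing `[f"implied_{e}", f"expressed_{e}" for e in emotions]` instead is a SyntaxError.
    return [f"{p}_{e}" for e in emotions for p in prefixes]


def missing_columns(wanted: list[str], available: list[str]) -> list[str]:
    """List requested columns that are not among the available ones."""
    have = set(available)
    return [c for c in wanted if c not in have]


cols = interleaved_columns(["sad", "happy"], ["implied", "expressed", "predicted"])
print(cols)
print(missing_columns(cols, ["implied_sad", "expressed_sad", "predicted_sad"]))
```

In the notebook, `missing_columns(ordered_columns, merged_df.columns.tolist())` returning an empty list would confirm the selection is safe.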

In [None]:

# save the reorganized results to a CSV file
merged_df.to_csv('../csvs/implied_expressed_test/results_reorganized.csv', index=False)