# Emotion Classification with Llama 3 Instruct 8B

This notebook runs inference with Llama 3. The model was tested on the GoEmotions and TwitterData datasets, and on GoEmotions with the labels mapped to the Ekman categories. Because the results were poor, we did not carry out the thorough mapping analysis done for the other transformers. The cell outputs are absent: given the time and memory requirements of this model, it was run on a remote machine and the results were saved to .csv files in the "results" directory. The predictions were then processed directly in the "model_comparison" notebook.

In [None]:
from lib.dataset_utils import *
from lib.plot_utils import *
from lib.models import Llama3
from lib.dataset_utils import Llama_EmotionsData
from sklearn.metrics import accuracy_score, jaccard_score, f1_score

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

### Loading Twitter

In [None]:
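# Only the test split is used for inference.
_, _, twitter_test = load_twitter_data_cleaned()
# Every column after the first is treated as an emotion label.
twitter_emotions = twitter_test.columns[1:]
twitter_test_dataset = Llama_EmotionsData(twitter_test)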

### Loading GoEmotions

In [None]:
# Only the test split is used for inference.
_, _, goemotions_test = load_goemotions_cleaned()
grouped_test_df = goemotions_apply_emotion_mapping(goemotions_test)
# Every column after the first is treated as an emotion label.
goemotions_emotions = goemotions_test.columns[1:]
# Columns whose names start with 'ekman_' are the mapped emotions.
EKMAN_LABELS = grouped_test_df.columns[grouped_test_df.columns.str.startswith('ekman_')].tolist()
# From here on, grouped_test_df is a Llama_EmotionsData dataset rather than a DataFrame.
grouped_test_df = Llama_EmotionsData(grouped_test_df)
goemotions_test_dataset = Llama_EmotionsData(goemotions_test)
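
For reference, the Ekman version collapses the 27 fine-grained GoEmotions emotions into six basic ones. The sketch below uses the grouping published with the GoEmotions dataset, which may differ from what `goemotions_apply_emotion_mapping` does internally. The `to_ekman` helper and the pass-through handling of `neutral` are assumptions made for illustration; only the `ekman_` prefix comes from the cell above.

```python
# Grouping published with the GoEmotions dataset (assumed here; the notebook's
# own helper may use a different mapping).
EKMAN_MAPPING = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Inverted view: fine-grained label -> Ekman category.
FINE_TO_EKMAN = {
    fine: coarse for coarse, fines in EKMAN_MAPPING.items() for fine in fines
}


def to_ekman(fine_labels):
    """Hypothetical helper: map fine-grained labels to sorted 'ekman_' names.

    Labels missing from the mapping (e.g. 'neutral') are passed through unchanged.
    """
    coarse = {FINE_TO_EKMAN.get(label, label) for label in fine_labels}
    return sorted(f"ekman_{label}" for label in coarse)
```

With this mapping, a sentence labelled `gratitude` and `amusement` ends up with the single grouped label `ekman_joy`.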

# Metrics definition
We redefine the metrics locally: the versions used by the other transformers include threshold tuning, which is not needed for Llama, since its answers are used as labels directly.

In [None]:
def accuracy(targets, predictions):
    # On multi-label indicator arrays this is exact-match (subset) accuracy.
    return accuracy_score(targets, predictions)

def jaccard(targets, predictions):
    # Micro average: pool every (sample, label) decision before scoring.
    return jaccard_score(targets, predictions, average='micro', zero_division=0)

def jaccard_samples(targets, predictions):
    # Samples average: score each sample's label set, then take the mean.
    return jaccard_score(targets, predictions, average='samples', zero_division=0)

def f1(targets, predictions):
    # Macro average: unweighted mean of the per-label F1 scores.
    return f1_score(targets, predictions, average='macro', zero_division=0)

def f1_micro(targets, predictions):
    return f1_score(targets, predictions, average='micro', zero_division=0)

SCORES = {"accuracy": accuracy, "jaccard": jaccard, "jaccard_samples": jaccard_samples, "f1": f1, "f1_micro": f1_micro}
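
To make the difference between the two Jaccard variants concrete, here is a small pure-Python re-implementation on a toy multi-label example. It is only an illustration of what the `micro` and `samples` averages compute (with an empty union scored as 0, as with `zero_division=0`), not a replacement for the scikit-learn functions used above. The helper names are made up for this sketch.

```python
def jaccard_micro_sketch(targets, predictions):
    """Pool intersections and unions over all samples, then divide once."""
    inter = union = 0
    for t_row, p_row in zip(targets, predictions):
        for t, p in zip(t_row, p_row):
            inter += int(bool(t) and bool(p))
            union += int(bool(t) or bool(p))
    return inter / union if union else 0.0


def jaccard_samples_sketch(targets, predictions):
    """Compute |intersection| / |union| per sample, then average the scores."""
    scores = []
    for t_row, p_row in zip(targets, predictions):
        inter = sum(1 for t, p in zip(t_row, p_row) if t and p)
        union = sum(1 for t, p in zip(t_row, p_row) if t or p)
        scores.append(inter / union if union else 0.0)
    return sum(scores) / len(scores)


def subset_accuracy_sketch(targets, predictions):
    """Fraction of samples whose whole label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(targets, predictions) if list(t) == list(p))
    return exact / len(targets)


# Toy data: 3 sentences, 3 emotions, one row per sentence.
toy_targets = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
toy_predictions = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]

print(jaccard_micro_sketch(toy_targets, toy_predictions))
print(jaccard_samples_sketch(toy_targets, toy_predictions))
print(subset_accuracy_sketch(toy_targets, toy_predictions))
```

On this toy data the per-sample scores are 1, 0.5 and 0, so the samples average is 0.5, while the micro average pools 2 shared labels over a union of 5 and gives 0.4. Only the first sentence is matched exactly, so subset accuracy is 1/3.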

# Defining sample strings for 3-shot prompting
The following samples were picked manually from the training data. The same samples are used in every prompt: varying them would bring no benefit, since the model keeps no memory of them from one prompt to the next.

In [None]:
# 3 shots from the Twitter training set
TWITTER_SAMPLES = """Look at these examples:\n
Sentence: i think it s the easiest time of year to feel dissatisfied\n
Answer: anger\n
Sentence: i feel as confused about life as a teenager or as jaded as a year old man\n
Answer: fear\n
Sentence: i have seen heard and read over the past couple of days i am left feeling impressed by more than a few companies\n
Answer: surprise\n
Now """ # ..."Classify the sentence" etc.

# 3 shots from the GoEmotions training set
GOEMOTIONS_SAMPLES = """Look at these examples:\n
Sentence: Slowing things down now\n
\nDoes it evoke the emotion 'disappointment'?
Answer: False\n
Sentence: Aww... she'll probably come around eventually, I'm sure she was just jealous of [NAME]... I mean, what woman wouldn't be! lol \n
\nDoes it evoke the emotion 'amusement'?
Answer: True\n
Sentence: Super, thanks\n
\nDoes it evoke the emotion 'gratitude'?
Answer: True\n 
Now """ # ..."Consider the following sentence" etc.

# 3 shots from the GoEmotions training set, asked about the grouped (Ekman) labels
GROUPED_SAMPLES = """Look at these examples:\n
Sentence: Slowing things down now\n
\nDoes it evoke the emotion 'fear'?
Answer: False\n
Sentence: Aww... she'll probably come around eventually, I'm sure she was just jealous of [NAME]... I mean, what woman wouldn't be! lol \n
\nDoes it evoke the emotion 'anger'?
Answer: False\n
Sentence: Super, thanks\n
\nDoes it evoke the emotion 'joy'?
Answer: True\n 
Now """ # ..."Consider the following sentence" etc.
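
How `Llama3` assembles the final prompt is not shown in this notebook. As a rough illustration only, the hypothetical `build_binary_prompt` below appends a question in the same `Sentence:` / `Does it evoke the emotion ...?` / `Answer:` layout as the samples. The wording after "Now " follows the hints in the trailing comments above and is an assumption, as is the shortened `demo_samples` string.

```python
def build_binary_prompt(sentence, emotion, samples=""):
    """Hypothetical helper: build one True/False question about one emotion."""
    question = (
        "consider the following sentence:\n"
        f"Sentence: {sentence}\n"
        f"Does it evoke the emotion '{emotion}'?\n"
        "Answer:"
    )
    if samples:
        # The sample strings end with "Now ", so the question continues that sentence.
        return samples + question
    # Without samples the question stands alone, so capitalise its first letter.
    return question[0].upper() + question[1:]


# Shortened stand-in for a few-shot prefix, used only for this demonstration.
demo_samples = (
    "Look at these examples:\n"
    "Sentence: Super, thanks\n"
    "Does it evoke the emotion 'gratitude'?\n"
    "Answer: True\n"
    "Now "
)

print(build_binary_prompt("I can't wait for tomorrow!", "excitement", demo_samples))
```

Keeping the sample prefix fixed and varying only the final question means the zero-shot and 3-shot runs differ in a single argument, which matches how `samples` is passed to `Llama3` in the cells that follow.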

# Twitter 0 Shots

In [None]:
twitter_0shot_llama3 = Llama3(model_name, scores=SCORES, emotions=twitter_emotions)
results = twitter_0shot_llama3.classify(twitter_test_dataset, progress_bar=True)
# Drop the reference so the model can be released before the next one is loaded.
del twitter_0shot_llama3
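
Note that `del` only removes the Python name; the memory is reclaimed once no other references remain, and PyTorch's caching allocator may keep GPU blocks reserved even after that. Assuming the `Llama3` wrapper holds a PyTorch model on a GPU (which this notebook does not confirm), a helper along these lines could be called after each `del`. The name `release_model_memory` is hypothetical.

```python
import gc


def release_model_memory():
    """Run garbage collection and, if possible, release cached GPU memory.

    Returns True when the CUDA cache was emptied, False otherwise.
    """
    # Collect unreachable objects, including reference cycles that keep tensors alive.
    gc.collect()
    try:
        import torch  # optional dependency: skip the GPU step if it is missing
    except ImportError:
        return False
    if torch.cuda.is_available():
        # Hand unused cached blocks back to the GPU driver.
        torch.cuda.empty_cache()
        return True
    return False


release_model_memory()
```

This does not free memory that is still referenced elsewhere, for example by `results` if it holds tensors on the GPU.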

# Twitter 3 Shots

In [None]:
twitter_3shot_llama3 = Llama3(model_name, scores=SCORES, emotions=twitter_emotions, samples=TWITTER_SAMPLES)
results = twitter_3shot_llama3.classify(twitter_test_dataset, progress_bar=True)
# Drop the reference so the model can be released before the next one is loaded.
del twitter_3shot_llama3

# GoEmotions 0 Shots

In [None]:
goemo_0shot_llama3 = Llama3(model_name, scores=SCORES, emotions=goemotions_emotions, mode="multi")
results = goemo_0shot_llama3.classify(goemotions_test_dataset, progress_bar=True)
# Drop the reference so the model can be released before the next one is loaded.
del goemo_0shot_llama3
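
The GoEmotions few-shot samples ask one True/False question per emotion, which suggests that the multi-label prediction for a sentence is built from one answer per label. How `Llama3` actually parses the replies is not shown here; the sketch below is one plausible way to turn raw replies into a multi-hot vector compatible with the metrics defined earlier. The function name and the rule "anything not starting with 'true' counts as False" are assumptions.

```python
def answers_to_multi_hot(answers, emotions):
    """Hypothetical helper: turn per-emotion replies into a 0/1 vector.

    answers  -- dict mapping an emotion name to the model's raw reply
    emotions -- label order to use for the output vector
    """
    vector = []
    for emotion in emotions:
        # Missing or malformed replies fall back to 0 (emotion not present).
        reply = answers.get(emotion, "").strip().lower()
        vector.append(1 if reply.startswith("true") else 0)
    return vector


demo_answers = {"gratitude": "True", "anger": " false.", "amusement": "TRUE\n"}
print(answers_to_multi_hot(demo_answers, ["amusement", "anger", "fear", "gratitude"]))
```

Defaulting unparseable replies to 0 keeps the output shape fixed, at the cost of counting malformed answers as negatives.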

# GoEmotions 3 Shots

In [None]:
goemo_3shot_llama3 = Llama3(
    model_name, scores=SCORES, emotions=goemotions_emotions, mode="multi", samples=GOEMOTIONS_SAMPLES
)
results = goemo_3shot_llama3.classify(goemotions_test_dataset, progress_bar=True)
# Drop the reference so the model can be released before the next one is loaded.
del goemo_3shot_llama3

# Grouped GoEmotions 0 Shots

In [None]:
grouped_0shot_llama3 = Llama3(model_name, scores=SCORES, emotions=EKMAN_LABELS, mode="grouped")
results = grouped_0shot_llama3.classify(grouped_test_df, progress_bar=True)
# Drop the reference so the model can be released before the next one is loaded.
del grouped_0shot_llama3

# Grouped GoEmotions 3 Shots

In [None]:
grouped_3shot_llama3 = Llama3(
    model_name, scores=SCORES, emotions=EKMAN_LABELS, mode="grouped", samples=GROUPED_SAMPLES
)
results = grouped_3shot_llama3.classify(grouped_test_df, progress_bar=True)
# Drop the reference so the model can be released.
del grouped_3shot_llama3