# ChatGPT API: Zero-Shot Learning with Prompt Engineering
## ACL 2023 Conference
## WASSA 2023 Shared Task on Empathy, Emotion, and Personality Detection in Interactions
More details [here](https://codalab.lisn.upsaclay.fr/competitions/11167#learn_the_details)

In [1]:
import openai
import os
import re
import numpy as np
import pandas as pd
import time
import tiktoken
import backoff
from typing import List
from sklearn.metrics import classification_report, multilabel_confusion_matrix
from tqdm.autonotebook import tqdm
tqdm.pandas()

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 400)

# to see all env variables:
#for name, value in os.environ.items():
#    print("{0}: {1}".format(name, value))

In [2]:
def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    '''Return number of tokens used in a list of messages for ChatGPT'''
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        #print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model == "gpt-3.5-turbo":
        #print("Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301")
    elif model == "gpt-4":
        #print("Warning: gpt-4 may change over time. Returning num tokens assuming gpt-4-0314.")
        return num_tokens_from_messages(messages, model="gpt-4-0314")
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif model == "gpt-4-0314":
        tokens_per_message = 3
        tokens_per_name = 1
    else:
        raise NotImplementedError(f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.""")
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
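
To make the accounting in `num_tokens_from_messages` concrete, here is a minimal sketch of the same arithmetic for `gpt-3.5-turbo-0301`. It assumes a whitespace tokenizer (`fake_encode`, a hypothetical stand-in for `tiktoken`), so the absolute counts are illustrative only; the per-message overhead of 4 and the reply priming of 3 are taken from the function above.

```python
def fake_encode(text):
    # Hypothetical stand-in for encoding.encode(); real counts come from tiktoken.
    return text.split()

def count_tokens_sketch(messages, tokens_per_message=4, tokens_per_name=-1):
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message          # fixed overhead per message
        for key, value in message.items():
            num_tokens += len(fake_encode(value)) # role and content are both counted
            if key == "name":
                num_tokens += tokens_per_name
    return num_tokens + 3                         # reply priming

messages = [
    {"role": "system", "content": "You are a helpful emotion classifier."},
    {"role": "user", "content": "Classify this text"},
]
total = count_tokens_sketch(messages)
print(total)  # (4 + 1 + 6) + (4 + 1 + 3) + 3 = 22
```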

In [3]:
random_state = 47

# Load data

In [4]:
file1    = 'data/df_train.pkl'
df_train = pd.read_pickle(file1)

file2    = 'data/df_dev.pkl'
df_dev   = pd.read_pickle(file2)

print(df_train.shape, df_dev.shape)

(792, 28) (208, 26)


In [5]:
cols = [ 'essay_clean_spellchecked', 'emotion', ]
with pd.option_context('display.max_colwidth', None):
    display(df_dev[cols].head(25))

Unnamed: 0,essay_clean_spellchecked,emotion
0,"How sad is it that this kind of pain and suffering, and those kind of living conditions still exist today? what a gap we have in society between developed countries and those that aren't. It's crazy to drive around the US and see all the money people spend on pointless things, and then to think about how the people in Haiti are living.",[Sadness]
1,The article is kind of tragic and hits close to home as I am the son of Haitian immigrants. Haiti has a lot of problems that only become exaggerated during natural disasters. I think what the Haitian people really need from the international community is help developing infrastructure so they can address these issues themselves. Foreign aid only acts as a band aid.,[Sadness]
2,"I think that these kinds of stories, are sad, yet inspirational and leave you with kind of a good feeling. Even though his story is sad, it's cool and inspiring/motivational to see that he rose up against his circumstances. That he worked hard to make something of himself and he succeeded in what he wanted to do.",[Sadness]
3,It's crazy that random accidents like this happen everyday. I am not a baseball fan but of course enjoy a baseball game every now and again. I lived and worked in Miami too so I am vaguely familiar with that baseball player who unfortunately passed away. The effort to save him was great but unfortunately bad things seem to happen every day. He was so young too so it makes it worse.,[Neutral]
4,"This story makes me so so sad.... As someone who also grew up in the system, I can strongly relate. It's sad that America has not figured out a better and more safe system to handle kid's without parents or with parents who are unfit. A lot of the times, the system is no better, or even worse than the situation kids were in before, and I think this story is a good example of that.",[Sadness]
5,"After reading the article, my first reaction and feeling is that i feel really bad for the brothers. I feel like people their age should not have to be locked inside a jail cell. They should be out in the world improving themselves and being normal people. It's also really sad for the family members of these brothers as well because they are probably all suffering and worrying.",[Sadness]
6,"I didn't know coal mining had such adverse effects on the surrounding environment. It has basically ruined the lives of the people who live nearby these mines. And the animal populations too, imagine a heard of elephants not able to sustain themselves with the food available and needing to invade human territory...They must really be in a desperate situation.",[Neutral]
7,"This is very sad. I can't imagine having elephants come stampede my house in the middle of the night. What a terrible and sad situation, and these poor people can't even do anything about it. Someone needs to stop the deforestation and stop polluting the air these people breathe, it is not right that they are doing and all for the sake of turning a profit.",[Sadness]
8,"Guys, reading this article really hits home for me. If you or someone you know is having suicidal thoughts, please get help from the available sources. Suicide is no joke and it is a shame when someone does not get the help they need. I've struggled with this for a few years now but I got the help I needed. This woman was not as fortunate.",[Sadness]
9,Hey guys. So I just read this article about Iraqi Christians being persecuted by Muslims in Iraq. I don't understand why people of different religious backgrounds can't get along there. I'm sure it is a cultural thing but it is such unnecessary violence and conflict. It hurts both sides and I wish there was a way we could get them to set aside their differences. But not military action. We don't need another war.,[Neutral]


# Prompts

In [47]:
# categories are listed in order of decreasing frequency
prompt = """
1 - You are given a list of 8 emotion categories: Sadness, Neutral, Anger, Disgust, Fear, Hope, Surprise, Joy.
2 - Act as a helpful emotion classifier, analyze the text delimited below with triple backticks \
and classify this text into one most relevant emotion category from the above list.
3 - You may add another emotion category from the above list ONLY AND ONLY IF this other category is also relevant \
to the text delimited below with triple backticks.
4 - Carefully analyze the text for any emotions present before outputting the correct emotion.
5 - Output just the category or categories and nothing else. If there are two relevant emotion categories: \
sort them alphabetically, concatenate with a forward slash, and output only them and nothing else.

Text: ```{}```
"""
s = 'This is a text sample'
print(prompt.format(s), '\n')


1 - You are given a list of 8 emotion categories: Sadness, Neutral, Anger, Disgust, Fear, Hope, Surprise, Joy.
2 - Act as a helpful emotion classifier, analyze the text delimited below with triple backticks and classify this text into one most relevant emotion category from the above list.
3 - You may add another emotion category from the above list ONLY AND ONLY IF this other category is also relevant to the text delimited below with triple backticks.
4 - Carefully analyze the text for any emotions present before outputting the correct emotion.
5 - Output just the category or categories and nothing else. If there are two relevant emotion categories: sort them alphabetically, concatenate with a forward slash, and output only them and nothing else.

Text: ```This is a text sample```
 



In [35]:
# Using followup Q1 can improve the response. If the response has multiple words, first parse it and try to find
# the category in it. Only if this doesn't work, send followup Q2. ChatGPT can offer the second category in response
# to Q1, but can change its mind again and offer a third category if asked Q2
followup = """Are you sure about that? If yes, output the same category or categories; \
if no, change the category or categories"""
followup

'Are you sure about that? If yes, output the same category or categories; if no, change the category or categories'
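
The comment above describes a parse-first, ask-later flow. A minimal sketch of that flow, assuming a hypothetical `ask` callable that stands in for the ChatGPT request and a simplified `find_label` parser (both are illustrations, not the notebook's own functions):

```python
import re

LABELS = {'sadness', 'neutral', 'anger', 'disgust', 'hope', 'fear', 'surprise', 'joy'}
FOLLOWUP = ("Are you sure about that? If yes, output the same category or categories; "
            "if no, change the category or categories")

def find_label(response):
    # Simplified parser: keep only words that are known categories.
    words = re.sub(r'[^a-zA-Z ]+', ' ', response).lower().split()
    found = sorted({w.capitalize() for w in words if w in LABELS})
    return '/'.join(found) if found else None

def classify_with_fallback(ask, prompt):
    # ask(messages) -> str is a hypothetical stand-in for the API request.
    messages = [{"role": "user", "content": prompt}]
    first = ask(messages)
    label = find_label(first)
    if label is not None:
        return label, 1                      # parsed from the first answer
    messages += [{"role": "assistant", "content": first},
                 {"role": "user", "content": FOLLOWUP}]
    return find_label(ask(messages)), 2      # parsed from the follow-up answer

# Canned replies simulate a model that answers vaguely at first.
replies = iter(["It is hard to say.", "Sadness."])
result = classify_with_fallback(lambda m: next(replies), "Text: sample")
print(result)  # ('Sadness', 2)
```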

# Sample request

In [36]:
#openai.api_key = "sk-..."
openai.api_key = os.getenv("OPENAI_API_KEY2")
model          = 'gpt-3.5-turbo'
labels_set     = {'sadness', 'neutral', 'anger', 'disgust', 'hope', 'fear', 'surprise', 'joy'} 
clean          = re.compile(r'[^a-zA-Z ]+')
multi_spaces   = re.compile(r'\s{2,}')

In [37]:
def verify_label(label_):
    '''
       Verify if label_ contains any of the categories
       from the predefined set of labels
    '''
    label_ = clean.sub(' ', label_)
    label_ = multi_spaces.sub(' ', label_).lower().split()
    res    = [i.capitalize() for i in label_ if i in labels_set]
    res    = list(set(res))
    return '/'.join(sorted(res)) if res else None
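
A short, self-contained usage sketch of `verify_label`. The function and regular expressions are restated so the block runs on its own; the first four sample responses are taken from outputs shown elsewhere in this notebook, and the last one is a made-up sentence added to expose a limitation.

```python
import re

labels_set   = {'sadness', 'neutral', 'anger', 'disgust', 'hope', 'fear', 'surprise', 'joy'}
clean        = re.compile(r'[^a-zA-Z ]+')
multi_spaces = re.compile(r'\s{2,}')

def verify_label(label_):
    # Strip punctuation, lowercase, and keep only words that are known categories.
    label_ = clean.sub(' ', label_)
    label_ = multi_spaces.sub(' ', label_).lower().split()
    res    = sorted({i.capitalize() for i in label_ if i in labels_set})
    return '/'.join(res) if res else None

samples = [
    'The most relevant emotion category for this text is "Hope".',
    'Sadness/Neutral',
    'Fear/Hate',
    'I think the text is more related to Disappointment.',
    'This is not Joy, it is Sadness.',   # made-up example
]
parsed = [verify_label(s) for s in samples]
print(parsed)  # ['Hope', 'Neutral/Sadness', 'Fear', None, 'Joy/Sadness']
```

The last sample shows that the parser is purely lexical: any category word in the response is kept, even when the response negates it.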

In [38]:
def verify_num_tokens(model, messages):
    '''Check that there are enough tokens available for a ChatGPT response'''
    num_tokens_tiktoken = num_tokens_from_messages(messages, model)
    if num_tokens_tiktoken > 3950:
        print(f'Number of tokens is {num_tokens_tiktoken} which exceeds 3950')
        # text_ is not defined in this scope, so print the last (user) message instead
        print(f"TEXT: {messages[-1]['content']}\n")
        return False
    else:
        return True


@backoff.on_exception(backoff.expo, openai.error.RateLimitError, max_time=10)
def get_response(model, messages, temperature=0.5, max_tokens=None):
    '''Send request, return response'''
    response  = openai.ChatCompletion.create(
        model = model,
        messages = messages,
        temperature = temperature,        # range [0, 2]; higher values give less deterministic / less focused output
        top_p = 1,                        # top probability mass, e.g. 0.1 = only tokens from top 10% proba mass
        n = 1,                            # number of chat completions
        #max_tokens = max_tokens,          # tokens to return
        stream = False,        
        stop=None,                        # sequence to stop generation (new line, end of text, etc.)
        )
    content = response['choices'][0]['message']['content'].strip()
    #num_tokens_api = response['usage']['prompt_tokens']
    return content
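
The `@backoff.on_exception(backoff.expo, ..., max_time=10)` decorator retries the request with exponentially growing waits until roughly 10 seconds have elapsed. The following is a rough standard-library sketch of that idea, not the `backoff` library's implementation (jitter is left out); `retry_expo`, `FakeRateLimitError`, and the injected `sleep` and `clock` arguments are hypothetical names used only for illustration.

```python
import time

class FakeRateLimitError(Exception):
    """Hypothetical stand-in for openai.error.RateLimitError."""

def retry_expo(func, max_time=10, base=1, factor=2,
               sleep=time.sleep, clock=time.monotonic):
    # Retry func() on FakeRateLimitError, waiting base, base*factor, ... seconds,
    # and give up once the next wait would exceed max_time seconds in total.
    start = clock()
    wait = base
    while True:
        try:
            return func()
        except FakeRateLimitError:
            if clock() - start + wait > max_time:
                raise
            sleep(wait)
            wait *= factor

calls = {"n": 0}

def flaky_request():
    # Fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimitError()
    return "Sadness"

waits = []                                   # record waits instead of sleeping
answer = retry_expo(flaky_request, sleep=waits.append)
print(answer, waits)  # Sadness [1, 2]
```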

In [39]:
def classify_text(text_, prompt_):
    '''Classify text_ using prompt_ and ChatGPT API'''
        
    # compose messages and check num_tokens
    messages = [
            { "role": "system", "content": "You are a helpful emotion classifier.", },
            { "role": "user", "content": prompt_.format(text_), },
            ]
    if not verify_num_tokens(model, messages): return None
    label_    = get_response(model, messages, )
    old_label = label_
    label_    = verify_label(label_)        # get just the category if response is too long
        
    # if label not found in response text - second, extended chat
    if label_ is None:
        print(f'Asking the clarifying question for essay: {text_}. Previous answer: {old_label}')
        messages += [
            { "role": "assistant", "content": old_label, },
            { "role": "user", "content": followup, }
            ]        
        label_    = get_response(model, messages, )
        old_label = label_
        label_    = verify_label(label_)        # get just the category if response is too long
    print(f"Previous answer: '{old_label}'. Final answer: '{label_}'")
            
    return label_ if label_ is not None else old_label

In [40]:
def classify_text_with_clarifying(text_, prompt_):
    '''
       Classify text_ using prompt_ and ChatGPT API,
       then clarify response with followup question -
       this can help make the response more precise
    '''
        
    # compose messages and check num_tokens
    messages = [
            { "role": "system", "content": "You are a helpful text classifier.", },
            { "role": "user", "content": prompt_.format(text_), },
            ]
    if not verify_num_tokens(model, messages): return None
    label_    = get_response(model, messages)
    old_label = label_
    label_    = verify_label(label_)                      # get just the category if response is too long
        
    # ask additional clarifying question - sometimes it helps
    messages += [
        { "role": "assistant", "content": old_label, },
        { "role": "user", "content": followup, }
        ]
    #time.sleep( random.uniform(1.1, 1.8) )                # wait not to overload ChatGPT
    label2_    = get_response(model, messages)
    old_label2 = label2_
    label2_    = verify_label(label2_)                    # get just the category if response is too long

    return old_label, label_, old_label2, label2_

In [51]:
idx = 2
text, groundtruth_labels = df_dev[['essay_clean_spellchecked', 'emotion']].values[idx]
label = classify_text(text, prompt)
messages = [
    {'role': 'user', 'content': prompt.format(text)},
]

print(prompt.format( text ))
print(f"\nGROUNDTRUTH LABEL:\n{'/'.join( groundtruth_labels )}")
print(f"\nPREDICTED LABEL:\n{label}")
print(f'\nTOTAL TOKENS: { num_tokens_from_messages(messages, model) }')

Previous answer: 'The most relevant emotion category for this text is "Hope".'. Final answer: 'Hope'

1 - You are given a list of 8 emotion categories: Sadness, Neutral, Anger, Disgust, Fear, Hope, Surprise, Joy.
2 - Act as a helpful emotion classifier, analyze the text delimited below with triple backticks and classify this text into one most relevant emotion category from the above list.
3 - You may add another emotion category from the above list ONLY AND ONLY IF this other category is also relevant to the text delimited below with triple backticks.
4 - Carefully analyze the text for any emotions present before outputting the correct emotion.
5 - Output just the category or categories and nothing else. If there are two relevant emotion categories: sort them alphabetically, concatenate with a forward slash, and output only them and nothing else.

Text: ```I think that these kinds of stories, are sad, yet inspirational and leave you with kind of a good feeling. Even though his story i

# Batch Request

In [21]:
# target variables
label2key = {   
    'Anger':    0,
    'Disgust':  1,
    'Fear':     2,
    'Hope':     3,    
    'Joy':      4,
    'Neutral':  5,
    'Sadness':  6,
    'Surprise': 7,
}
key2label = {v: k for k,v in label2key.items()}
print(key2label)

{0: 'Anger', 1: 'Disgust', 2: 'Fear', 3: 'Hope', 4: 'Joy', 5: 'Neutral', 6: 'Sadness', 7: 'Surprise'}


In [22]:
def get_target(emotions: List[str])->List[int]:
    '''
        Convert list of strings with categories into list of 0s and 1s with length 8 because there are 8 categories;
        1 in the i-th position means that this essay belongs to the i-th category as in key2label[i]
    '''
    res  = [0]*8
    idxs = [label2key[e] for e in emotions]    
    for idx in idxs:
        res[idx] = 1
    return res
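
A quick usage sketch of `get_target`, restated here with `label2key` so it runs on its own. The two inputs mirror the `Sadness` and `Hope/Sadness` predictions that appear in `df_dev`.

```python
from typing import List

label2key = {'Anger': 0, 'Disgust': 1, 'Fear': 2, 'Hope': 3,
             'Joy': 4, 'Neutral': 5, 'Sadness': 6, 'Surprise': 7}

def get_target(emotions: List[str]) -> List[int]:
    # Multi-hot encoding: 1 at the index of every category present.
    res = [0] * len(label2key)
    for e in emotions:
        res[label2key[e]] = 1
    return res

single = get_target(['Sadness'])
double = get_target('Hope/Sadness'.split('/'))
print(single)  # [0, 0, 0, 0, 0, 0, 1, 0]
print(double)  # [0, 0, 0, 1, 0, 0, 1, 0]
```

Note that a label outside `label2key` raises a `KeyError`, so unparsed free-text responses need to be handled before this step.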

In [26]:
df_dev.head()

Unnamed: 0,article_id,conversation_id,speaker_number,essay_id,speaker_id,essay,essay_clean,split,gender,education,race,age,income,emotion,emotion_count,char_length,word_length,target_encoded,article,article_clean,essay_clean_docs,essay_clean_spellchecked,article_clean_docs,article_clean_spellchecked,compare1,compare2,pred_all,pred_encoded
0,35,1,1,0,68,How sad is it that this kind of pain and suffe...,How sad is it that this kind of pain and suffe...,dev,2,2,1,21,20000,[Sadness],1,339,63,"[0, 0, 0, 0, 0, 0, 1, 0]","A month after Hurricane Matthew, 800,000 Haiti...","A month after Hurricane Matthew, 800,000 Haiti...","(How, sad, is, it, that, this, kind, of, pain,...",How sad is it that this kind of pain and suffe...,"(A, month, after, Hurricane, Matthew, ,, 800,0...","A month after Hurricane Matthew, 800,000 Haiti...",False,False,Sadness,"[0, 0, 0, 0, 0, 0, 1, 0]"
1,35,4,1,3,79,The article is kind of tragic and hits close t...,The article is kind of tragic and hits close t...,dev,1,6,3,33,64000,[Sadness],1,367,63,"[0, 0, 0, 0, 0, 0, 1, 0]","A month after Hurricane Matthew, 800,000 Haiti...","A month after Hurricane Matthew, 800,000 Haiti...","(The, article, is, kind, of, tragic, and, hits...",The article is kind of tragic and hits close t...,"(A, month, after, Hurricane, Matthew, ,, 800,0...","A month after Hurricane Matthew, 800,000 Haiti...",True,False,Hope/Sadness,"[0, 0, 0, 1, 0, 0, 1, 0]"
2,213,7,1,6,68,"I think that these kinds of stories, are sad, ...","I think that these kinds of stories, are sad, ...",dev,2,2,1,21,20000,[Sadness],1,315,57,"[0, 0, 0, 0, 0, 0, 1, 0]",Miami Marlins star pitcher Jose Fernandez kill...,Miami Marlins star pitcher Jose Fernandez kill...,"(I, think, that, these, kinds, of, stories, ,,...","I think that these kinds of stories, are sad, ...","(Miami, Marlins, star, pitcher, Jose, Fernande...",Miami Marlins star pitcher Jose Fernandez kill...,True,False,Hope/Sadness,"[0, 0, 0, 1, 0, 0, 1, 0]"
3,213,9,1,8,84,It's crazy that random accidents like this hap...,It's crazy that random accidents like this hap...,dev,2,4,1,25,55000,[Neutral],1,385,72,"[0, 0, 0, 0, 0, 1, 0, 0]",Miami Marlins star pitcher Jose Fernandez kill...,Miami Marlins star pitcher Jose Fernandez kill...,"(It, 's, crazy, that, random, accidents, like,...",It's crazy that random accidents like this hap...,"(Miami, Marlins, star, pitcher, Jose, Fernande...",Miami Marlins star pitcher Jose Fernandez kill...,True,False,Sadness,"[0, 0, 0, 0, 0, 0, 1, 0]"
4,78,12,1,11,68,This story makes me so so sad.... As someone w...,This story makes me so so sad.... As someone w...,dev,2,2,1,21,20000,[Sadness],1,384,76,"[0, 0, 0, 0, 0, 0, 1, 0]",Brothers Behind Bars — “The only photograph I...,"Brothers Behind Bars — ""The only photograph I ...","(This, story, makes, me, so, so, sad, ...., As...",This story makes me so so sad.... As someone w...,"(Brothers, Behind, Bars, —, "", The, only, phot...","Brothers Behind Bars — ""The only photograph I ...",True,True,Sadness,"[0, 0, 0, 0, 0, 0, 1, 0]"


In [60]:
df_dev['pred_all'] = df_dev['essay'].progress_apply( lambda x: classify_text(x, prompt) )
df_dev['pred_all'].value_counts()

Previous answer: 'The most relevant emotion category for this text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'Sadness'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for this text is Hope.'. Final answer: 'Hope'
Previous answer: 'Sadness/Neutral'. Final answer: 'Neutral/Sadness'
Previous answer: 'Sadness.'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for the given text is Sadness.'. Final answer: 'Sadness'
Previous answer: 'Sadness'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for the given text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for the given text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for this text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for the given text is "Sadness".'. Final answer: 'Sadness'
Previous answer:

Previous answer: 'The most relevant emotion category for this text is "Disgust".'. Final answer: 'Disgust'
Previous answer: 'The most relevant emotion category for this text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for the given text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'Fear/Hate'. Final answer: 'Fear'
Previous answer: 'Anger/Disgust'. Final answer: 'Anger/Disgust'
Previous answer: 'Sadness/Disgust.'. Final answer: 'Disgust/Sadness'
Previous answer: 'The most relevant emotion category for the given text is Neutral.'. Final answer: 'Neutral'
Previous answer: 'Anger/Fear.'. Final answer: 'Anger/Fear'
Previous answer: 'Fear/Neutral.'. Final answer: 'Fear/Neutral'
Previous answer: 'The most relevant emotion category for this text is "Sadness/Hope".'. Final answer: 'Hope/Sadness'
Previous answer: 'The most relevant emotion category for the given text is "Disgust".'. Final answer: 'Disgust'
Previous answer: 'Anger/Fear'

Previous answer: 'Anger/Sadness.'. Final answer: 'Anger/Sadness'
Previous answer: 'The most relevant emotion category for this text is "Hope".'. Final answer: 'Hope'
Previous answer: 'Hope/Neutral.'. Final answer: 'Hope/Neutral'
Previous answer: 'The most relevant emotion category for the given text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for the given text is "Hope".'. Final answer: 'Hope'
Previous answer: 'Sadness/Disgust'. Final answer: 'Disgust/Sadness'
Previous answer: 'Sadness/ Fear'. Final answer: 'Fear/Sadness'
Previous answer: 'Fear/Hope'. Final answer: 'Fear/Hope'
Previous answer: 'Sadness.'. Final answer: 'Sadness'
Previous answer: 'Anger/Disgust'. Final answer: 'Anger/Disgust'
Previous answer: 'The most relevant emotion category for this text is "Sadness".'. Final answer: 'Sadness'
Previous answer: 'Sadness.'. Final answer: 'Sadness'
Previous answer: 'The most relevant emotion category for this text is Sadness.'. Final an

Sadness                  91
Hope                     14
Disgust                  14
Anger                    12
Anger/Disgust            11
Hope/Sadness             11
Neutral/Sadness          10
Disgust/Sadness           7
Fear                      7
Anger/Fear                6
Disgust/Fear              4
Hope/Neutral              3
Neutral                   3
Fear/Sadness              3
Anger/Sadness             2
Fear/Neutral              2
Hope/Joy                  2
Fear/Surprise             1
Anger/Disgust/Sadness     1
Anger/Hope                1
Joy/Sadness               1
Neutral/Surprise          1
Fear/Hope                 1
Name: pred_all, dtype: int64

In [36]:
# if a followup question was used - clean response
df_dev['pred'] = df_dev['pred_all'].apply( lambda x: x[3] )
print('Null values:\n', df_dev.isna().sum(), sep='')
df_dev['pred'].value_counts()

Null values:
article_id                    0
conversation_id               0
speaker_number                0
essay_id                      0
speaker_id                    0
essay                         0
essay_clean                   0
split                         0
gender                        0
education                     0
race                          0
age                           0
income                        0
emotion                       0
emotion_count                 0
char_length                   0
word_length                   0
target_encoded                0
article                       0
article_clean                 0
essay_clean_docs              0
essay_clean_spellchecked      0
article_clean_docs            0
article_clean_spellchecked    0
compare1                      0
compare2                      0
pred_all                      0
pred_encoded                  0
pred                          3
dtype: int64


Sadness             54
Fear/Sadness        31
Hope/Sadness        24
Anger/Sadness       23
Disgust/Sadness     19
Anger/Disgust        9
Hope                 8
Fear                 6
Disgust              5
Neutral/Sadness      4
Anger/Fear           4
Hope/Neutral         4
Disgust/Fear         3
Anger                2
Anger/Neutral        2
Neutral              2
Fear/Neutral         1
Disgust/Neutral      1
Sadness/Surprise     1
Joy                  1
Fear/Surprise        1
Name: pred, dtype: int64

In [42]:
# if a followup question was used - review why some predictions were NaNs
temp = df_dev[ df_dev['pred'].isna() ]
print(temp.index)
temp[['pred_all', 'pred']].values.tolist()

Int64Index([150, 167, 173], dtype='int64')


[[('Neutral',
   'Neutral',
   'I think the text is more related to Disappointment.',
   None),
  None],
 [('Sadness',
   'Sadness',
   "Yes, I am sure. The text expresses concern about children not being able to eat and the country's failure to provide basic necessities to its people, which is a sad situation.",
   None),
  None],
 [('Hope/Sadness',
   'Hope/Sadness',
   "Yes, I'm sure. The text expresses sympathy for the situation but also a sense of helplessness, and ends with a hopeful attitude towards the future.",
   None),
  None]]

In [43]:
# if a followup question was used - manually assign missing predictions
df_dev.at[150, 'pred'] = 'Neutral'
df_dev.at[167, 'pred'] = 'Sadness'
df_dev.at[173, 'pred'] = 'Hope/Sadness'
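
The three manual assignments above all reuse the first parsed label when the follow-up answer could not be parsed. A minimal sketch of automating that fallback, assuming `pred_all` holds the 4-tuples returned by `classify_text_with_clarifying` (`pick_final_label` is a hypothetical helper; the second row is made up for illustration):

```python
def pick_final_label(pred_all):
    # pred_all = (raw_answer_1, label_1, raw_answer_2, label_2)
    _, label_1, _, label_2 = pred_all
    # Prefer the label parsed from the follow-up; fall back to the first one.
    return label_2 if label_2 is not None else label_1

rows = [
    ('Neutral', 'Neutral', 'I think the text is more related to Disappointment.', None),
    ('Sadness', 'Sadness', 'Sadness/Fear', 'Fear/Sadness'),   # made-up row
]
final = [pick_final_label(r) for r in rows]
print(final)  # ['Neutral', 'Fear/Sadness']
```

With the DataFrame used here, the same helper could be applied as `df_dev['pred_all'].apply(pick_final_label)`.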

In [61]:
# binarize predictions
df_dev['pred_encoded'] = df_dev['pred_all'].apply( lambda x: get_target(x.split('/')) )

In [62]:
y_dev_encoded      = np.array( df_dev['target_encoded'].values.tolist() )
y_dev_pred_encoded = np.array( df_dev['pred_encoded'].values.tolist() )
labels = list(label2key.keys())
print( classification_report( y_dev_encoded, y_dev_pred_encoded, target_names=labels, digits=4 ) )

              precision    recall  f1-score   support

       Anger     0.6061    0.5263    0.5634        38
     Disgust     0.4054    0.6250    0.4918        24
        Fear     0.2083    0.6250    0.3125         8
        Hope     0.2500    0.5000    0.3333        16
         Joy     0.3333    0.5000    0.4000         2
     Neutral     0.6316    0.2222    0.3288        54
     Sadness     0.6905    0.8614    0.7665       101
    Surprise     0.0000    0.0000    0.0000         3

   micro avg     0.5362    0.6016    0.5670       246
   macro avg     0.3906    0.4825    0.3995       246
weighted avg     0.5810    0.6016    0.5570       246
 samples avg     0.5801    0.6202    0.5768       246
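
For reference, the per-class figures in this report are computed column by column on the multi-hot vectors. A minimal pure-Python sketch of precision, recall, and F1 for one label column, using toy vectors rather than the dev-set data (`prf_for_label` is a hypothetical helper, not scikit-learn's implementation):

```python
def prf_for_label(y_true, y_pred, idx):
    # Count true positives, false positives, and false negatives for column idx.
    pairs = [(t[idx], p[idx]) for t, p in zip(y_true, y_pred)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data: column 3 = Hope, column 5 = Neutral, column 6 = Sadness
y_true = [[0, 0, 0, 0, 0, 0, 1, 0],
          [0, 0, 0, 0, 0, 0, 1, 0],
          [0, 0, 0, 0, 0, 1, 0, 0]]
y_pred = [[0, 0, 0, 0, 0, 0, 1, 0],
          [0, 0, 0, 1, 0, 0, 1, 0],
          [0, 0, 0, 0, 0, 0, 1, 0]]

sadness = prf_for_label(y_true, y_pred, 6)
hope    = prf_for_label(y_true, y_pred, 3)
print([round(x, 4) for x in sadness])  # [0.6667, 1.0, 0.8]
print([round(x, 4) for x in hope])     # [0.0, 0.0, 0.0]
```

A label that is predicted but never correct, such as Hope in the toy data, scores zero on all three measures, which is the pattern seen for Surprise in the report above.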



In [56]:
file = 'data/predictions_EMO.tsv'
with open(file, 'w', encoding='utf-8') as f:
    f.write('\n'.join(df_dev['pred_all'].tolist()))

## Results

__CONCLUSIONS__:
* Temperature 0.5 gives better results than 0; increasing it to 1.0 does not improve them further
* Column essay_clean_spellchecked shows better results than essay_clean or essay
* The results are not deterministic (because of the temperature?)
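
These conclusions can be checked against the macro-averaged F1 scores reported for each experiment below. A small sketch that ranks the runs by that figure; the numbers are copied from the `macro avg` rows in this section, and using macro F1 as the ranking criterion is an assumption made here for illustration.

```python
# Macro-averaged F1 per experiment, copied from the reports in this section.
macro_f1 = {
    1: 0.3264,  # temperature 0,   essay_clean_spellchecked
    2: 0.3126,  # temperature 0,   double prompt
    3: 0.4116,  # temperature 0.5, essay_clean_spellchecked
    4: 0.3363,  # temperature 1.0, essay_clean_spellchecked
    5: 0.4617,  # temperature 0.5, essay_clean_spellchecked
    6: 0.4394,  # same as 5, column essay_clean
    7: 0.3995,  # same as 5, column essay
}

ranking = sorted(macro_f1, key=macro_f1.get, reverse=True)
for exp in ranking:
    print(f"Experiment {exp}: macro F1 = {macro_f1[exp]:.4f}")
print(ranking)  # [5, 6, 3, 7, 4, 1, 2]
```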

__Experiment 7__  
Same as experiment 5, but column = essay  
```
              precision    recall  f1-score   support

       Anger     0.6061    0.5263    0.5634        38
     Disgust     0.4054    0.6250    0.4918        24
        Fear     0.2083    0.6250    0.3125         8
        Hope     0.2500    0.5000    0.3333        16
         Joy     0.3333    0.5000    0.4000         2
     Neutral     0.6316    0.2222    0.3288        54
     Sadness     0.6905    0.8614    0.7665       101
    Surprise     0.0000    0.0000    0.0000         3

   micro avg     0.5362    0.6016    0.5670       246
   macro avg     0.3906    0.4825    0.3995       246
weighted avg     0.5810    0.6016    0.5570       246
 samples avg     0.5801    0.6202    0.5768       246
```

__Experiment 6__  
Same as experiment 5, but column = essay_clean  
```
              precision    recall  f1-score   support

       Anger     0.6216    0.6053    0.6133        38
     Disgust     0.4138    0.5000    0.4528        24
        Fear     0.2381    0.6250    0.3448         8
        Hope     0.2188    0.4375    0.2917        16
         Joy     0.5000    0.5000    0.5000         2
     Neutral     0.7222    0.2407    0.3611        54
     Sadness     0.6719    0.8515    0.7511       101
    Surprise     0.1429    0.3333    0.2000         3

   micro avg     0.5401    0.6016    0.5692       246
   macro avg     0.4412    0.5117    0.4394       246
weighted avg     0.5986    0.6016    0.5633       246
 samples avg     0.5849    0.6130    0.5784       246
```

__Experiment 5 (BEST)__ - third place in the leaderboard on the dev set!  
Temperature = 0.5, column = essay_clean_spellchecked  
_Prompt_
```
1 - You are given a list of 8 emotion categories: Sadness, Neutral, Anger, Disgust, Fear, Hope, Surprise, Joy.
2 - Act as a helpful emotion classifier, analyze the text delimited below with triple backticks and classify this text into one most relevant emotion category from the above list.
3 - You may add another emotion category from the above list ONLY AND ONLY IF this other category is also relevant to the text delimited below with triple backticks.
4 - Carefully analyze the text for any emotions present before outputting the correct emotion.
5 - Output just the category or categories and nothing else. If there are two relevant emotion categories: sort them alphabetically, concatenate with a forward slash, and output only them and nothing else.

Text: ```This is a text sample```
```
```
              precision    recall  f1-score   support

       Anger     0.6471    0.5789    0.6111        38
     Disgust     0.3750    0.5000    0.4286        24
        Fear     0.1923    0.6250    0.2941         8
        Hope     0.3103    0.5625    0.4000        16
         Joy     1.0000    0.5000    0.6667         2
     Neutral     0.6667    0.2222    0.3333        54
     Sadness     0.6797    0.8614    0.7598       101
    Surprise     0.1429    0.3333    0.2000         3

   micro avg     0.5418    0.6057    0.5720       246
   macro avg     0.5017    0.5229    0.4617       246
weighted avg     0.5982    0.6057    0.5648       246
 samples avg     0.5801    0.6154    0.5744       246
```

__Experiment 4__  
Column = essay_clean_spellchecked  
Same as Experiment 3, but temperature = 1.0
```
              precision    recall  f1-score   support

       Anger     0.6957    0.4211    0.5246        38
     Disgust     0.4000    0.3333    0.3636        24
        Fear     0.2500    0.7500    0.3750         8
        Hope     0.2558    0.6875    0.3729        16
         Joy     0.0000    0.0000    0.0000         2
     Neutral     0.5185    0.2593    0.3457        54
     Sadness     0.5882    0.8911    0.7087       101
    Surprise     0.0000    0.0000    0.0000         3

   micro avg     0.4932    0.5894    0.5370       246
   macro avg     0.3385    0.4178    0.3363       246
weighted avg     0.5266    0.5894    0.5198       246
 samples avg     0.5312    0.6010    0.5425       246
```

__Experiment 3__  
Column = essay_clean_spellchecked  
Same as Experiment 1, but temperature = 0.5
```
              precision    recall  f1-score   support

       Anger     0.6923    0.4737    0.5625        38
     Disgust     0.3333    0.2500    0.2857        24
        Fear     0.2400    0.7500    0.3636         8
        Hope     0.2439    0.6250    0.3509        16
         Joy     1.0000    0.5000    0.6667         2
     Neutral     0.5200    0.2407    0.3291        54
     Sadness     0.6065    0.9307    0.7344       101
    Surprise     0.0000    0.0000    0.0000         3

   micro avg     0.5034    0.6016    0.5481       246
   macro avg     0.4545    0.4713    0.4116       246
weighted avg     0.5344    0.6016    0.5286       246
 samples avg     0.5577    0.6106    0.5617       246
```

__Experiment 2__  
Temperature = 0, column = essay_clean_spellchecked  
_Double Prompt_ - having a clarifying question doesn't improve the results  
Prompt 1  
1. You are given a list of 8 emotion categories: Anger, Disgust, Fear, Hope, Joy, Neutral, Sadness, Surprise.
2. Act as a helpful text classifier and classify the text provided below into one most relevant emotion category.
3. You may add another emotion category from the list ONLY AND ONLY IF this other category is also relevant to the text provided below.
4. Output only the most relevant emotion category and nothing else.
5. If there are two relevant categories, sort them alphabetically, concatenate them with a forward slash, and output them

Text: '''I just read an article about bullying in France. Apparently, the suicide of a 17 year old French girl has caused the country to reevaluate its approach toward bullying. I personally have never experienced intense bullying or ridicule first hand or even second hand. I guess I was a pretty likable kid after all. I think the key to solving problems like this is most definitely inclusion. The reason children feel this way is because they perceive themselves to be absolutely alone.'''

Prompt 2  
Are you sure about that? If yes, output the same category or categories; if no, change the category or categories
```
              precision    recall  f1-score   support

       Anger     0.5500    0.5789    0.5641        38
     Disgust     0.2703    0.4167    0.3279        24
        Fear     0.1522    0.8750    0.2593         8
        Hope     0.2432    0.5625    0.3396        16
         Joy     0.0000    0.0000    0.0000         2
     Neutral     0.6000    0.1667    0.2609        54
     Sadness     0.6139    0.9604    0.7490       101
    Surprise     0.0000    0.0000    0.0000         3

   micro avg     0.4583    0.6260    0.5292       246
   macro avg     0.3037    0.4450    0.3126       246
weighted avg     0.5159    0.6260    0.5144       246
 samples avg     0.4832    0.6418    0.5288       246
```

__Experiment 1__  
Temperature = 0, column = essay_clean_spellchecked  
_Prompt_:
1. You are given a list of 8 emotion categories: Anger, Disgust, Fear, Hope, Joy, Neutral, Sadness, Surprise.
2. Act as a helpful text classifier and classify the text provided below into one most relevant emotion category.
3. You may add another emotion category from the list ONLY AND ONLY IF this other category is also relevant to the text provided below.
4. Output only the most relevant emotion category and nothing else.
5. If there are two relevant categories, sort them alphabetically, concatenate them with a forward slash, and output them

Text: '''I just read an article about bullying in France. Apparently, the suicide of a 17 year old French girl has caused the country to reevaluate its approach toward bullying. I personally have never experienced intense bullying or ridicule first hand or even second hand. I guess I was a pretty likable kid after all. I think the key to solving problems like this is most definitely inclusion. The reason children feel this way is because they perceive themselves to be absolutely alone.'''
```
              precision    recall  f1-score   support

       Anger     0.6667    0.4211    0.5161        38
     Disgust     0.4000    0.3333    0.3636        24
        Fear     0.1786    0.6250    0.2778         8
        Hope     0.2727    0.5625    0.3673        16
         Joy     0.0000    0.0000    0.0000         2
     Neutral     0.5909    0.2407    0.3421        54
     Sadness     0.6115    0.9505    0.7442       101
    Surprise     0.0000    0.0000    0.0000         3

   micro avg     0.5087    0.5976    0.5495       246
   macro avg     0.3400    0.3916    0.3264       246
weighted avg     0.5463    0.5976    0.5288       246
 samples avg     0.5649    0.6034    0.5633       246
```