✅ Today we're going to measure how accurately a pretrained emotion classifier predicts the labels in a Twitter emotion dataset.

First, we need to import the necessary libraries and the dataset.

In [1]:
import pandas as pd
from sklearn.metrics import accuracy_score

In [2]:
data = pd.read_parquet('/kaggle/input/twitter-emotion-classification-dataset/train-00000-of-00001.parquet')

After that, show the dataset and map the labels to emotion names.

In [3]:
data.head(6)

                                                text  label
0  i feel awful about it too because it s my job ...      0
1                              im alone i feel awful      0
2  ive probably mentioned this before but i reall...      1
3           i was feeling a little low few days back      0
4  i beleive that i am much more sensitive to oth...      2
5  i find myself frustrated with christians becau...      2


In [4]:
emotion_map = {0: 'sadness', 1: 'joy', 2: 'love', 3:'anger', 4:'fear', 5:'surprise'}
data['emotion'] = data['label'].map(emotion_map)
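
A quick sanity check after a mapping like this is to confirm that no label fell outside the dictionary, because `Series.map` returns NaN for keys it cannot find. Here is a minimal sketch on a small made-up frame (the `toy` data is hypothetical and not taken from the dataset):

```python
import pandas as pd

emotion_map = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}

# Hypothetical toy data; label 9 is deliberately outside the mapping.
toy = pd.DataFrame({'text': ['a', 'b', 'c', 'd'], 'label': [0, 1, 1, 9]})
toy['emotion'] = toy['label'].map(emotion_map)

# Series.map leaves NaN wherever the key is missing from the dictionary.
unmapped = toy[toy['emotion'].isna()]
print(f"Unmapped rows: {len(unmapped)}")

# Class distribution of the rows that were mapped (NaN is dropped by default).
counts = toy['emotion'].value_counts()
print(counts)
```

The same two checks can be run on `data` to see how balanced the six emotions are before sampling from it.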

In [5]:
print(data)

                                                     text  label  emotion
0       i feel awful about it too because it s my job ...      0  sadness
1                                   im alone i feel awful      0  sadness
2       ive probably mentioned this before but i reall...      1      joy
3                i was feeling a little low few days back      0  sadness
4       i beleive that i am much more sensitive to oth...      2     love
...                                                   ...    ...      ...
416804  that was what i felt when i was finally accept...      1      joy
416805  i take every day as it comes i m just focussin...      4     fear
416806      i just suddenly feel that everything was fake      0  sadness
416807  im feeling more eager than ever to claw back w...      1      joy
416808  i give you plenty of attention even when i fee...      0  sadness

[416809 rows x 3 columns]


⬆️ It's time to import our model.

In [6]:
from transformers import pipeline

In [7]:
classifier = pipeline("text-classification", model = "j-hartmann/emotion-english-distilroberta-base", return_all_scores=True)

Because our dataset has more than 400,000 rows, we're going to work with a smaller dataset made of the first 100 entries.

In [8]:
df_100 = data.iloc[:100]  # position-based slice: exactly 100 rows (.loc[:100] would also include label 100)

In [9]:
df_100

                                                 text  label  emotion
0   i feel awful about it too because it s my job ...      0  sadness
1                               im alone i feel awful      0  sadness
2   ive probably mentioned this before but i reall...      1      joy
3            i was feeling a little low few days back      0  sadness
4   i beleive that i am much more sensitive to oth...      2     love
..                                                ...    ...      ...
96  i feel like talking in a snobbish uppity accen...      3    anger
97  i don t feel as needy and desperate to prove t...      0  sadness
98  i feel bad for my mum who carries everything a...      0  sadness
99  i have a feeling some people really outgoing p...      1      joy
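
One detail worth checking when taking a subset like this: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end. Here is a small sketch on a hypothetical frame with the default integer index:

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex (0..9), standing in for `data`.
frame = pd.DataFrame({'text': [f"tweet {i}" for i in range(10)]})

by_label = frame.loc[:5]      # label-based, end label included: rows 0..5
by_position = frame.iloc[:5]  # position-based, end excluded: rows 0..4

print(len(by_label), len(by_position))
```

With a default index, a label-based slice up to `n` therefore returns `n + 1` rows, which is easy to miss when the variable name promises exactly `n`.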


List the emotion labels we expect to find in the dataset.

In [10]:
labels_available = ["joy", "love", "anger", "fear", "surprise","sadness"]
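
It is also worth comparing these labels with the ones the model can actually emit. As far as we recall from the model card, `j-hartmann/emotion-english-distilroberta-base` predicts anger, disgust, fear, joy, neutral, sadness, and surprise. Treat that list as an assumption and confirm it against `classifier.model.config.id2label` in your own session. A sketch of the comparison with plain sets:

```python
# Dataset labels, as defined in the notebook.
labels_available = ["joy", "love", "anger", "fear", "surprise", "sadness"]

# ASSUMPTION: label set recalled from the model card; verify it with
# classifier.model.config.id2label before relying on it.
model_labels = {"anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"}

# Dataset labels the model can never output, so those rows always count as errors.
never_predicted = sorted(set(labels_available) - model_labels)

# Model labels with no counterpart in the dataset, so they can never be correct.
never_correct = sorted(model_labels - set(labels_available))

print(never_predicted)
print(never_correct)
```

If the assumption holds, every tweet labeled `love` is guaranteed to be scored as a miss, which puts a ceiling on the accuracy this comparison can reach.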

Create a function that returns the emotion with the highest score.

In [11]:
def get_predicted_label(scores):
    return max(scores, key=lambda x: x['score'])['label']
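
To see what the function does, here is a small usage sketch. The `example_scores` values are hypothetical; they only imitate the list of `{'label', 'score'}` dictionaries that the function expects for a single text.

```python
def get_predicted_label(scores):
    return max(scores, key=lambda x: x['score'])['label']

# Hypothetical scores for one text: a list of {'label', 'score'} dictionaries.
example_scores = [
    {'label': 'anger', 'score': 0.05},
    {'label': 'joy', 'score': 0.80},
    {'label': 'sadness', 'score': 0.15},
]

# max() picks the dictionary with the largest 'score'; we keep only its label.
predicted = get_predicted_label(example_scores)
print(predicted)
```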

In [12]:
y_true = []
y_pred = []

Loop over the rows, classify each tweet, and print the results.

In [13]:
for index, row in df_100.iterrows():
    text = row['text']
    original_label = row['emotion']
    if original_label not in labels_available:
        print(f"True label '{original_label}' is not available.")
        continue

    prediction_scores = classifier(text)
    predicted_label = get_predicted_label(prediction_scores[0])
    y_true.append(original_label)
    y_pred.append(predicted_label)
    print(f"Tweet: '{text}'")
    print(f"True label: {original_label}")
    print(f"Predicted label: {predicted_label}")
    print("-"*50)

Tweet: 'i feel awful about it too because it s my job to get him in a position to succeed and it just didn t happen here'
True label: sadness
Predicted label: sadness
--------------------------------------------------
Tweet: 'im alone i feel awful'
True label: sadness
Predicted label: sadness
--------------------------------------------------
Tweet: 'ive probably mentioned this before but i really do feel proud of myself for actually keeping up with my new years resolution of monthly and weekly goals'
True label: joy
Predicted label: joy
--------------------------------------------------
Tweet: 'i was feeling a little low few days back'
True label: sadness
Predicted label: sadness
--------------------------------------------------
Tweet: 'i beleive that i am much more sensitive to other peoples feelings and tend to be more compassionate'
True label: love
Predicted label: joy
--------------------------------------------------
Tweet: 'i find myself frustrated with christians because i fe
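
Calling the classifier once per row works for 100 tweets but gets slow on larger samples. Hugging Face pipelines also accept a list of texts, so the loop can be replaced by a single call. The sketch below uses a stand-in `fake_classifier` (hypothetical and keyword-based) that returns one list of scores per text, which is the shape we assume the real pipeline produces for list input. This lets the sketch run without downloading the model; in the notebook you would pass the real `classifier` instead.

```python
def get_predicted_label(scores):
    return max(scores, key=lambda x: x['score'])['label']

def fake_classifier(texts):
    """Hypothetical stand-in for the pipeline: one list of scores per text."""
    results = []
    for text in texts:
        sad = 0.9 if 'awful' in text else 0.1
        results.append([
            {'label': 'sadness', 'score': sad},
            {'label': 'joy', 'score': 1.0 - sad},
        ])
    return results

def predict_batch(classifier, texts):
    """Classify all texts in one call and keep the top label for each."""
    return [get_predicted_label(scores) for scores in classifier(texts)]

texts = ['im alone i feel awful', 'i really do feel proud of myself']
batch_predictions = predict_batch(fake_classifier, texts)
print(batch_predictions)
```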

🔎 Measure the accuracy of the model used.

In [14]:
accuracy = accuracy_score(y_true, y_pred)
print(f"\nModel accuracy: {accuracy:.2f}")


Model accuracy: 0.81
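
`accuracy_score` is the fraction of positions where the two lists agree, so a single number hides how the errors are spread across emotions. Here is a minimal sketch, using made-up labels rather than the notebook's results, that recomputes accuracy by hand and adds a per-label breakdown:

```python
from collections import Counter

# Hypothetical labels for illustration only (not the notebook's output).
true_labels = ['sadness', 'sadness', 'joy', 'love', 'love', 'anger']
pred_labels = ['sadness', 'sadness', 'joy', 'joy', 'anger', 'anger']

# Overall accuracy: matching positions divided by total positions.
matches = sum(t == p for t, p in zip(true_labels, pred_labels))
overall = matches / len(true_labels)
print(f"Overall accuracy: {overall:.2f}")

# Per-label accuracy (recall): correct predictions divided by rows with that true label.
totals = Counter(true_labels)
correct = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)
per_label = {label: correct[label] / totals[label] for label in totals}
print(per_label)
```

Running the same breakdown on `y_true` and `y_pred` would show which emotions account for the misses.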


In [15]:
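filtered = df_100.iloc[:len(y_true)]  # rows covered by the predictions (not used below)

In [16]:
# Cap both lists at len(df_100) items so they cannot be longer than the frame.
y_true = y_true[:len(df_100)]
y_pred = y_pred[:len(df_100)]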

In [17]:
new_df = pd.DataFrame({
    'text': df_100['text'].iloc[:len(y_true)],
    'true_label': y_true,
    'predicted_label': y_pred
})

Compare the original label vs the predicted label.

In [18]:
new_df

                                                 text  true_label  predicted_label
0   i feel awful about it too because it s my job ...     sadness          sadness
1                               im alone i feel awful     sadness          sadness
2   ive probably mentioned this before but i reall...         joy              joy
3            i was feeling a little low few days back     sadness          sadness
4   i beleive that i am much more sensitive to oth...        love              joy
..                                                ...         ...              ...
96  i feel like talking in a snobbish uppity accen...       anger          sadness
97  i don t feel as needy and desperate to prove t...     sadness          sadness
98  i feel bad for my mum who carries everything a...     sadness          sadness
99  i have a feeling some people really outgoing p...         joy              joy
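
To see where the model disagrees with the dataset, the comparison frame can be filtered and cross-tabulated. Here is a sketch on a hypothetical five-row frame that reuses the column names of `new_df`:

```python
import pandas as pd

# Hypothetical stand-in for new_df, using the same column names.
toy_df = pd.DataFrame({
    'text': ['t0', 't1', 't2', 't3', 't4'],
    'true_label': ['sadness', 'joy', 'love', 'love', 'anger'],
    'predicted_label': ['sadness', 'joy', 'joy', 'anger', 'sadness'],
})

# Rows where the prediction differs from the dataset label.
mismatches = toy_df[toy_df['true_label'] != toy_df['predicted_label']]
print(mismatches)

# Confusion table: rows are true labels, columns are predicted labels.
confusion = pd.crosstab(toy_df['true_label'], toy_df['predicted_label'])
print(confusion)
```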


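Take a closer look at row 5, where the predicted label (anger) differs from the dataset label (love).

In [19]: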
new_df.iloc[5]['text']

'i find myself frustrated with christians because i feel that there is constantly a talk about loving one another being there for each other and praying for each other and i have seen that this is not always the case'

In [20]:
data.iloc[5]

text       i find myself frustrated with christians becau...
label                                                      2
emotion                                                 love
Name: 5, dtype: object

In [21]:
new_df.iloc[5]

text               i find myself frustrated with christians becau...
true_label                                                      love
predicted_label                                                anger
Name: 5, dtype: object