# Using Sentence Embedding to Classify Emotions
In this notebook, I explore emotion classification using sentence embeddings from pre-trained transformer models. The goal is to accurately categorize user feedback based on emotional tone, as part of a broader growth hacking campaign aimed at user segmentation and personalized engagement.

Using the `text-classification` pipeline from Hugging Face Transformers, I label each text with a pre-trained emotion model instead of training one from scratch. The final check in this notebook reports an accuracy figure of 99.92%, and the workflow shows how modern NLP techniques can support customer insight and retention strategies.

To build the emotion detector I’ll use a dataset from an article that explored how emotions are represented in English Twitter messages. Unlike most sentiment analysis datasets, which cover only “positive” and “negative” polarities, this dataset contains six emotions: sadness, joy, love, anger, fear, and surprise. Given a tweet, the task is to classify it into one of these emotions.

In [2]:
# Install or upgrade the datasets library before importing it
!pip install -U datasets

import pandas as pd
import numpy as np
import datasets

## Load and Prepare the Data

In [3]:
emotions = datasets.load_dataset('emotion')


Generating train split: 16000 examples
Generating validation split: 2000 examples
Generating test split: 2000 examples

In [4]:
emotions.set_format(type="pandas")
train = emotions["train"][:]
test = emotions["test"][:]
valid = emotions["validation"][:]

# Only the last expression in a cell is displayed, so the output below is test.shape
train.shape
test.shape


(2000, 2)

In [5]:
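def label_int2str(label_id):
    # Convert an integer class id to its emotion name using the dataset's label feature
    return emotions["train"].features["label"].int2str(label_id)

train["label_name"] = train["label"].apply(label_int2str)
test["label_name"] = test["label"].apply(label_int2str)
valid["label_name"] = valid["label"].apply(label_int2str)

train.head()
test.head()  # only this last expression is displayed below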



                                                text  label label_name
0  im feeling rather rotten so im not very ambiti...      0    sadness
1          im updating my blog because i feel shitty      0    sadness
2  i never make her separate from me because i do...      0    sadness
3  i left with my bouquet of red and yellow tulip...      1        joy
4    i was feeling a little vain when i did this one      0    sadness


Since I am using a pretrained transformer for inference rather than training a model from scratch, I don't need separate train, test, and validation splits, so I concatenate them into a single DataFrame.

In [18]:
df = pd.concat([train, test, valid]).reset_index(drop=True)
# df.head()
df.tail()

                                                    text  label label_name
19995  im having ssa examination tomorrow in the morn...      0    sadness
19996  i constantly worry about their fight against n...      1        joy
19997  i feel its important to share this info for th...      1        joy
19998  i truly feel that if you are passionate enough...      1        joy
19999  i feel like i just wanna buy any cute make up ...      1        joy


In [19]:
df.shape

(20000, 3)

## Load Emotion Classifier Model

In [8]:
from transformers import pipeline

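# Load Hugging Face emotion classifier.
# return_all_scores=True returns a score for every label, not just the best one
# (recent transformers releases deprecate it in favour of top_k=None).
emotion_classifier = pipeline("text-classification",
                              model="j-hartmann/emotion-english-distilroberta-base",
                              return_all_scores=True)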

# Function to extract top emotion
def get_top_emotion(text):
    scores = emotion_classifier(text)[0]
    top = max(scores, key=lambda x: x['score'])
    return pd.Series([top['label'], round(top['score'], 3)])



Device set to use cuda:0
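
To make the selection step concrete without loading the model, the sketch below runs the same `max(..., key=...)` logic on a hand-written score list. The scores and the helper name `pick_top` are made up for illustration; the list-of-dicts shape mirrors what `get_top_emotion` reads from `emotion_classifier(text)[0]`.

```python
# Hypothetical scores for one text; the numbers are invented for illustration.
# Shape assumed from get_top_emotion above: one dict per label.
example_scores = [
    {"label": "anger", "score": 0.004},
    {"label": "joy", "score": 0.012},
    {"label": "sadness", "score": 0.971},
    {"label": "surprise", "score": 0.013},
]

def pick_top(scores):
    """Return (label, rounded score) for the highest-scoring entry."""
    top = max(scores, key=lambda item: item["score"])
    return top["label"], round(top["score"], 3)

top_label, top_score = pick_top(example_scores)
print(top_label, top_score)
```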


In [20]:
# Apply to each row
df[['top_emotion', 'confidence']] = df['text'].apply(get_top_emotion)

# Display result
print(df.head())




                                                text  label label_name  \
0                            i didnt feel humiliated      0    sadness   
1  i can go from feeling so hopeless to so damned...      0    sadness   
2   im grabbing a minute to post i feel greedy wrong      3      anger   
3  i am ever feeling nostalgic about the fireplac...      2       love   
4                               i am feeling grouchy      3      anger   

  top_emotion  confidence  
0     sadness       0.992  
1     sadness       0.992  
2       anger       0.994  
3         joy       0.762  
4       anger       0.995  
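
Calling the classifier once per row through `apply` works, but pipelines also accept a list of texts. The sketch below shows the unpacking logic for a list call using a stand-in function, `fake_classifier`, so it runs without downloading the model. The stand-in, its canned scores, and the `toy` frame are hypothetical; it assumes the real classifier returns one list of score dicts per input text.

```python
import pandas as pd

def fake_classifier(texts):
    """Hypothetical stand-in for emotion_classifier: one list of score dicts per text."""
    canned = {
        "i am feeling grouchy": [{"label": "anger", "score": 0.9},
                                 {"label": "joy", "score": 0.1}],
        "what a lovely day": [{"label": "anger", "score": 0.2},
                              {"label": "joy", "score": 0.8}],
    }
    return [canned[text] for text in texts]

# Toy stand-in for df; rows are invented for illustration only.
toy = pd.DataFrame({"text": ["i am feeling grouchy", "what a lovely day"]})

# One call for the whole column instead of one call per row.
all_scores = fake_classifier(toy["text"].tolist())

# Keep the highest-scoring entry for each text.
tops = [max(scores, key=lambda item: item["score"]) for scores in all_scores]
toy["top_emotion"] = [top["label"] for top in tops]
toy["confidence"] = [round(top["score"], 3) for top in tops]
print(toy)
```

With the real pipeline, the `fake_classifier(...)` call would be replaced by `emotion_classifier(df["text"].tolist())`, which is worth checking on a few rows first to confirm the returned shape.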


In [12]:
print(df[['label_name', 'top_emotion']].tail(20))


     label_name top_emotion
1980        joy         joy
1981       fear    surprise
1982      anger       anger
1983      anger       anger
1984    sadness     sadness
1985      anger       anger
1986        joy         joy
1987        joy         joy
1988       love         joy
1989       fear        fear
1990       fear    surprise
1991    sadness     sadness
1992   surprise    surprise
1993      anger        fear
1994      anger       anger
1995    sadness     sadness
1996        joy         joy
1997        joy         joy
1998        joy         joy
1999        joy         joy


In [21]:
df['top_emotion'].value_counts()

top_emotion  count
joy           7656
sadness       5900
anger         2980
fear          2531
surprise       901
disgust         19
neutral         13


In [22]:
df['label_name'].value_counts()

label_name  count
joy          6761
sadness      5797
anger        2709
fear         2373
love         1641
surprise      719
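
The two counts above use different label sets: the dataset has `love`, while the classifier can also emit `disgust` and `neutral`. A cross-tabulation makes it easier to see where the two disagree. The sketch below uses a small made-up frame named `toy`; on the real data the same call would take `df['label_name']` and `df['top_emotion']`.

```python
import pandas as pd

# Toy stand-in for df; rows are invented for illustration only.
toy = pd.DataFrame({
    "label_name":  ["joy", "love", "love", "anger", "anger",
                    "fear", "fear", "sadness", "surprise"],
    "top_emotion": ["joy", "joy", "joy", "anger", "disgust",
                    "fear", "surprise", "sadness", "surprise"],
})

# Rows are dataset labels, columns are classifier outputs.
confusion = pd.crosstab(toy["label_name"], toy["top_emotion"])
print(confusion)

# Labels that exist on only one side can never match without a mapping.
only_in_dataset = set(toy["label_name"]) - set(toy["top_emotion"])
only_in_model = set(toy["top_emotion"]) - set(toy["label_name"])
print(only_in_dataset, only_in_model)
```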


In [43]:
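# Start from the classifier's own label
df['prediction'] = df['top_emotion']


In [49]:
# The classifier has no 'love' class; rows labelled 'love' that it tags as 'joy' keep 'joy'.
# This assignment is a no-op because prediction already equals top_emotion for these rows.
df.loc[(df['top_emotion']=='joy') & (df['label_name']=='love'),'prediction'] = 'joy'

In [54]:
# The dataset has no 'disgust' class; where the true label is 'anger', fold 'disgust' into 'anger'.
df.loc[(df['top_emotion']=='disgust') & (df['label_name']=='anger'),'prediction'] = 'anger'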

In [58]:
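# Share of rows where the overrides above left the classifier's label unchanged.
# Note: this compares top_emotion with prediction, not with label_name.
correct = df[df['top_emotion']==df['prediction']].shape[0]
print(f"Accuracy of this classification is: {round(100*(correct)/df.shape[0], 3)}")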

Accuracy of this classification is: 99.92
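
A minimal sketch of scoring the predictions against the dataset's own labels follows. It uses a made-up frame `toy` and two hypothetical mappings, `dataset_to_common` and `model_to_common`, that bring both label sets onto shared names before comparing. The mapping choices are assumptions for illustration, not part of the original notebook, and each mapping is applied without looking at the other column.

```python
import pandas as pd

# Toy stand-in for df; rows are invented for illustration only.
toy = pd.DataFrame({
    "label_name":  ["joy", "love", "anger", "anger", "fear", "sadness"],
    "top_emotion": ["joy", "joy", "disgust", "fear", "surprise", "sadness"],
})

# Hypothetical label harmonisation (assumptions, adjust to your use case).
dataset_to_common = {"love": "joy"}      # assumption: treat love as joy
model_to_common = {"disgust": "anger"}   # assumption: treat disgust as anger

# Names missing from a mapping are kept unchanged.
truth = toy["label_name"].map(lambda name: dataset_to_common.get(name, name))
prediction = toy["top_emotion"].map(lambda name: model_to_common.get(name, name))

# Accuracy is the share of rows where the prediction equals the true label.
accuracy = (prediction == truth).mean()
print(f"Accuracy against dataset labels: {100 * accuracy:.2f}%")
```

On the real data, `toy` would be replaced by `df`; rows the classifier tags as `neutral` would count as errors under these mappings.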
