<a href="https://www.kaggle.com/code/ankitkumar2635/sentiment-and-emotion-classification-of-tweets?scriptVersionId=133705177" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction 
In this notebook we explore tweets mentioning **@Dell** and run **sentiment analysis** and **emotion classification** using **HuggingFace's 🤗 transformers** to identify what customers like and dislike about the company's products and services. 


* The sentiment classifier labels each tweet as positive, negative or neutral
* The emotion classifier assigns each tweet one of 11 emotions: joy, love, optimism, pessimism, trust, surprise, anticipation, sadness, anger, disgust and fear. 
    

I use a transformer model ([cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest)) to classify sentiment, then go a step deeper and classify the emotion of each tweet with a second transformer model ([cardiffnlp/twitter-roberta-base-emotion-multilabel-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-emotion-multilabel-latest)).  


**Data:** 

* [Dell Tweets 2022](https://www.kaggle.com/datasets/ankitkumar2635/dell-tweets-2022), which contains about 25k tweets from the first three quarters of 2022

* The data was collected from Twitter using `snscrape`. Follow this [notebook link](https://www.kaggle.com/code/ankitkumar2635/scrape-tweets-without-twitter-s-api) to learn more about scraping tweets without Twitter's API.

* The data has four columns: 

   1. Datetime
   2. Tweet Id
   3. Text (the tweet)
   4. Username
   
   
#### Author: Ankit Kumar
#### Created on: 14/07/2023

# 1. Sentiment Classification Using Transformers

### Import required modules

In [None]:
# Run this if importing transformers' pipeline raises an ImportError
%pip install --upgrade accelerate

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from transformers import pipeline
import torch

from tqdm.notebook import tqdm # shows a progress bar on iterations 

### Load the tweets dataset

In [None]:
tweets_df = pd.read_csv("/kaggle/input/dell-tweets-2022/First three qtr Dell tweets.csv")
tweets_df.head()

In [None]:
tweets_df.shape

#### Let's read a couple of tweets

In [None]:
print(tweets_df.Text.values[3], "\n\n")
print(tweets_df.Text.values[38])

**These two tweets clearly represent a negative and a positive sentiment, respectively.**

#### Setting up the GPU

In [None]:
device="cuda" if torch.cuda.is_available() else "cpu"
device

#### Define the sentiment classification model

In [None]:
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"
# Use the device selected above; a hard-coded device=1 fails unless a second GPU exists
s_classifier = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path, device=device)

#### Testing the model on a tweet

In [None]:
# keeping a tweet from the dataframe as example tweet
example = tweets_df.Text.values[44]
example

**My understanding is that the user is making a sarcastic remark. Let's feed this tweet into the model.**

In [None]:
s_classifier(example)

**The model recognised the sarcasm and labelled the sentiment as negative. Let's apply the model to the entire dataset.**

In [None]:
senti_res = {}
for i, row in tqdm(tweets_df.iterrows(), total = len(tweets_df)):
    text = row['Text']
    myid = row['Tweet Id']
    senti_res[myid] = s_classifier(text)
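
Calling the classifier one tweet at a time works, but it tends to leave a GPU underused. As a rough sketch, the texts could instead be passed in chunks, since Hugging Face pipelines accept a list of strings and return one result per input. The helper `classify_in_batches` and the stub `fake_classifier` are hypothetical names, not part of the notebook; the stub stands in for `s_classifier` so the example runs without downloading a model.

```python
from typing import Callable, Dict, List


def classify_in_batches(classifier: Callable[[List[str]], List[dict]],
                        texts: List[str],
                        ids: List[int],
                        chunk_size: int = 64) -> Dict[int, dict]:
    """Run `classifier` on chunks of texts and key the results by id."""
    results = {}
    for start in range(0, len(texts), chunk_size):
        chunk_texts = texts[start:start + chunk_size]
        chunk_ids = ids[start:start + chunk_size]
        # Called with a list, the classifier is expected to return one dict per text
        outputs = classifier(chunk_texts)
        for tweet_id, output in zip(chunk_ids, outputs):
            results[tweet_id] = output
    return results


def fake_classifier(batch: List[str]) -> List[dict]:
    """Hypothetical stand-in for s_classifier: flags texts containing 'bad'."""
    return [{"label": "negative" if "bad" in text else "neutral", "score": 1.0}
            for text in batch]


demo = classify_in_batches(fake_classifier,
                           ["bad battery", "new laptop", "bad support"],
                           ids=[101, 102, 103],
                           chunk_size=2)
print(demo)
```

With the real pipeline, the call would look like `classify_in_batches(s_classifier, tweets_df["Text"].tolist(), tweets_df["Tweet Id"].tolist())`. Each value should then be a plain dict rather than a one-element list, so any later step that unpacks the list would need adjusting.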

#### Merge the sentiment results to the original df

In [None]:
#Keeping the results as dataframe
sentiment_res = pd.DataFrame(senti_res).T
sentiment_res.head(5)

In [None]:
# Split the dictionary col
sentiment_res= sentiment_res[0].apply(pd.Series)
sentiment_res.head(5)

In [None]:
# Rename index to Tweet Id, label to sentiment, score to sentiment_score
sentiment_res = sentiment_res.reset_index().rename(columns={'index': 'Tweet Id', 'label':'sentiment', 'score':'sentiment_score'})
sentiment_res.head(5)

In [None]:
# merge to original df
tweets_df = tweets_df.merge(sentiment_res, on="Tweet Id", how="left")
tweets_df.head(5)
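
The reshaping steps above can also be written as one pass over the results dictionary. This is a minimal sketch on toy data: `toy_tweets` and `toy_results` are made-up stand-ins for `tweets_df` and `senti_res`, and the scores are illustrative rather than real model output. It assumes each value is a one-element list holding a `label`/`score` dict, which is what the per-tweet loop produces.

```python
import pandas as pd

# Toy stand-ins for tweets_df and senti_res (values are illustrative, not real output)
toy_tweets = pd.DataFrame({
    "Tweet Id": [1, 2, 3],
    "Text": ["great laptop", "terrible support", "ordered a monitor"],
})
toy_results = {
    1: [{"label": "positive", "score": 0.98}],
    2: [{"label": "negative", "score": 0.95}],
    3: [{"label": "neutral", "score": 0.80}],
}

# Each value is a one-element list, so take element 0 and add the id to the record
records = [{"Tweet Id": tweet_id, **output[0]} for tweet_id, output in toy_results.items()]
toy_sentiment = pd.DataFrame.from_records(records).rename(
    columns={"label": "sentiment", "score": "sentiment_score"}
)

# validate="one_to_one" raises if a Tweet Id is duplicated on either side
merged = toy_tweets.merge(toy_sentiment, on="Tweet Id", how="left", validate="one_to_one")
print(merged)
```

Naming the key with `on="Tweet Id"` and adding `validate` guards against a silent row blow-up if the same tweet id ever appears twice.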

In [None]:
tweets_df.sentiment.value_counts().plot(kind='bar', title = "Count of Sentiments")

# 2. Emotion Classification Using Transformers 

#### Define the emotion classifier model

In [None]:
# return_all_scores=False keeps only the highest-scoring emotion for each tweet
e_classifier = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-emotion-multilabel-latest", return_all_scores=False, device=device)
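
The emotion model is a multi-label classifier, so a tweet can carry several emotions at once, while this notebook keeps only the top one. The sketch below illustrates the difference, assuming the usual multi-label setup where each label is scored independently with a sigmoid. The logits are made-up numbers, the 0.5 threshold is an assumption, and the label order follows the list in the introduction rather than the model's internal order.

```python
import numpy as np

# Label order copied from the introduction; the logits are made-up numbers, not model output
labels = ["joy", "love", "optimism", "pessimism", "trust", "surprise",
          "anticipation", "sadness", "anger", "disgust", "fear"]
logits = np.array([-2.0, -3.0, -1.5, 0.2, -2.5, -1.0, -0.5, 0.8, 2.0, 1.2, -1.8])

# Multi-label heads score each label independently with a sigmoid
scores = 1.0 / (1.0 + np.exp(-logits))

# Top-1 view: the single highest-scoring emotion
top_label = labels[int(np.argmax(scores))]

# Multi-label view: every emotion whose score clears a threshold (0.5 is an assumed cut-off)
threshold = 0.5
active = [label for label, score in zip(labels, scores) if score >= threshold]

print(top_label)
print(active)
```

To inspect all scores from the real pipeline rather than only the top label, passing `top_k=None` when calling it should return every label with its score.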

In [None]:
example

In [None]:
# Testing the model on example
e_classifier(example)

**I am surprised by how well these models perform. Let's apply this model to the entire dataset.** 

In [None]:
emotion_res = {}
for i, row in tqdm(tweets_df.iterrows(), total = len(tweets_df)):
    text = row['Text']
    myid = row['Tweet Id']
    emotion_res[myid] = e_classifier(text)

#### Merge the results to original data

In [None]:
# Similar steps to sentiment classification
emotions_res = pd.DataFrame(emotion_res).T
emotions_res= emotions_res[0].apply(pd.Series)
emotions_res = emotions_res.reset_index().rename(columns={'index': 'Tweet Id', 'label':'emotion', 'score':'emotion_score'})
tweets_df = tweets_df.merge(emotions_res, how = "left")
tweets_df.head(5)

In [None]:
tweets_df['emotion'].value_counts().plot(kind='bar', title ='Count of Emotions')

In [None]:
# save the df as output
tweets_df.to_csv('sentiment-emotion-labelled_Dell_tweets.csv', index=False)

# 3. EDA on Sentiment and Emotion Labelled Dataset

### Visualise emotions for different sentiments of tweets

I want to see how the emotions are distributed within the tweets labelled negative, neutral and positive.

In [None]:
fig, axes = plt.subplots(3, 1, figsize=(7, 10), sharey=True)
plt.suptitle('Emotions Across Different Sentiments')
sns.countplot(data=tweets_df.loc[tweets_df.sentiment == 'negative'], x='emotion', ax=axes[0])
axes[0].set_title("Negative Sentiment")
sns.countplot(data=tweets_df.loc[tweets_df.sentiment == 'neutral'], x='emotion', ax=axes[1])
axes[1].set_title("Neutral Sentiment")
sns.countplot(data=tweets_df.loc[tweets_df.sentiment == 'positive'], x='emotion', ax=axes[2])
axes[2].set_title("Positive Sentiment")
plt.tight_layout()
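
The same comparison can be read as a table of proportions, which makes the three sentiment groups comparable even when their sizes differ. This is a small sketch on toy data; the `toy` frame is a hypothetical stand-in for the `sentiment` and `emotion` columns of `tweets_df`.

```python
import pandas as pd

# Toy labels standing in for tweets_df[['sentiment', 'emotion']] (illustrative values)
toy = pd.DataFrame({
    "sentiment": ["negative", "negative", "negative", "neutral", "positive", "positive"],
    "emotion":   ["anger", "anger", "disgust", "anticipation", "joy", "optimism"],
})

# Row-normalised table: each row sums to 1, giving the emotion mix within a sentiment
emotion_share = pd.crosstab(toy["sentiment"], toy["emotion"], normalize="index")
print(emotion_share.round(2))
```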

### Examine sentiments and emotions across quarters (Q1 to Q3 of 2022)

In [None]:
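# Break df into three quarters (ISO-formatted date strings compare chronologically)
Q1_tweets = tweets_df.loc[tweets_df.Datetime < '2022-04-01']
# Q2 needs a lower bound too, otherwise it also contains every Q1 tweet
Q2_tweets = tweets_df.loc[(tweets_df.Datetime >= '2022-04-01') & (tweets_df.Datetime < '2022-07-01')]
Q3_tweets = tweets_df.loc[tweets_df.Datetime >= '2022-07-01']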
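
An alternative to hand-written date cut-offs is to parse the timestamps and let pandas derive the calendar quarter. The sketch below uses a toy frame as a hypothetical stand-in for `tweets_df`, and the `quarter` column name is an assumption introduced for illustration.

```python
import pandas as pd

# Toy timestamps standing in for tweets_df['Datetime'] (illustrative values)
toy = pd.DataFrame({
    "Datetime": ["2022-01-15 10:00:00", "2022-03-31 23:59:59",
                 "2022-04-01 00:00:00", "2022-06-30 12:00:00",
                 "2022-07-01 00:00:00", "2022-09-30 08:30:00"],
})

# Parse the strings once, then derive a calendar-quarter label per row
toy["Datetime"] = pd.to_datetime(toy["Datetime"])
toy["quarter"] = "Q" + toy["Datetime"].dt.quarter.astype(str)

quarter_counts = toy["quarter"].value_counts().sort_index()
print(quarter_counts)
```

With a `quarter` column in place, each quarterly subset is a simple equality filter, and per-quarter tweet counts come from a single `value_counts()` call.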

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(10, 5), sharey=True)
plt.suptitle("Tweets Sentiment Across Quarters")
sns.countplot(data=Q1_tweets, x='sentiment', ax=axes[0], order=['positive', 'neutral', 'negative'])
axes[0].set_title("Q1")
sns.countplot(data=Q2_tweets, x='sentiment', ax=axes[1], order=['positive', 'neutral', 'negative'])
axes[1].set_title("Q2")
sns.countplot(data=Q3_tweets, x='sentiment', ax=axes[2], order=['positive', 'neutral', 'negative'])
axes[2].set_title("Q3")

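* Negative tweets outnumber positive and neutral tweets in every quarter
* The total number of tweets appears to spike in the second quarter; this only holds if the Q2 slice excludes Q1 tweets, so compare `len(Q1_tweets)`, `len(Q2_tweets)` and `len(Q3_tweets)` to confirm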

### Visualise emotions across quarters 

In [None]:
fig, axes = plt.subplots(3, 1, figsize=(10, 8), sharey=True)
plt.suptitle("Emotions Across Quarters")
sns.countplot(data=Q1_tweets, x='emotion', ax=axes[0], order=Q1_tweets['emotion'].value_counts().index)
axes[0].set_title("Q1")
sns.countplot(data=Q2_tweets, x='emotion', ax=axes[1], order=Q2_tweets['emotion'].value_counts().index)
axes[1].set_title("Q2")
sns.countplot(data=Q3_tweets, x='emotion', ax=axes[2], order=Q3_tweets['emotion'].value_counts().index)
axes[2].set_title("Q3")
plt.tight_layout()

As expected, 'anger' dominates, since most of the tweets carry negative sentiment. There is a behavioural lesson here:

People respond more strongly to negative stimuli; this is called negativity bias.

For example, we often read or say "My laptop's fan has got too noisy", but we hardly ever mention the laptop when it is working as it should.

### Create a wordcloud of negative tweets for each quarter 

In [None]:
from wordcloud import WordCloud, STOPWORDS

In [None]:
# Combine quarterly negative tweets into a single text
# (joined with a space so the last and first words of adjacent tweets do not fuse)
Q1_neg_text = ' '.join(Q1_tweets.loc[Q1_tweets.sentiment == 'negative'].Text)
# split() counts whitespace-separated words; len() on the string would count characters
print("There are {} words in the combination of Q1 negative tweets.\n".format(len(Q1_neg_text.split())))

Q2_neg_text = ' '.join(Q2_tweets.loc[Q2_tweets.sentiment == 'negative'].Text)
print("There are {} words in the combination of Q2 negative tweets.\n".format(len(Q2_neg_text.split())))

Q3_neg_text = ' '.join(Q3_tweets.loc[Q3_tweets.sentiment == 'negative'].Text)
print("There are {} words in the combination of Q3 negative tweets.\n".format(len(Q3_neg_text.split())))

In [None]:
# Create word cloud of negative tweets for Q1
stopwords = set(STOPWORDS)
word_cloud = WordCloud(background_color = 'white', 
                       stopwords = stopwords, 
                       max_words = 100).generate(Q1_neg_text)


plt.figure(figsize = (8, 8))
plt.imshow(word_cloud)
plt.axis("off")
plt.title('Wordcloud: Q1 Negative Tweets')
plt.tight_layout()
 
plt.show()

In [None]:
# Create word cloud of negative tweets for Q2
stopwords = set(STOPWORDS)
word_cloud = WordCloud(background_color = 'white', 
                       stopwords = stopwords, 
                       max_words = 100).generate(Q2_neg_text)

plt.figure(figsize = (8, 8))
plt.imshow(word_cloud)
plt.axis("off")
plt.title('Wordcloud: Q2 Negative Tweets')
plt.tight_layout()
 
plt.show()

In [None]:
# Create word cloud of negative tweets for Q3
stopwords = set(STOPWORDS)
word_cloud = WordCloud(background_color = 'white', 
                       stopwords = stopwords, 
                       max_words = 100).generate(Q3_neg_text)

plt.figure(figsize = (8, 8))
plt.imshow(word_cloud)
plt.axis("off")
plt.title('Wordcloud: Q3 Negative Tweets')
plt.tight_layout()
 
plt.show()

**A few terms stand out in each wordcloud: "Dell Care", "service", "warranty" and "customer service". This suggests that much of the "anger" is directed at Dell's customer service.**
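
A wordcloud is read by eye, so a plain frequency count can help check which terms really occur most often. This is a minimal sketch using only the standard library. The tweets and the stopword set are hypothetical toy values, and `top_terms` is an illustrative helper rather than part of the notebook.

```python
import re
from collections import Counter

# Toy tweets and a tiny stopword set, both hypothetical stand-ins for the real data
toy_negative_tweets = [
    "Dell customer service is the worst",
    "Still waiting on warranty service from Dell",
    "Customer service never called back about my warranty",
]
toy_stopwords = {"the", "is", "on", "from", "my", "about", "still", "never", "back"}


def top_terms(texts, stopwords, n=3):
    """Count lower-cased word tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts.most_common(n)


print(top_terms(toy_negative_tweets, toy_stopwords))
```

On the real data, the same function could be applied to the negative tweets of each quarter with the `STOPWORDS` set from the `wordcloud` package.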