# Sentiment Analysis


### Zero Shot Labeling 

### Code run in Google Colab

In this notebook I use a Hugging Face zero-shot classification model to label my tweets by sentiment. This approach lets me assign sentiment labels to the tweets without explicitly training a sentiment analysis model.

The model used here, `facebook/bart-large-mnli`, is trained on natural language inference rather than on sentiment data. Zero-shot classification reuses it by asking, for each candidate label (here positive and negative), how strongly the tweet entails a hypothesis such as "The sentiment of this tweet is positive." The label whose hypothesis is best supported becomes the prediction.

To perform the sentiment labeling, I am using Google Colab, a cloud-based Jupyter notebook environment provided by Google. Colab offers the advantage of running code on powerful remote servers, providing access to GPUs and TPUs for efficient deep learning computations.

In the notebook, I load the zero-shot model from the Hugging Face `transformers` library, read the preprocessed tweet data, and apply the model to predict a sentiment label for each tweet. I then check the predictions against a small hand-labeled sample before using them for further analysis.

By leveraging the Zero Shot model for sentiment labeling, I can efficiently categorize the tweets based on their sentiment without the need for extensive manual annotation or building a dedicated sentiment analysis model from scratch.
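
The scoring step can be sketched in a few lines. In single-label mode the pipeline fills the hypothesis template once per candidate label, takes the model's entailment logit for each hypothesis, and normalizes those logits with a softmax. The logits below are made-up numbers for illustration, not model output, and `zero_shot_scores` is a hypothetical helper name.

```python
import math

def zero_shot_scores(entailment_logits):
    """Softmax over per-label entailment logits, giving label probabilities."""
    peak = max(entailment_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in entailment_logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Same template and labels as used later in this notebook
hypothesis_template = "The sentiment of this tweet is {}."
candidate_labels = ["positive", "negative"]
hypotheses = [hypothesis_template.format(label) for label in candidate_labels]

# Hypothetical entailment logits, one per hypothesis (assumed values)
example_logits = {"positive": -1.2, "negative": 2.3}

scores = zero_shot_scores(example_logits)
ranked = sorted(scores, key=scores.get, reverse=True)  # best label first

print(hypotheses)
print(ranked, scores)
```

Because the scores are normalized across the candidate labels, they always sum to 1, so a tweet that fits neither label still gets a confident-looking winner.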

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install transformers flair

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting flair
  Downloading flair-0.12.2-py3-none-any.whl (373 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collectin

In [3]:
import pandas as pd

# Hugging Face pipeline API (used below for zero-shot classification)
from transformers import pipeline

# flair pre-trained sentiment model and the Sentence wrapper for its input text
from flair.data import Sentence
from flair.models import TextClassifier

# accuracy_score to check performance
from sklearn.metrics import accuracy_score

# Named flair_classifier so it is not overwritten by the zero-shot `classifier` defined later
flair_classifier = TextClassifier.load('en-sentiment')

2023-04-26 08:50:21,860 https://nlp.informatik.hu-berlin.de/resources/models/sentiment-curated-distilbert/sentiment-en-mix-distillbert_4.pt not found in cache, downloading to /tmp/tmp96i2sf_8


100%|██████████| 253M/253M [00:12<00:00, 21.7MB/s]

2023-04-26 08:50:34,480 copying /tmp/tmp96i2sf_8 to cache at /root/.flair/models/sentiment-en-mix-distillbert_4.pt





2023-04-26 08:50:35,243 removing temp file /tmp/tmp96i2sf_8


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

In [4]:
df = pd.read_csv('/content/drive/MyDrive/SentimentAnalysis/tweets_df_preprocessed.csv')

In [5]:
df['original_tweets'] = df['original_tweets'].str.strip()

In [6]:
df

Unnamed: 0.1,Unnamed: 0,date,author_id,text,original_tweets,tokens,cleaned_tokens,stems,lemma
0,0,2023-03-31 00:54:56+00:00,80832189,ron desantis stated honor legal requirement ex...,Ron DeSantis just stated he would not honor a ...,"['ron', 'desantis', 'stated', 'honor', 'legal'...","['ron', 'desantis', 'stated', 'honor', 'legal'...","['ron', 'desanti', 'state', 'honor', 'legal', ...","['ron', 'desantis', 'state', 'honor', 'legal',..."
1,1,2023-03-31 00:54:50+00:00,2479303121,abortion completely legal constitution . .,Abortion is completely legal and in our consti...,"['abortion', 'completely', 'legal', 'constitut...","['abortion', 'completely', 'legal', 'constitut...","['abort', 'complet', 'legal', 'constitut']","['abortion', 'completely', 'legal', 'constitut..."
2,2,2023-03-31 00:52:32+00:00,1526105788728475653,""" forced pregnancy "" legal term specifical...","""Forced pregnancy"" is a legal term specificall...","['""', 'forced', 'pregnancy', '""', 'legal', 'te...","['forced', 'pregnancy', 'legal', 'term', 'spec...","['forc', 'pregnanc', 'legal', 'term', 'specif'...","['force', 'pregnancy', 'legal', 'term', 'speci..."
3,3,2023-03-31 00:52:09+00:00,438628988,"americans know true , 80 % believe aborti...","Americans know this IS true, which is why 80% ...","['americans', 'know', 'true', ',', '80', '%', ...","['americans', 'know', 'true', '80', 'believe',...","['american', 'know', 'true', '80', 'believ', '...","['americans', 'know', 'true', 'believe', 'abor..."
4,4,2023-03-31 00:51:47+00:00,1492201362154831874,democrats want legalize abortion scruples .,Democrats that want to legalize abortion don’t...,"['democrats', 'want', 'legalize', 'abortion', ...","['democrats', 'want', 'legalize', 'abortion', ...","['democrat', 'want', 'legal', 'abort', 'scrupl']","['democrats', 'want', 'legalize', 'abortion', ..."
...,...,...,...,...,...,...,...,...,...
22482,22482,2023-02-23 02:42:35+00:00,1492272489455620097,"abortion 9 months legal , past trimester ill...","abortion up to 9 months shouldn’t be legal, an...","['abortion', '9', 'months', 'legal', ',', 'pas...","['abortion', 'months', 'legal', 'past', 'trime...","['abort', 'month', 'legal', 'past', 'trimest',...","['abortion', 'month', 'legal', 'past', 'trimes..."
22483,22483,2023-02-23 02:29:15+00:00,452462161,"scotus rules decades ago privacy , fundame...","SCOTUS also rules decades ago that privacy, be...","['scotus', 'rules', 'decades', 'ago', 'privacy...","['scotus', 'rules', 'decades', 'ago', 'privacy...","['scotu', 'rule', 'decad', 'ago', 'privaci', '...","['scotus', 'rule', 'decade', 'ago', 'privacy',..."
22484,22484,2023-02-23 02:27:01+00:00,1519800752549421058,yes . means obtain abortion legal .,Yes. And I will go to all means to obtain an a...,"['yes', '.', 'means', 'obtain', 'abortion', 'l...","['yes', 'means', 'obtain', 'abortion', 'legal']","['ye', 'mean', 'obtain', 'abort', 'legal']","['yes', 'mean', 'obtain', 'abortion', 'legal']"
22485,22485,2023-02-23 02:25:21+00:00,1445411972388847622,", injunction ( incorrectly ) placed trigger...","Also, due to the injunction (incorrectly) plac...","[',', 'injunction', '(', 'incorrectly', ')', '...","['injunction', 'incorrectly', 'placed', 'trigg...","['injunct', 'incorrectli', 'place', 'trigger',...","['injunction', 'incorrectly', 'place', 'trigge..."


In [None]:
# The clustering-based labeling performed poorly (only 34% precision score),
# so I use a Hugging Face model to predict the sentiment of my tweets
# and then compare its results with the clustering labels

In [7]:
# Define the zero-shot classification pipeline (device=0 selects the first GPU)

classifier = pipeline(task='zero-shot-classification',
                      model='facebook/bart-large-mnli',
                      device=0)

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.15k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]
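
Note that `device=0` assumes a CUDA GPU is attached to the Colab runtime. A hedged sketch of choosing the device at runtime instead: the helper name `pick_device` is hypothetical, and `-1` is the pipeline's convention for running on CPU.

```python
def pick_device():
    """Return 0 for the first CUDA GPU if one is available, otherwise -1 (CPU)."""
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1

device = pick_device()
print(device)

# The result can then be passed to the pipeline call, for example:
# classifier = pipeline(task='zero-shot-classification',
#                       model='facebook/bart-large-mnli',
#                       device=device)
```

On CPU the model still runs, but labeling more than 22,000 tweets with a model of this size would likely be very slow.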

In [8]:
import warnings
warnings.filterwarnings("ignore")

tweets = df['original_tweets'].to_list()
candidate_labels = ["positive", "negative"]

# Set the hypothesis template
hypothesis_template = "The sentiment of this tweet is {}."

# Prediction results
zeroshot_prediction = classifier(tweets, candidate_labels, hypothesis_template=hypothesis_template)
zeroshot_prediction = pd.DataFrame(zeroshot_prediction)
zeroshot_prediction.head(10)

Unnamed: 0,sequence,labels,scores
0,Ron DeSantis just stated he would not honor a ...,"[negative, positive]","[0.9887816309928894, 0.011218375526368618]"
1,Abortion is completely legal and in our consti...,"[negative, positive]","[0.5180108547210693, 0.4819890856742859]"
2,"""Forced pregnancy"" is a legal term specificall...","[negative, positive]","[0.9548926949501038, 0.04510725289583206]"
3,"Americans know this IS true, which is why 80% ...","[positive, negative]","[0.8548560738563538, 0.14514391124248505]"
4,Democrats that want to legalize abortion don’t...,"[negative, positive]","[0.9443130493164062, 0.05568692460656166]"
5,How Abortion Bans Are Impacting Pregnant Patie...,"[negative, positive]","[0.7610181570053101, 0.23898188769817352]"
6,There are more people that want to make aborti...,"[negative, positive]","[0.7828027009963989, 0.2171972692012787]"
7,How does an abortion ban become a forced gesta...,"[negative, positive]","[0.9582416415214539, 0.04175836592912674]"
8,Pretty sure Republicans don't want any childre...,"[negative, positive]","[0.9405350685119629, 0.05946493148803711]"
9,But those 1st 2 things he didn't do!\n\nPlease...,"[negative, positive]","[0.9591174125671387, 0.04088260978460312]"


In [9]:
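# The pipeline sorts labels by descending score, so element 0 is the top prediction
zeroshot_prediction['zs_prediction'] = zeroshot_prediction['labels'].apply(lambda x: x[0])
# Encode the labels as integers: positive -> 0, negative -> 1
zeroshot_prediction['zs_prediction'] = zeroshot_prediction['zs_prediction'].map({'positive': 0, 'negative': 1})
# Keep the score of the top label as a confidence value
zeroshot_prediction['zs_predicted_score'] = zeroshot_prediction['scores'].apply(lambda x: x[0])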

zeroshot_prediction.head(5)

Unnamed: 0,sequence,labels,scores,zs_prediction,zs_predicted_score
0,Ron DeSantis just stated he would not honor a ...,"[negative, positive]","[0.9887816309928894, 0.011218375526368618]",1,0.988782
1,Abortion is completely legal and in our consti...,"[negative, positive]","[0.5180108547210693, 0.4819890856742859]",1,0.518011
2,"""Forced pregnancy"" is a legal term specificall...","[negative, positive]","[0.9548926949501038, 0.04510725289583206]",1,0.954893
3,"Americans know this IS true, which is why 80% ...","[positive, negative]","[0.8548560738563538, 0.14514391124248505]",0,0.854856
4,Democrats that want to legalize abortion don’t...,"[negative, positive]","[0.9443130493164062, 0.05568692460656166]",1,0.944313
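
Some top scores are barely above 0.5 (row 1 scores 0.518), so it can help to flag uncertain predictions for manual review. This is a minimal sketch on the first five results shown above, with scores rounded. The 0.6 threshold and the `summarize` helper are assumptions for illustration, not part of the original analysis.

```python
# First five pipeline results, copied from the output above (scores rounded)
results = [
    {"labels": ["negative", "positive"], "scores": [0.9888, 0.0112]},
    {"labels": ["negative", "positive"], "scores": [0.5180, 0.4820]},
    {"labels": ["negative", "positive"], "scores": [0.9549, 0.0451]},
    {"labels": ["positive", "negative"], "scores": [0.8549, 0.1451]},
    {"labels": ["negative", "positive"], "scores": [0.9443, 0.0557]},
]

LABEL_TO_INT = {"positive": 0, "negative": 1}  # same encoding as zs_prediction
MIN_CONFIDENCE = 0.6  # assumed cut-off; tune it on hand-labeled data

def summarize(result, min_confidence=MIN_CONFIDENCE):
    """Take the top label and score, and flag predictions below the cut-off."""
    top_label = result["labels"][0]  # labels are sorted by descending score
    top_score = result["scores"][0]
    return {
        "zs_prediction": LABEL_TO_INT[top_label],
        "zs_predicted_score": top_score,
        "low_confidence": top_score < min_confidence,
    }

summaries = [summarize(r) for r in results]
uncertain_rows = [i for i, s in enumerate(summaries) if s["low_confidence"]]
print(uncertain_rows)  # [1]
```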


 # I also hand-labeled roughly the first 150 tweets (151 rows in the evaluation below) to compare against the zero-shot predictions

In [10]:
zeroshot_prediction.to_excel('/content/drive/MyDrive/SentimentAnalysis/zero_shot_prediction.xlsx')

In [12]:
labeled_tweets = pd.read_excel('/content/drive/MyDrive/SentimentAnalysis/zero_shot_prediction_hands_labels.xlsx')

In [13]:
labeled_tweets.head(25)

Unnamed: 0,sequence,labels,scores,zs_prediction,zs_predicted_score,hand_labels
0,Ron DeSantis just stated he would not honor a ...,"['negative', 'positive']","[0.9887816309928894, 0.011218375526368618]",1,0.988782,1
1,Abortion is completely legal and in our consti...,"['negative', 'positive']","[0.5180108547210693, 0.4819890856742859]",1,0.518011,0
2,"""Forced pregnancy"" is a legal term specificall...","['negative', 'positive']","[0.9548926949501038, 0.04510725289583206]",1,0.954893,1
3,"Americans know this IS true, which is why 80% ...","['positive', 'negative']","[0.8548560738563538, 0.14514391124248505]",0,0.854856,0
4,Democrats that want to legalize abortion don’t...,"['negative', 'positive']","[0.9443130493164062, 0.05568692460656166]",1,0.944313,1
5,How Abortion Bans Are Impacting Pregnant Patie...,"['negative', 'positive']","[0.7610181570053101, 0.23898188769817352]",1,0.761018,0
6,There are more people that want to make aborti...,"['negative', 'positive']","[0.7828027009963989, 0.2171972692012787]",1,0.782803,1
7,How does an abortion ban become a forced gesta...,"['negative', 'positive']","[0.9582416415214539, 0.04175836592912674]",1,0.958242,1
8,Pretty sure Republicans don't want any childre...,"['negative', 'positive']","[0.9405350685119629, 0.05946493148803711]",1,0.940535,1
9,But those 1st 2 things he didn't do!\n\nPlease...,"['negative', 'positive']","[0.9591174125671387, 0.04088260978460312]",1,0.959117,1


In [14]:
from IPython.display import display
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score

In [17]:
y_test = labeled_tweets.hand_labels
predicted_classes = labeled_tweets.zs_prediction

# Rows are hand labels, columns are zero-shot predictions (0 = positive, 1 = negative)
conf_matrix = pd.DataFrame(confusion_matrix(y_test, predicted_classes))
print('Confusion Matrix')
display(conf_matrix)

# Precision, recall and F1 treat class 1 (negative) as the positive class
test_scores = (accuracy_score(y_test, predicted_classes),
               precision_score(y_test, predicted_classes),
               recall_score(y_test, predicted_classes),
               f1_score(y_test, predicted_classes))

print('\n \n Scores')
scores = pd.DataFrame(data=[test_scores])
scores.columns = ['accuracy', 'precision', 'recall', 'f1']
scores = scores.T
scores.columns = ['scores']
display(scores)

Confusion Matrix


Unnamed: 0,0,1
0,26,74
1,5,46



 
 Scores


Unnamed: 0,scores
accuracy,0.476821
precision,0.383333
recall,0.901961
f1,0.538012
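
The scores above follow directly from the confusion matrix. The arithmetic below reproduces them, reading rows as hand labels and columns as predictions, with class 1 (negative) as the positive class. The majority-class baseline is an extra comparison added here, not something computed in the original run.

```python
# Confusion matrix from the output above: rows = hand labels, columns = predictions
tn, fp = 26, 74  # hand label 0 (positive): 26 predicted 0, 74 predicted 1
fn, tp = 5, 46   # hand label 1 (negative): 5 predicted 0, 46 predicted 1

total = tn + fp + fn + tp                           # 151 hand-labeled tweets
accuracy = (tp + tn) / total                        # 72 / 151
precision = tp / (tp + fp)                          # 46 / 120
recall = tp / (tp + fn)                             # 46 / 51
f1 = 2 * precision * recall / (precision + recall)  # 92 / 171

# Accuracy of always predicting the most frequent hand label (class 0)
majority_baseline = max(tn + fp, fn + tp) / total   # 100 / 151

print(round(accuracy, 6), round(precision, 6), round(recall, 6), round(f1, 6))
print(round(majority_baseline, 4))
```

The high recall and low precision reflect that the model predicted class 1 (negative) for 120 of the 151 tweets, while only 51 were hand-labeled negative.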


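 # Accuracy is about 48% (0.477), with high recall (0.90) but low precision (0.38), because the model labels most tweets as negative. This is still not good, but it is better than the clustering results, so I will use the zero-shot method to label my tweets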

In [21]:
zeroshot_prediction = zeroshot_prediction.rename(columns={'sequence': 'original_tweets'})
zeroshot_prediction

Unnamed: 0,original_tweets,labels,scores,zs_prediction,zs_predicted_score
0,Ron DeSantis just stated he would not honor a ...,"[negative, positive]","[0.9887816309928894, 0.011218375526368618]",1,0.988782
1,Abortion is completely legal and in our consti...,"[negative, positive]","[0.5180108547210693, 0.4819890856742859]",1,0.518011
2,"""Forced pregnancy"" is a legal term specificall...","[negative, positive]","[0.9548926949501038, 0.04510725289583206]",1,0.954893
3,"Americans know this IS true, which is why 80% ...","[positive, negative]","[0.8548560738563538, 0.14514391124248505]",0,0.854856
4,Democrats that want to legalize abortion don’t...,"[negative, positive]","[0.9443130493164062, 0.05568692460656166]",1,0.944313
...,...,...,...,...,...
22482,"abortion up to 9 months shouldn’t be legal, an...","[negative, positive]","[0.9908162355422974, 0.009183789603412151]",1,0.990816
22483,"SCOTUS also rules decades ago that privacy, be...","[negative, positive]","[0.9144995808601379, 0.08550041913986206]",1,0.914500
22484,Yes. And I will go to all means to obtain an a...,"[negative, positive]","[0.9525007009506226, 0.04749932140111923]",1,0.952501
22485,"Also, due to the injunction (incorrectly) plac...","[negative, positive]","[0.9203109741210938, 0.07968904823064804]",1,0.920311


In [23]:
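# Note: original_tweets contains repeated texts, so this left merge matches each repeated tweet
# against every copy and grows df from 22,486 to 23,830 rows (see the index in the output below).
# Deduplicating the right-hand frame on original_tweets first would keep one row per tweet.
df = df.merge(zeroshot_prediction[['original_tweets', 'zs_prediction']], on='original_tweets', how='left')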
df

Unnamed: 0.1,Unnamed: 0,date,author_id,text,original_tweets,tokens,cleaned_tokens,stems,lemma,zs_prediction
0,0,2023-03-31 00:54:56+00:00,80832189,ron desantis stated honor legal requirement ex...,Ron DeSantis just stated he would not honor a ...,"['ron', 'desantis', 'stated', 'honor', 'legal'...","['ron', 'desantis', 'stated', 'honor', 'legal'...","['ron', 'desanti', 'state', 'honor', 'legal', ...","['ron', 'desantis', 'state', 'honor', 'legal',...",1
1,1,2023-03-31 00:54:50+00:00,2479303121,abortion completely legal constitution . .,Abortion is completely legal and in our consti...,"['abortion', 'completely', 'legal', 'constitut...","['abortion', 'completely', 'legal', 'constitut...","['abort', 'complet', 'legal', 'constitut']","['abortion', 'completely', 'legal', 'constitut...",1
2,2,2023-03-31 00:52:32+00:00,1526105788728475653,""" forced pregnancy "" legal term specifical...","""Forced pregnancy"" is a legal term specificall...","['""', 'forced', 'pregnancy', '""', 'legal', 'te...","['forced', 'pregnancy', 'legal', 'term', 'spec...","['forc', 'pregnanc', 'legal', 'term', 'specif'...","['force', 'pregnancy', 'legal', 'term', 'speci...",1
3,3,2023-03-31 00:52:09+00:00,438628988,"americans know true , 80 % believe aborti...","Americans know this IS true, which is why 80% ...","['americans', 'know', 'true', ',', '80', '%', ...","['americans', 'know', 'true', '80', 'believe',...","['american', 'know', 'true', '80', 'believ', '...","['americans', 'know', 'true', 'believe', 'abor...",0
4,4,2023-03-31 00:51:47+00:00,1492201362154831874,democrats want legalize abortion scruples .,Democrats that want to legalize abortion don’t...,"['democrats', 'want', 'legalize', 'abortion', ...","['democrats', 'want', 'legalize', 'abortion', ...","['democrat', 'want', 'legal', 'abort', 'scrupl']","['democrats', 'want', 'legalize', 'abortion', ...",1
...,...,...,...,...,...,...,...,...,...,...
23826,22482,2023-02-23 02:42:35+00:00,1492272489455620097,"abortion 9 months legal , past trimester ill...","abortion up to 9 months shouldn’t be legal, an...","['abortion', '9', 'months', 'legal', ',', 'pas...","['abortion', 'months', 'legal', 'past', 'trime...","['abort', 'month', 'legal', 'past', 'trimest',...","['abortion', 'month', 'legal', 'past', 'trimes...",1
23827,22483,2023-02-23 02:29:15+00:00,452462161,"scotus rules decades ago privacy , fundame...","SCOTUS also rules decades ago that privacy, be...","['scotus', 'rules', 'decades', 'ago', 'privacy...","['scotus', 'rules', 'decades', 'ago', 'privacy...","['scotu', 'rule', 'decad', 'ago', 'privaci', '...","['scotus', 'rule', 'decade', 'ago', 'privacy',...",1
23828,22484,2023-02-23 02:27:01+00:00,1519800752549421058,yes . means obtain abortion legal .,Yes. And I will go to all means to obtain an a...,"['yes', '.', 'means', 'obtain', 'abortion', 'l...","['yes', 'means', 'obtain', 'abortion', 'legal']","['ye', 'mean', 'obtain', 'abort', 'legal']","['yes', 'mean', 'obtain', 'abortion', 'legal']",1
23829,22485,2023-02-23 02:25:21+00:00,1445411972388847622,", injunction ( incorrectly ) placed trigger...","Also, due to the injunction (incorrectly) plac...","[',', 'injunction', '(', 'incorrectly', ')', '...","['injunction', 'incorrectly', 'placed', 'trigg...","['injunct', 'incorrectli', 'place', 'trigger',...","['injunction', 'incorrectly', 'place', 'trigge...",1
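
The index in the merged output runs to 23829 although the input had 22,486 rows. That is what a left merge does when the join key is repeated on both sides. The toy example below reproduces the effect and sketches two possible remedies; the frames and values are invented for illustration.

```python
import pandas as pd

# Toy data: the tweet "a" appears twice, as duplicated tweets do in the real data
tweets = pd.DataFrame({"original_tweets": ["a", "b", "a"]})
preds = pd.DataFrame({"original_tweets": ["a", "b", "a"],
                      "zs_prediction": [1, 0, 1]})

# Naive left merge: each "a" on the left matches both "a" rows on the right
naive = tweets.merge(preds, on="original_tweets", how="left")

# Remedy 1: keep one prediction per distinct text before merging;
# validate raises an error if the right-hand key is still not unique
unique_preds = preds.drop_duplicates("original_tweets")
deduped = tweets.merge(unique_preds, on="original_tweets", how="left",
                       validate="many_to_one")

# Remedy 2: assign by position, assuming predictions are in the same row order
positional = tweets.assign(zs_prediction=preds["zs_prediction"].to_numpy())

print(len(tweets), len(naive), len(deduped), len(positional))  # 3 5 3 3
```

Positional assignment only works if the predictions were produced from the DataFrame's rows in order, as they are when the input list comes from `df['original_tweets'].to_list()`.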


In [25]:
# index=False avoids writing the index as yet another 'Unnamed: 0' column
df.to_csv('/content/drive/MyDrive/SentimentAnalysis/tweets_df_cleaned_labeled.csv', index=False)