<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Twitter Data</h1>

<hr>


### ☑️ Objectives
At the end of this session, you will be able to:
- [ ] Understand how to find and run pre-trained models
- [ ] Evaluate results from pre-trained models
- [ ] Run a pre-trained model using real Twitter data


### 🔨 Pre-Assignment

Create a new Conda environment for sentiment analysis (sa)

```bash
  conda create -n sa python=3.8 jupyter -y
```

Activate your new environment
```bash
  conda activate sa
```

Open the jupyter-notebook
```bash
  jupyter-notebook
```

Navigate through the repo in the notebook to find `imports.ipynb` for this week and open it.

Run all of the cells in the notebook.


### Background
Please review the weekly narrative [here](https://www.notion.so/Week-2-Data-Centric-AI-the-AI-Product-Lifecycle-72a84c1517b44fcbb3e6bd11d47477dc#2b73937612bb46559f5b91dc2bf55e7d)




<hr>

## 🚀 Let's Get Started

Let's first start with our imports

In [1]:
import csv # Allows us to read and write csv files
from pprint import pprint # Make our print functions easier to read

from transformers import pipeline # Hugging Face pipeline for loading pretrained models

🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

These models can be applied on:
- 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages.

- 🖼️ Images, for tasks like image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks like speech recognition and audio classification.

We'll use the `pipeline` function from 🤗 Transformers to analyze sentiment. Since we're not specifying a pretrained model, the pipeline falls back to its default sentiment analysis model, [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

In [2]:
sentiment_pipeline = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
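
The warning above suggests pinning both the model and its revision. Here is a minimal sketch of what that could look like, reusing the model ID and revision hash printed in the warning. The `PIPELINE_KWARGS` dictionary and the `build_pinned_pipeline` helper are hypothetical names introduced for illustration, and the import is deferred so that nothing is downloaded until the helper is called.

```python
# Model ID and revision are copied from the warning message above.
PIPELINE_KWARGS = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_pinned_pipeline():
    """Build the sentiment pipeline with an explicit model and revision."""
    # Imported lazily: constructing the pipeline downloads the model weights.
    from transformers import pipeline

    return pipeline(**PIPELINE_KWARGS)
```

Calling `build_pinned_pipeline()` should then behave like `pipeline("sentiment-analysis")` in the cell above, but with the model choice made explicit.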


In this example, we'll supply two sentences with clearly polarized sentiment and test out the model pipeline.

In [3]:
data = ["This is great!", "Oh no!"]
sentiment_pipeline(data)

[{'label': 'POSITIVE', 'score': 0.9998694658279419},
 {'label': 'NEGATIVE', 'score': 0.994263231754303}]

The `label` in this case indicates the prediction for the sentiment type.

The `score` indicates the confidence of the prediction (between 0 and 1).

Since our sentiments were very polar, it was easier for the model to predict the sentiment type.
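
To make `label` and `score` more concrete, here is a minimal sketch of how a classifier's raw outputs (logits) are typically turned into a dictionary like the ones above. It assumes a softmax over the logits, which is the usual choice for single-label classification. The logit values and the helper name `to_prediction` are made up for illustration.

```python
import math


def to_prediction(logits, labels):
    """Turn raw logits into a {'label', 'score'} dict via softmax."""
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted label is the class with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


# Hypothetical logits for a two-class (NEGATIVE, POSITIVE) model.
prediction = to_prediction([-4.0, 4.5], ["NEGATIVE", "POSITIVE"])
print(prediction)
```

Under this assumption, a two-class model always assigns its probability mass to either `POSITIVE` or `NEGATIVE`, so a high `score` only means the model strongly prefers one of those two labels.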

Let's see what happens when we use a less clear example:

In [4]:
challenging_sentiments = ["I don't think freddriq should leave, he's been helpful.",
                          "Is that the lake we went to last month?"]
sentiment_pipeline(challenging_sentiments)

[{'label': 'NEGATIVE', 'score': 0.9955561757087708},
 {'label': 'NEGATIVE', 'score': 0.9860844016075134}]

<hr>

### Loading the Twitter Data

Let's play with some Twitter data. We'll be using a modified version of the [Elon Musk Twitter dataset on Kaggle](https://www.kaggle.com/datasets/andradaolteanu/all-elon-musks-tweets).

In [5]:
with open('../data/elonmusk_tweets.csv', newline='', encoding='utf8') as f:
    tweets=[]
    reader = csv.reader(f)
    twitter_data = list(reader)
    for tweet in twitter_data:
        tweets.append(tweet[0])

pprint(tweets[:100])

['@vincent13031925 For now. Costs are decreasing rapidly.',
 'Love this beautiful shot',
 '@agnostoxxx @CathieDWood @ARKInvest Trust the shrub',
 'The art In Cyberpunk is incredible',
 '@itsALLrisky 🤣🤣',
 '@seinfeldguru @WholeMarsBlog Nope haha',
 '@WholeMarsBlog If you don’t say anything &amp; engage Autopilot, it will '
 'soon guess based on time of day, taking you home or to work or to what’s on '
 'your calendar',
 '@DeltavPhotos @PortCanaveral That rocket is a hardcore veteran of many '
 'missions',
 'Blimps rock  https://t.co/e8cu5FkNOI',
 '@engineers_feed Due to lower gravity, you can travel from surface of Mars to '
 'surface of Earth fairly easily with a single stage rocket. Earth to Mars is '
 'vastly harder.',
 '@DrPhiltill Good thread',
 '@alexellisuk Pretty much',
 '@tesla_adri @WholeMarsBlog These things are best thought of as '
 'probabilities. There are 5 forward-facing cameras. It is highly likely that '
 'at least one of them will see multiple cars ahead.',
 '@WholeMa
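
The raw tweets printed above contain some export residue, such as the HTML entity `&amp;` and doubled spaces. Whether this affects either model's predictions is not measured here, but a light cleanup step could be sketched as follows. `clean_tweet` is a hypothetical helper, not part of the notebook.

```python
import html
import re


def clean_tweet(text):
    """Decode HTML entities and collapse repeated whitespace in a tweet."""
    decoded = html.unescape(text)  # e.g. "&amp;" becomes "&"
    return re.sub(r"\s+", " ", decoded).strip()


print(clean_tweet("If you don’t say anything &amp; engage Autopilot"))
print(clean_tweet("Blimps rock  https://t.co/e8cu5FkNOI"))
```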

First things first - let's look at the sentiment as determined by the [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) (default model) in the pipeline.

In [6]:
distil_sentiment = sentiment_pipeline(tweets[0:100])

Let's check out the distribution of positive/negative Tweets and see the breakdown using Python's 🐍 standard library `collections.Counter`!

In [7]:
from collections import Counter

tweet_distro = Counter([x['label'] for x in distil_sentiment])
pos_sent_count = tweet_distro['POSITIVE']
neg_sent_count = tweet_distro['NEGATIVE']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")

49 (49.00%) of the tweets classified are positive.
51 (51.00%) of the tweets classified are negative.
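
The count-and-percentage logic above does not depend on which labels a model uses, so it can be wrapped in a small function. This is a sketch only: `label_distribution` is a hypothetical helper, and the sample results are invented to mimic the shape of the pipeline output.

```python
from collections import Counter


def label_distribution(results):
    """Return {label: (count, percent)} for a list of pipeline-style dicts."""
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (n, 100 * n / total) for label, n in counts.items()}


# Hypothetical results in the same shape the pipeline returns.
sample = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "NEGATIVE", "score": 0.61},
    {"label": "POSITIVE", "score": 0.88},
]
distribution = label_distribution(sample)
for label, (n, pct) in distribution.items():
    print(f"{n} ({pct:.2f}%) of the tweets classified are {label.lower()}.")
```

Because the labels are read from the results rather than hard-coded, the same helper would also cover a model with three labels.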


Let's repeat that process with [bertweet-base-sentiment-analysis](https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis), a model that has an additional neutral label (`NEU`).

To start - we'll build a pipeline with the new model by using the 🤗 Hugging Face address: `finiteautomata/bertweet-base-sentiment-analysis`

In [8]:
bertweet_pipeline = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")

Downloading:   0%|          | 0.00/1.42k [00:00<?, ?B/s]

Next, and the same as before, let's run the analysis on 100 of Elon's tweets.

In [9]:
bert_sentiment = bertweet_pipeline(tweets[0:100])

And then, let's check out the breakdown of positive, negative, AND neutral sentiments!

In [10]:
from collections import Counter

tweet_distro = Counter([x['label'] for x in bert_sentiment])
pos_sent_count = tweet_distro['POS']
neu_sent_count = tweet_distro['NEU']
neg_sent_count = tweet_distro['NEG']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neu_sent_count} ({neu_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are neutral.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")

29 (29.00%) of the tweets classified are positive.
64 (64.00%) of the tweets classified are neutral.
7 (7.00%) of the tweets classified are negative.


❓ **Setup and exploratory analysis to answer the assignment questions**

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('../data/elonmusk_tweets.csv', header=None)
elontweets = df.rename(columns={0:"Tweet"})
elontweets = elontweets.head(500) #only consider the first 500 tweets in the list
elontweets.head()

|   | Tweet |
|---|---|
| 0 | @vincent13031925 For now. Costs are decreasing... |
| 1 | Love this beautiful shot |
| 2 | @agnostoxxx @CathieDWood @ARKInvest Trust the ... |
| 3 | The art In Cyberpunk is incredible |
| 4 | @itsALLrisky 🤣🤣 |


In [12]:
# create list for each model
base_list = sentiment_pipeline(list(elontweets['Tweet']))
sent_list = bertweet_pipeline(list(elontweets['Tweet']))

In [13]:
#create some new columns for our dataframe

# base
base_class = []
base_score = []
for value in base_list:
    base_class.append(value['label'])
    base_score.append(value['score'])
    
elontweets['dist_classification'] = pd.Series(base_class)
elontweets['dist_score'] = pd.Series(base_score)

# sentiment
sent_class = []
sent_score = []
for value in sent_list:
    sent_class.append(value['label'])
    sent_score.append(value['score'])
    
elontweets['bert_classification'] = pd.Series(sent_class)
elontweets['bert_score'] = pd.Series(sent_score)

In [14]:
elontweets.head(5)

|   | Tweet | dist_classification | dist_score | bert_classification | bert_score |
|---|---|---|---|---|---|
| 0 | @vincent13031925 For now. Costs are decreasing... | NEGATIVE | 0.996366 | NEU | 0.952394 |
| 1 | Love this beautiful shot | POSITIVE | 0.999882 | POS | 0.990994 |
| 2 | @agnostoxxx @CathieDWood @ARKInvest Trust the ... | NEGATIVE | 0.849832 | NEU | 0.973386 |
| 3 | The art In Cyberpunk is incredible | POSITIVE | 0.999886 | POS | 0.982426 |
| 4 | @itsALLrisky 🤣🤣 | NEGATIVE | 0.98395 | NEG | 0.962732 |


**❓ What do you notice about the difference in the results?**

- The Distilbert model had a roughly 60-40% split between *negative* and *positive* predictions. The Bertweet model classified the majority of the tweets as *neutral* (66.8%) and relatively few as *negative* (11.4%).

In [15]:
#counts for distilbert model
elontweets.dist_classification.value_counts(normalize=True)

NEGATIVE    0.604
POSITIVE    0.396
Name: dist_classification, dtype: float64

In [16]:
#counts for the bertweet model
elontweets.bert_classification.value_counts(normalize=True)

NEU    0.668
POS    0.218
NEG    0.114
Name: bert_classification, dtype: float64

- The Distilbert model tended to classify *neutral* predictions from the Bertweet model as *negative* (68% of the time) despite having a 60-40% distribution of *negative* and *positive* predictions overall.

In [17]:
#distilbert base class counts for neutral classifications in bertweet model
elontweets[elontweets['bert_classification']=='NEU'].dist_classification.value_counts(normalize=True)

NEGATIVE    0.679641
POSITIVE    0.320359
Name: dist_classification, dtype: float64
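
The filter-then-count step above is one row of a cross-tabulation between the two models' labels. The sketch below computes every row at once using only the standard library. `conditional_distribution` is a hypothetical helper and the five label pairs are invented for illustration.

```python
from collections import Counter, defaultdict


def conditional_distribution(given, target):
    """For each label in `given`, return the share of each label in `target`."""
    pair_counts = Counter(zip(given, target))
    totals = Counter(given)
    table = defaultdict(dict)
    for (g, t), n in pair_counts.items():
        table[g][t] = n / totals[g]
    return dict(table)


# Hypothetical labels for five tweets, one entry per tweet and model.
bert = ["NEU", "POS", "NEU", "NEG", "NEU"]
dist = ["NEGATIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]
table = conditional_distribution(bert, dist)
for bert_label, shares in table.items():
    print(bert_label, shares)
```

With the DataFrame used in this notebook, `pd.crosstab(elontweets['bert_classification'], elontweets['dist_classification'], normalize='index')` should give the same kind of table.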

- If we exclude the tweets that the Bertweet model labeled *neutral*, then on the remaining tweets Bertweet predicts *positive* more often than the Distilbert model (66% vs 55%) and *negative* less often (34% vs 45%).

In [18]:
elontweets[elontweets['bert_classification']!='NEU'].dist_classification.value_counts(normalize=True)

POSITIVE    0.548193
NEGATIVE    0.451807
Name: dist_classification, dtype: float64

In [19]:
elontweets[elontweets['bert_classification']!='NEU'].bert_classification.value_counts(normalize=True)

POS    0.656627
NEG    0.343373
Name: bert_classification, dtype: float64
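
The two distributions above compare the models in aggregate. To check agreement tweet by tweet, the Bertweet labels first have to be mapped onto the Distilbert label names. The sketch below assumes the mapping `POS` to `POSITIVE` and `NEG` to `NEGATIVE`, skips `NEU`, and uses the five rows shown by `elontweets.head(5)` earlier as sample data. `agreement_rate` is a hypothetical helper.

```python
# Assumed mapping between the two label sets; NEU has no Distilbert counterpart.
LABEL_MAP = {"POS": "POSITIVE", "NEG": "NEGATIVE"}


def agreement_rate(bert_labels, dist_labels):
    """Share of non-neutral tweets on which both models pick the same polarity."""
    pairs = [
        (LABEL_MAP[b], d)
        for b, d in zip(bert_labels, dist_labels)
        if b in LABEL_MAP  # drop tweets that Bertweet labeled NEU
    ]
    if not pairs:
        return None
    return sum(b == d for b, d in pairs) / len(pairs)


# Labels of the first five tweets, copied from the table above.
bert_labels = ["NEU", "POS", "NEU", "POS", "NEG"]
dist_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
rate = agreement_rate(bert_labels, dist_labels)
print(f"Agreement on non-neutral tweets: {rate:.0%}")
```

Five tweets are far too few to draw conclusions from; the point is only to show how the comparison could be set up.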

❓ Do the results for the `bertweet-base` model look better, or worse, than the results for the `distilbert-base` model? Why?

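Let's try a few logical examples to compare the models:

- Both models predict correctly for a small odd number of negations (up to 5).
- Both models predict incorrectly for a small even number of negations (for example, both label "not not good" as negative).
- The Distilbert model handles long chains of negations better than Bertweet (with nine negations of "bad", only Distilbert predicts positive).
- Bertweet handles tweets that are neither positive nor negative better, since it has a *neutral* label.
- Bertweet handles conditional statements a bit better: its confidence is lower when the sentiment is conditional, while Distilbert's stays close to 1.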

In [20]:
def negation(number=0, string=""):
    """Prefix `string` with `number` copies of "not " and classify it with both models."""
    # "not " * 0 is the empty string, so number == 0 leaves the string unchanged
    newstring = "not " * number + string
    return newstring, sentiment_pipeline(newstring), bertweet_pipeline(newstring)


In [21]:
# no negation good word
negation(0, "good")

('good',
 [{'label': 'POSITIVE', 'score': 0.9998161196708679}],
 [{'label': 'POS', 'score': 0.9473403692245483}])

In [22]:
# no negation bad word
negation(0, "bad")

('bad',
 [{'label': 'NEGATIVE', 'score': 0.999782383441925}],
 [{'label': 'NEG', 'score': 0.8907870054244995}])

In [23]:
# 1 negation good word
negation(1, "good")

('not good',
 [{'label': 'NEGATIVE', 'score': 0.9997889399528503}],
 [{'label': 'NEG', 'score': 0.9618973731994629}])

In [24]:
# 1 negation bad word
negation(1, "bad")

('not bad',
 [{'label': 'POSITIVE', 'score': 0.9995881915092468}],
 [{'label': 'POS', 'score': 0.9798154830932617}])

In [25]:
# double negation good word
negation(2, "good")

('not not good',
 [{'label': 'NEGATIVE', 'score': 0.9997919201850891}],
 [{'label': 'NEG', 'score': 0.9710240960121155}])

In [26]:
# double negation bad word
negation(2, "bad")

('not not bad',
 [{'label': 'POSITIVE', 'score': 0.9997580647468567}],
 [{'label': 'POS', 'score': 0.984266459941864}])

In [27]:
# triple negation good word
negation(3, "good")

('not not not good',
 [{'label': 'NEGATIVE', 'score': 0.9997898936271667}],
 [{'label': 'NEG', 'score': 0.9729840159416199}])

In [28]:
# triple negation bad word
negation(3,"bad")

('not not not bad',
 [{'label': 'POSITIVE', 'score': 0.9997838139533997}],
 [{'label': 'POS', 'score': 0.9850047826766968}])

In [29]:
print(negation(9,"good"))
print(negation(9,"bad"))

('not not not not not not not not not good', [{'label': 'NEGATIVE', 'score': 0.680907666683197}], [{'label': 'NEG', 'score': 0.9778483510017395}])
('not not not not not not not not not bad', [{'label': 'POSITIVE', 'score': 0.9889856576919556}], [{'label': 'NEG', 'score': 0.7724595665931702}])
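
To judge these outputs, it helps to state the reference answer explicitly. The sketch below treats every "not" as a strict logical negation, so polarity flips with each one. That is an assumption: in everyday language, stacked negations are often read as emphasis rather than logic. `expected_label` is a hypothetical helper.

```python
def expected_label(n_negations, base_is_positive=True):
    """Label implied by strict logical negation: each 'not' flips polarity."""
    flipped = n_negations % 2 == 1  # an odd number of negations flips the polarity
    positive = base_is_positive != flipped
    return "POSITIVE" if positive else "NEGATIVE"


for n in range(4):
    phrase = "not " * n + "good"
    print(f"{phrase!r} -> {expected_label(n)}")
```

Under this reading, nine negations of "bad" come out as `POSITIVE`, which matches the Distilbert prediction above but not the Bertweet one.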


In [30]:
# let's try to generalize for large n in the good word case

n = list(range(100)) # number of negations

n_good = []  # expected label for "good" under strict logical negation
for i in range(100):
    if i % 2 == 0:
        n_good.append(1) # even number of negations: positive
    else:
        n_good.append(0) # odd number of negations: negative

In [31]:
df_good = pd.DataFrame(list(zip(n, n_good)), columns=['n', 'True Good'])

In [32]:
def prediction_distilbert(n):
    temp = negation(n, "good")[1]
    return 1 if temp[0]['label']=="POSITIVE" else 0

def prediction_bertweet(n):
    temp = negation(n, "good")[2]
    return 1 if temp[0]['label']=="POS" else 0

In [33]:
df_good['DistPred'] = df_good['n'].apply(prediction_distilbert)
df_good['BertPred'] = df_good['n'].apply(prediction_bertweet)

In [34]:
df_melted = pd.melt(df_good, id_vars=['n'], value_vars=['True Good', 'DistPred', 'BertPred'])
df_melted.head()

|   | n | variable | value |
|---|---|---|---|
| 0 | 0 | True Good | 1 |
| 1 | 1 | True Good | 0 |
| 2 | 2 | True Good | 1 |
| 3 | 3 | True Good | 0 |
| 4 | 4 | True Good | 1 |


In [39]:
df_good

|   | n | True Good | DistPred | BertPred |
|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 | 0 |
| 2 | 2 | 1 | 0 | 0 |
| 3 | 3 | 0 | 0 | 0 |
| 4 | 4 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... |
| 95 | 95 | 0 | 1 | 0 |
| 96 | 96 | 1 | 1 | 0 |
| 97 | 97 | 0 | 1 | 0 |
| 98 | 98 | 1 | 1 | 0 |
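
One way to summarize a table like this is a single accuracy number per model. The sketch below takes an even number of negations of "good" as positive (1) and an odd number as negative (0), and scores the Distilbert predictions for n = 0 to 4 copied from the `DistPred` column above. `accuracy` is a hypothetical helper.

```python
def accuracy(truth, predictions):
    """Fraction of positions where the prediction equals the expected label."""
    if len(truth) != len(predictions):
        raise ValueError("truth and predictions must have the same length")
    if not truth:
        return 0.0
    return sum(t == p for t, p in zip(truth, predictions)) / len(truth)


# Expected labels for n = 0..4 negations of "good": positive (1) when n is even.
truth = [1 if n % 2 == 0 else 0 for n in range(5)]
# Predictions for n = 0..4, copied from the DistPred column above.
dist_pred = [1, 0, 0, 0, 0]
dist_accuracy = accuracy(truth, dist_pred)
print(f"Distilbert accuracy on n = 0..4: {dist_accuracy:.2f}")
```

On the full DataFrame, the same comparison could be written as `(df_good['True Good'] == df_good['DistPred']).mean()`, provided the `True Good` column encodes the same convention.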


In [42]:
# Conditionals
print(sentiment_pipeline("That is good if it's not bad"))
print(bertweet_pipeline("That is good if it's not bad"))

[{'label': 'POSITIVE', 'score': 0.9997832179069519}]
[{'label': 'POS', 'score': 0.9394725561141968}]


In [45]:
print(sentiment_pipeline("If it's not bad then it's good"))
print(bertweet_pipeline("If it's not bad then it's good"))

[{'label': 'POSITIVE', 'score': 0.9998347759246826}]
[{'label': 'POS', 'score': 0.7261000275611877}]


In [46]:
print(sentiment_pipeline("That's pretty average"))
print(bertweet_pipeline("That's pretty average"))

[{'label': 'NEGATIVE', 'score': 0.9646835327148438}]
[{'label': 'NEG', 'score': 0.8612220883369446}]


<hr>

### Partner Exercise

With your partner, try to determine how the following tweets might be classified. Try to sort them into the same groups as both of the model pipelines we saw today, and try adding a few of your own sentences or tweets!

In [36]:
example_difficult_tweets = [
    "Kong vs Godzilla has record for most meth ever consumed in a writer's room",
    "@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.",
    "Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.",
    "Free speech is non negotiable",
    "To be clear we have not yet made changes to Twitter's content moderation policies",
]

The `distilbert-base` model:

In [37]:
for tweet in example_difficult_tweets[0:1000]:
    pprint(sentiment_pipeline(tweet))
    print(tweet + '\n')

[{'label': 'POSITIVE', 'score': 0.5429081320762634}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEGATIVE', 'score': 0.6348376870155334}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'POSITIVE', 'score': 0.9419705867767334}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEGATIVE', 'score': 0.8268800377845764}]
Free speech is non negotiable

[{'label': 'NEGATIVE', 'score': 0.9965620636940002}]
To be clear we have not yet made changes to Twitter's content moderation policies
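
Several of the scores above are much lower than in the earlier examples. A simple way to surface such cases is to flag predictions below a confidence threshold. In the sketch below, the 0.7 cutoff is an arbitrary assumption, `low_confidence` is a hypothetical helper, and the scores are the Distilbert outputs above rounded to four decimals.

```python
def low_confidence(results, threshold=0.7):
    """Return indices of predictions whose score falls below `threshold`."""
    return [i for i, r in enumerate(results) if r["score"] < threshold]


# Distilbert outputs for the five tweets above, scores rounded to 4 decimals.
distil_results = [
    {"label": "POSITIVE", "score": 0.5429},
    {"label": "NEGATIVE", "score": 0.6348},
    {"label": "POSITIVE", "score": 0.9420},
    {"label": "NEGATIVE", "score": 0.8269},
    {"label": "NEGATIVE", "score": 0.9966},
]
uncertain = low_confidence(distil_results)
print(uncertain)
```

The flagged indices point at the first two tweets, which are reasonable candidates for a closer manual look.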



The `bertweet-base` model:

In [38]:
for tweet in example_difficult_tweets[0:1000]:
    pprint(bertweet_pipeline(tweet))
    print(tweet + '\n')

[{'label': 'NEG', 'score': 0.72130286693573}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEU', 'score': 0.8023843169212341}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'NEU', 'score': 0.8843539953231812}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEU', 'score': 0.9585056900978088}]
Free speech is non negotiable

[{'label': 'NEU', 'score': 0.9331392049789429}]
To be clear we have not yet made changes to Twitter's content moderation policies



❓ How did you do? Did you find any surprising results? 

- We were able to correctly assess the predictions both models would make for the tweets

❓ Are there any instances where the two models gave different predictions for the same tweet?

- All tweets resulted in different predictions by each model
- The Bertweet model was better at assessing the sample tweets