# Sentiment Evaluation of Twitter and YouTube Data
## Tasks

1. Install packages and load evaluation datasets with Google NLP scores
2. Run VADER over evaluation texts
3. Run BERT over evaluation texts
4. Evaluate against sentiment annotations and compare with Google NLP

### Install requirements. 

The following cell contains all the necessary dependencies needed for this task. If you run the cell everything will be installed. 

* [`vaderSentiment`](https://github.com/cjhutto/vaderSentiment) is a Python package for a Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text.
* [`transformers`](https://huggingface.co/) is a Python package for creating and working with transformers. [Here](https://huggingface.co/docs) is the documentation of `transformers`.
* [`torch`](https://pytorch.org/) is a Python machine learning framework. We need this here for `transformers` since this package uses internally `torch`. [Here](https://pytorch.org/docs/stable/index.html) is the documentation of `torch`.
* [`pandas`](https://pandas.pydata.org/docs/index.html) is a Python package for creating and working with tabular data. [Here](https://pandas.pydata.org/docs/reference/index.html) is the documentation of `pandas`.

In [1]:
! pip install vaderSentiment
! pip install transformers sentencepiece
! pip install torch torchvision torchaudio
! pip install pandas




[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip





You may need to restart the Kernel after installing the dependencies!

### Import requirements
The cell below imports all necessary dependancies. Make sure they are installed (see cell above).

In [2]:
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from transformers import pipeline

  from .autonotebook import tqdm as notebook_tqdm


# 1. Load evaluation datasets and Google NLP scores

## 1.1 Load datasets
First read the Twitter and Youtube Comments CSV files (`Twitter-Sentiment.csv` and `YouTubeComments-Sentiment.csv`) and save them in a pandas Dataframe.

In [3]:
# Read Twitter data
twitter_data = pd.read_csv("Twitter-Sentiment.csv")
# print(twitter_data)

# Read Youtube data
youtube_data = pd.read_csv("YouTubeComments-Sentiment.csv")
# print(youtube_data)

# 2. Run VADER over evaluation texts

## 2.1 Run VADER over the first tweet

In this task you should use VADER for sentiment analysis. For this we use the `vaderSentiment` package. You first have to intatiate a new `SentimentIntensityAnalyzer` and use the `polarity_scores` method of it for the analysis. Apply this for the first tweet. Is it a good classification?

[Here](https://github.com/cjhutto/vaderSentiment) under 'Code Examples' you can find some example code how to use this package.

In [4]:
#  Intatiate a new SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()

# Cassify first tweet and print
first_tweet = twitter_data["text"][0]
first_tweet_classification = vader.polarity_scores(first_tweet)

print(f"First Tweet: {first_tweet}\n")
print(f"Classification first Tweet: {first_tweet_classification}\n")

First Tweet: ?RT @justinbiebcr: The bigger the better....if you know what I mean ;)

Classification first Tweet: {'neg': 0.0, 'neu': 0.853, 'pos': 0.147, 'compound': 0.2263}



The analyzed tweet is predominantly neutral (neu: 0.853) but leans slightly positive overall (compound: 0.2263 and pos: 0.147).
There’s no detectable negativity (neg: 0.0), so the tone of the tweet is likely neutral to mildly positive.

The classification is reasonable but not perfect. VADER captures the neutral structure and mild positivity but misses the playful, suggestive tone implied by the wink emoji and double entendre. It also overlooks the broader context and cultural nuances, such as the implied humor in "if you know what I mean." While suitable for general analysis, it lacks the sophistication to interpret subtle humor, innuendo, or contextual cues in tweets like this.

## 2.2 Run VADER over each text

Now use VADER for all the text data of the Twitter and the Youtube dataframe. Create a new column in the dataframes called `VADER_compound` where you save the `compound` result (look at the output dictonary of the `polarity_scores` method).

*Important: Make sure `compound` is a float*

If this runs slow on your computer you can use the precomputed values in the provided CSV files which are present in the column `VADER_compund_precomputed` for further tasks.

In [5]:
# Using VADER for sentiment analysis of twitter data
vader = SentimentIntensityAnalyzer()
twitter_data["VADER_compound"] = 0.0

#for i in range(10):
for i in range(len(twitter_data["text"])):
    # use polarity_scores method to get the sentiment scores
    sentiment_dict = vader.polarity_scores(twitter_data["text"][i])
    # Save the compound result as float in the dataset. 
    # Notice: .loc is way slower here.... but worked for us ;)
    twitter_data.loc[i, "VADER_compound"] = sentiment_dict["compound"]

In [6]:
# Using VADER for sentiment analysis of YouTube data
vader = SentimentIntensityAnalyzer()
youtube_data["VADER_compound"] = 0.0

#for i in range(10):
for i in range(len(youtube_data["text"])):
    # use polarity_scores method to get the sentiment scores
    sentiment_dict = vader.polarity_scores(youtube_data["text"][i])
    # Save the compound result as float in the dataset. 
    # Notice: .loc is way slower here.... but worked for us ;)
    youtube_data.loc[i, "VADER_compound"] = sentiment_dict["compound"]

## 2.3 VADER as a classifier

To get the three Classes `Positive`, `Negative` and `Neutral` we use the compound score with the following thresholds:

* `compound > 0.5`: `"Positive"`
* `compound < -0.5`: `"Negative"`
* `else`: `"Neutral"`

Create a new column called `VADER_class` which contains the three computed classes.

In [7]:
# Create new column for computed classes
twitter_data["VADER_class"] = "Neutral"
youtube_data["VADER_class"] = "Neutral"

# Classify Twitter Data
twitter_data.loc[twitter_data["VADER_compound"] > 0.5, "VADER_class"] = "Positive"
twitter_data.loc[twitter_data["VADER_compound"] < 0.5, "VADER_class"] = "Negative"

# Classify YouTube Data
youtube_data.loc[youtube_data["VADER_compound"] > 0.5, "VADER_class"] = "Positive"
youtube_data.loc[youtube_data["VADER_compound"] < 0.5, "VADER_class"] = "Negative"

# 3. Use a BERT based model for sentiment analysis

## 3.1 BERT
BERT (Bidirectional Encoder Representation from Transformers) is a machine learning technique for natural language processing. There are already pretrained models available in the `transformers` package. You can look [here](https://huggingface.co/models?sort=downloads&search=sentiment) and choose a model for the next tasks. (We suggest [this](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) (`"cardiffnlp/twitter-roberta-base-sentiment-latest"`) model, but you can use any available, just make sure it is suitable for sentiment analysis).

First create a `pipeline` where you set your model by the `model` keyword argument. You can then use this method to pass text which should be classified. [Here](https://huggingface.co/blog/sentiment-analysis-python#2-how-to-use-pre-trained-sentiment-analysis-models-with-python) is a tutorial how to use this.

As before save the classes in a new row 'BERT_class'. The call to your pipeline returns a dictionary where there is a key `label` which contains already the `Positive`, `Negative` or `Neutral` class (Be aware that this is based on the model you choose, sometimes these classes are named differently so you have to rename them by hand, this is not the case if you use the suggested model).

Based on you computer this may take some time, if it is too slow for you, you can again use the precomputed classes `'BERT_class_precomputed'` in the CSV Files for further tasks.

In [9]:
# Using BERT-Base-Uncased model for sentiment analysis
#sentiment_pipeline = pipeline(model=f"cardiffnlp/twitter-roberta-base-sentiment-latest")

# Create new column for computed BERT classes
#twitter_data["BERT_class"] = "Neutral"
#youtube_data["BERT_class"] = "Neutral"

twitter_data

# column_to_classify = "BERT_class_precomputed"
# column_to_classify = "BERT_class"


Unnamed: 0,label,text,googleScore,VADER_compound_precomputed,BERT_class_precomputed,VADER_compound,VADER_class
0,Positive,?RT @justinbiebcr: The bigger the better....if...,0.3,0.2263,Positive,0.2263,Negative
1,Positive,"Listening to the ""New Age"" station on @Slacker...",0.2,0.0000,Neutral,0.0000,Negative
2,Neutral,I favorited a YouTube video -- Drake and Josh ...,0.0,0.4019,Positive,0.4019,Negative
3,Positive,i didnt mean knee high I ment in lengt it goes...,0.8,0.8632,Positive,0.8632,Positive
4,Neutral,I wana see the vid Kyan,0.0,0.0000,Neutral,0.0000,Negative
...,...,...,...,...,...,...,...
4204,Neutral,"So far, i'm seeing the opposite of what you're...",0.4,0.0000,Negative,0.0000,Negative
4205,Neutral,RT @Nescreation I'm Yours w/ hearts Ladies Cam...,0.3,0.6486,Neutral,0.6486,Positive
4206,Positive,"RT @JoseCarol: If you fall, GET UP!, if you're...",0.3,0.6531,Positive,0.6531,Positive
4207,Neutral,@MakikiGirl I'm giving my 2 Japanese Chins a b...,-0.1,-0.2023,Negative,-0.2023,Negative


# 4. Evaluate against sentiment annotations and compare with Google NLP

## 4.1 Convert GoogleNLP scores to classes

As with VADER and BERT, compute classes from the GoogleNLP score, which is given in the column `googleScore`. For this use following thresholds:

* `googleScore > 0.3`: `"Positive"`
* `googleScore < -0.3`: `"Negativ"`
* `else`: `"Neutral"`

Save the classes in a new column named `GoogleNLP_class`.


In [16]:
# Create new column for Google NLP classes
twitter_data["GoogleNLP_class"] = "Neutral"
youtube_data["GoogleNLP_class"] = "Neutral"

# Classify Twitter Data
twitter_data.loc[twitter_data["googleScore"] > 0.3, "GoogleNLP_class"] = "Positive"
twitter_data.loc[twitter_data["googleScore"] < -0.3, "GoogleNLP_class"] = "Negative"

# Classify YouTube Data
youtube_data.loc[youtube_data["googleScore"] > 0.3, "GoogleNLP_class"] = "Positive"
youtube_data.loc[youtube_data["googleScore"] < -0.3, "GoogleNLP_class"] = "Negative"

youtube_data
twitter_data

Unnamed: 0,label,text,googleScore,VADER_compound_precomputed,BERT_class_precomputed,VADER_compound,VADER_class,GoogleNLP_class
0,Positive,?RT @justinbiebcr: The bigger the better....if...,0.3,0.2263,Positive,0.2263,Negative,Neutral
1,Positive,"Listening to the ""New Age"" station on @Slacker...",0.2,0.0000,Neutral,0.0000,Negative,Neutral
2,Neutral,I favorited a YouTube video -- Drake and Josh ...,0.0,0.4019,Positive,0.4019,Negative,Neutral
3,Positive,i didnt mean knee high I ment in lengt it goes...,0.8,0.8632,Positive,0.8632,Positive,Positive
4,Neutral,I wana see the vid Kyan,0.0,0.0000,Neutral,0.0000,Negative,Neutral
...,...,...,...,...,...,...,...,...
4204,Neutral,"So far, i'm seeing the opposite of what you're...",0.4,0.0000,Negative,0.0000,Negative,Positive
4205,Neutral,RT @Nescreation I'm Yours w/ hearts Ladies Cam...,0.3,0.6486,Neutral,0.6486,Positive,Neutral
4206,Positive,"RT @JoseCarol: If you fall, GET UP!, if you're...",0.3,0.6531,Positive,0.6531,Positive,Neutral
4207,Neutral,@MakikiGirl I'm giving my 2 Japanese Chins a b...,-0.1,-0.2023,Negative,-0.2023,Negative,Neutral


## 4.2 Evaluate on Twitter

First, let's calculate the accuracy for all three classifiers on the Twitter and Youtube data, print the results.

### Accuracy Formula
$$
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Samples}}
$$

In [20]:
# Get number of Samples
number_twitter_samples = len(twitter_data.index)
number_youtube_samples = len(youtube_data.index)

# Your Code goes here!
correct_predictiions_VADER_on_twitter_data = twitter_data[twitter_data["VADER_class"]==twitter_data["label"]].shape[0]
correct_predictiions_VADER_on_twitter_data


4209


758

Next calculate the precision of the `"Positive"` class for the Twitter and Youtube data.
This is calculated as follows:
$
\begin{align}
    precision = \frac{TP}{TP + FP}
\end{align}
$
*Note: Here the Positive samples are the one with the the class `"Positive"`*

In [None]:
# Your Code goes here!


Now calculate the recall score. This is done by:
$
\begin{align}
    recall = \frac{TP}{TP + FN}
\end{align}
$
*Note: Here the Positive samples are the one with the the class `"Positive"`*

In [None]:
# Your Code goes here!


Calculate the Recall and the Precision score now also for the negative class. The Precision is calculated as:
$
\begin{align}
    precision = \frac{TP}{TP + FP}
\end{align}
$
*Note: Here the Positive samples are the one with the the class `"Negative"`*

And the Recall is calculated as:
$
\begin{align}
    recall = \frac{TP}{TP + FN}
\end{align}
$
*Note: Here the Positive samples are the one with the the class `"Negative"`*

In [None]:
# Your Code goes here!


# To learn more
1. What was the best performing method for Youtube? Did that fit your expectations?
2. What was the best performing method for Twitter? Did that fit your expectations?
4. Do you observe any differences between prediction of positive and negative sentiment? What is the role of the imbalance between postive and negative classes in the calculation of accuracy?
