# Sentiment Analysis using VADER and Huggingface

• VADER (Valence Aware Dictionary and Sentiment Reasoner): A lexicon and rule-based sentiment analysis tool that's particularly good at analyzing social media text.           
• Huggingface Transformers: Using pre-trained models such as BERT or DistilBERT for sentiment analysis, which is useful for understanding deeper contexts in sentences. 

### Exercise 1: Sentiment Analysis using VADER 

In [20]:
# Importing the VADER analyzer from the standalone vaderSentiment package
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [21]:
# Initializing the VADER SentimentIntensityAnalyzer 
analyzer = SentimentIntensityAnalyzer()

In [22]:
# Defining the sample sentences to analyze
sentences = [
    "I am so happy with the service.",
    "This movie was a waste of time.",
    "It was an okay experience.",
    "Best purchase I've made in years!",
    "I don't like this app, it's too slow."
]

In [23]:
# Analyzing sentiment 
for sentence in sentences: 
    score = analyzer.polarity_scores(sentence) 
    print(f"Text: {sentence}") 
    print(f"Sentiment Scores: {score}") 
    print(f"Sentiment: {'Positive' if score['compound'] >= 0.05 else 'Negative' if score['compound'] <= -0.05 else 'Neutral'}") 
    print("\n") 

Text: I am so happy with the service.
Sentiment Scores: {'neg': 0.0, 'neu': 0.559, 'pos': 0.441, 'compound': 0.6948}
Sentiment: Positive


Text: This movie was a waste of time.
Sentiment Scores: {'neg': 0.318, 'neu': 0.682, 'pos': 0.0, 'compound': -0.4215}
Sentiment: Negative


Text: It was an okay experience.
Sentiment Scores: {'neg': 0.0, 'neu': 0.678, 'pos': 0.322, 'compound': 0.2263}
Sentiment: Positive


Text: Best purchase I've made in years!
Sentiment Scores: {'neg': 0.0, 'neu': 0.527, 'pos': 0.473, 'compound': 0.6696}
Sentiment: Positive


Text: I don't like this app, it's too slow.
Sentiment Scores: {'neg': 0.232, 'neu': 0.768, 'pos': 0.0, 'compound': -0.2755}
Sentiment: Negative




#### 1. What do the sentiment scores (positive, neutral, negative, and compound) represent? 

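The sentiment scores represent the following:

• Positive score (`pos`): the proportion of the text that VADER rates as positive.
• Neutral score (`neu`): the proportion of the text that VADER rates as neutral.
• Negative score (`neg`): the proportion of the text that VADER rates as negative.
• Compound score (`compound`): the sum of the word-level valence scores after VADER's rules are applied, normalized to range from -1 (most negative) to +1 (most positive).

The `pos`, `neu`, and `neg` values sum to approximately 1. For the first sentence above, 0.0 + 0.559 + 0.441 = 1.0.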
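To make the "normalized" part of the compound score concrete, here is a minimal sketch of the squashing step. It assumes the formula x / sqrt(x² + α) with α = 15, which is my understanding of what the vaderSentiment package does; the function name `normalize_valence` and the raw input values are illustrative only, and the real library applies its lexicon and rules before this step.

```python
import math

def normalize_valence(raw_score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1).

    Assumption: alpha = 15 mirrors the default believed to be used by VADER.
    """
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

# Illustrative raw sums (not taken from any real sentence)
for raw in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"raw={raw:+.1f} -> compound={normalize_valence(raw):+.4f}")
```

Larger raw sums move the result toward ±1 without ever reaching it, which is why strongly worded sentences tend to cluster near the ends of the range.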

#### 2. How can you classify a sentence as positive, negative, or neutral based on the compound score? 

A sentence can be classified based on the compound score from VADER as follows:

• Positive: the compound score is ≥ 0.05.
• Negative: the compound score is ≤ -0.05.
• Neutral: the compound score is strictly between -0.05 and 0.05.
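The thresholding rule above can be written as a small helper instead of a nested conditional expression. This is a sketch: the function name `classify_compound` and the `threshold` parameter are not part of VADER, and the compound values are copied from the Exercise 1 output.

```python
def classify_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Compound scores copied from the Exercise 1 output
for compound in (0.6948, -0.4215, 0.2263, -0.2755):
    print(f"{compound:+.4f} -> {classify_compound(compound)}")
```

In the notebook it could be called as `classify_compound(analyzer.polarity_scores(sentence)["compound"])`.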

### Exercise 2: Sentiment Analysis Using Huggingface Transformers

In [10]:
from transformers import pipeline

In [17]:
# Load the sentiment analysis pipeline from Huggingface 
sentiment_pipeline = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [18]:
# Defining the sample sentences to analyze
sentences = [
    "I love this new phone.",
    "I had a terrible experience with customer support.",
    "The movie was not bad, but not great either.",
    "Absolutely loved the restaurant!",
    "The product arrived damaged, very disappointed."
]

In [25]:
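# Analyzing sentiment using Huggingface model
# Note: `sentences` holds whichever list was defined most recently; the output
# below was produced with the Exercise 1 list, not the list defined in In [18]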
for sentence in sentences: 
    result = sentiment_pipeline(sentence)[0] 
    print(f"Sentence: {sentence}") 
    print(f"Sentiment Label: {result['label']}, Confidence Score: {result['score']:.4f}") 
    print("\n")

Sentence: I am so happy with the service.
Sentiment Label: POSITIVE, Confidence Score: 0.9999


Sentence: This movie was a waste of time.
Sentiment Label: NEGATIVE, Confidence Score: 0.9998


Sentence: It was an okay experience.
Sentiment Label: POSITIVE, Confidence Score: 0.9998


Sentence: Best purchase I've made in years!
Sentiment Label: POSITIVE, Confidence Score: 0.9997


Sentence: I don't like this app, it's too slow.
Sentiment Label: NEGATIVE, Confidence Score: 0.9992




#### 1. What are the labels provided by the Huggingface model for sentiment analysis?

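The default model loaded here (`distilbert-base-uncased-finetuned-sst-2-english`, per the log above) provides two labels:

• "POSITIVE": the sentiment is positive.
• "NEGATIVE": the sentiment is negative.

There is no neutral label, so every sentence is assigned to one of these two classes. Other sentiment models on the Huggingface Hub may use different label sets.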

#### 2. How do the confidence scores relate to the model's prediction? 

The confidence score (ranging from 0 to 1) is the probability the model assigns to the label it predicted.
A higher score means the model is more confident in classifying the sentiment as positive or negative.
For example, if the model outputs "POSITIVE" with a score of 0.98, it assigns a probability of 0.98 to the positive class. A high score does not guarantee a correct label: "It was an okay experience." is labeled POSITIVE with a score of 0.9998 even though the sentence is lukewarm.
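The relationship between the model's raw outputs and the reported score can be sketched with a softmax. This assumes the pipeline converts the two class logits to probabilities with a softmax and reports the largest one, which is the usual behavior for a two-label classifier; the logit values below are hypothetical, not taken from the model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
logits = [-3.9, 4.2]                        # hypothetical logits for one sentence
probs = softmax(logits)

# The reported label is the class with the highest probability,
# and the reported score is that probability.
best = max(range(len(labels)), key=lambda i: probs[i])
print(f"label={labels[best]}, score={probs[best]:.4f}")
```

Because the two probabilities sum to 1, a score close to 1 for one label leaves almost no probability for the other, which is why the scores in the output above are all near 0.999.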

### Exercise 3: Compare VADER and Huggingface

In [27]:
# Analyzing and Comparing performance of VADER and Huggingface on the same set of text data
for sentence in sentences:
    hf_result = sentiment_pipeline(sentence)[0]
    vader_result = analyzer.polarity_scores(sentence)
    
    print(f"Sentence: {sentence}")
    print(f"Huggingface Sentiment: {hf_result['label']}, Confidence Score: {hf_result['score']:.4f}")
    print(f"VADER Sentiment: {'Positive' if vader_result['compound'] >= 0.05 else 'Negative' if vader_result['compound'] <= -0.05 else 'Neutral'}, Compound Score: {vader_result['compound']:.4f}\n")

Sentence: I am so happy with the service.
Huggingface Sentiment: POSITIVE, Confidence Score: 0.9999
VADER Sentiment: Positive, Compound Score: 0.6948

Sentence: This movie was a waste of time.
Huggingface Sentiment: NEGATIVE, Confidence Score: 0.9998
VADER Sentiment: Negative, Compound Score: -0.4215

Sentence: It was an okay experience.
Huggingface Sentiment: POSITIVE, Confidence Score: 0.9998
VADER Sentiment: Positive, Compound Score: 0.2263

Sentence: Best purchase I've made in years!
Huggingface Sentiment: POSITIVE, Confidence Score: 0.9997
VADER Sentiment: Positive, Compound Score: 0.6696

Sentence: I don't like this app, it's too slow.
Huggingface Sentiment: NEGATIVE, Confidence Score: 0.9992
VADER Sentiment: Negative, Compound Score: -0.2755



#### 1. How do the results of VADER and Huggingface compare in terms of sentiment classification? 

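VADER is rule-based and relies on a predefined lexicon; it was designed with short, informal texts such as social media posts in mind. Huggingface Transformers models learn from labeled data and score the sentence as a whole, which generally helps with context and complex sentence structures. On the five sentences above, the two methods agree on every label, including "It was an okay experience.", which both call positive. The scores are not directly comparable, though: VADER's compound score measures intensity (0.2263 for the "okay" sentence), while the Huggingface score is a class probability (0.9998 for the same sentence). Disagreements are more likely to appear on nuanced language than on simple, clearly positive or negative sentences.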

#### 2. Which method provides a more accurate prediction for complex sentences (e.g., sentences with sarcasm)?

Transformer models such as the one used here are generally better placed to handle complex sentences, especially those with subtle sentiment shifts, because they score the sentence as a whole. VADER struggles with sarcasm because it mainly relies on word-level sentiment scores and lacks deep contextual understanding. Sarcasm remains hard for both approaches, and this notebook does not test it directly.

Example: For a sarcastic sentence like "Oh great, another Monday!", VADER may classify it as positive due to the word "great," while Huggingface has a better chance of detecting the sarcasm, though it is not guaranteed to.
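A toy lexicon scorer shows why word-level scoring misreads this kind of sentence. This is not VADER: the lexicon `TOY_LEXICON`, its valence values, and the function `toy_lexicon_score` are all made up for illustration, and VADER itself adds rules for negation, punctuation, and intensifiers on top of its lexicon.

```python
# Hypothetical mini-lexicon; the valence values are invented for illustration
TOY_LEXICON = {"great": 3.0, "love": 3.0, "terrible": -2.5, "slow": -1.0}

def toy_lexicon_score(text: str) -> float:
    """Sum the valence of each known word, ignoring context entirely."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)

sarcastic = "Oh great, another Monday!"
print(toy_lexicon_score(sarcastic))  # positive, because only "great" is scored
```

The only scored word is "great", so the sentence comes out positive regardless of the speaker's intent.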

#### 3. Which method is faster? Why might that be the case?

VADER is faster than Huggingface because it is a rule-based approach: scoring a sentence amounts to lexicon lookups plus a small set of predefined rules, with no model inference.
Huggingface Transformers are slower because each prediction is a forward pass through a deep neural network, which requires tokenizing the text and performing many floating-point matrix multiplications. In this notebook the pipeline runs on CPU (see the "Device set to use cpu" log line); a GPU can speed up inference, especially for batches of text. The pipeline also has a one-time cost to download and load the model.

If speed is the priority, VADER is the better choice. If accuracy and contextual understanding are important, Huggingface Transformers are more reliable.
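The speed claim can be checked empirically with a small timing helper. This is a sketch: `mean_seconds_per_call` and `dummy_scorer` are hypothetical names, and the stand-in scorer exists only so the snippet runs without either library installed.

```python
import time

def mean_seconds_per_call(fn, inputs, repeats=3):
    """Return the average wall-clock time of fn(item) over all inputs and repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        for item in inputs:
            fn(item)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))

# Stand-in callable; replace it with a real scorer from the notebook
def dummy_scorer(text):
    return len(text.split())

sample = ["I am so happy with the service.", "This movie was a waste of time."]
print(f"{mean_seconds_per_call(dummy_scorer, sample):.8f} s per sentence")
```

In the notebook, passing `analyzer.polarity_scores` and then `sentiment_pipeline` as `fn` would give a per-sentence time for each method on the same inputs.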

### Exercise 4: Evaluating Sentiment Analysis Performance

In [43]:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [44]:
# Step 1: Create a Test Dataset
data = {
    "Sentence": [
        "I love this new phone!", "I had a terrible experience with customer support.", 
        "The movie was not bad, but not great either.", "Absolutely loved the restaurant!", 
        "The product arrived damaged, very disappointed.", "This is the best purchase I've made!", 
        "I don't think I will buy this again.", "Service was okay, nothing special.",
        "The app crashes frequently, very frustrating.", "I am very satisfied with the service."
    ],
    "True Sentiment": ["positive", "negative", "neutral", "positive", "negative", 
                        "positive", "negative", "neutral", "negative", "positive"]
}
df = pd.DataFrame(data)

In [45]:
# Step 2: Perform Sentiment Analysis
# Initialize Sentiment Analyzers
analyzer = SentimentIntensityAnalyzer()
sentiment_pipeline = pipeline("sentiment-analysis")

vader_predictions = []
huggingface_predictions = []

for sentence in df["Sentence"]:
    hf_result = sentiment_pipeline(sentence)[0]
    vader_result = analyzer.polarity_scores(sentence)
    
    hf_sentiment = "positive" if hf_result["label"] == "POSITIVE" else "negative"
    vader_sentiment = "positive" if vader_result["compound"] >= 0.05 else "negative" if vader_result["compound"] <= -0.05 else "neutral"
    
    vader_predictions.append(vader_sentiment)
    huggingface_predictions.append(hf_sentiment)
    
    print(f"Sentence: {sentence}")
    print(f"Huggingface Sentiment: {hf_sentiment}, Confidence Score: {hf_result['score']:.4f}")
    print(f"VADER Sentiment: {vader_sentiment}, Compound Score: {vader_result['compound']:.4f}\n")

# Store predictions in DataFrame
df["VADER Prediction"] = vader_predictions
df["Huggingface Prediction"] = huggingface_predictions

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


Sentence: I love this new phone!
Huggingface Sentiment: positive, Confidence Score: 0.9998
VADER Sentiment: positive, Compound Score: 0.6696

Sentence: I had a terrible experience with customer support.
Huggingface Sentiment: negative, Confidence Score: 0.9995
VADER Sentiment: negative, Compound Score: -0.1027

Sentence: The movie was not bad, but not great either.
Huggingface Sentiment: negative, Confidence Score: 0.9963
VADER Sentiment: negative, Compound Score: -0.5448

Sentence: Absolutely loved the restaurant!
Huggingface Sentiment: positive, Confidence Score: 0.9999
VADER Sentiment: positive, Compound Score: 0.6689

Sentence: The product arrived damaged, very disappointed.
Huggingface Sentiment: negative, Confidence Score: 0.9998
VADER Sentiment: negative, Compound Score: -0.7425

Sentence: This is the best purchase I've made!
Huggingface Sentiment: positive, Confidence Score: 0.9999
VADER Sentiment: positive, Compound Score: 0.6696

Sentence: I don't think I will buy this again.

In [46]:
# Step 3: Calculate Evaluation Metrics
label_mapping = {"positive": 1, "neutral": 0, "negative": -1}
df["True Sentiment Num"] = df["True Sentiment"].map(label_mapping)
df["VADER Prediction Num"] = df["VADER Prediction"].map(label_mapping)
df["Huggingface Prediction Num"] = df["Huggingface Prediction"].map(label_mapping)

# Evaluate both models
vader_metrics = [
    accuracy_score(df["True Sentiment Num"], df["VADER Prediction Num"]),
    precision_score(df["True Sentiment Num"], df["VADER Prediction Num"], average="macro"),
    recall_score(df["True Sentiment Num"], df["VADER Prediction Num"], average="macro"),
    f1_score(df["True Sentiment Num"], df["VADER Prediction Num"], average="macro")
]

huggingface_metrics = [
    accuracy_score(df["True Sentiment Num"], df["Huggingface Prediction Num"]),
    precision_score(df["True Sentiment Num"], df["Huggingface Prediction Num"], average="macro"),
    recall_score(df["True Sentiment Num"], df["Huggingface Prediction Num"], average="macro"),
    f1_score(df["True Sentiment Num"], df["Huggingface Prediction Num"], average="macro")
]

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


In [47]:
# Print Results
print("VADER Performance:")
print(f"Accuracy: {vader_metrics[0]:.2f}, Precision: {vader_metrics[1]:.2f}, Recall: {vader_metrics[2]:.2f}, F1-Score: {vader_metrics[3]:.2f}\n")

print("Huggingface Performance:")
print(f"Accuracy: {huggingface_metrics[0]:.2f}, Precision: {huggingface_metrics[1]:.2f}, Recall: {huggingface_metrics[2]:.2f}, F1-Score: {huggingface_metrics[3]:.2f}\n")


VADER Performance:
Accuracy: 0.70, Precision: 0.53, Recall: 0.58, F1-Score: 0.56

Huggingface Performance:
Accuracy: 0.80, Precision: 0.56, Recall: 0.67, F1-Score: 0.60



#### 1. How do the models perform in terms of accuracy, precision, recall, and F1-score? 

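Accuracy: Huggingface has higher overall accuracy (0.80 vs. 0.70). With only 10 test sentences, that 0.10 margin is a single sentence, so the sample is too small to call the difference significant.

Precision: Huggingface has slightly better macro-averaged precision (0.56 vs. 0.53). The macro average is the unweighted mean over the positive, neutral, and negative classes. Because the Huggingface predictions are only ever positive or negative, its precision for the neutral class is undefined and counted as 0, which is most likely what the `_warn_prf` warning in the output above refers to.

Recall: Huggingface also has higher macro-averaged recall (0.67 vs. 0.58), meaning that, averaged over the three classes, it recovers a larger share of the sentences that truly belong to each class.

F1-Score: Huggingface has a higher macro-averaged F1-score (0.60 vs. 0.56). F1 is the harmonic mean of precision and recall, so this reflects its small lead on both.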

#### 2. Which model performs better in predicting positive sentiment? Negative sentiment?

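The macro-averaged scores above do not separate the classes, so they cannot answer this question on their own; per-class results are needed.

Positive Sentiment: The Huggingface model can only output positive or negative, so both neutral test sentences are necessarily misclassified. An accuracy of 0.80 therefore means it labeled all eight positive and negative sentences correctly, including all four positive ones. VADER also labels every positive sentence that appears in the (truncated) output correctly, so there is no visible difference between the two on positive sentiment.

Negative Sentiment: By the same reasoning, Huggingface labels all four negative sentences correctly. VADER misses at least one: it rates "I don't think I will buy this again." as neutral. On this small test set, Huggingface is therefore the better of the two on negative sentiment.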
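Per-class recall makes the positive and negative comparison explicit. The sketch below uses the true labels from the test dataset, but the predictions are an assumption rather than the notebook's actual output: they model a binary classifier that gets every positive and negative sentence right and labels both neutral sentences as negative. The helper name `per_class_recall` is hypothetical.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / true instances of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}

# True labels from the Exercise 4 test dataset
y_true = ["positive", "negative", "neutral", "positive", "negative",
          "positive", "negative", "neutral", "negative", "positive"]

# Assumed predictions: a two-class model that is right on every positive and
# negative sentence and calls both neutral sentences "negative"
y_pred = ["negative" if label == "neutral" else label for label in y_true]

recalls = per_class_recall(y_true, y_pred)
macro_recall = sum(recalls.values()) / len(recalls)
print(recalls)
print(f"macro recall = {macro_recall:.2f}")  # 0.67
```

Under this assumption the macro recall comes out at 0.67, consistent with the value reported for Huggingface. With scikit-learn, `classification_report` from `sklearn.metrics` gives a similar per-class breakdown directly from the DataFrame columns.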

#### 3. What might cause discrepancies between the two models' predictions?

Several factors could explain the discrepancies in predictions between VADER and Huggingface:

Model Type: VADER is a rule-based sentiment analysis tool that relies on a lexicon and predefined rules, while the Huggingface pipeline uses a machine learning model (here DistilBERT fine-tuned on SST-2, per the model name in the log) that learns to classify sentiment from labeled examples. This difference in approach leads to differences in how each one interprets a sentence.

Training Data: The Huggingface model's behavior depends on what it was trained on. SST-2 consists of sentences from movie reviews, so the model may transfer imperfectly to product or service feedback. VADER relies on a fixed lexicon and rule set, which cannot adapt to phrasing it does not cover.

Context Sensitivity: Huggingface, being a machine learning model, can be more sensitive to context and subtleties in phrasing. For example, in the sentence "I don't think I will buy this again," VADER labels it as neutral, while Huggingface predicts it as negative. Huggingface captures the implied sentiment here, whereas VADER's rule-based approach finds no strongly negative words to score.

Thresholds for Classification: VADER's compound score is mapped to three classes, with a neutral band between -0.05 and 0.05. The Huggingface model has only two classes and no neutral option, so any sentence that VADER places in the neutral band necessarily receives a different label from Huggingface.

Confidence Scores: Huggingface provides a confidence score (e.g., 0.9998), which is the probability of the predicted class. VADER’s compound score measures sentiment intensity rather than confidence, so the two numbers cannot be compared directly.