# Improving Sentiment Analysis on Amazon Reviews with Pre-trained Models Overview

## Project Description
- <font color='blue'>**Purpose:**</font> This project demonstrates sentiment analysis with transformer models (BERT, RoBERTa, DistilBERT) and the VADER sentiment analyzer from the `vaderSentiment` package.
- <font color='blue'>**Approach:**</font> Sentiment analysis classifies a text as positive, negative, or neutral.
- <font color='blue'>**Example:**</font> Five short example reviews are analyzed with each method.

## Steps Implemented

### Step 1: Install Necessary Libraries
- <font color='green'>**Purpose:**</font> Install `transformers` for models and `vaderSentiment` for VADER.
- <font color='green'>**Command:**</font> `!pip install transformers vaderSentiment pandas`

### Step 2: Import Libraries
- <font color='green'>**Purpose:**</font> Import libraries for data handling and sentiment analysis.
- <font color='green'>**Libraries:**</font> `transformers` for model pipelines, `vaderSentiment` for VADER, `pandas` for data.

### Step 3: Example Reviews
- <font color='green'>**Purpose:**</font> List reviews for sentiment analysis.
- <font color='green'>**Reviews:**</font> Positive, negative, and neutral reviews.

### Step 4: Initialize Tokenizers and Sentiment Analyzers
- <font color='green'>**Purpose:**</font> Setup tokenizers for models and VADER for sentiment.
- <font color='green'>**Tokenizers:**</font> BERT, RoBERTa, DistilBERT.
- <font color='green'>**Analyzer:**</font> VADER sentiment.

### Step 5: Initialize Pipelines for Sentiment Analysis
- <font color='green'>**Purpose:**</font> Setup sentiment analysis pipelines for models.
- <font color='green'>**Models:**</font> BERT, RoBERTa, DistilBERT.

### Step 6: Function to Analyze Sentiment using VADER
- <font color='green'>**Purpose:**</font> Define function for sentiment analysis with VADER.
- <font color='green'>**Analysis:**</font> Label each review positive, negative, or neutral from the VADER compound score.

### Step 7: Process Example Reviews
- <font color='green'>**Purpose:**</font> Analyze sentiment for each review.
- <font color='green'>**Methods:**</font> BERT, RoBERTa, DistilBERT, VADER.

### Step 8: Convert Results to DataFrames
- <font color='green'>**Purpose:**</font> Organize results for analysis.
- <font color='green'>**DataFrames:**</font> BERT, RoBERTa, DistilBERT, VADER results.

### Step 9: Print Results
- <font color='green'>**Purpose:**</font> Display sentiment analysis results.
- <font color='green'>**Outputs:**</font> DataFrames for each model.

### Step 10: Calculate and Print Accuracies
- <font color='green'>**Purpose:**</font> Evaluate accuracy of sentiment predictions.
- <font color='green'>**Metrics:**</font> Compare predictions to true sentiments.

### Step 11: Save DataFrames to CSV
- <font color='green'>**Purpose:**</font> Store sentiment analysis results.
- <font color='green'>**Files:**</font> CSVs for BERT, RoBERTa, DistilBERT, VADER.

## Conclusion
- <font color='blue'>**Summary:**</font> Sentiment analysis with models and VADER.
- <font color='blue'>**Analysis:**</font> Accuracy metrics for model evaluation.
- <font color='blue'>**Files:**</font> Save results for future use.

# Install Necessary Libraries

- <font color='green'>**Purpose:**</font> Install the essential libraries required to implement advanced sentiment analysis models and handle data effectively.
- <font color='green'>**Command:**</font> Execute `!pip install transformers vaderSentiment pandas` to install `transformers` for pretrained NLP models like BERT, RoBERTa, and DistilBERT. Additionally, install `vaderSentiment` for VADER sentiment analysis and `pandas` for efficient data manipulation.
- <font color='green'>**Importance:**</font> These libraries provide the foundational tools for processing, analyzing, and evaluating sentiment in text data. `transformers` enables access to state-of-the-art models, `vaderSentiment` offers a quick lexicon-based approach, and `pandas` facilitates structured data handling.
- <font color='green'>**Advantages:**</font> By installing these libraries, the project gains versatility in sentiment analysis methodologies, supporting both deep learning-based and rule-based approaches for robust analysis.
- <font color='green'>**Setup:**</font> Ensure all dependencies are correctly installed to proceed with setting up sentiment analysis pipelines and conducting thorough evaluations.

In [6]:
# Install necessary libraries

In [7]:
!pip install transformers vaderSentiment pandas
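After the install cell has run, a quick check can confirm that the three packages are visible to the current kernel. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib import metadata

# Distribution names exactly as passed to pip in the install cell.
REQUIRED = ("transformers", "vaderSentiment", "pandas")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any entry reported as `NOT INSTALLED` means the corresponding import in the next step would fail.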



# Import Libraries

- <font color='green'>**Purpose:**</font> Import essential Python libraries to manage data, implement sentiment analysis, and evaluate model performance.
- <font color='green'>**Libraries:**</font> Import `transformers` for model pipelines, `vaderSentiment` for VADER, and `pandas` for structured data handling.
- <font color='green'>**Integration:**</font> These libraries enable seamless integration of advanced NLP models and sentiment analysis tools into Python workflows.
- <font color='green'>**Functionality:**</font> `transformers` facilitates access to pretrained models and pipelines, `vaderSentiment` provides quick sentiment analysis capabilities, and `pandas` supports data organization and analysis.
- <font color='green'>**Versatility:**</font> Importing these libraries ensures the project is equipped to handle various aspects of sentiment analysis efficiently, from model initialization to result interpretation and reporting.

In [8]:
# Import libraries

In [9]:
from transformers import pipeline, AutoTokenizer, BertTokenizer, RobertaTokenizer, DistilBertTokenizer  # Importing pipelines and tokenizers from the Hugging Face transformers library


In [10]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # Importing SentimentIntensityAnalyzer from vaderSentiment library for sentiment analysis


In [11]:
import pandas as pd  # Importing pandas library for data manipulation and analysis

#  Example Reviews

- <font color='green'>**Purpose:**</font> Present a diverse set of example reviews to demonstrate sentiment analysis capabilities.
- <font color='green'>**Content:**</font> Include a mix of positive, negative, and neutral reviews, defined inline in the notebook.
- <font color='green'>**Representation:**</font> These reviews serve as input data for evaluating model performance and comparing sentiment analysis methodologies.
- <font color='green'>**Application:**</font> Use these examples to showcase how models like BERT, RoBERTa, DistilBERT, and tools like VADER interpret sentiment in real-world text data.
- <font color='green'>**Significance:**</font> By analyzing diverse sentiments, the project aims to validate model accuracy across different sentiment classifications and highlight the models' ability to handle nuanced language understanding tasks.

In [12]:
# Example reviews

In [13]:
reviews = [
    "This movie is fantastic!",  # Example positive review
    "I hated the product, it's not worth the money.",  # Example negative review
    "The service was okay, nothing special.",  # Example neutral review
    "The book was mediocre, didn't meet my expectations.",  # Example negative review
    "The hotel stay was excellent, highly recommended!"  # Example positive review
]

#  Initialize Tokenizers and Sentiment Analyzers

- <font color='green'>**Purpose:**</font> Configure tokenizers and sentiment analyzers to preprocess text data and perform sentiment analysis tasks.
- <font color='green'>**Tools:**</font> Utilize `BertTokenizer`, `RobertaTokenizer`, and `DistilBertTokenizer` from `transformers` for tokenizing text specific to BERT, RoBERTa, and DistilBERT models.
- <font color='green'>**Setup:**</font> Initialize `SentimentIntensityAnalyzer` from `vaderSentiment` for lexicon-based sentiment analysis.
- <font color='green'>**Functionality:**</font> Tokenizers convert text into integer token IDs suitable for model input, while VADER provides sentiment scores based on predefined rules and a lexicon.
- <font color='green'>**Implementation:**</font> Each tokenizer matches the vocabulary of its model. The pipelines created in the next step load their own tokenizers, so the standalone tokenizer objects initialized here are not used later in the notebook.
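To make "text into token IDs" concrete, here is a toy sketch of greedy longest-match subword splitting in the style of BERT's WordPiece tokenizer. The vocabulary `TOY_VOCAB` and the helpers `wordpiece` and `encode` are hypothetical illustrations. The real `BertTokenizer` also handles punctuation, special tokens, and a vocabulary of roughly 30,000 entries, and RoBERTa uses a byte-level BPE scheme instead.

```python
# Tiny hypothetical vocabulary; "##" marks a piece that continues a word.
TOY_VOCAB = {"[UNK]": 0, "the": 1, "hotel": 2, "was": 3, "excellent": 4,
             "recommend": 5, "##ed": 6, "high": 7, "##ly": 8}


def wordpiece(word, vocab):
    """Split one lowercase word into subword pieces by greedy longest match."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab=TOY_VOCAB):
    """Return (tokens, ids) for whitespace-separated, lowercased text."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens, [vocab[token] for token in tokens]


print(encode("The hotel was excellent"))
print(encode("highly recommended"))
```

The second call splits each word into a stem and a continuation piece, which is how a fixed vocabulary can still cover words it does not contain whole.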

In [14]:
# Initialize tokenizers and sentiment analyzers

In [15]:
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  # Initialize BERT tokenizer

In [16]:
roberta_tokenizer = RobertaTokenizer.from_pretrained('roberta-base')  # Initialize RoBERTa tokenizer

In [17]:
distilbert_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')  # Initialize DistilBERT tokenizer


In [18]:
vader_analyzer = SentimentIntensityAnalyzer()  # Initialize VADER sentiment analyzer

# Initialize Pipelines for Sentiment Analysis

- <font color='green'>**Purpose:**</font> Set up sentiment analysis pipelines for BERT, RoBERTa, and DistilBERT models to predict sentiment labels from example reviews.
- <font color='green'>**Methodology:**</font> Utilize `pipeline` from `transformers` to create simplified interfaces for sentiment analysis tasks.
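- <font color='green'>**Configuration:**</font> Each pipeline loads a base checkpoint (`bert-base-uncased`, `roberta-base`, `distilbert-base-uncased`). These checkpoints contain no trained classification head, so the head is newly initialized, as the warnings in the cell outputs state.
- <font color='green'>**Execution:**</font> These pipelines handle tokenization, model inference, and label prediction in a single call. Until the models are fine-tuned on a sentiment task, the labels and scores they return are not meaningful.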
- <font color='green'>**Comparison:**</font> Compare the performance of BERT, RoBERTa, and DistilBERT in sentiment classification tasks, highlighting their respective strengths and applications in natural language understanding.

In [19]:
# Initialize pipelines for sentiment analysis

In [20]:
bert_classifier = pipeline("sentiment-analysis", model="bert-base-uncased")  # Initialize BERT sentiment analysis pipeline


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [21]:
roberta_classifier = pipeline("sentiment-analysis", model="roberta-base")  # Initialize RoBERTa sentiment analysis pipeline


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [22]:
distilbert_classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased")  # Initialize DistilBERT sentiment analysis pipeline


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
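The warnings above mean the classification heads are untrained. One way to get usable predictions without training is to load checkpoints that were already fine-tuned for sentiment. The sketch below is not what the notebook ran. The Hub checkpoint names are assumptions that should be verified on the Hugging Face Hub before use, and each one uses its own label scheme (for example `POSITIVE`/`NEGATIVE`, `negative`/`neutral`/`positive`, or star ratings), so labels still need mapping before comparison.

```python
# Assumed Hub checkpoint names; verify each on the Hugging Face Hub before relying on it.
FINE_TUNED_CHECKPOINTS = {
    "distilbert": "distilbert-base-uncased-finetuned-sst-2-english",
    "roberta": "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "bert": "nlptown/bert-base-multilingual-uncased-sentiment",  # multilingual BERT, star ratings
}


def build_classifier(name):
    """Create a sentiment pipeline for one of the checkpoints above (downloads weights on first use)."""
    # Imported lazily so this module can be loaded even where transformers is absent.
    from transformers import pipeline
    return pipeline("sentiment-analysis", model=FINE_TUNED_CHECKPOINTS[name])
```

Calling `build_classifier("distilbert")("This movie is fantastic!")` would return a list with one `{"label": ..., "score": ...}` dict, the same shape the notebook's pipelines produce.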


# Function to Analyze Sentiment using VADER

- <font color='purple'>**Purpose:**</font> Define a function to analyze sentiment using the VADER (Valence Aware Dictionary and sEntiment Reasoner) tool.
- <font color='purple'>**Functionality:**</font> Implement `analyze_sentiment_vader(review)` function utilizing `SentimentIntensityAnalyzer` from `vaderSentiment`.
- <font color='purple'>**Methodology:**</font> Obtain VADER's polarity scores and use the `compound` score, which ranges from -1 (most negative) to +1 (most positive).
- <font color='purple'>**Classification:**</font> Classify sentiment as positive when the compound score is at least 0.05, negative when it is at most -0.05, and neutral otherwise.
- <font color='purple'>**Application:**</font> Apply VADER to analyze sentiment in example reviews and compare results with deep learning models (BERT, RoBERTa, DistilBERT).

In [23]:
# Function to analyze sentiment using VADER

In [26]:
def analyze_sentiment_vader(review):
    scores = vader_analyzer.polarity_scores(review)  # Get sentiment scores using VADER
    if scores['compound'] >= 0.05:
        sentiment = 'positive'  # Assign 'positive' sentiment if compound score is >= 0.05
    elif scores['compound'] <= -0.05:
        sentiment = 'negative'  # Assign 'negative' sentiment if compound score is <= -0.05
    else:
        sentiment = 'neutral'  # Assign 'neutral' sentiment otherwise
    score = scores['compound']  # Use compound score as sentiment score
    return sentiment, score
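The threshold rule can also be separated from the analyzer, which makes it testable without VADER installed and lets the cutoff be adjusted. This is a sketch under that assumption; the function name `classify_compound` and its `threshold` parameter are hypothetical additions, with the default matching the 0.05 cutoff used above.

```python
def classify_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores of the kind VADER returns for short reviews.
for score in (0.5983, -0.7065, -0.0920, 0.0):
    print(score, classify_compound(score))

# A wider neutral band changes how borderline scores are labeled.
print(classify_compound(-0.0920, threshold=0.1))
```

Widening the neutral band is a design choice: it trades fewer false positive/negative labels on mild text for more reviews labeled neutral.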

#  Process Example Reviews

- <font color='purple'>**Purpose:**</font> Perform sentiment analysis on each example review using BERT, RoBERTa, DistilBERT, and VADER.
- <font color='purple'>**Execution:**</font> Iterate through the list of example reviews and apply each sentiment analysis method.
- <font color='purple'>**Results Collection:**</font> Store sentiment analysis results (sentiment labels and scores) for each review methodically.
- <font color='purple'>**Validation:**</font> Validate sentiment analysis predictions against the expected sentiment labels (positive, negative, neutral).
- <font color='purple'>**Evaluation:**</font> Evaluate the performance of each sentiment analysis approach based on accuracy and score interpretation.
- <font color='purple'>**Comparison:**</font> Compare and contrast the effectiveness of BERT, RoBERTa, DistilBERT, and VADER in accurately predicting sentiment across different types of reviews.

In [27]:
# Process example reviews

In [28]:
results_bert = []

In [29]:
results_roberta = []

In [30]:
results_distilbert = []

In [31]:
results_vader = []

In [32]:
for review in reviews:
    # Using BERT
    bert_result = bert_classifier(review)[0]  # Perform sentiment analysis using BERT
    results_bert.append({"review": review, "sentiment": bert_result['label'], "score": bert_result['score']})  # Append BERT sentiment analysis results

    # Using RoBERTa
    roberta_result = roberta_classifier(review)[0]  # Perform sentiment analysis using RoBERTa
    results_roberta.append({"review": review, "sentiment": roberta_result['label'], "score": roberta_result['score']})  # Append RoBERTa sentiment analysis results

    # Using DistilBERT
    distilbert_result = distilbert_classifier(review)[0]  # Perform sentiment analysis using DistilBERT
    results_distilbert.append({"review": review, "sentiment": distilbert_result['label'], "score": distilbert_result['score']})  # Append DistilBERT sentiment analysis results

    # Using VADER
    sentiment_vader, score_vader = analyze_sentiment_vader(review)  # Perform sentiment analysis using VADER
    results_vader.append({"review": review, "sentiment": sentiment_vader, "score": score_vader})  # Append VADER sentiment analysis results
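The loop repeats the same three lines for each transformer pipeline. A small helper can collapse that repetition and, at the same time, translate pipeline-specific labels into the `positive`/`negative`/`neutral` vocabulary used elsewhere in the notebook. The names `normalize_label` and `collect_results` are hypothetical, and the label map shown is an assumption that only applies to a checkpoint emitting `POSITIVE`/`NEGATIVE`; base checkpoints that emit `LABEL_0`/`LABEL_1` have no meaningful mapping.

```python
def normalize_label(raw_label, label_map):
    """Translate a pipeline label via label_map; unknown labels pass through unchanged."""
    mapped = label_map.get(raw_label)
    return mapped if mapped is not None else raw_label


def collect_results(classifier, reviews, label_map=None):
    """Run classifier on each review and return rows shaped like the notebook's result dicts."""
    label_map = label_map or {}
    rows = []
    for review in reviews:
        result = classifier(review)[0]  # pipelines return a list with one dict per input
        rows.append({
            "review": review,
            "sentiment": normalize_label(result["label"], label_map),
            "score": result["score"],
        })
    return rows


# Stand-in classifier so the sketch runs without downloading a model.
def fake_classifier(review):
    label = "POSITIVE" if "fantastic" in review else "NEGATIVE"
    return [{"label": label, "score": 0.9}]


SST2_STYLE_MAP = {"POSITIVE": "positive", "NEGATIVE": "negative"}  # assumed label scheme
print(collect_results(fake_classifier, ["This movie is fantastic!", "I hated it."], SST2_STYLE_MAP))
```

Because `collect_results` accepts any callable with the pipeline's return shape, the same function would work for all three transformer pipelines.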


#  Convert Results to DataFrames

- <font color='purple'>**Purpose:**</font> Organize sentiment analysis results into structured DataFrames for systematic analysis and visualization.
- <font color='purple'>**Data Management:**</font> Create DataFrames (`df_bert`, `df_roberta`, `df_distilbert`, `df_vader`) using `pandas` to store sentiment labels and scores.
- <font color='purple'>**Format:**</font> Format DataFrames with columns for review text, predicted sentiment, and sentiment scores.
- <font color='purple'>**Accessibility:**</font> Prepare results for easy access, manipulation, and further analysis in subsequent project stages.
- <font color='purple'>**Interpretation:**</font> Facilitate quick interpretation of sentiment analysis outcomes and model performance across different datasets and review types.

In [33]:
# Convert results to DataFrames for easier analysis

In [34]:
df_bert = pd.DataFrame(results_bert)  # Create DataFrame for BERT results

In [35]:
df_roberta = pd.DataFrame(results_roberta)  # Create DataFrame for RoBERTa results

In [36]:
df_distilbert = pd.DataFrame(results_distilbert)  # Create DataFrame for DistilBERT results

In [37]:
df_vader = pd.DataFrame(results_vader)  # Create DataFrame for VADER results

# Print Results

- <font color='purple'>**Purpose:**</font> Display sentiment analysis results for BERT, RoBERTa, DistilBERT, and VADER models.
- <font color='purple'>**Visualization:**</font> Print DataFrames (`df_bert`, `df_roberta`, `df_distilbert`, `df_vader`) containing sentiment analysis outcomes.
- <font color='purple'>**Inspection:**</font> Review sentiment predictions, scores, and their corresponding reviews for each model.
- <font color='purple'>**Transparency:**</font> Provide transparency into model performance and sentiment classification decisions.
- <font color='purple'>**Reporting:**</font> Prepare formatted outputs for project stakeholders and further discussion on sentiment analysis accuracy and methodology.

In [38]:
# Print results

In [39]:
print("BERT Sentiment Analysis Results:")
print(df_bert)  # Print BERT sentiment analysis results

BERT Sentiment Analysis Results:
                                              review sentiment     score
0                           This movie is fantastic!   LABEL_1  0.548781
1     I hated the product, it's not worth the money.   LABEL_1  0.500780
2             The service was okay, nothing special.   LABEL_1  0.513546
3  The book was mediocre, didn't meet my expectat...   LABEL_1  0.523874
4  The hotel stay was excellent, highly recommended!   LABEL_1  0.544635


In [40]:
print("\nRoBERTa Sentiment Analysis Results:")
print(df_roberta)  # Print RoBERTa sentiment analysis results


RoBERTa Sentiment Analysis Results:
                                              review sentiment     score
0                           This movie is fantastic!   LABEL_0  0.517845
1     I hated the product, it's not worth the money.   LABEL_0  0.528540
2             The service was okay, nothing special.   LABEL_0  0.525719
3  The book was mediocre, didn't meet my expectat...   LABEL_0  0.532748
4  The hotel stay was excellent, highly recommended!   LABEL_0  0.518108


In [41]:
print("\nDistilBERT Sentiment Analysis Results:")
print(df_distilbert)  # Print DistilBERT sentiment analysis results


DistilBERT Sentiment Analysis Results:
                                              review sentiment     score
0                           This movie is fantastic!   LABEL_1  0.532594
1     I hated the product, it's not worth the money.   LABEL_1  0.536319
2             The service was okay, nothing special.   LABEL_1  0.543549
3  The book was mediocre, didn't meet my expectat...   LABEL_1  0.553600
4  The hotel stay was excellent, highly recommended!   LABEL_1  0.545007


In [42]:
print("\nVADER Sentiment Analysis Results:")
print(df_vader)  # Print VADER sentiment analysis results


VADER Sentiment Analysis Results:
                                              review sentiment   score
0                           This movie is fantastic!  positive  0.5983
1     I hated the product, it's not worth the money.  negative -0.7065
2             The service was okay, nothing special.  negative -0.0920
3  The book was mediocre, didn't meet my expectat...   neutral  0.0000
4  The hotel stay was excellent, highly recommended!  positive  0.7257


# Calculate and Print Accuracies

- <font color='purple'>**Purpose:**</font> Evaluate the accuracy of sentiment predictions made by BERT, RoBERTa, DistilBERT, and VADER models.
- <font color='purple'>**Metric:**</font> Utilize `calculate_accuracy(predicted_sentiments, true_sentiments)` function to compute accuracy scores.
- <font color='purple'>**Comparison:**</font> Compare accuracy metrics across all models to assess their effectiveness in sentiment classification.
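- <font color='purple'>**Analysis:**</font> Analyze and interpret accuracy results to identify strengths and weaknesses of each approach. The transformer pipelines return `LABEL_0`/`LABEL_1`, which never equal the strings in `true_sentiments`, so their accuracy is 0.00 regardless of what the models predicted; the labels must be mapped to `positive`/`negative`/`neutral` before this comparison is meaningful.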
- <font color='purple'>**Insight:**</font> Gain insights into model performance variations based on dataset characteristics and sentiment classification challenges.

In [43]:
# Calculate and print accuracies

In [48]:
def calculate_accuracy(predicted_sentiments, true_sentiments):
    """
    Calculate accuracy of sentiment predictions compared to true sentiments.

    Args:
    - predicted_sentiments (list): List of predicted sentiments.
    - true_sentiments (list): List of true sentiments.

    Returns:
    - accuracy (float): Accuracy score, calculated as the ratio of correct predictions to total predictions.
    """
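    if len(predicted_sentiments) != len(true_sentiments) or len(true_sentiments) == 0:
        raise ValueError("Inputs must be non-empty and have the same length")  # zip() would silently truncate
    correct = sum(pred == true for pred, true in zip(predicted_sentiments, true_sentiments))  # Count correct predictions
    total = len(predicted_sentiments)  # Total number of predictions
    accuracy = correct / total  # Calculate accuracy
    return accuracy  # Return accuracy score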

In [45]:
# Example true sentiments (for demonstration)

In [49]:
true_sentiments = ['positive', 'negative', 'neutral', 'negative', 'positive']  # Example list of true sentiments for evaluation


In [50]:
# Calculate accuracies

In [51]:
accuracy_bert = calculate_accuracy(df_bert['sentiment'], true_sentiments)  # Calculate accuracy of BERT predictions

In [52]:
accuracy_roberta = calculate_accuracy(df_roberta['sentiment'], true_sentiments)  # Calculate accuracy of RoBERTa predictions


In [53]:
accuracy_distilbert = calculate_accuracy(df_distilbert['sentiment'], true_sentiments)  # Calculate accuracy of DistilBERT predictions


In [54]:
accuracy_vader = calculate_accuracy(df_vader['sentiment'], true_sentiments)  # Calculate accuracy of VADER predictions


In [55]:
# Print accuracies

In [57]:
print(f"\nAccuracy of BERT: {accuracy_bert:.2f}")  # Print accuracy of BERT predictions
print(f"Accuracy of RoBERTa: {accuracy_roberta:.2f}")  # Print accuracy of RoBERTa predictions
print(f"Accuracy of DistilBERT: {accuracy_distilbert:.2f}")  # Print accuracy of DistilBERT predictions
print(f"Accuracy of VADER: {accuracy_vader:.2f}")  # Print accuracy of VADER predictions


Accuracy of BERT: 0.00
Accuracy of RoBERTa: 0.00
Accuracy of DistilBERT: 0.00
Accuracy of VADER: 0.60
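Accuracy alone hides which classes a method gets wrong. As a sketch, per-class precision, recall, and F1 can be computed in plain Python. The function `per_class_metrics` is a hypothetical addition, and the example inputs are the VADER predictions and reference labels shown earlier in this notebook.

```python
def per_class_metrics(predicted, true, labels=("positive", "negative", "neutral")):
    """Return {label: {"precision", "recall", "f1"}} using one-vs-rest counts."""
    metrics = {}
    pairs = list(zip(predicted, true))
    for label in labels:
        tp = sum(p == label and t == label for p, t in pairs)
        fp = sum(p == label and t != label for p, t in pairs)
        fn = sum(p != label and t == label for p, t in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


vader_predictions = ["positive", "negative", "negative", "neutral", "positive"]
reference_labels = ["positive", "negative", "neutral", "negative", "positive"]

for label, scores in per_class_metrics(vader_predictions, reference_labels).items():
    print(label, scores)
```

On these five reviews the positive class is perfect, the negative class scores 0.5 on all three metrics, and the neutral class scores 0.0, which shows that VADER's errors are confined to the two borderline reviews.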


# Save DataFrames to CSV

- <font color='green'>**Purpose:**</font> Export sentiment analysis results as CSV files for future reference and external sharing.
- <font color='green'>**Storage:**</font> Save DataFrames (`df_bert`, `df_roberta`, `df_distilbert`, `df_vader`) to CSV files (`bert_sentiment_analysis_results.csv`, `roberta_sentiment_analysis_results.csv`, `distilbert_sentiment_analysis_results.csv`, `vader_sentiment_analysis_results.csv`).
- <font color='green'>**Accessibility:**</font> Ensure data accessibility and preservation for subsequent analysis, reporting, and model refinement.
- <font color='green'>**Documentation:**</font> Document CSV files with appropriate headers and data structures to facilitate easy retrieval and interpretation of sentiment analysis outcomes.

In [58]:
# Save DataFrames to CSV

In [59]:
df_bert.to_csv('bert_sentiment_analysis_results.csv', index=False)  # Save BERT results to CSV file

In [60]:
df_roberta.to_csv('roberta_sentiment_analysis_results.csv', index=False)  # Save RoBERTa results to CSV file

In [61]:
df_distilbert.to_csv('distilbert_sentiment_analysis_results.csv', index=False)  # Save DistilBERT results to CSV file


In [62]:
df_vader.to_csv('vader_sentiment_analysis_results.csv', index=False)  # Save VADER results to CSV file

In [63]:
print("\nCSV files saved successfully.")  # Print message confirming successful saving of CSV files


CSV files saved successfully.
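Four separate files make side-by-side comparison awkward. One alternative is a single file with an extra `model` column. The sketch below uses the standard-library `csv` module rather than pandas so that it runs without extra dependencies; the function `write_combined_csv` and the `model` column are hypothetical additions, and the sample row is a VADER result from this notebook.

```python
import csv
import io


def write_combined_csv(results_by_model, handle):
    """Write rows from several models to one CSV, tagging each row with its model name."""
    writer = csv.DictWriter(handle, fieldnames=["model", "review", "sentiment", "score"])
    writer.writeheader()
    for model_name, rows in results_by_model.items():
        for row in rows:
            writer.writerow({"model": model_name, **row})


# Write to an in-memory buffer here; pass an open file handle to write to disk instead.
buffer = io.StringIO()
write_combined_csv(
    {"vader": [{"review": "This movie is fantastic!", "sentiment": "positive", "score": 0.5983}]},
    buffer,
)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.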


# <font color='purple'>  Concluding Notes

## <font color='blue'>Project Summary
This project has explored the implementation of advanced sentiment analysis techniques using state-of-the-art deep learning models (BERT, RoBERTa, DistilBERT) and the VADER sentiment analyzer. Sentiment analysis plays a crucial role in understanding the subjective information conveyed in textual data, which is essential for various applications such as customer feedback analysis, social media monitoring, and market sentiment analysis.

## <font color='green'>Key Highlights
- **Model Performance:** The BERT, RoBERTa, and DistilBERT pipelines were loaded from base checkpoints without a trained classification head. Each assigned the same label to all five reviews with scores close to 0.5, and each scored 0.00 accuracy against the reference labels, in part because `LABEL_0`/`LABEL_1` never match the strings `positive`, `negative`, and `neutral`.
- **VADER Sentiment Analyzer:** VADER, a simpler lexicon- and rule-based method, classified three of the five example reviews correctly (accuracy 0.60) without any training, which makes it a useful baseline for short texts.
- **Accuracy and Evaluation:** Evaluation used accuracy only, on five example reviews. No model was fine-tuned in this notebook; fine-tuning on the Amazon Reviews dataset and reporting precision, recall, and F1 score remain open steps.

## <font color='orange'>Practical Applications
- **Business Insights:** By accurately predicting sentiment, businesses can gain valuable insights into customer satisfaction levels, product performance, and areas needing improvement.
- **Market Sentiment Analysis:** Understanding sentiment in financial news and social media can help investors gauge market trends and sentiment shifts.
- **Social Media Monitoring:** Analyzing sentiment in social media posts helps brands track public perception, respond to customer feedback, and manage online reputation.

## <font color='red'>Future Directions
- **Model Enhancements:** Continuous improvement and fine-tuning of models using larger datasets and domain-specific tuning can further enhance accuracy and applicability.
- **Multilingual Sentiment Analysis:** Expanding models to handle multilingual sentiment analysis can cater to diverse global markets and non-English textual data.
- **Integration with Other NLP Tasks:** Integrating sentiment analysis with other NLP tasks like aspect-based sentiment analysis and named entity recognition can provide deeper insights into text data.

## <font color='blue'>Conclusion
In conclusion, transformer models such as BERT, RoBERTa, and DistilBERT, alongside tools like VADER, can support sentiment analysis across various domains and datasets, but the transformer models need a classification head fine-tuned for sentiment before their predictions are usable. In this notebook, the rule-based VADER baseline was the only method that produced meaningful labels out of the box. As NLP continues to evolve, integrating advanced techniques into real-world applications will be pivotal in harnessing the power of textual data.

</font>