## 4. Sentiment Analysis Model Selection
### Choose a Model
- Select a sentiment analysis model or library such as VADER, TextBlob, or transformers (BERT, RoBERTa).

### Model Training (if applicable)
- If using a custom model, split the data into training and testing sets.
- Train the model on labeled data (if available) or use pre-trained models for sentiment classification.
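
The train/test split mentioned above can be sketched with the standard library alone. This is a minimal illustration, assuming the reviews are available as `(text, label)` pairs; the function name `stratified_split`, the 80/20 ratio, and the toy data are assumptions made for this example and are not part of the original notebook. In practice, scikit-learn's `train_test_split` with its `stratify` argument does the same job.

```python
import random
from collections import defaultdict

def stratified_split(samples, test_ratio=0.2, seed=42):
    """Split (text, label) pairs so each label keeps a similar share in both sets."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Keep at least one test example per label when the group has 2+ items.
        n_test = max(1, round(len(group) * test_ratio)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Toy data (hypothetical, not taken from the review dataset).
samples = [(f"good review {i}", "positive") for i in range(8)] + \
          [(f"bad review {i}", "negative") for i in range(2)]
train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting per label matters here because the review data is heavily skewed toward positive comments; a plain random split could leave the test set with no negative examples at all.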

# Process

## 1. Model Options
There are several models and libraries available for sentiment analysis. In this example, we will consider the following options:
- **TextBlob**: A simple rule-based library that provides a quick way to perform sentiment analysis.
- **VADER (Valence Aware Dictionary and sEntiment Reasoner)**: A rule-based model specifically attuned to sentiments expressed in social media.
- **Transformers (e.g., BERT, RoBERTa)**: Advanced machine learning models that can be fine-tuned for sentiment analysis.

## 2. Model Training (if applicable)
For advanced models such as transformers, training means fine-tuning a pre-trained model on a labeled dataset. Here we skip fine-tuning and use a checkpoint that has already been fine-tuned for sentiment classification, loaded through the Hugging Face `transformers` library.

## 3. Model Evaluation
Evaluate the performance of the chosen model using metrics like accuracy, precision, recall, and F1-score.
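
To make these metrics concrete, the sketch below computes accuracy and per-class precision, recall, and F1 from scratch. The helper names `per_class_metrics` and `accuracy` and the toy labels are assumptions for illustration only; scikit-learn's `classification_report` produces the same quantities.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted
    fp = sum(1 for t, p in pairs if t != label and p == label)  # predicted but wrong
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels (hypothetical, not taken from the review dataset).
y_true = ["positive", "positive", "positive", "negative", "neutral", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "neutral", "positive"]

print("accuracy:", accuracy(y_true, y_pred))
for label in ("negative", "neutral", "positive"):
    print(label, per_class_metrics(y_true, y_pred, label))
```

Per-class scores matter more than overall accuracy when one class dominates, because a model can score high accuracy while missing most of the minority class.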

## 4. Model Selection
Select the best-performing model based on the evaluation metrics.

### Example Code for Sentiment Model Selection

#### Using TextBlob for Sentiment Analysis

In [1]:
import pandas as pd
from textblob import TextBlob

def get_sentiment_textblob(text):
    """Label text by the sign of TextBlob's polarity score (range -1.0 to 1.0)."""
    polarity = TextBlob(text).sentiment.polarity  # compute once, reuse below
    if polarity > 0:
        return 'positive'
    elif polarity < 0:
        return 'negative'
    else:
        return 'neutral'

# Load the preprocessed CSV file
file_path_eda = '2.1_preprocessed_reviews.csv'
df_preprocessed = pd.read_csv(file_path_eda)

# Apply TextBlob sentiment analysis
df_preprocessed['Sentiment_TextBlob'] = df_preprocessed['Comment'].apply(get_sentiment_textblob)

# Display the first few rows with the new sentiment column
df_preprocessed[['Username', 'Rating', 'Date', 'Comment', 'Sentiment_TextBlob']].head()

|   | Username | Rating | Date | Comment | Sentiment_TextBlob |
|---|---|---|---|---|---|
| 0 | A***. | 5 | 1 day ago | highly responsive accurate sensor great feelin... | positive |
| 1 | Tiar Y. | 5 | 14 Mar 2024 | actually quite hesitate whrn buy mouse cu craz... | positive |
| 2 | Metheldis R. | 5 | 31 Jan 2024 | cheapest price itemsuperbly fast drop delivery... | positive |
| 3 | Jeff T. | 5 | 10 Mar 2021 | skeptical buy store zero review product decide... | positive |
| 4 | D***. | 5 | 18 Sep 2021 | fadt delivery well packaging box still intact ... | positive |
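
One practical caveat: pandas reads empty CSV cells as `NaN` (a float), and `TextBlob` raises a `TypeError` when given anything other than a string. The sketch below shows one way to guard against that before calling a sentiment function. The helper name `to_text` is hypothetical and not part of the original notebook.

```python
import math

def to_text(value):
    """Return a string that is safe to pass to a sentiment function.

    Missing values (None or float NaN) become an empty string.
    """
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)

# Values as they might appear in a 'Comment' column after read_csv.
for raw in ("great mouse", float("nan"), None):
    print(repr(to_text(raw)))
```

With a DataFrame, the same effect can be had by calling `.fillna('')` on the `Comment` column before `.apply(...)`.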


#### Using VADER for Sentiment Analysis

In [2]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
nltk.download('vader_lexicon')

sid = SentimentIntensityAnalyzer()

def get_sentiment_vader(text):
    """Label text by the sign of VADER's compound score (range -1 to 1)."""
    compound = sid.polarity_scores(text)['compound']
    if compound > 0:
        return 'positive'
    elif compound < 0:
        return 'negative'
    else:
        return 'neutral'

# Apply VADER sentiment analysis
df_preprocessed['Sentiment_VADER'] = df_preprocessed['Comment'].apply(get_sentiment_vader)

# Display the first few rows with the new sentiment column
df_preprocessed[['Username', 'Rating', 'Date', 'Comment', 'Sentiment_VADER']].head()


|   | Username | Rating | Date | Comment | Sentiment_VADER |
|---|---|---|---|---|---|
| 0 | A***. | 5 | 1 day ago | highly responsive accurate sensor great feelin... | positive |
| 1 | Tiar Y. | 5 | 14 Mar 2024 | actually quite hesitate whrn buy mouse cu craz... | negative |
| 2 | Metheldis R. | 5 | 31 Jan 2024 | cheapest price itemsuperbly fast drop delivery... | positive |
| 3 | Jeff T. | 5 | 10 Mar 2021 | skeptical buy store zero review product decide... | positive |
| 4 | D***. | 5 | 18 Sep 2021 | fadt delivery well packaging box still intact ... | positive |
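
The VADER cell above treats any nonzero compound score as positive or negative. The VADER project's documentation suggests a neutral band instead: a compound score of 0.05 or higher counts as positive, -0.05 or lower as negative, and anything in between as neutral. The sketch below applies that rule to a bare score, so it runs without NLTK; the helper name `label_from_compound` is an assumption for illustration.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a label using a neutral band."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Example scores (illustrative values, not computed from the review data).
for score in (0.6249, 0.02, -0.4404):
    print(score, label_from_compound(score))
```

With the real analyzer, the input would be `sid.polarity_scores(text)['compound']`. Switching to this rule would change some of the labels shown in the table above, so the evaluation would need to be rerun.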


#### Using Transformers for Sentiment Analysis

In [3]:
from transformers import pipeline

# Load pre-trained sentiment-analysis pipeline with a specified model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
sentiment_pipeline = pipeline('sentiment-analysis', model=model_name)

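def get_sentiment_transformers(text):
    # truncation=True keeps long reviews within the model's maximum input length.
    result = sentiment_pipeline(text, truncation=True)[0]
    # This SST-2 checkpoint returns only 'POSITIVE' or 'NEGATIVE'; it has no neutral class.
    return result['label'].lower()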

# Apply Transformers sentiment analysis
df_preprocessed['Sentiment_Transformers'] = df_preprocessed['Comment'].apply(get_sentiment_transformers)

# Display the first few rows with the new sentiment column
df_preprocessed[['Username', 'Rating', 'Date', 'Comment', 'Sentiment_Transformers']].head()





|   | Username | Rating | Date | Comment | Sentiment_Transformers |
|---|---|---|---|---|---|
| 0 | A***. | 5 | 1 day ago | highly responsive accurate sensor great feelin... | positive |
| 1 | Tiar Y. | 5 | 14 Mar 2024 | actually quite hesitate whrn buy mouse cu craz... | negative |
| 2 | Metheldis R. | 5 | 31 Jan 2024 | cheapest price itemsuperbly fast drop delivery... | negative |
| 3 | Jeff T. | 5 | 10 Mar 2021 | skeptical buy store zero review product decide... | positive |
| 4 | D***. | 5 | 18 Sep 2021 | fadt delivery well packaging box still intact ... | positive |
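
The SST-2 checkpoint used above returns only `POSITIVE` or `NEGATIVE` along with a confidence score, so it can never produce the `neutral` label that TextBlob and VADER emit. One possible workaround, which is an assumption of this sketch and not part of the original analysis, is to treat low-confidence predictions as neutral. The helper name `to_three_way` and the 0.75 cutoff are hypothetical; the input dictionaries mimic the `{'label': ..., 'score': ...}` shape that the pipeline returns.

```python
def to_three_way(result, min_confidence=0.75):
    """Convert a binary pipeline result to positive/negative/neutral.

    Predictions whose score falls below min_confidence are labeled neutral.
    The cutoff is an arbitrary illustrative value and should be tuned.
    """
    if result["score"] < min_confidence:
        return "neutral"
    return result["label"].lower()

# Hand-written results in the pipeline's output shape (not real model output).
examples = [
    {"label": "POSITIVE", "score": 0.998},
    {"label": "NEGATIVE", "score": 0.61},
    {"label": "NEGATIVE", "score": 0.93},
]
print([to_three_way(r) for r in examples])
```

A low confidence score is only a rough proxy for neutrality; a model trained with an explicit neutral class would be the more principled choice.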


#### Model Evaluation
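No hand-labeled ground truth is available for these reviews, so the cell below uses TextBlob's labels as a pseudo-ground truth. The reported accuracy, precision, recall, and F1-score therefore measure how closely VADER and the Transformers model agree with TextBlob, not how correct any of the models is.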

In [4]:
from sklearn.metrics import classification_report

# Assuming TextBlob results as pseudo-ground truth
true_labels = df_preprocessed['Sentiment_TextBlob']  # Using TextBlob as the baseline

# Evaluation for TextBlob (should be perfect as it's compared with itself)
print("TextBlob Sentiment Analysis Evaluation")
print(classification_report(true_labels, df_preprocessed['Sentiment_TextBlob']))

# Evaluation for VADER
print("VADER Sentiment Analysis Evaluation")
print(classification_report(true_labels, df_preprocessed['Sentiment_VADER']))

# Evaluation for Transformers
print("Transformers Sentiment Analysis Evaluation")
print(classification_report(true_labels, df_preprocessed['Sentiment_Transformers']))

TextBlob Sentiment Analysis Evaluation
              precision    recall  f1-score   support

    negative       1.00      1.00      1.00        16
     neutral       1.00      1.00      1.00        52
    positive       1.00      1.00      1.00       333

    accuracy                           1.00       401
   macro avg       1.00      1.00      1.00       401
weighted avg       1.00      1.00      1.00       401

VADER Sentiment Analysis Evaluation
              precision    recall  f1-score   support

    negative       0.35      0.44      0.39        16
     neutral       0.66      0.85      0.74        52
    positive       0.95      0.89      0.92       333

    accuracy                           0.87       401
   macro avg       0.65      0.73      0.68       401
weighted avg       0.89      0.87      0.88       401

Transformers Sentiment Analysis Evaluation
              precision    recall  f1-score   support

    negative       0.15      0.94      0.25        16
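     neutral       [remaining output truncated in the source]

Note: this run also triggered scikit-learn's `UndefinedMetricWarning` (the source shows three `_warn_prf` traces). The SST-2 DistilBERT model only predicts `positive` or `negative`, so no review is ever predicted `neutral` and precision for that class is undefined. Passing `zero_division=0` to `classification_report` silences the warning.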


#### Model Selection
Based on the evaluation metrics, choose the model that performs the best.
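
Because the evaluation above compares each model with TextBlob, it cannot rank TextBlob itself. A possible alternative reference is the star rating each reviewer gave. The sketch below assumes a mapping of 4 to 5 stars as positive, 3 as neutral, and 1 to 2 as negative; that mapping, the helper names, the placeholder model names, and the toy data are all assumptions for illustration, not results from the notebook.

```python
def rating_to_label(rating):
    """Assumed mapping: 4-5 stars positive, 3 neutral, 1-2 negative."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

def agreement(reference, predicted):
    """Fraction of predictions that match the reference labels."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)

# Toy ratings and predictions (hypothetical; model names are placeholders).
ratings = [5, 5, 4, 3, 1, 2]
reference = [rating_to_label(r) for r in ratings]
predictions = {
    "model_a": ["positive", "positive", "positive", "positive", "negative", "positive"],
    "model_b": ["positive", "negative", "positive", "neutral", "negative", "negative"],
    "model_c": ["positive", "positive", "negative", "negative", "negative", "negative"],
}

scores = {name: agreement(reference, preds) for name, preds in predictions.items()}
best = max(scores, key=scores.get)  # model with the highest agreement
print(scores)
print("selected:", best)
```

Star ratings are themselves only a proxy, since a 5-star review can contain negative remarks. With classes as imbalanced as in this dataset, macro-averaged F1 is usually a safer selection criterion than plain agreement.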

## Conclusion
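After evaluating the candidate models, we select the one with the highest accuracy and F1-score for sentiment analysis of customer reviews of the Logitech G502 Hero High Performance Gaming Mouse. Because the scores above are measured against TextBlob's labels and not against human annotations, they indicate agreement with TextBlob and should be read with that limitation in mind.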

### Next Steps
Once the best model is selected, you can use it to analyze the sentiments of new customer reviews and derive meaningful insights.
