# E-commerce Product Review Sentiment Analyzer

### Introduction

This project develops a sentiment analyzer for e-commerce product reviews sourced from AliExpress, focusing on the Electronics category. It covers several aspects of data science: data acquisition, preprocessing, model development, deployment, and pipeline implementation.

### About Company

AliExpress is a globally renowned e-commerce platform that connects consumers with millions of products at competitive prices. AliExpress provides a seamless shopping experience, empowering individuals and businesses to discover, purchase, and sell quality products from trusted sellers across the globe.

### Problem Statement

E-commerce platforms grapple with the challenge of analyzing vast amounts of customer feedback to accurately understand product sentiment. Understanding this sentiment is critical for businesses to make informed decisions regarding product improvements, marketing strategies, and customer satisfaction. However, manually analyzing thousands of product reviews is time-consuming and inefficient. Consequently, an automated sentiment analysis solution is required to effectively process and interpret these reviews.

### Problem Objectives

- Develop an accurate and efficient sentiment analysis model for e-commerce product reviews.

- Extract valuable insights from customer reviews to inform product improvement and marketing strategies.

- Enhance customer satisfaction by providing businesses with a deeper understanding of customer sentiment.

### Project Overview

This project aimed to develop a sentiment analysis model to understand customer feedback on e-commerce products. We focused on classifying reviews as either positive or negative.

`1. Data Acquisition`

We obtained text reviews and star ratings from a database comprising more than 10,000 rows; the extract loaded in this notebook contains 2,192 rows.

`2. Model Development/Deployment`

We trained multiple machine learning models to accurately predict the sentiment expressed in reviews.

`3. Insights`

We analyzed the model's predictions to identify trends and patterns in customer sentiment.

### Sentiment Analysis Methodology

- Our methodology involved several key steps, including data collection, preprocessing, feature engineering, model training, and evaluation.

  1. Data Acquisition
  2. Data Preprocessing
  3. Feature Engineering
  4. Model Development/Training
  5. Model Evaluation
  6. Model Deployment

### Team Alpha

- This presentation explores the application of sentiment analysis to e-commerce product reviews, revealing valuable insights for businesses.

### Team Members

1. Ifechukwu Akaeze
2. Daniel Edet Onofiok
3. Okediran Tope Emmanuel
4. Dr. Ezeuchu Emmanuel Uzond
5. Modinat Gbemisola Adesope
6. Adebayo Olalekan
7. Khadijat Oludolapo Adebiyi
8. Chinua Mbajekwe
9. Chinwe Njoku
10. Vincent C. Ajaegbu
11. Ayodele Kehinde Richard
12. Precious Odinakachi Loveday
13. Ifeoluwa Adeniyi

# 1. Data Acquisition
 - Import relevant library (pandas)
 - Load Dataset
 - Extract the text reviews (Feedback_translated) and rating columns from the Dataset and Create a new DataFrame
 - Save the new DataFrame to a CSV file for reference

`Import library`

In [3]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

`Load dataset`

In [5]:
ecomm_data = pd.read_csv('Team Alpha dataset - Team Alpha dataset.csv')

In [6]:
ecomm_data.head()

Unnamed: 0,productId,Rating,Date,Feedback_translated,Feedback,Unnamed: 5,Name,Country,Upvotes,Downvotes
0,1005010000000000,100,18-May-24,Very good packaging well protected but not yet...,très bon emballage bien protégé mais pas en...,,a***r,FR,0,0
1,1005010000000000,60,29-May-24,"lights are extremely bright, we used 1.2v batt...","lights are extremely bright, we used 1.2v batt...",,Amazon Shopper,US,0,0
2,1005010000000000,100,25-May-24,I like it very much for my son. It is as the d...,Me gusto mucho para mi hijo. Es tal cual la de...,,R***S,CL,0,0
3,1005010000000000,100,23-Apr-24,"corresponds to the description, fast delivery,...","corresponds to the description, fast delivery,...",,Amazon Shopper,UA,0,0
4,1005010000000000,100,3-May-24,As described. Good quality. Batteries not incl...,As described. Good quality. Batteries not incl...,,J***h,HU,0,0


In [7]:
# Check for columns data count, null count, data types
ecomm_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2192 entries, 0 to 2191
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   productId            2192 non-null   int64  
 1   Rating               2192 non-null   int64  
 2   Date                 2192 non-null   object 
 3   Feedback_translated  1306 non-null   object 
 4   Feedback             1306 non-null   object 
 5   Unnamed: 5           0 non-null      float64
 6   Name                 2192 non-null   object 
 7   Country              2190 non-null   object 
 8   Upvotes              2192 non-null   int64  
 9   Downvotes            2192 non-null   int64  
dtypes: float64(1), int64(4), object(5)
memory usage: 171.4+ KB


In [8]:
# Drop duplicate and irrelevant columns: 'Feedback' is the untranslated original of 'Feedback_translated', and 'Unnamed: 5' contains only NaN values
# ecomm_data = ecomm_data.drop(['Feedback', 'Unnamed: 5'], axis = 1)

`Extract the Feedback_translated and Rating columns from the Dataset and Create a New DataFrame`

In [10]:
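# .copy() makes the extract an independent DataFrame, so the later .loc assignments do not act on a view of ecomm_data
extract_ecomm_data = ecomm_data[['Feedback_translated', 'Rating']].copy()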

In [11]:
# Display the new DataFrame
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating
0,Very good packaging well protected but not yet...,100
1,"lights are extremely bright, we used 1.2v batt...",60
2,I like it very much for my son. It is as the d...,100
3,"corresponds to the description, fast delivery,...",100
4,As described. Good quality. Batteries not incl...,100
...,...,...
2187,Very good,100
2188,,100
2189,"looks fine, not tried yet!.......................",100
2190,,100


`Save the new DataFrame to a CSV file`

In [13]:
# Save as csv file (file name - 'extracted_reviews.csv')
extract_ecomm_data.to_csv('extracted_reviews.csv', index = False)

# 2. Data Preprocessing
- Check/Treat missing data
- Create a Sentiment column based on Rating column and Encode using LabelEncoder (to convert Sentiment column into numerical values for model development)
- Text Cleaning
- Tokenization

`Check/Treat missing data`
- Check for count of rows in column with missing values
- Display rows where the column has missing values (NaN)
- Determine criteria to treat/replace NaN, checking if all the ratings equal 100 or not
- Replace NaN with the determined criteria
- Apply and confirm replacement of missing value

In [16]:
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating
0,Very good packaging well protected but not yet...,100
1,"lights are extremely bright, we used 1.2v batt...",60
2,I like it very much for my son. It is as the d...,100
3,"corresponds to the description, fast delivery,...",100
4,As described. Good quality. Batteries not incl...,100
...,...,...
2187,Very good,100
2188,,100
2189,"looks fine, not tried yet!.......................",100
2190,,100


In [17]:
# Check for count of rows in column with missing values
extract_ecomm_data.isnull().sum()

Feedback_translated    886
Rating                   0
dtype: int64

In [18]:
# Display rows where 'feedback_translated' column has missing values (NaN)
missing_feedback_translated = extract_ecomm_data[extract_ecomm_data['Feedback_translated'].isnull()]
missing_feedback_translated

Unnamed: 0,Feedback_translated,Rating
29,,100
30,,100
31,,100
32,,100
33,,100
...,...,...
2185,,100
2186,,100
2188,,100
2190,,100


In [19]:
# Determine criteria to treat/replace NaN 
# Let's check if all the ratings for the rows where the feedback_translated column is NaN have the value 100 to help determine how to treat the NaN
all_missing_ratings_are_100 = missing_feedback_translated['Rating'].eq(100).all()

print(f"All ratings for missing feedback are 100: {all_missing_ratings_are_100}")

All ratings for missing feedback are 100: False
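
Since the check above returns `False`, it can help to see which ratings actually occur among the rows with missing feedback before choosing a replacement. The following is a minimal sketch on a toy frame; the values are illustrative and not taken from the dataset, and `toy` and `rating_counts` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for extract_ecomm_data (illustrative values only)
toy = pd.DataFrame({
    "Feedback_translated": ["Very good", None, None, "Broken on arrival", None],
    "Rating": [100, 100, 60, 20, 100],
})

# Keep only the rows whose feedback text is missing
missing = toy[toy["Feedback_translated"].isnull()]

# Count how often each rating appears among those rows
rating_counts = missing["Rating"].value_counts().sort_index()
print(rating_counts)
```

If the missing rows cover several rating values, as in this toy case, a neutral placeholder is safer than inferring a sentiment from the empty text.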


In [20]:
# Since not all ratings for the rows with missing feedback equal 100, we replace NaN in the Feedback_translated column with "No Feedback".
# This avoids assumptions that might skew the sentiment interpretation and preserves the integrity of the analysis.

In [21]:
# Replace NaN in 'feedback_translated' column with "No Feedback" using .loc[] to modify the Extracted DataFrame correctly.
extract_ecomm_data.loc[:, 'Feedback_translated'] = extract_ecomm_data['Feedback_translated'].fillna('No Feedback')

In [22]:
# Apply and confirm replacement of missing value
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating
0,Very good packaging well protected but not yet...,100
1,"lights are extremely bright, we used 1.2v batt...",60
2,I like it very much for my son. It is as the d...,100
3,"corresponds to the description, fast delivery,...",100
4,As described. Good quality. Batteries not incl...,100
...,...,...
2187,Very good,100
2188,No Feedback,100
2189,"looks fine, not tried yet!.......................",100
2190,No Feedback,100


`Create a Sentiment column based on Ratings column`

In [24]:
# Get the minimum and maximum ratings
min_rating = extract_ecomm_data['Rating'].min()
max_rating = extract_ecomm_data['Rating'].max()

print(f'Min Rating: {min_rating}')
print(f'Max Rating: {max_rating}')

Min Rating: 20
Max Rating: 100


In [25]:
# Rating Categories:
# Positive Sentiment: Ratings from 75 to 100
# Negative Sentiment: Ratings from 0 to 74

In [26]:
# Function to categorize ratings into sentiments based on specified ranges
def categorize_sentiment(Rating):
    if Rating >= 75:
        return 'Positive'
    else:
        return 'Negative'

# Create the 'sentiment' column based on the 'rating' column
extract_ecomm_data.loc[:, 'Sentiment'] = extract_ecomm_data['Rating'].apply(categorize_sentiment)

# Display the DataFrame with the new 'sentiment' column
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating,Sentiment
0,Very good packaging well protected but not yet...,100,Positive
1,"lights are extremely bright, we used 1.2v batt...",60,Negative
2,I like it very much for my son. It is as the d...,100,Positive
3,"corresponds to the description, fast delivery,...",100,Positive
4,As described. Good quality. Batteries not incl...,100,Positive
...,...,...,...
2187,Very good,100,Positive
2188,No Feedback,100,Positive
2189,"looks fine, not tried yet!.......................",100,Positive
2190,No Feedback,100,Positive


In [27]:
# Encode using LabelEncoder (to convert Sentiment column into numerical values for model development)
from sklearn.preprocessing import LabelEncoder

# Encode Sentiment using labelEncoder (binary encoding for 'Positive'/'Negative')
encoder = LabelEncoder()

# Fit and transform the 'Sentiment' column
extract_ecomm_data.loc[:, 'Sentiment_Encoded'] = encoder.fit_transform(extract_ecomm_data['Sentiment'])

# Check the distribution of sentiments
sentiment_distribution = extract_ecomm_data['Sentiment_Encoded'].value_counts()

print(sentiment_distribution)

# Display the final DataFrame with encoded sentiment
extract_ecomm_data

Sentiment_Encoded
1    2040
0     152
Name: count, dtype: int64


Unnamed: 0,Feedback_translated,Rating,Sentiment,Sentiment_Encoded
0,Very good packaging well protected but not yet...,100,Positive,1
1,"lights are extremely bright, we used 1.2v batt...",60,Negative,0
2,I like it very much for my son. It is as the d...,100,Positive,1
3,"corresponds to the description, fast delivery,...",100,Positive,1
4,As described. Good quality. Batteries not incl...,100,Positive,1
...,...,...,...,...
2187,Very good,100,Positive,1
2188,No Feedback,100,Positive,1
2189,"looks fine, not tried yet!.......................",100,Positive,1
2190,No Feedback,100,Positive,1


*The sentiment distribution shows that most customer feedback is positive: 2,040 positive labels (about 93%) against 152 negative labels (about 7%). This suggests high customer satisfaction with the products, and it also means the two classes are heavily imbalanced.*
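
With a distribution this skewed, it is useful to know how well a trivial model would score before any real model is trained. The sketch below uses only the two counts reported above and computes the accuracy of always predicting the most frequent class; the variable names are hypothetical.

```python
# Class counts taken from the Sentiment_Encoded distribution above
counts = {"Positive": 2040, "Negative": 152}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the most frequent class
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
print(f"Majority-class baseline ({majority_label}): {baseline_accuracy:.2%}")
```

Any model evaluated on this data should be compared against this baseline of roughly 93%, and per-class metrics matter more than overall accuracy.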

`Text Cleaning: *Remove noise, special characters, and irrelevant information*`
- Import relevant libraries and download stopwords from NLTK
- Define the stop words
- Define Function to clean the text
- Apply the cleaning function to the 'Feedback_translated' column
- View the cleaned text

In [30]:
# import relevant libraries and download stopwords from NLTK

import re                                  # re - Regular Expression, useful for text cleaning (e.g., removing special characters).
import nltk                                # nltk - Natural Language Toolkit, a library used for natural language processing tasks like tokenization and stopword removal.
from nltk.corpus import stopwords
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [31]:
# Define the stop words
stop_words = set(stopwords.words('english'))

In [32]:
# Define Function to clean the text
def clean_text(text):
    # Remove special characters and numbers, keeping only letters and spaces
    text = re.sub(r'[^a-zA-Z\s]', '', text)                                              # This regex matches anything that is not a letter or space and replaces it with an empty string
    # Convert text to lowercase to ensure uniformity
    text = text.lower()                                                                  # This helps in reducing the complexity of the analysis by treating 'Word' and 'word' as the same
    # Remove extra spaces between words
    text = re.sub(r'\s+', ' ', text).strip()                                             # This replaces multiple spaces with a single space and trims leading/trailing spaces
    # Remove stop words (common words that may not add significant meaning to the text)
    text = ' '.join([word for word in text.split() if word not in stop_words])           # This creates a list of words, excluding stop words, and joins them back into a string
    return text                                                                          # Return the cleaned text

In [33]:
# Apply the cleaning function to the 'Feedback_translated' column
extract_ecomm_data.loc[:, 'Cleaned_Feedback_translated'] = extract_ecomm_data['Feedback_translated'].apply(clean_text)

In [34]:
# View the cleaned text
extract_ecomm_data['Cleaned_Feedback_translated'].head(5)

0            good packaging well protected yet mounted
1    lights extremely bright used v battery instead...
2        like much son description put batteries thank
3    corresponds description fast delivery well pac...
4    described good quality batteries included fast...
Name: Cleaned_Feedback_translated, dtype: object
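
The cleaned output above shows that stop word removal also drops negation words such as "not" and "but", which can invert the meaning of a review. As a hedged sketch of one possible alternative, the function below keeps negations while removing other stop words. `STOP_WORDS` is a deliberately tiny, hypothetical list used so the example runs without NLTK, and `clean_review` is an illustrative name, not part of the notebook.

```python
import re

# Hypothetical, abbreviated stop word list for illustration only;
# the notebook uses NLTK's full English list instead.
STOP_WORDS = {"the", "is", "and", "it", "does", "not", "no", "but", "very"}
NEGATIONS = {"not", "no", "nor", "never"}

def clean_review(text, keep_negations=True):
    """Apply the same cleaning steps as clean_text, optionally keeping negation words."""
    # Keep letters and whitespace only, then lowercase
    text = re.sub(r"[^a-zA-Z\s]", "", text).lower()
    # Exclude negations from the words to be removed when requested
    to_remove = STOP_WORDS - NEGATIONS if keep_negations else STOP_WORDS
    return " ".join(word for word in text.split() if word not in to_remove)

review = "The light is not bright, and it does not work."
print(clean_review(review, keep_negations=False))  # negations removed
print(clean_review(review, keep_negations=True))   # negations kept
```

Whether keeping negations improves the models on this dataset would need to be measured; the sketch only shows how the text changes.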

`Tokenization: *Split text into tokens*`
- Import relevant libraries and download punkt from NLTK
- Take a sample text to test the tokenization
- Perform word tokenization
- Tokenize each feedback_translated using NLTK's word_tokenize

In [36]:
# Import relevant libraries and download punkt from NLTK

from nltk.tokenize import word_tokenize
nltk.download('punkt')                    # Download the 'punkt' resource, which is necessary for tokenizing text, especially for splitting sentences into words

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [37]:
# Take a sample text to test the tokenization, using the first feedback as an example
sample_text = extract_ecomm_data['Cleaned_Feedback_translated'].iloc[0]
sample_text

'good packaging well protected yet mounted'

In [38]:
# Perform word tokenization
tokens = word_tokenize(sample_text)
tokens

['good', 'packaging', 'well', 'protected', 'yet', 'mounted']

In [39]:
# Tokenize each feedback_translated using NLTK's word_tokenize
extract_ecomm_data.loc[:, 'Feedback_tokens'] = extract_ecomm_data['Cleaned_Feedback_translated'].apply(word_tokenize)

In [40]:
cleaned_ecomm_review = extract_ecomm_data
cleaned_ecomm_review

Unnamed: 0,Feedback_translated,Rating,Sentiment,Sentiment_Encoded,Cleaned_Feedback_translated,Feedback_tokens
0,Very good packaging well protected but not yet...,100,Positive,1,good packaging well protected yet mounted,"[good, packaging, well, protected, yet, mounted]"
1,"lights are extremely bright, we used 1.2v batt...",60,Negative,0,lights extremely bright used v battery instead...,"[lights, extremely, bright, used, v, battery, ..."
2,I like it very much for my son. It is as the d...,100,Positive,1,like much son description put batteries thank,"[like, much, son, description, put, batteries,..."
3,"corresponds to the description, fast delivery,...",100,Positive,1,corresponds description fast delivery well pac...,"[corresponds, description, fast, delivery, wel..."
4,As described. Good quality. Batteries not incl...,100,Positive,1,described good quality batteries included fast...,"[described, good, quality, batteries, included..."
...,...,...,...,...,...,...
2187,Very good,100,Positive,1,good,[good]
2188,No Feedback,100,Positive,1,feedback,[feedback]
2189,"looks fine, not tried yet!.......................",100,Positive,1,looks fine tried yet,"[looks, fine, tried, yet]"
2190,No Feedback,100,Positive,1,feedback,[feedback]


# 3. Feature Engineering
- Bag of Words Vectors (BoW)
- TF-IDF - Term Frequency-Inverse Document Frequency
- Data Splitting

`Bag of Words Vectors (BoW): Convert text into a matrix of token counts using CountVectorizer from sklearn.`
- Import the relevant libraries
- Initialize the CountVectorizer (BoW)
- Apply BoW to the 'Cleaned_Feedback_translated' column
- Check the shape of the BoW feature matrix

In [43]:
# import the relevant libraries
from sklearn.feature_extraction.text import CountVectorizer

In [44]:
# Initialize the CountVectorizer (BoW)
bow_vectorizer = CountVectorizer()

In [45]:
# Apply BoW to the 'Cleaned_Feedback_translated' column
bow_matrix = bow_vectorizer.fit_transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])

In [46]:
# Check the shape of the BoW feature matrix
bow_matrix.shape                               # Prints the dimensions of the transformed BoW matrix

(2192, 2655)
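
The shape above means one row per review (2,192) and one column per distinct token (2,655). To make the matrix layout concrete, here is a small sketch on two made-up cleaned reviews; the texts and the `docs`, `vocab`, and `matrix` names are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up cleaned reviews (illustrative only)
docs = ["good packaging well protected", "good quality good delivery"]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)

# vocabulary_ maps each token to its column index; list tokens in column order
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)

# Each row holds the token counts for one review
print(matrix.toarray())
```

Note that with its default settings `CountVectorizer` only keeps tokens of two or more characters, so a single-letter token such as the "v" left over from "1.2v" does not get a column.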

`TF-IDF - Term Frequency-Inverse Document Frequency: Use TfidfVectorizer to account for word frequency while downweighting common words that appear in many feedback reviews.`
- Import the relevant libraries
- Initialize the TfidfVectorizer
- Apply TF-IDF to the 'Cleaned_Feedback_translated' column
- Check the shape of the TF-IDF feature matrix

In [48]:
# Import the relevant libraries
from sklearn.feature_extraction.text import TfidfVectorizer

In [49]:
# Initialize the TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()

In [50]:
# Apply TF-IDF to the 'Cleaned_Feedback_translated' column
tfidf_matrix = tfidf_vectorizer.fit_transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])

In [51]:
# Check the shape of the TF-IDF feature matrix
tfidf_matrix.shape                                # Prints the dimensions of the TF-IDF matrix

(2192, 2655)
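
The TF-IDF matrix has the same shape as the BoW matrix because it uses the same vocabulary; only the cell values differ. The pure-Python sketch below illustrates the weighting, assuming scikit-learn's documented defaults (`smooth_idf=True`, `norm='l2'`, raw term counts). The toy documents and helper names are hypothetical.

```python
import math
from collections import Counter

# Three made-up tokenized reviews (illustrative only)
docs = [["good", "packaging"], ["good", "quality"], ["bad", "quality"]]
n_docs = len(docs)

# Document frequency: number of documents that contain each term
df = Counter(term for doc in docs for term in set(doc))

def smooth_idf(term):
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1
    return math.log((1 + n_docs) / (1 + df[term])) + 1

def tfidf_vector(doc):
    # Term count times idf, then scale the vector to unit L2 length
    counts = Counter(doc)
    raw = {term: count * smooth_idf(term) for term, count in counts.items()}
    norm = math.sqrt(sum(value * value for value in raw.values()))
    return {term: value / norm for term, value in raw.items()}

weights = tfidf_vector(docs[0])
print(weights)
```

In the first toy review, "packaging" appears in fewer documents than "good", so it receives the larger weight.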

`Data Splitting`
- Split the dataset into training and test sets (Train-Test Split) 

In [53]:
# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split

# BoW Train-Test Split: X = bow_matrix, y = Sentiment_Encoded (target column)
# stratify keeps the class proportions the same in the training and test sets
X_train_bow, X_test_bow, y_train, y_test = train_test_split(
    bow_matrix, cleaned_ecomm_review['Sentiment_Encoded'],
    test_size = 0.2, random_state = 42, stratify = cleaned_ecomm_review['Sentiment_Encoded'])

# TF-IDF Train-Test Split: same random_state and stratify, so y_train and y_test match the BoW split
X_train_tfidf, X_test_tfidf, y_train, y_test = train_test_split(
    tfidf_matrix, cleaned_ecomm_review['Sentiment_Encoded'],
    test_size = 0.2, random_state = 42, stratify = cleaned_ecomm_review['Sentiment_Encoded'])
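
In the cells above, both vectorizers are fitted on all reviews before the split, so the vocabulary and IDF statistics also reflect the test reviews. One common alternative, sketched here under the assumption that a small amount of leakage is worth avoiding, is to split the raw text first and let a scikit-learn pipeline fit the vectorizer on the training portion only. The texts and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up cleaned reviews and labels (1 = Positive, 0 = Negative), illustrative only
texts = ["good quality", "fast delivery", "works well", "great product",
         "good price", "well packaged", "broken arrived", "bad quality",
         "stopped working", "poor packaging"]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# Split the raw text first, so the test reviews stay unseen
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# The pipeline fits the vectorizer on the training text only
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# At prediction time the fitted vectorizer is reused on the test text
predictions = model.predict(X_test)
print(predictions)
```

The same pipeline object can later be saved and deployed as a single unit, since it carries both the fitted vectorizer and the classifier.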

# 4. Model Development
- Pre-trained VADER model
- Custom Models

`Pre-trained VADER Model` - `VADER: Valence Aware Dictionary and sEntiment Reasoner is a pre-trained sentiment analysis tool designed to analyze text for sentiment polarity and intensity, particularly effective for social media and short texts. It uses a lexicon of sentiment related words and rules to score and classify text as positive, negative, or neutral.`
- Import relevant Libraries and download 'vader_lexicon' from NLTK
- Initialize the VADER sentiment analyzer
- Create a Function to calculate sentiment score using VADER
- Apply VADER sentiment analysis to the Cleaned_Feedback_translated column

In [56]:
# Import relevant Libraries and download 'vader_lexicon' from NLTK
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [57]:
# Initialize the VADER sentiment analyzer
vader_analyzer = SentimentIntensityAnalyzer()

In [58]:
# Create a Function to calculate sentiment score using VADER
def vader_sentiment(text):
    sentiment_score = vader_analyzer.polarity_scores(text)
     
    # Threshold on the compound score: at or above it is Positive, anything below is Negative
    threshold_positive = 0.01
    
    # Classify the sentiment as positive or negative based on the compound score
    if sentiment_score['compound'] >= threshold_positive:
        return 'Positive'
    else:
        return 'Negative'

In [59]:
# Apply VADER sentiment analysis to the Cleaned_Feedback_translated column
cleaned_ecomm_review.loc[:, 'vader_sentiment'] = cleaned_ecomm_review['Cleaned_Feedback_translated'].apply(vader_sentiment)

In [60]:
cleaned_ecomm_review

Unnamed: 0,Feedback_translated,Rating,Sentiment,Sentiment_Encoded,Cleaned_Feedback_translated,Feedback_tokens,vader_sentiment
0,Very good packaging well protected but not yet...,100,Positive,1,good packaging well protected yet mounted,"[good, packaging, well, protected, yet, mounted]",Positive
1,"lights are extremely bright, we used 1.2v batt...",60,Negative,0,lights extremely bright used v battery instead...,"[lights, extremely, bright, used, v, battery, ...",Positive
2,I like it very much for my son. It is as the d...,100,Positive,1,like much son description put batteries thank,"[like, much, son, description, put, batteries,...",Positive
3,"corresponds to the description, fast delivery,...",100,Positive,1,corresponds description fast delivery well pac...,"[corresponds, description, fast, delivery, wel...",Positive
4,As described. Good quality. Batteries not incl...,100,Positive,1,described good quality batteries included fast...,"[described, good, quality, batteries, included...",Positive
...,...,...,...,...,...,...,...
2187,Very good,100,Positive,1,good,[good],Positive
2188,No Feedback,100,Positive,1,feedback,[feedback],Negative
2189,"looks fine, not tried yet!.......................",100,Positive,1,looks fine tried yet,"[looks, fine, tried, yet]",Positive
2190,No Feedback,100,Positive,1,feedback,[feedback],Negative
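
The `vader_sentiment` function assigns Negative to every compound score below 0.01, which includes scores at or near zero where VADER found no sentiment signal; the `feedback` placeholder rows in the table end up Negative under this rule. As a hedged sketch, the function below applies a three-way rule using the ±0.05 cut-offs that the VADER project suggests as typical values. `label_from_compound` and its parameter names are hypothetical, and the function takes the compound score directly so it runs without the lexicon.

```python
def label_from_compound(compound, pos_threshold=0.05, neg_threshold=-0.05):
    """Map a VADER compound score in [-1, 1] to a three-way label."""
    if compound >= pos_threshold:
        return "Positive"
    if compound <= neg_threshold:
        return "Negative"
    # Scores between the two thresholds carry no clear polarity
    return "Neutral"

for score in (0.62, 0.0, -0.40):
    print(score, label_from_compound(score))
```

Because the project uses two classes only, Neutral rows would need an explicit policy, for example excluding them from the VADER evaluation.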


`Custom Models`:
`Naive Bayes Model` - preferred for sentiment analysis due to its simplicity, speed, and effectiveness in handling high-dimensional text data, like BoW or TF-IDF features.
- Import relevant Libraries
- Initialize and Train the Naive Bayes model (BoW and TF-IDF)
- Make predictions on the test set (BoW and TF-IDF)
- Transform the cleaned feedback text into numerical feature representations: Bag of Words format using the fitted CountVectorizer and TF-IDF format using the fitted TfidfVectorizer.
- Predict sentiments using the Naive Bayes model for both Bag of Words and TF-IDF features, and update the cleaned_ecomm_review DataFrame with the new sentiment predictions in 'BoW' and 'TF-IDF' columns.

In [62]:
# Import relevant Libraries
from sklearn.naive_bayes import MultinomialNB

In [63]:
# Initialize and Train the Naive Bayes model on BoW
nb_bow = MultinomialNB()
nb_bow.fit(X_train_bow, y_train)

# Make predictions on the test set (BoW)
y_pred_bow = nb_bow.predict(X_test_bow)

In [64]:
# Initialize and Train the Naive Bayes model on TF-IDF
nb_tfidf = MultinomialNB()
nb_tfidf.fit(X_train_tfidf, y_train)

# Make predictions on the test set (TF-IDF)
y_pred_tfidf = nb_tfidf.predict(X_test_tfidf)

In [65]:
# Transform the cleaned feedback translated into numerical feature representations: Bag of Words format using fitted CountVectorizer and TF-IDF format using fitted TfidfVectorizer.
# bow_test = bow_vectorizer.transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])
# tfidf_test = tfidf_vectorizer.transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])

The commented-out code above would transform the cleaned feedback text into Bag of Words and TF-IDF feature matrices using the already fitted vectorizers, so that the trained models could predict a sentiment for every review.

In [67]:
# Predict sentiments using the Naive Bayes model for both Bag of Words and TF-IDF features, and update the DataFrame with the new sentiment predictions in 'BoW' and 'TF-IDF' columns. 
# cleaned_ecomm_review.loc[:, 'BoW'] = nb_bow.predict(bow_test)
# cleaned_ecomm_review.loc[:, 'TF-IDF'] = nb_tfidf.predict(tfidf_test)
# cleaned_ecomm_review

# 5. Model Evaluation
- Assess model performance using accuracy and F1 score metrics.
- Optionally, conduct hyperparameter tuning for improved performance.

#### Assess model performance using accuracy and F1 score metrics.
- Import necessary libraries for evaluation
- Evaluate VADER Model Prediction
- Evaluate Naive Bayes (BoW) Model
- Evaluate Naive Bayes (TF-IDF) Model
- Create a DataFrame to consolidate the evaluation metrics (accuracy and F1 score) for the three models: (VADER, Naive Bayes with BoW, and Naive Bayes with TF-IDF).

In [70]:
# Import necessary libraries for evaluation
from sklearn.metrics import accuracy_score, f1_score, classification_report

In [71]:
# Evaluate VADER Model Prediction
print("Evaluation for Vader:")
print("VADER Classification Report:")
vader_accuracy = (accuracy_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])) * 100              # Calculate accuracy - measures how many predictions were correct.
vader_f1 = (f1_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'], average='weighted')) * 100      # Calculate F1 score - a weighted average of precision and recall, useful in imbalanced datasets.
vader_classification_report = classification_report(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])
print(vader_classification_report)

# Evaluate Naive Bayes (BoW) Model
print("Evaluation for Naive Bayes (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test, y_pred_bow))
bow_accuracy = (accuracy_score(y_test, y_pred_bow)) * 100 
bow_f1 = (f1_score(y_test, y_pred_bow, average='weighted')) * 100

# Evaluate Naive Bayes (TF-IDF) Model
print("Evaluation for Naive Bayes (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test, y_pred_tfidf))
tfidf_accuracy = (accuracy_score(y_test, y_pred_tfidf)) * 100  # Calculate accuracy
tfidf_f1 = (f1_score(y_test, y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# Create a DataFrame to consolidate the evaluation metrics (accuracy and F1 score) for the three models: (VADER, Naive Bayes with BoW, and Naive Bayes with TF-IDF).
# Create a dictionary with the model names and their corresponding accuracy and F1 scores
evaluation_metrics = {
    'Model': ['VADER', 'Naive Bayes (BoW)', 'Naive Bayes (TF-IDF)'],
    'Accuracy': [vader_accuracy, bow_accuracy, tfidf_accuracy],
    'F1 Score': [vader_f1, bow_f1, tfidf_f1]
}

# Convert the dictionary into a DataFrame
evaluation_data = pd.DataFrame(evaluation_metrics)

# Define a function to format the numbers as percentages
def format_percentage(value):
    return f'{value:.2f}%'

# Use apply with the custom function for both columns
evaluation_data['Accuracy'] = evaluation_data['Accuracy'].apply(format_percentage)
evaluation_data['F1 Score'] = evaluation_data['F1 Score'].apply(format_percentage)

# print output
evaluation_data

Evaluation for Vader:
VADER Classification Report:
              precision    recall  f1-score   support

    Negative       0.08      0.59      0.14       152
    Positive       0.94      0.50      0.65      2040

    accuracy                           0.50      2192
   macro avg       0.51      0.54      0.40      2192
weighted avg       0.88      0.50      0.61      2192

Evaluation for Naive Bayes (BoW):
Classification Report (BoW):
              precision    recall  f1-score   support

           0       0.53      0.30      0.38        30
           1       0.95      0.98      0.97       409

    accuracy                           0.93       439
   macro avg       0.74      0.64      0.67       439
weighted avg       0.92      0.93      0.93       439

Evaluation for Naive Bayes (TF-IDF):
Classification Report (TF-IDF):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        30
           1       0.93      1.00      0.96       409


Unnamed: 0,Model,Accuracy,F1 Score
0,VADER,50.27%,61.48%
1,Naive Bayes (BoW),93.39%,92.53%
2,Naive Bayes (TF-IDF),93.17%,89.87%
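
The TF-IDF row shows why accuracy and weighted F1 can look strong while one class is never detected. The sketch below recomputes the metrics from the counts implied by the TF-IDF report (30 Negative and 409 Positive test reviews, all predicted Positive); the helper `f1` and the variable names are hypothetical.

```python
# Test-set supports from the classification reports: 30 Negative, 409 Positive.
# A model that predicts Positive for every review produces these counts:
tp_pos, fp_pos, fn_pos = 409, 30, 0   # Positive class
tp_neg, fp_neg, fn_neg = 0, 0, 30     # Negative class

def f1(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

f1_pos = f1(tp_pos, fp_pos, fn_pos)
f1_neg = f1(tp_neg, fp_neg, fn_neg)
support_pos, support_neg = 409, 30
total = support_pos + support_neg

accuracy = (tp_pos + tp_neg) / total
macro_f1 = (f1_pos + f1_neg) / 2                                            # classes weighted equally
weighted_f1 = (f1_pos * support_pos + f1_neg * support_neg) / total        # classes weighted by support

print(f"accuracy={accuracy:.2%} macro_f1={macro_f1:.2%} weighted_f1={weighted_f1:.2%}")
```

The weighted F1 is dominated by the Positive class, while the macro F1 drops to about 48% because the Negative class contributes zero. Macro F1 and per-class recall are therefore the more informative figures for this dataset.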


#### Summary of Model Evaluations
`The evaluations of the VADER sentiment analysis model and two Naive Bayes models (Bag of Words and TF-IDF) highlight their performance in predicting Negative and Positive sentiments:`

VADER Performance:
- Negative: Precision: 8%, Recall: 59%, F1-Score: 14%.
- Positive: Precision: 94%, Recall: 50%, F1-Score: 65%.
- Overall Accuracy: 50%, Macro F1: 40%, Weighted F1: 61%.
- Conclusion: VADER's Positive predictions are precise (94%), but it recovers only half of the positive reviews, and just 8% of its Negative predictions are correct. At 50% overall accuracy it is the weakest of the three models.
---
Naive Bayes (Bag of Words) Performance:
- Negative: Precision: 53%, Recall: 30%, F1-Score: 38%.
- Positive: Precision: 95%, Recall: 98%, F1-Score: 97%.
- Overall Accuracy: 93%, Macro F1: 67%, Weighted F1: 93%.
- Conclusion: The Bag of Words approach shows high accuracy driven by strong performance in identifying positive sentiments, although it exhibits weaknesses in detecting negative sentiments.
---
Naive Bayes (TF-IDF) Performance:
- Negative: Precision: 0%, Recall: 0%, F1-Score: 0%.
- Positive: Precision: 93%, Recall: 100%, F1-Score: 96%.
- Overall Accuracy: 93%, Macro F1: 48%, Weighted F1: 90%.
- Conclusion: The TF-IDF model predicts Positive for every test review, so it never identifies a negative review. Its 93% accuracy simply reflects the share of positive reviews in the test set (409 of 439).
---
`Given that the Naive Bayes models performed well in predicting Positive sentiment but every model struggled with Negative sentiment, largely because of the class imbalance in the data, we will train other models such as:`
- Logistic Regression
- XGBoost
- Optionally, conduct hyperparameter tuning for improved performance.

In [73]:
# Evaluate VADER Model Prediction
#print("Evaluation for Vader:")
#vader_accuracy = (accuracy_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])) * 100              # Calculate accuracy
#vader_f1 = (f1_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'], average='weighted')) * 100      # Calculate F1 score
#vader_classification_report = classification_report(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])  

#print("VADER Classification Report:")
#print(vader_classification_report)
#print(f"VADER Accuracy: {vader_accuracy:.2f}%")
#print(f"VADER F1 Score: {vader_f1:.2f}%")

In [74]:
# Evaluate Naive Bayes (BoW) Model
#print("Evaluation for Naive Bayes (BoW):")
#bow_accuracy = (accuracy_score(y_test, y_pred_bow)) * 100  # Calculate accuracy
#bow_f1 = (f1_score(y_test, y_pred_bow, average='weighted')) * 100  # Calculate F1 score

#print("Classification Report (BoW):")
#print(classification_report(y_test, y_pred_bow))
#print(f"Accuracy (BoW): {bow_accuracy:.2f}%")
#print(f"F1 Score (BoW): {bow_f1:.2f}%")

In [75]:
# Evaluate Naive Bayes (TF-IDF) Model
#print("Evaluation for Naive Bayes (TF-IDF):")
#tfidf_accuracy = (accuracy_score(y_test, y_pred_tfidf)) * 100  # Calculate accuracy
#tfidf_f1 = (f1_score(y_test, y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

#print("Classification Report (TF-IDF):")
#print(classification_report(y_test, y_pred_tfidf))
#print(f"Accuracy (TF-IDF): {tfidf_accuracy:.2f}%")
#print(f"F1 Score (TF-IDF): {tfidf_f1:.2f}%")

#### Logistic Regression Model.
- Import necessary libraries
- Train the model and make predictions on BoW and TF-IDF
#### XGBoost Model.
- Import necessary libraries
- Train the model and make predictions on BoW and TF-IDF
#### Evaluate All Models.

In [77]:
# Import necessary libraries
from sklearn.linear_model import LogisticRegression
from sklearn.utils import class_weight
import xgboost as xgb

In [78]:
# Initialize, Train, and Predict the Logistic Regression model on BoW
log_bow = LogisticRegression(class_weight='balanced')                    # Initialize
log_bow.fit(X_train_bow, y_train)                                        # Train
log_y_pred_bow = log_bow.predict(X_test_bow)                             # Predict

In [79]:
# Initialize, Train, and Predict the Logistic Regression model on TF-IDF
log_tfidf = LogisticRegression(class_weight='balanced')                  # Initialize
log_tfidf.fit(X_train_tfidf, y_train)                                    # Train
log_y_pred_tfidf = log_tfidf.predict(X_test_tfidf)                       # Predict
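
As a rough illustration of what `class_weight='balanced'` does, the sketch below computes the per-class weights with scikit-learn's `compute_class_weight`. The label array is an assumption: it reuses the 30:409 class proportions reported for the test split, whereas the real weights are derived from `y_train` when the model is fitted.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative labels only: 30 Negative (0) and 409 Positive (1).
# In the notebook the weights would be derived from y_train.
y_example = np.array([0] * 30 + [1] * 409)
classes = np.array([0, 1])

# 'balanced' assigns each class n_samples / (n_classes * count_of_that_class).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_example)
manual = len(y_example) / (len(classes) * np.bincount(y_example))

for label, weight in zip(classes, weights):
    print(f"class {label}: weight {weight:.3f}")
```

With these proportions a misclassified negative review costs roughly 14 times as much as a misclassified positive one, which pushes the model toward better negative recall.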

In [80]:
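# Calculate scale_pos_weight for the XGBoost model
# XGBoost convention: count(negative class) / count(positive class), where label 1 (Positive) is the positive class
# NOTE: 409 and 30 equal the class support of the test split; deriving the counts from y_train avoids relying on test data
positive_count = 409
negative_count = 30
scale_pos_weight = negative_count / positive_count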

# Initialize for XGBoost Model
xgb_bow = xgb.XGBClassifier(scale_pos_weight=scale_pos_weight)
xgb_tfidf = xgb.XGBClassifier(scale_pos_weight=scale_pos_weight)
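
The ratio can also be derived from the training labels instead of being typed in by hand. The helper below is a hypothetical sketch (`compute_scale_pos_weight` is not part of the notebook or of XGBoost), and the label list is a stand-in for `y_train` with the same 30:409 proportions.

```python
import numpy as np

def compute_scale_pos_weight(labels, positive_label=1):
    """Return count(negative) / count(positive), the ratio suggested in the XGBoost docs."""
    labels = np.asarray(labels)
    positive_count = int(np.sum(labels == positive_label))
    negative_count = int(labels.size - positive_count)
    if positive_count == 0:
        raise ValueError("labels contain no positive examples")
    return negative_count / positive_count

# Stand-in for y_train (hypothetical values with a 30:409 class ratio).
y_train_example = [0] * 30 + [1] * 409
ratio = compute_scale_pos_weight(y_train_example)
print(f"scale_pos_weight = {ratio:.4f}")
```

Because Positive (label 1) is the majority class here, the ratio is below 1 and down-weights the positive examples.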

In [81]:
# Train and Predict the XGBoost model on BoW
xgb_bow.fit(X_train_bow, y_train)                                       # Train
xgb_y_pred_bow = xgb_bow.predict(X_test_bow)                            # Predict

In [82]:
# Train and Predict the XGBoost model on TF-IDF
xgb_tfidf.fit(X_train_tfidf, y_train)                                   # Train
xgb_y_pred_tfidf = xgb_tfidf.predict(X_test_tfidf)                      # Predict

In [83]:
# Evaluate the models
# VADER Model Prediction
# Compute the metrics first so the report exists before it is printed
vader_accuracy = (accuracy_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])) * 100             # Calculate accuracy - measures how many predictions were correct.
vader_f1 = (f1_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'], average='weighted')) * 100     # Calculate F1 score - a weighted average of precision and recall, useful in imbalanced datasets.
vader_classification_report = classification_report(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])
print("Evaluation for Vader:")
print("VADER Classification Report:")
print(vader_classification_report)

# Naive Bayes (BoW) Model
print("Evaluation for Naive Bayes (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test, y_pred_bow))
bow_accuracy = (accuracy_score(y_test, y_pred_bow)) * 100 
bow_f1 = (f1_score(y_test, y_pred_bow, average='weighted')) * 100

# Naive Bayes (TF-IDF) Model
print("Evaluation for Naive Bayes (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test, y_pred_tfidf))
tfidf_accuracy = (accuracy_score(y_test, y_pred_tfidf)) * 100  # Calculate accuracy
tfidf_f1 = (f1_score(y_test, y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# Logistic Regression (BoW) Model
print("Evaluation for Logistic Regression (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test, log_y_pred_bow))
log_bow_accuracy = (accuracy_score(y_test, log_y_pred_bow)) * 100 
log_bow_f1 = (f1_score(y_test, log_y_pred_bow, average='weighted')) * 100

# Logistic Regression (TF-IDF) Model
print("Evaluation for Logistic Regression (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test, log_y_pred_tfidf))
log_tfidf_accuracy = (accuracy_score(y_test, log_y_pred_tfidf)) * 100  # Calculate accuracy
log_tfidf_f1 = (f1_score(y_test, log_y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# XGBoost (BoW) Model
print("Evaluation for XGBoost (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test, xgb_y_pred_bow))
xgb_bow_accuracy = (accuracy_score(y_test, xgb_y_pred_bow)) * 100 
xgb_bow_f1 = (f1_score(y_test, xgb_y_pred_bow, average='weighted')) * 100

# XGBoost (TF-IDF) Model
print("Evaluation for XGBoost (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test, xgb_y_pred_tfidf))
xgb_tfidf_accuracy = (accuracy_score(y_test, xgb_y_pred_tfidf)) * 100  # Calculate accuracy
xgb_tfidf_f1 = (f1_score(y_test, xgb_y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# Create a DataFrame to consolidate the evaluation metrics (accuracy and weighted F1 score) for all seven models:
# VADER, Naive Bayes (BoW/TF-IDF), Logistic Regression (BoW/TF-IDF), and XGBoost (BoW/TF-IDF).
# Create a dictionary with the model names and their corresponding accuracy and F1 scores
evaluation_metrics1 = {
    'Model': ['VADER', 'Naive Bayes (BoW)', 'Naive Bayes (TF-IDF)', 'Logistic Regression (BoW)', 'Logistic Regression (TF-IDF)', 'XGBoost (BoW)', 'XGBoost (TF-IDF)'],
    'Accuracy': [vader_accuracy, bow_accuracy, tfidf_accuracy, log_bow_accuracy, log_tfidf_accuracy, xgb_bow_accuracy, xgb_tfidf_accuracy],
    'F1 Score': [vader_f1, bow_f1, tfidf_f1, log_bow_f1, log_tfidf_f1, xgb_bow_f1, xgb_tfidf_f1]
}

# Convert the dictionary into a DataFrame
evaluation_data1 = pd.DataFrame(evaluation_metrics1)

# Define a function to format the numbers as percentages
def format_percentage(value):
    return f'{value:.2f}%'

# Use apply with the custom function for both columns
evaluation_data1['Accuracy'] = evaluation_data1['Accuracy'].apply(format_percentage)
evaluation_data1['F1 Score'] = evaluation_data1['F1 Score'].apply(format_percentage)

# print output
evaluation_data1

Evaluation for Vader:
VADER Classification Report:
              precision    recall  f1-score   support

    Negative       0.08      0.59      0.14       152
    Positive       0.94      0.50      0.65      2040

    accuracy                           0.50      2192
   macro avg       0.51      0.54      0.40      2192
weighted avg       0.88      0.50      0.61      2192

Evaluation for Naive Bayes (BoW):
Classification Report (BoW):
              precision    recall  f1-score   support

           0       0.53      0.30      0.38        30
           1       0.95      0.98      0.97       409

    accuracy                           0.93       439
   macro avg       0.74      0.64      0.67       439
weighted avg       0.92      0.93      0.93       439

Evaluation for Naive Bayes (TF-IDF):
Classification Report (TF-IDF):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        30
           1       0.93      1.00      0.96       409


| | Model | Accuracy | F1 Score (weighted) |
|---|---|---|---|
| 0 | VADER | 50.27% | 61.48% |
| 1 | Naive Bayes (BoW) | 93.39% | 92.53% |
| 2 | Naive Bayes (TF-IDF) | 93.17% | 89.87% |
| 3 | Logistic Regression (BoW) | 92.94% | 92.99% |
| 4 | Logistic Regression (TF-IDF) | 91.80% | 92.58% |
| 5 | XGBoost (BoW) | 82.46% | 86.17% |
| 6 | XGBoost (TF-IDF) | 83.14% | 86.64% |


#### Summary of All Models Evaluation:
1. `VADER` struggles heavily with negative sentiment classification. It has a very low F1-score (14%) for negative sentiment, although it performs decently with positive sentiment (65%). This results in a 50% accuracy, showing it is not reliable for balanced sentiment analysis.

2. `Naive Bayes (BoW)` performs well for positive sentiment with an F1-score of 97%, but struggles with negative sentiment detection, reflected in its lower recall (30%) and F1-score (38%) for negatives. Despite this, it achieves a strong overall accuracy of 93%.

3. `Naive Bayes (TF-IDF)` has an alarming weakness with negative sentiment detection, with 0% precision, recall, and F1-score for negatives, indicating it cannot identify any negative sentiments. Its strong performance in positive sentiment detection, however, leads to a 93% accuracy.

4. `Logistic Regression (BoW)` is more balanced across the two sentiment classes. It achieves an F1-score of 49% for negatives and 96% for positives with 93% accuracy, so it gives up little on the positive class while detecting negatives better than either Naive Bayes model.

5. `Logistic Regression (TF-IDF)` shows similar balance to the BoW version, with improved recall for negative sentiment (67%) and still strong positive sentiment detection. It provides a robust 92% accuracy and good F1-scores across the board.

6. `XGBoost (BoW)` performs very well for positive sentiment detection (98% precision), but its negative sentiment detection suffers, with only 24% precision and a lower F1-score (36%). This leads to a lower overall accuracy of 82%, making it less reliable for balanced sentiment tasks.

7. `XGBoost (TF-IDF)`, like the BoW version, shows strong performance in detecting positive sentiment, but struggles with negatives. Its 37% F1-score for negative sentiment is a slight improvement, but the accuracy of 83% still reflects its imbalance in sentiment classification.

`Overall Insight:`
Logistic Regression, with either Bag of Words (BoW) or TF-IDF features, offers the most balanced performance across positive and negative sentiment. Naive Bayes is strong on positive sentiment but significantly weaker on negative sentiment, and the XGBoost models, though effective on positive sentiment, are less reliable on the negative class and have lower overall accuracy.
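
Since the summary table reports only accuracy and weighted F1, a per-class breakdown helps when comparing models on the negative class. The sketch below is illustrative: the confusion counts are an assumption, reconstructed so that they are consistent with the Naive Bayes (BoW) report (9 of 30 negatives and 401 of 409 positives classified correctly), not taken from the notebook's actual predictions.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_fscore_support

# Assumed counts consistent with the Naive Bayes (BoW) report:
# 30 Negative reviews (9 caught, 21 missed), 409 Positive reviews (401 caught, 8 missed).
y_true = np.array([0] * 30 + [1] * 409)
y_pred = np.array([0] * 9 + [1] * 21 + [1] * 401 + [0] * 8)

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1]
)
macro_f1 = f1_score(y_true, y_pred, average="macro")
weighted_f1 = f1_score(y_true, y_pred, average="weighted")

print(f"Negative: precision={precision[0]:.2f} recall={recall[0]:.2f} f1={f1[0]:.2f}")
print(f"Positive: precision={precision[1]:.2f} recall={recall[1]:.2f} f1={f1[1]:.2f}")
print(f"Macro F1: {macro_f1:.2%}  Weighted F1: {weighted_f1:.2%}")
```

The weighted F1 stays above 92% even though the negative-class F1 is only 38%, so ranking models by negative-class F1 or macro F1 is a more direct way to compare them on this data.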

#### Next Steps.
1. Conduct hyperparameter tuning on Logistic Regression (BoW/TF-IDF) and XGBoost (BoW/TF-IDF).
2. Choose the better variant of each model after hyperparameter tuning:
   - Logistic Regression (BoW) or Logistic Regression (TF-IDF)
   - XGBoost (BoW) or XGBoost (TF-IDF)
3. Model Deployment
3. Model Deployment

# Hyperparameter Tuning
To conduct hyperparameter tuning on Logistic Regression (BoW/TF-IDF) and XGBoost (BoW/TF-IDF) models, we'll use GridSearchCV, which helps find the optimal combination of hyperparameters for each model.
- Import the relevant libraries
---
#### Step-by-Step Plan for Hyperparameter Tuning:
`Logistic Regression (BoW and TF-IDF):`
- Hyperparameters to Tune:
  - C: Inverse regularization strength (try different values, e.g., [0.01, 0.1, 1, 10, 100])
  - penalty: Regularization type (e.g., ['l1', 'l2'])
  - solver: Optimization algorithm (e.g., ['liblinear', 'saga'])
  - max_iter: Maximum Iteration (e.g., [100, 200, 300])
---
`XGBoost (BoW and TF-IDF):`
- Hyperparameters to Tune:
  - n_estimators: Number of boosting rounds (try different values, e.g., [50, 100, 150, 200])
  - max_depth: Maximum depth of a tree (try different values, e.g., [3, 5, 7])
  - learning_rate: Step size shrinkage (e.g., [0.01, 0.1, 0.2])
  - subsample: Fraction of samples used for training (e.g., [0.6, 0.8, 1.0])
  - colsample_bytree: Fraction of features used for each tree (e.g., [0.6, 0.8, 1.0])
---
The hyperparameter tuning numbers were selected based on common practices for Logistic Regression and XGBoost, serving as standard starting points for balancing complexity, regularization, and learning behavior. While these values are widely used, they are customizable for different datasets.
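
To get a feel for the cost of these grids, the sketch below counts the parameter combinations with scikit-learn's `ParameterGrid` and multiplies by the number of cross-validation folds. The dictionary names are illustrative, and 5 folds is assumed to match the `cv = 5` setting in the tuning code; the final refit on the full training set is not counted.

```python
from sklearn.model_selection import ParameterGrid

# Grids copied from the tuning plan.
log_reg_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
    "max_iter": [100, 200, 300],
}
xgb_grid_full = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.2],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}
# Simplified XGBoost grid, as used in the tuning code.
xgb_grid_simplified = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5],
    "learning_rate": [0.1],
    "subsample": [0.8],
    "colsample_bytree": [0.8],
}
cv_folds = 5

grids = {
    "Logistic Regression": log_reg_grid,
    "XGBoost (full plan)": xgb_grid_full,
    "XGBoost (simplified)": xgb_grid_simplified,
}
grid_sizes = {name: len(ParameterGrid(grid)) for name, grid in grids.items()}

for name, n_combinations in grid_sizes.items():
    print(f"{name}: {n_combinations} combinations, {n_combinations * cv_folds} fits")
```

The full XGBoost plan needs 1,620 fits per feature set against 20 for the simplified grid, which is the motivation for trimming it.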

In [87]:
# Import Relevant libraries
from sklearn.model_selection import GridSearchCV

In [205]:
# Define parameter grids
# The Logistic Regression grid follows the plan above; the XGBoost grid is simplified to reduce the number of combinations and speed up GridSearchCV
log_reg_params = {
    'C': [0.01, 0.1, 1, 10, 100],  # Inverse regularization strength for Logistic Regression; smaller values mean stronger regularization
    'penalty': ['l1', 'l2'],  # Regularization types: 'l1' is Lasso (can zero out coefficients, giving sparse solutions), 'l2' is Ridge (shrinks coefficients without zeroing them)
    'solver': ['liblinear', 'saga'],  # Optimization solvers; both support 'l1' and 'l2'. 'liblinear' suits small datasets, 'saga' suits larger ones
    'max_iter': [100, 200, 300]  # Maximum number of solver iterations; higher values give the solver more room to converge
}

xgb_params = {
    'n_estimators': [50, 100],  # Number of trees (boosting rounds) in XGBoost                              
    'max_depth': [3, 5],  # Maximum depth of trees to control complexity and prevent overfitting                           
    'learning_rate': [0.1],  # Step size shrinkage to make the boosting process more conservative
    'subsample': [0.8],  # Fraction of samples used for training each tree to reduce overfitting  
    'colsample_bytree': [0.8]  # Fraction of features used for building each tree to increase diversity among trees
}


# Initialize models
log_reg_bow = LogisticRegression(class_weight = 'balanced')
log_reg_tfidf = LogisticRegression(class_weight = 'balanced')
xgb_bow = xgb.XGBClassifier(scale_pos_weight = scale_pos_weight)
xgb_tfidf = xgb.XGBClassifier(scale_pos_weight = scale_pos_weight)

# Use 5-fold cross-validation (cv = 5) for every grid search
# GridSearchCV for Logistic Regression (BoW)
grid_log_reg_bow = GridSearchCV(log_reg_bow, log_reg_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on Logistic Regression using GridSearchCV for BoW features with F1-weighted scoring and 5-fold CV
grid_log_reg_bow.fit(X_train_bow, y_train)  # Fit the model on the BoW training data and labels

# GridSearchCV for Logistic Regression (TF-IDF)
grid_log_reg_tfidf = GridSearchCV(log_reg_tfidf, log_reg_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on Logistic Regression using GridSearchCV for TF-IDF features
grid_log_reg_tfidf.fit(X_train_tfidf, y_train)  # Fit the model on the TF-IDF training data and labels

# GridSearchCV for XGBoost (BoW)
grid_xgb_bow = GridSearchCV(xgb_bow, xgb_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on XGBoost using BoW features
grid_xgb_bow.fit(X_train_bow, y_train)  # Fit the XGBoost model on the BoW training data

# GridSearchCV for XGBoost (TF-IDF)
grid_xgb_tfidf = GridSearchCV(xgb_tfidf, xgb_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on XGBoost using TF-IDF features
grid_xgb_tfidf.fit(X_train_tfidf, y_train)  # Fit the XGBoost model on the TF-IDF training data

# Get the best models from the grid search
best_log_reg_bow = grid_log_reg_bow.best_estimator_  # Get the best-tuned Logistic Regression model for BoW
best_log_reg_tfidf = grid_log_reg_tfidf.best_estimator_  # Get the best-tuned Logistic Regression model for TF-IDF
best_xgb_bow = grid_xgb_bow.best_estimator_  # Get the best-tuned XGBoost model for BoW
best_xgb_tfidf = grid_xgb_tfidf.best_estimator_  # Get the best-tuned XGBoost model for TF-IDF

# Evaluate on the test set
log_reg_bow_pred = best_log_reg_bow.predict(X_test_bow)  # Predict using the best Logistic Regression (BoW) model
log_reg_tfidf_pred = best_log_reg_tfidf.predict(X_test_tfidf)  # Predict using the best Logistic Regression (TF-IDF) model
xgb_bow_pred = best_xgb_bow.predict(X_test_bow)  # Predict using the best XGBoost (BoW) model
xgb_tfidf_pred = best_xgb_tfidf.predict(X_test_tfidf)  # Predict using the best XGBoost (TF-IDF) model


# Performance metrics for each model
# Logistic Regression (BoW) Model
print("Evaluation for Logistic Regression (BoW) after Hyperparameter Tuning:")
print("Classification Report (BoW):")  # Display classification report for Log Reg (BoW) predictions
print(classification_report(y_test, log_reg_bow_pred))  # Print precision, recall, F1 score for each class
log_reg_bow_accuracy = (accuracy_score(y_test, log_reg_bow_pred)) * 100  # Calculate accuracy for Logistic Regression (BoW)
log_reg_bow_f1 = (f1_score(y_test, log_reg_bow_pred, average='weighted')) * 100  # Calculate weighted F1 score for Logistic Regression (BoW)

# Logistic Regression (TF-IDF) Model
print("Evaluation for Logistic Regression (TF-IDF) after Hyperparameter Tuning:")
print("Classification Report (TF-IDF):")  # Display classification report for Log Reg (TF-IDF) predictions
print(classification_report(y_test, log_reg_tfidf_pred))  # Print precision, recall, F1 score for each class
log_reg_tfidf_accuracy = (accuracy_score(y_test, log_reg_tfidf_pred)) * 100  # Calculate accuracy for Logistic Regression (TF-IDF)
log_reg_tfidf_f1 = (f1_score(y_test, log_reg_tfidf_pred, average='weighted')) * 100  # Calculate weighted F1 score for Logistic Regression (TF-IDF)

# XGBoost (BoW) Model
print("Evaluation for XGBoost (BoW) after Hyperparameter Tuning:")
print("Classification Report (BoW):")  # Display classification report for XGB (BoW) predictions
print(classification_report(y_test, xgb_bow_pred))  # Print precision, recall, F1 score for each class
xgb_bow_accuracy = (accuracy_score(y_test, xgb_bow_pred)) * 100  # Calculate accuracy for XGBoost (BoW)
xgb_bow_f1 = (f1_score(y_test, xgb_bow_pred, average='weighted')) * 100  # Calculate weighted F1 score for XGBoost (BoW)

# XGBoost (TF-IDF) Model
print("Evaluation for XGBoost (TF-IDF) after Hyperparameter Tuning:")
print("Classification Report (TF-IDF):")  # Display classification report for XGB (TF-IDF) predictions
print(classification_report(y_test, xgb_tfidf_pred))  # Print precision, recall, F1 score for each class
xgb_tfidf_accuracy = (accuracy_score(y_test, xgb_tfidf_pred)) * 100  # Calculate accuracy for XGBoost (TF-IDF)
xgb_tfidf_f1 = (f1_score(y_test, xgb_tfidf_pred, average='weighted')) * 100  # Calculate weighted F1 score for XGBoost (TF-IDF)

# Create a DataFrame to consolidate the evaluation metrics (accuracy and F1 score).
# Create a dictionary with the model names and their corresponding accuracy and F1 scores
evaluation_metrics2 = {
    'Model': ['Log Reg (BoW)', 'Log Reg (TF-IDF)', 'XGB (BoW)', 'XGB (TF-IDF)'],  # Model names
    'Accuracy': [log_reg_bow_accuracy, log_reg_tfidf_accuracy, xgb_bow_accuracy, xgb_tfidf_accuracy],  # Corresponding accuracies
    'F1 Score': [log_reg_bow_f1, log_reg_tfidf_f1, xgb_bow_f1, xgb_tfidf_f1]  # Corresponding F1 scores
}

# Convert the dictionary into a DataFrame
evaluation_data2 = pd.DataFrame(evaluation_metrics2)  # Create a DataFrame for evaluation metrics

# Define a function to format the numbers as percentages
def format_percentage(value):
    return f'{value:.2f}%'

# Use apply with the custom function for both columns
evaluation_data2['Accuracy'] = evaluation_data2['Accuracy'].apply(format_percentage)  # Format accuracy as percentage
evaluation_data2['F1 Score'] = evaluation_data2['F1 Score'].apply(format_percentage)  # Format F1 score as percentage

# Display the evaluation metrics DataFrame
evaluation_data2  # Show the DataFrame with formatted percentages for accuracy and F1 score

Evaluation for Logistic Regression (BoW) after Hyperparameter Tuning:
Classification Report (BoW):
              precision    recall  f1-score   support

           0       0.52      0.50      0.51        30
           1       0.96      0.97      0.96       409

    accuracy                           0.93       439
   macro avg       0.74      0.73      0.74       439
weighted avg       0.93      0.93      0.93       439

Evaluation for Logistic Regression (TF-IDF) after Hyperparameter Tuning:
Classification Report (TF-IDF):
              precision    recall  f1-score   support

           0       0.54      0.43      0.48        30
           1       0.96      0.97      0.97       409

    accuracy                           0.94       439
   macro avg       0.75      0.70      0.72       439
weighted avg       0.93      0.94      0.93       439

Evaluation for XGBoost (BoW) after Hyperparameter Tuning:
Classification Report (BoW):
              precision    recall  f1-score   support



| | Model | Accuracy | F1 Score (weighted) |
|---|---|---|---|
| 0 | Log Reg (BoW) | 93.39% | 93.34% |
| 1 | Log Reg (TF-IDF) | 93.62% | 93.29% |
| 2 | XGB (BoW) | 81.55% | 85.63% |
| 3 | XGB (TF-IDF) | 82.46% | 86.30% |
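
The tuning code keeps the best estimators but does not print which hyperparameters were selected. As a small sketch of how that could be inspected, the example below runs `GridSearchCV` on synthetic, imbalanced stand-in data (not the review dataset) with a reduced grid and reads `best_params_` and `best_score_`. The data, the grid, and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data (roughly 10% class 0, 90% class 1).
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.1, 0.9], random_state=0
)

# Reduced grid for illustration; liblinear supports both penalties.
param_grid = {"C": [0.1, 1, 10], "penalty": ["l1", "l2"]}
search = GridSearchCV(
    LogisticRegression(class_weight="balanced", solver="liblinear"),
    param_grid,
    scoring="f1_weighted",
    cv=5,
)
search.fit(X, y)

# best_params_ holds the winning combination; best_score_ is its mean CV score.
print("Best parameters:", search.best_params_)
print(f"Best mean CV weighted F1: {search.best_score_:.3f}")
```

The same two attributes are available on `grid_log_reg_bow` and the other fitted searches in the notebook.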


#### Model Evaluation Insights After Hyperparameter Tuning
---
`1. Logistic Regression (BoW):`
- F1-score: 51% (negatives), 96% (positives)
- Accuracy: 93%
- **Insight: Slight improvement in negative sentiment detection (F1 up from 49% to 51%) while positive classification remains stable.**
---
`2. Logistic Regression (TF-IDF):`
- F1-score: 48% (negatives), 97% (positives)
- Accuracy: 94%
- **Insight: Recall for negatives drops from 67% to 43%, indicating reduced effectiveness in capturing negative sentiment.**
---
`3. XGBoost (BoW):`
- F1-score: 37% (negatives), 89% (positives)
- Accuracy: 82%
- **Insight: Slight improvement in recall for negatives (80%), but still lacks reliability for balanced tasks.**
---
`4. XGBoost (TF-IDF):`
- F1-score: 39% (negatives), 90% (positives)
- Accuracy: 82%
- **Insight: Shows slight improvements for negative sentiment detection but remains imbalanced.**
---

**Best Model Decision**
- Based on the insights from the evaluations:

  - 1. Logistic Regression (BoW) is the best-performing model: it has the highest weighted F1 score (93.34%) and the best negative-class F1 (51%), and its accuracy (93.39%) is within a quarter of a point of the highest. Combined with high recall for positive sentiment (97%), this makes it the most balanced choice.

  - 2. Logistic Regression (TF-IDF) reaches the highest accuracy (93.62%) but has lower negative-class recall (43% versus 50%) and F1 (48% versus 51%).

  - 3. XGBoost models, despite improvements, still lag in overall performance compared to Logistic Regression, particularly in handling negative sentiment.

**Conclusion**
- **Deploy Logistic Regression (BoW) as the best model for sentiment classification due to its highest weighted F1 score and its more balanced handling of both sentiment classes.**

# 6. Model Deployment

#### 1. Flask Deployment Steps
   - Create a project folder.
   - Set up a virtual environment: conda create -p alpha_team_sentiment_analyzer_model_deploy_venv python==3.9 scikit-learn==1.3.0 -y 
   - Activate the virtual environment: conda activate alpha_team_sentiment_analyzer_model_deploy_venv
   - Install required packages: pip install flask scikit-learn pandas numpy
   - Save your trained model and vectorizer using pickle.
   - Create an app.py file with Flask app code.
   - Define the prediction route and load the model/vectorizer in app.py.
   - Run the Flask app: python app.py
   - Test the API using cURL or Postman.
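
The prediction route described in these steps could look roughly like the sketch below. It is not the project's actual app.py: the `/predict` path, the `review` JSON field, the `create_app` factory, and the 1 = Positive / 0 = Negative label mapping are all assumptions, and a tiny stand-in model replaces the pickled Logistic Regression (BoW) model so the example runs on its own. Flask's test client is used in place of cURL or Postman.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def create_app(model, vectorizer):
    """Build a Flask app around an already-loaded model and vectorizer."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        review = payload.get("review", "")
        if not isinstance(review, str) or not review.strip():
            return jsonify({"error": "Provide a non-empty 'review' string."}), 400
        features = vectorizer.transform([review])
        label = int(model.predict(features)[0])
        # Assumed label mapping: 1 = Positive, 0 = Negative.
        return jsonify({"sentiment": "Positive" if label == 1 else "Negative"})

    return app


# Tiny stand-in model so the sketch is self-contained; in app.py these objects
# would instead be loaded from log_reg_bow_model.pkl and vectorizer.pkl.
texts = [
    "great product works well",
    "love it excellent quality",
    "terrible broke quickly",
    "bad quality waste of money",
]
labels = [1, 1, 0, 0]
demo_vectorizer = CountVectorizer().fit(texts)
demo_model = LogisticRegression().fit(demo_vectorizer.transform(texts), labels)

app = create_app(demo_model, demo_vectorizer)
client = app.test_client()  # exercises the route without starting a server
response = client.post("/predict", json={"review": "excellent quality"})
print(response.status_code, response.get_json())
```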
---
#### 2. Streamlit Deployment Steps
   - Create a project folder.
   - Set up a virtual environment (same as with Flask).
   - Activate the virtual environment (same as with Flask).
   - Install required packages: pip install streamlit scikit-learn pandas numpy
   - Save your trained model and vectorizer using pickle.
   - Create a streamlit_app.py file with Streamlit app code.
   - Load the model/vectorizer in streamlit_app.py.
   - Define the Streamlit app layout and prediction logic.
   - Run the Streamlit app: streamlit run streamlit_app.py
   - Test the app in a web browser.
---
#### 3. Integrated Guide for Deploying Machine Learning Models with Flask and Streamlit Steps
   - Create a project folder.
   - Set up a virtual environment: conda create -p alpha_team_sentiment_analyzer_model_deploy_venv python==3.9 scikit-learn==1.3.0 -y
   - Activate the virtual environment: conda activate alpha_team_sentiment_analyzer_model_deploy_venv
   - Install required packages: pip install flask streamlit scikit-learn pandas numpy
   - Save your trained model and vectorizer using pickle.
   - Create app.py for Flask deployment.
   - Load model and vectorizer.
   - Define the prediction route.
   - Create streamlit_app.py for Streamlit deployment.
   - Load model and vectorizer.
   - Define the app layout and prediction logic.
   - Run the Flask app: python app.py
   - Run the Streamlit app: streamlit run streamlit_app.py
   - Test both applications using cURL/Postman for Flask and a web browser for Streamlit.

#### Save Trained Model and Vectorizer using pickle

In [92]:
# import pickle
import pickle

# Saving the Logistic Regression BoW model (after hyperparameter tuning)
with open('log_reg_bow_model.pkl', 'wb') as model_file:
    pickle.dump(grid_log_reg_bow, model_file)  # Save the fitted GridSearchCV object; its predict() uses the best estimator found

# Saving the Bag of Words vectorizer from the feature engineering step
with open('vectorizer.pkl', 'wb') as vectorizer_file:
    pickle.dump(bow_vectorizer, vectorizer_file)  # Save the vectorizer
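
Before deploying, it can be worth confirming that the saved files reload and give the same predictions. The sketch below shows one way to do that round trip. It uses a tiny stand-in model and made-up reviews rather than the tuned model, and it writes to a temporary directory so that no real files are overwritten; only the file names follow the cell above.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in for the tuned model and the BoW vectorizer (hypothetical data).
texts = [
    "great product works well",
    "love it excellent quality",
    "terrible broke quickly",
    "bad quality waste of money",
]
labels = [1, 1, 0, 0]
vectorizer = CountVectorizer().fit(texts)
model = LogisticRegression().fit(vectorizer.transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "log_reg_bow_model.pkl")
    vectorizer_path = os.path.join(tmp_dir, "vectorizer.pkl")

    # Save both objects, exactly as in the cell above.
    with open(model_path, "wb") as model_file:
        pickle.dump(model, model_file)
    with open(vectorizer_path, "wb") as vectorizer_file:
        pickle.dump(vectorizer, vectorizer_file)

    # Reload them the way the deployed app would.
    with open(model_path, "rb") as model_file:
        loaded_model = pickle.load(model_file)
    with open(vectorizer_path, "rb") as vectorizer_file:
        loaded_vectorizer = pickle.load(vectorizer_file)

new_reviews = ["excellent quality", "waste of money"]
before = model.predict(vectorizer.transform(new_reviews))
after = loaded_model.predict(loaded_vectorizer.transform(new_reviews))
print("Predictions match after reload:", bool((before == after).all()))
```

Pickled scikit-learn objects should be loaded with the same scikit-learn version that saved them, which is why the deployment environment pins scikit-learn==1.3.0.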