# E-commerce Product Review Sentiment Analyzer

### Introduction
The project aims to develop a sentiment analyzer for e-commerce product reviews sourced from AliExpress, focusing on the Electronics category. The project covers several aspects of data science, including data acquisition, preprocessing, model development, deployment, and pipeline implementation.

### About Company
AliExpress is a globally renowned e-commerce platform that connects consumers with millions of products at competitive prices. AliExpress provides a seamless shopping experience, empowering individuals and businesses to discover, purchase, and sell quality products from trusted sellers across the globe.

### Problem Statement
E-commerce platforms grapple with the challenge of analyzing vast amounts of customer feedback to accurately understand product sentiment. Understanding this sentiment is critical for businesses to make informed decisions regarding product improvements, marketing strategies, and customer satisfaction. However, manually analyzing thousands of product reviews is time-consuming and inefficient. Consequently, an automated sentiment analysis solution is required to effectively process and interpret these reviews.

### Problem Objectives
- Develop an accurate and efficient sentiment analysis model for e-commerce product reviews.

- Extract valuable insights from customer reviews to inform product improvement and marketing strategies.

- Enhance customer satisfaction by providing businesses with a deeper understanding of customer sentiment.

### Project Overview
This project aimed to develop a sentiment analysis model to understand customer feedback on e-commerce products. We focused on classifying reviews as either positive or negative.

1. Data Acquisition

We obtained text reviews and star ratings from a database, comprising more than 10,000 rows.

2. Model Development/Deployment

We trained multiple machine learning models to accurately predict the sentiment expressed in reviews.

3. Insights

We analyzed the model's predictions to identify trends and patterns in customer sentiment.

### Sentiment Analysis Methodology
Our methodology involved several key steps, including data collection, preprocessing, feature engineering, model training, and evaluation.

1. Data Acquisition
2. Data Preprocessing
3. Feature Engineering
4. Model Development/Training
5. Model Evaluation
6. Model Deployment

### Project Contributors
1. Ifechukwu Akaeze

2. Daniel Edet Onofiok

3. Okediran Tope Emmanuel

4. Dr. Ezeuchu Emmanuel Uzond

5. Modinat Gbemisola Adesope

6. Adebayo Olalekan

7. Khadijat Oludolapo Adebiyi

8. Chinua Mbajekwe

9. Chinwe Njoku

10. Vincent C. Ajaegbu

11. Ayodele Kehinde Richard

12. Precious Odinakachi Loveday

13. Ifeoluwa Adeniyi

# 1. Data Acquisition
 - Import relevant library (pandas)
 - Load Dataset
 - Extract the text reviews (Feedback_translated) and rating columns from the Dataset and Create a new DataFrame
 - Save the new DataFrame to a CSV file for reference

`Import library`

In [3]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

`Load dataset`

In [5]:
ecomm_data = pd.read_csv('Team Alpha dataset - Team Alpha dataset.csv')

In [6]:
ecomm_data.head()

Unnamed: 0,productId,Rating,Date,Feedback_translated,Feedback,Unnamed: 5,Name,Country,Upvotes,Downvotes
0,1005010000000000,100,18-May-24,Very good packaging well protected but not yet...,très bon emballage bien protégé mais pas en...,,a***r,FR,0,0
1,1005010000000000,60,29-May-24,"lights are extremely bright, we used 1.2v batt...","lights are extremely bright, we used 1.2v batt...",,Amazon Shopper,US,0,0
2,1005010000000000,100,25-May-24,I like it very much for my son. It is as the d...,Me gusto mucho para mi hijo. Es tal cual la de...,,R***S,CL,0,0
3,1005010000000000,100,23-Apr-24,"corresponds to the description, fast delivery,...","corresponds to the description, fast delivery,...",,Amazon Shopper,UA,0,0
4,1005010000000000,100,3-May-24,As described. Good quality. Batteries not incl...,As described. Good quality. Batteries not incl...,,J***h,HU,0,0


In [7]:
# Check for columns data count, null count, data types
ecomm_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2192 entries, 0 to 2191
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   productId            2192 non-null   int64  
 1   Rating               2192 non-null   int64  
 2   Date                 2192 non-null   object 
 3   Feedback_translated  1306 non-null   object 
 4   Feedback             1306 non-null   object 
 5   Unnamed: 5           0 non-null      float64
 6   Name                 2192 non-null   object 
 7   Country              2190 non-null   object 
 8   Upvotes              2192 non-null   int64  
 9   Downvotes            2192 non-null   int64  
dtypes: float64(1), int64(4), object(5)
memory usage: 171.4+ KB


`Extract the Feedback_translated and Rating columns from the dataset and create a new DataFrame`

In [9]:
extract_ecomm_data = ecomm_data[['Feedback_translated', 'Rating']]

In [10]:
# Display the new DataFrame
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating
0,Very good packaging well protected but not yet...,100
1,"lights are extremely bright, we used 1.2v batt...",60
2,I like it very much for my son. It is as the d...,100
3,"corresponds to the description, fast delivery,...",100
4,As described. Good quality. Batteries not incl...,100
...,...,...
2187,Very good,100
2188,,100
2189,"looks fine, not tried yet!.......................",100
2190,,100


`Save the new DataFrame to a CSV file`

In [98]:
# Save as csv file (file name - 'extracted_reviews.csv')
extract_ecomm_data.to_csv('extracted_reviews.csv', index = False)

# 2. Data Preprocessing
- Check/Treat missing data
- Text Cleaning
- Tokenization
- Create a Sentiment column from the review text and the Rating column (a combined VADER and rating score), then encode it with LabelEncoder to give the models a numerical target

`Check/Treat missing data`
- Count the rows with missing values in each column
- Display the rows where Feedback_translated is missing (NaN)
- Decide how to treat the NaN values by checking whether all the affected ratings equal 100
- Drop the rows with missing feedback
- Confirm that the missing values have been removed

In [15]:
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating
0,Very good packaging well protected but not yet...,100
1,"lights are extremely bright, we used 1.2v batt...",60
2,I like it very much for my son. It is as the d...,100
3,"corresponds to the description, fast delivery,...",100
4,As described. Good quality. Batteries not incl...,100
...,...,...
2187,Very good,100
2188,,100
2189,"looks fine, not tried yet!.......................",100
2190,,100


In [16]:
# Check for count of rows in column with missing values
extract_ecomm_data.isnull().sum()

Feedback_translated    886
Rating                   0
dtype: int64

In [17]:
# Display rows where 'feedback_translated' column has missing values (NaN)
missing_feedback_translated = extract_ecomm_data[extract_ecomm_data['Feedback_translated'].isnull()]
missing_feedback_translated

Unnamed: 0,Feedback_translated,Rating
29,,100
30,,100
31,,100
32,,100
33,,100
...,...,...
2185,,100
2186,,100
2188,,100
2190,,100


In [18]:
# Determine criteria to treat/replace NaN 
# Let's check if all the ratings for the rows where the feedback_translated column is NaN have the value 100 to help determine how to treat the NaN
all_missing_ratings_are_100 = missing_feedback_translated['Rating'].eq(100).all()

print(f"All ratings for missing feedback are 100: {all_missing_ratings_are_100}")

All ratings for missing feedback are 100: False


In [19]:
# Not every rating among these rows equals 100, but we still drop the rows with NaN in Feedback_translated: a rating with no feedback text cannot be used to train a text-based sentiment model.

In [20]:
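# Drop rows with NaN in the 'Feedback_translated' column; .copy() avoids pandas' SettingWithCopyWarning when new columns are added later
extract_ecomm_data = extract_ecomm_data.dropna(subset=['Feedback_translated']).copy()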

In [21]:
# Confirm that rows with missing feedback have been dropped
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating
0,Very good packaging well protected but not yet...,100
1,"lights are extremely bright, we used 1.2v batt...",60
2,I like it very much for my son. It is as the d...,100
3,"corresponds to the description, fast delivery,...",100
4,As described. Good quality. Batteries not incl...,100
...,...,...
2146,The lock is damaged.,40
2183,I recommend all the old man to the seller.,100
2184,"Put it, it works.",100
2187,Very good,100


`Text Cleaning: *Remove noise, special characters, and irrelevant information*`
- Import relevant libraries and download stopwords from NLTK
- Define the stop words
- Define Function to clean the text
- Apply the cleaning function to the 'Feedback_translated' column
- View the cleaned text

In [23]:
# import relevant libraries and download stopwords from NLTK

import re                                  # re - Regular Expression, useful for text cleaning (e.g., removing special characters).
import nltk                                # nltk - Natural Language Toolkit, a library used for natural language processing tasks like tokenization and stopword removal.
from nltk.corpus import stopwords
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [24]:
# Define the stop words
stop_words = set(stopwords.words('english'))

In [25]:
# Define Function to clean the text
def clean_text(text):
    # Remove special characters and numbers, keeping only letters and spaces
    text = re.sub(r'[^a-zA-Z\s]', '', text)                                              # This regex matches anything that is not a letter or space and replaces it with an empty string
    # Convert text to lowercase to ensure uniformity
    text = text.lower()                                                                  # This helps in reducing the complexity of the analysis by treating 'Word' and 'word' as the same
    # Remove extra spaces between words
    text = re.sub(r'\s+', ' ', text).strip()                                             # This replaces multiple spaces with a single space and trims leading/trailing spaces
    # Remove stop words (common words that may not add significant meaning to the text)
    text = ' '.join([word for word in text.split() if word not in stop_words])           # This creates a list of words, excluding stop words, and joins them back into a string
    return text                                                                          # Return the cleaned text
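
One side effect of the stop-word step is worth noting: NLTK's English stop-word list includes negation words such as "not" and "no", so `clean_text` reduces "not good" to "good". The sketch below is a hypothetical variant, not part of the original notebook, that keeps a small set of negations. It uses a tiny hard-coded stop-word set (an assumption made so that it runs without downloading NLTK data); in the notebook one would subtract the negations from `stop_words` instead.

```python
import re

# Hypothetical, deliberately tiny stop-word set for illustration only;
# the notebook uses NLTK's full English list instead.
DEMO_STOP_WORDS = {"the", "is", "it", "a", "but", "very", "not", "no", "i", "was"}

# Negation words we choose to keep because they can flip the sentiment.
NEGATIONS = {"not", "no", "nor", "never"}


def clean_text_keep_negations(text, stop_words=DEMO_STOP_WORDS):
    """Same steps as clean_text, but negation words survive stop-word removal."""
    text = re.sub(r"[^a-zA-Z\s]", "", text)   # keep letters and whitespace only
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace
    effective_stop_words = set(stop_words) - NEGATIONS
    return " ".join(w for w in text.split() if w not in effective_stop_words)


print(clean_text_keep_negations("The lock is NOT very good."))
```

Whether keeping negations helps depends on the downstream model, so this is best treated as an option to evaluate rather than a fix.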

In [26]:
# Apply the cleaning function to the 'Feedback_translated' column
extract_ecomm_data['Cleaned_Feedback_translated'] = extract_ecomm_data['Feedback_translated'].apply(clean_text)

In [27]:
# View the cleaned text
extract_ecomm_data

Unnamed: 0,Feedback_translated,Rating,Cleaned_Feedback_translated
0,Very good packaging well protected but not yet...,100,good packaging well protected yet mounted
1,"lights are extremely bright, we used 1.2v batt...",60,lights extremely bright used v battery instead...
2,I like it very much for my son. It is as the d...,100,like much son description put batteries thank
3,"corresponds to the description, fast delivery,...",100,corresponds description fast delivery well pac...
4,As described. Good quality. Batteries not incl...,100,described good quality batteries included fast...
...,...,...,...
2146,The lock is damaged.,40,lock damaged
2183,I recommend all the old man to the seller.,100,recommend old man seller
2184,"Put it, it works.",100,put works
2187,Very good,100,good


`Tokenization: *Split text into tokens*`
- Import relevant libraries and download punkt from NLTK
- Test the tokenization on a sample text
- Perform word tokenization
- Tokenize each feedback_translated using NLTK's word_tokenize

In [29]:
# Import relevant libraries and download punkt from NLTK

from nltk.tokenize import word_tokenize
nltk.download('punkt')                    # Download the 'punkt' resource, which is necessary for tokenizing text, especially for splitting sentences into words

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [30]:
# Do a sample text to test the tokenization - Taking the first feedback as an example
sample_text = extract_ecomm_data['Cleaned_Feedback_translated'].iloc[0]
sample_text

'good packaging well protected yet mounted'

In [31]:
# Perform word tokenization
tokens = word_tokenize(sample_text)
tokens

['good', 'packaging', 'well', 'protected', 'yet', 'mounted']

In [32]:
# Tokenize each feedback_translated using NLTK's word_tokenize
extract_ecomm_data['Feedback_tokens'] = extract_ecomm_data['Cleaned_Feedback_translated'].apply(word_tokenize)

In [33]:
cleaned_ecomm_review = extract_ecomm_data
cleaned_ecomm_review

Unnamed: 0,Feedback_translated,Rating,Cleaned_Feedback_translated,Feedback_tokens
0,Very good packaging well protected but not yet...,100,good packaging well protected yet mounted,"[good, packaging, well, protected, yet, mounted]"
1,"lights are extremely bright, we used 1.2v batt...",60,lights extremely bright used v battery instead...,"[lights, extremely, bright, used, v, battery, ..."
2,I like it very much for my son. It is as the d...,100,like much son description put batteries thank,"[like, much, son, description, put, batteries,..."
3,"corresponds to the description, fast delivery,...",100,corresponds description fast delivery well pac...,"[corresponds, description, fast, delivery, wel..."
4,As described. Good quality. Batteries not incl...,100,described good quality batteries included fast...,"[described, good, quality, batteries, included..."
...,...,...,...,...
2146,The lock is damaged.,40,lock damaged,"[lock, damaged]"
2183,I recommend all the old man to the seller.,100,recommend old man seller,"[recommend, old, man, seller]"
2184,"Put it, it works.",100,put works,"[put, works]"
2187,Very good,100,good,[good]


`Create a Sentiment column based primarily on the Feedback_translated column and a Combined Score (Sentiment Score and Rating)`

In [35]:
#Extract rows with specific ratings

#specific_rating = 20
#specific_rating = 40
#specific_rating = 60
#specific_rating = 80
#specific_rating = 100

specific_rating = 60
sr_cleaned_ecomm_review = cleaned_ecomm_review[cleaned_ecomm_review['Rating'] == specific_rating]
sr_cleaned_ecomm_review.head(50)

Unnamed: 0,Feedback_translated,Rating,Cleaned_Feedback_translated,Feedback_tokens
1,"lights are extremely bright, we used 1.2v batt...",60,lights extremely bright used v battery instead...,"[lights, extremely, bright, used, v, battery, ..."
191,"It’s cute, lace blends fairly well but I did...",60,cute lace blends fairly well put bit foundatio...,"[cute, lace, blends, fairly, well, put, bit, f..."
203,The wig is very cute but it is not the same co...,60,wig cute color photo natural hair although fee...,"[wig, cute, color, photo, natural, hair, altho..."
211,Received in colombia,60,received colombia,"[received, colombia]"
265,The theme stickers bought very well but they p...,60,theme stickers bought well put like tree leaf ...,"[theme, stickers, bought, well, put, like, tre..."
302,I ordered this unit for my graduation day! I w...,60,ordered unit graduation day little disappointe...,"[ordered, unit, graduation, day, little, disap..."
373,Purchased for a 2020 model 3 but the unit did ...,60,purchased model unit fitline clips,"[purchased, model, unit, fitline, clips]"
374,"Fast ship from California, but my product was ...",60,fast ship california product faulty touchscree...,"[fast, ship, california, product, faulty, touc..."
404,"I cannot connect bt music with my phone, and a...",60,cannot connect bt music phone also cannot conn...,"[can, not, connect, bt, music, phone, also, ca..."
405,"Shvidko came, the most undone was occupied by ...",60,shvidko came undone occupied khvilin high rate...,"[shvidko, came, undone, occupied, khvilin, hig..."


`Observation 1:` Reviewing the Feedback_translated column across the ratings (20, 40, 60, 80, 100) shows instances where negative feedback received high ratings and vice versa. To address this, we create a Sentiment column, categorized as positive or negative, based on the Feedback_translated text and a combined score (VADER sentiment score and rating). The approach is as follows:

1. Analyze the feedback using the VADER for sentiment analysis.
2. Normalize the Rating to the range [-1, 1] to match VADER's scale.
3. Calculate the combined sentiment score (Weigh both components equally, or adjust weights as needed).
4. Define a function to classify sentiment based on the combined score.
5. Apply label encoding to convert the sentiment column into binary values (1 for positive, 0 for negative).

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a ready-to-use, lexicon- and rule-based sentiment analysis tool designed to score text for sentiment polarity and intensity, and it is particularly effective for social media and other short texts. It uses a lexicon of sentiment-related words together with a set of rules to score text as positive, negative, or neutral.
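
To make the normalization, weighting and classification steps listed above concrete, here is a minimal worked sketch in plain Python. It assumes the observed ratings span 20 to 100 (so min-max scaling maps 20 to -1, 60 to 0 and 100 to 1) and equal weights of 0.5; the helper names are hypothetical and the compound score is passed in as a number rather than computed by VADER.

```python
def normalize_rating(rating, r_min=20, r_max=100):
    """Min-max scale a rating from [r_min, r_max] to [-1, 1]."""
    return 2 * (rating - r_min) / (r_max - r_min) - 1


def combined_score(vader_compound, rating, w_text=0.5, w_rating=0.5):
    """Weighted blend of the VADER compound score and the normalized rating."""
    return w_text * vader_compound + w_rating * normalize_rating(rating)


def classify(score):
    """Positive only when the combined score is strictly above zero."""
    return "positive" if score > 0 else "negative"


# A 40-rated review whose text has a compound score of -0.4404
score = combined_score(-0.4404, 40)
print(round(score, 4), classify(score))
```

One consequence of equal weights: a 100-rated review maps to +1, so its combined score is positive for any compound score above -1. The blend therefore mainly re-labels reviews with middling ratings, and the weights may need adjusting if strongly negative text under a top rating should count as negative.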

In [37]:
# import relevant libraries
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

In [38]:
# Initialize VADER
nltk.download('vader_lexicon')
vader_analyzer = SentimentIntensityAnalyzer()

# Step 1: Analyze Feedback_translated using VADER for sentiment analysis
def get_sentiment_score(review):
    return vader_analyzer.polarity_scores(review)['compound']
# Apply the function to create the SentimentScore column
cleaned_ecomm_review['SentimentScore'] = cleaned_ecomm_review['Cleaned_Feedback_translated'].apply(get_sentiment_score)

# Step 2: Normalize the Rating to the range [-1, 1] to match VADER's scale
scaler = MinMaxScaler(feature_range=(-1, 1))
cleaned_ecomm_review['NormalizedRating'] = scaler.fit_transform(cleaned_ecomm_review[['Rating']])

# Step 3: Calculate the combined sentiment score
# Weigh both components equally, or adjust weights as needed
cleaned_ecomm_review['CombinedScore'] = 0.5 * cleaned_ecomm_review['SentimentScore'] + 0.5 * cleaned_ecomm_review['NormalizedRating']

# Step 4: Define a function to classify sentiment based on the combined score
def classify_sentiment(score):
    if score > 0:
        return 'positive'  # Positive sentiment
    else:
        return 'negative'  # Negative sentiment
# Apply the function to create the Sentiment column
cleaned_ecomm_review['Sentiment'] = cleaned_ecomm_review['CombinedScore'].apply(classify_sentiment)

# Step 5: Apply label encoding to convert the Sentiment column into binary values
encoder = LabelEncoder()
cleaned_ecomm_review['Sentiment_Encoded'] = encoder.fit_transform(cleaned_ecomm_review['Sentiment'])
# Check the distribution of sentiments
sentiment_distribution = cleaned_ecomm_review['Sentiment_Encoded'].value_counts()
print(sentiment_distribution)

# Display the DataFrame with the new 'sentiment' column
cleaned_ecomm_review

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Sentiment_Encoded
1    1213
0      93
Name: count, dtype: int64


Unnamed: 0,Feedback_translated,Rating,Cleaned_Feedback_translated,Feedback_tokens,SentimentScore,NormalizedRating,CombinedScore,Sentiment,Sentiment_Encoded
0,Very good packaging well protected but not yet...,100,good packaging well protected yet mounted,"[good, packaging, well, protected, yet, mounted]",0.7845,1.0,0.89225,positive,1
1,"lights are extremely bright, we used 1.2v batt...",60,lights extremely bright used v battery instead...,"[lights, extremely, bright, used, v, battery, ...",0.7579,0.0,0.37895,positive,1
2,I like it very much for my son. It is as the d...,100,like much son description put batteries thank,"[like, much, son, description, put, batteries,...",0.6124,1.0,0.80620,positive,1
3,"corresponds to the description, fast delivery,...",100,corresponds description fast delivery well pac...,"[corresponds, description, fast, delivery, wel...",0.5574,1.0,0.77870,positive,1
4,As described. Good quality. Batteries not incl...,100,described good quality batteries included fast...,"[described, good, quality, batteries, included...",0.4404,1.0,0.72020,positive,1
...,...,...,...,...,...,...,...,...,...
2146,The lock is damaged.,40,lock damaged,"[lock, damaged]",-0.4404,-0.5,-0.47020,negative,0
2183,I recommend all the old man to the seller.,100,recommend old man seller,"[recommend, old, man, seller]",0.3612,1.0,0.68060,positive,1
2184,"Put it, it works.",100,put works,"[put, works]",0.0000,1.0,0.50000,positive,1
2187,Very good,100,good,[good],0.4404,1.0,0.72020,positive,1


*The sentiment distribution shows that most customer feedback is positive: 1,213 reviews (about 92.9%) are labelled positive and only 93 (about 7.1%) negative. This suggests high customer satisfaction with the products, but it also means the classes are heavily imbalanced, which matters when interpreting model accuracy.*
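
Because the classes are this imbalanced, accuracy on its own can be misleading. As a quick sanity check (a sketch that uses only the counts printed above), a trivial classifier that always predicts "positive" would already reach roughly 93% accuracy, which is the baseline any trained model should beat.

```python
# Class counts taken from the value_counts() output above
counts = {"positive": 1213, "negative": 93}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a classifier that always predicts the majority class
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Always-'{majority_label}' baseline accuracy: {baseline_accuracy:.1%}")
```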

In [40]:
#Extract rows with specific ratings

#specific_rating = 20
#specific_rating = 40
#specific_rating = 60
#specific_rating = 80
#specific_rating = 100

specific_rating = 100
sr_cleaned_ecomm_review = cleaned_ecomm_review[cleaned_ecomm_review['Rating'] == specific_rating]
sr_cleaned_ecomm_review

Unnamed: 0,Feedback_translated,Rating,Cleaned_Feedback_translated,Feedback_tokens,SentimentScore,NormalizedRating,CombinedScore,Sentiment,Sentiment_Encoded
0,Very good packaging well protected but not yet...,100,good packaging well protected yet mounted,"[good, packaging, well, protected, yet, mounted]",0.7845,1.0,0.89225,positive,1
2,I like it very much for my son. It is as the d...,100,like much son description put batteries thank,"[like, much, son, description, put, batteries,...",0.6124,1.0,0.80620,positive,1
3,"corresponds to the description, fast delivery,...",100,corresponds description fast delivery well pac...,"[corresponds, description, fast, delivery, wel...",0.5574,1.0,0.77870,positive,1
4,As described. Good quality. Batteries not incl...,100,described good quality batteries included fast...,"[described, good, quality, batteries, included...",0.4404,1.0,0.72020,positive,1
5,Works very well and is of good quality,100,works well good quality,"[works, well, good, quality]",0.6124,1.0,0.80620,positive,1
...,...,...,...,...,...,...,...,...,...
2143,"Excellent, exactly what I was after 🤩",100,excellent exactly,"[excellent, exactly]",0.5719,1.0,0.78595,positive,1
2183,I recommend all the old man to the seller.,100,recommend old man seller,"[recommend, old, man, seller]",0.3612,1.0,0.68060,positive,1
2184,"Put it, it works.",100,put works,"[put, works]",0.0000,1.0,0.50000,positive,1
2187,Very good,100,good,[good],0.4404,1.0,0.72020,positive,1


`Observation 2:` After combining the VADER sentiment score with the Rating to classify Feedback_translated, the sentiment labels appear more consistent with the review text. Cases where the rating and the wording disagree are now resolved by the combined score, giving a more reliable distinction between "positive" and "negative" sentiment.

# 3. Feature Engineering
- Bag of Words Vectors (BoW)
- TF-IDF - Term Frequency-Inverse Document Frequency
- Data Splitting

`Bag of Words Vectors (BoW): Convert text into a matrix of token counts using CountVectorizer from sklearn.`
- Import the relevant libraries
- Initialize the CountVectorizer (BoW)
- Apply BoW to the 'Cleaned_Feedback_translated' column
- Check the shape of the BoW feature matrix

In [44]:
# import the relevant libraries
from sklearn.feature_extraction.text import CountVectorizer

In [45]:
# Initialize the CountVectorizer (BoW)
bow_vectorizer = CountVectorizer()

In [46]:
# Apply BoW to the 'Cleaned_Feedback_translated' column
bow_matrix = bow_vectorizer.fit_transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])

In [47]:
# Check the shape of the BoW feature matrix
bow_matrix.shape                               # Prints the dimensions of the transformed BoW matrix

(1306, 2655)

`TF-IDF - Term Frequency-Inverse Document Frequency: Use TfidfVectorizer to account for word frequency while downweighting common words that appear in many feedback reviews.`
- Import the relevant libraries
- Initialize the TfidfVectorizer
- Apply TF-IDF to the 'Cleaned_Feedback_translated' column
- Check the shape of the TF-IDF feature matrix

In [49]:
# Import the relevant libraries
from sklearn.feature_extraction.text import TfidfVectorizer

In [50]:
# Initialize the TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()

In [51]:
# Apply TF-IDF to the 'Cleaned_Feedback_translated' column
tfidf_matrix = tfidf_vectorizer.fit_transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])

In [52]:
# Check the shape of the TF-IDF feature matrix
tfidf_matrix.shape                                # Prints the dimensions of the TF-IDF matrix

(1306, 2655)
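
The sketch below illustrates the idea behind TF-IDF on a toy corpus in plain Python. It follows the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2-normalized rows, which is how scikit-learn documents the default behaviour of `TfidfVectorizer`; the toy reviews, the whitespace tokenization and the function name are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) where each row is an L2-normalized TF-IDF vector."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0  # guard against empty docs
        rows.append([x / norm for x in raw])
    return vocab, rows


toy_reviews = ["good quality", "good delivery", "lock damaged"]
vocab, rows = tfidf(toy_reviews)
print(vocab)
for review, row in zip(toy_reviews, rows):
    print(review, [round(x, 3) for x in row])
```

In the first toy review, "good" appears in two of the three documents and therefore receives a lower weight than "quality", which appears in only one. This is the downweighting of common words described above.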

`Data Splitting`
- Split the dataset into training and test sets (Train-Test Split) 

In [54]:
# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split

# BoW Train-Test Split
# X = bow_matrix, y = Sentiment_Encoded (target column)
# The stratify argument keeps the class proportions the same in the training and test sets
X_train_bow, X_test_bow, y_train_bow, y_test_bow = train_test_split(
    bow_matrix, cleaned_ecomm_review['Sentiment_Encoded'],
    test_size = 0.2, random_state = 42, stratify = cleaned_ecomm_review['Sentiment_Encoded']
)

# TF-IDF Train-Test Split
X_train_tfidf, X_test_tfidf, y_train_tfidf, y_test_tfidf = train_test_split(
    tfidf_matrix, cleaned_ecomm_review['Sentiment_Encoded'],
    test_size = 0.2, random_state = 42, stratify = cleaned_ecomm_review['Sentiment_Encoded']
)
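
With only 93 negative reviews out of 1,306, a purely random split could leave the test set with very few negatives. The `stratify` argument avoids that by splitting each class separately. The sketch below shows the idea in plain Python on synthetic labels with the same class counts; it illustrates the principle and is not scikit-learn's implementation, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split at the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels mirroring the class counts in this dataset
labels = [1] * 1213 + [0] * 93
train_idx, test_idx = stratified_split(labels)

test_negatives = sum(1 for i in test_idx if labels[i] == 0)
print(len(train_idx), len(test_idx), test_negatives)
```

A related point to keep in mind: both vectorizers above were fitted on all 1,306 reviews before the split, so vocabulary and document-frequency statistics from the test reviews are already baked into the features. Fitting the vectorizer on the training portion only is the usual safeguard against this kind of leakage.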

# 4. Model Development
- Pre-trained VADER model
- Custom Models

`Pre-trained VADER Model`
- Create a Function to calculate sentiment score using VADER
- Apply VADER sentiment analysis to the Cleaned_Feedback_translated column

In [57]:
# Create a function to classify sentiment from the VADER compound score
def vader_sentiment(text):
    sentiment_score = vader_analyzer.polarity_scores(text)

    # Threshold on the compound score for the positive class
    threshold_positive = 0.01

    # Binary classification: anything below the positive threshold
    # (including neutral text with a compound score of 0) is labelled negative
    if sentiment_score['compound'] >= threshold_positive:
        return 'positive'
    else:
        return 'negative'
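
For comparison, VADER's own documentation suggests typical cut-offs of +0.05 and -0.05 on the compound score, with a neutral band in between. The function above is binary, so text with a compound score of 0.0 (for example "put works") ends up labelled negative. A hypothetical three-way variant, written to take an already-computed compound score so that it runs without the lexicon, might look like this:

```python
def classify_compound(compound, pos_threshold=0.05, neg_threshold=-0.05):
    """Map a VADER compound score in [-1, 1] to positive / neutral / negative."""
    if compound >= pos_threshold:
        return "positive"
    if compound <= neg_threshold:
        return "negative"
    return "neutral"


# Compound scores taken from rows shown in the table outputs
for text, compound in [("good", 0.4404), ("put works", 0.0), ("lock damaged", -0.4404)]:
    print(f"{text!r}: {classify_compound(compound)}")
```

Since this project targets two classes only, a neutral label would still have to be mapped to one side or dropped; the sketch simply makes that decision explicit.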

In [58]:
# Apply VADER sentiment analysis to the Cleaned_Feedback_translated column

cleaned_ecomm_review['vader_sentiment'] = cleaned_ecomm_review['Cleaned_Feedback_translated'].apply(vader_sentiment)

In [59]:
cleaned_ecomm_review

Unnamed: 0,Feedback_translated,Rating,Cleaned_Feedback_translated,Feedback_tokens,SentimentScore,NormalizedRating,CombinedScore,Sentiment,Sentiment_Encoded,vader_sentiment
0,Very good packaging well protected but not yet...,100,good packaging well protected yet mounted,"[good, packaging, well, protected, yet, mounted]",0.7845,1.0,0.89225,positive,1,positive
1,"lights are extremely bright, we used 1.2v batt...",60,lights extremely bright used v battery instead...,"[lights, extremely, bright, used, v, battery, ...",0.7579,0.0,0.37895,positive,1,positive
2,I like it very much for my son. It is as the d...,100,like much son description put batteries thank,"[like, much, son, description, put, batteries,...",0.6124,1.0,0.80620,positive,1,positive
3,"corresponds to the description, fast delivery,...",100,corresponds description fast delivery well pac...,"[corresponds, description, fast, delivery, wel...",0.5574,1.0,0.77870,positive,1,positive
4,As described. Good quality. Batteries not incl...,100,described good quality batteries included fast...,"[described, good, quality, batteries, included...",0.4404,1.0,0.72020,positive,1,positive
...,...,...,...,...,...,...,...,...,...,...
2146,The lock is damaged.,40,lock damaged,"[lock, damaged]",-0.4404,-0.5,-0.47020,negative,0,negative
2183,I recommend all the old man to the seller.,100,recommend old man seller,"[recommend, old, man, seller]",0.3612,1.0,0.68060,positive,1,positive
2184,"Put it, it works.",100,put works,"[put, works]",0.0000,1.0,0.50000,positive,1,negative
2187,Very good,100,good,[good],0.4404,1.0,0.72020,positive,1,positive


`Custom Models`:
`Naive Bayes Model` - preferred for sentiment analysis due to its simplicity, speed, and effectiveness in handling high-dimensional text data, like BoW or TF-IDF features.
- Import relevant Libraries
- Initialize and Train the Naive Bayes model (BoW and TF-IDF)
- Make predictions on the test set (BoW and TF-IDF)
- (Optional, commented out in the code) Transform the cleaned feedback into numerical features with the already-fitted vectorizers: Bag of Words via CountVectorizer and TF-IDF via TfidfVectorizer.
- (Optional, commented out in the code) Predict sentiment for every review with both Naive Bayes models and store the predictions in new 'BoW' and 'TF-IDF' columns of the cleaned_ecomm_review DataFrame.

In [61]:
# Import relevant Libraries
from sklearn.naive_bayes import MultinomialNB

In [62]:
# Initialize and Train the Naive Bayes model on BoW
nb_bow = MultinomialNB()
nb_bow.fit(X_train_bow, y_train_bow)

# Make predictions on the test set (BoW)
y_pred_bow = nb_bow.predict(X_test_bow)

In [63]:
# Initialize and Train the Naive Bayes model on TF-IDF
nb_tfidf = MultinomialNB()
nb_tfidf.fit(X_train_tfidf, y_train_tfidf)

# Make predictions on the test set (TF-IDF)
y_pred_tfidf = nb_tfidf.predict(X_test_tfidf)
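
To show what a multinomial Naive Bayes classifier does conceptually, here is a compact plain-Python sketch with add-one (Laplace) smoothing, trained on a handful of made-up reviews. It is a teaching aid under those assumptions, not a replacement for the scikit-learn `MultinomialNB` models trained above; the class name, the toy data and the whitespace tokenization are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes_ = sorted(set(labels))
        self.vocab_ = {tok for doc in docs for tok in doc.split()}
        self.word_counts_ = defaultdict(Counter)
        class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts_[label].update(doc.split())
        n_docs = len(docs)
        # Log prior: share of training documents in each class
        self.log_prior_ = {c: math.log(class_counts[c] / n_docs) for c in self.classes_}
        # Total token count per class, used in the likelihood denominator
        self.totals_ = {c: sum(self.word_counts_[c].values()) for c in self.classes_}
        return self

    def _log_likelihood(self, token, label):
        num = self.word_counts_[label][token] + self.alpha
        den = self.totals_[label] + self.alpha * len(self.vocab_)
        return math.log(num / den)

    def predict(self, docs):
        predictions = []
        for doc in docs:
            # Tokens never seen in training are ignored
            tokens = [t for t in doc.split() if t in self.vocab_]
            scores = {
                c: self.log_prior_[c] + sum(self._log_likelihood(t, c) for t in tokens)
                for c in self.classes_
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


train_docs = ["good quality fast delivery", "works well good", "lock damaged", "faulty product damaged"]
train_labels = [1, 1, 0, 0]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["good delivery", "damaged lock"]))
```

The smoothing term keeps a word that never occurred in one class from driving that class's probability to zero, which matters with a vocabulary of a few thousand terms and fewer than a hundred negative reviews.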

In [64]:
# Transform the cleaned feedback translated into numerical feature representations: Bag of Words format using fitted CountVectorizer and TF-IDF format using fitted TfidfVectorizer.
# bow_test = bow_vectorizer.transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])
# tfidf_test = tfidf_vectorizer.transform(cleaned_ecomm_review['Cleaned_Feedback_translated'])

When uncommented, the code above transforms the cleaned feedback into Bag of Words and TF-IDF feature matrices using the already-fitted vectorizers, so that the trained models can predict sentiment for every review.

In [66]:
# Predict sentiments using the Naive Bayes model for both Bag of Words and TF-IDF features, and update the DataFrame with the new sentiment predictions in 'BoW' and 'TF-IDF' columns. 
# cleaned_ecomm_review.loc[:, 'BoW'] = nb_bow.predict(bow_test)
# cleaned_ecomm_review.loc[:, 'TF-IDF'] = nb_tfidf.predict(tfidf_test)
# cleaned_ecomm_review

# 5. Model Evaluation
- Assess model performance using accuracy and F1 score metrics.
- Optionally, conduct hyperparameter tuning for improved performance.

#### Assess model performance using accuracy and F1 score metrics.
- Import necessary libraries for evaluation
- Evaluate VADER Model Prediction
- Evaluate Naive Bayes (BoW) Model
- Evaluate Naive Bayes (TF-IDF) Model
- Create a DataFrame to consolidate the evaluation metrics (accuracy and F1 score) for the three models: (VADER, Naive Bayes with BoW, and Naive Bayes with TF-IDF).
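
For readers less familiar with these metrics, the following plain-Python sketch computes accuracy and a support-weighted F1 score on a tiny made-up example. It mirrors the definition behind `f1_score(..., average='weighted')` (per-class F1 averaged by class support) but is only an illustration; the helper names and the toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    return sum(f1_for_class(y_true, y_pred, c) * n for c, n in support.items()) / len(y_true)


# Toy imbalanced example: 8 positives, 2 negatives; the model misses one negative
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

print(f"accuracy:    {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
print(f"F1 (negative class only): {f1_for_class(y_true, y_pred, 0):.3f}")
```

On imbalanced data the weighted average is dominated by the majority class, so the per-class rows of the classification report are worth reading alongside the headline figures.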

In [69]:
# Import necessary libraries for evaluation
from sklearn.metrics import accuracy_score, f1_score, classification_report

In [74]:
# Evaluate VADER Model Prediction
# Note: VADER is scored on the full dataset against labels that were partly derived from VADER itself,
# whereas the Naive Bayes models below are scored on the held-out test set
vader_accuracy = (accuracy_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])) * 100              # Calculate accuracy - measures how many predictions were correct.
vader_f1 = (f1_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'], average='weighted')) * 100      # Calculate F1 score - a weighted average of precision and recall, useful in imbalanced datasets.
vader_classification_report = classification_report(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])  # Build the report before printing it
print("Evaluation for Vader:")
print("VADER Classification Report:")
print(vader_classification_report)

# Evaluate Naive Bayes (BoW) Model
print("Evaluation for Naive Bayes (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test_bow, y_pred_bow))
bow_accuracy = (accuracy_score(y_test_bow, y_pred_bow)) * 100 
bow_f1 = (f1_score(y_test_bow, y_pred_bow, average='weighted')) * 100

# Evaluate Naive Bayes (TF-IDF) Model
print("Evaluation for Naive Bayes (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test_tfidf, y_pred_tfidf))
tfidf_accuracy = (accuracy_score(y_test_tfidf, y_pred_tfidf)) * 100  # Calculate accuracy
tfidf_f1 = (f1_score(y_test_tfidf, y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# Create a DataFrame to consolidate the evaluation metrics (accuracy and F1 score) for the three models: (VADER, Naive Bayes with BoW, and Naive Bayes with TF-IDF).
# Create a dictionary with the model names and their corresponding accuracy and F1 scores
evaluation_metrics = {
    'Model': ['VADER', 'Naive Bayes (BoW)', 'Naive Bayes (TF-IDF)'],
    'Accuracy': [vader_accuracy, bow_accuracy, tfidf_accuracy],
    'F1 Score': [vader_f1, bow_f1, tfidf_f1]
}

# Convert the dictionary into a DataFrame
evaluation_data = pd.DataFrame(evaluation_metrics)

# Define a function to format the numbers as percentages
def format_percentage(value):
    return f'{value:.2f}%'

# Use apply with the custom function for both columns
evaluation_data['Accuracy'] = evaluation_data['Accuracy'].apply(format_percentage)
evaluation_data['F1 Score'] = evaluation_data['F1 Score'].apply(format_percentage)

# print output
evaluation_data

Evaluation for Vader:
VADER Classification Report:
              precision    recall  f1-score   support

    negative       0.32      0.80      0.46        93
    positive       0.98      0.87      0.92      1213

    accuracy                           0.87      1306
   macro avg       0.65      0.83      0.69      1306
weighted avg       0.94      0.87      0.89      1306

Evaluation for Naive Bayes (BoW):
Classification Report (BoW):
              precision    recall  f1-score   support

           0       0.43      0.16      0.23        19
           1       0.94      0.98      0.96       243

    accuracy                           0.92       262
   macro avg       0.68      0.57      0.60       262
weighted avg       0.90      0.92      0.91       262

Evaluation for Naive Bayes (TF-IDF):
Classification Report (TF-IDF):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        19
           1       0.93      1.00      0.96       243


| | Model | Accuracy | F1 Score |
|---|---|---|---|
| 0 | VADER | 86.60% | 89.04% |
| 1 | Naive Bayes (BoW) | 92.37% | 90.70% |
| 2 | Naive Bayes (TF-IDF) | 92.75% | 89.26% |


#### Summary of Model Evaluations
`The evaluations of the VADER sentiment analysis model and two Naive Bayes models (Bag of Words and TF-IDF) reveal each model’s strengths and limitations in classifying negative and positive sentiments:`

VADER Performance:
- Negative: Precision: 32%, Recall: 80%, F1-Score: 46%.
- Positive: Precision: 98%, Recall: 87%, F1-Score: 92%.
- Overall Accuracy: 87%, Macro F1: 69%, Weighted F1: 89%
- Insight: VADER performs well on positive sentiment and recalls most negative reviews (80%), but its low negative precision (32%) means many positive reviews are misclassified as negative.
---
Naive Bayes (Bag of Words) Performance:
- Negative: Precision: 43%, Recall: 16%, F1-Score: 23%.
- Positive: Precision: 94%, Recall: 98%, F1-Score: 96%.
- Overall Accuracy: 92%, Macro F1: 60%, Weighted F1: 91%.
- Insight: This model achieves high accuracy and precision with positive sentiment but struggles with negative sentiment due to low recall.
---
Naive Bayes (TF-IDF) Performance:
- Negative: Precision: 0%, Recall: 0%, F1-Score: 0%.
- Positive: Precision: 93%, Recall: 100%, F1-Score: 96%.
- Overall Accuracy: 93%, Macro F1: 48%, Weighted F1: 89%.
- Insight: The TF-IDF model accurately predicts positive sentiment but completely fails to identify negative sentiment, which pulls its macro F1 score down to 48%.
---
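The gap between accuracy, macro F1, and weighted F1 can be reproduced by hand. The sketch below assumes a classifier that predicts "positive" for every one of the 262 test reviews, which is what the Naive Bayes (TF-IDF) report implies (0% negative recall and 92.75% accuracy, i.e. 243/262). The helper name `f1` is illustrative only.

```python
# Class supports taken from the test-set reports above.
n_negative, n_positive = 19, 243
total = n_negative + n_positive


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Positive class: all 243 positives are found, but the 19 negatives
# are also labelled positive, which lowers precision.
precision_pos = n_positive / total
recall_pos = 1.0
f1_pos = f1(precision_pos, recall_pos)

# Negative class: never predicted, so precision and recall are both 0.
f1_neg = f1(0.0, 0.0)

accuracy = n_positive / total
macro_f1 = (f1_neg + f1_pos) / 2                                    # unweighted mean
weighted_f1 = (n_negative * f1_neg + n_positive * f1_pos) / total   # support-weighted mean

print(f"accuracy    = {accuracy:.2%}")     # 92.75%
print(f"macro F1    = {macro_f1:.2%}")     # 48.12%
print(f"weighted F1 = {weighted_f1:.2%}")  # 89.26%
```
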
`Given that all models accurately identify positive sentiment but show limitations in detecting negative sentiment, likely due to class imbalance of the data, we will train other models such as:`
- Logistic Regression
- XGBoost
- Optionally, conduct hyperparameter tuning for improved performance.

#### Logistic Regression Model.
- Import necessary libraries
- Train the model and make predictions on BoW and TF-IDF
#### XGBoost Model.
- Import necessary libraries
- Train the model and make predictions on BoW and TF-IDF
#### Evaluate All Models.

In [76]:
# Import necessary libraries
from sklearn.linear_model import LogisticRegression
from sklearn.utils import class_weight
import xgboost as xgb

In [78]:
# Initialize, Train, and Predict the Logistic Regression model on BoW
log_reg_bow = LogisticRegression(class_weight='balanced')                    # Initialize
log_reg_bow.fit(X_train_bow, y_train_bow)                                    # Train
log_reg_y_pred_bow = log_reg_bow.predict(X_test_bow)                         # Predict

In [80]:
# Initialize, Train, and Predict the Logistic Regression model on TF-IDF
log_reg_tfidf = LogisticRegression(class_weight='balanced')                  # Initialize
log_reg_tfidf.fit(X_train_tfidf, y_train_tfidf)                              # Train
log_reg_y_pred_tfidf = log_reg_tfidf.predict(X_test_tfidf)                   # Predict

In [82]:
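# Calculate scale_pos_weight for the XGBoost models
# XGBoost's documentation suggests sum(negative instances) / sum(positive instances),
# where the "positive" class is label 1 (here positive sentiment, the majority class).
# Note: these counts are the class supports of the test split; deriving them from y_train_bow would avoid relying on test data.
positive_count = 243
negative_count = 19
scale_pos_weight = negative_count / positive_count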

# Initialize for XGBoost Model
xgb_bow = xgb.XGBClassifier(scale_pos_weight=scale_pos_weight)
xgb_tfidf = xgb.XGBClassifier(scale_pos_weight=scale_pos_weight)
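
The ratio can also be derived from the label array itself instead of hard-coded counts. The following is a minimal sketch: `compute_scale_pos_weight` is a hypothetical helper, and the label list merely mirrors the 243/19 split above (in practice one would pass the training labels, for example `y_train_bow`).

```python
from collections import Counter


def compute_scale_pos_weight(labels, positive_label=1):
    """Return (# samples of the other class) / (# samples of positive_label)."""
    counts = Counter(labels)
    n_pos = counts[positive_label]
    n_neg = sum(counts.values()) - n_pos
    if n_pos == 0:
        raise ValueError("no positive-class samples in labels")
    return n_neg / n_pos


# Stand-in labels with the same class counts as the hard-coded values above.
example_labels = [1] * 243 + [0] * 19
ratio = compute_scale_pos_weight(example_labels)
print(f"scale_pos_weight = {ratio:.4f}")  # 0.0782
```

A value below 1 down-weights the majority class (label 1), which is the intended effect when positive reviews dominate.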

In [84]:
# Train and Predict the XGBoost model on BoW
xgb_bow.fit(X_train_bow, y_train_bow)                                   # Train
xgb_y_pred_bow = xgb_bow.predict(X_test_bow)                            # Predict

In [86]:
# Train and Predict the XGBoost model on TF-IDF
xgb_tfidf.fit(X_train_tfidf, y_train_tfidf)                           # Train
xgb_y_pred_tfidf = xgb_tfidf.predict(X_test_tfidf)                    # Predict

In [88]:
# Evaluate the models
# VADER Model Prediction
print("Evaluation for Vader:")
print("VADER Classification Report:")
# Build the report first so the variable exists before it is printed.
vader_classification_report = classification_report(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])
print(vader_classification_report)
vader_accuracy = (accuracy_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])) * 100             # Accuracy: share of predictions that were correct.
vader_f1 = (f1_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'], average='weighted')) * 100     # Weighted F1: per-class F1 (harmonic mean of precision and recall) averaged by class support.

# Naive Bayes (BoW) Model
print("Evaluation for Naive Bayes (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test_bow, y_pred_bow))
bow_accuracy = (accuracy_score(y_test_bow, y_pred_bow)) * 100 
bow_f1 = (f1_score(y_test_bow, y_pred_bow, average='weighted')) * 100

# Naive Bayes (TF-IDF) Model
print("Evaluation for Naive Bayes (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test_tfidf, y_pred_tfidf))
tfidf_accuracy = (accuracy_score(y_test_tfidf, y_pred_tfidf)) * 100  # Calculate accuracy
tfidf_f1 = (f1_score(y_test_tfidf, y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# Logistic Regression (BoW) Model
print("Evaluation for Logistic Regression (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test_bow, log_reg_y_pred_bow))
log_reg_bow_accuracy = (accuracy_score(y_test_bow, log_reg_y_pred_bow)) * 100 
log_reg_bow_f1 = (f1_score(y_test_bow, log_reg_y_pred_bow, average='weighted')) * 100

# Logistic Regression (TF-IDF) Model
print("Evaluation for Logistic Regression (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test_tfidf, log_reg_y_pred_tfidf))
log_reg_tfidf_accuracy = (accuracy_score(y_test_tfidf, log_reg_y_pred_tfidf)) * 100  # Calculate accuracy
log_reg_tfidf_f1 = (f1_score(y_test_tfidf, log_reg_y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# XGBoost (BoW) Model
print("Evaluation for XGBoost (BoW):")
print("Classification Report (BoW):")
print(classification_report(y_test_bow, xgb_y_pred_bow))
xgb_bow_accuracy = (accuracy_score(y_test_bow, xgb_y_pred_bow)) * 100 
xgb_bow_f1 = (f1_score(y_test_bow, xgb_y_pred_bow, average='weighted')) * 100

# XGBoost (TF-IDF) Model
print("Evaluation for XGBoost (TF-IDF):")
print("Classification Report (TF-IDF):")
print(classification_report(y_test_tfidf, xgb_y_pred_tfidf))
xgb_tfidf_accuracy = (accuracy_score(y_test_tfidf, xgb_y_pred_tfidf)) * 100  # Calculate accuracy
xgb_tfidf_f1 = (f1_score(y_test_tfidf, xgb_y_pred_tfidf, average='weighted')) * 100  # Calculate F1 score

# Create a DataFrame to consolidate the evaluation metrics (accuracy and F1 score) for all seven models: VADER plus Naive Bayes, Logistic Regression, and XGBoost, each with BoW and TF-IDF features.
# Create a dictionary with the model names and their corresponding accuracy and F1 scores
evaluation_metrics1 = {
    'Model': ['VADER', 'Naive Bayes (BoW)', 'Naive Bayes (TF-IDF)', 'Logistic Regression (BoW)', 'Logistic Regression (TF-IDF)', 'XGBoost (BoW)', 'XGBoost (TF-IDF)'],
    'Accuracy': [vader_accuracy, bow_accuracy, tfidf_accuracy, log_reg_bow_accuracy, log_reg_tfidf_accuracy, xgb_bow_accuracy, xgb_tfidf_accuracy],
    'F1 Score': [vader_f1, bow_f1, tfidf_f1, log_reg_bow_f1, log_reg_tfidf_f1, xgb_bow_f1, xgb_tfidf_f1]
}

# Convert the dictionary into a DataFrame
evaluation_data1 = pd.DataFrame(evaluation_metrics1)

# Define a function to format the numbers as percentages
def format_percentage(value):
    return f'{value:.2f}%'

# Use apply with the custom function for both columns
evaluation_data1['Accuracy'] = evaluation_data1['Accuracy'].apply(format_percentage)
evaluation_data1['F1 Score'] = evaluation_data1['F1 Score'].apply(format_percentage)

# print output
evaluation_data1

Evaluation for Vader:
VADER Classification Report:
              precision    recall  f1-score   support

    negative       0.32      0.80      0.46        93
    positive       0.98      0.87      0.92      1213

    accuracy                           0.87      1306
   macro avg       0.65      0.83      0.69      1306
weighted avg       0.94      0.87      0.89      1306

Evaluation for Naive Bayes (BoW):
Classification Report (BoW):
              precision    recall  f1-score   support

           0       0.43      0.16      0.23        19
           1       0.94      0.98      0.96       243

    accuracy                           0.92       262
   macro avg       0.68      0.57      0.60       262
weighted avg       0.90      0.92      0.91       262

Evaluation for Naive Bayes (TF-IDF):
Classification Report (TF-IDF):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        19
           1       0.93      1.00      0.96       243


| | Model | Accuracy | F1 Score |
|---|---|---|---|
| 0 | VADER | 86.60% | 89.04% |
| 1 | Naive Bayes (BoW) | 92.37% | 90.70% |
| 2 | Naive Bayes (TF-IDF) | 92.75% | 89.26% |
| 3 | Logistic Regression (BoW) | 89.31% | 88.38% |
| 4 | Logistic Regression (TF-IDF) | 88.93% | 88.15% |
| 5 | XGBoost (BoW) | 74.05% | 79.99% |
| 6 | XGBoost (TF-IDF) | 72.52% | 78.96% |


#### Summary of All Models Evaluation:
1. `VADER` performs strongly in positive sentiment classification with a high F1-score of 92%. However, it struggles with negative sentiment, achieving only 46% in F1-score. This imbalance, despite its high overall accuracy (87%), limits VADER's reliability for balanced sentiment classification.

2. `Naive Bayes (BoW)` demonstrates strong positive sentiment detection (F1-score of 96%) but falls short on negative sentiment, with a low recall of 16% and an F1-score of 23%. The model’s high overall accuracy (92%) is influenced primarily by positive sentiment, making it less effective for balanced sentiment tasks.

3. `Naive Bayes (TF-IDF)` has significant limitations in negative sentiment detection, with 0% in precision, recall, and F1-score for negatives. Its high accuracy (93%) reflects strong positive sentiment performance, yet this extreme imbalance makes it unsuitable for contexts requiring reliable detection of both sentiment classes.

4. `Logistic Regression (BoW)` shows balanced but limited performance in detecting both sentiment classes, particularly for negatives, where it achieves only 11% recall and 12% in F1-score. However, it achieves an overall F1-score of 88% due to strong positive sentiment detection. With an accuracy of 89%, Logistic Regression (BoW) provides a more balanced option compared to models like Naive Bayes or XGBoost, though its negative sentiment detection remains limited.

5. `Logistic Regression (TF-IDF)` has similar results to the BoW model, with comparable negative-class recall (11%) and F1-score (12%), along with a strong F1-score of 94% for positives. It holds a solid accuracy of 89%, making it one of the more balanced models for sentiment analysis, though, like the BoW variant, it could benefit from tuning to improve negative sentiment classification.

6. `XGBoost (BoW)` performs well with positive sentiment detection (F1-score of 84%), but its negative sentiment detection remains weak (F1-score of 24%), resulting in a lower overall accuracy of 74%. This shows it may not be the best fit for balanced sentiment analysis.

7. `XGBoost (TF-IDF)` follows a similar trend, with a slight improvement in negative sentiment detection (F1-score of 25%) but an overall accuracy of 73%, indicating a limited balance across sentiment classes.

`Overall Insight:`
Logistic Regression (both BoW and TF-IDF) provides the best balance across both sentiment classes among all models tested, with reasonable accuracy and F1-scores across classes. While VADER achieves strong accuracy, its imbalance between positive and negative sentiment limits its suitability for tasks needing balanced sentiment detection. Naive Bayes and XGBoost models are notably strong in positive sentiment detection but have difficulty reliably identifying negative sentiment, making them less suitable for balanced sentiment classification tasks.

#### Next Steps.
1. Hyperparameter Tuning: Focus on tuning Logistic Regression (BoW vs. TF-IDF) and XGBoost (BoW vs. TF-IDF) to improve detection, particularly for negative sentiment.
2. Final model selection after tuning should be made from:
- VADER (untuned as it’s lexicon-based),
- Logistic Regression (BoW) or Logistic Regression (TF-IDF),
- XGBoost (BoW) or XGBoost (TF-IDF).
3. Deployment: Proceed with the most balanced model after tuning and evaluation for optimal sentiment classification across both positive and negative sentiments.

# Hyperparameter Tuning
To conduct hyperparameter tuning on Logistic Regression (BoW/TF-IDF) and XGBoost (BoW/TF-IDF) models, we'll use GridSearchCV, which helps find the optimal combination of hyperparameters for each model.
- Import the relevant libraries
---
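As a rough illustration of the GridSearchCV workflow described above, the sketch below tunes a Logistic Regression on synthetic, imbalanced data and reads back the winning settings. The data, the reduced grid, and the `demo_`/`tuned_model` names are assumptions made for this example; they are not the project's features or results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: roughly 10% class 0 and 90% class 1.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, weights=[0.1, 0.9], random_state=42
)

# Exhaustively try every combination in param_grid with 5-fold CV,
# ranking candidates by support-weighted F1.
demo_grid = GridSearchCV(
    estimator=LogisticRegression(class_weight="balanced", solver="liblinear"),
    param_grid={"C": [0.1, 1, 10], "penalty": ["l1", "l2"]},
    scoring="f1_weighted",
    cv=5,
)
demo_grid.fit(X_demo, y_demo)

print("Best parameters:", demo_grid.best_params_)
print("Best mean CV weighted F1:", round(demo_grid.best_score_, 3))

# With the default refit=True, best_estimator_ is retrained on all of X_demo.
tuned_model = demo_grid.best_estimator_
```

Printing `best_params_` alongside the test metrics makes it easier to see which settings produced a given result.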
#### Step-by-Step Plan for Hyperparameter Tuning:
`Logistic Regression (BoW and TF-IDF):`
- Hyperparameters to Tune:
  - C: Inverse regularization strength (try different values, e.g., [0.01, 0.1, 1, 10, 100])
  - penalty: Regularization type (e.g., ['l1', 'l2'])
  - solver: Optimization algorithm (e.g., ['liblinear', 'saga'])
  - max_iter: Maximum Iteration (e.g., [100, 200, 300])
---
`XGBoost (BoW and TF-IDF):`
- Hyperparameters to Tune:
  - n_estimators: Number of boosting rounds (try different values, e.g., [50, 100, 150, 200])
  - max_depth: Maximum depth of a tree (try different values, e.g., [3, 5, 7])
  - learning_rate: Step size shrinkage (e.g., [0.01, 0.1, 0.2])
  - subsample: Fraction of samples used for training (e.g., [0.6, 0.8, 1.0])
  - colsample_bytree: Fraction of features used for each tree (e.g., [0.6, 0.8, 1.0])
---
The candidate values above follow common starting points for Logistic Regression and XGBoost, balancing model complexity, regularization, and learning behavior, and they can be adjusted for other datasets. To keep the search fast, the XGBoost grid used in the code below is a reduced subset of these values.
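
To get a feel for the cost of an exhaustive search, the sketch below counts the combinations in the two grids listed in the plan and multiplies by the number of folds. The helper `count_fits` is hypothetical, and 5 folds is assumed to match the `cv = 5` setting used in the tuning code.

```python
from math import prod

# Candidate values copied from the step-by-step plan above.
log_reg_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear", "saga"],
    "max_iter": [100, 200, 300],
}
xgb_grid = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.2],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}


def count_fits(param_grid, n_folds):
    """Return (number of combinations, number of cross-validation fits)."""
    n_combinations = prod(len(values) for values in param_grid.values())
    return n_combinations, n_combinations * n_folds


print("Logistic Regression:", count_fits(log_reg_grid, n_folds=5))  # (60, 300)
print("XGBoost:", count_fits(xgb_grid, n_folds=5))                  # (324, 1620)
```

The full XGBoost grid needs several times more fits than the Logistic Regression grid, which is one reason to shrink it before running the search.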

In [92]:
# Import Relevant libraries
from sklearn.model_selection import GridSearchCV

In [94]:
# Define parameter grids
# Define simplified parameter grids to reduce the number of combinations and speed up GridSearchCV
log_reg_params = {
    'C': [0.01, 0.1, 1, 10, 100],  # Regularization strength for Logistic Regression; smaller values represent stronger regularization
    'penalty': ['l1', 'l2'],  # Regularization type: 'l1' (Lasso) can zero out coefficients; 'l2' (Ridge) shrinks them without making them sparse
    'solver': ['liblinear', 'saga'],  # Optimization solvers that support both 'l1' and 'l2'; 'liblinear' suits small datasets, 'saga' larger ones
    'max_iter': [100, 200, 300]  # Maximum number of solver iterations; higher values give the solver more room to converge
}

xgb_params = {
    'n_estimators': [50, 100],  # Number of trees (boosting rounds) in XGBoost                              
    'max_depth': [3, 5],  # Maximum depth of trees to control complexity and prevent overfitting                           
    'learning_rate': [0.1],  # Step size shrinkage to make the boosting process more conservative
    'subsample': [0.8],  # Fraction of samples used for training each tree to reduce overfitting  
    'colsample_bytree': [0.8]  # Fraction of features used for building each tree to increase diversity among trees
}


# Initialize models
log_reg_bow = LogisticRegression(class_weight = 'balanced')
log_reg_tfidf = LogisticRegression(class_weight = 'balanced')
xgb_bow = xgb.XGBClassifier(scale_pos_weight = scale_pos_weight)
xgb_tfidf = xgb.XGBClassifier(scale_pos_weight = scale_pos_weight)

# Use 5-fold cross-validation (cv = 5) with weighted F1 as the selection metric
# GridSearchCV for Logistic Regression (BoW)
grid_log_reg_bow = GridSearchCV(log_reg_bow, log_reg_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on Logistic Regression using GridSearchCV for BoW features with F1-weighted scoring and 5-fold CV
grid_log_reg_bow.fit(X_train_bow, y_train_bow)  # Fit the model on the BoW training data and labels

# GridSearchCV for Logistic Regression (TF-IDF)
grid_log_reg_tfidf = GridSearchCV(log_reg_tfidf, log_reg_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on Logistic Regression using GridSearchCV for TF-IDF features
grid_log_reg_tfidf.fit(X_train_tfidf, y_train_tfidf)  # Fit the model on the TF-IDF training data and labels

# GridSearchCV for XGBoost (BoW)
grid_xgb_bow = GridSearchCV(xgb_bow, xgb_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on XGBoost using BoW features
grid_xgb_bow.fit(X_train_bow, y_train_bow)  # Fit the XGBoost model on the BoW training data

# GridSearchCV for XGBoost (TF-IDF)
grid_xgb_tfidf = GridSearchCV(xgb_tfidf, xgb_params, scoring='f1_weighted', cv = 5)  # Perform hyperparameter tuning on XGBoost using TF-IDF features
grid_xgb_tfidf.fit(X_train_tfidf, y_train_tfidf)  # Fit the XGBoost model on the TF-IDF training data

# Get the best models from the grid search
best_log_reg_bow = grid_log_reg_bow.best_estimator_  # Get the best-tuned Logistic Regression model for BoW
best_log_reg_tfidf = grid_log_reg_tfidf.best_estimator_  # Get the best-tuned Logistic Regression model for TF-IDF
best_xgb_bow = grid_xgb_bow.best_estimator_  # Get the best-tuned XGBoost model for BoW
best_xgb_tfidf = grid_xgb_tfidf.best_estimator_  # Get the best-tuned XGBoost model for TF-IDF

# Evaluate on the test set
log_reg_bow_pred = best_log_reg_bow.predict(X_test_bow)  # Predict using the best Logistic Regression (BoW) model
log_reg_tfidf_pred = best_log_reg_tfidf.predict(X_test_tfidf)  # Predict using the best Logistic Regression (TF-IDF) model
xgb_bow_pred = best_xgb_bow.predict(X_test_bow)  # Predict using the best XGBoost (BoW) model
xgb_tfidf_pred = best_xgb_tfidf.predict(X_test_tfidf)  # Predict using the best XGBoost (TF-IDF) model


# Performance metrics for each model
# VADER Model Prediction
print("Evaluation for Vader:")
print("VADER Classification Report:")
# Build the report first so the variable exists before it is printed.
vader_classification_report = classification_report(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])
print(vader_classification_report)
vader_accuracy = (accuracy_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'])) * 100             # Accuracy: share of predictions that were correct.
vader_f1 = (f1_score(cleaned_ecomm_review['Sentiment'], cleaned_ecomm_review['vader_sentiment'], average='weighted')) * 100     # Weighted F1: per-class F1 (harmonic mean of precision and recall) averaged by class support.

# Logistic Regression (BoW) Model
print("Evaluation for Logistic Regression (BoW) after Hyperparameter Tuning:")
print("Classification Report (BoW):")  # Display classification report for Log Reg (BoW) predictions
print(classification_report(y_test_bow, log_reg_bow_pred))  # Print precision, recall, F1 score for each class
log_reg_bow_accuracy = (accuracy_score(y_test_bow, log_reg_bow_pred)) * 100  # Calculate accuracy for Logistic Regression (BoW)
log_reg_bow_f1 = (f1_score(y_test_bow, log_reg_bow_pred, average='weighted')) * 100  # Calculate weighted F1 score for Logistic Regression (BoW)

# Logistic Regression (TF-IDF) Model
print("Evaluation for Logistic Regression (TF-IDF) after Hyperparameter Tuning:")
print("Classification Report (TF-IDF):")  # Display classification report for Log Reg (TF-IDF) predictions
print(classification_report(y_test_tfidf, log_reg_tfidf_pred))  # Print precision, recall, F1 score for each class
log_reg_tfidf_accuracy = (accuracy_score(y_test_tfidf, log_reg_tfidf_pred)) * 100  # Calculate accuracy for Logistic Regression (TF-IDF)
log_reg_tfidf_f1 = (f1_score(y_test_tfidf, log_reg_tfidf_pred, average='weighted')) * 100  # Calculate weighted F1 score for Logistic Regression (TF-IDF)

# XGBoost (BoW) Model
print("Evaluation for XGBoost (BoW) after Hyperparameter Tuning:")
print("Classification Report (BoW):")  # Display classification report for XGB (BoW) predictions
print(classification_report(y_test_bow, xgb_bow_pred))  # Print precision, recall, F1 score for each class
xgb_bow_accuracy = (accuracy_score(y_test_bow, xgb_bow_pred)) * 100  # Calculate accuracy for XGBoost (BoW)
xgb_bow_f1 = (f1_score(y_test_bow, xgb_bow_pred, average='weighted')) * 100  # Calculate weighted F1 score for XGBoost (BoW)

# XGBoost (TF-IDF) Model
print("Evaluation for XGBoost (TF-IDF) after Hyperparameter Tuning:")
print("Classification Report (TF-IDF):")  # Display classification report for XGB (TF-IDF) predictions
print(classification_report(y_test_tfidf, xgb_tfidf_pred))  # Print precision, recall, F1 score for each class
xgb_tfidf_accuracy = (accuracy_score(y_test_tfidf, xgb_tfidf_pred)) * 100  # Calculate accuracy for XGBoost (TF-IDF)
xgb_tfidf_f1 = (f1_score(y_test_tfidf, xgb_tfidf_pred, average='weighted')) * 100  # Calculate weighted F1 score for XGBoost (TF-IDF)

# Create a DataFrame to consolidate the evaluation metrics (accuracy and F1 score).
# Create a dictionary with the model names and their corresponding accuracy and F1 scores
evaluation_metrics2 = {
    'Model': ['VADER', 'Log Reg (BoW)', 'Log Reg (TF-IDF)', 'XGB (BoW)', 'XGB (TF-IDF)'],  # Model names
    'Accuracy': [vader_accuracy, log_reg_bow_accuracy, log_reg_tfidf_accuracy, xgb_bow_accuracy, xgb_tfidf_accuracy],  # Corresponding accuracies
    'F1 Score': [vader_f1, log_reg_bow_f1, log_reg_tfidf_f1, xgb_bow_f1, xgb_tfidf_f1]  # Corresponding F1 scores
}

# Convert the dictionary into a DataFrame
evaluation_data2 = pd.DataFrame(evaluation_metrics2)  # Create a DataFrame for evaluation metrics

# Define a function to format the numbers as percentages
def format_percentage(value):
    return f'{value:.2f}%'

# Use apply with the custom function for both columns
evaluation_data2['Accuracy'] = evaluation_data2['Accuracy'].apply(format_percentage)  # Format accuracy as percentage
evaluation_data2['F1 Score'] = evaluation_data2['F1 Score'].apply(format_percentage)  # Format F1 score as percentage

# Display the evaluation metrics DataFrame
evaluation_data2  # Show the DataFrame with formatted percentages for accuracy and F1 score

Evaluation for Vader:
VADER Classification Report:
              precision    recall  f1-score   support

    negative       0.32      0.80      0.46        93
    positive       0.98      0.87      0.92      1213

    accuracy                           0.87      1306
   macro avg       0.65      0.83      0.69      1306
weighted avg       0.94      0.87      0.89      1306

Evaluation for Logistic Regression (BoW) after Hyperparameter Tuning:
Classification Report (BoW):
              precision    recall  f1-score   support

           0       0.27      0.21      0.24        19
           1       0.94      0.95      0.95       243

    accuracy                           0.90       262
   macro avg       0.60      0.58      0.59       262
weighted avg       0.89      0.90      0.90       262

Evaluation for Logistic Regression (TF-IDF) after Hyperparameter Tuning:
Classification Report (TF-IDF):
              precision    recall  f1-score   support

           0       0.21      0.32   

| | Model | Accuracy | F1 Score |
|---|---|---|---|
| 0 | VADER | 86.60% | 89.04% |
| 1 | Log Reg (BoW) | 90.08% | 89.53% |
| 2 | Log Reg (TF-IDF) | 86.64% | 87.79% |
| 3 | XGB (BoW) | 70.23% | 77.31% |
| 4 | XGB (TF-IDF) | 70.99% | 77.86% |


#### Model Evaluation Insights After Hyperparameter Tuning
---
`1. VADER:`
- Accuracy: 86.60%
- F1 Score: 89.04%
- **Insight: Strong performance for positive sentiment but struggles with negatives. May not be suitable for balanced tasks.**
---
`2. Logistic Regression (BoW):`
- Accuracy: 90.08%
- F1 Score: 89.53%
- **Insight: Highest accuracy and weighted F1 among the tuned models. Its negative-class F1 remains modest (24%), but it is the most reliable option here for sentiment classification.**
---
`3. Logistic Regression (TF-IDF):`
- Accuracy: 86.64%
- F1 Score: 87.79%
- **Insight: Lower accuracy and weighted F1 than the BoW variant. It recalls more negatives (32% vs. 21%) but with lower negative precision (21% vs. 27%), so it may not be ideal for balanced tasks.**
---
`4. XGBoost (BoW):`
- Accuracy: 70.23%
- F1 Score: 77.31%
- **Insight: Lower accuracy and F1 scores indicate this model is less reliable for sentiment classification.**
---
`5. XGBoost (TF-IDF):`
- Accuracy: 70.99%
- F1 Score: 77.86%
- **Insight: Similar to the BoW variant, performance is lacking for balanced tasks.**
---

**Best Model Decision**
- Based on the insights from the evaluations:

  - Logistic Regression (BoW) stands out as the best candidate due to its high accuracy (90%) and strong weighted F1 score (90%), the best among the tuned models. Its negative-class F1 (24%) shows that detecting negative reviews remains its main weakness.

**Conclusion**
- **Deploy Logistic Regression (BoW), focusing on its strengths in positive sentiment detection and moderate performance with negative sentiments, making it suitable for tasks requiring a balanced approach to sentiment classification.**

# 6. Model Deployment
#### 1. Flask Deployment Steps
   - Create a project folder.
   - Set up a virtual environment: conda create -p alpha_team_sentiment_analyzer_model_deploy_venv python==3.9 scikit-learn==1.3.0 -y 
   - Activate the virtual environment: conda activate alpha_team_sentiment_analyzer_model_deploy_venv
   - Install required packages: pip install flask scikit-learn pandas numpy
   - Save your trained model and vectorizer using pickle.
   - Create an app.py file with Flask app code.
   - Define the prediction route and load the model/vectorizer in app.py.
   - Run the Flask app: python app.py
   - Test the API using cURL or Postman.
---
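The prediction route from the Flask steps above might look roughly like the sketch below. It is not the project's actual `app.py`: the `/predict` route, the `review` JSON field, the `create_app` and `load_pickle` helpers, and the mapping of label 1 to "positive" are all assumptions made for illustration.

```python
import pickle

from flask import Flask, jsonify, request


def load_pickle(path):
    """Load a pickled object (for example the saved model or vectorizer)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def create_app(model, vectorizer):
    """Build a Flask app that serves sentiment predictions for one review."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        review = str(payload.get("review", "")).strip()
        if not review:
            return jsonify(error="Field 'review' is required."), 400

        # The text should be cleaned the same way as the training data
        # before vectorizing; that preprocessing is omitted in this sketch.
        features = vectorizer.transform([review])
        label = int(model.predict(features)[0])
        return jsonify(sentiment="positive" if label == 1 else "negative")

    return app


# In app.py one might then write (file names as saved with pickle):
#   app = create_app(load_pickle("log_reg_bow_model.pkl"), load_pickle("vectorizer.pkl"))
#   app.run(debug=True)
```

With the server running, a request such as `curl -X POST -H "Content-Type: application/json" -d '{"review": "works great"}' http://127.0.0.1:5000/predict` would return a small JSON object with the predicted sentiment.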
#### 2. Streamlit Deployment Steps
   - Create a project folder.
   - Set up a virtual environment (same as for Flask).
   - Activate the virtual environment (same as for Flask).
   - Install required packages: pip install streamlit scikit-learn pandas numpy
   - Save your trained model and vectorizer using pickle.
   - Create a streamlit_app.py file with Streamlit app code.
   - Load the model/vectorizer in streamlit_app.py.
   - Define the Streamlit app layout and prediction logic.
   - Run the Streamlit app: streamlit run streamlit_app.py
   - Test the app in a web browser.
---
#### 3. Integrated Flask and Streamlit Deployment Steps
   - Create a project folder.
   - Set up a virtual environment: conda create -p alpha_team_sentiment_analyzer_model_deploy_venv python==3.9 scikit-learn==1.3.0 -y
   - Activate the virtual environment: conda activate alpha_team_sentiment_analyzer_model_deploy_venv
   - Install required packages: pip install flask streamlit scikit-learn pandas numpy
   - Save your trained model and vectorizer using pickle.
   - Create app.py for Flask deployment.
   - Load model and vectorizer.
   - Define the prediction route.
   - Create streamlit_app.py for Streamlit deployment.
   - Load model and vectorizer.
   - Define the app layout and prediction logic.
   - Run the Flask app: python app.py
   - Run the Streamlit app: streamlit run streamlit_app.py
   - Test both applications using cURL/Postman for Flask and a web browser for Streamlit.

#### Save Trained Model and Vectorizer using pickle 

In [96]:
# import pickle
import pickle

# Save the tuned Logistic Regression (BoW) model
with open('log_reg_bow_model.pkl', 'wb') as model_file:
    pickle.dump(grid_log_reg_bow, model_file)  # Pickles the fitted GridSearchCV object; its predict() delegates to the best estimator (refit=True by default)

# Saving the Vectorizer for Bag of Words from the feature engineering
with open('vectorizer.pkl', 'wb') as vectorizer_file:
    pickle.dump(bow_vectorizer, vectorizer_file)  # Save the vectorizer
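
Before wiring the pickled files into an app, it can help to confirm that a reloaded model and vectorizer reproduce the original predictions. The sketch below does this round trip on a tiny stand-in dataset in a temporary directory; the texts, labels, and `demo_` names are illustrative assumptions, not the project's data or files.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in training set (1 = positive, 0 = negative).
demo_texts = [
    "great phone works well",
    "excellent quality fast shipping",
    "terrible battery broke quickly",
    "bad screen poor quality",
]
demo_labels = [1, 1, 0, 0]

demo_vectorizer = CountVectorizer()
demo_model = LogisticRegression(class_weight="balanced")
demo_model.fit(demo_vectorizer.fit_transform(demo_texts), demo_labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "demo_model.pkl")
    vectorizer_path = os.path.join(tmp_dir, "demo_vectorizer.pkl")

    # Save both objects: the model is only usable with its own vectorizer.
    with open(model_path, "wb") as model_file:
        pickle.dump(demo_model, model_file)
    with open(vectorizer_path, "wb") as vectorizer_file:
        pickle.dump(demo_vectorizer, vectorizer_file)

    # Reload them as a deployment script would.
    with open(model_path, "rb") as model_file:
        loaded_model = pickle.load(model_file)
    with open(vectorizer_path, "rb") as vectorizer_file:
        loaded_vectorizer = pickle.load(vectorizer_file)

new_reviews = ["great quality", "terrible screen"]
original_pred = demo_model.predict(demo_vectorizer.transform(new_reviews))
loaded_pred = loaded_model.predict(loaded_vectorizer.transform(new_reviews))
print("Predictions match after reload:", bool((original_pred == loaded_pred).all()))
```

Pickled scikit-learn objects should be loaded with the same scikit-learn version that created them, which is why the deployment steps pin the version in the virtual environment.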