# NLP - Sentiment Analysis for Amazon Product Reviews
# Business Analysis - Understanding Customers' Negative Sentiments

In this data science project, we used the following methods and techniques to verify the prediction that whey protein products currently sold on Amazon's e-commerce website receive positive reviews:

- A) Bag of words approach (VADER - Valence Aware Dictionary & Sentiment Reasoner)
- B) Roberta Model - Transformer-based model available through Hugging Face
- C) BERT Neural Network (multilingual - English, Spanish, French, Italian, German)
- D) TextBlob approach
- E) Naive Bayes Algorithm approach

The results suggest that customers are happy with the whey protein products currently sold on Amazon. It is still important to understand the root causes of negative sentiment toward these supplements, so that product quality can keep improving.

## Prepare dataframes from the previous models' results (except for the Naive Bayes model)

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import math
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import re

In [2]:
# Read scraped results from Roberta vs Vader CSV
df = pd.read_csv('Whey_Protein_Results_Roberta_vs_Vader.csv')

In [3]:
# Change data type for 'Reviews' to 'string' & fill empty cells (from CSV) with 'NA'
df['Reviews'] = df['Reviews'].astype('string')
df = df.fillna('NA')
# Drop extra unnamed column
#col_0 = df.columns[0]
#df.drop(col_0, axis = 1, inplace = True)

In [4]:
df.head()

Unnamed: 0,ID,vader_neg,vader_neu,vader_pos,vader_compound,roberta_neg,roberta_neu,roberta_pos,Product_Name,Date,Rating_Score,Reviews,Link,Product_ID
0,0,0.028,0.667,0.305,0.972,0.001776,0.006046,0.992178,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-25,5.0,I love this. I make it for myself and my kids...,https://www.amazon.com/NatureWorks-HydroMATE-E...,B0BRT77ZK8
1,1,0.0,0.89,0.11,0.3818,0.003589,0.056954,0.939457,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-02-06,5.0,Takes away lightheadedness and makes my husba...,https://www.amazon.com/NatureWorks-HydroMATE-E...,B0BRT77ZK8
2,2,0.0,0.631,0.369,0.906,0.001234,0.008779,0.989986,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-27,5.0,The chocolate tastes delicious! I drink it ev...,https://www.amazon.com/NatureWorks-HydroMATE-E...,B0BRT77ZK8
3,3,0.0,0.618,0.382,0.9146,0.001667,0.006437,0.991896,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-27,5.0,I absolutely love this! My buddy gave me a fe...,https://www.amazon.com/NatureWorks-HydroMATE-E...,B0BRT77ZK8
4,4,0.013,0.873,0.114,0.9866,0.084602,0.33497,0.580428,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-02-18,4.0,I like to work out regularly. This includes w...,https://www.amazon.com/NatureWorks-HydroMATE-E...,B0BRT77ZK8
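
Each row above carries three Roberta probabilities and one VADER compound score. A minimal sketch of how these could be collapsed into a single label per model, assuming an arg-max over the Roberta probabilities and the commonly used VADER cut-offs of ±0.05 on the compound score. The function names are hypothetical and not part of the original notebook:

```python
def roberta_label(neg: float, neu: float, pos: float) -> str:
    """Return the class with the highest Roberta probability."""
    scores = {"negative": neg, "neutral": neu, "positive": pos}
    return max(scores, key=scores.get)


def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a label (0.05 is an assumed, conventional cut-off)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Values taken from row 0 of the table above
print(roberta_label(0.001776, 0.006046, 0.992178))
print(vader_label(0.972))
```

Comparing such labels against `Rating_Score` is one way to surface the mismatches examined in the later sections.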


In [5]:
# Read scraped results from TextBlob CSV
df2 = pd.read_csv('Whey_Protein_Textblob_Results.csv')

In [6]:
# Change data type for 'Reviews' to 'string' & fill empty cells (from CSV) with 'NA'
df2['Reviews'] = df2['Reviews'].astype('string')
df2 = df2.fillna('NA')
# Drop extra unnamed column
#col_0 = df2.columns[0]
#df2.drop(col_0, axis = 1, inplace = True)

In [7]:
df2.head()

Unnamed: 0,ID,Product_Name,Date,Rating_Score,textblob_polarity,textblob_subjectivity,textblob_analysis,Reviews,Product_ID
0,0,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-25,5.0,0.45625,0.628125,1,I love this. I make it for myself and my kids...,B0BRT77ZK8
1,1,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-02-06,5.0,0.0,0.4,0,Takes away lightheadedness and makes my husba...,B0BRT77ZK8
2,2,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-27,5.0,0.5,0.75,1,The chocolate tastes delicious! I drink it ev...,B0BRT77ZK8
3,3,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-27,5.0,0.40625,0.575,1,I absolutely love this! My buddy gave me a fe...,B0BRT77ZK8
4,4,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-02-18,4.0,0.171118,0.582585,1,I like to work out regularly. This includes w...,B0BRT77ZK8
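
The `textblob_analysis` column takes the values 1, 0 and -1. The sketch below shows one rule that is consistent with the five rows above, namely the sign of `textblob_polarity`. The exact rule used when the CSV was produced is not shown in this notebook, so treat this as an assumption:

```python
def textblob_analysis(polarity: float) -> int:
    """Assumed rule: 1 for positive polarity, -1 for negative, 0 otherwise."""
    if polarity > 0:
        return 1
    if polarity < 0:
        return -1
    return 0


# Polarity values from the first five rows of df2
for p in (0.45625, 0.0, 0.5, 0.40625, 0.171118):
    print(p, textblob_analysis(p))
```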


In [8]:
# Read scraped results from BERT CSV
df3 = pd.read_csv('Whey_Protein_Bert_Results.csv')

In [9]:
# Change data type for 'Reviews' to 'string' & fill empty cells (from CSV) with 'NA'
df3['Reviews'] = df3['Reviews'].astype('string')
df3 = df3.fillna('NA')
# Drop extra unnamed column
#col_0 = df3.columns[0]
#df3.drop(col_0, axis = 1, inplace = True)

In [10]:
df3.head()

Unnamed: 0,ID,Product_Name,Date,Rating_Score,bert_sentiment,Reviews,Product_ID
0,0,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-25,5.0,5,I love this. I make it for myself and my kids...,B0BRT77ZK8
1,1,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-02-06,5.0,5,Takes away lightheadedness and makes my husba...,B0BRT77ZK8
2,2,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-27,5.0,5,The chocolate tastes delicious! I drink it ev...,B0BRT77ZK8
3,3,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-01-27,5.0,5,I absolutely love this! My buddy gave me a fe...,B0BRT77ZK8
4,4,NatureWorks-HydroMATE-Electrolytes-Chocolate-C...,2023-02-18,4.0,4,I like to work out regularly. This includes w...,B0BRT77ZK8
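
Because `bert_sentiment` is on the same 1 to 5 scale as `Rating_Score`, the two can be compared directly. The sketch below uses a handful of made-up rows in place of `df3` and assumes both columns are numeric:

```python
import pandas as pd

# Illustrative rows only; in the notebook this would be df3
sample = pd.DataFrame({
    'Rating_Score': [5.0, 5.0, 4.0, 1.0, 2.0],
    'bert_sentiment': [5, 5, 4, 3, 2],
})

# Share of reviews where the model's stars equal the customer's stars
agreement = (sample['bert_sentiment'] == sample['Rating_Score']).mean()

# Confusion-style table: rows are customer ratings, columns are model predictions
table = pd.crosstab(sample['Rating_Score'], sample['bert_sentiment'])

print(f"Exact agreement: {agreement:.0%}")
print(table)
```

Off-diagonal cells of the table point to the reviews where the model and the customer disagree.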


## Roberta's Negative Sentiment Analysis

In [11]:
# Roberta - 1-star reviews, ordered from most to least 'positive' according to Roberta
# Query and sort once instead of on every loop iteration; 69 is the cutoff used originally
roberta_pos_1star = (
    df.query('Rating_Score == 1')
    .sort_values('roberta_pos', ascending=False)['Reviews']
    .head(69)
    .tolist()
)

In [None]:
print(roberta_pos_1star[0]) # Bad Taste - Vanilla # Bad Aftertaste

In [None]:
print(roberta_pos_1star[1]) # Nausea # Weird Smell - Peanut Butter

In [None]:
print(roberta_pos_1star[2]) # Expensive

In [None]:
print(roberta_pos_1star[3]) # Excessively sweet/sugary flavor

In [None]:
print(roberta_pos_1star[5]) # Cookies n cream # bad taste # Nausea

In [None]:
print(roberta_pos_1star[6]) # bad taste 

In [None]:
print(roberta_pos_1star[8]) # rare - different nutrition facts when comparing Amazon's advertising vs. Reality

In [None]:
print(roberta_pos_1star[11]) #bad taste

In [None]:
print(roberta_pos_1star[12]) #bad state

In [None]:
print(roberta_pos_1star[13]) #bad taste

In [None]:
print(roberta_pos_1star[15]) # bad taste

In [None]:
print(roberta_pos_1star[16]) # damaged package

In [None]:
print(roberta_pos_1star[17]) # Bad Texture

In [None]:
print(roberta_pos_1star[18]) # Peanut butter - Bad Taste

In [None]:
print(roberta_pos_1star[19]) # Damaged package

In [None]:
print(roberta_pos_1star[20]) # Quality compromised 

In [None]:
print(roberta_pos_1star[22]) # Damaged package - Seal damaged # 1/2 Empty

In [None]:
print(roberta_pos_1star[23]) # Damaged package - Seal damaged # 1/2 Empty

In [None]:
print(roberta_pos_1star[24]) # Bad taste - recipe change 

In [None]:
print(roberta_pos_1star[25]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[26]) # rare - different nutrition facts when comparing Amazon's advertising vs. Reality

In [None]:
print(roberta_pos_1star[27]) # Bad Taste - Strawberry n cream

In [None]:
print(roberta_pos_1star[28]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[30]) # Bad Taste - Strawberry n cream

In [None]:
print(roberta_pos_1star[31]) # Bad Taste - Vanilla

In [None]:
print(roberta_pos_1star[32]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[33]) # Nausea

In [None]:
print(roberta_pos_1star[34]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[35]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[36]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[37]) # Bad Taste - Vanilla

In [None]:
print(roberta_pos_1star[38]) # Bad Taste # Bad Batch

In [None]:
print(roberta_pos_1star[39]) # Bad Customer Service # return Issue

In [None]:
print(roberta_pos_1star[40]) # Bad Batch

In [None]:
print(roberta_pos_1star[41]) # Bad Taste - Coffee

In [None]:
print(roberta_pos_1star[42]) # Bad Taste # Nausea

In [None]:
print(roberta_pos_1star[43]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[44]) # Rare - 1/2 empty sealed product

In [None]:
print(roberta_pos_1star[45]) # Damaged package - Seal damaged

In [None]:
print(roberta_pos_1star[46]) # Rare - Unsatisfied customer with scoop's dimensions

In [None]:
print(roberta_pos_1star[47]) # Bad Texture

In [None]:
print(roberta_pos_1star[51]) # Bad Scoop Sizing

In [None]:
print(roberta_pos_1star[52]) # Bad Taste - Vanilla 

In [None]:
print(roberta_pos_1star[53]) # Bad Taste # Bad Texture

In [None]:
print(roberta_pos_1star[54]) # Bad Taste # Bad Texture # quality decrease

In [None]:
print(roberta_pos_1star[55]) # Bad Scoop Sizing

In [None]:
print(roberta_pos_1star[56]) # Bad Taste # Bad Texture

In [None]:
print(roberta_pos_1star[57]) # Bad Taste - Chocolate

In [None]:
print(roberta_pos_1star[58]) # Bad Texture

In [None]:
print(roberta_pos_1star[59]) # Bad Taste

In [None]:
print(roberta_pos_1star[60]) # Bad Texture

In [None]:
print(roberta_pos_1star[61]) # Damaged package - Seal damaged ***

In [None]:
print(roberta_pos_1star[62]) # Bad taste

In [None]:
print(roberta_pos_1star[63]) # Bad Texture

In [None]:
print(roberta_pos_1star[64]) # Bad Texture # Bad Flavor

In [None]:
print(roberta_pos_1star[65]) # Bad Texture # Bad Flavor

In [None]:
print(roberta_pos_1star[66]) # Bad Flavor

In [None]:
print(roberta_pos_1star[67]) # Bad Flavor - Peanut Butter Chocolate

In [None]:
print(roberta_pos_1star[68]) # Bad Aroma # Bad Flavor
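
The comments beside each `print` call are hand-written theme labels. Tallying them gives a quick view of which complaints recur. The sketch below does this for a few labels copied from the cells above; the `labels` list is a partial sample, not the full set:

```python
from collections import Counter

# A few of the hand-written labels from the cells above (not the full set)
labels = [
    "Bad Taste - Vanilla",
    "Nausea",
    "Expensive",
    "Damaged package - Seal damaged",
    "Bad Texture",
    "Damaged package - Seal damaged",
    "Bad Taste - Coffee",
    "Bad Texture",
]

# Keep only the theme before any ' - ' qualifier, normalised to lower case
themes = Counter(label.split(" - ")[0].strip().lower() for label in labels)

for theme, count in themes.most_common():
    print(f"{theme}: {count}")
```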

In [None]:
# Roberta - 'Negative' comment when 5-star was given by customer 
"""
roberta_neg_5star = []
for i in range(0, 2063):
    result = df.query('Rating_Score == 5') \
        .sort_values('roberta_neg', ascending = False)['Reviews'].values[i]
    roberta_neg_5star.append(result)
"""

In [None]:
# Roberta - 'Negative' comment when 1-star was given by customer
"""
roberta_neg_1star = []
for i in range(0, 69):
    result = df.query('Rating_Score == 1') \
        .sort_values('roberta_neg', ascending = False)['Reviews'].values[i]
    roberta_neg_1star.append(result)
"""

## Vader's Negative Sentiment (for-loop)

In [None]:
# Vader - 'Positive' comment when 1-star was given by customer
"""
vader_pos_1star = []
for i in range(0, 69):
    result = df.query('Rating_Score == 1') \
        .sort_values('vader_pos', ascending = False)['Reviews'].values[i]
    vader_pos_1star.append(result)
"""

In [None]:
# Vader - 'Negative' comment when 5-star was given by customer 
"""
vader_neg_5star = []
for i in range(0, 2063):
    result = df.query('Rating_Score == 5') \
        .sort_values('vader_neg', ascending = False)['Reviews'].values[i]
    vader_neg_5star.append(result)
"""

In [None]:
# Vader - 'Negative' comment when 1-star was given by customer 
"""
vader_neg_1star = []
for i in range(0, 69):
    result = df.query('Rating_Score == 1') \
        .sort_values('vader_neg', ascending = False)['Reviews'].values[i]
    vader_neg_1star.append(result)
"""
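
The Roberta and Vader loops all follow one pattern: filter by star rating, sort by a score column, and collect the reviews. A sketch of a single helper covering every variant follows; `reviews_by_score` is a hypothetical name and the rows are made up for illustration:

```python
import pandas as pd


def reviews_by_score(frame, rating, score_col, n=None):
    """Reviews with the given star rating, sorted by score_col from high to low."""
    subset = frame[frame['Rating_Score'] == rating]
    ordered = subset.sort_values(score_col, ascending=False)['Reviews']
    return ordered.head(n).tolist() if n is not None else ordered.tolist()


# Illustrative rows only; in the notebook this would be df
demo = pd.DataFrame({
    'Rating_Score': [1.0, 1.0, 5.0, 1.0],
    'vader_pos': [0.10, 0.40, 0.90, 0.25],
    'Reviews': ['seal broken', 'love the price, hate the taste', 'great', 'lumpy'],
})

print(reviews_by_score(demo, rating=1, score_col='vader_pos'))
```

Leaving `n` unset returns every matching review, which avoids hard-coding row counts such as 69 or 2063.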

## TextBlob's Negative & Neutral Sentiment (for-loop)

In [None]:
# TextBlob - 'Neutral' / Polarity comment given by customer 
"""
textblob_neu_p = []
for i in range(0, 260):
    result = df2.query('textblob_analysis == 0') \
        .sort_values('textblob_polarity', ascending = False)['Reviews'].values[i]
    textblob_neu_p.append(result)
"""

In [None]:
# TextBlob - 'Neutral' / Subjectivity comment given by customer 
"""
textblob_neu_s = []
for i in range(0, 260):
    result = df2.query('textblob_analysis == 0') \
        .sort_values('textblob_subjectivity', ascending = False)['Reviews'].values[i]
    textblob_neu_s.append(result)
"""

In [12]:
# TextBlob - reviews labelled 'negative' (-1), sorted by polarity from highest to lowest
# Query and sort once instead of on every loop iteration; 338 is the cutoff used originally
textblob_neg_p = (
    df2.query('textblob_analysis == -1')
    .sort_values('textblob_polarity', ascending=False)['Reviews']
    .head(338)
    .tolist()
)

In [None]:
print(textblob_neg_p[80]) # Bad Taste

In [None]:
print(textblob_neg_p[300]) # Bad Taste # Bad Texture

In [None]:
print(textblob_neg_p[337]) # Bad Aftertaste

In [None]:
print(textblob_neg_p[37]) # Bad Texture

In [None]:
print(textblob_neg_p[29]) # Shipping Issue - Portion of product is missing

In [None]:
print(textblob_neg_p[215]) # Bad taste - Peanut Butter

In [None]:
# TextBlob - reviews labelled 'negative' (-1), sorted by subjectivity from highest to lowest
# Query and sort once instead of on every loop iteration; 338 is the cutoff used originally
textblob_neg_s = (
    df2.query('textblob_analysis == -1')
    .sort_values('textblob_subjectivity', ascending=False)['Reviews']
    .head(338)
    .tolist()
)

## Bert's Negative & Neutral Sentiment (for-loop)

In [13]:
# Bert - Sentiment when 1-star was given by model
"""
bert_1star = []
for i in range(0, 191):
    result = df3.query('bert_sentiment == 1') \
        .sort_values('Rating_Score', ascending = False)['Reviews'].values[i]
    bert_1star.append(result)
"""

In [14]:
# Bert - Sentiment when 2-star was given by model
"""
bert_2star = []
for i in range(0, 360):
    result = df3.query('bert_sentiment == 2') \
        .sort_values('Rating_Score', ascending = False)['Reviews'].values[i]
    bert_2star.append(result)
"""

In [15]:
# Bert - Sentiment when 3-star was given by model 
"""
bert_3star = []
for i in range(0, 528):
    result = df3.query('bert_sentiment == 3') \
        .sort_values('Rating_Score', ascending = False)['Reviews'].values[i]
    bert_3star.append(result)
"""
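
The three Bert loops differ only in the predicted star value. A sketch of how they could be collected in one pass with `groupby`, again on made-up rows in place of `df3`; the dictionary name `bert_reviews` is hypothetical:

```python
import pandas as pd

# Illustrative rows only; in the notebook this would be df3
demo = pd.DataFrame({
    'bert_sentiment': [1, 3, 1, 2, 3],
    'Rating_Score': [1.0, 4.0, 2.0, 2.0, 3.0],
    'Reviews': ['bad taste', 'fine', 'lumpy', 'too sweet', 'average'],
})

# One list of reviews per predicted star value (1 to 3), each sorted by the customer's rating
bert_reviews = {
    stars: group.sort_values('Rating_Score', ascending=False)['Reviews'].tolist()
    for stars, group in demo[demo['bert_sentiment'] <= 3].groupby('bert_sentiment')
}

print(bert_reviews[1])
```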

# Conclusion 

After reviewing multiple comments labelled as ‘negative’ by the previously mentioned classification models (except the Naïve Bayes classifier), we noticed that the most common words or sentences relate to:
- Bad taste (Peanut Butter and ‘Cookies n Cream’ being the most common flavors)
- Bad texture when mixed with liquids (mostly due to ‘lumps’)
- Instances or occasional cases of ‘bad batches’
- Expensive relative to low quality
- Nausea or stomach aches
- Bad aftertaste
- Excessively sugary or sweet flavors
- Bad aroma
- Quality decrease according to frequent buyers or consumers
- Small or ‘Incorrect’ scoop sizing
- Cases in which the package is damaged – seal is broken (specific issue)
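
The themes above were identified by reading reviews one at a time. A rough sketch of how a keyword search could give a first-pass count of some of them is shown below. The patterns in `THEMES` are hypothetical, the sample reviews are made up, and plain keyword matching cannot tell "bad taste" from "great taste", so the counts would still need manual checking:

```python
import re
from collections import Counter

# Hypothetical keyword patterns for a few of the themes listed above
THEMES = {
    "bad taste": re.compile(r"\b(taste|flavor|flavour)\b", re.IGNORECASE),
    "bad texture": re.compile(r"\b(lump\w*|clump\w*|texture)\b", re.IGNORECASE),
    "damaged package": re.compile(r"\b(seal|damaged|broken)\b", re.IGNORECASE),
    "nausea": re.compile(r"\b(nausea\w*|stomach)\b", re.IGNORECASE),
}


def tag_review(text):
    """Return the set of themes whose pattern matches the review text."""
    return {theme for theme, pattern in THEMES.items() if pattern.search(text)}


# Made-up reviews for illustration, not taken from the dataset
reviews = [
    "Terrible taste and it leaves lumps in the shaker.",
    "The seal was broken when it arrived.",
    "Gave me a stomach ache.",
]

counts = Counter(theme for review in reviews for theme in tag_review(review))
print(counts)
```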

Negative experiences allow businesses to learn from their mistakes and to continuously improve the quality of their products and services.