## Rating Prediction and model performance comparison

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.formula.api as smf
pd.set_option('display.max_colwidth', 1000)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix,accuracy_score,f1_score
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

### 1.1 Load the data

In [2]:
# Load data
amreviews = pd.read_csv("amazon-reviews.csv.bz2", sep='\t')
#Viewing the data
amreviews.sample(5)

Unnamed: 0,date,summary,review,rating
31663,2012-05-04,"Cute & contain mess, but not biodegradeable","These were the first table-toppers I purchased and I loved that they helped contain messes and also kept my little one from eating food off of a table. But, I soon encountered a better solutions... there are other table toppers that are biodegradeable (the Pooh ones are plastic). The other brand also is mostly black and white, and my my older kiddo enjoys coloring on them. Overall these type of products are great for early self feeders, but I prefer the ones that will biodegrade.",3
38752,2010-04-07,Love It!!!,"Received this item in the mail today. I will be using it as a sheet saver in anticipation of newborn poopy blowouts. The cloth part of the pad is soft and cute whereas the backing looks to be and claims to be waterproof. It is a great size and a great price. I will have to see how it holds up in the wash but if all goes well, I will probably get a couple more for backup. Love it! :)",5
76347,2012-11-20,Affordable and functional,"This bouncy seat has few frills, but it gets the job done. We've used this chair daily since day one. The vibration is soothing (at four months we've only changed the batteries once) and it bounces easily. She can now even bounce herself when she kicks her legs. I feel like the toy bar is the perfect height (despite another review). When she was old enough to actually grab at things she could reach the toys. It does sit at more of an angle than other brands, but it was never a problem for my newborn. I feel that we will get longer use of it because she isn't completely reclined in it.",5
42857,2011-04-15,She LOVES this player!,"We first received this when she was 4 months old, and now she's 6 months. From day one she was fascinated by the colors and music, and she just LOVES this thing! I would say 9 times out of 10 when she's laying down for a diaper change and she starts to cry, I can hand this to her and she stops. When she's awake she only seems to be truly happy sitting up or standing up...she's not a big fan of just lying there, and especially not a fan of tummy time! So, this little player comes in handy for us a lot! I hope it lasts for a while.",5
89474,2013-07-17,Not like they used to be,"I used to love these wipes. And then something changed. They must have started using a new solution or something, because the smell of them changed (and the new smell is not good). I have stopped buying and am currently looking into using other brands instead.",2


In [3]:
# view dimensions
amreviews.shape

(205331, 4)

### 1.2 Remove missing and empty observations for review and rating

In [4]:
#Check for na values
amreviews.review.isna().sum()

80

In [5]:
#drop na 
amreviews_mod = amreviews.drop(amreviews[amreviews.review.isna()].index).reset_index()

In [6]:
#Check data
amreviews_mod.sample(3)

Unnamed: 0,index,date,summary,review,rating
73615,73643,2013-08-09,Great set!,By far my grandson's favorite toy! He loves someone to stack these so he can knock them over. Although some have gone missing so I see another purchase of these in the near future.,5
84627,84661,2013-04-12,wonderful,"wonderful item. larger and much softer than the standard aden and anais swaddle wraps. somewhat thinner, but not affecting durability. the subtle color and designs are very nice.",5
190054,190124,2017-11-15,Made my nighttime lotion and oil just soak in LOVE THIS,This reminds me of the paraffin treatment that makes dead skin roll off your feet. I LOVE THIS PRODUCT!! My skin has never felt so smooth ever. My face drank up my nighttime lotion and oil. I will buy more of this. WOW product,5


In [7]:
# drop previous index column
amreviews_mod.drop('index', axis = 1, inplace= True)

In [8]:
#Check for empty strings in the cleaned data
np.where(amreviews_mod.review.apply(lambda x: x == ''))

(array([], dtype=int64),)

In [9]:
#Check for value counts for rating
amreviews.rating.value_counts(dropna=False)

5    120434
4     42916
3     21911
2     10939
1      9131
Name: rating, dtype: int64

After dropping the 80 rows with a missing review, no empty review strings remain, and rating has no missing values.
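As a compact alternative to the cleanup steps above, here is a minimal sketch on a toy frame (the `toy` data is made up for illustration and is not from the dataset). `dropna(subset=...)` removes rows with a missing review or rating, and a strip-based mask also catches whitespace-only reviews, which the equality check against `''` would miss.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the reviews table (hypothetical data)
toy = pd.DataFrame({
    "review": ["great product", np.nan, "", "   ", "broke after a week"],
    "rating": [5, 4, 3, 2, 1],
})

# Drop rows where review or rating is missing
clean = toy.dropna(subset=["review", "rating"])

# Drop rows whose review is empty or whitespace-only, then renumber the rows
clean = clean[clean["review"].str.strip() != ""].reset_index(drop=True)

print(clean)
```

Passing `drop=True` to `reset_index` avoids creating the extra `index` column that otherwise has to be dropped in a separate step.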

### 1.3 Create outcome variable

In [10]:
# create outcome variable
amreviews_mod['5_star'] = np.where(amreviews_mod.rating < 5, 0, 1)
amreviews_mod.sample(4)

Unnamed: 0,date,summary,review,rating,5_star
52833,2011-07-08,my baby sitter loved these,my baby sitter loved these diaper liners! I need to get some more as well! Great for diaper changes in cloth!,5,1
11035,2013-01-02,Wasn't as good as I hoped,"I thought I would really like this one, but I found that I was constantly fighting it. I hated washing the cover, because it was such a huge pain to try to get it back on. Also, half of it kept sliding off the bed. It wasn't as comfortable as I hoped, and I couldn't seem to get it positioned correctly to help support my growing belly.",2,0
101566,2013-04-25,"great for swaddling newborns, many uses later on too!","The material is super soft yet not bulky. We swaddled out baby in these until he got too big (around 1.5 month). I love how quickly these dry - I would often hand-wash these and would dry inside the apartment in no time. I know use them as towels, to clean spit ups, as covers for the carseat/stroller, as light stroller blankets, as furniture covers when the babe decides to lounge on couches when we are out and about. They are great for everything!",5,1
188646,2015-10-27,"Silky texture, blur is an accurate description","This gives a cover that really does 'blur' on the skin. The texture is somewhere between a very soft and smooth sand, and silicon. It lays on your skin instead of staining it. If you use any other cosmetic products, I think this should go above and not below. I have olive, slightly tanned skin and this gives a nice blurred highlight to my skin and then, over time, seems to fade back a bit. There is no scent and a very little goes a very long way. You do not need much. I think there is just a bit more pigment in this product than the norm for BB creams. Overall a good product.",5,1


### 1.4 Sample data

In [11]:
#check some sample reviews
np.random.seed(400)
amreviews_mod.sample(7)

Unnamed: 0,date,summary,review,rating,5_star
11811,2006-06-19,This carrier was not for us,"We received this carrier as a gift, so we didn't have much input on the selection. We liked the idea of it being a front or back carrier, however it just doesn't seem to work with our bodies. Our daughter seemed to be carried with her head by my ear when I tried it as a front pack. The plastic side buckles to hold the child in place were not that secure, so when we used it we still held her for fear of them opening up - defeats the purpose of having your hands free! We much rather use our FABULOUS Quattro tour system than this!!!",2,0
173365,2014-02-17,Nice Nail Polish,"I'm not a connoisseur of different brands, but when I did an online search, Essie and OPI came up as as some of the better brands. I chose Essie because of the colors that so many of the other reviews where raving about. However, I liked the color of this nail polish better on the screen than on my nails. But the color is objective, it probably just doesn't go with my skin tone. The quality seems good - I had to put on about 3 thin coats to get a solid color, and if you dry them well in between, it will stay on for at least 5 or 6 days.",3,0
204694,2014-04-17,My favorite pedal - great for metal,"I bought this based on video reviews on the net. My first impressions:To my ears, if the OCD / Ultimate OD simulates ""fat"" power tube distortion, this pedal simulates a lightly distorted tube pre-amp, with a bit of edge, sharpness, the kind of edge the tube amps have. The focus knob seems like a tone knob, maybe more in the upper mids than the treble, but it does the job well.I wish it had a bit more gain. Surprisingly, this pedal with the gain on 10 and the tone on 10, gives me a great metal tone all by itself! If i boost the pedal with say a few db of clean boost, it gets me exactly where I'd want to be (a dry, tight metal tone, kinda 5150ish. And while this isn't the purpose of the pedal, it is how I've ended up using it just because it sounds better than any amp sim and produces less heat than my tube amps. It's tight, its raw, there is no fizz and no boomyness, just a healthy low end. Crank the knobs on this thing to 10 and you have a very aggressive, tight tone that ma...",5,1
127503,2014-06-07,"What we expected, easy installation","We just got this seat yesterday and judging by the few low-star comments I was expecting a nightmare of an installation. It was actually really easy for me and took about 5 minutes. I think people get a confused because there are a few different installation set ups, but you just have to follow the instructions for the one you are doing and ignore the rest.I primarily bought this seat in this color because they do not use flame retardants on the cover. It does have that typical &#34;chemical&#34; smell to it from the packaging and foam inside, but I feel like a lot of that will come off when I wash it. The material is soft and cozy and, yes, the seat is huge, but that's to be expected for a car seat that can turn into a booster and hold 100lbs. We can not fit it rear-facing behind the driver's seat of our mid-sized SUV, and I am only about 5'8. The passenger seat does have to be moved up quite a bit to fit it in, but my wife is small so it's not a big deal. I can see that it would...",4,0
182068,2018-04-16,Gets better as it sits,"Edited:\n\nI originally was not crazy about this formula, but after using it several days, I like it. It is pretty full coverage, and it sits well and doesnt seem to oxidize. It does come off easily to the touch, but overall, I like this for a more full coverage look than their B.B. cream.\n\nOriginal review:\n\nI have been using this brands BB creams for years. While I love those, although they could cover a little better, Im not a big fan of this one. It goes on more difficulty than the BB cream, and I have to use more product too. This foundation also clings to any dry skin illuminating it.\n\nI kept the foundation on for an hour or so and looked at my face. Surprisingly, it seemed to have settled nicely. My dry skin was no longer obvious, and it left me with a nice smooth finish. Overall, this is just okay. It may just be my skin, but I prefer the B.B. cream.",4,0
169332,2013-11-04,Great shampoo overall,"I really like this shampoo. It cleans effectively, smells nice, and lathers well. Not sure if it has done anything to noticeably thicken my hair, but works well as a daily shampoo.",4,0
107421,2014-02-07,Perfect for 4 month old,"I bought these for my child for Christmas as she was just beginning to reach out and feel the different textures of things. This was perfect for her and she still loves them, especially the ball with the rattle inside she loves to shake it and hear the noise she is making with it.",5,1


As a human reader, I would find it fairly easy to tell a 5-star review from one with fewer than 5 stars. An algorithm may find this harder unless the review is unambiguously glowing. Take, for example, this review: "I typically dislike facial sunscreens because they tend to feel oily and heavy."

Words such as "dislike" or "don't expect" might lead the algorithm to classify this as a less-than-5-star review, because a bag-of-words model sees the words without their context.
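To make the concern concrete, the sketch below shows which tokens a bag-of-words vectorizer would keep from the example sentence. It assumes scikit-learn's built-in English stop word list; the variable names are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

review = ("I typically dislike facial sunscreens because they tend to "
          "feel oily and heavy.")

# Binary bag of words with English stop words removed
vec = CountVectorizer(stop_words="english", binary=True)
vec.fit([review])

# The surviving tokens are all the model would see of this sentence
tokens = sorted(vec.vocabulary_)
print(tokens)
```

Negative-sounding words such as "dislike", "oily" and "heavy" survive, while connecting words such as "because" are dropped, so the model cannot tell that the sentence may be setting up a contrast with the product being reviewed.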

### 1.5 Convert reviews into BOW using Count Vectorizer

In [12]:
# import libraries
from sklearn.feature_extraction.text import CountVectorizer

In [13]:
#drop rating column
amreviews_mod.drop('rating', axis=1, inplace=True)

In [14]:
# convert reviews into a binary bag of words (1 if the word occurs in the review, else 0)
# stop_words='english' uses scikit-learn's built-in English stop word list
vectorizer = CountVectorizer(stop_words='english', binary = True)
X = vectorizer.fit_transform(amreviews_mod.review.values)
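The `binary = True` flag records only whether a word occurs in a review, not how often. A small sketch on two made-up documents shows the difference from plain counts:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up documents (not from the dataset)
docs = ["love love love it", "love it not"]

# Plain term counts versus presence/absence indicators
counts = CountVectorizer().fit_transform(docs)
binary = CountVectorizer(binary=True).fit_transform(docs)

# Columns follow the alphabetically sorted vocabulary: it, love, not
print(counts.toarray())
print(binary.toarray())
```

Both calls return sparse matrices with one row per document and one column per vocabulary word; `toarray()` is only practical on small examples like this one.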

### 1.6 Here come the models

In [15]:
#set random seed
np.random.seed(894523)

In [16]:
#Splitting data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, amreviews_mod['5_star'], test_size=0.2)
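One possible refinement, not used in this notebook: passing `random_state` directly to `train_test_split` makes the split reproducible without relying on the global NumPy seed, and `stratify` keeps the share of 5-star reviews the same in both splits. A minimal sketch on made-up arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up features and labels: 60 positives, 40 negatives
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 60 + [0] * 40)

# Reproducible 80/20 split that preserves the class proportions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=894523, stratify=y_demo
)

print(y_tr.mean(), y_te.mean())
```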

In [17]:
# function that takes a supervised learning algorithm, trains the model, and prints accuracy and F1-score on the test set
def model_run(model,X_train, X_test, y_train, y_test):
    model.fit(X_train, y_train)
    prediction = model.predict(X_test)
    # scikit-learn metrics expect the arguments in the order (y_true, y_pred)
    print("Model Accuracy:", accuracy_score(y_test, prediction))
    print("Model F1-Score:", f1_score(y_test, prediction))
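Since the goal is to compare several models, a variant of this helper that returns the metrics instead of printing them makes it easy to collect everything in one table. The sketch below is an illustration on synthetic data; the `evaluate` name and the two models chosen are assumptions, not part of the notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model and return accuracy and F1 on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {"accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred)}


# Synthetic data standing in for the bag-of-words features
X_syn, y_syn = make_classification(n_samples=500, n_features=20, random_state=0)
splits = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

# One row per model, one column per metric
results = pd.DataFrame({
    "LogisticRegression": evaluate(LogisticRegression(max_iter=1000), *splits),
    "BernoulliNB": evaluate(BernoulliNB(), *splits),
}).T

print(results.sort_values("f1", ascending=False))
```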

### SVC with basic linear kernel

In [18]:
model_run(LinearSVC(),X_train, X_test, y_train, y_test)

Model Accuracy: 0.7591776083408444
Model F1-Score: 0.8016293442491372




### SVC with basic polynomial kernel

In [19]:
poly_model = SVC(kernel = 'poly',degree=2, max_iter = 1000)
model_run(poly_model,X_train, X_test, y_train, y_test)



Model Accuracy: 0.5902170470877689
Model F1-Score: 0.7422310756972113


### SVC with rbf kernel

In [20]:
rbf_model = SVC(kernel = 'rbf', max_iter = 1000)
model_run(rbf_model,X_train, X_test, y_train, y_test)



Model Accuracy: 0.5904606465128742
Model F1-Score: 0.7422421194652276


### SVC with sigmoid kernel

In [21]:
sig_model = SVC(kernel = 'sigmoid', max_iter = 1000)
model_run(sig_model,X_train, X_test, y_train, y_test)



Model Accuracy: 0.5904850064553848
Model F1-Score: 0.7422772079903109
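
The three kernel SVC runs all land near 0.59 accuracy and 0.74 F1. That is about what a classifier that always predicts 5 stars would score if roughly 59% of the test reviews are 5-star, which is an inference from the rating counts above and was not checked in this notebook. The `max_iter = 1000` cap may also be stopping the solver before it converges. A minimal sketch of such a majority-class baseline, on made-up labels:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels with 59% positives; features are irrelevant to the baseline
y_toy = np.array([1] * 59 + [0] * 41)
X_toy = np.zeros((100, 1))

# Always predict the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_toy, y_toy)
pred = baseline.predict(X_toy)

acc = accuracy_score(y_toy, pred)
f1 = f1_score(y_toy, pred)
print(round(acc, 3), round(f1, 3))
```

Running the same baseline on the real `y_test` would show whether the kernel models learned anything beyond the class prior.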


### Logistic Regression

In [22]:
model_run(LogisticRegression(),X_train, X_test, y_train, y_test)



Model Accuracy: 0.7757423692480085
Model F1-Score: 0.8163647969360888


### Random Forest Classifier

In [23]:
#Fit a Random Forest Classifier with default settings on the BOW features
model_run(RandomForestClassifier(),X_train, X_test, y_train, y_test)



Model Accuracy: 0.7137219556161848
Model F1-Score: 0.7547681649346855


### Multinomial NB

In [24]:
model_run(MultinomialNB(alpha=10),X_train, X_test, y_train, y_test)

Model Accuracy: 0.7503593091520304
Model F1-Score: 0.8083052749719417


### Logistic Regression with solver = saga and a range of C values

In [28]:
c = np.arange(0.1,1,0.2)

for pen in c:
    print("For C=", pen)
    model_run(LogisticRegression(solver='saga', C=pen),X_train, X_test, y_train, y_test)
    print()

For C= 0.1
Model Accuracy: 0.78353755085138
Model F1-Score: 0.8242762221167536

For C= 0.30000000000000004
Model Accuracy: 0.7819541545881952
Model F1-Score: 0.822502924904321

For C= 0.5000000000000001
Model Accuracy: 0.7812477162553897
Model F1-Score: 0.8217475882329588

For C= 0.7000000000000001
Model Accuracy: 0.780492558037563
Model F1-Score: 0.8210505411577798

For C= 0.9000000000000001
Model Accuracy: 0.7802245986699471
Model F1-Score: 0.8208214172227519



source - https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logisticregression#sklearn.linear_model.LogisticRegression

### 1.7 Best model amongst chosen ones

For rating classification I chose the following main models: 
- Support Vector Classifier (linear, rbf, polynomial and sigmoid)
- Logistic Regression
- Multinomial Naive Bayes
- Random Forest Classifier

I first ran basic implementations of each and compared performance. Of these, Logistic Regression with default settings performed best, with an accuracy of 77.6% and an F1-score of 81.6%.

I then tuned the Logistic Regression hyperparameters and got the best performance with the solver set to 'saga' and C (the inverse of the regularization strength) set to 0.1: an accuracy of 78.4% and an F1-score of 82.4%. Performance degraded slightly as C increased, which means stronger regularization helped on this data.
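
The tuning above picks C by looking at test-set scores. A common alternative, sketched here on synthetic data as an assumption rather than what this notebook did, is to cross-validate over the same C values on the training data with `GridSearchCV` and keep the test set for a final check.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the training split
X_syn, y_syn = make_classification(n_samples=300, n_features=10, random_state=0)

# Cross-validated search over the same C values tried above
search = GridSearchCV(
    LogisticRegression(solver="saga", max_iter=5000),
    param_grid={"C": [0.1, 0.3, 0.5, 0.7, 0.9]},
    scoring="f1",
    cv=3,
)
search.fit(X_syn, y_syn)

print(search.best_params_, round(search.best_score_, 3))
```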

### 1.8 Model prediction vs manual prediction

In [26]:
model = LogisticRegression(solver='saga', C=0.1)
model.fit(X_train, y_train)
prediction = model.predict(X)

In [27]:
amreviews_mod['prediction'] = prediction
# sample 10 reviews where the prediction disagrees with the actual label
amreviews_mod[amreviews_mod['5_star'] != amreviews_mod.prediction].sample(10)

Unnamed: 0,date,summary,review,5_star,prediction
57697,2012-11-08,Not quite as good as the Graco Jumper Bumper,"Our son is a jumper...he's loved it since he was 4-5 mths old & still loves it at 10mths. We own two of these - this one for home & the Graco Jumper Bumper for the office. Originally - I preferred this one - mainly because of the toy. It spins - my son loves spinny things. We've now removed the toy from the jumper & it's one of his favorite toys (he could care less when he's jumping). But now that he's bigger - 10mths & 22lbs - this one really is not suitable for him. The weight limit I believe is 25lbs, whereas the Graco Jumper Bumper weight limit is supposedly 90 something (which is absurd - no one at 90lbs could fit in that thing, let alone jump) - but still - it means that it comfortably fits him - without me sitting there watching tensely to make sure the spring doesn't fail or the rope doesn't give way. Additionally - the graco jumper bumper has - as the name implies - a bumper - a rubber edge. Not too big of a thing when they're 4-5mths old - but at 10mths my boy can...",0,1
117555,2013-03-10,Cute and functional,Love the looks of this hamper. Great construction. Love that clothes can just be thrown in instead of having to lift a lid while holding a baby.,0,1
97863,2012-02-04,No complaints; feels sturdy and safe for my son,I was torn between the pricier seats and a less expensive option which seems to have similar safety features. I went with the britax because of its great reviews and I have not been disappointed. I had no problem putting it in my car and adjusting it as my son grows up. I would recommend this britax to a friend.,1,0
75410,2014-06-07,Best Bibs Ever,"I don't always use bibs but when I do, I only use these. They are the best bibs ever. The large size and over-the-shoulder design keep messy little eaters relatively clean. They also wash well and true to the manufacturer's claims, get softer with repeated washings. They also stay cute even as they fade. I have never used them as burp cloths but as bibs they more than suffice.",1,0
162832,2014-07-20,Four Stars,Love this color! It's so pretty and coraly. Goes with every skin tone.,0,1
118623,2011-11-08,Battery/Charging Issues? Please read this before you give up.,"I have had the Summer Infant ""BestView"" monitor (an older version of this one; ref ASIN B001NAATW0) for about a year. We have liked it for the most part. The biggest problem is with the battery life.After about 9 or 10 months, it simply wouldn't take or hold a charge. Very frustrating! I was about to trash it and buy another, until I discovered that there is an easy solution. Inside the parent unit are 3 low-performing rechargeable AA batteries. They're cheap Chinese PsOS, and can be replaced with something a lot better. Just unscrew the cover and swap them out! I like the Sanyo Eneloop brand personally, but any higher end Ni-MH rechargeable battery will work. Those will charge right in the unit and should do a much better job keeping a charge for you. Good luck!",1,0
9570,2012-07-10,Not 100% pleased,After just a few washes the multi-colored stitching around the edges are separating on many of the cloths. I've decided to just use these for the diaper bag and get some much plusher and sturdier ones for home.,0,1
177441,2013-04-28,The color is not what I expected,Nice polish but I have used this color in the salon and it actually should go on clear and not a light pink. Not sure if the wrong tag was put on this shellac but I kept it and will make use of it until it is gone.,0,1
13629,2012-10-28,Great value,We received this changing table as a gift. It was easy to put together and is very sturdy. It fits everything we need to diaper on a daily basis.,0,1
132681,2013-05-01,"Very good versatile gate... not quite perfect, but we'd buy it again","This is a very good gate, and we'd definitely buy it again. I'll go through all of the positive and negative points that we have found after buying it and using it for a few weeks.First, let's talk about installation. This was generally quite easy to accomplish with minimal fuss and time required. I installed the whole gate in about an hour from opening the box through completed installation. Considering that this was combined with keeping my twin 1-year-olds safe, happy, and occupied, it should take less than half an hour for most people. Installation consisted of:* Selecting the installation location* Tracing the screw locations on the wall* Drilling holes in the wall* Installing anchors in the holes (I had a plentiful supply of appropriate anchors to use, but you might need some from your local hardware store)* Screwing the anchor plates to the prepared locations using the included screws* Extending the gate* Attaching the gate to the anchors (it's a simple matter of droppi...",0,1


The Logistic Regression model's predictions generally match the rating I would assign to a review. Based on the reviews above, it slips up in the following cases:
- When the customer writes an ambivalent review but still gives the product 5 stars.
- When the customer mentions issues with previous products to contrast them with the performance or quality of the one they bought.
- When the review isn't negative as such, but the customer assigns a lower rating for personal reasons.
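
The cases above fall into two error types: reviews predicted as 5-star that were rated lower, and 5-star reviews predicted as lower. A confusion matrix counts each type separately. The sketch below uses made-up labels; on the real data, `y_test` and the model's test-set predictions would take their place.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions (1 = 5-star, 0 = fewer than 5 stars)
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("predicted 5-star but rated lower:", fp)
print("rated 5-star but predicted lower:", fn)
```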