# **Rating Product & Sorting Reviews in Amazon**

## Problem
In e-commerce, sorting reviews correctly and keeping irrelevant or misleading reviews out of the top positions is very important. If misleading reviews rise to the top, they hurt the site because customers lose trust in what they read. Rating products accurately is the other important point. We cover both problems using an Amazon review dataset.

In [1]:
import pandas as pd
import scipy.stats as st
from sklearn.preprocessing import MinMaxScaler

In [2]:
amazon_df = pd.read_csv("amazon_review.csv")
df = amazon_df.copy()
df.head()

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote
0,A3SBTW3WS4IQSN,B007WTAJTO,,"[0, 0]",No issues.,4.0,Four Stars,1406073600,2014-07-23,138,0,0
1,A18K1ODH1I2MVB,B007WTAJTO,0mie,"[0, 0]","Purchased this for my device, it worked as adv...",5.0,MOAR SPACE!!!,1382659200,2013-10-25,409,0,0
2,A2FII3I2MBMUIA,B007WTAJTO,1K3,"[0, 0]",it works as expected. I should have sprung for...,4.0,nothing to really say....,1356220800,2012-12-23,715,0,0
3,A3H99DFEG68SR,B007WTAJTO,1m2,"[0, 0]",This think has worked out great.Had a diff. br...,5.0,Great buy at this price!!! *** UPDATE,1384992000,2013-11-21,382,0,0
4,A375ZM4U047O79,B007WTAJTO,2&amp;1/2Men,"[0, 0]","Bought it with Retail Packaging, arrived legit...",5.0,best deal around,1373673600,2013-07-13,513,0,0


## Task1: Calculate the Average Rating of Recent Reviews and Compare It to the Average Rating of All Reviews

In [3]:
def check_df(dataframe, count=5):
    print("--------------------- Shape ---------------------")
    print(dataframe.shape)
    print("--------------------- Types ---------------------")
    print(dataframe.dtypes)
    print("--------------------- Head ---------------------")
    print(dataframe.head(count))
    print("--------------------- Tail ---------------------")
    print(dataframe.tail(count))
    print("--------------------- NA ---------------------")
    print(dataframe.isnull().sum())
    print("-------------------- Quantiles ---------------------")
    print(dataframe.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)

In [4]:
# All reviews belong to a single product
df["asin"].value_counts()

B007WTAJTO    4915
Name: asin, dtype: int64

In [5]:
df.loc[df["day_diff"]<100,"overall"].mean()

4.749049429657795

In [6]:
df["overall"].mean()

4.587589013224822

In [7]:
df["day_diff"].quantile(0.25)

281.0

In [8]:
def control_different_timespan(df):
    # day_diff counts the days since a review was posted,
    # so lower values mean more recent reviews.
    q1, q2, q3 = df["day_diff"].quantile([0.25, 0.50, 0.75])
    print("AVERAGE RATINGS BY REVIEW AGE")
    print("Reviews posted...")
    print("Within the last", q1, "days:")
    print(df.loc[df["day_diff"] <= q1, "overall"].mean())
    print("Between", q1, "and", q2, "days ago:")
    print(df.loc[(df["day_diff"] > q1) & (df["day_diff"] <= q2), "overall"].mean())
    print("Between", q2, "and", q3, "days ago:")
    print(df.loc[(df["day_diff"] > q2) & (df["day_diff"] <= q3), "overall"].mean())
    print("More than", q3, "days ago:")
    print(df.loc[df["day_diff"] > q3, "overall"].mean())

In [9]:
control_different_timespan(df)

AVERAGE RATINGS BY REVIEW AGE
Reviews posted...
Within the last 281.0 days:
4.6957928802588995
Between 281.0 and 431.0 days ago:
4.636140637775961
Between 431.0 and 601.0 days ago:
4.571661237785016
More than 601.0 days ago:
4.4462540716612375


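**Result:** `day_diff` counts the days since a review was posted (in the sample rows it falls one-for-one as `reviewTime` gets later), so smaller values mean more recent reviews. The average rating rises with recency: about 4.70 for reviews from the last 281 days versus about 4.45 for reviews older than 601 days, against an overall mean of about 4.59. Recent reviews are therefore somewhat more favorable than older ones.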
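
The same quartile comparison can be sketched more compactly with `pd.qcut` and `groupby`. The frame `toy` and its values are made up for illustration; only the column names `day_diff` and `overall` come from the dataset, and the bucket labels are hypothetical.

```python
import pandas as pd

# Made-up data: eight reviews, newer ones (small day_diff) rated higher.
toy = pd.DataFrame({
    "day_diff": [10, 20, 30, 40, 50, 60, 70, 80],
    "overall":  [5, 5, 5, 4, 4, 4, 3, 3],
})

# Split day_diff into four equally sized buckets at its quartiles.
toy["age_bucket"] = pd.qcut(
    toy["day_diff"], q=4,
    labels=["Q1 (newest)", "Q2", "Q3", "Q4 (oldest)"],
)

# Mean rating per bucket, ordered from newest to oldest.
mean_by_bucket = toy.groupby("age_bucket", observed=True)["overall"].mean()
print(mean_by_bucket)
```

Applied to `df`, this should give the same four group means as `control_different_timespan`, because both approaches split `day_diff` at its quartiles with the right edge of each bucket included.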

## Task2: Choose the Top 20 Reviews to Show on a Website or App

### Step1: Create "helpful_no" variable

* total_vote = total number of up and down votes
* up = helpful_yes
* helpful_no = total_vote - helpful_yes
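
As a small illustration of these definitions, the sketch below derives all three counts from a raw `helpful` string. It assumes the column stores `"[helpful_yes, total_vote]"`, which is consistent with the sample rows; the helper name `split_votes` is hypothetical.

```python
import ast

def split_votes(helpful):
    """Return (helpful_yes, helpful_no, total_vote) from a string like "[1952, 2020]"."""
    # literal_eval safely parses the list literal without executing code.
    helpful_yes, total_vote = ast.literal_eval(helpful)
    return helpful_yes, total_vote - helpful_yes, total_vote

print(split_votes("[1952, 2020]"))  # (1952, 68, 2020)
```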

In [10]:
df["helpful_no"] = df["total_vote"] - df['helpful_yes']

In [12]:
df["helpful_no"].value_counts()

0      4674
1       175
2        43
3         7
27        2
4         2
6         2
73        1
8         1
10        1
68        1
110       1
183       1
77        1
126       1
14        1
9         1
Name: helpful_no, dtype: int64

### Step2: Add score_pos_neg_diff, score_average_rating and wilson_lower_bound scores to the dataframe.

* Define the **score_pos_neg_diff**, **score_average_rating** and **wilson_lower_bound** functions.
* Create the **score_pos_neg_diff** column by applying its function.
* Create the **score_average_rating** column by applying its function.
* Create the **wilson_lower_bound** column by applying its function.
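
For reference, the quantity computed by `wilson_lower_bound` can be written out as below. This is a transcription of the code's formula into notation, with $n$ the total votes, $\hat{p}$ the share of helpful votes, and $z$ the two-sided normal quantile for the chosen confidence level; the symbol names are chosen here for illustration.

$$
\hat{p} = \frac{\text{up}}{n}, \qquad n = \text{up} + \text{down}, \qquad z = \Phi^{-1}\!\left(1 - \frac{1 - \text{confidence}}{2}\right)
$$

$$
\text{WLB} = \frac{\hat{p} + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat{p}(1-\hat{p}) + \dfrac{z^2}{4n}}{n}}}{1 + \dfrac{z^2}{n}}
$$

When $n = 0$ the function returns 0 instead of evaluating the formula.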

In [37]:
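import math

def score_pos_neg_diff(up, down):
    # Net helpful votes: ignores how many votes were cast in total.
    return up - down


def score_average_rating(up, down):
    # Share of helpful votes; defined as 0 when there are no votes.
    if up + down == 0:
        return 0
    return up / (up + down)

def wilson_lower_bound(up, down, confidence=0.95):
    # Lower bound of the Wilson score interval for the helpful-vote ratio.
    n = up + down
    if n == 0:
        return 0
    # Two-sided z-score for the given confidence level (about 1.96 for 0.95).
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * up / n  # observed share of helpful votes
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)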

In [38]:
df["score_pos_neg_diff"] = score_pos_neg_diff(df["helpful_yes"],df["helpful_no"])
df.sort_values("score_pos_neg_diff",ascending=False).head(20)


Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,helpful_no,score_pos_neg_diff
2031,A12B7ZMXFI6IXY,B007WTAJTO,"Hyoun Kim ""Faluzure""","[1952, 2020]",[[ UPDATE - 6/19/2014 ]]So my lovely wife boug...,5.0,UPDATED - Great w/ Galaxy S4 & Galaxy Tab 4 10...,1367366400,2013-01-05,702,1952,2020,68,1884
4212,AVBMZZAFEKO58,B007WTAJTO,SkincareCEO,"[1568, 1694]",NOTE: please read the last update (scroll to ...,1.0,1 Star reviews - Micro SDXC card unmounts itse...,1375660800,2013-05-08,579,1568,1694,126,1442
3449,AOEAD7DPLZE53,B007WTAJTO,NLee the Engineer,"[1428, 1505]",I have tested dozens of SDHC and micro-SDHC ca...,5.0,Top of the class among all (budget-priced) mic...,1348617600,2012-09-26,803,1428,1505,77,1351
317,A1ZQAQFYSXL5MQ,B007WTAJTO,"Amazon Customer ""Kelly""","[422, 495]","If your card gets hot enough to be painful, it...",1.0,"Warning, read this!",1346544000,2012-02-09,1033,422,495,73,349
3981,A1K91XXQ6ZEBQR,B007WTAJTO,"R. Sutton, Jr. ""RWSynergy""","[112, 139]",The last few days I have been diligently shopp...,5.0,"Resolving confusion between ""Mobile Ultra"" and...",1350864000,2012-10-22,777,112,139,27,85
4596,A1WTQUOQ4WG9AI,B007WTAJTO,"Tom Henriksen ""Doggy Diner""","[82, 109]",Hi:I ordered two card and they arrived the nex...,1.0,Designed incompatibility/Don't support SanDisk,1348272000,2012-09-22,807,82,109,27,55
1835,A1J6VSUM80UAF8,B007WTAJTO,goconfigure,"[60, 68]",Bought from BestBuy online the day it was anno...,5.0,I own it,1393545600,2014-02-28,283,60,68,8,52
4672,A2DKQQIZ793AV5,B007WTAJTO,Twister,"[45, 49]",Sandisk announcement of the first 128GB micro ...,5.0,Super high capacity!!! Excellent price (on Am...,1394150400,2014-07-03,158,45,49,4,41
4306,AOHXKM5URSKAB,B007WTAJTO,Stellar Eller,"[51, 65]","While I got this card as a ""deal of the day"" o...",5.0,Awesome Card!,1339200000,2012-09-06,823,51,65,14,37
315,A2J26NNQX6WKAU,B007WTAJTO,"Amazon Customer ""johncrea""","[38, 48]",Bought this card to use with my Samsung Galaxy...,5.0,Samsung Galaxy Tab2 works with this card if re...,1344816000,2012-08-13,847,38,48,10,28


In [39]:
df["score_average_rating"] = df.apply(lambda x: score_average_rating(x["helpful_yes"],x["helpful_no"]),axis=1)
df.sort_values("score_average_rating",ascending=False).head(20)


Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,helpful_no,score_pos_neg_diff,score_average_rating
4277,A35KXSU6AD1481,B007WTAJTO,S. Q.,"[1, 1]",I have a galaxy note II and after rooting I no...,5.0,Perfect!!,1355875200,2012-12-19,719,1,1,0,1,1.0
2881,A3VSG5X7GPNNW6,B007WTAJTO,Lou Thomas,"[1, 1]",The Nexus One is listed as supporting a maximu...,5.0,Nexus One Loves This Card!,1349049600,2012-01-10,1063,1,1,0,1,1.0
1073,A2ZXEKQ2OBZLEE,B007WTAJTO,C. Sanchez,"[1, 1]",I used it with my Samsung S4 and it works grea...,5.0,Tons of space for phone,1376352000,2013-08-13,482,1,1,0,1,1.0
445,AIWBDRNBODLEA,B007WTAJTO,"Apache ""Elizabeth""","[1, 1]",This is exactly what I was looking for to upgr...,4.0,Amazon Great Prices,1387324800,2013-12-18,355,1,1,0,1,1.0
3923,A2PH4RGYVR34L,B007WTAJTO,Rock Your Roots,"[1, 1]","It's a SanDisk, so what more is there to say? ...",5.0,What more to say?,1388361600,2013-12-30,343,1,1,0,1,1.0
435,AUH8I22ITG020,B007WTAJTO,Anthony L cate,"[1, 1]",This is working great in my AT&T Galaxy Note. ...,5.0,Love the extra storage,1343088000,2012-07-24,867,1,1,0,1,1.0
2901,A28TRYU3FJ039C,B007WTAJTO,luis,"[1, 1]",Not a good typer or speller :) here is what I ...,5.0,Awesome and fast card :),1368403200,2013-05-13,574,1,1,0,1,1.0
2204,AANX2UN8NPE22,B007WTAJTO,"jbwam ""jbwam""","[1, 1]",I just called Sandisk and they say they have a...,2.0,Sandisk will replace failures due to bad batch...,1371168000,2013-06-14,542,1,1,0,1,1.0
2206,A3KO3964CNP0XN,B007WTAJTO,JCBiker,"[1, 1]",I bought this for my garmin virb action cam. ...,5.0,Great card,1383177600,2013-10-31,403,1,1,0,1,1.0
3408,A20WUUD9EDWY4N,B007WTAJTO,"Neng Vang ""Neng2012""","[1, 1]",Very good card and still working now in my car...,5.0,working no problem,1374710400,2013-07-25,501,1,1,0,1,1.0


In [40]:
df["wilson_lower_bound"] = df.apply(lambda x: wilson_lower_bound(x["helpful_yes"],x["helpful_no"]),axis=1)
df.sort_values("wilson_lower_bound",ascending=False).head(20)

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,helpful_no,score_pos_neg_diff,score_average_rating,wilson_lower_bound
2031,A12B7ZMXFI6IXY,B007WTAJTO,"Hyoun Kim ""Faluzure""","[1952, 2020]",[[ UPDATE - 6/19/2014 ]]So my lovely wife boug...,5.0,UPDATED - Great w/ Galaxy S4 & Galaxy Tab 4 10...,1367366400,2013-01-05,702,1952,2020,68,1884,0.966337,0.957544
3449,AOEAD7DPLZE53,B007WTAJTO,NLee the Engineer,"[1428, 1505]",I have tested dozens of SDHC and micro-SDHC ca...,5.0,Top of the class among all (budget-priced) mic...,1348617600,2012-09-26,803,1428,1505,77,1351,0.948837,0.936519
4212,AVBMZZAFEKO58,B007WTAJTO,SkincareCEO,"[1568, 1694]",NOTE: please read the last update (scroll to ...,1.0,1 Star reviews - Micro SDXC card unmounts itse...,1375660800,2013-05-08,579,1568,1694,126,1442,0.92562,0.912139
317,A1ZQAQFYSXL5MQ,B007WTAJTO,"Amazon Customer ""Kelly""","[422, 495]","If your card gets hot enough to be painful, it...",1.0,"Warning, read this!",1346544000,2012-02-09,1033,422,495,73,349,0.852525,0.818577
4672,A2DKQQIZ793AV5,B007WTAJTO,Twister,"[45, 49]",Sandisk announcement of the first 128GB micro ...,5.0,Super high capacity!!! Excellent price (on Am...,1394150400,2014-07-03,158,45,49,4,41,0.918367,0.808109
1835,A1J6VSUM80UAF8,B007WTAJTO,goconfigure,"[60, 68]",Bought from BestBuy online the day it was anno...,5.0,I own it,1393545600,2014-02-28,283,60,68,8,52,0.882353,0.784651
3981,A1K91XXQ6ZEBQR,B007WTAJTO,"R. Sutton, Jr. ""RWSynergy""","[112, 139]",The last few days I have been diligently shopp...,5.0,"Resolving confusion between ""Mobile Ultra"" and...",1350864000,2012-10-22,777,112,139,27,85,0.805755,0.732136
3807,AFGRMORWY2QNX,B007WTAJTO,R. Heisler,"[22, 25]",I bought this card to replace a lost 16 gig in...,3.0,"Good buy for the money but wait, I had an issue!",1361923200,2013-02-27,649,22,25,3,19,0.88,0.700442
4306,AOHXKM5URSKAB,B007WTAJTO,Stellar Eller,"[51, 65]","While I got this card as a ""deal of the day"" o...",5.0,Awesome Card!,1339200000,2012-09-06,823,51,65,14,37,0.784615,0.670334
4596,A1WTQUOQ4WG9AI,B007WTAJTO,"Tom Henriksen ""Doggy Diner""","[82, 109]",Hi:I ordered two card and they arrived the nex...,1.0,Designed incompatibility/Don't support SanDisk,1348272000,2012-09-22,807,82,109,27,55,0.752294,0.663595


### Step3: Result

Sort the reviews by wilson_lower_bound, take the top 20, and interpret the result.

In [41]:
df.sort_values("wilson_lower_bound",ascending=False).head(20)

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime,day_diff,helpful_yes,total_vote,helpful_no,score_pos_neg_diff,score_average_rating,wilson_lower_bound
2031,A12B7ZMXFI6IXY,B007WTAJTO,"Hyoun Kim ""Faluzure""","[1952, 2020]",[[ UPDATE - 6/19/2014 ]]So my lovely wife boug...,5.0,UPDATED - Great w/ Galaxy S4 & Galaxy Tab 4 10...,1367366400,2013-01-05,702,1952,2020,68,1884,0.966337,0.957544
3449,AOEAD7DPLZE53,B007WTAJTO,NLee the Engineer,"[1428, 1505]",I have tested dozens of SDHC and micro-SDHC ca...,5.0,Top of the class among all (budget-priced) mic...,1348617600,2012-09-26,803,1428,1505,77,1351,0.948837,0.936519
4212,AVBMZZAFEKO58,B007WTAJTO,SkincareCEO,"[1568, 1694]",NOTE: please read the last update (scroll to ...,1.0,1 Star reviews - Micro SDXC card unmounts itse...,1375660800,2013-05-08,579,1568,1694,126,1442,0.92562,0.912139
317,A1ZQAQFYSXL5MQ,B007WTAJTO,"Amazon Customer ""Kelly""","[422, 495]","If your card gets hot enough to be painful, it...",1.0,"Warning, read this!",1346544000,2012-02-09,1033,422,495,73,349,0.852525,0.818577
4672,A2DKQQIZ793AV5,B007WTAJTO,Twister,"[45, 49]",Sandisk announcement of the first 128GB micro ...,5.0,Super high capacity!!! Excellent price (on Am...,1394150400,2014-07-03,158,45,49,4,41,0.918367,0.808109
1835,A1J6VSUM80UAF8,B007WTAJTO,goconfigure,"[60, 68]",Bought from BestBuy online the day it was anno...,5.0,I own it,1393545600,2014-02-28,283,60,68,8,52,0.882353,0.784651
3981,A1K91XXQ6ZEBQR,B007WTAJTO,"R. Sutton, Jr. ""RWSynergy""","[112, 139]",The last few days I have been diligently shopp...,5.0,"Resolving confusion between ""Mobile Ultra"" and...",1350864000,2012-10-22,777,112,139,27,85,0.805755,0.732136
3807,AFGRMORWY2QNX,B007WTAJTO,R. Heisler,"[22, 25]",I bought this card to replace a lost 16 gig in...,3.0,"Good buy for the money but wait, I had an issue!",1361923200,2013-02-27,649,22,25,3,19,0.88,0.700442
4306,AOHXKM5URSKAB,B007WTAJTO,Stellar Eller,"[51, 65]","While I got this card as a ""deal of the day"" o...",5.0,Awesome Card!,1339200000,2012-09-06,823,51,65,14,37,0.784615,0.670334
4596,A1WTQUOQ4WG9AI,B007WTAJTO,"Tom Henriksen ""Doggy Diner""","[82, 109]",Hi:I ordered two card and they arrived the nex...,1.0,Designed incompatibility/Don't support SanDisk,1348272000,2012-09-22,807,82,109,27,55,0.752294,0.663595


The first four reviews are the ones with the most votes, and the Wilson lower bound places them at the top. Further down, some reviews with fewer votes still rank above reviews with more votes because their share of helpful votes is higher: the review with 45 helpful votes out of 49 ranks above the one with 112 out of 139.
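
To see why the three scores rank reviews differently, the sketch below recomputes them for three vote counts: a single helpful vote (like the `[1, 1]` rows) and the 45/4 and 112/27 rows from the table. It is a standalone approximation that swaps `scipy.stats.norm.ppf` for the standard library's `statistics.NormalDist().inv_cdf`; the helper name `wilson` is hypothetical.

```python
import math
from statistics import NormalDist

def wilson(up, down, confidence=0.95):
    """Lower bound of the Wilson score interval for up / (up + down)."""
    n = up + down
    if n == 0:
        return 0.0
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = up / n
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

# (helpful_yes, helpful_no) pairs taken from the tables above.
for up, down in [(1, 0), (45, 4), (112, 27)]:
    diff = up - down
    ratio = up / (up + down)
    print(f"up={up:4d} down={down:3d} diff={diff:4d} "
          f"ratio={ratio:.6f} wilson={wilson(up, down):.6f}")
```

A single unopposed vote gets a perfect ratio of 1.0 but a Wilson lower bound of roughly 0.21, which is why the `[1, 1]` reviews that lead the `score_average_rating` ranking drop out of the top of the Wilson ranking.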