<div style="border-radius:10px; border:#D0C2F0 solid; padding: 15px; background-color: #F8F1E8; text-align:left">

<h3 align="left"><font color='#5E5273'>📄 The Story of the Dataset </font></h3>
    <center><img src="https://i.imgur.com/6QiRhzo.png"> </center>
    
*   The data consists of reviews of books by users of the Kindle Store, an e-book service of Amazon.
*   Website : <a href="https://www.amazon.com/Kindle-Store/b?ie=UTF8&node=133140011"> Kindle Store </a>

In [30]:
import pandas as pd
import numpy as np

In [31]:
df = pd.read_csv("/kaggle/input/kindle-reviews/kindle_reviews.csv")
df.head(5)

Unnamed: 0.1,Unnamed: 0,asin,helpful,overall,reviewText,reviewTime,reviewerID,reviewerName,summary,unixReviewTime
0,0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,"05 5, 2014",A1F6404F1VG29J,Avidreader,Nice vintage story,1399248000
1,1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,"01 6, 2014",AN0N05A9LIJEQ,critters,Different...,1388966400
2,2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,"04 4, 2014",A795DMNCJILA6,dot,Oldie,1396569600
3,3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,"02 19, 2014",A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",I really liked it.,1392768000
4,4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...","03 19, 2014",A3SPTOKDG7WBLN,Father Dowling Fan,Period Mystery,1395187200


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; text-align:left">

<h3 align="left"><font color='#6B8BA0'>👀 Features: </font></h3>
    

1. **Unnamed: 0:** a redundant copy of the row index; it carries no information and can be dropped.

2. **asin:** This column is the ASIN (Amazon Standard Identification Number), Amazon's system for uniquely identifying products, in short, the product ID.

3. **helpful:** This column stores a pair `[helpful votes, total votes]`: how many users found the review helpful, out of how many voted on it.

4. **overall:** This column contains the score given by the reviewer.

5. **reviewText:** This column contains the review written by the user. 

6. **reviewTime:** This column contains the date the review was written.

7. **reviewerID:** This column contains the ID of the user who wrote the review.

8. **reviewerName:** This column contains the name of the user who wrote the review.

9. **summary:** This column contains a summary of the review.

10. **unixReviewTime:** This column contains the Unix timestamp of the date the review was written.
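
As a quick check that `reviewTime` and `unixReviewTime` carry the same information, the sketch below decodes both values from the first row of the table above using only the standard library. It assumes that `reviewTime` follows a "month day, year" layout and that the Unix timestamp is in UTC; neither is stated by the dataset itself.

```python
from datetime import datetime, timezone

# Values copied from the first row of the table above.
review_time = "05 5, 2014"
unix_review_time = 1399248000

# Assumption: reviewTime is "month day, year" and the Unix timestamp is in UTC.
from_text = datetime.strptime(review_time, "%m %d, %Y").date()
from_unix = datetime.fromtimestamp(unix_review_time, tz=timezone.utc).date()

print(from_text, from_unix)
```

If both dates agree, keeping only one of the two columns loses nothing.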


In [32]:
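# Keep only the columns used below; .copy() avoids SettingWithCopyWarning on later column assignments.
df = df[["asin","helpful","overall","reviewText","unixReviewTime","reviewerID","reviewerName"]].copy()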

In [33]:
df.head()

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird"""
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan


In [34]:
df["Date"] = pd.to_datetime(df["unixReviewTime"], unit="s")
df.head()

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName,Date
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader,2014-05-05
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters,2014-01-06
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot,2014-04-04
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",2014-02-19
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan,2014-03-19


In [35]:
df.shape

(982619, 8)

# <p style="background-color:#F8F1E8; font-family:newtimeroman;color:#602F44; font-size:150%; text-align:center; border-radius: 15px 50px;"> 🙄 Rating Products ⭐</p>

### <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#006600; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #003300">Average of Product Scores</p>

In [36]:
df[df["asin"] == "B000F83SZQ"]

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName,Date
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader,2014-05-05
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters,2014-01-06
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot,2014-04-04
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",2014-02-19
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan,2014-03-19
5,B000F83SZQ,"[0, 0]",4,A beautiful in-depth character description mak...,1401062400,A1RK2OCZDSGC6R,ubavka seirovska,2014-05-26
6,B000F83SZQ,"[0, 0]",4,I enjoyed this one tho I'm not sure why it's c...,1402358400,A2HSAKHC3IBRE6,Wolfmist,2014-06-10
7,B000F83SZQ,"[1, 1]",4,Never heard of Amy Brewster. But I don't need ...,1395446400,A3DE6XGZ2EPADS,WPY,2014-03-22


In [37]:
df.groupby("asin").agg(mean_rating=("overall", "mean")).reset_index()

Unnamed: 0,asin,mean_rating
0,B000F83SZQ,4.250000
1,B000FA64PA,4.200000
2,B000FA64PK,4.375000
3,B000FA64QO,3.800000
4,B000FBFMVG,4.333333
...,...,...
61929,B00LZFHL7Y,4.750000
61930,B00LZKMXBI,4.813333
61931,B00M029T4O,4.909091
61932,B00M0RE7CS,4.965517


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; text-align:left">

<h3 align="left"><font color='#6B8BA0'>💭 Comment: </font></h3>

* Ranking products by the plain average of their ratings has drawbacks. Every rating counts equally, no matter when it was given or who gave it, so the average can hide recent trends.
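
A minimal sketch of this drawback, on made-up data: the products `X` and `Y`, the column `days_ago`, and the 30-day cutoff are all hypothetical and only serve to illustrate the point.

```python
import pandas as pd

# Hypothetical ratings: X improved over time, Y got worse.
toy = pd.DataFrame({
    "asin": ["X"] * 4 + ["Y"] * 4,
    "days_ago": [400, 300, 20, 10] * 2,
    "overall": [2, 2, 5, 5, 5, 5, 2, 2],
})

# The plain average cannot tell the two products apart.
plain = toy.groupby("asin")["overall"].mean()

# Looking only at recent ratings (an arbitrary 30-day window) shows the difference.
recent = toy[toy["days_ago"] <= 30].groupby("asin")["overall"].mean()

print(plain)
print(recent)
```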

### <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#006600; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #003300">Time-Based Weighted Average Score</p>

In [38]:
max_date = df["Date"].max()
max_date

Timestamp('2014-07-23 00:00:00')

In [39]:
df["day_diff"] = (max_date - df["Date"]).dt.days
df.head()

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName,Date,day_diff
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader,2014-05-05,79
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters,2014-01-06,198
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot,2014-04-04,110
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",2014-02-19,154
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan,2014-03-19,126


In [40]:
def apply_penalty(on:pd.Series, dependency:pd.Series, bins:list, weights:list) -> pd.Series:
    """
    This function applies a penalty on ratings based on a measurement range.

    Parameters:
    - on: pandas Series, representing the ratings.
    - dependency: pandas Series, representing the measurements.
    - bins: list defining dependency ranges, for example: [0, 30, 90, 180]
    - weights: list defining penalties, for example: [100, 98, 95, 90]

    Returns:
    A pandas Series with applied penalties on ratings.
    """
    
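    if len(weights) != len(bins):
        raise ValueError("weights must have the same length as bins")
    # Build the bin edges on a copy so the caller's list is not mutated,
    # and open both ends so every value falls into some bin.
    edges = [-np.inf, *bins[1:], np.inf]

    # Map each measurement to its weight, expressed as a fraction of 1.
    penalties = pd.cut(dependency, bins=edges, labels=np.array(weights)/100).astype(float)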
    point = on * penalties
    
    return point
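
To make the binning step concrete, here is a minimal sketch on hypothetical data that uses the example bins and weights from the docstring. The series `ratings` and `days_ago` are made up, and the code repeats the `pd.cut` mapping inline rather than calling the function.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings and the age of each rating in days.
ratings = pd.Series([5, 4, 4, 5])
days_ago = pd.Series([10, 45, 120, 400])

# Docstring example: bins [0, 30, 90, 180] and weights [100, 98, 95, 90].
# The first edge is opened to -inf and +inf is appended, giving four ranges.
edges = [-np.inf, 30, 90, 180, np.inf]
weights = [1.00, 0.98, 0.95, 0.90]

# pd.cut assigns each value the label of the (left-open, right-closed) range it falls in.
factor = pd.cut(days_ago, bins=edges, labels=weights).astype(float)
penalized = ratings * factor

print(pd.DataFrame({"rating": ratings, "days_ago": days_ago,
                    "factor": factor, "penalized": penalized}))
```

Because the ranges are closed on the right, a value equal to an edge (for example exactly 30 days) still belongs to the lower range.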

In [41]:
df["overall_weighted_by_day"] = apply_penalty(on = df["overall"],
                                              dependency = df["day_diff"],
                                              bins = df["day_diff"].quantile([0, 0.25, 0.50, 0.75]).to_list(),
                                              weights = [100, 90, 85, 80])
df.head(5)

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName,Date,day_diff,overall_weighted_by_day
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader,2014-05-05,79,5.0
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters,2014-01-06,198,3.6
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot,2014-04-04,110,4.0
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",2014-02-19,154,4.5
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan,2014-03-19,126,3.6


In [42]:
df.groupby("asin").agg(mean_rating=("overall", "mean"),
                        weighted_mean_rating=("overall_weighted_by_day", "mean"))

Unnamed: 0_level_0,mean_rating,weighted_mean_rating
asin,Unnamed: 1_level_1,Unnamed: 2_level_1
B000F83SZQ,4.250000,4.037500
B000FA64PA,4.200000,3.570000
B000FA64PK,4.375000,3.750000
B000FA64QO,3.800000,3.240000
B000FBFMVG,4.333333,3.677778
...,...,...
B00LZFHL7Y,4.750000,4.750000
B00LZKMXBI,4.813333,4.813333
B00M029T4O,4.909091,4.909091
B00M0RE7CS,4.965517,4.965517


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; text-align:left">

<h3 align="left"><font color='#6B8BA0'>💭 Comment: </font></h3>

* Giving recent ratings little or no penalty and older ratings a larger penalty lets the score reflect current trends.
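
Note that `weighted_mean_rating` is the mean of the penalized ratings, so a product whose reviews are all old ends up below its plain average. If a weighted average in the strict sense is preferred, the penalized sum can be divided by the sum of the weights. The sketch below uses hypothetical products and weights; it is an alternative, not what the notebook computes.

```python
import pandas as pd

# Hypothetical ratings with the time-based weight already attached to each row.
toy = pd.DataFrame({
    "asin": ["A", "A", "B", "B"],
    "overall": [5, 3, 4, 4],
    "weight": [1.0, 0.8, 0.9, 0.8],
})
toy["weighted"] = toy["overall"] * toy["weight"]

# Mean of penalized ratings (the approach used above).
penalized_mean = toy.groupby("asin")["weighted"].mean()

# Normalized weighted average: sum(w * r) / sum(w).
sums = toy.groupby("asin")[["weighted", "weight"]].sum()
normalized = sums["weighted"] / sums["weight"]

print(pd.DataFrame({"penalized_mean": penalized_mean, "normalized": normalized}))
```

With normalization, product `B` keeps its rating of 4 even though both of its reviews are penalized, while the penalized mean drops it below 4.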

### <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#006600; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #003300">User-Based Weighted Average Score</p>

In [43]:
df["reviewerID"].value_counts()

reviewerID
A13QTZ8CIMHHG4    1173
A2WZJDFX12QXKD    1007
A320TMDV6KCFU      847
A3PTWPKPXOG8Y5     789
A1JLU5H1CCENWX     782
                  ... 
A2494KLDHMDPMB       5
AC5JIXKRGKAM8        5
AEVXRGD91MH5U        5
AWWYCN6IBS3UM        5
A1EVY3RDX8GXCL       5
Name: count, Length: 68223, dtype: int64

In [44]:
df["reviewer_count_by_user"] = df.groupby("reviewerID")["reviewerID"].transform("count")
df.head()

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName,Date,day_diff,overall_weighted_by_day,reviewer_count_by_user
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader,2014-05-05,79,5.0,11
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters,2014-01-06,198,3.6,67
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot,2014-04-04,110,4.0,22
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",2014-02-19,154,4.5,20
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan,2014-03-19,126,3.6,7


In [45]:
df["reviewerID"].value_counts()["A1F6404F1VG29J"]

11

In [46]:
df["reviewer_count_by_user"].quantile([0, 0.30, 0.60, 0.90]).to_list()

[5.0, 10.0, 30.0, 139.0]

In [47]:
df["user_based_weighted_rate"] = apply_penalty(on = df["overall"],
                                              dependency = df["reviewer_count_by_user"],
                                              bins = df["reviewer_count_by_user"].quantile([0, 0.30, 0.60, 0.90]).to_list(),
                                              weights = [80, 90, 95, 100])
df.head(5)

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName,Date,day_diff,overall_weighted_by_day,reviewer_count_by_user,user_based_weighted_rate
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader,2014-05-05,79,5.0,11,4.5
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters,2014-01-06,198,3.6,67,3.8
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot,2014-04-04,110,4.0,22,3.6
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",2014-02-19,154,4.5,20,4.5
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan,2014-03-19,126,3.6,7,3.2


In [48]:
df.groupby("asin").agg(mean_rating=("overall", "mean"),
                        weighted_mean_rating=("overall_weighted_by_day", "mean"),
                      user_weighted_mean_rating=("user_based_weighted_rate", "mean"))

Unnamed: 0_level_0,mean_rating,weighted_mean_rating,user_weighted_mean_rating
asin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
B000F83SZQ,4.250000,4.037500,3.775000
B000FA64PA,4.200000,3.570000,3.360000
B000FA64PK,4.375000,3.750000,3.500000
B000FA64QO,3.800000,3.240000,3.040000
B000FBFMVG,4.333333,3.677778,3.516667
...,...,...,...
B00LZFHL7Y,4.750000,4.750000,4.296875
B00LZKMXBI,4.813333,4.813333,4.187333
B00M029T4O,4.909091,4.909091,4.531818
B00M0RE7CS,4.965517,4.965517,4.484483


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; text-align:left">

<h3 align="left"><font color='#6B8BA0'>💭 Comment: </font></h3>

* Weighting each rating by how many reviews its author has written gives the opinions of experienced reviewers more influence on the product score than those of occasional reviewers.
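
As a sanity check, the sketch below recomputes `user_based_weighted_rate` for the five rows shown above. It assumes the quantile edges `[5, 10, 30, 139]` printed earlier and repeats the binning inline instead of calling `apply_penalty`.

```python
import numpy as np
import pandas as pd

# Rating and review count of the author, copied from the five rows shown above.
rows = pd.DataFrame({
    "overall": [5, 4, 4, 5, 4],
    "reviewer_count_by_user": [11, 67, 22, 20, 7],
})

# Quantile edges [5, 10, 30, 139] with the lowest edge opened to -inf
# and +inf appended, mirroring what apply_penalty does internally.
edges = [-np.inf, 10, 30, 139, np.inf]
weights = [0.80, 0.90, 0.95, 1.00]

factor = pd.cut(rows["reviewer_count_by_user"], bins=edges, labels=weights).astype(float)
rows["user_based_weighted_rate"] = rows["overall"] * factor

print(rows)
```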

# <p style="background-color:#F8F1E8; font-family:newtimeroman;color:#602F44; font-size:150%; text-align:center; border-radius: 15px 50px;"> ⬆ Sorting Reviews ⬇ </p>

In [49]:
df.head(5)

Unnamed: 0,asin,helpful,overall,reviewText,unixReviewTime,reviewerID,reviewerName,Date,day_diff,overall_weighted_by_day,reviewer_count_by_user,user_based_weighted_rate
0,B000F83SZQ,"[0, 0]",5,I enjoy vintage books and movies so I enjoyed ...,1399248000,A1F6404F1VG29J,Avidreader,2014-05-05,79,5.0,11,4.5
1,B000F83SZQ,"[2, 2]",4,This book is a reissue of an old one; the auth...,1388966400,AN0N05A9LIJEQ,critters,2014-01-06,198,3.6,67,3.8
2,B000F83SZQ,"[2, 2]",4,This was a fairly interesting read. It had ol...,1396569600,A795DMNCJILA6,dot,2014-04-04,110,4.0,22,3.6
3,B000F83SZQ,"[1, 1]",5,I'd never read any of the Amy Brewster mysteri...,1392768000,A1FV0SX13TWVXQ,"Elaine H. Turley ""Montana Songbird""",2014-02-19,154,4.5,20,4.5
4,B000F83SZQ,"[0, 1]",4,"If you like period pieces - clothing, lingo, y...",1395187200,A3SPTOKDG7WBLN,Father Dowling Fan,2014-03-19,126,3.6,7,3.2


In [50]:
df["asin"].value_counts()

asin
B006GWO5WK    1113
B00BTIDW4S     781
B00BT0J8ZS     516
B00JDYC5OI     502
B00H0V069M     481
              ... 
B008OP2ILM       5
B00HSP1NYC       5
B008OKFFP8       5
B008OFFSZA       5
B00HU5BCI2       5
Name: count, Length: 61934, dtype: int64

In [51]:
reviews_df = df[df["asin"] == "B00BTIDW4S"][["helpful","reviewText","reviewerID"]]
reviews_df.head()

Unnamed: 0,helpful,reviewText,reviewerID
461839,"[1, 3]",I really love how this author weaves a story. ...,A1VIR960IWHK07
461840,"[0, 0]",I am totally addicted to the series. It has al...,A3N1DHE2H4N802
461841,"[0, 0]",I love all sorts of romance novels. I have bee...,A1FWKQNETBC9OA
461842,"[0, 0]",Love this series!! They're very addicting.i do...,A3CCJ8DZPZYKRG
461843,"[0, 0]",I don't understand the reviews saying it's an ...,AK5ZLOY2DVRNN


In [52]:
reviews_df.shape

(781, 3)

In [53]:
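# Strip everything except digits and the comma, then split "yes,total" into two integer columns.
votes = reviews_df["helpful"].str.replace(r"[^\d,]", "", regex=True).str.split(",", expand=True)
votes = votes.rename(columns={0: "helpful_yes", 1: "vote_Total"}).astype(int)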
votes["helpful_no"] = votes["vote_Total"] - votes["helpful_yes"]
votes

Unnamed: 0,helpful_yes,vote_Total,helpful_no
461839,1,3,2
461840,0,0,0
461841,0,0,0
461842,0,0,0
461843,0,0,0
...,...,...,...
462615,0,0,0
462616,0,1,1
462617,0,0,0
462618,0,0,0


In [54]:
reviews_df = pd.concat([reviews_df,votes],axis=1)
reviews_df

Unnamed: 0,helpful,reviewText,reviewerID,helpful_yes,vote_Total,helpful_no
461839,"[1, 3]",I really love how this author weaves a story. ...,A1VIR960IWHK07,1,3,2
461840,"[0, 0]",I am totally addicted to the series. It has al...,A3N1DHE2H4N802,0,0,0
461841,"[0, 0]",I love all sorts of romance novels. I have bee...,A1FWKQNETBC9OA,0,0,0
461842,"[0, 0]",Love this series!! They're very addicting.i do...,A3CCJ8DZPZYKRG,0,0,0
461843,"[0, 0]",I don't understand the reviews saying it's an ...,AK5ZLOY2DVRNN,0,0,0
...,...,...,...,...,...,...
462615,"[0, 0]","This story wasn't too poorly written, although...",A100ALUJWBXF0Y,0,0,0
462616,"[0, 1]","This is a standard genre storyline, reasonably...",ASB8CT8IJHR28,0,1,1
462617,"[0, 0]",I thought the story line was pretty good but c...,A7AQD6MDJYQX4,0,0,0
462618,"[0, 0]",Wow.This book was so good and engaging that I ...,A3NOFJSTRLQHX5,0,0,0


In [55]:
def score_pos_neg_diff(col1,col2):
    return col1 - col2

def score_average_rating(col1, col2):
    if col1 + col2 == 0:
        return 0
    return col1 / (col1 + col2)

def wilson_lower_bound(up, down, confidence=0.95):
    import scipy.stats as st
    import math
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)
    phat = 1.0 * up / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

In [56]:
reviews_df["score_pos_neg_diff"] = reviews_df.apply(lambda x: score_pos_neg_diff(x["helpful_yes"],
                                                                        x["helpful_no"]), axis=1)
reviews_df.sort_values(by="score_pos_neg_diff", ascending=False).head(10)

Unnamed: 0,helpful,reviewText,reviewerID,helpful_yes,vote_Total,helpful_no,score_pos_neg_diff
462504,"[110, 122]",I love werewolves. Totally and completely. So ...,A3M933GCAXKYAW,110,122,12,98
462467,"[133, 176]",I debated writing this review since I had alre...,AIGWUMCJQNTGR,133,176,43,90
462140,"[40, 55]",I decided to write this in hopes that it will ...,A2VNIA5OSEDQRQ,40,55,15,25
462210,"[33, 44]",I don't care about throbbing thighs sex scenes...,A1CFPB03CD8ULH,33,44,11,22
462356,"[30, 39]",When I first downloaded this book I was skepti...,ADDVZUA8X585Q,30,39,9,21
462614,"[10, 11]",I had read some really good reviews about this...,A2IEQ2QLPYH8I3,10,11,1,9
462031,"[8, 10]",Maybe I had high hopes for this book because o...,A1EDMVK20UGV3H,8,10,2,6
462341,"[6, 6]",I have to admit that the genre of paranormal r...,A1H7VKBJ4950KL,6,6,0,6
461936,"[6, 8]",If you were fortunate enough to get this book ...,A2JKPYAYGILKOB,6,8,2,4
461934,"[6, 8]",This was the first shapeshifter romance I ever...,A28T6S95YEZQLW,6,8,2,4


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; text-align:left">

<h3 align="left"><font color='#6B8BA0'>💭 Comment: </font></h3>
    
 **score_pos_neg_diff (Positive and Negative Score Difference)**
   - Advantages:
     - A simple and straightforward metric.
     - Rewards reviews that collect many more helpful votes than unhelpful ones.

   - Disadvantages:
     - Ignores the share of helpful votes: a review with 133 of 176 helpful votes (difference 90) ranks far above one with 6 of 6 (difference 6), although only about 76% of its voters found it helpful.
     - Scales with the number of votes, so heavily voted reviews dominate the ranking.

In [57]:
reviews_df["score_average_rating"] = reviews_df.apply(lambda x: score_average_rating(x["helpful_yes"],
                                                                                    x["helpful_no"]), axis=1)
reviews_df.sort_values(by="score_average_rating", ascending=False).head(10)

Unnamed: 0,helpful,reviewText,reviewerID,helpful_yes,vote_Total,helpful_no,score_pos_neg_diff,score_average_rating
462418,"[1, 1]",I wasn't going to read this after reading the ...,A6YTG0ADRKK99,1,1,0,1,1.0
462462,"[1, 1]",Not bad but could have been better. The alpha ...,A2PNMPCEH7LJEL,1,1,0,1,1.0
462099,"[1, 1]",This series must be geared to junior high leve...,A3S42FFFACQSCK,1,1,0,1,1.0
462392,"[1, 1]",I can see why everyone has be raving about thi...,A3T9H9B8GMSAZ,1,1,0,1,1.0
462409,"[1, 1]",I read Bonded first then Betrayed before The M...,A1O0M548ZSH9O4,1,1,0,1,1.0
461924,"[1, 1]",Definitely lived up to expectations I like the...,A1NDPQQT37PT70,1,1,0,1,1.0
462063,"[1, 1]",First time I've read this genre and I'm hooked...,A1M539WPPY2U5S,1,1,0,1,1.0
462061,"[3, 3]",I really enjoyed this book and you can't beat ...,A1OBQ5I31GW1TG,3,3,0,3,1.0
462437,"[1, 1]",Just like the 2 previous law of the lycans boo...,A2Q0IKJ9TZJN4N,1,1,0,1,1.0
462478,"[1, 1]",Nicky has a way of telling her story in a way ...,A1FISOCYK6VCN3,1,1,0,1,1.0


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; text-align:left">

<h3 align="left"><font color='#6B8BA0'>💭 Comment: </font></h3>

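**score_average_rating (Average Rating)**
   - Advantage:
     - Compares reviews by their share of helpful votes, independent of how many votes they received.

   - Disadvantage:
     - Ignores sample size, so reviews with very few votes can outrank well-supported ones: a review with 1 of 1 helpful votes scores 1.0, above one with 110 of 122 (about 0.90).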


In [58]:
reviews_df["wilson_lower_bound"] = reviews_df.apply(lambda x: wilson_lower_bound(x["helpful_yes"],
                                                                        x["helpful_no"]), axis=1)
reviews_df.sort_values(by="wilson_lower_bound", ascending=False).head(10)

Unnamed: 0,helpful,reviewText,reviewerID,helpful_yes,vote_Total,helpful_no,score_pos_neg_diff,score_average_rating,wilson_lower_bound
462504,"[110, 122]",I love werewolves. Totally and completely. So ...,A3M933GCAXKYAW,110,122,12,98,0.901639,0.835923
462467,"[133, 176]",I debated writing this review since I had alre...,AIGWUMCJQNTGR,133,176,43,90,0.755682,0.687185
462614,"[10, 11]",I had read some really good reviews about this...,A2IEQ2QLPYH8I3,10,11,1,9,0.909091,0.622642
462356,"[30, 39]",When I first downloaded this book I was skepti...,ADDVZUA8X585Q,30,39,9,21,0.769231,0.616637
462341,"[6, 6]",I have to admit that the genre of paranormal r...,A1H7VKBJ4950KL,6,6,0,6,1.0,0.609666
462210,"[33, 44]",I don't care about throbbing thighs sex scenes...,A1CFPB03CD8ULH,33,44,11,22,0.75,0.605594
462140,"[40, 55]",I decided to write this in hopes that it will ...,A2VNIA5OSEDQRQ,40,55,15,25,0.727273,0.597678
462031,"[8, 10]",Maybe I had high hopes for this book because o...,A1EDMVK20UGV3H,8,10,2,6,0.8,0.490162
462061,"[3, 3]",I really enjoyed this book and you can't beat ...,A1OBQ5I31GW1TG,3,3,0,3,1.0,0.438503
461934,"[6, 8]",This was the first shapeshifter romance I ever...,A28T6S95YEZQLW,6,8,2,4,0.75,0.409275


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; text-align:left">

<h3 align="left"><font color='#6B8BA0'>💭 Comment: </font></h3>

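**wilson_lower_bound**
   - Advantages:
     - It is the lower bound of a confidence interval for the share of helpful votes, so it accounts for sample size: reviews with few votes get a conservative score instead of jumping to the top.

   - Disadvantages:
     - It is more expensive to compute than the other two scores, which can matter on large datasets.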
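
To compare the three scores side by side, and to reduce the cost of row-wise `apply`, here is a sketch of a vectorized Wilson lower bound in NumPy. The function name `wilson_lower_bound_vec` is hypothetical, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm.ppf`. The vote pairs are taken from the outputs above, plus one review without votes.

```python
from statistics import NormalDist

import numpy as np


def wilson_lower_bound_vec(up, down, confidence=0.95):
    """Vectorized Wilson lower bound; returns 0 where there are no votes."""
    up = np.asarray(up, dtype=float)
    down = np.asarray(down, dtype=float)
    n = up + down
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    safe_n = np.where(n == 0, 1.0, n)  # placeholder to avoid 0-division; masked below
    phat = up / safe_n
    centre = phat + z * z / (2 * safe_n)
    margin = z * np.sqrt((phat * (1 - phat) + z * z / (4 * safe_n)) / safe_n)
    bound = (centre - margin) / (1 + z * z / safe_n)
    return np.where(n == 0, 0.0, bound)


# (helpful_yes, helpful_no) pairs: 110/122, 6/6 and 1/1 from the outputs above, and 0/0.
up = np.array([110, 6, 1, 0])
down = np.array([12, 0, 0, 0])
total = up + down

diff = up - down
ratio = np.divide(up, total, out=np.zeros(len(up)), where=total > 0)
wlb = wilson_lower_bound_vec(up, down)

for u, d, a, b, c in zip(up, down, diff, ratio, wlb):
    print(f"yes={u:3d} no={d:2d}  diff={a:3d}  ratio={b:.3f}  wilson={c:.3f}")
```

The review with a single helpful vote has the highest ratio but a low Wilson score, while the one with 110 of 122 helpful votes leads on the Wilson score.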
    
<center> <img src="https://i.imgur.com/1sBrz6R.png" > </center>