# Motivation / Background

This write-up was inspired by Evan Miller's [excellent article on why average ratings shouldn't be used](http://www.evanmiller.org/how-not-to-sort-by-average-rating.html) when ranking internet comments, items in an online marketplace, and so on. I was curious about the relationship between these "incorrect" ranking approaches and the Wilson Lower Bound he suggested; specifically, I wanted to see the rank-order correlation between the different approaches.

Below, I simulated random integer thumbs-up / thumbs-down counts and computed four different scores:
* Net positive (thumbs-up count minus thumbs-down count)
* Average rating (thumbs-up count divided by total ratings)
* Positive per negative (ratio of thumbs-up count to thumbs-down count)
* Wilson Lower Bound (WLB)

Next, I constructed a Spearman rank-order correlation matrix between these four scores. Given the random data, they appear to be highly correlated with each other and produce essentially the same rank-ordered lists. This simulation leaves out items with highly skewed ratings (either positively or negatively); addressing that is a potential future direction and would, given Evan's post, likely demonstrate how the WLB is superior to the other approaches.
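
To make the concern from Evan's post concrete, here is a minimal sketch that is not part of the original analysis. It assumes Python 3, uses a fixed z of 1.96 (roughly 95% confidence), and compares two hypothetical items; the names `wilson_lower_bound`, `few_votes`, and `many_votes` are made up for illustration.

```python
import math

def wilson_lower_bound(pos, n, z=1.96):
    """Lower end of the Wilson score interval for pos successes out of n."""
    if n == 0:
        return 0.0
    phat = pos / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)

# Hypothetical items: (thumbs up, thumbs down)
items = {"few_votes": (2, 0), "many_votes": (90, 10)}

for name, (up, down) in items.items():
    n = up + down
    # name, net positive, average rating, Wilson lower bound
    print(name, up - down, up / n, round(wilson_lower_bound(up, n), 3))
```

Average rating puts `few_votes` (1.0) ahead of `many_votes` (0.9), while the lower bound reverses the order (about 0.34 versus about 0.83) because two votes are weak evidence. The simulation below rarely produces items like `few_votes`, which is one reason the scores agree so closely there.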

# Analysis

In [25]:
#importing modules
import numpy as np
import scipy.stats as st
import pandas as pd
import random
import string
import matplotlib.pyplot as plt

In [26]:
#Creating ratings dataframe
#id: random 5-character alphanumeric string
#positive: number of thumbs-up ratings (random integer from 0 to 1000)
#negative: number of thumbs-down ratings (random integer from 0 to 1000)
#List comprehensions are used instead of map/xrange so the cell runs on Python 2 and 3
ratings = pd.DataFrame({'id' : [''.join(random.choice(string.ascii_letters + string.digits) for _ in range(5)) for _ in range(1000)],
    'positive' : [random.randint(0, 1000) for _ in range(1000)],
    'negative' : [random.randint(0, 1000) for _ in range(1000)]
})

#Incorrect solution 1, positive minus negative
ratings['score_!'] = ratings['positive'] - ratings['negative']

#Incorrect solution 2, positive divided by total
ratings['score_2'] = ratings['positive'] / (ratings['positive'] + ratings['negative'])

#Incorrect (potentially) solution 3, positive divided by negative
#(pandas returns inf when negative is 0, and NaN when both counts are 0)
ratings['score_3'] = ratings['positive'] / ratings['negative']

In [27]:
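#Defining WLB function
#pos: number of positive ratings, n: total number of ratings, confidence: e.g. 0.95
def wlb(pos,n,confidence):
    if n == 0:
        #Returning a float so np.vectorize does not infer an integer output type from this branch
        return 0.0
    z = float(st.norm.ppf(1-((1-confidence)/2)))
    phat = float(pos)/float(n)
    return (phat + z*z/(2*n) - z * np.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)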
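
For reference, the return expression in `wlb` is the lower end of the Wilson score interval. Written out with the function's own names (`phat`, `z`, `n`), it reads as follows; the symbol Φ⁻¹ for the standard normal quantile function (`st.norm.ppf`) is notation introduced here.

```latex
\hat{p} = \frac{\text{pos}}{n}, \qquad
z = \Phi^{-1}\!\left(1 - \frac{1 - \text{confidence}}{2}\right), \qquad
\text{WLB} = \frac{\hat{p} + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat{p}(1-\hat{p}) + \dfrac{z^2}{4n}}{n}}}{1 + \dfrac{z^2}{n}}
```

As n grows, the terms containing z²/n shrink toward zero and the bound approaches the plain average rating, which is worth keeping in mind when reading the correlations later on.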


In [28]:
#Obtaining 95% lower bound on proportion of positive ratings
ratings['lower_bound'] = np.vectorize(wlb)(ratings['positive'], ratings['positive']+ratings['negative'], 0.95)

In [29]:
#Viewing data
ratings.head()

Unnamed: 0,id,negative,positive,score_!,score_2,score_3,lower_bound
0,2mrw5,596,1000,404,0.626566,1.677852,0.602558
1,PWHqC,629,982,353,0.609559,1.561208,0.585503
2,9uGlf,735,628,-107,0.460748,0.854422,0.434433
3,toG9w,50,704,654,0.933687,14.08,0.913637
4,dIwIg,317,807,490,0.717972,2.545741,0.690957


In [34]:
#Sanity check: recomputing the WLB for row 0 (1000 positive out of 1596 total)
#The result should match that row's lower_bound value
wlb(pos=1000,n=1596,confidence=0.95)

0.60255775158611891

In [31]:
#Ranking each of the four score columns (ties receive the average rank)
rankings = ratings.iloc[:,3:7].rank()

#Prefixing column names to differentiate the rankings from the ratings df
rankings.columns = "rankings_" + rankings.columns

rankings.head()

Unnamed: 0,rankings_score_!,rankings_score_2,rankings_score_3,rankings_lower_bound
0,817.0,708.0,708.0,718.0
1,787.5,686.0,686.0,693.0
2,429.5,447.0,447.0,451.0
3,948.0,967.0,967.0,967.0
4,868.5,801.0,801.0,807.0


# Final data

In [32]:
#Combining ratings and rankings dataframes
final_data = pd.concat([ratings.reset_index(drop=True), rankings], axis=1)

final_data.head()

Unnamed: 0,id,negative,positive,score_!,score_2,score_3,lower_bound,rankings_score_!,rankings_score_2,rankings_score_3,rankings_lower_bound
0,2mrw5,596,1000,404,0.626566,1.677852,0.602558,817.0,708.0,708.0,718.0
1,PWHqC,629,982,353,0.609559,1.561208,0.585503,787.5,686.0,686.0,693.0
2,9uGlf,735,628,-107,0.460748,0.854422,0.434433,429.5,447.0,447.0,451.0
3,toG9w,50,704,654,0.933687,14.08,0.913637,948.0,967.0,967.0,967.0
4,dIwIg,317,807,490,0.717972,2.545741,0.690957,868.5,801.0,801.0,807.0


# Results

In [33]:
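#Rank-order correlation matrix between the four rating approaches
#Labeling rows and columns so the matrix can be read without counting positions
cols = final_data.columns[7:11]
pd.DataFrame(st.spearmanr(final_data.iloc[:,7:11])[0], index=cols, columns=cols)

Unnamed: 0,rankings_score_!,rankings_score_2,rankings_score_3,rankings_lower_bound
rankings_score_!,1.0,0.955765,0.955765,0.954331
rankings_score_2,0.955765,1.0,1.0,0.999258
rankings_score_3,0.955765,1.0,1.0,0.999258
rankings_lower_bound,0.954331,0.999258,0.999258,1.0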
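
The perfect correlation between the score_2 and score_3 rankings is expected rather than a coincidence of the simulation. Writing r for the average rating (a label introduced here), the positive-per-negative ratio is a strictly increasing function of r, so the two scores always order items identically:

```latex
r = \frac{\text{positive}}{\text{positive} + \text{negative}}, \qquad
\frac{\text{positive}}{\text{negative}} = \frac{r}{1 - r}, \qquad
\frac{d}{dr}\left(\frac{r}{1 - r}\right) = \frac{1}{(1 - r)^2} > 0 \quad \text{for } 0 \le r < 1
```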


# Conclusions

Given the random data I generated, these four approaches yield very similar rank-ordered lists: net positive correlates at roughly 0.95 with each of the other three, average rating and the WLB correlate at above 0.999, and score_2 and score_3 produce completely identical orderings. One caveat is that both counts were drawn uniformly from 0 to 1000, so most items have hundreds of ratings and few are highly skewed; the four approaches may return different rank orders for items with highly skewed ratings. Additionally, examining how the rank order changes over time (e.g. if an item being ranked undergoes changes in product features) would be interesting as well.
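
As a possible starting point for the skewed-ratings follow-up, here is a minimal sketch assuming Python 3 and NumPy 1.17 or later. Every modelling choice in it is an assumption made for illustration and not part of the analysis above: the vote totals of 1 to 20, the Beta(5, 1) approval rates that skew items positive, the fixed seed, and all variable names.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
n_items = 1000

# Assumed setup: few votes per item, with approval rates skewed toward 1
totals = rng.integers(1, 21, size=n_items)      # 1 to 20 votes per item
true_rate = rng.beta(5, 1, size=n_items)        # mostly high approval
positive = rng.binomial(totals, true_rate)
negative = totals - positive

# Average rating and 95% Wilson lower bound, computed on whole arrays
average = positive / totals
z = st.norm.ppf(0.975)
lower = (average + z**2 / (2 * totals)
         - z * np.sqrt((average * (1 - average) + z**2 / (4 * totals)) / totals)
         ) / (1 + z**2 / totals)

# Spearman rank-order correlation between the two scores
rho, _ = st.spearmanr(average, lower)
print(rho)
```

Comparing `rho` here with the corresponding entry in the matrix above would indicate how much of the agreement depends on every item having a large number of ratings.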