# New Beer Recommendation System

The goal of this recommendation system is to recommend potential customers for a new product. In this case the new product is a new beer that we’ll define with certain attributes such as Style (India Pale Ale, Belgian Ale) and the Alcohol by Volume (ABV). In order to achieve this, I’m going to be using a Dataset that was scraped from the RateBeer website that contains User Reviews for all sorts of beers. This dataset contains ratings from April 2000 to November 2011.

## Loading + Pre Processing the Data

In [154]:
# Import statements
import pandas as pd
import numpy as np

In [179]:
# load our RateBeer dataset that has filtered out columns
beer_ratings_df = pd.read_csv( './reduced_ratebeer_dataset.csv' )

In [156]:
beer_ratings_df.head()


Unnamed: 0,name,beerId,brewerId,ABV,style,overall,profileName
0,John Harvards Simcoe IPA,63836,8481,5.4,India Pale Ale &#40;IPA&#41;,13,hopdog
1,John Harvards Simcoe IPA,63836,8481,5.4,India Pale Ale &#40;IPA&#41;,13,TomDecapolis
2,John Harvards Cristal Pilsner,71716,8481,5.0,Bohemian Pilsener,14,PhillyBeer2112
3,John Harvards Fancy Lawnmower Beer,64125,8481,5.4,K�lsch,8,TomDecapolis
4,John Harvards Fancy Lawnmower Beer,64125,8481,5.4,K�lsch,8,hopdog


In [89]:
beer_ratings_df.dtypes

name           object
beerId         object
brewerId        int64
ABV            object
style          object
overall         int64
profileName    object
dtype: object

In [90]:
# encode category values to codes
#beer_ratings_df["style"] = beer_ratings_df["style"].astype('category')
#beer_ratings_df["style_cat"] = beer_ratings_df["style"].cat.codes

beer_ratings_df["brewerId"] = beer_ratings_df["brewerId"].astype('category')
#beer_ratings_df["brewerId_cat"] = beer_ratings_df["brewerId"].cat.codes

beer_ratings_df["beerId"] = beer_ratings_df["beerId"].astype('category')
beer_ratings_df["beerId_cat"] = beer_ratings_df["beerId"].cat.codes

beer_ratings_df["profileName"] = beer_ratings_df["profileName"].astype('category')
beer_ratings_df["userId"] = beer_ratings_df["profileName"].cat.codes

# drop entries that do not specify Alcohol by Volume
beer_ratings_df = beer_ratings_df[beer_ratings_df.ABV != '-'] 

In [91]:
beer_ratings_df.head()

Unnamed: 0,name,beerId,brewerId,ABV,style,overall,profileName,beerId_cat,userId
0,John Harvards Simcoe IPA,63836,8481,5.4,India Pale Ale &#40;IPA&#41;,13,hopdog,105412,18486
1,John Harvards Simcoe IPA,63836,8481,5.4,India Pale Ale &#40;IPA&#41;,13,TomDecapolis,105412,10465
2,John Harvards Cristal Pilsner,71716,8481,5.0,Bohemian Pilsener,14,PhillyBeer2112,106618,8190
3,John Harvards Fancy Lawnmower Beer,64125,8481,5.4,K�lsch,8,TomDecapolis,105458,10465
4,John Harvards Fancy Lawnmower Beer,64125,8481,5.4,K�lsch,8,hopdog,105458,18486


In [92]:
userId_to_Name = beer_ratings_df[['userId','profileName']].drop_duplicates()
userId_to_Name.head()

Unnamed: 0,userId,profileName
0,18486,hopdog
1,10465,TomDecapolis
2,8190,PhillyBeer2112
7,5269,JFGrind
8,16573,egajdzis


In [93]:
beers_info =  beer_ratings_df[['name', 'beerId_cat','brewerId', 'ABV', 'style']].drop_duplicates()

In [94]:
beers_info.head()

Unnamed: 0,name,beerId_cat,brewerId,ABV,style
0,John Harvards Simcoe IPA,105412,8481,5.4,India Pale Ale &#40;IPA&#41;
2,John Harvards Cristal Pilsner,106618,8481,5.0,Bohemian Pilsener
3,John Harvards Fancy Lawnmower Beer,105458,8481,5.4,K�lsch
7,John Harvards Grand Cru,106619,8481,7.0,Belgian Ale
10,John Harvards Belgian Tripel,106617,8481,8.5,Abbey Tripel


## Taking a Sample to Prototype On

In [99]:
# lets take a sample of the dataframe for prototyping the final model
sample_df = beer_ratings_df.sample(1000)

In [100]:
beer_user_ratings = sample_df.pivot_table(index='userId', columns='beerId_cat', values='overall') 
beer_user_ratings = beer_user_ratings.fillna(0)

In [101]:
import numpy as np
def als(R, K, steps, alpha, beta):
    m,n = R.shape
    K = K
    P = np.random.rand(m, K)
    Q = np.random.rand(K, n)
    steps = steps
    alpha = alpha
    beta = beta

    for step in range(steps):
        if(step%50 == 0):
            print("Step: {}".format(step))
            final_ratings = P.dot(Q)
            print(final_ratings)
        for i in range(m):
            for j in range(n):
                if R[i][j] > 0:
                    e_ij = R[i][j] - np.dot(P[i,:], Q[:,j])
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * e_ij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * e_ij * P[i][k] - beta * Q[k][j])

        eR = np.dot(P,Q)
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - np.dot(P[i,:],Q[:,j]), 2)
                    for k in range(K):
                        e = e + (beta/2) * ( pow(P[i][k],2) + pow(Q[k][j],2) )
        if(step%50 == 0):
            print("Error: {}".format(e))
            print("----------")
        if e < 0.001:
            break
            
    return P,Q

In [102]:
P,Q = als(beer_user_ratings.values, 2, 500, 0.005, 0.25)

final_ratings = P.dot(Q)
print(final_ratings)

Step: 0
[[0.57400724 0.33415212 0.89370092 ... 1.03621195 1.22846436 1.04546714]
 [0.50236198 0.26486559 0.74230071 ... 0.73572104 0.90598436 0.71551151]
 [0.25780486 0.11834174 0.35552962 ... 0.26843874 0.35709503 0.24001833]
 ...
 [0.0920485  0.098519   0.2082452  ... 0.445028   0.47258774 0.49263635]
 [0.25602682 0.16672086 0.42416553 ... 0.57189304 0.65635768 0.59416703]
 [0.29114682 0.17244753 0.45757781 ... 0.54395232 0.64125083 0.55168466]]
Error: 165181.40333941072
----------
Step: 50
[[ 3.39186392  3.40737683  4.517768   ...  4.90540231  4.60256333
   4.63455438]
 [10.34581797  9.66198159 13.50176547 ... 14.0423121  12.81303644
  13.27531316]
 [10.13968246  7.87608122 12.62632027 ... 11.75741099  9.88672613
  11.1345721 ]
 ...
 [ 8.91919114  8.01506485 11.52022348 ... 11.71008598 10.51884948
  11.07429633]
 [12.23297778  9.52372473 15.24121494 ... 14.21193556 11.96410732
  13.4587623 ]
 [12.02042502 10.56909276 15.43722944 ... 15.48872133 13.78598115
  14.65069801]]
Error: 342

For this experiment we will be analyzing the ratings for a particular user,  "mribm" 

In [161]:
mr_ibm_user_ratings = final_ratings[608] 
for beerIndex in reversed(np.argsort(mr_ibm_user_ratings)[-10:]):
    beerId = beer_user_ratings.columns.values[beerIndex]
    print("Predicted Score for Beer with Id ({}) : {}\n\n".format(beerId, mr_ibm_user_ratings[beerIndex]))

Predicted Score for Beer with Id (85094) : 20.918637532323952


Predicted Score for Beer with Id (44870) : 20.237246742111445


Predicted Score for Beer with Id (20681) : 20.120632737746007


Predicted Score for Beer with Id (67716) : 19.924784703813692


Predicted Score for Beer with Id (31410) : 19.570419198464386


Predicted Score for Beer with Id (3017) : 19.391613131558483


Predicted Score for Beer with Id (101) : 19.239391783546225


Predicted Score for Beer with Id (30877) : 19.102875527738355


Predicted Score for Beer with Id (81231) : 19.02147547684467


Predicted Score for Beer with Id (40328) : 19.00841171436849




# Recommending a New Beer to Users in Our Current Set

By using ALS we were able to get predict what the top beers for a user are. Now given the top 20 beers we can now check what the properties are for each of the beers.

In [162]:
# lets get the predictions for the first user
def print_beer_properties(beerId):
    print("Beer Id: {}".format(beerId))
    
    beerName = beers_info.loc[beers_info['beerId_cat'] == beerId]['name'].values[0]
    abv =  beers_info.loc[beers_info['beerId_cat'] == beerId]['ABV'].values[0]
    brewerdId = beers_info.loc[beers_info['beerId_cat'] == beerId]['brewerId'].values[0]
    style  = beers_info.loc[beers_info['beerId_cat'] == beerId]['style'].values[0]
    
    print("Beer Name: {}".format(beerName))
    print("ABV: {}".format(abv))
    print("BrewerID: {}".format(brewerdId))
    print("Style: {}".format(style))
    
for beerIndex in reversed(np.argsort(mr_ibm_user_ratings)[-20:]):
    beerId = beer_user_ratings.columns.values[beerIndex]
    print_beer_properties(beerId)
    print("Predicted Score for Beer with Id ({}) : {}".format(beerId, mr_ibm_user_ratings[beerIndex]))
    print("-"*10)


Beer Id: 85094
Beer Name: Cigar City Hunahpu�s Imperial Stout - Whiskey Barrel Aged
ABV: 11
BrewerID: 9990
Style: Imperial Stout
Predicted Score for Beer with Id (85094) : 20.918637532323952
----------
Beer Id: 44870
Beer Name: Valley Brew Decadence 12 Cuvee Speciale
ABV: 13
BrewerID: 3490
Style: Abt/Quadrupel
Predicted Score for Beer with Id (44870) : 20.237246742111445
----------
Beer Id: 20681
Beer Name: Valley Brew Uberhoppy Imperial IPA
ABV: 9.5
BrewerID: 3490
Style: Imperial/Double IPA
Predicted Score for Beer with Id (20681) : 20.120632737746007
----------
Beer Id: 67716
Beer Name: N�rrebro Imperial Sk�rgaards Porter Cabernet Barrel
ABV: 9
BrewerID: 3992
Style: Imperial/Strong Porter
Predicted Score for Beer with Id (67716) : 19.924784703813692
----------
Beer Id: 31410
Beer Name: Pizza Port Hop Suey Double IPA
ABV: 9
BrewerID: 1538
Style: Imperial/Double IPA
Predicted Score for Beer with Id (31410) : 19.570419198464386
----------
Beer Id: 3017
Beer Name: Westvleteren 12
ABV: 10

So we have the properties for the top 20 beers that "mribm" likes. In this case we can get an idea that "mribm" likes beers where the ABV is relatively high since the average ABV for these beers was around 9.5. Also there is some repetition in the Style of Beer and BrewerIds suggesting that the"mribm" might have an affinity for beers of certain styles and from certain brewers. 

So let's say we're a Brewer and we're trying to introduce a new beer to the market. We have a certain BrewerId, our beer has a certain Style, and our ABV is a certain percentage. Based on these properties I will score how often our new beer lines up with that of the top 10 beers. 


We'll go through each of the top 10 beers and check for the following input beer properties.

- For our provided ABV, if it is within a certain range from the ABV of the top beer we add 1 point
- For our provided BrewerID if it matches up we add 1 point
- For our provided Style, if it matches up we add 1 point

We'll multiply the total above by the predicted score out of 20. That way if the user was predicted to give a 20/20 for a beer they'll retain the full score and likewise if they were predicted to give the beer a 10/20 they would retain half the score.


We run this through all of the top 20 beers and sum the scores from each of them to get our final "likeability" score based on the input beer properties. This will give us an indicator of how likely this user would like our New Beer. 

We do the same for the rest of the users in our dataset and get the top users with the highest scores and they would be our potential customers for our new beer. 

In [171]:
# lets say we're introducing our new beer with the following properties
new_beer_brewerId = 3490
new_beer_style = 'Abt/Quadrupel'
new_beer_abv = 12

In [172]:
# implementation for the scoring system mentioned above
def get_likeability_score(user_ratings, input_brewerId, input_style, input_abv, abv_threshold=1):
    final_score = 0
    for beerIndex in reversed(np.argsort(user_ratings)[-20:]):
        # get beer id
        beerId = beer_user_ratings.columns.values[beerIndex]
        
        # get beer properties
        abv =  float(beers_info.loc[beers_info['beerId_cat'] == beerId]['ABV'].values[0])
        brewerId = beers_info.loc[beers_info['beerId_cat'] == beerId]['brewerId'].values[0]
        style  = beers_info.loc[beers_info['beerId_cat'] == beerId]['style'].values[0]
        
        # check if abv is within certain range
        
        if (abv - abv_threshold) <= input_abv <= (abv + abv_threshold):
            final_score += 1
        
        if brewerId == input_brewerId:
            final_score += 1
            
        if input_style in style:
            final_score += 1
        
    return final_score*user_ratings[beerIndex]/20
        

In [173]:
# lets get the likeability score for our friend "mribm"
get_likeability_score(mr_ibm_user_ratings, 3490, 'Abt/Quadrupel', 12)

10.213219018348461

Now let's get the likeability score for all the users.

In [174]:
final_user_likeability_scores = []

for i in range(0, len(final_ratings)):
    
    # get user name
    userId = beer_user_ratings.index.values[i]
    userName = userId_to_Name.loc[userId_to_Name['userId'] == userId]['profileName'].values[0]
    #print(str(i) + " " + userName)
    # get user ratings 
    user_ratings = final_ratings[i]
    
    # get likeability score from user_ratings
    likeability_score = get_likeability_score(user_ratings, new_beer_brewerId, new_beer_style, new_beer_abv)
    
    final_user_likeability_scores.append({"user": userName, "likeability_score": likeability_score})

In [176]:
sorted_scores = sorted(final_user_likeability_scores, key=lambda k: k['likeability_score'], reverse=True)


In [177]:
sorted_scores[:25]

[{'user': 'mribm', 'likeability_score': 10.213219018348461},
 {'user': 'Leighton', 'likeability_score': 9.718911921009681},
 {'user': 'hophead75', 'likeability_score': 9.693727438049795},
 {'user': 'lampeno420', 'likeability_score': 9.433628918755632},
 {'user': 'lusikka', 'likeability_score': 9.404424775458697},
 {'user': 'LilKem', 'likeability_score': 9.25259159169947},
 {'user': 'Hammy78', 'likeability_score': 9.202447052065768},
 {'user': 'OldGrowth', 'likeability_score': 9.198000006803122},
 {'user': 'wxman', 'likeability_score': 9.115318448508685},
 {'user': 'Defreni', 'likeability_score': 9.113605455405073},
 {'user': 'mikenodak', 'likeability_score': 9.10356550023791},
 {'user': 'gibson584', 'likeability_score': 9.070573333655162},
 {'user': 'mmcdowell3', 'likeability_score': 8.991859462224328},
 {'user': 'Aquilo', 'likeability_score': 8.981681344835614},
 {'user': 'MaliMisho', 'likeability_score': 8.932750340123166},
 {'user': 'fata2683', 'likeability_score': 8.93249534144295}

Based on the likeability scores now we have our top potential customers for our new beer. These would be the people we should try to market our beer to.