---
layout: post
title: Recommendation Systems
subtitle: Learning about how content gets recommended to me.
imgurl: https://cdn.pixabay.com/photo/2012/12/20/10/12/article-71342_960_720.jpg
imgsource: https://pixabay.com/
category: data science
tags:
    - recommendation systems
---

On almost every internet platform I use regularly, I get recommendations for all kinds of content. I want to learn a little about how these platforms decide what to recommend to me from the data they have.

## Why do some recommendation systems feel... "Off"?

The content recommended to me often feels out of place, irrelevant, or at times outright random. How could this possibly be?

### Noticing bad VFX

We get upset at bad VFX because modern VFX has become so effective that we mostly don't notice it when it is done well. Many people only notice poor-quality VFX because it sticks out like a sore thumb next to the CG skylines, helicopters, car crashes, and explosions that we have been watching without ever realizing they were CG. [Rocket Jump](https://www.youtube.com/watch?v=bL6hp8BKB24) has a great video detailing this phenomenon. Similarly, jokes about peculiar recommendations in your newsfeed mostly come about because you notice, and exclusively remember, the one article that seems out of place among the 50+ articles that were recommended accurately.

In fact, modern recommendation systems are very effective. Verifiably so! [Netflix](https://research.netflix.com/research-area/recommendations) dedicates significant resources to perfecting customized recommendations for all of its nearly [170 million](https://www.statista.com/statistics/250934/quarterly-number-of-netflix-streaming-subscribers-worldwide/) subscribers.

## Kinds of recommendation systems

### Popularity model

### Collaborative filtering

### Content-based filtering

### Knowledge-based recommendations

### Hybrid recommendation methods

## Datasets

- [SNAP](https://snap.stanford.edu/index.html) has comprehensive datasets for all sorts of projects.
- [Julian McAuley](https://cseweb.ucsd.edu/~jmcauley/datasets.html) at UCSD has a page of datasets specifically for recommender systems.
- [Kaggle](https://kaggle.com) is a website filled with different kinds of datasets and data science competitions.

Just for fun, let's see if we can learn more about these algorithms by suggesting new [beer](https://www.kaggle.com/rdoume/beerreviews).

In [1]:
import os
import numpy as np
import pandas as pd
from sklearn import preprocessing
from matplotlib import pyplot as plt

In [2]:
data_root = "./data"

beer_reviews = pd.read_csv(os.path.join(data_root, "beer_reviews.csv"))
beer_reviews = beer_reviews.sort_values('review_time')
beer_reviews.head()

| | brewery_id | brewery_name | review_time | review_overall | review_aroma | review_appearance | review_profilename | beer_style | review_palate | review_taste | beer_name | beer_abv | beer_beerid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 564601 | 33 | Berkshire Brewing Company Inc. | 840672001 | 4.0 | 3.5 | 3.5 | Todd | American Pale Ale (APA) | 4.0 | 4.0 | Steel Rail Extra Pale Ale | 5.3 | 93 |
| 286273 | 35 | Boston Beer Company (Samuel Adams) | 884390401 | 4.0 | 4.0 | 3.0 | Todd | American Strong Ale | 4.5 | 4.5 | Samuel Adams Triple Bock | 17.5 | 111 |
| 764128 | 144 | Sprecher Brewing Company | 884649601 | 4.5 | 4.0 | 4.0 | BeerAdvocate | Vienna Lager | 4.0 | 4.0 | Special Amber | 5.0 | 97 |
| 1417077 | 139 | Shipyard Brewing Co. | 885340801 | 4.0 | 3.5 | 3.0 | BeerAdvocate | English Pale Ale | 3.5 | 4.0 | Tremont Ale | 4.8 | 51 |
| 1029414 | 138 | Shepherd Neame Ltd | 885427201 | 1.0 | 3.0 | 3.0 | BeerAdvocate | Irish Dry Stout | 1.0 | 1.0 | Casey's Smooth Stout | NaN | 306 |


This data records users' ratings of beers. Let's implement each recommender system and see how effective each method is on its own. Then we'll see whether we can combine multiple systems into a hybrid recommendation system that is even more accurate. [Hybrid recommender systems](https://arxiv.org/abs/1901.03888) have recently become the norm in the industry.

In [67]:
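# Keep one row per (user, beer) pair: if a user reviewed the same beer
# more than once, average their overall ratings.
df = (
    beer_reviews[['review_overall', 'review_profilename', 'beer_beerid']]
    .groupby(['review_profilename', 'beer_beerid'])
    .mean()
    .reset_index()
)
df.head()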

| | review_profilename | beer_beerid | review_overall |
| --- | --- | --- | --- |
| 0 | 0110x011 | 23 | 3.5 |
| 1 | 0110x011 | 39 | 5.0 |
| 2 | 0110x011 | 195 | 4.0 |
| 3 | 0110x011 | 459 | 4.5 |
| 4 | 0110x011 | 599 | 4.0 |


In [106]:
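def to_set(df):
    # Collapse one user's rows into the set of beer ids they reviewed.
    return set(df["beer_beerid"])

def evaluate(df, test_df, model, recall=5, sample=100):
    # Map each user to the set of beers they have already reviewed.
    interacted_beers = df[["review_profilename", "beer_beerid"]].groupby("review_profilename").apply(to_set)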

In [107]:
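class PopularRecommender:
    """Recommends the same most-popular beers to every user."""

    def __init__(self):
        self.trained = False

    def train(self, df):
        # Popularity = summed overall rating per beer, so beers with many
        # good reviews rank highest. Select the rating column explicitly so
        # the string user column is never summed.
        self.topN = (
            df.groupby("beer_beerid")["review_overall"]
            .sum()
            .sort_values(ascending=False)
            .reset_index()
        )
        self.trained = True

    def recommend(self, user, items, N=5):
        # `user` and `items` are ignored: every user gets the same list.
        if not self.trained:
            raise RuntimeError("Model has not been trained.")

        return self.topN.iloc[:N]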

In [108]:
P = PopularRecommender()
P.train(df)
P.recommend("0110x011", None)

| | beer_beerid | review_overall |
| --- | --- | --- |
| 0 | 2093 | 13287.5 |
| 1 | 412 | 12674.583333 |
| 2 | 1904 | 12206.0 |
| 3 | 1093 | 11381.833333 |
| 4 | 7971 | 11167.5 |
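
The `recommend` method accepts an `items` argument but does not use it, so every user sees the same list, including beers they have already reviewed. One plausible use for that argument (an assumption about the intent, not something the notebook confirms) is to pass in the beers a user has already rated so they can be skipped. Here is a minimal sketch of that idea on a small made-up table; `popular_unseen` and the toy data are hypothetical names for illustration only.

```python
import pandas as pd

# Made-up ratings for illustration; column names follow the beer data.
toy = pd.DataFrame({
    "review_profilename": ["ann", "ann", "bob", "bob", "cat"],
    "beer_beerid":        [10, 20, 10, 30, 10],
    "review_overall":     [4.0, 3.0, 5.0, 4.5, 4.0],
})

def popular_unseen(ratings, seen, n=5):
    """Top-n beers by summed rating, skipping ids in `seen` (hypothetical helper)."""
    totals = (
        ratings.groupby("beer_beerid")["review_overall"]
        .sum()
        .sort_values(ascending=False)
    )
    # Drop beers the user has already reviewed before taking the top n.
    unseen = totals[~totals.index.isin(seen)]
    return unseen.head(n).reset_index()

# Beers that "ann" has already reviewed.
seen_by_ann = set(toy.loc[toy["review_profilename"] == "ann", "beer_beerid"])
recs = popular_unseen(toy, seen_by_ann, n=2)
print(recs)
```

In this toy table `ann` has already reviewed beers 10 and 20, so only beer 30 is left to recommend.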


In [109]:
evaluate(df, df, P)
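
The `evaluate` helper is only partially shown, and the call above scores the model on the same data it was trained on. As a rough idea of what an evaluation could look like, here is a minimal sketch of recall@N on a time-based holdout: train on older reviews, then check how many of each user's later beers show up in the recommendations. This is an assumption about the intended metric, not the notebook's actual implementation; the toy table and the names `time_split`, `recall_at_n`, `mean_recall_at_n`, and `recommend_popular` are made up for illustration.

```python
import pandas as pd

# Made-up ratings for illustration; column names follow the beer data.
ratings = pd.DataFrame({
    "review_profilename": ["ann", "ann", "ann", "bob", "bob", "cat", "cat", "cat"],
    "beer_beerid":        [1, 2, 3, 1, 3, 2, 3, 4],
    "review_overall":     [4.0, 3.5, 5.0, 4.5, 4.0, 2.0, 4.5, 3.0],
    "review_time":        [1, 2, 9, 3, 8, 4, 5, 10],
})

def time_split(ratings, cutoff):
    """Reviews before `cutoff` go to training, the rest to testing."""
    train = ratings[ratings["review_time"] < cutoff]
    test = ratings[ratings["review_time"] >= cutoff]
    return train, test

def recall_at_n(recommended, held_out):
    """Fraction of held-out items that appear in the recommended list."""
    if not held_out:
        return 0.0
    return len(set(recommended) & set(held_out)) / len(held_out)

def mean_recall_at_n(test, recommend, n=5):
    """Average recall@n over users with at least one test review."""
    held_out = test.groupby("review_profilename")["beer_beerid"].apply(set)
    scores = [recall_at_n(recommend(user, n), items) for user, items in held_out.items()]
    return sum(scores) / len(scores) if scores else 0.0

train, test = time_split(ratings, cutoff=8)

# Popularity baseline: rank beers by summed rating in the training data only.
top_beers = (
    train.groupby("beer_beerid")["review_overall"]
    .sum()
    .sort_values(ascending=False)
    .index.tolist()
)

def recommend_popular(user, n):
    # Same list for every user, as in the popularity model.
    return top_beers[:n]

score = mean_recall_at_n(test, recommend_popular, n=3)
print(f"recall@3 = {score:.3f}")
```

With this toy data the three most popular training beers are 1, 2, and 3. Two users have beer 3 held out and score 1.0, while the third has beer 4 held out and scores 0.0, so the average is 2/3. Splitting by `review_time` keeps later reviews out of training, which is one way to avoid an overly optimistic score.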