# Recommendations Using the Mean

## Imports

In [1]:
import numpy as np
import pandas as pd

## Load Data

Only a subset of the original data set is loaded here, as a proof of concept.

In [2]:
# The data was split 80/20 into training and testing sets earlier
df_train = pd.read_csv('../Data/training_data_subset.csv')
df_test = pd.read_csv('../Data/testing_data_subset.csv')

In [3]:
df_train.head(2)

Unnamed: 0,category,description,title,also_buy,brand,rank,also_view,main_cat,price,asin,details,overall,verified,reviewerID,reviewText,summary,vote,style,for_testing
0,"['Grocery & Gourmet Food', 'Sauces, Gravies & ...",['Sriracha chili sauce made from sun ripened c...,"Huy Fong Sriracha Chili Sauce, 28 Ounce Bottle...","['B001E5DZZM', 'B003NROMC4', 'B00U9VTL5U', 'B0...",Huy Fong,"145,292 in Grocery & Gourmet Food (","['B001E5DZZM', 'B008AV5HLS', 'B00U9VTL5U', 'B0...",Grocery,,B00BT7C9R0,"{'Shipping Weight:': '11.4 pounds', 'ASIN: ': ...",5.0,True,A3FYXMWYC9KUCK,I have been using Sriracha for several years n...,This stuff is great!,,,False
1,"['Grocery & Gourmet Food', 'Breakfast Foods', ...",['belVita Chocolate Breakfast Biscuits are lig...,"belVita Chocolate Breakfast Biscuits, 5 Count ...","['B00QF27JL0', 'B01BNIN5ZO', 'B01FLPFPOY', 'B0...",Belvita,"19,427 in Grocery & Gourmet Food (","['B01COWTO4O', 'B01FLPFPOY', 'B00QF27JL0', 'B0...",Grocery,,B00IO2DO2W,"{'Shipping Weight:': '4.1 pounds', 'Domestic S...",5.0,True,A2OWR2PL3DLWS4,My daughter is a Belvita addict. She likes al...,Delciious,,,False


In [4]:
df_test.head(2)

Unnamed: 0,category,description,title,also_buy,brand,rank,also_view,main_cat,price,asin,details,overall,verified,reviewerID,reviewText,summary,vote,style,for_testing
0,"['Grocery & Gourmet Food', 'Produce', 'Fresh V...","['<div class=""aplus""> <div class=""three-fourth...","Organic Green Cabbage, 1 Head",,produce aisle,,,Grocery,,B000P6H29Q,{'\n Product Dimensions: \n ': '7.5 x 6....,5.0,True,A1NKRXSU63EA4M,Hugh and delicious,Five Stars,,,True
1,"['Grocery & Gourmet Food', 'Cooking & Baking',...",['Light & Fluffy. Just add water. Made with re...,"Krusteaz Complete Pancake Mix, Buttermilk, 32 oz","['B000R32RJC', 'B07CX6LN8T', 'B000PXZZQG', 'B0...",Krusteaz,,"['B00DXGGSBI', 'B00CEMP2Z0', 'B00BP2RY42', 'B0...",Grocery,,B000QCLEB6,{'\n Product Dimensions: \n ': '6.1 x 2....,5.0,True,A3TR0FIT13SSVN,Great flavor and surprisingly fluffy out of th...,Surprisingly good :),6.0,,True


### RMSE

In [5]:
def compute_rmse(y_pred, y_true):
    """ Compute Root Mean Squared Error. """
    
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))
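
As a quick sanity check of the RMSE formula, here is a minimal worked example on made-up numbers (the arrays are illustrative and not taken from the data set). It also shows that a single NaN prediction turns the whole score into NaN, because `np.mean` propagates NaN.

```python
import numpy as np

def compute_rmse(y_pred, y_true):
    """ Compute Root Mean Squared Error. """
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))

# Made-up ratings, for illustration only
y_true = np.array([5.0, 3.0, 4.0])
y_pred = np.array([4.0, 3.0, 2.0])

# Errors are -1, 0, -2; squared errors are 1, 0, 4; their mean is 5/3
rmse = compute_rmse(y_pred, y_true)  # sqrt(5/3), roughly 1.291

# One NaN among the predictions makes the result NaN
rmse_with_nan = compute_rmse(np.array([4.0, np.nan, 2.0]), y_true)
```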

### Evaluation method

In [6]:
def evaluate(estimate_f):
    """ RMSE-based predictive performance evaluation with pandas. """
    
    ids_to_estimate = zip(df_test.reviewerID, df_test.asin)
    estimated = np.array([estimate_f(u,i) for (u,i) in ids_to_estimate])
    real = df_test.overall.values
    return compute_rmse(estimated, real)
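
To illustrate the contract that `evaluate` expects (a function taking a user ID and a product ID and returning one predicted rating), here is a minimal sketch. The tiny `df_test` frame and the `constant_four` baseline are hypothetical stand-ins, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_test with only the columns evaluate() reads
df_test = pd.DataFrame({
    'reviewerID': ['u1', 'u2', 'u3'],
    'asin': ['p1', 'p2', 'p1'],
    'overall': [5.0, 3.0, 4.0],
})

def compute_rmse(y_pred, y_true):
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))

def evaluate(estimate_f):
    ids_to_estimate = zip(df_test.reviewerID, df_test.asin)
    estimated = np.array([estimate_f(u, i) for (u, i) in ids_to_estimate])
    real = df_test.overall.values
    return compute_rmse(estimated, real)

def constant_four(user_id, product_id):
    """ Hypothetical baseline: ignore both IDs and always predict 4.0. """
    return 4.0

# Errors are -1, 1, 0, so the RMSE is sqrt(2/3)
baseline_rmse = evaluate(constant_four)
```

A constant baseline like this gives a reference point: any estimator worth keeping should score a lower RMSE than it does.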

## Well-known Solutions to the Recommendation Problem

### Content-based filtering

*Recommend based on the user's rating history.* 

Generic expression (notice how this is kind of a 'row-based' approach):

$$ \newcommand{\aggr}{\mathop{\rm aggr}\nolimits}r_{u,i} = \aggr_{i' \in I(u)} [r_{u,i'}]$$

A simple example using the mean as an aggregation function:

$$ r_{u,i} = \bar r_u = \frac{\sum_{i' \in I(u)} r_{u,i'}}{|I(u)|} $$

In [7]:
def content_mean(user_id, product_id):
    """ Simple content filtering based on the user's mean rating. """
    
    # product_id is unused: the prediction depends only on the user's history
    user_condition = df_train.reviewerID == user_id
    return df_train.loc[user_condition, 'overall'].mean()

In [8]:
# Specific example
content_mean('ACOICLIJQYECU', '4639725043')

nan

In [9]:
# Test model
print('RMSE for content mean: %s' % evaluate(content_mean))

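In case the output is cleared, the RMSE using content mean is nan.
A likely cause is that some users in the test set have no ratings in the training subset: the mean of an empty selection is NaN, and a single NaN prediction makes the whole RMSE NaN.

TODO: Research ways to fix.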
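
One possible way to avoid the nan, sketched under the assumption that it comes from users with no rows in the training data, is to fall back to the global mean rating. This mirrors the empty check that `collaborative_mean` uses. The function name `content_mean_with_fallback` and the toy `df_train` are hypothetical.

```python
import pandas as pd

# Toy training data, for illustration only
df_train = pd.DataFrame({
    'reviewerID': ['u1', 'u1', 'u2'],
    'asin': ['p1', 'p2', 'p1'],
    'overall': [5.0, 3.0, 2.0],
})

# Mean over all training ratings, used when a user has no history
global_mean = df_train['overall'].mean()

def content_mean_with_fallback(user_id, product_id):
    """ User's mean rating, or the global mean for unseen users. """
    user_ratings = df_train.loc[df_train.reviewerID == user_id, 'overall']
    if user_ratings.empty:
        return global_mean
    return user_ratings.mean()

known = content_mean_with_fallback('u1', 'p9')    # mean of 5.0 and 3.0
unknown = content_mean_with_fallback('u9', 'p1')  # falls back to the global mean
```

Whether the global mean is the right fallback is a design choice; a fixed constant or the product's mean rating would be alternatives.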

### Collaborative filtering

*Recommend based on other users' rating histories.*

Generic expression (notice how this is kind of a 'col-based' approach):

$$ \newcommand{\aggr}{\mathop{\rm aggr}\nolimits}r_{u,i} = \aggr_{u' \in U(i)} [r_{u',i}] $$

A simple example using the mean as an aggregation function:

$$ r_{u,i} = \bar r_i = \frac{\sum_{u' \in U(i)} r_{u',i}}{|U(i)|} $$

In [10]:
def collaborative_mean(user_id, product_id):
    """ Simple collaborative filter based on mean ratings. """
    
    user_condition = df_train.reviewerID != user_id
    product_condition = df_train.asin == product_id
    ratings_by_others = df_train.loc[user_condition & product_condition]
    if ratings_by_others.empty:
        # Fallback rating when no other user has rated this product
        return 4.0
    else:
        return ratings_by_others.overall.mean()

In [11]:
# Specific example
collaborative_mean('ACOICLIJQYECU', '4639725043')

4.0

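For this example the collaborative mean returns 4.0, whereas the content mean above returned nan. Note that 4.0 is also the fallback value, so the result may simply mean that no other user rated this product in the training subset.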

In [13]:
# Test model
print(f'RMSE for collaborative mean is: {evaluate(collaborative_mean)}.')

RMSE for collaborative mean is: 1.2031960408756872.


In case the output is cleared, the RMSE for collaborative mean in the run shown above is 1.2031960408756872 (an earlier note recorded 1.0631020146075487).
This is the best result so far.

### Generalizations of the aggregation function for content-based filtering: incorporating similarities

The similarity can incorporate metadata about items, which is where the term 'content' starts to make sense.

$$ r_{u,i} = k \sum_{i' \in I(u)} sim(i, i') \; r_{u,i'} $$

$$ r_{u,i} = \bar r_u + k \sum_{i' \in I(u)} sim(i, i') \; (r_{u,i'} - \bar r_u) $$

Here $k$ is a normalizing factor,

$$ k = \frac{1}{\sum_{i' \in I(u)} |sim(i,i')|} $$

and $\bar r_u$ is the average rating of user $u$:

$$ \bar r_u = \frac{\sum_{i \in I(u)} r_{u,i}}{|I(u)|} $$
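
The first formula above can be sketched as follows. This is only an illustration: the `item_tags` metadata, the Jaccard similarity, and the function names are assumptions, and returning NaN when no rated item is similar is one choice among several.

```python
import pandas as pd

# Toy training data, for illustration only
df_train = pd.DataFrame({
    'reviewerID': ['u1', 'u1', 'u2'],
    'asin': ['p1', 'p2', 'p3'],
    'overall': [5.0, 2.0, 4.0],
})

# Hypothetical item metadata: a set of tags per product
item_tags = {
    'p1': {'sauce', 'spicy'},
    'p2': {'breakfast', 'biscuit'},
    'p3': {'sauce', 'spicy', 'asian'},
}

def jaccard_sim(item_a, item_b):
    """ Assumed sim(i, i'): overlap of the two tag sets. """
    a, b = item_tags.get(item_a, set()), item_tags.get(item_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def content_weighted(user_id, product_id):
    """ r_{u,i} = k * sum over i' in I(u) of sim(i, i') * r_{u,i'}. """
    rated = df_train.loc[df_train.reviewerID == user_id, ['asin', 'overall']]
    sims = [jaccard_sim(product_id, other) for other in rated.asin]
    norm = sum(abs(s) for s in sims)  # this is 1 / k
    if norm == 0:
        return float('nan')  # no similar rated item (or no history at all)
    return sum(s * r for s, r in zip(sims, rated.overall)) / norm

# u1 rated p1 (similar to p3) with 5.0 and p2 (no shared tags) with 2.0
pred = content_weighted('u1', 'p3')
no_overlap = content_weighted('u2', 'p2')
```

Here the dissimilar item gets zero weight, so the prediction follows the rating of the similar item rather than the plain mean of 3.5.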


### Generalizations of the aggregation function for collaborative filtering: incorporating similarities

The similarity can incorporate metadata about users.

$$ r_{u,i} = k \sum_{u' \in U(i)} sim(u, u') \; r_{u',i} $$

$$ r_{u,i} = \bar r_u + k \sum_{u' \in U(i)} sim(u, u') \; (r_{u',i} - \bar r_u) $$

Here $k$ is a normalizing factor,

$$ k = \frac{1}{\sum_{u' \in U(i)} |sim(u,u')|} $$

and $\bar r_u$ is the average rating of user $u$:

$$ \bar r_u = \frac{\sum_{i \in I(u)} r_{u,i}}{|I(u)|} $$
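
A minimal sketch of the first (non-centered) formula in this section follows. The user similarity used here, the Jaccard overlap of the sets of rated products, is an assumption chosen for brevity; the function names and toy data are hypothetical as well.

```python
import pandas as pd

# Toy training data, for illustration only
df_train = pd.DataFrame({
    'reviewerID': ['u1', 'u1', 'u2', 'u2', 'u3', 'u3'],
    'asin':       ['p1', 'p2', 'p1', 'p3', 'p4', 'p3'],
    'overall':    [5.0, 4.0, 5.0, 2.0, 1.0, 5.0],
})

def user_sim(user_a, user_b):
    """ Assumed sim(u, u'): overlap of the sets of rated products. """
    a = set(df_train.loc[df_train.reviewerID == user_a, 'asin'])
    b = set(df_train.loc[df_train.reviewerID == user_b, 'asin'])
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def collaborative_weighted(user_id, product_id):
    """ r_{u,i} = k * sum over u' in U(i) of sim(u, u') * r_{u',i}. """
    others = df_train.loc[
        (df_train.reviewerID != user_id) & (df_train.asin == product_id),
        ['reviewerID', 'overall'],
    ]
    sims = [user_sim(user_id, other) for other in others.reviewerID]
    norm = sum(abs(s) for s in sims)  # this is 1 / k
    if norm == 0:
        return float('nan')  # no similar user rated this product
    return sum(s * r for s, r in zip(sims, others.overall)) / norm

# p3 was rated 2.0 by u2 (shares p1 with u1) and 5.0 by u3 (shares nothing)
pred = collaborative_weighted('u1', 'p3')
no_similar = collaborative_weighted('u1', 'p4')
```

Only u2 has a non-zero similarity to u1, so the prediction follows u2's rating instead of the unweighted mean of 3.5.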

## Summary
- TODO

## References
1) Unata 2015 [Hands-on with PyData: How to Build a Minimal Recommendation Engine](https://www.youtube.com/watch?v=F6gWjOc1FUs).  