## Collaborative Filtering

This model answers the question "What would other people like me like?" Once enough ratings are available for a user, the algorithm works very well; with little rating history it has less to go on.

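Let $u_{i}$ be the embedding vector for user $i$ and $v_{j}$ the embedding vector for item $j$, with scalar biases $u_{0i}$ and $v_{0j}$.
Our predictions are the sum of the two biases and the dot product of the two embeddings: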
$$\hat{y}_{ij} = u_{0i} + v_{0j} + u_i \cdot v_j  $$ 
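
To make the formula concrete, here is a minimal NumPy sketch that evaluates it for a single (user, item) pair and for all pairs at once. The parameter values and the names `U`, `V`, `u0`, `v0`, and `predict_one` are made up for illustration and are not part of the notebook's code.

```python
import numpy as np

# Hypothetical toy parameters: 2 users, 3 items, K = 2 latent factors.
U = np.array([[1.0, 0.5],
              [0.2, 1.5]])        # user embeddings u_i
V = np.array([[1.0, 1.0],
              [0.5, 2.0],
              [2.0, 0.0]])        # item embeddings v_j
u0 = np.array([0.1, -0.2])        # user biases u_{0i}
v0 = np.array([0.3, 0.0, -0.1])   # item biases v_{0j}


def predict_one(i, j):
    """Predicted rating of user i for item j: u_{0i} + v_{0j} + u_i . v_j."""
    return u0[i] + v0[j] + U[i] @ V[j]


# All predictions at once via broadcasting: shape (n_users, n_items).
Y_hat = u0[:, None] + v0[None, :] + U @ V.T

print(predict_one(0, 1))
print(Y_hat.shape)
```

For user 0 and item 1 the dot product is $1.0 \cdot 0.5 + 0.5 \cdot 2.0 = 1.5$, so the prediction is $0.1 + 0.0 + 1.5 = 1.6$.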


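We train with gradient descent on the mean squared error $L = \frac{1}{N}\sum_{(i,j):r_{ij}=1}(y_{ij}-\hat{y}_{ij})^2$, where $N$ is the number of observed ratings, $r_{ij}=1$ if user $i$ rated item $j$, and $l$ indexes the embedding dimension. These are our gradients: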

$$
\begin{aligned}
\nabla L &= \left( \begin{array}{c}
\frac{\partial L}{\partial u_{il}}\\
\frac{\partial L}{\partial v_{jl}}\\
\frac{\partial L}{\partial u_{0i}}\\
\frac{\partial L}{\partial v_{0j}}\end{array} \right) \\
&= \left( \begin{array}{c}
- \frac{2}{N}\sum_{j:r_{ij}=1}(y_{ij}-u_{0i}-v_{0j}-u_i\cdot v_j)\,v_{jl} \\
- \frac{2}{N}\sum_{i:r_{ij}=1}(y_{ij}-u_{0i}-v_{0j}-u_i\cdot v_j)\,u_{il}\\
- \frac{2}{N}\sum_{j:r_{ij}=1}(y_{ij}-u_{0i}-v_{0j}-u_i\cdot v_j)\\
- \frac{2}{N}\sum_{i:r_{ij}=1}(y_{ij}-u_{0i}-v_{0j}-u_i\cdot v_j)\end{array} \right)
\end{aligned}
$$
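
The four gradient components can be sketched in NumPy as follows. This is an illustrative version that keeps the bias terms and works on index arrays of the observed pairs; the function names `predict_pairs`, `mse`, and `mse_gradients` and the toy data are assumptions, not the notebook's implementation.

```python
import numpy as np


def predict_pairs(users, items, U, V, u0, v0):
    """y_hat for each observed (user, item) pair."""
    return u0[users] + v0[items] + (U[users] * V[items]).sum(axis=1)


def mse(users, items, y, U, V, u0, v0):
    """Mean squared error over the N observed ratings."""
    return np.mean((y - predict_pairs(users, items, U, V, u0, v0)) ** 2)


def mse_gradients(users, items, y, U, V, u0, v0):
    """Gradients of the MSE with respect to U, V, u0 and v0."""
    N = len(y)
    resid = y - predict_pairs(users, items, U, V, u0, v0)
    scaled = -2.0 / N * resid
    dU, dV = np.zeros_like(U), np.zeros_like(V)
    du0, dv0 = np.zeros_like(u0), np.zeros_like(v0)
    # np.add.at accumulates correctly when a user or item index repeats,
    # which is what the sums over j (or i) in the formulas require.
    np.add.at(dU, users, scaled[:, None] * V[items])
    np.add.at(dV, items, scaled[:, None] * U[users])
    np.add.at(du0, users, scaled)
    np.add.at(dv0, items, scaled)
    return dU, dV, du0, dv0


# Hypothetical toy data: five observed ratings for 3 users and 3 items.
users = np.array([0, 0, 1, 1, 2])
items = np.array([0, 1, 1, 2, 0])
y = np.array([4.0, 5.0, 5.0, 3.0, 4.0])
rng = np.random.default_rng(0)
U, V = rng.random((3, 2)), rng.random((3, 2))
u0, v0 = np.zeros(3), np.zeros(3)

dU, dV, du0, dv0 = mse_gradients(users, items, y, U, V, u0, v0)
print(dU.shape, dV.shape, du0.shape, dv0.shape)
```

Each gradient has the same shape as the parameter it belongs to, so a gradient descent step is a plain elementwise subtraction.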


In [5]:
import numpy as np
import pandas as pd

In [6]:
# A handy function from fast.ai
def proc_col(col):
    """Encodes a pandas column with continuous ids.

    Returns the name-to-index mapping, the encoded column, and the number of unique values.
    """
    uniq = col.unique()
    name2idx = {o: i for i, o in enumerate(uniq)}
    return name2idx, np.array([name2idx[x] for x in col]), len(uniq)


def encode_data(df):
    """Encodes rating data with continuous user and movie ids using
    the helpful fast.ai function from above.

    Arguments:
      df: a dataframe with columns userId, movieId, rating

    Returns:
      df: the dataframe with encoded userId and movieId columns
      num_users: number of unique users
      num_movies: number of unique movies
    """
    _, df['userId'], num_users = proc_col(df['userId'])
    _, df['movieId'], num_movies = proc_col(df['movieId'])

    return df, num_users, num_movies
df = pd.read_csv("tiny_training2.csv")
df, num_users, num_movies = encode_data(df)
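
To see what this encoding does, here is a small sketch on a made-up dataframe (the raw ids below are hypothetical and are not taken from `tiny_training2.csv`). It uses `pd.factorize`, a pandas built-in that assigns ids in order of first appearance, which is the same scheme `proc_col` implements by hand.

```python
import pandas as pd

# Hypothetical raw ratings with arbitrary, non-contiguous ids.
raw = pd.DataFrame({
    "userId": [11, 11, 42, 42, 7],
    "movieId": [300, 15, 15, 99, 300],
    "rating": [4, 5, 5, 3, 4],
})

# factorize returns the integer codes and the unique values in order of appearance.
user_codes, user_uniques = pd.factorize(raw["userId"])
movie_codes, movie_uniques = pd.factorize(raw["movieId"])

# The equivalent of proc_col's name2idx mapping.
user2idx = {name: idx for idx, name in enumerate(user_uniques)}

print(list(user_codes))   # [0, 0, 1, 1, 2]
print(list(movie_codes))  # [0, 1, 1, 2, 0]
print(user2idx)           # {11: 0, 42: 1, 7: 2}
```

Contiguous ids starting at 0 matter because they are used directly as row indices into the embedding matrices.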

### Initializing our parameters

In [7]:
def create_embedings(n, K):
    """ Create a numpy random matrix of shape (n, K).

    The random matrix is initialized with uniform values in [0, 6/K).

    Arguments:
      n: number of items/users
      K: number of factors in the embedding

    Returns:
      emb: numpy array of shape (n, K)
    """
    # Fixed seed for reproducibility. Because the seed is reset on every call,
    # two calls with the same K return identical leading rows.
    np.random.seed(3)
    emb = 6 * np.random.random((n, K)) / K
    return emb

# Example: what the prediction matrix looks like with 1 user and 3 movies (K = 2)
np.dot(create_embedings(1,2), create_embedings(3,2).transpose())

array([[ 7.24366501,  4.69774059, 10.13887178]])

### Read in data

In [9]:
# Encode the ratings as a sparse matrix
from scipy import sparse
def df2matrix(df, nrows, ncols, column_name="rating"):
    """ Returns a sparse matrix constructed from a dataframe

    This code assumes the df has columns userId, movieId, and column_name.
    Rows of the matrix are users and columns are movies.
    """
    values = df[column_name].values
    ind_movie = df['movieId'].values
    ind_user = df['userId'].values
    return sparse.csc_matrix((values, (ind_user, ind_movie)), shape=(nrows, ncols))
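
As an illustration of what `df2matrix` builds, the sketch below constructs the same kind of matrix from hand-written (user, movie, rating) triples and prints its dense view. The triples are toy values chosen for the example.

```python
import numpy as np
from scipy import sparse

# Toy encoded ratings (hypothetical): one entry per observed rating.
users = np.array([0, 0, 1, 1, 2])
movies = np.array([0, 1, 1, 2, 0])
ratings = np.array([4, 5, 5, 3, 4])

# Rows are users, columns are movies; only the five observed entries are stored.
Y = sparse.csc_matrix((ratings, (users, movies)), shape=(3, 3))
dense = Y.toarray()

print(dense)
# [[4 5 0]
#  [0 5 3]
#  [4 0 0]]
print(Y.nnz)  # 5
```

The zeros in the dense view mean "no rating", not a rating of zero, which is why the loss and gradients only sum over observed pairs.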

In [10]:
df = pd.read_csv("tiny_training2.csv")
df, num_users, num_movies = encode_data(df)
Y = df2matrix(df, num_users, num_movies)

In [11]:
df.head()

Unnamed: 0,userId,movieId,rating
0,0,0,4
1,0,1,5
2,1,1,5
3,1,2,3
4,2,0,4


In [12]:
def predict(df, emb_user, emb_movie):
    """ Computes df["prediction"] without forming the dense matrix U*V^T.

    Multiplies the corresponding user and movie embeddings elementwise and sums over
    the factors to get the prediction u_i*v_j for each row of df. The bias terms
    u_{0i} and v_{0j} are not used here. The prediction column is added to df in place.
    """
    users = df['userId']
    movies = df['movieId']
    df['prediction'] = (emb_user[users] * emb_movie[movies]).sum(axis=1)
    return df
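
The following sketch checks, on random toy embeddings, that the elementwise multiply-and-sum used in `predict` gives the same numbers as reading the observed entries out of the full matrix $UV^T$. The array names and sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.random((4, 3))   # 4 users, K = 3
V = rng.random((5, 3))   # 5 movies, K = 3

# Observed (user, movie) pairs, one per row of the ratings dataframe.
users = np.array([0, 0, 2, 3])
movies = np.array([1, 4, 4, 0])

# Row-wise dot products for the observed pairs only: O(N * K) work.
pairwise = (U[users] * V[movies]).sum(axis=1)

# The same values read out of the dense (n_users x n_movies) matrix.
from_dense = (U @ V.T)[users, movies]

# An equivalent formulation with einsum.
with_einsum = np.einsum("nk,nk->n", U[users], V[movies])

print(np.allclose(pairwise, from_dense))   # True
print(np.allclose(pairwise, with_einsum))  # True
```

With many users and movies but few ratings, the pairwise version avoids allocating a matrix whose entries are mostly never used.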

### Mean Squared Error Loss

In [13]:
def cost(df, emb_user, emb_movie):
    """ Computes mean squared error

    First computes predictions using the predict function.
    The prediction for user i and movie j is the dot product of emb_user[i] and emb_movie[j].

    Arguments:
      df: dataframe with all data or a subset of the data
      emb_user: embeddings for users
      emb_movie: embeddings for movies

    Returns:
      error (float): the MSE
    """
    prediction = predict(df, emb_user, emb_movie)['prediction']
    actual = df['rating']
    error = np.mean(np.power(actual - prediction, 2))
    return error

### Find Gradient

In [14]:
def finite_difference(df, emb_user, emb_movie, ind_u=None, ind_m=None, k=None):
    """ Computes a forward finite difference of MSE(U, V).

    Perturbs emb_user[ind_u, k] if ind_u is given, otherwise emb_movie[ind_m, k],
    and returns the approximate partial derivative of the cost with respect to
    that entry. This function is used for testing the gradient function.
    """
    e = 1e-9  # perturbation size
    c1 = cost(df, emb_user, emb_movie)
    x = np.zeros_like(emb_user)
    y = np.zeros_like(emb_movie)
    if ind_u is not None:
        x[ind_u, k] = e
    else:
        y[ind_m, k] = e
    c2 = cost(df, emb_user + x, emb_movie + y)
    return (c2 - c1) / e
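
A gradient check compares an analytic gradient against a numerical one, entry by entry. The sketch below does this for the user embeddings on toy data. It is a standalone illustration on plain arrays, using a central difference rather than the forward difference above; the helper names `mse`, `analytic_grad_U`, and `central_difference` are hypothetical.

```python
import numpy as np


def mse(users, movies, ratings, U, V):
    """Mean squared error over the observed ratings (no bias terms)."""
    pred = (U[users] * V[movies]).sum(axis=1)
    return np.mean((ratings - pred) ** 2)


def analytic_grad_U(users, movies, ratings, U, V):
    """Analytic gradient of the MSE with respect to the user embeddings."""
    resid = ratings - (U[users] * V[movies]).sum(axis=1)
    dU = np.zeros_like(U)
    np.add.at(dU, users, -2.0 / len(ratings) * resid[:, None] * V[movies])
    return dU


def central_difference(f, X, idx, eps=1e-6):
    """Numerical d f / d X[idx] using (f(X + eps) - f(X - eps)) / (2 * eps)."""
    Xp, Xm = X.copy(), X.copy()
    Xp[idx] += eps
    Xm[idx] -= eps
    return (f(Xp) - f(Xm)) / (2 * eps)


# Hypothetical toy data.
rng = np.random.default_rng(0)
U, V = rng.random((3, 2)), rng.random((3, 2))
users = np.array([0, 0, 1, 1, 2])
movies = np.array([0, 1, 1, 2, 0])
ratings = np.array([4.0, 5.0, 5.0, 3.0, 4.0])

dU = analytic_grad_U(users, movies, ratings, U, V)
loss_of_U = lambda X: mse(users, movies, ratings, X, V)

# Largest absolute disagreement over all entries of U.
max_err = max(
    abs(central_difference(loss_of_U, U, (i, k)) - dU[i, k])
    for i in range(U.shape[0])
    for k in range(U.shape[1])
)
print(max_err)
```

If the analytic gradient is correct, `max_err` should be tiny compared with the gradient entries themselves; a large value usually points to a sign or indexing mistake.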

In [16]:
def gradient(df, emb_user, emb_movie):
    """
    Computes the gradient of the MSE with respect to both embedding matrices.

    Arguments:
      df: dataframe with all data or a subset of the data
      emb_user: embeddings for users
      emb_movie: embeddings for movies

    Returns:
      d_emb_user: gradient with respect to emb_user, shape (num_users, K)
      d_emb_movie: gradient with respect to emb_movie, shape (num_movies, K)
    """
    df = predict(df, emb_user, emb_movie)
    # Sparse matrices of the observed ratings and of their predictions
    Y = df2matrix(df, emb_user.shape[0], emb_movie.shape[0])
    Y_hat = df2matrix(df, emb_user.shape[0], emb_movie.shape[0], column_name="prediction")
    N = df.shape[0]
    # Residuals y_ij - y_hat_ij as a dense array; entries for unrated pairs are zero
    residual = (Y - Y_hat).toarray()
    d_emb_user = (-2 / N) * np.dot(residual, emb_movie)
    d_emb_movie = (-2 / N) * np.dot(residual.T, emb_user)

    return d_emb_user, d_emb_movie
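
The two matrix products in `gradient` are the per-entry sums from the gradient formulas, written for all users and movies at once and without the bias terms (which this code does not use). As a sketch, let $R$ be the indicator matrix with $R_{ij} = r_{ij}$, let $\odot$ denote the elementwise product, and let $U$ and $V$ be the matrices whose rows are the $u_i$ and $v_j$ (this matrix notation is introduced here for illustration):

$$
E = R \odot (Y - \hat{Y}), \qquad
\frac{\partial L}{\partial U} = -\frac{2}{N}\, E\, V, \qquad
\frac{\partial L}{\partial V} = -\frac{2}{N}\, E^{\top} U
$$

Entry $(i, l)$ of $EV$ is $\sum_j E_{ij} v_{jl} = \sum_{j:r_{ij}=1}(y_{ij} - u_i \cdot v_j)\,v_{jl}$, which matches the first gradient component. In the code the masking by $R$ happens implicitly, because both sparse matrices only store entries for rated pairs.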

### Using gradient descent with momentum

In [17]:
def gradient_descent(df, emb_user, emb_movie, iterations=100, learning_rate=0.01, df_val=None):
    """ Runs gradient descent with momentum (0.9) for a number of iterations.

    Prints the training cost and the validation cost (if df_val is not None) every 50 iterations.
    The embeddings are updated in place.

    Returns:
    emb_user: the trained user embedding
    emb_movie: the trained movie embedding
    """
    momentum = 0.9
    # Seed the running average with the gradient at the starting point
    update_user, update_movie = gradient(df, emb_user, emb_movie)

    for iteration in range(iterations):
        # After the first step, blend the new gradient into the running average
        if iteration > 0:
            du, dm = gradient(df, emb_user, emb_movie)
            update_user = update_user * momentum + du * (1 - momentum)
            update_movie = update_movie * momentum + dm * (1 - momentum)

        # Parameter update
        emb_user -= learning_rate * update_user
        emb_movie -= learning_rate * update_movie
        if iteration % 50 == 49:
            print('Training Error Rate: ' + str(cost(df, emb_user, emb_movie)))
            if df_val is not None:
                print('Validation Error Rate: ' + str(cost(df_val, emb_user, emb_movie)))
    return emb_user, emb_movie
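
The update rule above keeps an exponentially weighted average of past gradients and steps along that average. To isolate the idea, here is a minimal sketch of the same rule on a simple quadratic whose minimum is known. The function name `ema_momentum_descent`, the objective, and the hyperparameters are assumptions chosen for the example.

```python
import numpy as np


def ema_momentum_descent(grad_fn, x0, learning_rate=0.1, momentum=0.9, iterations=500):
    """Gradient descent that steps along a running average of the gradients."""
    x = np.array(x0, dtype=float)
    update = grad_fn(x)  # seed the average with the first gradient
    for iteration in range(iterations):
        if iteration > 0:
            # Keep 90% of the previous direction and blend in 10% of the new gradient.
            update = momentum * update + (1 - momentum) * grad_fn(x)
        x = x - learning_rate * update
    return x


# Toy objective f(x) = sum((x - 3)^2), with gradient 2 * (x - 3) and minimum at x = 3.
x_min = ema_momentum_descent(lambda x: 2 * (x - 3.0), x0=[0.0, 10.0])
print(x_min)
```

Both coordinates should end up very close to 3. Averaging the gradients damps step-to-step oscillation, at the cost of reacting more slowly when the gradient changes direction.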

In [20]:
emb_user = create_embedings(num_users, 3)
emb_movie = create_embedings(num_movies, 3)
emb_user, emb_movie = gradient_descent(df, emb_user, emb_movie, iterations=1000, learning_rate=0.01)

Training Error Rate: 1.7013643061883719
Training Error Rate: 0.9748705172056421
Training Error Rate: 0.6991455656925724
Training Error Rate: 0.5265621094145297
Training Error Rate: 0.39009717387070564
Training Error Rate: 0.27831199440666954
Training Error Rate: 0.19147935796884605
Training Error Rate: 0.12886261100552718
Training Error Rate: 0.08647778051306698
Training Error Rate: 0.05890081746897103
Training Error Rate: 0.04123628994620466
Training Error Rate: 0.029892049787387052
Training Error Rate: 0.02249858175468944
Training Error Rate: 0.017567111649942825
Training Error Rate: 0.014178755207495756
Training Error Rate: 0.011768396636592487
Training Error Rate: 0.009987640000180828
Training Error Rate: 0.008620533643235114
Training Error Rate: 0.007532197477577391
Training Error Rate: 0.006637556360077275
