# FastAI Chapter 8 - Collaborative Filtering

The chapter covers methods for predicting missing entries in large matrices of, for instance, customer data by learning the latent factors that underlie the customers' choices. The result is a recommendation system. We want to learn which movies are similar to each other, and hence which unseen movies a customer might like based on the movies they have already liked. Essentially, we want to assign each movie a vector in a continuous vector space. Movies whose vectors are close together in that space are considered similar according to the input data set.
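
As a rough illustration of "close together in vector space", here is a minimal sketch using cosine similarity, which is one common choice of closeness measure. The movie titles and vector values are made up for the example and are not learned from any data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional movie vectors (made-up values, not learned ones).
movie_vectors = {
    "Action Movie A": [0.9, 0.1, 0.0],
    "Action Movie B": [0.8, 0.2, 0.1],
    "Romance Movie C": [0.1, 0.9, 0.2],
}

def most_similar(title):
    """Return the other title whose vector points in the most similar direction."""
    others = [t for t in movie_vectors if t != title]
    return max(others,
               key=lambda t: cosine_similarity(movie_vectors[title], movie_vectors[t]))

print(most_similar("Action Movie A"))
```

With these toy values the two action movies end up nearest to each other, which is the behaviour we hope the learned vectors will show.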

Take, for instance, movie ratings from numerous users on IMDB. We know how certain users rated certain movies, but we don't know which criteria they based those ratings on or how they would rate other movies. Both can be learned from the data.

Collaborative filtering here means randomly initializing a number of latent factors for each user AND each movie, then taking the DOT PRODUCT of a user's factors with a movie's factors as the predicted rating and calculating the loss against the existing ratings. Gradient descent then adjusts the factors to reduce that loss.
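
The prediction and loss step can be sketched in plain Python. This is only a toy illustration: the factor values, the observed ratings, and the helper names `predict` and `mse` are all made up for the example.

```python
# Toy example: 2 users, 3 movies, 2 latent factors each (all values made up).
user_factors = [[0.9, 0.1],   # user 0: likes "action", indifferent to "romance"
                [0.2, 0.8]]   # user 1: the opposite
movie_factors = [[1.0, 0.0],  # movie 0: pure action
                 [0.0, 1.0],  # movie 1: pure romance
                 [0.5, 0.5]]  # movie 2: a mix

def predict(user, movie):
    """Predicted score = dot product of the user's and the movie's factors."""
    return sum(u * m for u, m in zip(user_factors[user], movie_factors[movie]))

# Known (user, movie, rating) triples: the only matrix entries we observe.
observed = [(0, 0, 1.0), (0, 1, 0.0), (1, 1, 1.0)]

def mse(triples):
    """Mean squared error between predictions and the observed ratings."""
    errors = [(predict(u, m) - r) ** 2 for u, m, r in triples]
    return sum(errors) / len(errors)

print(predict(0, 0), mse(observed))
```

Training would repeatedly nudge the numbers in `user_factors` and `movie_factors` so that `mse` goes down. The unobserved pairs, such as user 1 and movie 2, then get a prediction for free.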

Once we have learned those latent factors, we can predict which movies a user might like.

The most important concept for solving this task is the so-called "embedding".
Imagine a list of 100 movies. We could represent each movie with a one-hot vector that has a one at the movie's list position. However, working with large, sparse vectors like this is very memory intensive and would slow everything down. Instead, we assign each movie a small number of randomly initialized features (its embedding), look them up directly by the movie's index, and learn their "true" values.
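
To make the link between the two representations concrete: multiplying a one-hot vector by a matrix of features selects exactly one row of that matrix, so an index lookup gives the same result without ever building the sparse vector. A minimal sketch with made-up numbers and hypothetical helper names:

```python
# Hypothetical embedding matrix: 4 movies, 3 features each (made-up values).
embedding = [[0.1, 0.2, 0.3],
             [0.4, 0.5, 0.6],
             [0.7, 0.8, 0.9],
             [1.0, 1.1, 1.2]]

def one_hot(index, size):
    """Vector of zeros with a single one at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_times_matrix(vec, matrix):
    """Multiply a row vector of length n by an n x k matrix."""
    n_cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(n_cols)]

movie_index = 2
via_one_hot = vec_times_matrix(one_hot(movie_index, len(embedding)), embedding)
via_lookup = embedding[movie_index]  # what an embedding layer does conceptually

print(via_one_hot == via_lookup)
```

The lookup touches one row, while the one-hot route touches every entry of the matrix.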

The model we are going to use is therefore a dot product of the embeddings for users and movies. In addition, we need biases. The movie bias in this example captures the underlying quality of the movie, and the user bias captures how generously a user rates in general. A movie can be "very action" and the user can like action movies, but if it's a bad movie the rating is still going to be bad.

In [None]:
#hide
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

In [None]:
#hide
from fastbook import *

In [None]:
from fastai.collab import *
from fastai.tabular.all import *
path = untar_data(URLs.ML_100k)

In [None]:
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      names=['user','movie','rating','timestamp'])

In [None]:
movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)

In [None]:
ratings = ratings.merge(movies)

In [None]:
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)

In [None]:
n_users  = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
n_factors = 5

user_factors = torch.randn(n_users, n_factors)
movie_factors = torch.randn(n_movies, n_factors)

In [None]:
class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        # One learnable factor vector and one scalar bias per user and per movie
        self.user_factors = Embedding(n_users, n_factors)
        self.user_bias = Embedding(n_users, 1)
        self.movie_factors = Embedding(n_movies, n_factors)
        self.movie_bias = Embedding(n_movies, 1)
        self.y_range = y_range

    def forward(self, x):
        # x[:,0] holds the user indices, x[:,1] the movie indices of the batch
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        # Row-wise dot product of user and movie factors
        res = (users * movies).sum(dim=1, keepdim=True)
        res += self.user_bias(x[:,0]) + self.movie_bias(x[:,1])
        # Squash the raw score into the rating range
        return sigmoid_range(res, *self.y_range)

In [None]:
model = DotProductBias(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3)

# Weight decay or L2 regularization

To keep the weights small, and hence the loss surface that SGD searches relatively smooth and free of deep, narrow valleys, we introduce a regularization penalty: we add the sum of all the squared weights, scaled by a factor, to the loss function.
In code we simply pass a wd parameter. "wd" is basically the lambda of the regularization.
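
The penalty itself is easy to write out by hand. The sketch below assumes the plain "loss plus wd times sum of squared weights" form. The weights, the base loss value, and the helper names are made up for illustration, and this is not how fastai applies `wd` internally.

```python
def l2_penalty(weights, wd):
    """wd times the sum of the squared weights."""
    return wd * sum(w ** 2 for w in weights)

def l2_penalty_grad(weights, wd):
    """Gradient of the penalty: each weight is pulled towards zero by 2*wd*w."""
    return [2 * wd * w for w in weights]

weights = [0.5, -2.0, 3.0]  # hypothetical model weights
base_loss = 1.25            # hypothetical loss on the ratings alone
wd = 0.1

total_loss = base_loss + l2_penalty(weights, wd)
print(total_loss, l2_penalty_grad(weights, wd))
```

The gradient shows why this is called weight decay: on every step, each weight is shrunk by an amount proportional to its own size, so large weights are penalized the most.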

In [None]:
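learn = Learner(dls, DotProductBias(n_users, n_movies, 50), loss_func=MSELossFlat())  # fresh model, so we don't continue the earlier run
learn.fit_one_cycle(5, 5e-3, wd=0.1)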

# In short: using fastai's collab_learner

In [None]:
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))

In [None]:
learn.fit_one_cycle(5, 5e-3, wd=0.1)