# Recommender Systems

**Level:** We start by coding from scratch, as low-level as we're getting, then move to an intermediate level using PyTorch, and finally go high-level with the fastai library.

These notes are based on Chapter 8 of the [fastai book](https://github.com/fastai/fastbook/tree/master) and [this lecture](https://course.fast.ai/Lessons/lesson7.html) from their online course. Even more so than previous chapters, this is really just notes I took, and not a reworking of the content.

While these notes are on recommender systems, there are two important goals in this chapter:
1. Learn what embedding layers are and how they work.
2. Use and write code for the lower-level PyTorch interfaces.

## Sections in this Notebook

* [Introduction to Collaborative Filtering](#scrollTo=_ASW5QqiXaag)
* [Embeddings](#scrollTo=zVRFaqIz9pV-)
* [A Dot Product Recommender](#scrollTo=AfOtWG9QuL9k)
* [A Neural Network Recommender](#scrollTo=wYESoFIrVDK8)
* [Bootstrapping](#scrollTo=9tajtQ01MGpU)


## Vocab Terms

* **embedding.** A mapping from a category, user ID, vocabulary, or similar list to a dense numeric vector. If you want an indicator variable for which state something is in, you'll need 50 columns in your dataset to one-hot encode them. Your embedding may only use 10, or 20, or some other smaller number.

* **latent factors.** The columns in the numeric vectors created by an embedding. They are called latent because they are not directly present in the dataset, but are "beneath the surface", so to speak. It is usually possible to find structures within these learned latent factors corresponding to some real-world feature.

* We're going to see how recommender systems decide what to recommend to you.

In [None]:
from fastai.collab import *
from fastai.tabular.all import *

## Introduction to Collaborative Filtering

Let's begin with an example: recommender systems. When you go on Netflix, or Hulu, or Disney+, the website recommends shows or movies to watch. When you go on Amazon or another e-commerce site, you get recommended products to buy. How do websites guess at what you'll like? Well, they have data. Let's take a look at some.

We're going to use the MovieLens 100k dataset, which consists of 100,000 movie ratings submitted through the site MovieLens. We'll download a copy of it that a user uploaded to Kaggle.

In [None]:
%env KAGGLE_USERNAME=your_kaggle_username
%env KAGGLE_KEY=your_kaggle_api_key

!kaggle datasets download -d prajitdatta/movielens-100k-dataset
!unzip movielens-100k-dataset


And then, loading it in, we can see what the data consist of:
* A column with a **user** ID
* A column with a **movie** ID
* A column with that user's **rating** of that movie.
* A **timestamp** when that rating was submitted.

In [None]:
# This is a tab-separated file with no header row, so we pass the separator and the column names explicitly.
ratings = pd.read_csv("ml-100k/u.data", sep="\t", header=None,
                      names=['user','movie','rating','timestamp'])

ratings.head()

Unnamed: 0,user,movie,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


There's also a file with the actual names of the movies in it. Let's load it so we can add the titles for readability.

In [None]:
movies = pd.read_csv("ml-100k/u.item", delimiter="|", encoding="latin_1", header=None,
                     names=["movie","title"], usecols=[0,1])
movies.head()

Unnamed: 0,movie,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


...and then merge them together.

In [None]:
# Each row now has the user ID, the movie ID, the rating that user gave, the timestamp of the rating, and the movie title.
ratings = ratings.merge(movies)
ratings.head()

Unnamed: 0,user,movie,rating,timestamp,title
0,196,242,3,881250949,Kolya (1996)
1,186,302,3,891717742,L.A. Confidential (1997)
2,22,377,1,878887116,Heavyweights (1994)
3,244,51,2,880606923,Legends of the Fall (1994)
4,166,346,1,886397596,Jackie Brown (1997)


This long format, with one row per rating, may be a bit unintuitive, so let's convert it to a user-by-movie table using the `crosstab` function from Pandas.

The first thing you'll note is that the crosstab is quite empty. There is no movie that has been seen by every user, and there is no user that has seen every movie. That's just how it is. We might imagine the goal of our algorithm to be filling in those NaN results.

In [None]:
# Rows are users and columns are movies; each filled cell is that user's rating of that movie.
# To see who rated movie 5, look down column 5 for the rows that have a value.
pd.crosstab(ratings["user"], ratings["movie"], ratings["rating"], aggfunc="max").head()


user \ movie,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
1,5.0,3.0,4.0,3.0,3.0,5.0,4.0,1.0,5.0,3.0,...,,,,,,,,,,
2,4.0,,,,,,,,,2.0,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,4.0,3.0,,,,,,,,,...,,,,,,,,,,
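
To put a number on "quite empty", here is a minimal sketch in plain Python. It builds a toy version of the table from the first four ratings and then computes what fraction of the full table is filled. The helper `density` and the names `toy_users` and `toy_movies` are made up for this illustration, and the counts of 943 users and 1682 movies are the usual figures quoted for MovieLens 100k rather than something computed in this notebook.

```python
# The first four ratings from the data, as (user, movie, rating).
ratings_list = [(196, 242, 3), (186, 302, 3), (22, 377, 1), (244, 51, 2)]

toy_users = sorted({u for u, _, _ in ratings_list})
toy_movies = sorted({m for _, m, _ in ratings_list})
lookup = {(u, m): r for u, m, r in ratings_list}

# None plays the role of NaN: that user has not rated that movie.
table = [[lookup.get((u, m)) for m in toy_movies] for u in toy_users]

def density(n_ratings, n_users, n_movies):
    """Fraction of user/movie cells that actually contain a rating."""
    return n_ratings / (n_users * n_movies)

toy_density = density(len(ratings_list), len(toy_users), len(toy_movies))
full_density = density(100_000, 943, 1682)  # assumed dataset dimensions
print(f"toy: {toy_density:.2%}, MovieLens 100k: {full_density:.2%}")
```

Under those assumptions only about 6% of the cells hold a rating, so the recommender's job really is to fill in the other 94% or so.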


How might we approach this problem? Let's consider some options:
* Let's not be fooled by the numeric user and movie IDs themselves. It doesn't make sense to add or multiply those. They are IDs, which makes them categorical, so don't think of them as numeric.
* For each movie, we could average all of the user ratings, and for each user, get the average of their ratings, and use those numbers somehow. Perhaps averaging them, or adding them or something. That's a good start, but it's not enough information to do much differentiation. (It doesn't take into account how different movies appeal to different kinds of users.)
* Let's consider different features of each user and film. Perhaps genre, star power of the actors in the film, how old it is, what language it is in, and so on. Now we don't know those features but... you know, machine learning. Let's try this one.
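
Before moving on, the averaging idea from the second option can be sketched in a few lines of plain Python. The ratings, the names and the `baseline` helper below are all made up for illustration; this is one simple way to combine the two averages, not a method from the book.

```python
from statistics import mean

# Toy ratings as (user, movie, rating); the values are invented.
toy = [("ann", "A", 5), ("ann", "B", 3), ("bob", "A", 4), ("bob", "C", 2), ("cy", "B", 1)]

def averages(rows, key_index):
    """Mean rating grouped by user (key_index=0) or by movie (key_index=1)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row[2])
    return {key: mean(vals) for key, vals in groups.items()}

user_avg = averages(toy, 0)
movie_avg = averages(toy, 1)

def baseline(user, movie):
    """Predict by averaging the user's mean rating and the movie's mean rating."""
    return (user_avg[user] + movie_avg[movie]) / 2

print(baseline("cy", "A"))
```

Notice the weakness: for any given user, the movies come out in exactly the same order (sorted by `movie_avg`), so everyone would get the same recommendations. That is the missing differentiation.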

-- Instead of looking at a user as just a label like "User Dr Hallenbeck", we want to describe them with a short series of numbers, something like "0.5, 0.7, 0.1", where each number says how strongly they lean toward one feature. We'll do the same for each movie.

Let's start with a simple model, where there are three features which range from -1 to 1:
1. Year of release (low = classic movie, high = recent movie)
2. Is this a sci-fi/fantasy movie? (low = not at all, high = yes)
3. Is this a movie for adults? (low = rated G or PG, high = rated R)

Here are some example films:

In [None]:
# Star Wars (1977)
# -0.5: an older movie; 1.0: definitely sci-fi/fantasy; -0.1: its rating is on the younger side.
# Instead of calling it "a movie", we describe it with numbers: [year, sci-fi/fantasy, adult rating].
starwars = torch.tensor([-0.5, 1.0, -0.1])

# Describing movies with numbers lets us do math on them and look for trends. Once the math
# picks the best-scoring set of numbers, we cross-reference the movies dataset to find which
# movie it belongs to, and recommend that movie.

# Users get the same treatment. Someone who likes newish movies, doesn't like sci-fi/fantasy,
# and wants an R-rated movie might look like [0.5, -0.5, 1.0]; someone else might look like
# [-0.7, 0.6, -0.1]. To see how a user's tastes interact with a movie's features, we multiply
# the matching entries and add them up:
#
#   user[0]*movie[0] + user[1]*movie[1] + user[2]*movie[2]
#
# Two numbers with the same sign multiply to a positive number, so a user with a negative
# preference paired with a movie that is negative on that feature raises the score.
# This is a dot product.

# Barbie (2023)
barbie = torch.tensor([0.9, 0.6, -0.4])

# Oppenheimer (2023)
oppenheimer = torch.tensor([0.9, -0.9, 0.9])

And then user preferences are rated on a similar scale. This user is me: I like newer movies and sci-fi/fantasy, but don't particularly care what a movie's rating is:

In [None]:
drh = torch.tensor([0.5, 0.7, 0.1])

A reasonable way to predict would be to take a dot product, multiplying the features together and then adding them. Or if you like, you can think of these as features and weights, and our first model is just a linear $w_1 f_1 + w_2 f_2 + \cdots$:

In [None]:
def predict(movie, user): #Multiplying the Movie and User tensors now
    print((movie*user).sum())

predict(starwars, drh)
predict(barbie, drh)
predict(oppenheimer, drh)

tensor(0.4400)
tensor(0.8300)
tensor(-0.0900)
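
To see where a number like 0.44 comes from, here is the Star Wars prediction spelled out term by term. This is a plain-Python sketch; the list names (`starwars_list`, `drh_list`) and the `dot` helper are just for this illustration, so that the tensors above are left alone.

```python
feature_names = ["year", "sci-fi/fantasy", "adult"]
starwars_list = [-0.5, 1.0, -0.1]
drh_list = [0.5, 0.7, 0.1]

def dot(movie, user):
    """Multiply matching features, then add the products."""
    return sum(m * u for m, u in zip(movie, user))

# Show each feature's contribution, then the total.
for name, m, u in zip(feature_names, starwars_list, drh_list):
    print(f"{name:>15}: {m:+.1f} * {u:+.1f} = {m * u:+.2f}")
score = dot(starwars_list, drh_list)
print(f"{'total':>15}: {score:+.2f}")
```

The sci-fi term (+0.70) carries the prediction, while the age of the movie pulls it down by 0.25.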


Instead of storing each set of features in its own variable, we'd ideally like them to be part of one larger data structure that can be trained, like the one below. This is called an embedding matrix.

In [None]:
features = torch.stack([starwars, barbie, oppenheimer])
features.requires_grad_()

# This creates an embedding matrix: one row per movie.
# An embedding is a vector representation of a categorical variable.
# In practice the entries are not picked by hand; they are learned using gradients and a loss function.

tensor([[-0.5000,  1.0000, -0.1000],
        [ 0.9000,  0.6000, -0.4000],
        [ 0.9000, -0.9000,  0.9000]], requires_grad=True)

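And then you'd look up each movie's information by row number (Star Wars is row 0 here, Barbie row 1, and Oppenheimer row 2). That *is* what is done, but "look up row #" is not one of the operations (matrix products and activation functions) that deep learning models are built from. Fortunately, a lookup is mathematically identical to multiplying a one-hot vector by the embedding matrix, so an embedding layer can index directly as a shortcut while its gradients behave as if it had done the matrix multiplication.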
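
Here is a small sketch of that idea, with plain lists standing in for tensors. The helpers `one_hot` and `vec_mat` are written out by hand for illustration only: picking row 1 directly gives the same vector as multiplying a one-hot vector by the matrix.

```python
emb_matrix = [[-0.5, 1.0, -0.1],   # row 0: Star Wars
              [0.9, 0.6, -0.4],    # row 1: Barbie
              [0.9, -0.9, 0.9]]    # row 2: Oppenheimer

def one_hot(index, size):
    """A vector of zeros with a single 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat(vec, mat):
    """Row vector times matrix, one output entry per matrix column."""
    n_cols = len(mat[0])
    return [sum(v * row[c] for v, row in zip(vec, mat)) for c in range(n_cols)]

by_index = emb_matrix[1]                         # direct lookup
by_matmul = vec_mat(one_hot(1, 3), emb_matrix)   # same thing as a matrix product
print(by_index, by_matmul)
```

The zeros in the one-hot vector wipe out every row except the one we asked for, which is why the lookup is the cheaper way to get the same answer.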

## Embeddings

It's time for us to start learning about how PyTorch works. fastai is a great library built on top of PyTorch, and for most purposes works great. But it's good to learn (1) how the library is really working, and (2) how to make more complex modifications to models once you know what you're doing.

First, let's put the dataset into a `DataLoaders`. We've done this plenty of times before. The items in a `DataLoaders` for collaborative filtering look just like our original DataFrame, with a few irrelevant columns removed:

In [None]:
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()

Unnamed: 0,user,title,rating
0,520,L.A. Confidential (1997),3
1,298,Schindler's List (1993),5
2,907,Titanic (1997),5
3,110,Poison Ivy II (1995),3
4,851,Girl 6 (1996),2
5,239,Air Force One (1997),1
6,110,Marked for Death (1990),2
7,568,My Life as a Dog (Mitt liv som hund) (1985),4
8,196,"Mrs. Brown (Her Majesty, Mrs. Brown) (1997)",4
9,823,Under Siege (1992),4


If we look at what the model will actually be passed, it's this, which is a representation of the same kind of data as above: we get the user and movie IDs in the variable `x` and the ratings in the variable `y` for each minibatch. The numbers in `x` don't necessarily (and probably don't) match the IDs above. In order to be more efficient, we want our IDs to be row numbers in that embedding matrix above, so fastai first converts the initial inputs to its own IDs.

In [None]:
x,y = dls.one_batch()

In [None]:
print(x[0:10,:])
print(y[0:10])

tensor([[ 387,  417],
        [ 691,  811],
        [ 141,  519],
        [ 474, 1340],
        [ 894,  988],
        [ 125,  110],
        [  94, 1037],
        [ 126,  738],
        [ 660,  798],
        [ 851,  581]])
tensor([[2],
        [5],
        [1],
        [4],
        [3],
        [4],
        [4],
        [4],
        [2],
        [5]], dtype=torch.int8)
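
The remapping idea can be sketched as follows. This is an illustration, not fastai's actual code: raw IDs are replaced by their position in a sorted list of unique values, and index 0 is reserved for a `#na#` placeholder, which is an assumption based on how the lists in `dls.classes` begin.

```python
# A handful of raw user IDs, with repeats, as they might appear in the data.
raw_user_ids = [196, 186, 22, 244, 166, 22, 196]

classes = ["#na#"] + sorted(set(raw_user_ids))           # row number -> raw ID
to_row = {raw: row for row, raw in enumerate(classes)}   # raw ID -> row number

rows = [to_row[u] for u in raw_user_ids]
print(classes)
print(rows)
```

Every row number now lies between 0 and `len(classes) - 1`, so it can index straight into an embedding matrix with `len(classes)` rows.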


We can create a randomly initialized embedding matrix using the `Embedding` class (fastai's thin wrapper around PyTorch's `nn.Embedding`). Let's say we want 5 hidden features. First we need to know how many users and movies there are:

In [None]:
print(len(dls.classes["user"]))
print(len(dls.classes["title"]))

944
1665


In [None]:
user_embedding = Embedding(944, 5)
movie_embedding = Embedding(1665, 5)

We can get the embeddings for our x variables:

In [None]:
user_features = user_embedding(x[:,0])
user_features[0:10,:]

tensor([[ 0.0010, -0.0075, -0.0078,  0.0067, -0.0048],
        [ 0.0075, -0.0076,  0.0084,  0.0026,  0.0013],
        [ 0.0059,  0.0190,  0.0111,  0.0041,  0.0014],
        [-0.0037,  0.0140, -0.0093,  0.0144, -0.0198],
        [ 0.0122,  0.0057,  0.0064, -0.0030,  0.0053],
        [-0.0134,  0.0089, -0.0069,  0.0072, -0.0071],
        [ 0.0175,  0.0077,  0.0178, -0.0061,  0.0023],
        [-0.0002, -0.0017, -0.0054,  0.0160, -0.0054],
        [-0.0009,  0.0101, -0.0019,  0.0030, -0.0134],
        [ 0.0059, -0.0123, -0.0052,  0.0046, -0.0023]],
       grad_fn=<SliceBackward0>)

In [None]:
movie_features = movie_embedding(x[:,1])
movie_features[0:10, :]

tensor([[-1.4873e-03, -2.3138e-03, -1.9308e-03,  9.0524e-03, -2.5039e-03],
        [-1.8552e-02,  1.7734e-03, -1.5811e-02, -7.7349e-03, -7.5634e-03],
        [ 2.1875e-03,  6.2697e-03,  5.0209e-03,  5.3087e-03, -3.7423e-03],
        [ 1.1056e-02,  3.0830e-03, -2.3176e-03, -1.1666e-03,  2.4749e-03],
        [ 9.9115e-03, -1.2223e-02,  3.8147e-03, -2.3016e-03, -6.7365e-03],
        [-2.0811e-03, -2.6029e-03,  1.3798e-03, -6.1525e-03,  3.1610e-03],
        [-1.4165e-03,  1.8145e-02, -1.8575e-03,  2.0116e-03,  5.9018e-03],
        [-2.0285e-06, -1.6106e-02,  1.5769e-03, -1.1603e-03, -1.6361e-02],
        [-1.0077e-02, -1.0419e-03, -1.2345e-02, -1.1747e-02,  9.4333e-04],
        [ 1.8789e-02,  1.1359e-02, -1.2897e-02,  3.2525e-03, -7.6906e-03]],
       grad_fn=<SliceBackward0>)

And then we calculate the dot product...

In [None]:
predictions = (user_features*movie_features).sum(axis=1)
predictions

tensor([ 1.0363e-04, -3.1555e-04,  2.0508e-04, -4.1507e-05,  4.6925e-05,
        -7.1625e-05,  8.2311e-05,  8.8395e-05, -2.5611e-05,  7.1442e-05,
         1.2731e-04, -4.2313e-05,  4.9488e-05,  2.7356e-05,  1.0731e-04,
         4.1927e-05,  1.6978e-04, -7.4427e-05, -4.6362e-05,  1.0756e-04,
         6.0141e-05,  1.8949e-04,  1.2759e-04,  6.3091e-05, -1.5531e-04,
        -1.8786e-05,  5.0955e-06,  1.1776e-04,  1.5619e-04,  2.9458e-04,
        -9.4468e-05,  3.0193e-04, -3.0879e-04, -3.9732e-05, -3.1265e-04,
        -1.5706e-04,  2.9679e-05, -1.0212e-04, -9.1734e-05, -7.4566e-04,
        -2.5245e-05,  1.5416e-04,  6.5554e-06,  2.9029e-05, -8.8818e-05,
         4.5759e-05, -2.6233e-05, -2.0805e-05,  2.8357e-05,  1.8538e-04,
         3.4288e-04,  9.1041e-05,  8.4520e-05, -2.0260e-04,  5.0876e-05,
        -1.4023e-04,  2.9039e-04, -1.8698e-04,  2.1349e-05, -9.9394e-05,
        -2.5605e-04, -4.1697e-05, -1.2758e-04, -6.4309e-05],
       grad_fn=<SumBackward1>)

## A Dot Product Recommender

Let's encapsulate this work into a [PyTorch `Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). A `Module` is a model, which then gets passed into a Learner to be optimized. We essentially built one a few units ago for a simple neural network, and we will remake it as a Module in a bit. The two important things when building a PyTorch Module are initializing it with `__init__` and creating a `forward` function, which defines what the model does (we previously created functions called `predict` for the same use). Here's a Module for our dot product recommender:

In [None]:
class DotProduct (Module):
    def __init__(self, n_users, n_movies, n_features):
        self.user_embedding  = Embedding(n_users, n_features)
        self.movie_embedding = Embedding(n_movies, n_features)

    def forward(self, x):
        users  = self.user_embedding(x[:,0])
        movies = self.movie_embedding(x[:,1])
        return (users*movies).sum(axis=1)

That's it! Let's put it to work and train it:

In [None]:
n_users  = len(dls.classes["user"])
n_movies = len(dls.classes["title"])
n_features = 50

model = DotProduct(n_users, n_movies, n_features)
learn = Learner(dls, model, loss_func=MSELossFlat(), metrics=rmse)

learn.fit_one_cycle(5, 5e-3)

epoch,train_loss,valid_loss,_rmse,time
0,1.327311,1.31002,1.144561,00:12
1,1.055892,1.123471,1.059939,00:11
2,0.902756,1.016757,1.008344,00:11
3,0.793417,0.92462,0.961572,00:10
4,0.771423,0.893134,0.945058,00:11


Look at this! We can typically estimate what someone will rate a movie within $\pm0.95$ stars. That's pretty great. You know what's next: how do we improve?
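
As a reminder of what that metric measures, here is a minimal sketch of RMSE in plain Python. The function name `rmse_from_pairs` and the four example predictions are made up for illustration; the last lines check that the square root of the final `valid_loss` (an MSE) matches the `_rmse` column.

```python
import math

def rmse_from_pairs(predicted, actual):
    """Root mean squared error: the square root of the average squared miss."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Invented predictions against invented true ratings.
example = rmse_from_pairs([3.5, 4.0, 2.0, 5.0], [4, 4, 1, 3])

# Final-epoch valid_loss from the table above; its square root is the RMSE.
final_rmse = math.sqrt(0.893134)
print(round(example, 3), round(final_rmse, 4))
```

Because the misses are squared before averaging, one prediction that is off by 2 stars hurts the score much more than two predictions that are each off by 1.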

We'll add two pieces:
1. Add a bias term so that particular movies can just have higher or lower ratings regardless of user preference, and the same for users regardless of movie.
2. Restrict the predictions to roughly the range $(1,5)$. Ratings never go below 1 or above 5 in this system, so those are the extremes. A sigmoid only reaches the ends of its range at $\pm\infty$, so for the same reasons as label smoothing we'll actually go with the range $(0.5,5.5)$, but you get the idea.
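
The second piece can be sketched in plain Python. The helper names `scaled_sigmoid` and `raw_needed` are made up for this illustration: the first squashes any raw score into $(low, high)$, and the second works backwards to find the raw score needed to hit a target rating.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def scaled_sigmoid(z, low, high):
    """Squash any real number into the open interval (low, high)."""
    return sigmoid(z) * (high - low) + low

def raw_needed(target, low, high):
    """Raw score whose scaled sigmoid equals target (inverse of the function above)."""
    p = (target - low) / (high - low)
    return math.log(p / (1 - p))

print(scaled_sigmoid(0.0, 0.5, 5.5))   # a raw score of 0 lands mid-range
print(raw_needed(5.0, 0.5, 5.5))       # finite raw score for a 5-star prediction
```

With the range $(0.5,5.5)$, a 5-star prediction needs a raw score of only about 2.2. With the range $(1,5)$, `raw_needed(5.0, 1, 5)` would divide by zero, which is the plain-Python way of saying the model would need an infinite raw score.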

In [None]:
class DotProductBias (Module):
    def __init__(self, n_users, n_movies, n_features, y_range=(0.0, 1.0)):
        self.user_embedding  = Embedding(n_users, n_features)
        self.user_bias = Embedding(n_users, 1)    # one learned offset per user

        self.movie_embedding = Embedding(n_movies, n_features)
        self.movie_bias = Embedding(n_movies, 1)  # one learned offset per movie

        # Lowest and highest rating the model is allowed to predict.
        # (Named y_range so we don't shadow Python's built-in range.)
        self.min, self.max = y_range

    def forward(self, x):
        users   = self.user_embedding(x[:,0])
        user_b  = self.user_bias(x[:,0])

        movies  = self.movie_embedding(x[:,1])
        movie_b = self.movie_bias(x[:,1])

        # keepdim=True keeps the shape (batch, 1) so it lines up with the bias terms.
        raw_rating = (users*movies).sum(axis=1, keepdim=True) + user_b + movie_b

        # Squash the raw score into the interval (min, max).
        return torch.sigmoid(raw_rating)*(self.max-self.min) + self.min

In [None]:
n_users  = len(dls.classes["user"])
n_movies = len(dls.classes["title"])
n_features = 50

model = DotProductBias(n_users, n_movies, n_features, [0.5,5.5])
learn = Learner(dls, model, loss_func=MSELossFlat(), metrics=rmse)

learn.fit_one_cycle(5, 5e-3)

epoch,train_loss,valid_loss,_rmse,time
0,0.857418,0.925468,0.962012,00:12
1,0.558479,0.912969,0.955494,00:13
2,0.392992,0.942482,0.970815,00:12
3,0.302219,0.951572,0.975486,00:17
4,0.28459,0.952899,0.976165,00:12


We're now at about $\pm0.98$ error on our ratings, slightly worse than before. Why didn't our model improve? The training loss decreases quite a bit, while the validation loss stops decreasing after the second epoch and then starts to rise. We're overfitting. Let's add L2 regularization (weight decay) with $\lambda=0.1$ in order to fix that (using `wd=0.1` when we go to fit our model). And yeah, we get a bit of improvement:

In [None]:
model = DotProductBias(n_users, n_movies, n_features, [0.5,5.5])
learn = Learner(dls, model, loss_func=MSELossFlat(), metrics=rmse)

learn.fit_one_cycle(5, 5e-3, wd=0.1)

epoch,train_loss,valid_loss,_rmse,time
0,0.83727,0.937487,0.968239,00:12
1,0.636476,0.898733,0.948015,00:12
2,0.525377,0.879971,0.938068,00:12
3,0.429272,0.860002,0.927363,00:11
4,0.427108,0.854598,0.924445,00:12
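
Why does weight decay help? Here is a sketch of the classic L2 formulation in plain Python, with made-up weights and hypothetical helper names. The penalty adds `wd` times the sum of squared weights to the loss, which adds `2 * wd * w` to each weight's gradient. fastai's optimizers may apply the decay in a slightly different (decoupled) form, so treat this as the idea rather than the exact implementation.

```python
def penalized_loss(base_loss, weights, wd):
    """Loss plus wd times the sum of squared weights (classic L2 penalty)."""
    return base_loss + wd * sum(w ** 2 for w in weights)

def sgd_step(weights, grads, lr, wd):
    """One plain SGD step; the penalty adds 2 * wd * w to each gradient."""
    return [w - lr * (g + 2 * wd * w) for w, g in zip(weights, grads)]

toy_weights = [3.0, -2.0, 0.1]
no_data_grad = [0.0, 0.0, 0.0]   # pretend the data gives no signal at all

print(penalized_loss(1.0, toy_weights, wd=0.1))
stepped = sgd_step(toy_weights, no_data_grad, lr=0.5, wd=0.1)
print(stepped)
```

Even with no signal from the data, every weight shrinks toward zero by the same factor. Large weights are what let a model memorize individual ratings, so keeping them small pushes it toward patterns that generalize.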


## A Recommender with a Neural Network

But this is a course on deep learning, not just machine learning. Let's use an actual neural network. We can build one with two linear layers using the `nn.Sequential()` API:

In [None]:
class RecommenderNN (Module):
    def __init__(self, user_sz, item_sz, range=[0.5,5.5], n_act=100):

        self.user_embedding  = Embedding(*user_sz)
        self.movie_embedding = Embedding(*item_sz)

        self.layers = nn.Sequential(
            # The user and movie embeddings are concatenated and fed in as the inputs to this layer
            # instead of being dot-producted, so the two embeddings no longer need to be the same size.
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 1))

        self.min = range[0]
        self.max = range[1]

    def forward(self, x):
        users   = self.user_embedding(x[:,0])
        movies  = self.movie_embedding(x[:,1])
        embeddings = torch.cat([users, movies], dim=1) #This takes our embeddings and makes a single vector as the input

        raw_rating = self.layers(embeddings)

        return torch.sigmoid(raw_rating)*(self.max-self.min) + self.min
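
To make the forward pass concrete, here it is traced by hand in plain Python with tiny, hand-picked numbers. Everything in this sketch is hypothetical (a 2-number user embedding, a 3-number movie embedding, 2 hidden units, and the helper names); the point is only to show concatenate, linear, ReLU, linear, sigmoid in order.

```python
import math

def linear(inputs, weights, biases):
    """One output per row of weights: sum(w * x) + b."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def tiny_forward(user_vec, movie_vec, w1, b1, w2, b2, low=0.5, high=5.5):
    joined = user_vec + movie_vec              # list concatenation, like torch.cat
    hidden = relu(linear(joined, w1, b1))
    raw = linear(hidden, w2, b2)[0]
    return 1 / (1 + math.exp(-raw)) * (high - low) + low

# The two embeddings have different lengths, which the dot product would not allow.
user_vec, movie_vec = [1.0, -1.0], [0.5, 0.0, 2.0]
w1 = [[1.0, 0.0, 0.0, 0.0, 1.0],      # 2 hidden units, each reading all 5 inputs
      [0.0, 1.0, 1.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [-3.0]

prediction = tiny_forward(user_vec, movie_vec, w1, b1, w2, b2)
print(prediction)
```

The second hidden unit comes out negative (-0.5) and is zeroed by the ReLU, the raw score ends up at 0, and the scaled sigmoid turns that into a mid-range prediction.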



In [None]:
from fastai.tabular.all import get_emb_sz  # explicit import, in case the star imports at the top were not run

embs = get_emb_sz(dls)

model = RecommenderNN(*embs, [0.5,5.5])
learn = Learner(dls, model, loss_func=MSELossFlat(), metrics=rmse)

learn.fit_one_cycle(5, 5e-3, wd=0.1)

And, of course, fastai already has a function which does this:

In [None]:
learn = collab_learner(dls, use_nn=True, y_range=(0, 5.5), layers=[100,50])
learn.fit_one_cycle(5, 5e-3, wd=0.1)

epoch,train_loss,valid_loss,time
0,0.986512,0.977334,00:16
1,0.885194,0.928558,00:16
2,0.826623,0.896697,00:16
3,0.766158,0.875392,00:17
4,0.745022,0.882101,00:15


## Bootstrapping

One problem with collaborative filtering is that it needs a lot of initial data to get going. What do you do if you don't have an existing base of users and ratings? And what if you do have that base, but a new product or new user joins the system? There are a few approaches, all of which are fairly simple.

1. Use a pre-existing system. If I were starting a new streaming service, I might use a dataset like MovieLens to get some starting points.
2. Initialize new users to the average of all users, and new movies to the average of all movies.
3. Ask the user some questions about what they like, and put them into one of a few pre-made profile types.
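
The second approach can be sketched as follows, using made-up 3-number embeddings for three existing users. The names `user_vectors` and `mean_vector` are hypothetical; a real system would average the rows of its trained embedding matrix.

```python
# Invented embeddings for three existing users.
user_vectors = {
    "u1": [0.2, -0.4, 0.6],
    "u2": [0.4, 0.0, 0.3],
    "u3": [0.0, 0.1, 0.3],
}

def mean_vector(vectors):
    """Column-wise average of equal-length vectors."""
    vectors = list(vectors)
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# A brand-new user starts out as the average user.
user_vectors["new_user"] = mean_vector(user_vectors.values())
print(user_vectors["new_user"])
```

The new user initially gets the recommendations an average user would get, and their vector can then be updated as their own ratings come in.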

## Notes

In [None]:
# This is a tab-separated file with no header row, so we pass the separator and the column names explicitly.
ratings = pd.read_csv("ml-100k/u.data", sep="\t", header=None,
                      names=['user','movie','rating','timestamp'])
movies = pd.read_csv("ml-100k/u.item", delimiter="|", encoding="latin_1", header=None,
                     names=["movie","title"], usecols=[0,1])

ratings = ratings.merge(movies)

ratings

Unnamed: 0,user,movie,rating,timestamp,title
0,196,242,3,881250949,Kolya (1996)
1,186,302,3,891717742,L.A. Confidential (1997)
2,22,377,1,878887116,Heavyweights (1994)
3,244,51,2,880606923,Legends of the Fall (1994)
4,166,346,1,886397596,Jackie Brown (1997)
...,...,...,...,...,...
99995,880,476,3,880175444,"First Wives Club, The (1996)"
99996,716,204,5,879795543,Back to the Future (1985)
99997,276,1090,1,874795795,Sliver (1993)
99998,13,225,2,882399156,101 Dalmatians (1996)


In [None]:
class DotProduct (Module):
    def __init__(self, n_users, n_movies, n_features):
        self.user_embedding  = Embedding(n_users, n_features)
        self.movie_embedding = Embedding(n_movies, n_features)

        self.user_bias_embeds = Embedding(n_users,1)
        self.movie_bias_embeds = Embedding(n_movies,1)

    # forward() makes the predictions
    def forward(self, x):

        # Unpack the user and movie IDs
        user_ids = x[:,0]   # x's 0th column
        movie_ids = x[:,1]

        # Get the embedding for each one
        user_features = self.user_embedding(user_ids)
        movie_features = self.movie_embedding(movie_ids)

        user_bias = self.user_bias_embeds(user_ids)
        movie_bias = self.movie_bias_embeds(movie_ids)

        # keepdim=True keeps the summed axis, so the dot products have shape (batch, 1) like the biases
        predictions = user_bias + movie_bias + (user_features*movie_features).sum(axis=1, keepdim=True)

        # Add a buffer zone below the minimum and above the maximum rating, so the sigmoid
        # doesn't need an input of +/- infinity to reach 1 or 5
        minimum = 0.8
        width = 5.2 - minimum
        return torch.sigmoid(predictions)*width + minimum

# We do NOT write a .backward() method:
# PyTorch's autograd works out the gradients from forward() for us


In [None]:
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=500)
dls.show_batch()

Unnamed: 0,user,title,rating
0,367,Tales from the Hood (1995),2
1,269,Casablanca (1942),4
2,542,Addams Family Values (1993),3
3,430,Mimic (1997),2
4,622,Coneheads (1993),2
5,686,"Graduate, The (1967)",5
6,857,Emma (1996),5
7,692,Michael (1996),4
8,577,"Princess Bride, The (1987)",5
9,270,"Silence of the Lambs, The (1991)",5


In [None]:
n_users = len(dls.classes["user"])
n_movies = len(dls.classes["title"])

n_features = 50

model = DotProduct(n_users,n_movies,n_features)
learn = Learner(dls,model,loss_func=MSELossFlat(),metrics = rmse)

In [None]:
learn.fit_one_cycle(10,5e-3, wd = 0.1)

# wd=0.1 sets the weight decay, which is the strength of the L2 regularization. You don't know a good wd until you mess around and figure it out.

# The rmse tells you your typical error in units of y: if you predict someone would give a movie 4 stars and your rmse is 1 star, you may well be off by about 1 star.

# If our train_loss and valid_loss go down at the same time, our model is likely not overfitting.

epoch,train_loss,valid_loss,_rmse,time
0,0.296338,0.875782,0.935832,00:01
1,0.285285,0.886247,0.941407,00:01
2,0.272549,0.900692,0.949048,00:01
3,0.250845,0.905967,0.951823,00:01
4,0.231014,0.914532,0.956312,00:02


KeyboardInterrupt: 

In [None]:
# Now let's improve this by (A) adding a new term to our model and (B) using L2 regularization.

# We're going to add a bias. All of our terms so far represent an interaction between the user and the movie,
# but the overall quality of a movie affects every user's rating of it, and how generous a user is affects
# every rating they give.

# These are the user_bias and movie_bias in the model above.

# Each user (and each movie) gets its own bias, so this isn't a single shared number.

# Instead of returning user * movie + b1 + b2 directly, we send it through a sigmoid (a curve shaped like an S).

# A sigmoid takes any number in the range (-infinity, infinity) and squeezes it into the range (0, 1).

#Recommender building
