# Matrix factorization with PyTorch

In this notebook we write a matrix factorization model in PyTorch to solve a recommendation problem.

The MovieLens dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies (see https://grouplens.org/datasets/movielens/). To get the data:

`wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip`
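
If `wget` is not available, the same download can be sketched with the Python standard library. The helper name `fetch_movielens` and the `data` target directory are assumptions chosen to match the `PATH` used below; the function is only defined here, not called.

```python
import urllib.request
import zipfile
from pathlib import Path

URL = "http://files.grouplens.org/datasets/movielens/ml-latest-small.zip"

def fetch_movielens(dest="data"):
    """Download and unzip ml-latest-small into `dest` (hypothetical helper)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "ml-latest-small.zip"
    # skip the download if the archive is already present
    if not archive.exists():
        urllib.request.urlretrieve(URL, str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest / "ml-latest-small"

# usage (requires network access):
# PATH = fetch_movielens()
```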

## MovieLens dataset

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

In [2]:
PATH = Path("data/ml-latest-small")
list(PATH.iterdir())

[PosixPath('data/ml-latest-small/links.csv'),
 PosixPath('data/ml-latest-small/tags.csv'),
 PosixPath('data/ml-latest-small/ratings.csv'),
 PosixPath('data/ml-latest-small/README.txt'),
 PosixPath('data/ml-latest-small/movies.csv')]

In [3]:
! head data/ml-latest-small/ratings.csv

userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
1,47,5.0,964983815
1,50,5.0,964982931
1,70,3.0,964982400
1,101,5.0,964980868
1,110,4.0,964982176
1,151,5.0,964984041


In [4]:
# reading a csv into pandas
data = pd.read_csv(PATH/"ratings.csv")

In [5]:
data.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


### Encoding data
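We encode the data so that users and movies have contiguous ids starting at 0. You can think of this as a categorical encoding of our two categorical variables, userId and movieId. We first split the ratings by time (the earliest 80% for training, the rest for validation) and build the id mappings from the training split only, so validation rows whose user or movie never appears in training are dropped.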

In [6]:
time_80 = np.quantile(data.timestamp.values, 0.8)
time_80

1458635171.0

In [7]:
train = data[data["timestamp"] < time_80].copy()
val = data[data["timestamp"] >= time_80].copy()

In [8]:
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,15,1,2.5,1510577970
1436,15,47,3.5,1510571970
1440,15,260,5.0,1510571946
1441,15,293,3.0,1510571962
1442,15,296,4.0,1510571877


In [9]:
# encoding movie and user ids as contiguous ids

train_user_ids = np.sort(np.unique(train.userId.values))
train_user_ids[:15]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [10]:
# number of unique ids
num_users = len(train_user_ids)
num_users

522

In [11]:
userid2idx = {o:i for i,o in enumerate(train_user_ids)}
#userid2idx

In [12]:
train["userId"] = train["userId"].apply(lambda x: userid2idx[x])
train.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,0,1,4.0,964982703
1,0,3,4.0,964981247
2,0,6,4.0,964982224
3,0,47,5.0,964983815
4,0,50,5.0,964982931


In [13]:
val["userId"] = val["userId"].apply(lambda x: userid2idx.get(x, -1)) # -1 for users not in training
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,1,2.5,1510577970
1436,14,47,3.5,1510571970
1440,14,260,5.0,1510571946
1441,14,293,3.0,1510571962
1442,14,296,4.0,1510571877


In [14]:
val = val[val["userId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,1,2.5,1510577970
1436,14,47,3.5,1510571970
1440,14,260,5.0,1510571946
1441,14,293,3.0,1510571962
1442,14,296,4.0,1510571877


In [15]:
# now encoding movieId
train_movie_ids = np.sort(np.unique(train.movieId.values))
num_items = len(train_movie_ids)
print(num_items)
train_movie_ids[:15]

7867


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [16]:
movieid2idx = {o:i for i,o in enumerate(train_movie_ids)}
train["movieId"] = train["movieId"].apply(lambda x: movieid2idx[x])
val["movieId"] = val["movieId"].apply(lambda x: movieid2idx.get(x, -1))

In [17]:
val = val[val["movieId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,0,2.5,1510577970
1436,14,43,3.5,1510571970
1440,14,224,5.0,1510571946
1441,14,254,3.0,1510571962
1442,14,257,4.0,1510571877


In [18]:
val.shape

(1311, 4)
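
The encoding steps above repeat the same pattern for users and movies, so they could be folded into one helper. This is a minimal sketch on toy data; `encode_ids` and the toy frames are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd

def encode_ids(train_col, other_col):
    """Map raw ids to contiguous indices learned from train_col only."""
    uniq = np.sort(train_col.unique())
    id2idx = {raw: idx for idx, raw in enumerate(uniq)}
    train_enc = train_col.map(id2idx)
    # ids unseen in training become -1 so they can be filtered out later
    other_enc = other_col.map(id2idx).fillna(-1).astype(int)
    return id2idx, train_enc, other_enc

toy_train = pd.DataFrame({"userId": [10, 10, 42, 7]})
toy_val = pd.DataFrame({"userId": [42, 99]})
id2idx, tr, va = encode_ids(toy_train["userId"], toy_val["userId"])
print(tr.tolist())  # [1, 1, 2, 0]
print(va.tolist())  # [2, -1]
```

Rows with `-1` would then be dropped, as done for `val` above.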

## Embedding layer

An embedding layer lets us encode users and items as vectors. Every user and every item gets its own vector. These vectors are parameters of the model and are learned during optimization. Ideally, the embeddings capture properties of the data by placing similar users (or items) close together in the embedding space.

In [19]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [20]:
# an Embedding module holding 10 embeddings (users or items), each of size 3
# the embeddings are initialized at random
embed = nn.Embedding(10, 3)
embed.weight

Parameter containing:
tensor([[ 0.5935, -1.3365,  1.1738],
        [ 0.4667,  0.7331,  0.0567],
        [-0.4587,  0.7762, -0.0968],
        [ 0.5330, -0.1384,  0.8012],
        [-0.0086,  0.6683,  1.0093],
        [ 0.3934,  0.4449, -1.4494],
        [ 0.6718, -0.0938,  1.4397],
        [ 0.3774,  0.4804,  0.4262],
        [ 0.6276, -0.2002,  1.0706],
        [-0.0940,  0.5449, -0.8489]], requires_grad=True)

In [21]:
# given a list of ids we can "look up" the embedding corresponding to each id
# can you see that some vectors are the same?
a = torch.LongTensor([[1,0,1,4,5,1]])
embed(a)

tensor([[[ 0.4667,  0.7331,  0.0567],
         [ 0.5935, -1.3365,  1.1738],
         [ 0.4667,  0.7331,  0.0567],
         [-0.0086,  0.6683,  1.0093],
         [ 0.3934,  0.4449, -1.4494],
         [ 0.4667,  0.7331,  0.0567]]], grad_fn=<EmbeddingBackward>)
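
To make the "look up" idea concrete, the sketch below checks that calling the embedding layer gives the same result as indexing rows of its weight matrix. The seed and variable names are chosen here for illustration, so the values differ from the output above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(10, 3)
ids = torch.LongTensor([[1, 0, 1, 4, 5, 1]])

looked_up = embed(ids)       # shape (1, 6, 3)
indexed = embed.weight[ids]  # plain row indexing, same shape

print(looked_up.shape)
print(torch.equal(looked_up, indexed))
```

Repeated ids (here `1` appears three times) return the same row each time, which is why some output vectors are identical.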

## Matrix factorization model

In [22]:
class MF(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        # initializing weights
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        
    def forward(self, u, v):
        u = self.user_emb(u)
        v = self.item_emb(v)
        return (u*v).sum(1)   
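
Written as a formula, the forward pass above predicts a rating as the dot product of a user vector and an item vector. The symbols $p_u$, $q_i$ and $d$ (the embedding size) are notation introduced here, not names from the notebook.

$$
\hat{r}_{ui} = p_u \cdot q_i = \sum_{k=1}^{d} p_{uk}\, q_{ik}
$$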

## Debugging MF model

In [23]:
df = pd.DataFrame({"userId": [0, 0, 1, 1, 3, 4], "movieId": [0, 1, 2, 1, 3, 0], "rating": [4, 5, 3, 1, 3, 4]})
df

Unnamed: 0,userId,movieId,rating
0,0,0,4
1,0,1,5
2,1,2,3
3,1,1,1
4,3,3,3
5,4,0,4


In [24]:
users = torch.LongTensor(df.userId.values)
users

tensor([0, 0, 1, 1, 3, 4])

In [25]:
items = torch.LongTensor(df.movieId.values)
items

tensor([0, 1, 2, 1, 3, 0])

In [26]:
num_users = 5
num_items = 4
emb_size = 3

user_emb = nn.Embedding(num_users, emb_size)
item_emb = nn.Embedding(num_items, emb_size)
users = torch.LongTensor(df.userId.values)
items = torch.LongTensor(df.movieId.values)

In [27]:
U = user_emb(users)
V = item_emb(items)

In [28]:
U

tensor([[-1.3106,  0.2618,  0.2860],
        [-1.3106,  0.2618,  0.2860],
        [-0.3552, -1.3466, -0.6607],
        [-0.3552, -1.3466, -0.6607],
        [ 1.1995,  0.8170, -1.2229],
        [-1.5255,  0.7604,  1.2791]], grad_fn=<EmbeddingBackward>)

In [29]:
V

tensor([[-0.3169, -0.3454, -0.2697],
        [-0.8112,  0.6813, -1.5022],
        [ 0.2378, -1.2105, -0.3832],
        [-0.8112,  0.6813, -1.5022],
        [ 0.1699,  0.4992, -0.0128],
        [-0.3169, -0.3454, -0.2697]], grad_fn=<EmbeddingBackward>)

In [30]:
# element-wise multiplication
U*V

tensor([[ 0.4154, -0.0904, -0.0771],
        [ 1.0631,  0.1783, -0.4296],
        [-0.0845,  1.6301,  0.2532],
        [ 0.2881, -0.9175,  0.9925],
        [ 0.2038,  0.4078,  0.0157],
        [ 0.4835, -0.2626, -0.3450]], grad_fn=<MulBackward0>)

In [31]:
# what we want is a dot product per row
(U*V).sum(1) 

tensor([ 0.2478,  0.8119,  1.7988,  0.3631,  0.6273, -0.1242],
       grad_fn=<SumBackward1>)
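
As a sanity check on the row-wise dot product, the sketch below compares `(U*V).sum(1)` with two equivalent formulations on random tensors. The tensors are made up for illustration and are not the embeddings printed above.

```python
import torch

torch.manual_seed(0)
U = torch.randn(6, 3)
V = torch.randn(6, 3)

rowwise = (U * V).sum(1)                      # the form used in the model
via_einsum = torch.einsum("ij,ij->i", U, V)   # same computation with einsum
via_matmul = torch.diagonal(U @ V.T)          # diagonal of the full product

print(torch.allclose(rowwise, via_einsum, atol=1e-6))
print(torch.allclose(rowwise, via_matmul, atol=1e-6))
```

The `U @ V.T` version computes every user-item pair and keeps only the diagonal, so the element-wise form is the cheaper choice for training.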

## Training MF model

In [32]:
num_users = len(train.userId.unique())
num_items = len(train.movieId.unique())
print(num_users, num_items) 

522 7867


In [33]:
# here we are not using data loaders because our data fits well in memory
def train_epocs(model, epochs=10, lr=0.01, wd=0.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    for i in range(epochs):
        model.train()
        users = torch.LongTensor(train.userId.values)  #.cuda()
        items = torch.LongTensor(train.movieId.values) #.cuda()
        ratings = torch.FloatTensor(train.rating.values)  #.cuda()
    
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        testloss = valid_loss(model)
        print("train loss %.3f valid loss %.3f" % (loss.item(), testloss)) 

In [34]:
def valid_loss(model):
    model.eval()
    users = torch.LongTensor(val.userId.values) # .cuda()
    items = torch.LongTensor(val.movieId.values) #.cuda()
    ratings = torch.FloatTensor(val.rating.values) #.cuda()
    # gradients are not needed for evaluation
    with torch.no_grad():
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
    return loss.item()
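
The training loop above uses the full dataset in every step. If the data did not fit in memory, a mini-batch variant could look roughly like the sketch below. `TinyMF`, `train_minibatch` and the synthetic ratings are hypothetical stand-ins, not the notebook's model or data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class TinyMF(nn.Module):
    """Small stand-in for the MF model, for illustration only."""
    def __init__(self, num_users, num_items, emb_size=8):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)

    def forward(self, u, v):
        return (self.user_emb(u) * self.item_emb(v)).sum(1)

def train_minibatch(model, users, items, ratings, epochs=5, lr=0.01, batch_size=4):
    loader = DataLoader(TensorDataset(users, items, ratings),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        total, n = 0.0, 0
        for u, v, r in loader:
            loss = F.mse_loss(model(u, v), r)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # accumulate a size-weighted average of the batch losses
            total += loss.item() * len(r)
            n += len(r)
        history.append(total / n)
    return history

torch.manual_seed(0)
users = torch.randint(0, 5, (32,))
items = torch.randint(0, 4, (32,))
ratings = torch.randint(1, 6, (32,)).float()
history = train_minibatch(TinyMF(5, 4), users, items, ratings, epochs=5)
print(history)
```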

In [35]:
num_users = len(train_user_ids)
num_users

522

In [36]:
num_items = len(train_movie_ids)
num_items

7867

In [37]:
model = MF(num_users, num_items, emb_size=100)  # if you have a GPU .cuda()

In [51]:
model(users,items)

tensor([4.6938, 4.1823, 3.2456, 3.4114, 2.2294, 3.7705],
       grad_fn=<AddBackward0>)

In [38]:
train_epocs(model, epochs=20, lr=0.1, wd=1e-5)

train loss 12.941 valid loss 5.080
train loss 4.865 valid loss 2.695
train loss 2.530 valid loss 4.547
train loss 2.986 valid loss 1.460
train loss 0.850 valid loss 1.811
train loss 1.873 valid loss 2.648
train loss 2.683 valid loss 2.342
train loss 2.125 valid loss 1.410
train loss 1.067 valid loss 1.144
train loss 0.973 valid loss 1.750
train loss 1.625 valid loss 1.723
train loss 1.306 valid loss 1.155
train loss 0.773 valid loss 1.004
train loss 0.966 valid loss 1.198
train loss 1.343 valid loss 1.265
train loss 1.305 valid loss 1.114
train loss 0.914 valid loss 1.015
train loss 0.675 valid loss 1.159
train loss 0.872 valid loss 1.260
train loss 1.021 valid loss 1.090


In [39]:
train_epocs(model, epochs=15, lr=0.01, wd=1e-5)

train loss 0.797 valid loss 0.926
train loss 0.630 valid loss 0.910
train loss 0.642 valid loss 0.918
train loss 0.663 valid loss 0.920
train loss 0.648 valid loss 0.921
train loss 0.622 valid loss 0.921
train loss 0.605 valid loss 0.916
train loss 0.601 valid loss 0.901
train loss 0.599 valid loss 0.880
train loss 0.591 valid loss 0.860
train loss 0.579 valid loss 0.848
train loss 0.568 valid loss 0.843
train loss 0.563 valid loss 0.844
train loss 0.560 valid loss 0.848
train loss 0.557 valid loss 0.853


In [40]:
train_epocs(model, epochs=15, lr=0.001, wd=1e-5)

train loss 0.549 valid loss 0.853
train loss 0.542 valid loss 0.853
train loss 0.536 valid loss 0.853
train loss 0.532 valid loss 0.854
train loss 0.529 valid loss 0.855
train loss 0.526 valid loss 0.856
train loss 0.523 valid loss 0.857
train loss 0.521 valid loss 0.857
train loss 0.519 valid loss 0.858
train loss 0.517 valid loss 0.858
train loss 0.515 valid loss 0.858
train loss 0.513 valid loss 0.858
train loss 0.511 valid loss 0.857
train loss 0.509 valid loss 0.857
train loss 0.506 valid loss 0.857
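
The printed losses are mean squared errors. Taking the square root gives an RMSE in rating units, which is easier to interpret on a 5-star scale. The value below is copied from the last line of the output above.

```python
import math

final_valid_mse = 0.857  # last "valid loss" printed above
rmse = math.sqrt(final_valid_mse)
print(round(rmse, 3))  # 0.926
```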


## MF with bias

In [41]:
class MF_bias(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF_bias, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.item_bias = nn.Embedding(num_items, 1)
        # init 
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        
    def forward(self, u, v):
        U = self.user_emb(u)
        V = self.item_emb(v)
        # biases have shape (batch, 1); drop the trailing dimension
        b_u = self.user_bias(u).squeeze(1)
        b_v = self.item_bias(v).squeeze(1)
        return (U*V).sum(1) + b_u + b_v
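
In formula form, the model with bias adds one learned scalar per user and one per item to the dot product. The symbols $p_u$, $q_i$, $b_u$ and $b_i$ are notation introduced here; they correspond to `user_emb`, `item_emb`, `user_bias` and `item_bias` in the class above.

$$
\hat{r}_{ui} = p_u \cdot q_i + b_u + b_i
$$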

In [42]:
model = MF_bias(num_users, num_items, emb_size=100) #.cuda()

In [43]:
train_epocs(model, epochs=15, lr=0.1, wd=1e-5)

train loss 12.939 valid loss 4.321
train loss 4.119 valid loss 3.904
train loss 3.768 valid loss 3.603
train loss 2.264 valid loss 1.227
train loss 0.770 valid loss 1.797
train loss 1.851 valid loss 2.498
train loss 2.509 valid loss 2.285
train loss 2.114 valid loss 1.562
train loss 1.286 valid loss 1.160
train loss 0.950 valid loss 1.423
train loss 1.272 valid loss 1.580
train loss 1.279 valid loss 1.283
train loss 0.901 valid loss 1.042
train loss 0.821 valid loss 1.066
train loss 1.043 valid loss 1.144


In [44]:
train_epocs(model, epochs=10, lr=0.01, wd=1e-5)

train loss 1.190 valid loss 0.973
train loss 0.883 valid loss 0.932
train loss 0.713 valid loss 0.961
train loss 0.664 valid loss 0.998
train loss 0.681 valid loss 1.005
train loss 0.704 valid loss 0.983
train loss 0.703 valid loss 0.948
train loss 0.683 valid loss 0.917
train loss 0.660 valid loss 0.895
train loss 0.644 valid loss 0.886


In [45]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.638 valid loss 0.881
train loss 0.631 valid loss 0.877
train loss 0.624 valid loss 0.873
train loss 0.619 valid loss 0.871
train loss 0.614 valid loss 0.869
train loss 0.609 valid loss 0.868
train loss 0.605 valid loss 0.867
train loss 0.602 valid loss 0.866
train loss 0.599 valid loss 0.865
train loss 0.596 valid loss 0.864


In [46]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.594 valid loss 0.861
train loss 0.591 valid loss 0.859
train loss 0.588 valid loss 0.857
train loss 0.586 valid loss 0.856
train loss 0.584 valid loss 0.856
train loss 0.582 valid loss 0.855
train loss 0.580 valid loss 0.855
train loss 0.578 valid loss 0.856
train loss 0.576 valid loss 0.856
train loss 0.574 valid loss 0.857


In [47]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.571 valid loss 0.857
train loss 0.569 valid loss 0.858
train loss 0.567 valid loss 0.858
train loss 0.565 valid loss 0.859
train loss 0.563 valid loss 0.860
train loss 0.561 valid loss 0.860
train loss 0.558 valid loss 0.861
train loss 0.556 valid loss 0.861
train loss 0.554 valid loss 0.861
train loss 0.551 valid loss 0.861


Note that these models are sensitive to weight initialization, the choice of optimization algorithm, and regularization.
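
One way to see the effect of initialization: with embeddings drawn from uniform(0, 0.05) and an embedding size of 100, as in the models above, initial predictions sit near 100 × 0.025 × 0.025 ≈ 0.06, far below typical ratings. That is plausibly why the first training loss is so large. The sketch below checks the arithmetic on random tensors; the sample size `n` is an arbitrary choice.

```python
import torch

torch.manual_seed(0)
emb_size, n = 100, 10_000

# same initialization range as the MF models in this notebook
u = torch.empty(n, emb_size).uniform_(0, 0.05)
v = torch.empty(n, emb_size).uniform_(0, 0.05)

initial_pred = (u * v).sum(1)
expected = emb_size * 0.025 * 0.025  # mean of each factor is 0.025

print(initial_pred.mean().item(), expected)
```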

# References
* This notebook is based on [lesson 5 of Jeremy Howard's Deep Learning Course](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb)