<a href="https://colab.research.google.com/github/arunoda/fastai-v4/blob/master/07_1_colab_filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Collaborative Filtering with fast.ai**

## Setting Up on Colab

You only need to run this on Colab.

In [1]:
!pip install fastai2 > /dev/null 2>&1
!git clone https://github.com/arunoda/fastai-v4 > /dev/null 2>&1
%cd fastai-v4

/content/fastai-v4


In [0]:
from fastai2.collab import *
from fastai2.tabular.all import *

## **Loading the Dataset**

Here we are going to use the small MovieLens 100k dataset.

In [3]:
data_path = untar_data(URLs.ML_100k)

In [4]:
!ls -all {data_path}

total 15784
drwxr-xr-x 2 root root    4096 Jun  4 08:28 .
drwxr-xr-x 3 root root    4096 Jun  4 08:28 ..
-rw-r--r-- 1 root root     716 Jun  4 08:28 allbut.pl
-rw-r--r-- 1 root root     643 Jun  4 08:28 mku.sh
-rw-r--r-- 1 root root    6750 Jun  4 08:28 README
-rw-r--r-- 1 root root 1586544 Jun  4 08:28 u1.base
-rw-r--r-- 1 root root  392629 Jun  4 08:28 u1.test
-rw-r--r-- 1 root root 1583948 Jun  4 08:28 u2.base
-rw-r--r-- 1 root root  395225 Jun  4 08:28 u2.test
-rw-r--r-- 1 root root 1582546 Jun  4 08:28 u3.base
-rw-r--r-- 1 root root  396627 Jun  4 08:28 u3.test
-rw-r--r-- 1 root root 1581878 Jun  4 08:28 u4.base
-rw-r--r-- 1 root root  397295 Jun  4 08:28 u4.test
-rw-r--r-- 1 root root 1581776 Jun  4 08:28 u5.base
-rw-r--r-- 1 root root  397397 Jun  4 08:28 u5.test
-rw-r--r-- 1 root root 1792501 Jun  4 08:28 ua.base
-rw-r--r-- 1 root root  186672 Jun  4 08:28 ua.test
-rw-r--r-- 1 root root 1792476 Jun  4 08:28 ub.base
-rw-r--r-- 1 root root  186697 Jun  4 08:28 ub.test
-rw-r--r-- 

In [5]:
!head {data_path/"u.data"}

196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
298	474	4	884182806
115	265	2	881171488
253	465	5	891628467
305	451	3	886324817
6	86	3	883603013


This is the file we want. Its tab-separated fields are:

```
user, movie, rating, timestamp
```

In [0]:
df_ml = pd.read_csv(data_path/"u.data", delimiter = "\t", header=None, names=["user", "movie", "rating", "timestamp"])


In [8]:
df_ml.head()

Unnamed: 0,user,movie,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [9]:
!head {data_path/"u.item"}

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0
6|Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)|01-Jan-1995||http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
7|Twelve Monkeys (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Twelve%20Monkeys%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|1|0|0|0
8|Babe (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Babe%20(1995)|0|0|0|0|1

In [0]:
movies = pd.read_csv(data_path/"u.item", delimiter="|", encoding='latin-1', header=None, names=("movie", "title"), usecols=(0,1))

In [30]:
movies.head()

Unnamed: 0,movie,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


**Now we are going to join these two tables on the `movie` column, which acts as the foreign key.**

In [33]:
df_ml = df_ml.merge(movies)
df_ml.tail()

Unnamed: 0,user,movie,rating,timestamp,title
99995,840,1674,4,891211682,Mamma Roma (1962)
99996,655,1640,3,888474646,"Eighth Day, The (1996)"
99997,655,1637,3,888984255,Girls Town (1996)
99998,655,1630,3,887428735,"Silence of the Palace, The (Saimt el Qusur) (1994)"
99999,655,1641,3,887427810,Dadetown (1995)
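
To see what `merge` does here, a minimal sketch on two tiny hand-made tables may help. The ratings below are made up and the names `ratings` and `titles` are only for illustration; the titles are the first rows of `u.item` shown earlier. With no `on=` argument, pandas joins on the column names the two frames share (only `movie` here) and does an inner join by default.

```python
import pandas as pd

# Made-up ratings; the titles are the first rows of u.item shown earlier.
ratings = pd.DataFrame({"user": [10, 11, 12], "movie": [1, 2, 1], "rating": [5, 3, 4]})
titles = pd.DataFrame({
    "movie": [1, 2, 3],
    "title": ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
})

# No `on=` given: the join key is the shared column name ("movie").
joined = ratings.merge(titles)
print(joined)
```

Every rating row gains a `title` column, and the movie nobody rated (id 3) does not appear in the result.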


In [0]:
dls_bs_5 = CollabDataLoaders.from_df(df_ml, item_name="title", bs=5)

In [56]:
dls_bs_5.classes

{'title': (#1639) ['#na#',"'Til There Was You (1997)",'1-900 (1994)','101 Dalmatians (1996)','12 Angry Men (1957)','187 (1997)','2 Days in the Valley (1996)','20,000 Leagues Under the Sea (1954)','2001: A Space Odyssey (1968)','3 Ninjas: High Noon At Mega Mountain (1998)'...],
 'user': (#944) ['#na#',1,2,3,4,5,6,7,8,9...]}

In [55]:
dls_bs_5.show_batch()

Unnamed: 0,user,title,rating
0,789,Fargo (1996),5
1,479,Face/Off (1997),3
2,52,Fargo (1996),4
3,506,To Die For (1995),2
4,198,Raise the Red Lantern (1991),3
5,653,"People vs. Larry Flynt, The (1996)",3
6,655,Female Perversions (1996),3
7,87,Star Trek III: The Search for Spock (1984),4
8,892,Absolute Power (1997),4
9,87,Braveheart (1995),4


In [0]:
x, y = dls_bs_5.one_batch()

In [63]:
x

tensor([[ 123,  866],
        [ 645,  717],
        [ 928,  567],
        [ 299,  314],
        [ 911, 1468]], device='cuda:0')

In [64]:
y

tensor([[4],
        [2],
        [3],
        [4],
        [2]], device='cuda:0')

So, based on the data above, the first column of `x` is the user and the second is the movie. `y` contains the rating.



In [72]:
userId = x[0][0]
movieId = x[0][1]
rating = y[0][0]
movieName = dls_bs_5.classes['title'][movieId]
print(f'For example user {userId} gave {rating} ratings to the movie "{movieId}({movieName})".')

For example user 123 gave 4 ratings to the movie "866(Looking for Richard (1996))".


## **Idea of Collaborative Filtering**

For now, let's give each movie 2 factors and each user 2 factors. So, it'll look like this:

In [0]:
user_factors = torch.randn((len(dls_bs_5.classes['user']), 2), requires_grad=True)
movie_factors = torch.randn((len(dls_bs_5.classes['title']), 2), requires_grad=True)

Let's pick the factors for the user at index `49` and the movie at index `581`.

In [74]:
user_factors[49], movie_factors[581]

(tensor([-0.8807,  0.2404], grad_fn=<SelectBackward>),
 tensor([0.2815, 1.0626], grad_fn=<SelectBackward>))

Let's create a single number from these by multiplying them element-wise and summing (a dot product).

In [75]:
(user_factors[49] * movie_factors[581]).sum()

tensor(0.0076, grad_fn=<SumBackward0>)
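
The same arithmetic can be checked by hand. This is a small sketch in plain Python that uses the factor values as printed above; since those are rounded to 4 decimals, the result only approximates the `0.0076` that PyTorch reports.

```python
# Factors as printed above (rounded to 4 decimals), so the result is approximate.
user = [-0.8807, 0.2404]
movie = [0.2815, 1.0626]

# Dot product: multiply matching factors, then add everything up.
score = sum(u * m for u, m in zip(user, movie))
print(round(score, 4))
```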

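But let's make sure the prediction stays within the 0-5 rating range. Since a sigmoid only approaches its bounds, an upper bound slightly above 5 (here 5.1) keeps a rating of 5 reachable.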

In [0]:
pred_value = sigmoid_range((user_factors[49] * movie_factors[581]).sum(), 0, 5.1)
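
For intuition, here is a sketch of the squashing step in plain Python. The helper name `sigmoid_range_sketch` is made up; it scales a standard sigmoid into `(low, high)`, which, as far as I can tell, is also what fastai's `sigmoid_range` computes.

```python
import math

def sigmoid_range_sketch(x, low, high):
    """Squash x into the open interval (low, high) with a scaled sigmoid."""
    return 1 / (1 + math.exp(-x)) * (high - low) + low

# 0.0076 is the dot product computed above.
pred = sigmoid_range_sketch(0.0076, 0, 5.1)
print(round(pred, 4))
```

A score near zero lands near the middle of the range, about 2.56. With a true rating of 4, that gives an absolute error of roughly 1.44.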

Now we need to compare that with an actual rating. For illustration, we simply take the first rating in the batch, `y[0][0]`.

In [78]:
(y[0][0] - pred_value).abs()

tensor(1.4403, device='cuda:0', grad_fn=<AbsBackward>)

**That's it. Now we need to get a gradient out of this and update the factors.**
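
What "get a gradient and update" means can be sketched with plain PyTorch. The factor values are copied from the output above (rounded), the target of 4 is `y[0][0]`, and the learning rate `lr = 0.1` is an arbitrary choice for this sketch, not something tuned.

```python
import torch

# Factors copied from the output above (rounded); the target rating 4 is y[0][0].
user = torch.tensor([-0.8807, 0.2404], requires_grad=True)
movie = torch.tensor([0.2815, 1.0626], requires_grad=True)
target = torch.tensor(4.0)
lr = 0.1  # learning rate, an arbitrary choice for this sketch

def abs_error(user, movie):
    # Same squashing as sigmoid_range(..., 0, 5.1), written out by hand.
    pred = torch.sigmoid((user * movie).sum()) * 5.1
    return (target - pred).abs()

loss_before = abs_error(user, movie)
loss_before.backward()  # fills user.grad and movie.grad

# Gradient descent: move each factor a small step against its gradient.
with torch.no_grad():
    user -= lr * user.grad
    movie -= lr * movie.grad

loss_after = abs_error(user, movie)
print(loss_before.item(), loss_after.item())
```

After one step the error for this single rating is smaller. Training repeats this over many batches.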

## **Doing this for a Batch**

Let's try to do this for our whole mini-batch.


In [79]:
x

tensor([[ 123,  866],
        [ 645,  717],
        [ 928,  567],
        [ 299,  314],
        [ 911, 1468]], device='cuda:0')

In [80]:
user_factors

tensor([[ 0.8834, -0.2239],
        [ 1.6923, -0.9428],
        [ 0.6841,  0.6890],
        ...,
        [ 0.9306,  2.2085],
        [ 0.6926, -1.8274],
        [-0.1610,  1.3395]], requires_grad=True)

Basically, we need to pick the rows of `user_factors` that belong to the users in `x`. A simple way to express that lookup as a matrix operation that autograd can differentiate is one-hot encoding. So, let's do it.

In [81]:
one_hot([1], 10)

tensor([0, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=torch.uint8)

In [0]:
one_hot_i = one_hot(536, len(user_factors)).float()

In [83]:
one_hot_i.shape, user_factors.shape

(torch.Size([944]), torch.Size([944, 2]))

In [84]:
one_hot_i @ user_factors

tensor([-1.2912, -0.3411], grad_fn=<SqueezeBackward3>)

**See: the result carries a `grad_fn`, so gradient calculation is supported.**
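
To make this concrete, here is a small sketch in plain PyTorch. It uses `torch.nn.functional.one_hot` as a stand-in for the fastai `one_hot` helper, and the table size (6 users, 2 factors) is made up. It checks that the matrix product picks the same row as indexing would, and that only the picked row receives a gradient.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
factors = torch.randn(6, 2, requires_grad=True)  # 6 users, 2 factors each

idx = 3
one_hot_row = F.one_hot(torch.tensor(idx), num_classes=6).float()  # shape [6]

picked = one_hot_row @ factors  # the matrix product selects row `idx`
matches_indexing = torch.allclose(picked, factors[idx])

picked.sum().backward()
# Only the selected row ends up with a non-zero gradient.
rows_with_grad = factors.grad.abs().sum(dim=1).nonzero().flatten().tolist()
print(matches_indexing, rows_with_grad)
```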

Now let's do it for the whole batch


In [85]:
one_hot_users = torch.stack([one_hot(i, len(user_factors)) for i in x[:, 0]]).float()
one_hot_users.shape

torch.Size([5, 944])

In [86]:
one_hot_movies = torch.stack([one_hot(i, len(movie_factors)) for i in x[:, 1]]).float()
one_hot_movies.shape

torch.Size([5, 1626])

In [87]:
picked_user_factors = one_hot_users @ user_factors
picked_user_factors

tensor([[ 0.2329,  0.8767],
        [ 0.9639, -1.8107],
        [ 0.3526, -1.7974],
        [ 0.0137,  0.3235],
        [-0.8789,  0.4174]], grad_fn=<MmBackward>)

In [88]:
picked_movie_factors = one_hot_movies @ movie_factors
picked_movie_factors

tensor([[-0.6045, -1.0928],
        [-0.1028, -1.1786],
        [-0.8751,  1.2603],
        [ 0.4505,  0.7545],
        [-1.3938,  0.8778]], grad_fn=<MmBackward>)

In [89]:
score_preds = sigmoid_range((picked_user_factors * picked_movie_factors).sum(dim=1), 0, 5.1)
score_preds

tensor([1.2748, 4.5105, 0.3613, 2.8673, 4.2371], grad_fn=<AddBackward0>)

In [90]:
loss = (score_preds - y.cpu()[:, 0]).abs().sum()
loss

tensor(11.2443, grad_fn=<SumBackward0>)

In [0]:
loss.backward()

In [92]:
movie_factors.grad.sum(), user_factors.grad.sum()

(tensor(-1.7714), tensor(-1.0570))

**The gradient sums above have no meaning on their own. I just wanted to show that getting a gradient is possible.**


In [102]:
## Let's create a proper data loader
dls = CollabDataLoaders.from_df(df_ml, item_name="title", bs=64)
dls.show_batch()

Unnamed: 0,user,title,rating
0,500,Dragonheart (1996),3
1,316,Amadeus (1984),5
2,708,Gang Related (1997),1
3,554,Toy Story (1995),3
4,416,Bulletproof (1996),3
5,476,Butch Cassidy and the Sundance Kid (1969),3
6,671,Romper Stomper (1992),1
7,714,Con Air (1997),5
8,371,"Rock, The (1996)",3
9,363,So I Married an Axe Murderer (1993),5


In [166]:
n_users = len(dls.classes['user'])
n_movies = len(dls.classes['title'])

n_users, n_movies

(944, 1630)

## Create a Model

Let's create a PyTorch model based on these.

* Create factors
* Then use one-hot encoding to select the factors for a given user and movie
* Then implement the model.

In [0]:
class OneHotModel(Module):
  def __init__(self, n_users, n_movies, n_factors):
    self.user_factors = nn.Parameter(torch.randn((n_users, n_factors))).cuda()
    self.movie_factors = nn.Parameter(torch.randn((n_movies, n_factors))).cuda()

  def forward(self, batch):
    n_users = len(self.user_factors)
    n_movies = len(self.movie_factors)
    
    users = torch.stack([one_hot(i, n_users).cuda() for i in batch[:, 0]]).float() @ self.user_factors
    movies = torch.stack([one_hot(i, n_movies).cuda() for i in batch[:, 1]]).float() @ self.movie_factors

    return (users * movies).sum(dim=1)

In [151]:
## Let's try it
model = OneHotModel(2000, 2000, 10)
model.forward(x.cuda())

tensor([ 1.1570,  2.4642, -3.0751,  1.0352,  0.6178], device='cuda:0',
       grad_fn=<SumBackward1>)

In [0]:
## Create the learner
model = OneHotModel(len(dls.classes['user']), len(dls.classes['title']), 50)
learn = Learner(dls, model, loss_func=MSELossFlat())

In [153]:
learn.fit_one_cycle(5, 5e-5)

IndexError: ignored
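
One possible contributor to this error, offered as an assumption because the traceback is not shown: calling a copying conversion such as `.cuda()` on an `nn.Parameter` returns a plain tensor, so the module ends up with no registered parameters for the optimizer to work on. The sketch below uses `.double()` as a stand-in for `.cuda()` so it runs without a GPU; the class names are made up.

```python
import torch
from torch import nn

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        self.factors = nn.Parameter(torch.randn(4, 2))  # stays an nn.Parameter

class NotRegistered(nn.Module):
    def __init__(self):
        super().__init__()
        # .double() stands in for .cuda() so this runs without a GPU: like any
        # copying conversion, it returns a plain tensor, not an nn.Parameter.
        self.factors = nn.Parameter(torch.randn(4, 2)).double()

print(len(list(Registered().parameters())))     # 1
print(len(list(NotRegistered().parameters())))  # 0
```

If this is indeed the cause, keeping the attribute as a bare `nn.Parameter` and moving the whole model to the GPU afterwards would avoid it.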

Whatever the exact cause, this model does not train with the fast.ai `Learner` as written.
So, we switch to an `Embedding`, which performs the same lookup for us.

In [0]:
class EmbedModel(Module):
  def __init__(self, n_users, n_movies, n_factors):
    self.user_factors = Embedding(n_users, n_factors)
    self.movie_factors = Embedding(n_movies, n_factors)

  def forward(self, batch):
    ## This is why it's called an embedding: here we simply look up rows by index,
    ## but mathematically it does the same thing as the one_hot matrix product
    users = self.user_factors(batch[:, 0])
    movies = self.movie_factors(batch[:, 1])

    return (users * movies).sum(dim=1)
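
The claim in the comments above can be checked directly. This sketch uses PyTorch's own `nn.Embedding` and `torch.nn.functional.one_hot` rather than the fastai wrappers, with made-up sizes (8 items, 3 factors).

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
emb = nn.Embedding(8, 3)           # 8 items, 3 factors each
idx = torch.tensor([5, 0, 5, 2])   # a small batch of indices

looked_up = emb(idx)                              # direct row lookup
one_hot = F.one_hot(idx, num_classes=8).float()   # shape [4, 8]
via_matmul = one_hot @ emb.weight                 # same rows via a matrix product

print(torch.allclose(looked_up, via_matmul))
```

Both routes return the same rows. The embedding just skips building the mostly-zero one-hot matrix, which saves memory and time.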

In [0]:
model = EmbedModel(len(dls.classes['user']), len(dls.classes['title']), 50)
learn = Learner(dls, model, loss_func=MSELossFlat())

In [158]:
learn.fit_one_cycle(5, 5e-3)

epoch,train_loss,valid_loss,time
0,1.357626,1.274033,00:07
1,1.107598,1.071771,00:07
2,0.913526,0.966176,00:07
3,0.823317,0.884766,00:08
4,0.731536,0.866175,00:08


**Let's try this with a proper y_range**

In [0]:
class EmbedModelWithYR(Module):
  def __init__(self, n_users, n_movies, n_factors):
    self.user_factors = Embedding(n_users, n_factors)
    self.movie_factors = Embedding(n_movies, n_factors)

  def forward(self, batch):
    ## This is why it's called an embedding: here we simply look up rows by index,
    ## but mathematically it does the same thing as the one_hot matrix product
    users = self.user_factors(batch[:, 0])
    movies = self.movie_factors(batch[:, 1])

    return sigmoid_range((users * movies).sum(dim=1), 0.5, 5.1)

In [0]:
model = EmbedModelWithYR(len(dls.classes['user']), len(dls.classes['title']), 50)
learn = Learner(dls, model, loss_func=MSELossFlat())

In [162]:
learn.fit_one_cycle(5, 5e-3)

epoch,train_loss,valid_loss,time
0,0.994837,0.986214,00:08
1,0.87088,0.910422,00:08
2,0.67379,0.862666,00:08
3,0.51233,0.864105,00:08
4,0.40588,0.866904,00:08


**Now, let's add some bias**

In [0]:
class ColabWithBias(Module):
  def __init__(self, n_users, n_movies, n_factors):
    self.user_factors = Embedding(n_users, n_factors)
    self.movie_factors = Embedding(n_movies, n_factors)
    self.user_bias = Embedding(n_users, 1)
    self.movie_bias = Embedding(n_movies, 1)

  def forward(self, batch):
    users = self.user_factors(batch[:, 0])
    movies = self.movie_factors(batch[:, 1])
    user_bias = self.user_bias(batch[:, 0]).reshape((-1))
    movie_bias = self.movie_bias(batch[:, 1]).reshape((-1))

    res = (users * movies).sum(dim=1) + user_bias + movie_bias
    return sigmoid_range(res, 0.5, 5.1)

In [241]:
model = ColabWithBias(n_users, n_movies, 3).cuda()
model.forward(x)

tensor([2.8126, 2.7686, 2.7937, 2.8139, 2.7932], device='cuda:0',
       grad_fn=<AddBackward0>)

In [0]:
learn = Learner(dls, ColabWithBias(n_users, n_movies, 50), loss_func=MSELossFlat())

In [243]:
learn.fit_one_cycle(5, 5e-3)

epoch,train_loss,valid_loss,time
0,0.939735,0.926143,00:08
1,0.802474,0.855994,00:08
2,0.632082,0.845003,00:08
3,0.432442,0.86178,00:08
4,0.347394,0.867099,00:08


**See. Now we can see some overfitting here**

* the training loss keeps going down
* but the validation loss starts rising after it reaches about 0.845

In this case, we can try weight decay, which penalizes large weights and so regularizes the model.

Let's give it a try.

In [0]:
learn = Learner(dls, ColabWithBias(n_users, n_movies, 50), loss_func=MSELossFlat())

In [252]:
learn.fit_one_cycle(10, 5e-3, wd=0.2)

epoch,train_loss,valid_loss,time
0,1.036012,1.020838,00:08
1,0.879198,0.91705,00:08
2,0.865076,0.900135,00:08
3,0.863017,0.878049,00:08
4,0.804648,0.855257,00:08
5,0.764002,0.84155,00:08
6,0.711642,0.833144,00:08
7,0.654105,0.82768,00:08
8,0.604046,0.826167,00:08
9,0.571788,0.826053,00:08


**As you can see, weight decay slows down training, but it reduces the overfitting.**

Here `wd` is a hyperparameter, so we need to tune it.
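
To get a feel for what `wd` does, here is a sketch of one plain SGD step with weight decay in pure Python. The function `sgd_step` and the numbers are made up for illustration, and the optimizer fast.ai actually uses here may apply the decay differently in detail. The gradients are set to zero to isolate the effect of the decay term.

```python
def sgd_step(weights, grads, lr, wd=0.0):
    """One plain SGD step; wd pulls every weight a little towards zero."""
    return [w - lr * (g + wd * w) for w, g in zip(weights, grads)]

weights = [2.0, -1.5, 0.5]
grads = [0.0, 0.0, 0.0]   # zero gradients isolate the effect of weight decay

no_decay = sgd_step(weights, grads, lr=0.1)
with_decay = sgd_step(weights, grads, lr=0.1, wd=0.2)
print(no_decay, with_decay)
```

With `wd=0.2` and `lr=0.1`, each weight shrinks by 2% per step, so large factors need steady support from the data to survive. A larger `wd` regularizes more strongly but can underfit.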