# Recommenders 2: PyTorch and Recommenders

In this practical session, we dive a little deeper into [PyTorch](https://pytorch.org/docs/stable/index.html) and re-implement two classical matrix-factorization models with a neural network toolkit.

In addition to the ratings, we then also make use of the review text.


## WHAT IS PYTORCH?

It’s a Python-based scientific computing package targeted at two sets of audiences:

- A replacement for NumPy that can use the power of GPUs
- A deep learning research platform that provides maximum flexibility and speed

### Tensors: the main unit

Tensors are similar to NumPy’s ndarrays, with the addition that tensors can also be used on a GPU to accelerate computing.

```python
import torch

# allocate a 5x3 matrix without initializing it: it holds whatever was in memory
x = torch.empty(5, 3)
print(x)
```

```
out[]:

tensor([[8.3665e+22, 4.5580e-41, 1.6025e-03],
        [3.0763e-41, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 3.4438e-41],
        [0.0000e+00, 4.8901e-36, 2.8026e-45],
        [6.6121e+31, 0.0000e+00, 9.1084e-44]])
        
```

### Most useful functions:


```python
# size() returns the shape of a tensor, here torch.Size([5, 3])
x = torch.empty(5, 3)
print(x.size())
```
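
Besides `size()`, a few other tensor operations come up repeatedly in this session. The short sketch below is only an illustration: the tensors `a` and `b` are made up for the example.

```python
import torch

a = torch.ones(2, 3)                 # 2x3 matrix filled with ones
b = torch.arange(6.0).view(2, 3)     # values 0..5 reshaped into a 2x3 matrix

row_sums = (a * b).sum(1)            # element-wise product, then sum over dimension 1
col = b.unsqueeze(1)                 # insert a new dimension of size 1 at position 1
back = col.squeeze(1)                # remove that dimension again

print(row_sums)      # tensor([ 3., 12.])
print(col.size())    # torch.Size([2, 1, 3])
print(back.size())   # torch.Size([2, 3])
```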

### Full tutorial: 

A full PyTorch tutorial can be found [here](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html); do not hesitate to take a couple of minutes to skim it. Plenty of [resources](https://pytorch.org/resources) are available online. You can also have a look at the [extensive PyTorch documentation](https://pytorch.org/docs/stable/index.html).

Here, as we are defining neural networks, we mainly use the `torch.nn` module, which contains most classical deep learning building blocks.

### What's interesting:

PyTorch has automatic differentiation: you only have to compute a loss to obtain its gradients automatically. How it works is detailed [here](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors-and-autograd).
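
As a minimal illustration of automatic differentiation, the sketch below builds a toy squared-error loss on two hand-picked vectors (`w` and `x` are hypothetical values chosen so the gradient is easy to check by hand) and lets autograd compute the gradient.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # parameter we differentiate with respect to
x = torch.tensor([3.0, 4.0])                      # fixed input

# loss = (w.x - 10)^2 = (3 + 8 - 10)^2 = 1
loss = ((w * x).sum() - 10.0) ** 2

loss.backward()   # fills w.grad with d(loss)/dw
print(w.grad)     # 2 * (w.x - 10) * x = tensor([6., 8.])
```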


# 1)  Load & Prepare Data

As usual, the first thing to do is to load the data. Here we use an Amazon review corpus.

In [None]:
import gzip , json

#Here data is in json format: one "dict" per line.
def jsons2tuple(s,*keys):
    js = json.loads(s)
    return tuple([js[k] for k in keys])

#we directly read the gzip file
with gzip.open("dataset/reviews_Amazon_Instant_Video_5.json.gz","r") as f:
    data = [jsons2tuple(x,"reviewerID","asin","reviewText","overall") for x in f]

#what one example looks like
print("One tuple: user,item,review,rating")
print(data[:1])

train = []
test = []

#we keep 1 example out of 8 (12.5%) for valid/test, the remaining 87.5% for train
for i,x in enumerate(data):
    if i % 8 ==0:
        test.append(x)
    else:
        train.append(x)

print(len(train))
print(len(test))

##  Prepare Data
We loaded raw data, now we prepare it:

- (1) users and items are remapped to ids in 0..len(users)-1 / 0..len(items)-1
- (2) reviews are tokenized using a simple split

In [None]:
from collections import Counter

i_dic = {}
u_dic = {}
word_count = Counter()

prep_train = []

def text_preprocess(t):
    """
    a function to preprocess the text if needed
    takes str, returns list str
    """
    return t.split(" ")


# User and Items to key + split text + Count common words (to prune)

for uid,iid,text,rating in train:
    uk = u_dic.setdefault(uid,len(u_dic))
    ik = i_dic.setdefault(iid,len(i_dic))
    ptext = text_preprocess(text)
    word_count.update(ptext)    
    prep_train.append((uk,ik,ptext,rating))

    
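# Unknown users/items (absent from train) have no embedding: we skip them,
# since a None id cannot be stored in a LongTensor by the collate function
    
prep_test = []

for uid,iid,text,rating in test:
    uk = u_dic.get(uid,None)
    ik = i_dic.get(iid,None)
    if uk is None or ik is None:
        continue
    ptext = text_preprocess(text)
    prep_test.append((uk,ik,ptext,rating))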
    

# we further divide "test" in validation and test set
cutout = len(prep_test)//2
prep_val = prep_test[:cutout]
prep_test = prep_test[cutout:]
    
    

# PyTorch Models

Now that we have loaded and prepared the data, we can define the models.


## 1) Classic SVD (with mean)

First we propose to implement a simple SVD:
### $$ \min\limits_{U,I}\sum\limits_{(u,i)} \underbrace{(r_{ui} -  (I_i^TU_u + \mu))^2}_\text{minimization} + \underbrace{\lambda(||U_u||^2+||I_i||^2 + \mu^2) }_\text{regularization} $$

where prediction is done in the following way:
### $$r_{ui} = \mu + U_u.I_i $$

where $\mu$ is the global mean,  $U_u$ a user embedding and $I_i$ an item embedding

### STEPS:
To implement such a model in PyTorch, we need several things:
 
 - (1) model definition
 - (2) loss function
 - (3) evaluation
 - (4) training/eval loop


#### (1) Model definition

A model class typically extends `nn.Module`, the Neural network module. It is a convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.

One should define two functions: `__init__` and `forward`.

- `__init__` is used to initialize the model parameters
- `forward` is the net transformation from input to output. In fact, when you call a module instance, e.g. `model(input)`, this is the method that runs.
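
To make the two methods concrete, here is a minimal sketch of a module. `Scale` is a hypothetical toy class invented for this illustration (it is not one of the models of this session): it holds a single learnable scalar and multiplies its input by it.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Hypothetical toy module: multiplies its input by a learnable scalar."""

    def __init__(self):
        super().__init__()
        # nn.Parameter registers the tensor as a trainable weight of the module
        self.alpha = nn.Parameter(torch.tensor(2.0))

    def forward(self, x):
        return self.alpha * x

toy = Scale()
y = toy(torch.tensor([1.0, 2.0, 3.0]))   # calling the instance runs forward()
names = [name for name, _ in toy.named_parameters()]

print(y)       # values 2., 4., 6. (with a grad_fn attached)
print(names)   # ['alpha']
```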

##### (a) Initialization

Our model has different weights:

- the user profiles (also called user embeddings) $U$
- the item profiles (also called item embeddings) $I$
- the mean bias $\mu$


##### (b) input to output operation
Technically, the prediction as defined earlier can be seen as just a dot product between two embeddings $U_u$ and $I_i$ plus the mean rating:

- `torch.sum(embed_u*embed_i,1) + self.mean` is equivalent to $r_{ui} = \mu + U_u.I_i $ 
- the `.squeeze(1)` operation is a shape operation that removes dimension 1 when it has size 1 (indexing starts at 0), i.e. it reshapes the tensor from `(batch_size,1,latent_size)` to `(batch_size,latent_size)`
- for reference, the inverse operation is `.unsqueeze()`
- we also return the weights so that the loss function can regularize them
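
The shapes involved can be checked on a toy batch. The sketch below assumes ids arrive with shape `(batch_size,1)`; the sizes (10 users, 20 items, 3 latent factors) are arbitrary values picked for the illustration.

```python
import torch
import torch.nn as nn

latent_size = 3
users = nn.Embedding(10, latent_size)   # 10 hypothetical users
items = nn.Embedding(20, latent_size)   # 20 hypothetical items

user_ids = torch.LongTensor([[0], [1], [2], [3]])   # shape (batch_size, 1)
item_ids = torch.LongTensor([[5], [6], [7], [8]])

raw_u = users(user_ids)                 # shape (4, 1, 3)
embed_u = raw_u.squeeze(1)              # shape (4, 3)
embed_i = items(item_ids).squeeze(1)    # shape (4, 3)

# row-wise dot product: one score per (user, item) pair
dot = torch.sum(embed_u * embed_i, 1)   # shape (4,)

print(raw_u.shape, embed_u.shape, dot.shape)
print(embed_u.unsqueeze(1).shape)       # back to (4, 1, 3)
```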


### (TODO) Just to make sure you were following: complete the following `forward` method

In [None]:
import torch
import torch.nn as nn


# The model is defined as a class inheriting from nn.Module
class ClassicMF(nn.Module):
    
    #(a) Init
    def __init__(self,nb_users,nb_items,latent_size):
        super(ClassicMF, self).__init__()
        
        #Embedding layers
        self.users = nn.Embedding(nb_users, latent_size)
        self.items = nn.Embedding(nb_items, latent_size)

        #The mean bias
        self.mean = nn.Parameter(torch.FloatTensor(1,).fill_(3))
        
        #initialize weights with very small values
        nn.init.normal_(self.users.weight,0,0.01)
        nn.init.normal_(self.items.weight,0,0.01)

    
    # (b) How we compute the prediction (from input to output)
    def forward(self, user, item): ## method called when doing model(user,item) on a ClassicMF instance
        
        embed_u,embed_i = self.users(user).squeeze(1),self.items(item).squeeze(1)
        out =   ### TO COMPLETE

        return out, embed_u, embed_i, self.mean  # We return prediction + weights to regularize them
    
    


#### (2-4) full train loop

The train loop is organized around the [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class, which combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset.

We just redefine a collate function:

> collate_fn (callable, optional) – merges a list of samples to form a mini-batch.


**NOTE:** The dataset argument can be a list instead of a "Dataset" instance (works by duck typing)
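
As a small sketch of this mechanism, the example below feeds a plain Python list to a `DataLoader` with a custom collate function. `toy_data` and `toy_collate` are hypothetical names used only for this illustration.

```python
import torch
from torch.utils.data import DataLoader

# a plain list of (user, item, rating) tuples is enough to act as a dataset
toy_data = [(0, 1, 4.0), (1, 0, 2.0), (2, 1, 5.0)]

def toy_collate(samples):
    # samples is a list of tuples; we turn it into one tensor per field
    users, items, ratings = zip(*samples)
    return torch.LongTensor(users), torch.LongTensor(items), torch.FloatTensor(ratings)

loader = DataLoader(toy_data, batch_size=2, shuffle=False, collate_fn=toy_collate)
batches = list(loader)

for users_t, items_t, ratings_t in batches:
    print(users_t, items_t, ratings_t)
```

With three examples and `batch_size=2`, the loader yields one batch of two examples followed by one batch of a single example.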
    

##### The train loop sequence is the following:
    
Dataset ==DataLoader==> Batch (not prepared) ==collate_fn==> Batch (prepared) ==Model.forward==> Prediction ==loss_fn==> loss <-> truth

1] PREDICT
- (a) The dataloader samples training examples from the dataset (which is a list)
- (b) The collate_fn prepares the minibatch of training examples
- (c) The prediction is made by feeding the minibatch to the model
- (d) The loss is computed on the prediction via a loss function

2] OPTIMIZE
- (e) Gradients are computed by automatic backward propagation
- (f) Parameters are updated using the computed gradients

In [None]:
from torch.utils.data import DataLoader
import torch.nn.functional as F


# HyperParameters
n_epochs = 20
batch_size = 16
num_feat = 25
lr = 0.01
reg = 0.001


#(b) Collate function => Creates tensor batches to feed model during training
def tuple_batch(l):
    '''
    input l: list of (user,item,review, rating tuples)
    output: formatted batches (in torch tensors)

    takes n-tuples and create batch
    text -> seq word #id
    '''
    users, items, _,ratings = zip(*l) #we ignore review text
    users_t = torch.LongTensor(users)
    items_t = torch.LongTensor(items)
    ratings_t = torch.FloatTensor(ratings)
    
    return users_t, items_t, ratings_t
    


#(d) Loss function => Combines MSE and L2
def loss_func(pred,ratings_t,reg,*params):
    '''
    mse loss combined with l2 regularization.
    params assumed 2-dimension
    '''
    mse = F.mse_loss(pred,ratings_t,reduction='sum') # summed squared error (size_average is deprecated)
    l2 = 0
    for p in params:
        l2 += torch.mean(p.norm(2,-1))
        
    return (mse/pred.size(0)) + reg*l2 , mse
    
#
# Training script starts here
#    


model = ClassicMF(len(u_dic),len(i_dic),num_feat)

# (a) dataloader will sample data from datasets using collate_fn tuple_batch
dataloader_train = DataLoader(prep_train, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch)
dataloader_val = DataLoader(prep_val, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch)
dataloader_test = DataLoader(prep_test, batch_size=batch_size, shuffle=False, num_workers=0, collate_fn=tuple_batch)

optimizer = torch.optim.SGD(model.parameters(),lr=lr)

# Train loop
for e in range(n_epochs):
    mean_loss = [0,0,0] #train/val/test

    ## Training loss (the one we train with)
    
    for users_t,items_t,ratings_t in dataloader_train:
        model.train() # set the model on train mode
        model.zero_grad() # reset gradients
        
        #(c) predictions are made by the model
        pred,*params = model(users_t,items_t)
        
        #(d) loss computed on predictions, we added regularization
        loss,mse_loss = loss_func(pred,ratings_t,reg,*params)
        
        loss.backward() #(e) backpropagating to get gradients
        
        mean_loss[0] += mse_loss
        optimizer.step() #(f) updating parameters
    
    ## Validation loss (no training)

    for users_t,items_t,ratings_t in dataloader_val:
        model.eval() # Inference mode
        pred,*params = model(users_t,items_t)
        _,mse_loss = loss_func(pred,ratings_t,reg,*params)
    
        mean_loss[1] += mse_loss    
        
    ## Test loss (no training)
        
    for users_t,items_t,ratings_t in dataloader_test:
        model.eval()
        pred,*params = model(users_t,items_t)
        _,mse_loss = loss_func(pred,ratings_t,reg,*params)
    
        mean_loss[2] += mse_loss    

    print("-"*25)
    print("epoch",e, "mse (train/val/test)", round((mean_loss[0]/len(prep_train)).item(),3),"/",  round((mean_loss[1]/len(prep_val)).item(),3),"/",  round((mean_loss[2]/len(prep_test)).item(),3))
    
    



## (Your turn) Koren 2009 model:

Here, this model simply adds a bias for each user and for each item

### $$ \min\limits_{U,I}\sum\limits_{(u,i)} \underbrace{(r_{ui} -  (I_i^TU_u + \mu+ \mu_i+\mu_u))^2}_\text{minimization} + \underbrace{\lambda(||U_u||^2+||I_i||^2 + \mu^2 + \mu_i^2+\mu_u^2) }_\text{regularization} $$


### $$r_{ui} = \mu + \mu_i + \mu_u + U_u.I_i $$

### TODO:

- (a) complete the model initialization
- (b) complete the forward method

In [None]:
class KorenMF(nn.Module):

    def __init__(self,nb_users,nb_items,latent_size):
        super(KorenMF, self).__init__()
        
        self.users = ##
        self.items = ###
        self.umean = ###
        self.imean = ###
        self.gmean =  ###

        nn.init.normal_(self.users.weight,0,0.01)
        nn.init.normal_(self.items.weight,0,0.01)
        nn.init.normal_(self.umean.weight,0.5,1)
        nn.init.normal_(self.imean.weight,0.5,1)
        
        
    def forward(self, user,item):
        embed_u,embed_i = self.users(user).squeeze(1) , self.items(item).squeeze(1)
        umean, imean = self.umean(user).squeeze(-1) , self.imean(item).squeeze(-1)
        
        out = ##############

        return out , embed_u, embed_i, umean , imean , self.gmean

### (TODO) Here, train loop stays the same, you only have to change the model

In [None]:
from torch.utils.data import DataLoader
import torch.nn.functional as F

n_epochs = 50
batch_size = 16
num_feat = 25
lr = 0.01
reg = 0.001



def tuple_batch(l):
    '''
    input l: list of (user,item,review, rating tuples)
    output: formatted batches (in torch tensors)

    takes n-tuples and create batch
    text -> seq word #id
    '''
    users, items, _ ,ratings = zip(*l) # we ignore reviews for now
    users_t = torch.LongTensor(users)
    items_t = torch.LongTensor(items)
    ratings_t = torch.FloatTensor(ratings)
    
    return users_t,items_t,ratings_t


def loss_func(pred,ratings_t,reg,*params):
    '''
    mse loss combined with l2 regularization.
    params assumed 2-dimension
    '''
    mse = F.mse_loss(pred,ratings_t,reduction='sum') # summed squared error (size_average is deprecated)
    l2 = 0
    for p in params:
        l2 += torch.mean(p.norm(2,-1))
        
    return (mse/pred.size(0)) + reg*l2 , mse
    

model =  ## TO COMPLETE


dataloader_train = DataLoader(prep_train, batch_size=batch_size, shuffle=True, num_workers=3, collate_fn=tuple_batch)
dataloader_val = DataLoader(prep_val, batch_size=batch_size, shuffle=True, num_workers=3, collate_fn=tuple_batch)
dataloader_test = DataLoader(prep_test, batch_size=batch_size, shuffle=False, num_workers=3, collate_fn=tuple_batch)

optimizer = torch.optim.SGD(model.parameters(),lr=lr)


for e in range(n_epochs):
    mean_loss = [0,0,0] #train/val/test

    for users_t,items_t,ratings_t in dataloader_train:
        model.train()
        model.zero_grad()
        pred,*params = model(users_t,items_t)

        loss,mse_loss = loss_func(pred,ratings_t,reg,*params)
        loss.backward()
        
        mean_loss[0] += mse_loss
        optimizer.step()
    
    

    for users_t,items_t,ratings_t in dataloader_val:
        model.eval()
        pred,*params = model(users_t,items_t)
        _,mse_loss = loss_func(pred,ratings_t,reg,*params)
    
        mean_loss[1] += mse_loss    
        
    for users_t,items_t,ratings_t in dataloader_test:
        model.eval()
        pred,*params = model(users_t,items_t)
        _,mse_loss = loss_func(pred,ratings_t,reg,*params)
    
        mean_loss[2] += mse_loss    

    print("-"*25)
    print("epoch",e, "mse (train/val/test)", round((mean_loss[0]/len(prep_train)).item(),3),"/",  round((mean_loss[1]/len(prep_val)).item(),3),"/",  round((mean_loss[2]/len(prep_test)).item(),3))
    
    

# 2) Taking text into account


## A) Let's first predict the rating from review text

To do so we need to:

- (1) Change the collate function to take text into account
- (2) Add word embedding in the model


#### (1) Complete the new collate function

In [1]:
#Changing the collate function
from random import shuffle

max_words = 10000
word_dic = {k:i for i,(k,v) in enumerate(word_count.most_common(max_words),2)} # word -> id (pad = 0 ,unk=1)

def tuple_batch_text(l):
    '''
    input l: list of (user,item,review, rating tuples)
    output: formatted batches (in torch tensors)

    takes n-tuples and create batch
    text -> seq word #id
    '''
    users, items, reviews,ratings = zip(*l)
    users_t = ########
    items_t = ########
    ratings_t = #######
    
    
    max_len = ######## max review length

    reviews_t = torch.LongTensor(#######).fill_(0)  # what is the dimension of input tensor ?
    
    for i,rev in enumerate(reviews):
        rev_words = [word_dic.get(w,1) for w in rev]
        rev_t = torch.LongTensor(rev_words)
        reviews_t[i,:len(rev_words)] = rev_t
    
    return users_t,items_t,reviews_t,ratings_t



#### (2) Rating prediction from text model
You can use [EmbeddingBag](https://pytorch.org/docs/stable/nn.html#torch.nn.EmbeddingBag) to directly combine word embeddings.
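
As a quick look at what `EmbeddingBag` computes, the sketch below pools a toy batch of word ids and compares the result with a manual average. The sizes (6 word ids, embeddings of size 4) and the `reviews` tensor are made up for the illustration; it is not the model to write.

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(6, 4, mode="mean")   # 6 hypothetical word ids, embeddings of size 4

# 2 reviews of 3 word ids each: with a 2-D input, each row is one "bag"
reviews = torch.LongTensor([[2, 3, 0],
                            [4, 5, 2]])

pooled = bag(reviews)                      # shape (2, 4): one vector per review

# same result by hand: look up every word, then average over the word axis
# (note that the pad id 0 is averaged like any other word here)
manual = bag.weight[reviews].mean(1)

print(pooled.shape)
print(torch.allclose(pooled, manual))
```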

In [None]:
import torch
import torch.nn as nn
class RatingPred(nn.Module):
    
    def __init__(self,dic_size,word_latent_size):
        super(RatingPred, self).__init__()
        self.text_emb = #####################
        self.to_rating = ####################

        
    def forward(self, text):

        text_emb = #######
        pred_rating = ########

        return pred_rating ##### Should be a 1-dim tensor of all predicted ratings
    

In [None]:

from torch.utils.data import DataLoader
import torch.nn.functional as F

n_epochs = 50
batch_size = 16
num_feat = 25
size_embedding = 50
lr = 0.001
reg = 0.01






def loss_func(pred,ratings_t):
    '''
    mse loss.
    '''
    mse = F.mse_loss(pred,ratings_t)
    return mse
    


model = RatingPred(max_words+2,size_embedding) # +2: word ids start at 2 (pad = 0, unk = 1)


dataloader_train = DataLoader(prep_train, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch_text)
dataloader_val = DataLoader(prep_val, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch_text)
dataloader_test = DataLoader(prep_test, batch_size=batch_size, shuffle=False, num_workers=0, collate_fn=tuple_batch_text)

optimizer = torch.optim.Adam(model.parameters(),lr=lr)


for e in range(n_epochs):
    mean_loss = [0,0,0] #train/val/test
    length = [len(dataloader_train),len(dataloader_val),len(dataloader_test)]

    for _,_,reviews_t,ratings_t in dataloader_train:
        model.train()
        model.zero_grad()
        pred_rating = model(reviews_t)
        
        mse_loss = loss_func(pred_rating,ratings_t)
        
        
        mse_loss.backward()
        
        mean_loss[0] += mse_loss
        optimizer.step()
    

    for _,_,reviews_t,ratings_t in dataloader_val:
        model.eval()
        pred_rating = model(reviews_t)
        mse_loss = loss_func(pred_rating,ratings_t)
    
        mean_loss[1] += mse_loss    
        
    for _,_,reviews_t,ratings_t in dataloader_test:
        model.eval()
        pred_rating = model(reviews_t)
        mse_loss = loss_func(pred_rating,ratings_t)
    
        mean_loss[2] += mse_loss    
    
    
    print("-"*25)
    print("epoch",e, "mse (train/val/test)", round((mean_loss[0]/length[0]).item(),3),"/",  round((mean_loss[1]/length[1]).item(),3),"/",  round((mean_loss[2]/length[2]).item(),3))
    

## B) Let's now predict the rating from review text + Profile

To do so we need to:

- (1) Add profiles embedding in the model
- (2) Change forward function
- (3) Add profiles to training loop


In [2]:
import torch
import torch.nn as nn
class RatingPredProfile(nn.Module):
    
    def __init__(self,nb_users,nb_items,dic_size,latent_size):
        super(RatingPredProfile, self).__init__()
        self.text_emb = ###############
        self.to_rating = #############
        self.users = ##############
        self.items = ################

         #init
        nn.init.normal_(self.users.weight,0,0.1)
        nn.init.normal_(self.items.weight,0,0.1)

        
    def forward(self, user,item,text):

        text_emb = ############# get text embeddings
        embed_u,embed_i = ############ get user and items embeddings
        
        concatenation = ######### concatenate them
        
        pred_rating = ######## predict rating

        return pred_rating # 1-dim tensor


### We should now call our new model in the train loop

In [3]:

from torch.utils.data import DataLoader
import torch.nn.functional as F

n_epochs = 50
batch_size = 16
num_feat = 25
size_embedding = 50
lr = 0.001
reg = 0.01




def loss_func(pred,ratings_t):
    '''
    mse loss.
    '''
    mse = F.mse_loss(pred,ratings_t,reduction='sum') # summed squared error (size_average is deprecated)
    return mse
    


model = RatingPredProfile(len(u_dic),len(i_dic),max_words+2,size_embedding) # +2: word ids start at 2 (pad = 0, unk = 1)


dataloader_train = DataLoader(prep_train, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch_text)
dataloader_val = DataLoader(prep_val, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch_text)
dataloader_test = DataLoader(prep_test, batch_size=batch_size, shuffle=False, num_workers=0, collate_fn=tuple_batch_text)

optimizer = torch.optim.Adam(model.parameters(),lr=lr)


for e in range(n_epochs):
    mean_loss = [0,0,0] #train/val/test

    for _,_,reviews_t,ratings_t in dataloader_train:
        model.train()
        model.zero_grad()
        pred_rating = #######
        
        mse_loss = loss_func(pred_rating,ratings_t)
        
        
        mse_loss.backward()
        
        mean_loss[0] += mse_loss
        optimizer.step()
    

    for _,_,reviews_t,ratings_t in dataloader_val:
        model.eval()
        pred_rating = ########
        mse_loss = loss_func(pred_rating,ratings_t)
    
        mean_loss[1] += mse_loss    
        
    for _,_,reviews_t,ratings_t in dataloader_test:
        model.eval()
        pred_rating = #######
        mse_loss = loss_func(pred_rating,ratings_t)
    
        mean_loss[2] += mse_loss    
    
    
    print("-"*25)
    print("epoch",e, "mse (train/val/test)", round((mean_loss[0]/len(prep_train)).item(),3),"/",  round((mean_loss[1]/len(prep_val)).item(),3),"/",  round((mean_loss[2]/len(prep_test)).item(),3))
    


#### In reality, the review text cannot be used as input, since it is written after the item has been consumed:

## => Let's predict text instead, using the item embedding:


# 3) Predicting Text

#### (2) Adding word embeddings to model



We propose to use the negative sampling loss : 

$$ \big( \log\sigma(real) + \sum\limits_{i=1}^k\mathbb{E}_{v_{b}\sim P_n(\textbf{w})}\log\sigma(-fake) \big)$$

With negative sampling, we "predict" text in a sense akin to $k$-NN

Simply, we have two cosine similarities: $real$ and $fake$

- $real$ is the similarity between the **actual** review words and the predicted embedding 
- $fake$ is the similarity between **fake** (randomly sampled) words and the predicted embedding


The goal is to bring the prediction closer to the real words than to the fake words, i.e. to make $real$ large and $fake$ small
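
To see how the objective behaves, here is a toy computation on random vectors. Everything in it is an assumption made for the illustration (the shapes, the names `pred`, `real_words` and `fake_words`, and one fake word per real word); it is a sketch of the formula, not the model to implement.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# unit-norm vectors, so that a dot product is a cosine similarity
pred = F.normalize(torch.randn(2, 8), p=2, dim=-1)            # (batch, dim)
real_words = F.normalize(torch.randn(2, 5, 8), p=2, dim=-1)   # (batch, words, dim)
fake_words = F.normalize(torch.randn(2, 5, 8), p=2, dim=-1)   # randomly drawn "negatives"

# similarity of every word with the prediction of its own example
real = torch.sum(real_words * pred.unsqueeze(1), -1)          # (batch, words)
fake = torch.sum(fake_words * pred.unsqueeze(1), -1)          # (batch, words)

# the objective is maximized, so the loss to minimize is its negative
objective = F.logsigmoid(real) + F.logsigmoid(-fake)
loss = -objective.mean()

print(real.shape, loss.item())
```

Since $\log\sigma(\cdot)$ is always negative, the loss is positive; it decreases when $real$ grows and $fake$ shrinks.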

### Things to do:

- (a) Tuple batch should be modified to only consider a fixed number of words
- (b) Model should compute the negative sampling loss


In [4]:
#Changing the collate function
from random import shuffle

max_words = 11000
word_dic = {k:i for i,(k,v) in enumerate(word_count.most_common(max_words)[1000:],2)} # word -> id (pad = 0 ,unk=1)

def tuple_batch_piece_text(l):
    '''
    input l: list of (user,item,review, rating tuples)
    output: formatted batches (in torch tensors)

    takes n-tuples and create batch
    text -> seq word #id
    '''
    users, items, reviews,ratings = zip(*l)
    users_t = torch.LongTensor(users)
    items_t = torch.LongTensor(items)
    ratings_t = torch.FloatTensor(ratings)
    
    
    max_len = 5 ## We only consider a subset of words

    reviews_t = ######################
    
    for i,rev in enumerate(reviews):
        rev_words = [word_dic.get(w,1) for w in rev]
        shuffle(rev_words)
        rev_words = rev_words[:max_len]
        rev_t = torch.LongTensor(rev_words)
        reviews_t[i,:len(rev_words)] = rev_t
    
    return users_t,items_t,reviews_t,ratings_t



In [5]:
from torch.nn.functional import normalize

class TextItem(nn.Module):
    
    def __init__(self,nb_users,nb_items,dic_size,latent_size):
        super(TextItem, self).__init__()

        self.items =#############
        self.text_emb = ########
        self.dic_size = ###########
       
        
        #init
        nn.init.normal_(self.items.weight,0,0.1)
        nn.init.normal_(self.text_emb.weight,0,0.1)

        
    def forward(self, item, text):
        
        embed_i = self.items(item).squeeze(1)
        
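        # normalize along the embedding dimension (the default dim=1 would be the word axis of a 3-D tensor)
        real_text = normalize(self.text_emb(text),p=2,dim=-1)
        fake_text = normalize(self.text_emb(text.clone().random_(2,self.dic_size)),p=2,dim=-1)

        norm_i = normalize(embed_i,p=2,dim=-1)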
        
        
        dot_real = ################# cosine similarity btw real and embed_i
        dot_fake = ################# cosine similarity btw fake and embed_i
        
        close = ####### left part of Negative Sampling

        far = ####### right part of Negative Sampling
        

        return embed_i, close, far
    
    def get_text(self,user,item):
        """
        here we embed
        """
        embed_i_text = self.items(item).squeeze(1)                
        return embed_i_text


#### Training Loop

What changes: 

- (a) Model
- (b) Loss function

In [6]:
n_epochs = 50
batch_size = 16
num_feat = 25
size_embedding = 50
lr = 0.01
reg = 0.01




def loss_func(close,far):
   
    error = ######
    
    return torch.mean(error)
    


model = ####################


dataloader_train = DataLoader(prep_train, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch_piece_text)
dataloader_val = DataLoader(prep_val, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch_piece_text)
dataloader_test = DataLoader(prep_test, batch_size=batch_size, shuffle=False, num_workers=0, collate_fn=tuple_batch_piece_text)

optimizer = torch.optim.Adam(model.parameters(),lr=lr)


for e in range(n_epochs):
    mean_loss = [0,0,0] #train/val/test

    for users_t,items_t,reviews_t,ratings_t in dataloader_train:
        model.train()
        model.zero_grad()
        pred_txt,close,far = model(items_t,reviews_t)

        loss = loss_func(close,far)
        loss.backward()
        
        mean_loss[0] += loss
        optimizer.step()
    
    

    for users_t,items_t,reviews_t,ratings_t in dataloader_val:
        model.eval()
        pred_txt,close,far = model(items_t,reviews_t)
        loss = loss_func(close,far)
    
        mean_loss[1] += loss    
        
    for users_t,items_t,reviews_t,ratings_t in dataloader_test:
        model.eval()
        pred_txt,close,far = model(items_t,reviews_t)
        loss = loss_func(close,far)
    
        mean_loss[2] += loss    

    print("-"*25)
    print("epoch",e, "loss (train/val/test)", round((mean_loss[0]/len(dataloader_train)).item(),3),"/",  round((mean_loss[1]/len(dataloader_val)).item(),3),"/",  round((mean_loss[2]/len(dataloader_test)).item(),3))
        


In [None]:
inv_word_dic = {i:k for k,i in word_dic.items()}
inv_word_dic[0] = "pad"
inv_word_dic[1] = "unk"

        
def get_most_similar(text_emb,embeddings,dictionnary,top_k=10):
    """
    Returns the labels of the top_k embeddings closest to text_emb
    
    text_emb is a tensor (N,)
    embeddings is a matrix (M,N)
    dictionnary maps an embedding index to its label (e.g. inv_word_dic)
    """
    
    # dot product between the query and every row of the embedding matrix
    affinity = torch.sum(text_emb * embeddings,-1) 
    x,ind = torch.sort(affinity)
    ind = ind.tolist()
    top_k = ind[::-1][:top_k]

    return [dictionnary[x] for x in top_k]


text_embs = #######
embeddings = ########

print(get_most_similar(text_embs,embeddings,inv_word_dic))

## FINAL Model: Wrapping MF + text prediction: 

We reuse the simple SVD model: 

### $$ \min\limits_{U,I}\sum\limits_{(u,i)} \underbrace{(r_{ui} -  (I_i^TU_u + \mu))^2}_\text{minimization} + \underbrace{\lambda(||U_u||^2+||I_i||^2 + \mu^2) }_\text{regularization} $$

and link it with our previous text prediction method.

We propose to link the two via a linear layer applied to the item embeddings.

In [7]:
class TextClassicMF(nn.Module):
    
    def __init__(self,nb_users,nb_items,dic_size,latent_size,word_latent_size):
        super(TextClassicMF, self).__init__()
        self.users = #########
        self.items = ###########
        self.text_emb =##############
        self.to_text = nn.Linear(latent_size,word_latent_size)
        self.mean = ###########
        self.dic_size = dic_size
        
        #init
        nn.init.normal_(self.users.weight,0,0.1)
        nn.init.normal_(self.items.weight,0,0.1)

        
    def forward(self, user, item, text):
        ####
      

        return 
    
    def get_text(self,user,item):
         
        embed_u,embed_i = self.users(user).squeeze(1),self.items(item).squeeze(1)
        out = torch.sum(embed_u*embed_i,1) + self.mean
        embed_i_text = self.to_text(embed_i)
                
        return embed_i_text
        


In [None]:

n_epochs = 50
batch_size = 16
num_feat = 25
size_embedding = 50
lr = 0.1
reg = 0.01






def loss_func(pred,ratings_t,reg,*params):
    '''
    mse loss combined with l2 regularization.
    params assumed 2-dimension
    '''
    mse = F.mse_loss(pred,ratings_t,reduction='sum') # summed squared error (size_average is deprecated)
    l2 = 0
    for p in params:
        l2 += torch.mean(p.norm(2,-1))
        
    return (mse/pred.size(0)) + reg*l2 , mse
    


model = TextClassicMF(len(u_dic),len(i_dic),max_words,num_feat,size_embedding)


dataloader_train = DataLoader(prep_train, batch_size=batch_size, shuffle=True, num_workers=3, collate_fn=tuple_batch_text)
dataloader_val = DataLoader(prep_val, batch_size=batch_size, shuffle=True, num_workers=3, collate_fn=tuple_batch_text)
dataloader_test = DataLoader(prep_test, batch_size=batch_size, shuffle=False, num_workers=3, collate_fn=tuple_batch_text)

optimizer = torch.optim.SGD(model.parameters(),lr=lr)


for e in range(n_epochs):
    mean_loss = [0,0,0] #train/val/test

    for users_t,items_t,reviews_t,ratings_t in dataloader_train:
        model.train()
        model.zero_grad()
        pred,*params = model(users_t,items_t,reviews_t)

        loss,mse_loss = loss_func(pred,ratings_t,reg,*params)
        loss.backward()
        
        mean_loss[0] += mse_loss
        optimizer.step()
    
    

    for users_t,items_t,reviews_t,ratings_t in dataloader_val:
        model.eval()
        pred,*params = model(users_t,items_t,reviews_t)
        _,mse_loss = loss_func(pred,ratings_t,reg,*params)
    
        mean_loss[1] += mse_loss    
        
    for users_t,items_t,reviews_t,ratings_t in dataloader_test:
        model.eval()
        pred,*params = model(users_t,items_t,reviews_t)
        _,mse_loss = loss_func(pred,ratings_t,reg,*params)
    
        mean_loss[2] += mse_loss    

    print("-"*25)
    print("epoch",e, "mse (train/val/test)", round((mean_loss[0]/len(prep_train)).item(),3),"/",  round((mean_loss[1]/len(prep_val)).item(),3),"/",  round((mean_loss[2]/len(prep_test)).item(),3))
    

In [None]:



text_embs = #######
embeddings = ########

print(get_most_similar(text_embs,embeddings,inv_word_dic))