# Recommender Systems 2020/2021

## Practice Session 11 - MF with PyTorch

## Outline

* Dataset loading
* Main ideas for MF
* Model Creation
* Dataset iterator and DataLoader
* Training

In [1]:
from Data_manager.split_functions.split_train_validation_random_holdout import split_train_in_two_percentage_global_sample
from Data_manager.Movielens.Movielens10MReader import Movielens10MReader

data_reader = Movielens10MReader()
datasets_dict = data_reader.load_data()

Movielens10M: Verifying data consistency...
Movielens10M: Verifying data consistency... Passed!
DataReader: current dataset is: <class 'Data_manager.Dataset.Dataset'>
	Number of items: 10681
	Number of users: 69878
	Number of interactions in URM_all: 10000054
	Value range in URM_all: 0.50-5.00
	Interaction density: 1.34E-02
	Interactions per user:
		 Min: 2.00E+01
		 Avg: 1.43E+02
		 Max: 7.36E+03
	Interactions per item:
		 Min: 0.00E+00
		 Avg: 9.36E+02
		 Max: 3.49E+04
	Gini Index: 0.57

	ICM name: ICM_genres, Value range: 1.00 / 1.00, Num features: 20, feature occurrences: 21564, density 1.01E-01
	ICM name: ICM_tags, Value range: 1.00 / 69.00, Num features: 10217, feature occurrences: 108563, density 9.95E-04
	ICM name: ICM_all, Value range: 1.00 / 69.00, Num features: 10237, feature occurrences: 130127, density 1.19E-03




In [2]:
URM_all = datasets_dict.AVAILABLE_URM["URM_all"]
print(URM_all)

URM_train, URM_test = split_train_in_two_percentage_global_sample(URM_all, train_percentage = 0.8)

  (0, 0)	5.0
  (0, 1)	5.0
  (0, 2)	5.0
  (0, 3)	5.0
  (0, 4)	5.0
  (0, 5)	5.0
  (0, 6)	5.0
  (0, 7)	5.0
  (0, 8)	5.0
  (0, 9)	5.0
  (0, 10)	5.0
  (0, 11)	5.0
  (0, 12)	5.0
  (0, 13)	5.0
  (0, 14)	5.0
  (0, 15)	5.0
  (0, 16)	5.0
  (0, 17)	5.0
  (0, 18)	5.0
  (0, 19)	5.0
  (0, 20)	5.0
  (0, 21)	5.0
  (1, 16)	3.0
  (1, 22)	5.0
  (1, 23)	3.0
  :	:
  (69877, 463)	3.0
  (69877, 467)	1.0
  (69877, 468)	4.0
  (69877, 475)	2.0
  (69877, 481)	3.0
  (69877, 486)	4.0
  (69877, 505)	3.0
  (69877, 518)	1.0
  (69877, 537)	5.0
  (69877, 541)	2.0
  (69877, 1081)	2.0
  (69877, 1302)	4.0
  (69877, 1322)	2.0
  (69877, 1436)	4.0
  (69877, 1609)	1.0
  (69877, 1646)	3.0
  (69877, 1660)	2.0
  (69877, 1671)	2.0
  (69877, 2001)	4.0
  (69877, 2065)	1.0
  (69877, 2941)	1.0
  (69877, 3066)	1.0
  (69877, 3386)	3.0
  (69877, 3448)	1.0
  (69877, 5330)	1.0


### MF models rely upon latent factors for users and items which are called 'embeddings'

![latent factors](https://miro.medium.com/max/988/1*tiF4e4Y-wVH732_6TbJVmQ.png)
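
As a minimal sketch of the idea in the figure, using made-up toy numbers (not taken from Movielens), each predicted rating in plain MF is the dot product of a user's latent vector and an item's latent vector. The names `U`, `V` and `R_hat` are chosen here for illustration only.

```python
import numpy as np

# Toy latent factors, invented for illustration only
U = np.array([[1.0, 0.0],
              [0.5, 2.0]])        # 2 users x 2 factors
V = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [4.0, 0.5]])        # 3 items x 2 factors

# Predicted rating matrix: one dot product per (user, item) pair
R_hat = U @ V.T
print(R_hat.shape)   # (2, 3)
print(R_hat[1, 2])   # 0.5*4.0 + 2.0*0.5 = 3.0
```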

In [3]:
num_factors = 10

n_users, n_items = URM_train.shape

In [4]:
import torch

# Creates U
user_factors = torch.nn.Embedding(num_embeddings=n_users, embedding_dim=num_factors)

# Creates V
item_factors = torch.nn.Embedding(num_embeddings=n_items, embedding_dim=num_factors)

In [5]:
user_factors

Embedding(69878, 10)

In [6]:
item_factors

Embedding(10681, 10)

### To compute the prediction we have to multiply the user factors by the item factors, which is a linear operation.

### We define a single layer and an activation function, which takes the result and transforms it into the final prediction. The activation function can be used to restrict the predicted values (e.g., sigmoid outputs values between 0 and 1)

From the [nn.Linear docs](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear)

Applies a linear transformation to the incoming data: $$y = xA^T + b$$

In our case, it will transform the element-wise multiplication of `user_factors` and `item_factors` into a rating prediction.
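
To see how this relates to the classic MF dot product, here is a small sketch on random stand-in embeddings (`u` and `v` are not the notebook's trained factors). If the layer's weights are forced to 1 and its bias to 0, the linear layer applied to the element-wise product reduces to the plain dot product; with learned weights it becomes a weighted sum of the per-factor products plus a bias.

```python
import torch

torch.manual_seed(0)
u = torch.randn(1, 10)   # stand-in for one user embedding
v = torch.randn(1, 10)   # stand-in for one item embedding

layer = torch.nn.Linear(in_features=10, out_features=1)

# Force the layer to compute a plain sum: weights all 1, bias 0
with torch.no_grad():
    layer.weight.fill_(1.0)
    layer.bias.zero_()

via_layer = layer(torch.mul(u, v))               # shape (1, 1)
dot_product = (u * v).sum(dim=1, keepdim=True)   # shape (1, 1)

# The two results agree up to floating point error
print(torch.allclose(via_layer, dot_product, atol=1e-5))
```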


In [7]:
layer_1 = torch.nn.Linear(in_features=num_factors, out_features=1)
layer_1

Linear(in_features=10, out_features=1, bias=True)

From the [ReLU docs](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU)

$$ ReLU(x) = max(0,x) $$

![image](https://pytorch.org/docs/stable/_images/ReLU.png)
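
A short sketch of what the activation does to a few hand-picked values. ReLU maps every negative input to 0, so a negative output of the linear layer becomes a prediction of 0. The scaled sigmoid shown next is an alternative assumed here for illustration (it is not used in this notebook); it squashes values into the 0.5 to 5.0 rating range reported by the data reader.

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0])

relu = torch.nn.ReLU()
print(relu(x))   # tensor([0., 0., 3.])

# Alternative (an assumption, not used in this notebook):
# squash the values into the rating range [0.5, 5.0]
min_rating, max_rating = 0.5, 5.0
scaled_sigmoid = min_rating + (max_rating - min_rating) * torch.sigmoid(x)
print(scaled_sigmoid)
```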

In [8]:
activation_function = torch.nn.ReLU()
activation_function

ReLU()

## In order to compute the prediction you have to:
1. Define a list of user and item indices
2. Create a tensor from it
3. Get the user and item embeddings
4. Compute the element-wise product of the embeddings
5. Pass the element-wise product to the single layer network
6. Pass the output of the single layer network to the activation function

In [9]:
# 1. Define a list of user/item indices.
item_index = [15]
user_index = [42]
print("1. ", user_index, type(user_index))

# 2. Create a tensor from it. Indices must be integers, so we build int64 tensors directly.
user_index = torch.tensor(user_index, dtype=torch.int64)
item_index = torch.tensor(item_index, dtype=torch.int64)
print("2. ", user_index, type(user_index))

# 3. Get the user and item embeddings 
current_user_factors = user_factors(user_index)
current_item_factors = item_factors(item_index)
print("3. ", current_user_factors, type(current_user_factors))

# 4. Compute the element-wise product of the embeddings
element_wise_product = torch.mul(current_user_factors, current_item_factors)
print("4. ", element_wise_product, type(element_wise_product))

# 5. Pass the element-wise product to the single layer network
prediction = layer_1(element_wise_product)
print("5. ", prediction, type(prediction))

# 6. Pass the output of the single layer network to the activation function
prediction = activation_function(prediction)
"6. ", prediction, type(prediction)

1.  [42] <class 'list'>
2.  tensor([42]) <class 'torch.Tensor'>
3.  tensor([[-0.0720,  0.5358, -0.8721, -1.7628,  0.6615, -1.7984, -0.9140, -0.8624,
          0.6267,  1.2498]], grad_fn=<EmbeddingBackward>) <class 'torch.Tensor'>
4.  tensor([[ 0.0408,  0.3476, -0.0644, -3.2816,  0.5467,  0.9337, -0.7592, -0.9672,
          1.0566,  0.4944]], grad_fn=<MulBackward0>) <class 'torch.Tensor'>
5.  tensor([[-0.2182]], grad_fn=<AddmmBackward>) <class 'torch.Tensor'>


('6. ', tensor([[0.]], grad_fn=<ReluBackward0>), torch.Tensor)

### To convert the prediction into a traditional numpy array you have to first call .detach() and then .numpy()
### The result is an array with a single cell

In [10]:
prediction_numpy = prediction.detach().numpy()
print("Prediction is {}".format(prediction_numpy))


Prediction is [[0.]]
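
A small sketch of why `.detach()` is needed, using a throwaway embedding (`emb` is a name chosen for this example). A tensor that is part of the autograd graph refuses a direct `.numpy()` call; detaching it first returns a tensor that no longer tracks gradients.

```python
import torch

emb = torch.nn.Embedding(num_embeddings=3, embedding_dim=2)
out = emb(torch.tensor([0]))   # carries a grad_fn, so it requires grad

try:
    out.numpy()                # not allowed while the tensor requires grad
    raised = False
except RuntimeError:
    raised = True
print(raised)

# .cpu() does nothing for a CPU tensor, but is needed when the tensor lives on a GPU
as_numpy = out.detach().cpu().numpy()
print(as_numpy.shape)          # (1, 2)
```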


# Train a MF MSE model with PyTorch

# Step 1 Create a Model python object

### The model should implement the forward function which computes the prediction as we did before

In [11]:
class MF_MSE_PyTorch_model(torch.nn.Module):
    def __init__(self, n_users: int, n_items: int, n_factors: int):
        super(MF_MSE_PyTorch_model, self).__init__()

        self.n_users = n_users
        self.n_items = n_items
        self.n_factors = n_factors

        self.user_factors = torch.nn.Embedding(num_embeddings=self.n_users, embedding_dim=self.n_factors)
        self.item_factors = torch.nn.Embedding(num_embeddings=self.n_items, embedding_dim=self.n_factors)

        self.layer_1 = torch.nn.Linear(in_features=self.n_factors, out_features=1)
        self.activation_function = torch.nn.ReLU()

    def forward(self, user_coordinates, item_coordinates):
        current_user_factors = self.user_factors(user_coordinates)
        current_item_factors = self.item_factors(item_coordinates)

        prediction = torch.mul(current_user_factors, current_item_factors)
        prediction = self.layer_1(prediction)
        prediction = self.activation_function(prediction)

        return prediction

    def get_W(self):

        return self.user_factors.weight.detach().cpu().numpy()

    def get_H(self):

        return self.item_factors.weight.detach().cpu().numpy()

# Step 2 Set up PyTorch devices and Data Reader

In [12]:
if torch.cuda.is_available():
    device = torch.device('cuda')
    print("MF_MSE_PyTorch: Using CUDA")
else:
    device = torch.device('cpu')
    print("MF_MSE_PyTorch: Using CPU")


MF_MSE_PyTorch: Using CPU


### Create an instance of the model and specify the device it should run on

In [13]:
pyTorchModel = MF_MSE_PyTorch_model(n_users, n_items, num_factors).to(device)

### Choose a loss function (Mean Squared Error in our case); there are quite a few to choose from

In [14]:
lossFunction = torch.nn.MSELoss(reduction="sum")
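
Note that `reduction="sum"` returns the sum of the squared errors over the batch, while `reduction="mean"` divides that sum by the number of elements. A quick check on hand-made values (the numbers are invented for this example):

```python
import torch

predictions = torch.tensor([3.0, 4.0, 1.0])
ratings = torch.tensor([5.0, 4.0, 2.0])

loss_sum = torch.nn.MSELoss(reduction="sum")(predictions, ratings)
loss_mean = torch.nn.MSELoss(reduction="mean")(predictions, ratings)

print(loss_sum.item())    # (3-5)^2 + (4-4)^2 + (1-2)^2 = 5.0
print(loss_mean.item())   # 5.0 / 3
```

With the summed version the loss value, and therefore the gradient magnitude, grows with the batch size, which is worth keeping in mind when tuning the learning rate.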

### Select the optimizer to be used for the model parameters: Adam, AdaGrad, RMSProp, etc.

In [15]:
learning_rate = 1e-4

optimizer = torch.optim.Adagrad(pyTorchModel.parameters(), lr=learning_rate)
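
As a rough sketch of what one optimizer step does, the example below uses a tiny throwaway embedding and a dummy loss (both invented here, not part of the notebook's model). Only the embedding row that was looked up receives a non-zero gradient, so only that row is modified by the step.

```python
import torch

emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=3)
optimizer = torch.optim.Adagrad(emb.parameters(), lr=0.1)
before = emb.weight.detach().clone()

optimizer.zero_grad()                       # clear gradients from previous steps
loss = emb(torch.tensor([1])).sum()         # dummy loss that only touches row 1
loss.backward()                             # compute gradients
optimizer.step()                            # update the parameters

changed_rows = (emb.weight.detach() != before).any(dim=1)
print(changed_rows)   # only row 1 has changed
```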

### Define the DatasetIterator, which will be used to iterate over the data

### A DatasetIterator subclasses the Dataset class and provides the `__getitem__(self, index)` method, which returns the data point at that index.

### Since we need the data to be tensors, we pre-initialize everything as a tensor. In practice we store the URM in coordinate format (user, item, rating)

In [16]:
from torch.utils.data import Dataset
import numpy as np

class DatasetIterator_URM(Dataset):
    def __init__(self, URM):
        # Remember that URM[row[k], col[k]] = data[k]
        URM = URM.tocoo()

        self.n_data_points = URM.nnz

        # Create a nx2 int64 tensor where: A[i,0] = row[i] and A[i,1] = col[i]
        self.user_item_coordinates = np.empty((self.n_data_points, 2), dtype=np.int64)
        self.user_item_coordinates[:,0] = URM.row
        self.user_item_coordinates[:,1] = URM.col
        self.user_item_coordinates = torch.from_numpy(self.user_item_coordinates)

        # Converts ratings to a float32 tensor (np.float was removed in NumPy 1.24)
        self.rating = torch.from_numpy(URM.data.astype(np.float32))

    def __getitem__(self, index):
        """
        Format is (row, col, data)
        :param index:
        :return:
        """
        return self.user_item_coordinates[index, :], self.rating[index]


    def __len__(self):
        return self.n_data_points


### We pass the DatasetIterator to a DataLoader object, which handles batching and shuffling

In [17]:
from torch.utils.data import DataLoader

batch_size = 200

dataset_iterator = DatasetIterator_URM(URM_train)

train_data_loader = DataLoader(dataset=dataset_iterator,
                               batch_size=batch_size,
                               shuffle=True,
                               #num_workers = 2,
                              )
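
To make the batch format concrete, here is a minimal sketch on hypothetical toy triplets. It uses PyTorch's built-in `TensorDataset` as a stand-in for the custom iterator above (a substitution made for brevity, not the notebook's class). Each batch is a pair: an int64 tensor of (user, item) coordinates and a float32 tensor of ratings.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy COO triplets (hypothetical data): data[k] is the rating at (row[k], col[k])
row = np.array([0, 0, 1, 2, 2], dtype=np.int64)
col = np.array([1, 3, 0, 2, 3], dtype=np.int64)
data = np.array([5.0, 3.0, 4.0, 1.0, 2.5], dtype=np.float32)

coordinates = torch.from_numpy(np.stack([row, col], axis=1))  # shape (5, 2), int64
ratings = torch.from_numpy(data)                              # shape (5,), float32

toy_loader = DataLoader(TensorDataset(coordinates, ratings), batch_size=2, shuffle=False)

# With 5 data points and batch_size=2 the last batch holds a single data point
for batch_coordinates, batch_ratings in toy_loader:
    print(batch_coordinates.shape, batch_ratings.shape)
```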

## And now we run the usual epoch steps
* Data point sampling
* Prediction computation
* Loss function computation
* Gradient computation
* Update

In [23]:
%%time
from tqdm.auto import tqdm

for input_data, ratings in tqdm(train_data_loader):
    optimizer.zero_grad()

    # Move the batch to the same device as the model
    input_data = input_data.to(device)
    ratings = ratings.to(device)

    user_coordinates = input_data[:,0]
    item_coordinates = input_data[:,1]

    # FORWARD pass
    predictions = pyTorchModel(user_coordinates, item_coordinates)
    predictions = predictions.view(-1) # predictions are a batch x 1 tensor, we just want an array of size batch (as ratings)

    # Obtain the loss: the sum of squared errors between predicted and actual ratings (reduction="sum")
    loss = lossFunction(predictions, ratings)

    # BACKWARD pass
    loss.backward()
    optimizer.step()




CPU times: user 4min 18s, sys: 16.6 s, total: 4min 35s
Wall time: 3min 12s
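
The loop above performs a single pass over the data. A sketch of how several epochs with a per-epoch cumulative loss could look is given below, on a tiny hypothetical dataset. `ToyMF` is a shrunk copy of the model structure defined earlier, and the learning rate and number of epochs are arbitrary choices for the example. The loss usually goes down over the epochs, although with a ReLU output this is not guaranteed.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

class ToyMF(torch.nn.Module):
    # Same structure as the model above, shrunk for the example
    def __init__(self, n_users, n_items, n_factors):
        super().__init__()
        self.user_factors = torch.nn.Embedding(n_users, n_factors)
        self.item_factors = torch.nn.Embedding(n_items, n_factors)
        self.layer_1 = torch.nn.Linear(n_factors, 1)
        self.activation_function = torch.nn.ReLU()

    def forward(self, users, items):
        product = self.user_factors(users) * self.item_factors(items)
        return self.activation_function(self.layer_1(product))

# Hypothetical interactions: (user, item) pairs with their ratings
coordinates = torch.tensor([[0, 0], [0, 1], [1, 1], [2, 0], [2, 2]])
ratings = torch.tensor([5.0, 3.0, 4.0, 1.0, 2.0])
loader = DataLoader(TensorDataset(coordinates, ratings), batch_size=2, shuffle=True)

model = ToyMF(n_users=3, n_items=3, n_factors=4)
loss_function = torch.nn.MSELoss(reduction="sum")
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):
    cumulative_loss = 0.0                      # reset once per epoch, not per batch
    for input_data, batch_ratings in loader:
        optimizer.zero_grad()
        predictions = model(input_data[:, 0], input_data[:, 1]).view(-1)
        loss = loss_function(predictions, batch_ratings)
        loss.backward()
        optimizer.step()
        cumulative_loss += loss.item()         # .item() gives a plain Python float
    epoch_losses.append(cumulative_loss)

print(epoch_losses[0], epoch_losses[-1])
```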


## After training is complete (it may take a while and many epochs), we can get the matrices in the usual numpy format

In [24]:
user_factors = pyTorchModel.get_W()
item_factors = pyTorchModel.get_H()

In [25]:
user_factors, user_factors.shape

(array([[ 0.26520982, -0.68337   , -0.54723114, ...,  1.3686696 ,
          0.61501884,  0.35410157],
        [-1.6309935 ,  0.8811241 ,  1.5944738 , ..., -0.6269186 ,
         -0.12894574,  0.4979117 ],
        [-1.1888154 , -1.8266233 , -0.17145918, ..., -0.6406859 ,
          0.21364413, -0.03301006],
        ...,
        [-0.1464834 ,  1.2325315 ,  0.21650487, ..., -1.081837  ,
         -1.3280292 , -0.5451829 ],
        [-0.48614633, -0.27829456,  0.23680773, ..., -0.2558126 ,
         -0.09771301, -0.53594506],
        [-0.30689088,  0.47045732,  1.2976024 , ..., -1.4160035 ,
         -0.7244118 ,  2.3710787 ]], dtype=float32),
 (69878, 10))

In [26]:
item_factors, item_factors.shape

(array([[ 0.33532315, -1.4667886 ,  1.2425084 , ...,  1.7038391 ,
         -0.51159924, -1.2737192 ],
        [ 0.6346393 , -0.4209208 , -0.973261  , ..., -1.2520459 ,
          0.7278145 , -0.15892538],
        [ 1.1904838 ,  0.10658428,  0.5855952 , ...,  0.5420558 ,
          0.39672065, -0.4849253 ],
        ...,
        [ 1.4554454 ,  1.1165615 ,  0.5367837 , ..., -0.7780073 ,
         -0.8466227 ,  0.74578154],
        [-0.54399824,  1.0597395 ,  0.90279496, ..., -0.55121493,
          0.6213373 ,  0.168343  ],
        [ 0.39889905,  0.27928987, -0.14391598, ...,  1.5336596 ,
          0.45261514, -0.82100457]], dtype=float32),
 (10681, 10))
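
Because this model passes the element-wise product through `layer_1` and a ReLU, the two factor matrices alone do not reproduce its predictions; the layer's weight and bias are needed as well. Below is a sketch of how item scores for one user could be computed in numpy. All arrays are random stand-ins created for this example (not the trained values), and `layer_weight` and `layer_bias` are hypothetical names for the parameters of `layer_1`.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)).astype(np.float32)            # stand-in for get_W(): users x factors
H = rng.normal(size=(5, 3)).astype(np.float32)            # stand-in for get_H(): items x factors
layer_weight = rng.normal(size=(3,)).astype(np.float32)   # stand-in for layer_1.weight
layer_bias = np.float32(0.1)                              # stand-in for layer_1.bias

user_id = 2

# Element-wise product of the user's factors with every item's factors,
# followed by the linear layer and the ReLU, mirroring forward()
scores = np.maximum(0.0, (W[user_id] * H) @ layer_weight + layer_bias)

# Indices of the 3 items with the highest score for this user
top_k = np.argsort(-scores)[:3]
print(scores.shape, top_k)
```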