# **rectorch**: Mult-VAE model

## Preliminaries

### Dataset download
For the purposes of this tutorial we download the *MovieLens 20M* dataset. As the name suggests, this dataset contains roughly 20 million ratings of movies, on a scale of up to 5 stars. For more details, please refer to the official web page https://grouplens.org/datasets/movielens/20m/.

In [1]:
%cd /content/
!wget http://files.grouplens.org/datasets/movielens/ml-20m.zip
!unzip ml-20m.zip
!rm ml-20m.zip



In [1]:
%cd /content/
!wget http://files.grouplens.org/datasets/movielens/ml-1m.zip
!unzip ml-1m.zip
!rm ml-1m.zip

/content
--2020-10-08 10:14:07--  http://files.grouplens.org/datasets/movielens/ml-1m.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5917549 (5.6M) [application/zip]
Saving to: ‘ml-1m.zip’


2020-10-08 10:14:08 (14.9 MB/s) - ‘ml-1m.zip’ saved [5917549/5917549]

Archive:  ml-1m.zip
   creating: ml-1m/
  inflating: ml-1m/movies.dat        
  inflating: ml-1m/ratings.dat       
  inflating: ml-1m/README            
  inflating: ml-1m/users.dat         


### **rectorch** installation

NOTE: in this version of the tutorial we load the *dev* version from [github](https://github.com/makgyver/rectorch).

In [2]:
%cd /content/
!git clone -b dev https://github.com/makgyver/rectorch.git
%cd rectorch
!pip install -r requirements.txt

/content
Cloning into 'rectorch'...
remote: Enumerating objects: 55, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 1705 (delta 17), reused 35 (delta 17), pack-reused 1650
Receiving objects: 100% (1705/1705), 3.22 MiB | 14.27 MiB/s, done.
Resolving deltas: 100% (1133/1133), done.
/content/rectorch
Collecting munch>=2.5.0
  Downloading https://files.pythonhosted.org/packages/cc/ab/85d8da5c9a45e072301beb37ad7f833cd344e04c817d97e0cc75681d248f/munch-2.5.0-py2.py3-none-any.whl
Installing collected packages: munch
Successfully installed munch-2.5.0


### Data loading and splitting

In [3]:
# MovieLens 20M: comma-separated ratings.csv with a header row
cfg_data = {
    "processing": {
        "data_path": "../ml-20m/ratings.csv",
        "threshold": 3.5,
        "separator": ",",
        "header": 0,
        "u_min": 5,
        "i_min": 0
    },
    "splitting": {
        "split_type": "vertical",
        "sort_by": None,
        "seed": 98765,
        "shuffle": True,
        "valid_size": 10000,
        "test_size": 10000,
        "test_prop": 0.2
    }
}

In [3]:
# MovieLens 1M: ratings.dat uses "::" as separator and has no header row
cfg_data = {
    "processing": {
        "data_path": "../ml-1m/ratings.dat",
        "threshold": 3.5,
        "separator": "::",
        #"header": 0,
        "u_min": 5,
        "i_min": 0
    },
    "splitting": {
        "split_type": "vertical",
        "sort_by": None,
        "seed": 98765,
        "shuffle": True,
        "valid_size": 200,
        "test_size": 200,
        "test_prop": 0.2
    }
}

In [5]:
# Steam: comma-separated steam_sorted.csv with no header row
cfg_data = {
    "processing": {
        "data_path": "../steam_sorted.csv",
        "threshold": 0,
        "separator": ",",
        #"header": 0,
        "u_min": 2,
        "i_min": 0
    },
    "splitting": {
        "split_type": "vertical",
        "sort_by": None,
        "seed": 98765,
        "shuffle": True,
        "valid_size": 1000,
        "test_size": 1000,
        "test_prop": 0.2
    }
}

In [6]:
from rectorch.data import DataProcessing
dataset = DataProcessing(cfg_data).process_and_split()
dataset

[10:17:32-081020]  Reading raw data file ../steam_sorted.csv.
[10:17:33-081020]  Applying filtering.
[10:17:33-081020]  Filtered 5700 ratings.
[10:17:33-081020]  Shuffling data.
[10:17:33-081020]  Calculating splits.
[10:17:33-081020]  Creating validation and test set.
[10:17:33-081020]  Skipped 186 ratings in validation set.
[10:17:33-081020]  Skipped 243 ratings in test set.
[10:17:33-081020]  Skipped 1 users in validation set.
[10:17:33-081020]  Skipped 3 users in test set.


Dataset(n_users=6693, n_items=4779, n_ratings=122671)

For more details about how to load, process, and split the dataset, please refer to the tutorial [rectorch_data_tutorial.ipynb](https://colab.research.google.com/drive/1gKgMllkYlvvBqh7q6WmmSvtfAOTz7tFh#scrollTo=Cwi1HjgJ-T7Z).
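
As a rough illustration of what the `threshold` and `u_min` entries in the configuration are meant to control, here is a minimal pure-Python sketch. The function `filter_ratings` is hypothetical and is not part of **rectorch**; it assumes that a rating counts as positive when it is strictly greater than `threshold`, and that users with fewer than `u_min` positive ratings are dropped afterwards. The library's actual rules and order of operations may differ.

```python
from collections import Counter

def filter_ratings(ratings, threshold, u_min):
    """Keep positive ratings, then drop users with too few of them.

    ratings: list of (user, item, rating) tuples.
    Assumption: a rating is positive when it is strictly greater
    than `threshold`; this is a guess, not rectorch's documented rule.
    """
    # Step 1: binarize by keeping only the ratings above the threshold.
    positives = [(u, i) for u, i, r in ratings if r > threshold]
    # Step 2: count the remaining ratings per user.
    per_user = Counter(u for u, _ in positives)
    # Step 3: keep only users with at least u_min positive ratings.
    return [(u, i) for u, i in positives if per_user[u] >= u_min]

toy = [("a", 1, 5.0), ("a", 2, 4.0), ("a", 3, 3.5),
       ("b", 1, 4.5), ("b", 2, 2.0)]
kept = filter_ratings(toy, threshold=3.5, u_min=2)
print(kept)
```

In this toy example user `b` ends up with a single positive rating and is removed, which is the kind of effect reported by the "Filtered ... ratings" line in the log.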

### Sampler creation: rectorch.samplers.DataSampler



In [7]:
from rectorch.samplers import DataSampler
sampler = DataSampler(dataset, mode="train", batch_size=500)

The `mode` of a sampler indicates its current state, that is, which part of the dataset it is handling. In this case it is the training set ("train"), since we are going to train the model.
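
To make the role of `batch_size` concrete, the following sketch splits a set of training users into mini-batches. It is not the `DataSampler` implementation: `iter_user_batches` is a hypothetical helper, and the figure of 4693 training users is an inference (6693 users minus 1000 validation and 1000 test users, assuming `valid_size` and `test_size` count users).

```python
import random

def iter_user_batches(user_ids, batch_size, shuffle=True, seed=0):
    """Yield lists of user ids, each with at most batch_size entries."""
    ids = list(user_ids)
    if shuffle:
        # A dedicated Random instance keeps the shuffle reproducible.
        random.Random(seed).shuffle(ids)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

n_train_users = 4693  # inferred value, see the assumptions above
batches = list(iter_user_batches(range(n_train_users), batch_size=500))
print(len(batches), len(batches[-1]))
```

Under these assumptions every epoch would consist of 10 batches, the last one smaller than the others.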

### Mult-VAE recommender

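Mult-VAE is a variational autoencoder for collaborative filtering that models each user's interactions with a multinomial likelihood. The decoder layer sizes are given by `dec_dims`, whose last entry must match the number of items; `beta` weights the KL term of the loss, and `anneal_steps` controls how gradually that weight is introduced during training.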

In [8]:
from rectorch.models.nn.multvae import MultVAE
vae = MultVAE(dec_dims=[200,600,dataset.n_items],
              enc_dims=None,
              dropout=0.5,
              beta=.2,
              anneal_steps=100000,
              opt_conf=None,
              device="cuda",
              trainer=None)

[10:18:05-081020]  Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
[10:18:05-081020]  Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
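
The `beta=.2` and `anneal_steps=100000` arguments refer to the weight of the KL term in the Mult-VAE objective. The sketch below shows one common way to combine the pieces for a single user: a multinomial negative log-likelihood, a diagonal Gaussian KL divergence, and a linear warm-up of the KL weight. All three function names are hypothetical, and the linear schedule is an assumption; the trainer inside **rectorch** may anneal differently.

```python
import math

def annealed_beta(step, beta_max, anneal_steps):
    """Linear KL warm-up: grows from 0 to beta_max over anneal_steps updates."""
    if anneal_steps <= 0:
        return beta_max
    return min(beta_max, beta_max * step / anneal_steps)

def multinomial_nll(logits, clicks):
    """Negative multinomial log-likelihood of one user (up to a constant)."""
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(c * (l - log_z) for l, c in zip(logits, clicks))

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

# Toy values: 3 items, 2 latent dimensions, halfway through the warm-up.
beta_t = annealed_beta(step=50_000, beta_max=0.2, anneal_steps=100_000)
nll = multinomial_nll(logits=[2.0, 0.0, -1.0], clicks=[1, 0, 1])
kl = gaussian_kl(mu=[0.5, -0.5], logvar=[0.0, 0.0])
loss = nll + beta_t * kl
print(beta_t, kl, loss)
```

With a schedule like this, the KL penalty is almost absent during the first updates and only reaches its full weight after `anneal_steps` parameter updates.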


### Training

In [9]:
vae.train(sampler, valid_metric="ndcg@100")

[10:18:30-081020]  | epoch 1 | 10/10 batches | ms/batch 39.51 | loss 144.45 |
[10:18:30-081020]  | epoch 1 | loss 144.4506 | total time: 0.40s |
[10:18:30-081020]  | epoch 1 | ndcg@100 0.240 (0.0088) |
[10:18:31-081020]  | epoch 2 | 10/10 batches | ms/batch 16.69 | loss 133.65 |
[10:18:31-081020]  | epoch 2 | loss 133.6478 | total time: 0.17s |
[10:18:31-081020]  | epoch 2 | ndcg@100 0.216 (0.0066) |
[10:18:31-081020]  | epoch 3 | 10/10 batches | ms/batch 16.82 | loss 132.08 |
[10:18:31-081020]  | epoch 3 | loss 132.0813 | total time: 0.17s |
[10:18:31-081020]  | epoch 3 | ndcg@100 0.250 (0.0087) |
[10:18:31-081020]  | epoch 4 | 10/10 batches | ms/batch 17.05 | loss 131.77 |
[10:18:31-081020]  | epoch 4 | loss 131.7674 | total time: 0.17s |
[10:18:31-081020]  | epoch 4 | ndcg@100 0.240 (0.0080) |
[10:18:31-081020]  | epoch 5 | 10/10 batches | ms/batch 15.79 | loss 132.81 |
[10:18:31-081020]  | epoch 5 | loss 132.8058 | total time: 0.16s |
[10:18:31-081020]  | epoch 5 | ndcg@100 0.256 (

### Evaluation

In [None]:
from rectorch.evaluation import evaluate
results = evaluate(vae, sampler, ["ndcg@100", "recall@100", "ndcg@20", "recall@20"])

In [None]:
from rectorch.utils import collect_results
collect_results(results)

{'ndcg@100': (0.4134699640575229, 0.2090619683418369),
 'ndcg@20': (0.3231564329510504, 0.22088148824524897),
 'recall@100': (0.6463630794000064, 0.2726948893343989),
 'recall@20': (0.38334497722392846, 0.26767216600690824)}
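
Each metric above is reported as a pair of numbers, which reads like a mean and a standard deviation over test users, although that interpretation is an assumption. To clarify what the metric names stand for, here is a minimal sketch of Recall@k and NDCG@k for a single user with binary relevance. The functions are illustrative and are not the ones in `rectorch.evaluation`; in particular, normalizing recall by `min(k, number of relevant items)` is one common convention and may not match the library.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items found in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / min(k, len(relevant))

def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    if not relevant:
        return 0.0
    # Each hit at 0-based position pos is discounted by log2(pos + 2).
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    # The ideal ranking places all relevant items at the top.
    ideal = sum(1.0 / math.log2(pos + 2)
                for pos in range(min(k, len(relevant))))
    return dcg / ideal

ranked = [7, 3, 9, 1, 4]   # item ids sorted by predicted score
relevant = {3, 4, 8}       # held-out items of the user
print(recall_at_k(ranked, relevant, 5), ndcg_at_k(ranked, relevant, 5))
```

Averaging such per-user values over all test users would give a summary comparable in spirit to the dictionary above.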