# Scratch notebook: full instructions and tests for Caser

### DataModule

----

LightningDataModule for the Santander dataset.

A DataModule implements 5 key methods:

- ``prepare_data`` (things to do on 1 GPU/TPU, not on every GPU/TPU in distributed mode)
- ``setup`` (things to do on every accelerator in distributed mode)
- ``train_dataloader`` (the training dataloader)
- ``val_dataloader`` (the validation dataloader(s))
- ``test_dataloader`` (the test dataloader(s))

This allows you to share a full dataset without explaining how to download,
split, transform and process the data.

Read the docs:
    https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html
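
The order in which the five methods are used can be illustrated with a plain-Python sketch. `ToyDataModule` and `FAKE_DISK` are hypothetical names made up for this illustration; the class does not inherit from Lightning and returns lists of batches instead of real dataloaders.

```python
# Plain-Python sketch of the DataModule lifecycle (no Lightning dependency).
# FAKE_DISK stands in for files on disk; it is an assumption of this sketch.
FAKE_DISK = {}


class ToyDataModule:
    """Hypothetical stand-in that mirrors the five DataModule methods."""

    def __init__(self, n_samples=10, batch_size=4):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self.data_train = self.data_val = self.data_test = None

    def prepare_data(self):
        # One-off work such as downloading or preprocessing. It writes to
        # "disk" rather than to self, because in distributed mode it runs
        # on a single process only.
        FAKE_DISK["raw"] = list(range(self.n_samples))

    def setup(self, stage=None):
        # Runs on every process: load from "disk" and split 80/10/10
        # (the ratio is an arbitrary choice for the example).
        raw = FAKE_DISK["raw"]
        n_train = len(raw) * 8 // 10
        n_val = len(raw) // 10
        self.data_train = raw[:n_train]
        self.data_val = raw[n_train:n_train + n_val]
        self.data_test = raw[n_train + n_val:]

    def _batches(self, data):
        # Minimal replacement for a DataLoader: fixed-size chunks.
        step = self.batch_size
        return [data[i:i + step] for i in range(0, len(data), step)]

    def train_dataloader(self):
        return self._batches(self.data_train)

    def val_dataloader(self):
        return self._batches(self.data_val)

    def test_dataloader(self):
        return self._batches(self.data_test)


dm = ToyDataModule()
dm.prepare_data()
dm.setup()
print(dm.train_dataloader())
```

A trainer would call `prepare_data` and `setup` itself; they are called by hand here only to show the sequence.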

In [1]:
import warnings
warnings.filterwarnings('ignore')

from src.datamodules.caser_datamodule import SantanderDataModule
import numpy as np
import torch

In [2]:
San = SantanderDataModule(seed=25)

---

``SantanderDataModule.prepare_data()`` - preprocess the data if needed.


The process has two parts: creating a subsample and building the interactions data. The first part lets us choose how many users end up in the dataset. The second part transforms the tabular data into the format that the Caser model expects.
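
The two parts can be sketched on toy data as follows. This is not the code inside `prepare_data()`: the row layout (one row per user and timestamp with 0/1 product columns), the column names and the 1-based item IDs are assumptions made for the illustration.

```python
import random


def subsample_users(rows, n_users, seed=25):
    """Keep only the rows that belong to a random subset of n_users users."""
    users = sorted({r["user_id"] for r in rows})
    rng = random.Random(seed)
    kept = set(rng.sample(users, min(n_users, len(users))))
    return [r for r in rows if r["user_id"] in kept]


def to_interactions(rows, product_columns):
    """Emit one (timestamp, user_id, item_id) triple per owned product."""
    interactions = []
    for r in rows:
        # Item IDs start at 1 here; that numbering is an assumption.
        for item_id, col in enumerate(product_columns, start=1):
            if r[col] == 1:
                interactions.append((r["timestamp"], r["user_id"], item_id))
    return sorted(interactions)


# Hypothetical toy table with two product columns.
rows = [
    {"timestamp": 1, "user_id": 10, "savings": 1, "loan": 0},
    {"timestamp": 1, "user_id": 20, "savings": 0, "loan": 1},
    {"timestamp": 2, "user_id": 10, "savings": 1, "loan": 1},
    {"timestamp": 2, "user_id": 30, "savings": 0, "loan": 0},
]

subsample = subsample_users(rows, n_users=2)
for triple in to_interactions(subsample, ["savings", "loan"]):
    print(triple)
```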

In [3]:
San.prepare_data()

making data subsample...
you have took 2000 users from 601769 possible users
dataset reduced to 34000 entries
process done
making interactions data...
progress: 25.0 %
progress: 75.0 %
        TIMESTAMP  USER_ID  ITEM_ID
0      1422392400  1376263        1
1      1422392400   220061        3
2      1422392400   220061        7
3      1422392400   220061       11
4      1422392400   220061       20
...           ...      ...      ...
62492  1464382800  1196488        1
62493  1464382800  1196669        1
62494  1464382800  1193770        1
62495  1464382800  1114714       11
62496  1464382800  1180766        1

[62497 rows x 3 columns]
created file data\preprocessed\train_ver2.csv
process done
making data split...
process done


Make sure that ``data_test.csv``, ``data_train.csv`` and ``data_val.csv`` have appeared in ``data\preprocessed`` after this step.
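
One way to check for the three files is a small helper like the one below. `missing_split_files` is a hypothetical name, and the default directory assumes the notebook runs from the project root.

```python
from pathlib import Path

EXPECTED_FILES = ("data_train.csv", "data_val.csv", "data_test.csv")


def missing_split_files(directory="data/preprocessed"):
    """Return the expected split files that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


missing = missing_split_files()
print("all split files present" if not missing else f"missing: {missing}")
```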

---

``SantanderDataModule.setup()`` - load the data and set the variables `self.data_train`, `self.data_val`, `self.data_test`.

Lightning calls this method during `trainer.fit()` and `trainer.test()`.

In [4]:
San.setup()

Train, validation and test datasets are ready for use!


---

As a result, we obtain PyTorch dataloaders for train, validation and test.

In [5]:
train_DL = San.train_dataloader()

In [6]:
sequence, user, target, negative_sample = next(iter(train_DL))

print(sequence[0], user[0], target[0], negative_sample[0])

tensor([1, 1, 1, 1, 1]) tensor([1616]) tensor([1]) tensor([10,  2, 15])
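
The batch above has a sequence of 5 items, 1 target and 3 negative samples. A sliding-window sketch of how such samples could be built from one user's item history is given below. It is an assumption about the general technique, not the dataset code in `src`; `make_samples` and the catalogue size `n_items` are hypothetical.

```python
import random


def make_samples(items, seq_len=5, n_targets=1, n_negatives=3, n_items=24, seed=25):
    """Slide a window over a user's history and attach negative samples."""
    rng = random.Random(seed)
    seen = set(items)
    # Negatives are drawn from items the user never interacted with.
    # rng.sample raises ValueError if fewer than n_negatives remain.
    candidates = [i for i in range(1, n_items + 1) if i not in seen]
    samples = []
    for start in range(len(items) - seq_len - n_targets + 1):
        sequence = items[start:start + seq_len]
        target = items[start + seq_len:start + seq_len + n_targets]
        negatives = rng.sample(candidates, n_negatives)
        samples.append((sequence, target, negatives))
    return samples


history = [1, 3, 7, 11, 20, 1, 3]  # hypothetical item history of one user
for sequence, target, negatives in make_samples(history):
    print(sequence, target, negatives)
```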


---

### Training Caser

In [7]:
%run -i train.py +seed=25

[2023-11-14 15:17:34,478][src.utils][INFO] - Printing config tree with Rich! <config.print_config=True>


Global seed set to 25


[2023-11-14 15:17:34,569][src.training_pipeline][INFO] - Instantiating datamodule <src.datamodules.caser_datamodule.SantanderDataModule>
[2023-11-14 15:17:34,573][src.training_pipeline][INFO] - Instantiating model <src.models.caser.CaserModel>
[2023-11-14 15:17:34,658][src.training_pipeline][INFO] - Instantiating callback <pytorch_lightning.callbacks.ModelCheckpoint>
[2023-11-14 15:17:34,666][src.training_pipeline][INFO] - Instantiating callback <pytorch_lightning.callbacks.EarlyStopping>
[2023-11-14 15:17:34,667][src.training_pipeline][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichModelSummary>
[2023-11-14 15:17:34,669][src.training_pipeline][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichProgressBar>
[2023-11-14 15:17:34,671][src.training_pipeline][

CometLogger will be initialized in online mode


[2023-11-14 15:17:34,674][src.training_pipeline][INFO] - Instantiating trainer <pytorch_lightning.Trainer>


Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


[2023-11-14 15:17:36,281][src.training_pipeline][INFO] - Logging hyperparameters!


COMET INFO: Experiment is live on comet.com https://www.comet.com/verdangeta/recsys/c55547cb2eb144dc85d36a85391df76b



[2023-11-14 15:17:43,474][src.training_pipeline][INFO] - Starting training!
preparing data has already done!
Train, validation and test datasets are ready for use!



`Trainer.fit` stopped: `max_epochs=10` reached.


COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.com/verdangeta/recsys/c55547cb2eb144dc85d36a85391df76b
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     train/loss [10]   : (0.0032484401017427444, 0.5466639399528503)
COMET INFO:     val/MRR@1 [10]    : (0.8504864311315924, 0.9057859703020993)
COMET INFO:     val/MRR@2 [10]    : (0.8507424475166411, 0.8841525857654889)
COMET INFO:     val/MRR@3 [10]    : (0.8458781362007171, 0.875405359276327)
COMET INFO:     val/NDCG@1 [10]   : (0.85

[2023-11-14 15:20:37,745][src.training_pipeline][INFO] - Starting testing!


Restoring states from the checkpoint path at C:\Users\Skoltech\Desktop\RecSystemFinal\Caser\logs\experiments\runs\caser_santander_64\2023-11-14_15-17-34\checkpoints\epoch_001.ckpt
Loaded model weights from the checkpoint at C:\Users\Skoltech\Desktop\RecSystemFinal\Caser\logs\experiments\runs\caser_santander_64\2023-11-14_15-17-34\checkpoints\epoch_001.ckpt



preparing data has already done!


COMET INFO: Experiment is live on comet.com https://www.comet.com/verdangeta/recsys/c55547cb2eb144dc85d36a85391df76b



COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.com/verdangeta/recsys/c55547cb2eb144dc85d36a85391df76b
COMET INFO:   Metrics:
COMET INFO:     test/MRR@1    : 0.8453661034306196
COMET INFO:     test/MRR@2    : 0.8428059395801332
COMET INFO:     test/MRR@3    : 0.8419667747624736
COMET INFO:     test/NDCG@1   : 0.8453661203384399
COMET INFO:     test/NDCG@2   : 0.82179194688797
COMET INFO:     test/NDCG@3   : 0.8243833184242249
COMET INFO:     test/Recal

[2023-11-14 15:20:43,584][src.training_pipeline][INFO] - Finalizing!
[2023-11-14 15:20:43,585][src.training_pipeline][INFO] - Best model ckpt at C:\Users\Skoltech\Desktop\RecSystemFinal\Caser\logs\experiments\runs\caser_santander_64\2023-11-14_15-17-34\checkpoints\epoch_001.ckpt


### Testing Caser

In [14]:
%run -i test.py +seed=25

Global seed set to 25


CometLogger will be initialized in online mode


GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
COMET INFO: Experiment is live on comet.com https://www.comet.com/verdangeta/recsys/e0ef6f8a21634228b332a9e82f9f1b2f



Restoring states from the checkpoint path at C:\Users\Skoltech\Desktop\RecSystemFinal\Caser\test_ckpt\last.ckpt
Loaded model weights from the checkpoint at C:\Users\Skoltech\Desktop\RecSystemFinal\Caser\test_ckpt\last.ckpt


Testing: 0it [00:00, ?it/s]

COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.com/verdangeta/recsys/e0ef6f8a21634228b332a9e82f9f1b2f
COMET INFO:   Metrics:
COMET INFO:     test/MRR@1    : 0.8131080389144906
COMET INFO:     test/MRR@2    : 0.8269329237071172
COMET INFO:     test/MRR@3    : 0.8260937588894579
COMET INFO:     test/NDCG@1   : 0.8131080269813538
COMET INFO:     test/NDCG@2   : 0.7887366414070129
COMET INFO:     test/NDCG@3   : 0.791541337966919
COMET INFO:     test/Recall@1 : 0

The model works well if all metrics are higher than 0.8.
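
For reference, the textbook definitions of Recall@k, MRR@k and NDCG@k for a single target item per user can be sketched as below, together with the 0.8 check. The project's own metric code may define these differently, so the numbers in the logs above need not follow these formulas. All function names and the toy rankings are hypothetical.

```python
import math


def metrics_at_k(ranked_items, target, k):
    """Recall@k, MRR@k and NDCG@k for one user with a single target item."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return {"Recall": 0.0, "MRR": 0.0, "NDCG": 0.0}
    rank = top_k.index(target) + 1  # 1-based position of the target
    return {
        "Recall": 1.0,
        "MRR": 1.0 / rank,
        "NDCG": 1.0 / math.log2(rank + 1),  # ideal DCG is 1 for a single target
    }


def mean_metrics(ranked_lists, targets, k):
    """Average the per-user metrics over all users."""
    per_user = [metrics_at_k(r, t, k) for r, t in zip(ranked_lists, targets)]
    return {name: sum(m[name] for m in per_user) / len(per_user) for name in per_user[0]}


def all_above(metrics, threshold=0.8):
    """True if every metric is strictly higher than the threshold."""
    return all(value > threshold for value in metrics.values())


# Toy rankings for two users; the target is item 1 in both cases.
ranked_lists = [[1, 4, 9], [7, 1, 3]]
targets = [1, 1]
scores = mean_metrics(ranked_lists, targets, k=3)
print(scores, all_above(scores))
```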

### Product recommendation for users

In [7]:
ckpt_path = "./logs/experiments/runs/caser_santander_64/2023-11-07_16-49-37/checkpoints/epoch_002.ckpt"

# Pick two random user IDs from the training set
user_id = np.random.choice(train_DL.dataset.users.reshape(-1), 2)

print(user_id)

[ 965 1460]


In [12]:
from inference_caser import make_predict

make_predict(train_DL, user_id, ckpt_path, k=5)

Resume training from ./logs/experiments/runs/caser_santander_64/2023-11-07_16-49-37/checkpoints/epoch_002.ckpt
Predictions for users:
{965: tensor([ 4, 11, 10,  7,  5]), 1460: tensor([ 9,  4,  8, 10, 14])}
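
Each user receives the `k = 5` highest-ranked item IDs. A top-k selection over per-item scores could look like the sketch below. It is not the code of `make_predict`; `top_k_items` is a hypothetical helper and the scores are made up.

```python
import heapq


def top_k_items(scores, k=5):
    """Return the k item IDs with the highest scores, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


# Made-up scores for one user: item ID -> model score.
scores = {4: 2.1, 11: 1.7, 10: 1.2, 7: 0.9, 5: 0.4, 2: -0.3, 9: -1.0}
print(top_k_items(scores, k=5))
```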
