<a href="https://colab.research.google.com/github/Kharinaev/RePlay/blob/rl_ddpg/cql_replay_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Offline RL + RecSys 
> *Kharinaev Artyom*  
> *supervised by Panov Alexander*  
> *AIRI summer school, 2022*  

This tutorial adds the [CQL](https://arxiv.org/abs/2006.04779) implementation from [d3rlpy](https://github.com/takuseno/d3rlpy/blob/master/d3rlpy/algos/cql.py) to [RePlay](https://github.com/sb-ai-lab/RePlay) by SB AI Lab and compares it with RePlay baselines on MovieLens 1M.  

The RePlay fork is available [here](https://github.com/Kharinaev/RePlay/tree/rl_ddpg).  
Changes:
- Fixed [ddpg.py](https://github.com/Kharinaev/RePlay/blob/rl_ddpg/replay/models/ddpg.py) so that it runs
- Added [cql.py](https://github.com/Kharinaev/RePlay/blob/rl_ddpg/replay/models/cql.py)
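
For orientation, the CQL(H) objective from the linked paper can be sketched as follows, written here for discrete actions. The notation (dataset $\mathcal{D}$, behavior policy $\hat{\pi}_\beta$, trade-off weight $\alpha$, empirical Bellman operator $\hat{\mathcal{B}}^{\pi_k}$) follows the paper and is not taken from this notebook; a continuous-action variant has to approximate the sum over actions, for example by sampling.

$$
\min_Q \; \alpha \, \mathbb{E}_{s \sim \mathcal{D}} \Big[ \log \sum_a \exp Q(s, a) - \mathbb{E}_{a \sim \hat{\pi}_\beta(a \mid s)} \big[ Q(s, a) \big] \Big]
+ \frac{1}{2} \, \mathbb{E}_{(s, a, s') \sim \mathcal{D}} \Big[ \big( Q(s, a) - \hat{\mathcal{B}}^{\pi_k} \hat{Q}^k(s, a) \big)^2 \Big]
$$

The first term pushes Q-values down on all actions and up on actions seen in the logged data, which is what makes the learned values conservative; the second term is the usual Bellman error.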

In [1]:
!pip install --upgrade pip setuptools wheel
!pip install pandas --upgrade
!pip install implicit --upgrade
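# Installing the fork directly does not work (problems with pyproject.toml),
# so the released replay-rec package is installed and cql.py is downloaded separately:
# !pip install git+https://github.com/Kharinaev/RePlay.git@rl_ddpg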
!pip install replay-rec
!wget https://raw.githubusercontent.com/Kharinaev/RePlay/rl_ddpg/replay/models/cql.py
!pip install pytorch_ranger
!pip install tensorboardX
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
!pip install d3rlpy


### **Restart** the environment before running the next cells

In [1]:
%load_ext autoreload
%autoreload 2
%config Completer.use_jedi = False



In [2]:
import tqdm.auto  # binds the name `tqdm` and makes sure the tqdm.auto submodule is loaded
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from replay.data_preparator import DataPreparator
from replay.experiment import Experiment
from replay.metrics import HitRate, NDCG, MAP, MRR
from replay.models import ALSWrap, KNN, LightFMWrap, SLIM

from cql import CQL  # cql.py was downloaded with wget in the install cell

K = 5
SEED = 0

import torch
use_gpu = torch.cuda.is_available()

### MovieLens 1M dataset

In [3]:
!wget https://raw.githubusercontent.com/sb-ai-lab/RePlay/rl_ddpg/experiments/data/ml1m_items.dat
!wget https://raw.githubusercontent.com/sb-ai-lab/RePlay/rl_ddpg/experiments/data/ml1m_ratings.dat
!wget https://raw.githubusercontent.com/sb-ai-lab/RePlay/rl_ddpg/experiments/data/ml1m_users.dat


In [4]:
items = pd.read_csv("ml1m_items.dat", sep="\t", names=['item_id', 'name', 'genre'])
df = pd.read_csv("ml1m_ratings.dat", sep="\t", names=["user_id", "item_id", "relevance", "timestamp"])
users = pd.read_csv("ml1m_users.dat", sep="\t", names=["user_id", "gender", "age", "occupation", "zip_code"])

### Baselines

In [5]:
train, test = train_test_split(df, test_size=0.2, random_state=42)

preparator = DataPreparator()
train_sp, _, _ = preparator(train)
test_sp, _, _ = preparator(test)
test = test.rename(columns = {'user_id' : 'user_idx', 'item_id' : 'item_idx'})
train_sp.count(), test_sp.count()

(800167, 200042)

In [6]:
e_base = Experiment(test, {MAP(): K, NDCG(): K, HitRate(): K, MRR(): K})

baselines = {
  'ALS': ALSWrap(seed=SEED), 
  'KNN': KNN(num_neighbours=K), 
  'LightFM': LightFMWrap(random_state=SEED), 
  'SLIM': SLIM(seed=SEED)
}

In [7]:
for key in tqdm.auto.tqdm(baselines.keys(), desc='Model'):
  model = baselines[key]
  model.fit(log=train_sp)
  pred = model.predict(log = test_sp, k=K).toPandas()
  e_base.add_result(key, pred)

e_base.results




| Model | HitRate@5 | MAP@5 | MRR@5 | NDCG@5 |
|---|---|---|---|---|
| ALS | 0.063928 | 0.006308 | 0.029549 | 0.013422 |
| KNN | 0.0684 | 0.007664 | 0.035544 | 0.015534 |
| LightFM | 0.051507 | 0.005164 | 0.023777 | 0.010924 |
| SLIM | 0.054323 | 0.005315 | 0.02487 | 0.011333 |
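
To make the reported columns easier to read, here is a minimal sketch of how HitRate@K, MRR@K and NDCG@K can be computed for a single user with binary relevance. These are the textbook definitions, not RePlay's code: the function names and the toy item ids are hypothetical, and RePlay's own implementation may differ in details such as normalization and averaging over users.

```python
import math


def hit_rate_at_k(recommended, relevant, k):
    """1.0 if at least one of the top-k recommended items is relevant, else 0.0."""
    return float(any(item in relevant for item in recommended[:k]))


def mrr_at_k(recommended, relevant, k):
    """Reciprocal rank of the first relevant item in the top-k, or 0.0 if none."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k list divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example (hypothetical item ids): the only hit is at rank 3.
recommended = [10, 20, 30, 40, 50]
relevant = {30, 99}

print(hit_rate_at_k(recommended, relevant, k=5))
print(mrr_at_k(recommended, relevant, k=5))
print(ndcg_at_k(recommended, relevant, k=5))
```

The values in the table are such per-user scores averaged over all test users, so a HitRate@5 of about 0.07 means that roughly 7% of users received at least one relevant item in their top 5.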


### CQL

In [8]:
e_cql = Experiment(test, {MAP(): K, NDCG(): K, HitRate(): K, MRR(): K})

n_runs = 5
for i in tqdm.auto.trange(n_runs):
  model = CQL(use_gpu=use_gpu, k=K, n_epochs=3)
  model.fit(log=train_sp)
  pred = model.predict(log=test_sp, k=K).toPandas()
  e_cql.add_result(f'CQL run {i}', pred)

e_cql.results


20-Jul-22 12:00:24, replay, INFO: The model is neural network with non-distributed training


2022-07-20 12:00.35 [debug    ] RoundIterator is selected.
2022-07-20 12:00.35 [info     ] Directory is created at d3rlpy_logs/CQL_20220720120035
2022-07-20 12:00.35 [debug    ] Building models...
2022-07-20 12:00.43 [debug    ] Models have been built.
2022-07-20 12:00.43 [info     ] Parameters are saved to d3rlpy_logs/CQL_20220720120035/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0001, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_learning_rate': 0.0001, 'alpha_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_threshold': 10.0, 'batch_size': 256, 'conservative_weight': 5.0, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rat


2022-07-20 12:02.14 [info     ] CQL_20220720120035: epoch=1 step=3125 epoch=1 metrics={'time_sample_batch': 0.0007977039337158203, 'time_algorithm_update': 0.027718832550048828, 'temp_loss': -8.466208019542695, 'temp': 1.0743542681121827, 'alpha_loss': 0.32059093093676494, 'alpha': 0.9963784733581543, 'critic_loss': 3379.4185449204256, 'actor_loss': 191.08944788330078, 'time_step': 0.02880756187438965} step=3125
2022-07-20 12:02.14 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720120035/model_3125.pt



2022-07-20 12:03.45 [info     ] CQL_20220720120035: epoch=2 step=6250 epoch=2 metrics={'time_sample_batch': 0.0007529261779785156, 'time_algorithm_update': 0.027599258346557617, 'temp_loss': -11.728459483413696, 'temp': 1.3691676719665526, 'alpha_loss': 7.754319363920689, 'alpha': 0.7496417041015625, 'critic_loss': 534.9597037537384, 'actor_loss': 102.72606413818359, 'time_step': 0.028671385498046875} step=6250
2022-07-20 12:03.45 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720120035/model_6250.pt



2022-07-20 12:05.15 [info     ] CQL_20220720120035: epoch=3 step=9375 epoch=3 metrics={'time_sample_batch': 0.0007639115905761719, 'time_algorithm_update': 0.027453634643554686, 'temp_loss': -13.661287539367676, 'temp': 1.780396886253357, 'alpha_loss': 8.07205187907219, 'alpha': 0.5423362936210633, 'critic_loss': 344.5403865765381, 'actor_loss': 118.70714498046875, 'time_step': 0.02854317726135254} step=9375
2022-07-20 12:05.15 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720120035/model_9375.pt




Users: 6038, items: 3683



20-Jul-22 12:05:50, replay, INFO: The model is neural network with non-distributed training


2022-07-20 12:06.01 [debug    ] RoundIterator is selected.
2022-07-20 12:06.01 [info     ] Directory is created at d3rlpy_logs/CQL_20220720120601
2022-07-20 12:06.01 [debug    ] Building models...
2022-07-20 12:06.01 [debug    ] Models have been built.
2022-07-20 12:06.01 [info     ] Parameters are saved to d3rlpy_logs/CQL_20220720120601/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0001, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_learning_rate': 0.0001, 'alpha_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_threshold': 10.0, 'batch_size': 256, 'conservative_weight': 5.0, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rat


2022-07-20 12:07.31 [info     ] CQL_20220720120601: epoch=1 step=3125 epoch=1 metrics={'time_sample_batch': 0.0007498005676269532, 'time_algorithm_update': 0.027502444458007813, 'temp_loss': -8.333558283014298, 'temp': 1.0835148392868041, 'alpha_loss': 9.672517245314197, 'alpha': 0.877416841545105, 'critic_loss': 158.84598571350097, 'actor_loss': 22.226845051798822, 'time_step': 0.028554134902954102} step=3125
2022-07-20 12:07.31 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720120601/model_3125.pt



2022-07-20 12:09.02 [info     ] CQL_20220720120601: epoch=2 step=6250 epoch=2 metrics={'time_sample_batch': 0.0007932197570800782, 'time_algorithm_update': 0.027741191024780273, 'temp_loss': -10.794937705383301, 'temp': 1.3952857304382325, 'alpha_loss': 10.565150788650513, 'alpha': 0.6386795189666749, 'critic_loss': 168.88422263931275, 'actor_loss': 13.684119074444771, 'time_step': 0.028869962463378907} step=6250
2022-07-20 12:09.02 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720120601/model_6250.pt



2022-07-20 12:10.34 [info     ] CQL_20220720120601: epoch=3 step=9375 epoch=3 metrics={'time_sample_batch': 0.0007765656280517578, 'time_algorithm_update': 0.027861919860839843, 'temp_loss': -13.044104780426025, 'temp': 1.8386179540252685, 'alpha_loss': 6.860115845675021, 'alpha': 0.488494686164856, 'critic_loss': 112.38991430870057, 'actor_loss': 62.66917905212402, 'time_step': 0.02897675064086914} step=9375
2022-07-20 12:10.34 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720120601/model_9375.pt




Users: 6038, items: 3683



20-Jul-22 12:11:08, replay, INFO: The model is neural network with non-distributed training


2022-07-20 12:11.19 [debug    ] RoundIterator is selected.
2022-07-20 12:11.19 [info     ] Directory is created at d3rlpy_logs/CQL_20220720121119
2022-07-20 12:11.19 [debug    ] Building models...
2022-07-20 12:11.19 [debug    ] Models have been built.
2022-07-20 12:11.19 [info     ] Parameters are saved to d3rlpy_logs/CQL_20220720121119/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0001, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_learning_rate': 0.0001, 'alpha_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_threshold': 10.0, 'batch_size': 256, 'conservative_weight': 5.0, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rat


2022-07-20 12:12.52 [info     ] CQL_20220720121119: epoch=1 step=3125 epoch=1 metrics={'time_sample_batch': 0.0008780233001708985, 'time_algorithm_update': 0.028175759201049803, 'temp_loss': -8.64403066004753, 'temp': 1.1000136588668823, 'alpha_loss': 1.485554116845727, 'alpha': 1.0168094246292114, 'critic_loss': 4414.549558946152, 'actor_loss': 216.58821542236328, 'time_step': 0.029401839370727538} step=3125
2022-07-20 12:12.52 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720121119/model_3125.pt



2022-07-20 12:14.25 [info     ] CQL_20220720121119: epoch=2 step=6250 epoch=2 metrics={'time_sample_batch': 0.0008485887145996094, 'time_algorithm_update': 0.028093713760375977, 'temp_loss': -10.799217278327943, 'temp': 1.434646764793396, 'alpha_loss': 7.418259316556751, 'alpha': 0.7587628922462464, 'critic_loss': 437.29884003463746, 'actor_loss': 97.48931579833985, 'time_step': 0.029298094787597657} step=6250
2022-07-20 12:14.25 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720121119/model_6250.pt



2022-07-20 12:15.57 [info     ] CQL_20220720121119: epoch=3 step=9375 epoch=3 metrics={'time_sample_batch': 0.0007735790252685547, 'time_algorithm_update': 0.02784188652038574, 'temp_loss': -13.915120374526978, 'temp': 1.888143854560852, 'alpha_loss': 9.244368544325829, 'alpha': 0.5518572866249084, 'critic_loss': 219.28406723861696, 'actor_loss': 108.22042982910156, 'time_step': 0.02895171318054199} step=9375
2022-07-20 12:15.57 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720121119/model_9375.pt




Users: 6038, items: 3683



20-Jul-22 12:16:30, replay, INFO: The model is neural network with non-distributed training


2022-07-20 12:16.41 [debug    ] RoundIterator is selected.
2022-07-20 12:16.41 [info     ] Directory is created at d3rlpy_logs/CQL_20220720121641
2022-07-20 12:16.42 [debug    ] Building models...
2022-07-20 12:16.42 [debug    ] Models have been built.
2022-07-20 12:16.42 [info     ] Parameters are saved to d3rlpy_logs/CQL_20220720121641/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0001, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_learning_rate': 0.0001, 'alpha_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_threshold': 10.0, 'batch_size': 256, 'conservative_weight': 5.0, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rat


2022-07-20 12:18.14 [info     ] CQL_20220720121641: epoch=1 step=3125 epoch=1 metrics={'time_sample_batch': 0.000856866455078125, 'time_algorithm_update': 0.027947658615112305, 'temp_loss': -8.566855578632355, 'temp': 1.0692504187011718, 'alpha_loss': 9.39944751236178, 'alpha': 0.9076063089370727, 'critic_loss': 214.26692074462892, 'actor_loss': 18.325918111724853, 'time_step': 0.029145255889892578} step=3125
2022-07-20 12:18.14 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720121641/model_3125.pt



2022-07-20 12:19.45 [info     ] CQL_20220720121641: epoch=2 step=6250 epoch=2 metrics={'time_sample_batch': 0.0007969475555419922, 'time_algorithm_update': 0.02773933319091797, 'temp_loss': -9.494587201957703, 'temp': 1.3664548747634888, 'alpha_loss': 10.714089094772339, 'alpha': 0.664912307472229, 'critic_loss': 173.22180839874267, 'actor_loss': 7.921717283554077, 'time_step': 0.02887682159423828} step=6250
2022-07-20 12:19.45 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720121641/model_6250.pt



2022-07-20 12:21.17 [info     ] CQL_20220720121641: epoch=3 step=9375 epoch=3 metrics={'time_sample_batch': 0.0008290359497070312, 'time_algorithm_update': 0.02792409049987793, 'temp_loss': -12.882692225341797, 'temp': 1.8531761660766601, 'alpha_loss': 9.175192369127274, 'alpha': 0.4981367278575897, 'critic_loss': 89.52649560379028, 'actor_loss': 56.60678814758301, 'time_step': 0.02910247184753418} step=9375
2022-07-20 12:21.17 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720121641/model_9375.pt




Users: 6038, items: 3683



20-Jul-22 12:21:54, replay, INFO: The model is neural network with non-distributed training


2022-07-20 12:22.05 [debug    ] RoundIterator is selected.
2022-07-20 12:22.05 [info     ] Directory is created at d3rlpy_logs/CQL_20220720122205
2022-07-20 12:22.05 [debug    ] Building models...
2022-07-20 12:22.05 [debug    ] Models have been built.
2022-07-20 12:22.05 [info     ] Parameters are saved to d3rlpy_logs/CQL_20220720122205/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0001, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_learning_rate': 0.0001, 'alpha_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'alpha_threshold': 10.0, 'batch_size': 256, 'conservative_weight': 5.0, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rat


2022-07-20 12:23.37 [info     ] CQL_20220720122205: epoch=1 step=3125 epoch=1 metrics={'time_sample_batch': 0.0008277806091308593, 'time_algorithm_update': 0.02787920051574707, 'temp_loss': -9.08631777671814, 'temp': 1.1177906085968017, 'alpha_loss': 10.754783372989595, 'alpha': 0.8826592384719849, 'critic_loss': 208.8216959333801, 'actor_loss': 25.362686252818108, 'time_step': 0.02904980079650879} step=3125
2022-07-20 12:23.37 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720122205/model_3125.pt



2022-07-20 12:25.09 [info     ] CQL_20220720122205: epoch=2 step=6250 epoch=2 metrics={'time_sample_batch': 0.0008410037994384766, 'time_algorithm_update': 0.027957157592773438, 'temp_loss': -10.098542334995269, 'temp': 1.426639111251831, 'alpha_loss': 11.305705852012634, 'alpha': 0.6534831141662598, 'critic_loss': 103.22094306762695, 'actor_loss': 34.76674231964111, 'time_step': 0.029153145141601562} step=6250
2022-07-20 12:25.09 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720122205/model_6250.pt



2022-07-20 12:26.40 [info     ] CQL_20220720122205: epoch=3 step=9375 epoch=3 metrics={'time_sample_batch': 0.0007493074035644532, 'time_algorithm_update': 0.027661860275268554, 'temp_loss': -12.535439424095154, 'temp': 1.883900902442932, 'alpha_loss': 6.76906134115383, 'alpha': 0.5027898385715485, 'critic_loss': 183.44129998474122, 'actor_loss': 82.89440930419921, 'time_step': 0.028751477355957032} step=9375
2022-07-20 12:26.40 [info     ] Model parameters are saved to d3rlpy_logs/CQL_20220720122205/model_9375.pt




Users: 6038, items: 3683



| Model | HitRate@5 | MAP@5 | MRR@5 | NDCG@5 |
|---|---|---|---|---|
| CQL run 0 | 0.071547 | 0.007629 | 0.035351 | 0.015745 |
| CQL run 1 | 0.071878 | 0.008041 | 0.036563 | 0.016253 |
| CQL run 2 | 0.070719 | 0.007742 | 0.034943 | 0.015793 |
| CQL run 3 | 0.062935 | 0.006969 | 0.032461 | 0.014182 |
| CQL run 4 | 0.072209 | 0.007946 | 0.035942 | 0.016215 |


In [9]:
res_cql_5 = e_cql.results.mean().to_frame().T.rename(index={0:f'CQL avg {n_runs}'})
res_cql_5

| Model | HitRate@5 | MAP@5 | MRR@5 | NDCG@5 |
|---|---|---|---|---|
| CQL avg 5 | 0.069858 | 0.007665 | 0.035052 | 0.015637 |
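
The average alone hides the run-to-run spread (run 3 is noticeably lower than the others). As a hedged sketch, the mean and sample standard deviation can be reported together. The `runs` frame below is a hypothetical stand-in that re-enters the per-run numbers printed above; in the notebook itself `e_cql.results` could be used instead.

```python
import pandas as pd

# Per-run metrics copied from the CQL results table above.
runs = pd.DataFrame(
    {
        "HitRate@5": [0.071547, 0.071878, 0.070719, 0.062935, 0.072209],
        "MAP@5": [0.007629, 0.008041, 0.007742, 0.006969, 0.007946],
        "MRR@5": [0.035351, 0.036563, 0.034943, 0.032461, 0.035942],
        "NDCG@5": [0.015745, 0.016253, 0.015793, 0.014182, 0.016215],
    },
    index=[f"CQL run {i}" for i in range(5)],
)

# One row per metric, with the mean and the sample standard deviation over runs.
summary = runs.agg(["mean", "std"]).T
print(summary)
```

Comparing the `std` column with the gap to the best baseline gives a rough idea of whether a difference between models is larger than the noise between CQL runs.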


### Results

In [10]:
pd.concat([e_base.results, res_cql_5]).sort_values('NDCG@5', ascending=False)

| Model | HitRate@5 | MAP@5 | MRR@5 | NDCG@5 |
|---|---|---|---|---|
| CQL avg 5 | 0.069858 | 0.007665 | 0.035052 | 0.015637 |
| KNN | 0.0684 | 0.007664 | 0.035544 | 0.015534 |
| ALS | 0.063928 | 0.006308 | 0.029549 | 0.013422 |
| SLIM | 0.054323 | 0.005315 | 0.02487 | 0.011333 |
| LightFM | 0.051507 | 0.005164 | 0.023777 | 0.010924 |
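
To put the top two rows in perspective, the following sketch computes the relative difference between the averaged CQL result and KNN for each metric. The `results` frame is a hypothetical stand-in that re-enters the two rows from the table above; it does not rerun any model.

```python
import pandas as pd

# The two best rows of the final table, copied verbatim.
results = pd.DataFrame(
    {
        "HitRate@5": [0.069858, 0.068400],
        "MAP@5": [0.007665, 0.007664],
        "MRR@5": [0.035052, 0.035544],
        "NDCG@5": [0.015637, 0.015534],
    },
    index=["CQL avg 5", "KNN"],
)

# Relative difference of CQL over KNN in percent; positive means CQL is higher.
rel_diff_pct = (results.loc["CQL avg 5"] / results.loc["KNN"] - 1.0) * 100
print(rel_diff_pct.round(2))
```

On these numbers CQL is slightly ahead of KNN on HitRate@5 and NDCG@5 and slightly behind on MRR@5, so the ranking depends on the metric chosen for sorting.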
