# Wrapping Offline RL as Part of the AutoMLPipeline Workflow

- Paulito Palmes
- IBM Research Europe
- Dublin Research Lab

## Preliminaries


### Online RL vs Offline RL

### Online RL

<img src="rl.png" alt="online RL" width="800" />

- the agent maximizes the return: the discounted sum of current and future rewards
- each observation is one step of a trajectory, i.e. a sequence of <s,a,r> tuples
- each action influences future observations and the rewards accumulated afterwards
- in a typical ML problem the objective is a single, one-off prediction; an RL agent instead makes a sequence of decisions as new observations arrive and optimizes the accumulated reward

ref: https://rail.eecs.berkeley.edu/deeprlcourse/
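The return in the first bullet can be computed backwards along a trajectory, using G_t = r_t + γ G_{t+1}. The following is a minimal sketch. The function name `discounted_returns` and the discount factor γ = 0.9 are illustrative choices, not part of AutoOfflineRL.

```julia
# Discounted return G_t = r_t + γ r_{t+1} + γ² r_{t+2} + ... for every step t
# of one trajectory, computed in a single backward pass.
function discounted_returns(rewards::AbstractVector{<:Real}, γ::Real)
    G = zeros(Float64, length(rewards))
    running = 0.0
    for t in length(rewards):-1:1
        running = rewards[t] + γ * running   # G_t = r_t + γ G_{t+1}
        G[t] = running
    end
    return G
end

G = discounted_returns([1.0, 0.0, 2.0], 0.9)
println(G)   # G[1] = 1 + 0.9 * 0 + 0.81 * 2 ≈ 2.62
```

A larger γ weights distant rewards more heavily. With γ = 0 the return reduces to the immediate reward.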

### Offline RL

<img src="offlinerl.png" alt="offline RL" width="1000" />

- in online (on-policy) RL, new data is collected for every policy update
- in off-policy RL, some previously collected data is retained (e.g. in a replay buffer) and used together with new data for policy updates
- in offline RL, all data is collected in advance; the agent learns a policy by sampling from this fixed dataset, without further interaction with the environment

ref: https://rail.eecs.berkeley.edu/deeprlcourse/
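In offline RL, the training loop differs from online RL mainly in where the mini-batches come from: a fixed dataset instead of fresh interaction. The sketch below illustrates this under two assumptions. The `Transition` type and uniform sampling with replacement are illustrative only. Libraries such as d3rlpy, which AutoOfflineRL uses for training, provide their own dataset classes.

```julia
using Random

# Hypothetical container for one logged transition (s, a, r, s_next).
struct Transition
    s::Vector{Float64}
    a::Int
    r::Float64
    s_next::Vector{Float64}
end

# Draw a mini-batch uniformly (with replacement) from the fixed dataset.
# Nothing is ever appended to `data`: there is no environment interaction.
sample_batch(rng::AbstractRNG, data::Vector{Transition}, n::Int) =
    data[rand(rng, 1:length(data), n)]

rng = MersenneTwister(0)
data = [Transition(rand(rng, 3), rand(rng, 1:2), rand(rng), rand(rng, 3)) for _ in 1:100]

for step in 1:5
    batch = sample_batch(rng, data, 32)
    # ... a policy / Q-function update on `batch` would go here ...
end
```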

### Why Offline RL

- Cost: repeatedly interacting with and exploring some environments, such as workload or resource management in the cloud, can be too expensive, whereas collecting logs and statistics for offline learning is cheap
- Risk: learning by trial and error can be dangerous in domains such as autonomous driving and robotic operations
- Technology: advances in batch learning with deep learning architectures that scale effectively to large datasets

### Major Objective

Given a dataset of <state, action, reward> trajectories, create an AutoMLPipeline wrapper for offline RL agents. The goal is to make searching for the best data-processing pipeline for an offline RL application trivial.

### Load packages 

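```julia
using Distributed

# start local worker processes (by default one per logical CPU) if none are running
nprocs() == 1 && addprocs()

# load the packages on the master process and on every worker
@everywhere begin
   using AutoOfflineRL
   using AutoMLPipeline
   using Parquet
   using DataFrames
end
```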

### Load Preprocessing Elements and Learners

```julia
@everywhere begin
   #### scalers
   rb   = SKPreprocessor("RobustScaler")
   pt   = SKPreprocessor("PowerTransformer")
   norm = SKPreprocessor("Normalizer")
   mx   = SKPreprocessor("MinMaxScaler")
   std  = SKPreprocessor("StandardScaler")   # note: this name conflicts with Statistics.std if Statistics is loaded
   #### column selectors
   catf = CatFeatureSelector()
   numf = NumFeatureSelector()
   #### feature extractors
   pca  = SKPreprocessor("PCA")
   fa   = SKPreprocessor("FactorAnalysis")
   ica  = SKPreprocessor("FastICA")
   noop = Identity(Dict(:name => "Noop"))    # pass-through: no scaling / no extraction
   #### ML/RL agents
   rf   = RandomForest()
   tree = PrunedTree()
   dqn  = DiscreteRLOffline("DQN")
   sac  = DiscreteRLOffline("DiscreteSAC")
end
```
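The scalers above operate along different axes. RobustScaler, PowerTransformer, MinMaxScaler and StandardScaler transform each column (feature) independently. scikit-learn's Normalizer instead rescales each row (sample) to unit norm, L2 by default. The plain-Julia sketch below contrasts the two directions. The helper names are hypothetical, and the sketch does not guard against constant columns, which would produce NaN here; scikit-learn handles that case.

```julia
# Column-wise min-max scaling to [0, 1] (MinMaxScaler's default range)
minmax_cols(X) = (X .- minimum(X; dims=1)) ./ (maximum(X; dims=1) .- minimum(X; dims=1))

# Row-wise L2 normalization (scikit-learn Normalizer's default norm)
l2_rows(X) = X ./ sqrt.(sum(abs2, X; dims=2))

# sensor1..sensor3 from the first three rows of the dataset loaded below
X = [  7.0 110.0 25.0;
     444.0  50.0 56.0;
       9.0 138.0 61.0]

println(minmax_cols(X))  # every column now spans exactly [0, 1]
println(l2_rows(X))      # every row now has Euclidean length 1
```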

### Load Offline Dataset

```julia
# read the sample dataset shipped with AutoOfflineRL and drop rows with missing values
path = pkgdir(AutoOfflineRL)
dataset = "$path/data/smalldata.parquet"
df = Parquet.read_parquet(dataset) |> DataFrame |> dropmissing
first(df,10)
```

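| Row | day | hour | minute | dow | sensor1 | sensor2 | sensor3 | action | reward |
|----:|----:|-----:|-------:|----:|--------:|--------:|--------:|-------:|-------:|
|     | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Float64 |
| 1 | 1 | 0 | 0 | 2 | 7 | 110 | 25 | 10 | 0.838122 |
| 2 | 1 | 0 | 0 | 2 | 444 | 50 | 56 | 10 | 0.639387 |
| 3 | 1 | 0 | 0 | 2 | 9 | 138 | 61 | 100 | 0.416196 |
| 4 | 1 | 0 | 0 | 2 | 365 | 51 | 26 | 10 | 0.384344 |
| 5 | 1 | 0 | 0 | 2 | 312 | 129 | 42 | 10 | 0.37681 |
| 6 | 1 | 0 | 0 | 2 | 295 | 141 | 58 | 10 | 0.641975 |
| 7 | 1 | 0 | 1 | 2 | 92 | 18 | 46 | 10 | 0.225288 |
| 8 | 1 | 0 | 1 | 2 | 77 | 23 | 4 | 50 | 0.335901 |
| 9 | 1 | 0 | 1 | 2 | 131 | 61 | 13 | 100 | 0.993489 |
| 10 | 1 | 0 | 1 | 2 | 473 | 1 | 99 | 100 | 0.402383 |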


### Convert dataframe to MDP dataset

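```julia
srow,_ = size(df)

# rewards and actions as single-column data frames
reward = df[:,["reward"]] |> deepcopy |> DataFrame
action = df[:,["action"]] |> deepcopy |> DataFrame

# the dataset has no terminal column, so mark artificial episode ends
# at rows 100, 1100, ..., 8100 and at the last row
_terminals = zeros(Int,srow)
_terminals[collect(100:1000:9000)] .= 1
_terminals[end] = 1
terminaldf = DataFrame(terminal=_terminals)

# observations: time features plus sensor readings
observation = df[:, ["day", "hour", "minute", "dow", "sensor1", "sensor2", "sensor3"]]
# actions, rewards and terminal flags, bundled in the form passed to crossvalidateRL below
action_reward_terminal = DataFrame[action, reward, terminaldf];
```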
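The terminal flags split the logged rows into episodes. Each 1 closes an episode, and the next row starts a new one. The indexing above requires `srow` to be at least 8100. Under that assumption, the notebook's flags at rows 100, 1100, …, 8100 plus the last row yield ten episodes: a first one of 100 rows, eight of 1000 rows and a final remainder. The hypothetical helper below recovers those row ranges from any 0/1 terminal vector.

```julia
# Return the row ranges of the episodes delimited by a 0/1 terminal vector
# (1 marks the last step of an episode). Rows after the final 1 are ignored.
function episode_ranges(terminals::AbstractVector{<:Integer})
    ends = findall(==(1), terminals)
    starts = [1; ends[1:end-1] .+ 1]
    return [s:e for (s, e) in zip(starts, ends)]
end

t = zeros(Int, 10)
t[[3, 7]] .= 1
t[end] = 1
episode_ranges(t)   # [1:3, 4:7, 8:10]
```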

### Recap: the AutoMLPipeline Workflow

```julia
# select the numeric columns, then standardize them
mypipeline = numf |> std
tr = fit_transform!(mypipeline,observation)
first(tr,3)
```

| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 |
|----:|---:|---:|---:|---:|---:|---:|---:|
|     | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | -0.396769 | -1.38433 | -1.69854 | -0.396769 | -1.68999 | 0.172168 | -0.878338 |
| 2 | -0.396769 | -1.38433 | -1.69854 | -0.396769 | 1.34805 | -0.86808 | 0.200461 |
| 3 | -0.396769 | -1.38433 | -1.69854 | -0.396769 | -1.67609 | 0.657617 | 0.374461 |
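StandardScaler computes the column-wise z-score: it subtracts each column's mean and divides by its standard deviation. scikit-learn uses the population (ddof = 0) standard deviation for this. Below is a small plain-Julia sketch of the same transformation; `zscore_cols` is a hypothetical name. Run it in a fresh session, because the notebook above rebinds `std` to the StandardScaler element.

```julia
using Statistics

# z-score each column; corrected=false gives the population standard deviation,
# matching scikit-learn's StandardScaler
zscore_cols(X) = (X .- mean(X; dims=1)) ./ std(X; dims=1, corrected=false)

X = [  7.0 110.0;
     444.0  50.0;
       9.0 138.0;
     365.0  51.0]
Z = zscore_cols(X)   # each column now has mean 0 and (population) standard deviation 1
```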


```julia
# add PCA feature extraction after standardization
mypipeline = numf |> std |> pca
tr = fit_transform!(mypipeline,observation)
first(tr,3)
```

| Row | x1 | x2 | x3 | x4 | x5 | x6 | x7 |
|----:|---:|---:|---:|---:|---:|---:|---:|
|     | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.162033 | 0.283571 | 0.320088 | -0.760166 | 2.38647 | 1.51367 | -4.96579e-17 |
| 2 | 0.158823 | -1.50436 | 1.51579 | -0.755993 | -0.22961 | 1.58652 | -3.16875e-17 |
| 3 | 0.163828 | 0.151378 | 0.754364 | 0.502795 | 2.33643 | 1.47229 | -4.6244e-17 |
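The seventh principal component `x7` is of order 1e-17, i.e. numerically zero. PCA produces such a component when the standardized inputs are linearly dependent, so that one direction carries no variance. In the rows shown earlier, `x1` (day) and `x4` (dow) coincide, which hints at such a dependence, though the dataset itself is not checked here. The sketch below uses made-up data with one column defined as an affine function of another. The eigenvalues of the correlation matrix (the PCA variances of standardized data) then include one that is zero up to round-off.

```julia
using LinearAlgebra, Statistics

# made-up data: `dow` is an exact affine function of `day`
day   = [1.0, 1.0, 2.0, 3.0, 3.0, 4.0]
dow   = day .+ 1.0
noise = [0.3, -1.2, 0.8, 0.1, -0.5, 0.9]
X = hcat(day, dow, noise)

# variances along the principal axes of the standardized data, in ascending order
λ = eigvals(Symmetric(cor(X)))
println(λ)   # the smallest eigenvalue is zero up to floating-point round-off
```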


```julia
# supervised baseline: predict the logged action from the observation with a random forest
mypipeline = numf |> std |> pca |> rf
perf = crossvalidate(mypipeline,observation,action.action)
```

```text
fold: 1, 33.800000000000004
fold: 2, 31.2
fold: 3, 31.2
fold: 4, 31.5
fold: 5, 33.5
fold: 6, 33.033033033033036
fold: 7, 34.0
fold: 8, 32.300000000000004
fold: 9, 34.2
fold: 10, 34.9
errors: 0

(mean = 32.9633033033033, std = 1.3394702396340163, folds = 10, errors = 0)
```
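The summary line is the mean and the sample (n − 1) standard deviation of the ten fold scores. Recomputing both from the printed fold values reproduces it, as this small sketch shows.

```julia
using Statistics

# per-fold scores printed by crossvalidate above
folds = [33.800000000000004, 31.2, 31.2, 31.5, 33.5,
         33.033033033033036, 34.0, 32.300000000000004, 34.2, 34.9]

# Statistics.std uses the sample (n - 1) standard deviation by default
cvsummary = (mean = mean(folds), std = std(folds), folds = length(folds))
```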

### Check the Cross-Validation Performance of an NFQ Agent

```julia
# NFQ (neural fitted Q-iteration) agent on a MinMaxScaler |> PCA pipeline
nfq = DiscreteRLOffline("NFQ")
pipe = (numf |> mx |> pca) |> nfq
tderror = crossvalidateRL(pipe,observation,action_reward_terminal)
```


4.7952334405060536e23


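The TD (temporal-difference) error measures the gap between the agent's current Q-value estimate and its bootstrapped target value; lower is better. A value of about 4.8e23 suggests that the NFQ value estimates diverged on this pipeline.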
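For a Q-learning-style agent, the target of a transition (s, a, r, s′) is y = r + γ max_a′ Q(s′, a′), or just r at a terminal step. The TD error is then δ = y − Q(s, a). The tabular sketch below averages δ² over a batch. The exact quantity `crossvalidateRL` reports (squared or absolute error, which network produces the target) is an assumption here and is not confirmed by this notebook.

```julia
# Q is a (num_states × num_actions) table; each transition is (s, a, r, s_next, done).
function mean_squared_td_error(Q::AbstractMatrix, transitions, γ::Real)
    total = 0.0
    for (s, a, r, s_next, done) in transitions
        target = done ? r : r + γ * maximum(Q[s_next, :])   # bootstrapped target y
        δ = target - Q[s, a]                                 # TD error of this step
        total += δ^2
    end
    return total / length(transitions)
end

Q = [0.0 1.0;
     0.5 0.2]
transitions = [(1, 2, 0.5, 2, false), (2, 1, 1.0, 1, true)]
mean_squared_td_error(Q, transitions, 0.9)   # ((0.95 - 1.0)^2 + (1.0 - 0.5)^2) / 2 = 0.12625
```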

### Check the Cross-Validation Performance of a DQN Agent

```julia
# DQN agent on a StandardScaler |> FastICA pipeline
dqn = DiscreteRLOffline("DQN")
pipe = (numf |> std |> ica) |> dqn
tderror = crossvalidateRL(pipe,observation,action_reward_terminal)
```


1.2257244226676562


With StandardScaler and FastICA, the DQN pipeline reaches a TD error of about 1.23, many orders of magnitude below the NFQ pipeline above.

### Search for the Best Offline RL Pipeline in Parallel

```julia
function pipelinesearch()
   agentnames = ["DiscreteCQL","NFQ","DoubleDQN","DiscreteSAC","DiscreteBCQ","DiscreteBC","DQN"]
   scalers    = [rb,pt,norm,std,mx,noop]
   extractors = [pca,ica,fa,noop]
   # evaluate every (agent, scaler, extractor) combination across the workers;
   # each iteration returns a one-row DataFrame (or an empty one) and vcat stacks them
   dfresults = @sync @distributed (vcat) for agentname in agentnames
      @distributed (vcat) for sc in scalers
         @distributed (vcat) for xt in extractors
            try
               # a single training epoch per fold keeps the search cheap
               rlagent = DiscreteRLOffline(agentname,Dict(:runtime_args=>Dict(:n_epochs=>1)))
               rlpipeline = (numf |> sc |> xt) |> rlagent
               res = crossvalidateRL(rlpipeline,observation,action_reward_terminal)
               # drop the last 4 characters of each name (the random suffix added to element names)
               scn = sc.name[1:end - 4]; xtn = xt.name[1:end - 4]; lrn = rlagent.name[1:end - 4]
               pname = "$scn |> $xtn |> $lrn"
               if !isnan(res)
                  DataFrame(pipeline=pname,td_error=res)
               else
                  DataFrame()
               end
            catch e
               println("error in $agentname")
               DataFrame()
            end
         end
      end
   end
   # sort!(dfresults, :td_error)   # optional: rank pipelines by TD error, lowest first
   return dfresults
end
dftable = pipelinesearch()
```
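The nested `@distributed` loops above spread the search over 7 agents × 6 scalers × 4 extractors = 168 pipelines. An alternative, not the notebook's approach, is to flatten the combinations into one list and let `pmap` hand them out one at a time to whichever worker is free. The sketch below shows that pattern. `fake_score` is a hypothetical stand-in for building the pipeline and calling `crossvalidateRL`. With workers added, it would have to be defined with `@everywhere`; without workers, `pmap` runs on the current process.

```julia
using Distributed

# hypothetical stand-in for crossvalidateRL on (numf |> sc |> xt) |> agent
fake_score(agent, sc, xt) = float(length(agent) + length(sc) + length(xt))

agentnames     = ["DiscreteCQL","NFQ","DoubleDQN","DiscreteSAC","DiscreteBCQ","DiscreteBC","DQN"]
scalernames    = ["RobustScaler","PowerTransformer","Normalizer","StandardScaler","MinMaxScaler","Noop"]
extractornames = ["PCA","FastICA","FactorAnalysis","Noop"]

# one flat list of (agent, scaler, extractor) triples: 7 * 6 * 4 = 168
combos = vec(collect(Iterators.product(agentnames, scalernames, extractornames)))

results = pmap(combos) do (agent, sc, xt)
    name = "$sc |> $xt |> $agent"
    try
        (pipeline = name, td_error = fake_score(agent, sc, xt))
    catch
        (pipeline = name, td_error = NaN)   # mark failed runs instead of aborting the search
    end
end

# drop failed runs and rank by TD error, lowest first
ranked = sort(filter(r -> !isnan(r.td_error), results); by = r -> r.td_error)
println(first(ranked, 3))
```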

      From worker 2:	2023-06-21 09:17:39 [debug    ] RoundIterator is selected.
      From worker 2:	2023-06-21 09:17:39 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091739
      From worker 2:	2023-06-21 09:17:39 [debug    ] Fitting scaler...              scaler=min_max
      From worker 2:	2023-06-21 09:17:39 [debug    ] Building models...
      From worker 2:	2023-06-21 09:17:39 [debug    ] Models have been built.
      From worker 2:	2023-06-21 09:17:39 [info     ] Parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091739/params.json params={'action_scaler': None, 'alpha': 1.0, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory

      From worker 6:	2023-06-21 09:17:39 [debug    ] RoundIterator is selected.
      From worker 6:	2023-06-21 09:17:39 [info     ] Directory is created at d3rlpy_logs/DiscreteBCQ_20230621091739
      From worker 6:	2023-06-21 09:17:39 [debug    ] Fitting scaler...              scaler=min_max
      From worker 7:	2023-06-21 09:17:39 [debug    ] RoundIterator is selected.
      From worker 7:	2023-06-21 09:17:39 [info     ] Directory is created at d3rlpy_logs/DiscreteBC_20230621091739
      From worker 7:	2023-06-21 09:17:39 [debug    ] Fitting scaler...              scaler=min_max
      From worker 6:	2023-06-21 09:17:39 [debug    ] Building models...
      From worker 7:	2023-06-21 09:17:39 [debug    ] Building models...
      From worker 6:	2023-06-21 09:17:39 [debug    ] Models have been built.
      From worker 7:	2023-06-21 09:17:39 [debug    ] Models have been built.
      From worker 6:	2023-06-21 09:17:39 [info     ] Parameters are saved to d3rlpy_logs/DiscreteBCQ_202306210917

      From worker 4:	2023-06-21 09:17:40 [debug    ] RoundIterator is selected.
      From worker 4:	2023-06-21 09:17:40 [info     ] Directory is created at d3rlpy_logs/DoubleDQN_20230621091740
      From worker 4:	2023-06-21 09:17:40 [debug    ] Fitting scaler...              scaler=min_max
      From worker 4:	2023-06-21 09:17:40 [debug    ] Building models...
      From worker 4:	2023-06-21 09:17:40 [debug    ] Models have been built.
      From worker 4:	2023-06-21 09:17:40 [info     ] Parameters are saved to d3rlpy_logs/DoubleDQN_20230621091740/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean'

      From worker 4:	2023-06-21 09:17:41 [info     ] DoubleDQN_20230621091740: epoch=1 step=190 epoch=1 metrics={'time_sample_batch': 4.182489294754831e-05, 'time_algorithm_update': 0.001713250812731291, 'loss': 0.08083449318808944, 'time_step': 0.0017797846543161492, 'td_error': 0.3419192071373942} step=190
      From worker 4:	2023-06-21 09:17:41 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_20230621091740/model_190.pt
      From worker 4:	2023-06-21 09:17:41 [debug    ] RoundIterator is selected.
      From worker 4:	2023-06-21 09:17:41 [info     ] Directory is created at d3rlpy_logs/DoubleDQN_20230621091741
      From worker 4:	2023-06-21 09:17:41 [debug    ] Fitting scaler...              scaler=min_max
      From worker 6:	2023-06-21 09:17:41 [debug    ] RoundIterator is selected.
      From worker 6:	2023-06-21 09:17:41 [info     ] Directory is created at d3rlpy_logs/DiscreteBCQ_20230621091741
      From worker 6:	2023-06-21 09:17:41 [debug    ] Fitting scaler.

      From worker 2:	2023-06-21 09:17:41 [debug    ] Building models...
      From worker 2:	2023-06-21 09:17:41 [debug    ] Models have been built.
      From worker 3:	2023-06-21 09:17:41 [debug    ] RoundIterator is selected.
      From worker 2:	2023-06-21 09:17:41 [info     ] Parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091741/params.json params={'action_scaler': None, 'alpha': 1.0, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': {'type': 'min_max', 'params': {'maximum': array([[1.7241911, 1.5049787, 1.6107398, 1.5126448, 1.7445312, 0

      From worker 4:	error in DiscreteBC
      From worker 4:	2023-06-21 09:17:43 [debug    ] RoundIterator is selected.
      From worker 4:	2023-06-21 09:17:43 [info     ] Directory is created at d3rlpy_logs/DiscreteBC_20230621091743
      From worker 4:	2023-06-21 09:17:43 [debug    ] Fitting scaler...              scaler=min_max
      From worker 4:	2023-06-21 09:17:43 [debug    ] Building models...
      From worker 4:	2023-06-21 09:17:43 [debug    ] Models have been built.
      From worker 4:	2023-06-21 09:17:43 [info     ] Parameters are saved to d3rlpy_logs/DiscreteBC_20230621091743/params.json params={'action_scaler': None, 'batch_size': 100, 'beta': 0.5, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 1.0, 'generated_maxlen': 100000, 'learning_rate': 0.001, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': Fals

Epoch 1/1: 100%|███████████████████| 78/78 [00:00<00:00, 838.04it/s, loss=2.26]
Epoch 1/1: 100%|███████████████| 218/218 [00:00<00:00, 799.78it/s, loss=0.0928]
      From worker 2:	2023-06-21 09:17:44 [info     ] DQN_20230621091744: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 3.022557004876093e-05, 'time_algorithm_update': 0.001195727138344301, 'loss': 0.09131083575102987, 'time_step': 0.0012449113600844638, 'td_error': 0.33717236146601487} step=218
      From worker 2:	2023-06-21 09:17:44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_20230621091744/model_218.pt
      From worker 2:	2023-06-21 09:17:44 [debug    ] RoundIterator is selected.
      From worker 5:	error in DiscreteBC
      From worker 5:	2023-06-21 09:17:44 [debug    ] RoundIterator is selected.
      From worker 4:	2023-06-21 09:17:45 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091745
      From worker 4:	2023-06-21 09:17:45 [debug    ] Fitting scaler...              scaler=min_max


      From worker 3:	2023-06-21 09:17:47 [debug    ] RoundIterator is selected.
Epoch 1/1: 100%|███████████████| 246/246 [00:00<00:00, 895.66it/s, loss=0.0896]
      From worker 2:	2023-06-21 09:17:47 [info     ] DQN_20230621091747: epoch=1 step=246 epoch=1 metrics={'time_sample_batch': 2.6294855567497934e-05, 'time_algorithm_update': 0.0010709723805993553, 'loss': 0.0887927300319439, 'time_step': 0.0011126607414183578, 'td_error': 0.3539537419557423} step=246
      From worker 2:	2023-06-21 09:17:47 [info     ] Model parameters are saved to d3rlpy_logs/DQN_20230621091747/model_246.pt
      From worker 2:	2023-06-21 09:17:47 [debug    ] RoundIterator is selected.
      From worker 3:	2023-06-21 09:17:48 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091748
      From worker 3:	2023-06-21 09:17:48 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:17:48 [debug    ] Building models...
      From worker 3:	2023-06-21 09:17:48 [debug  

Epoch 1/1: 100%|███████████████| 218/218 [00:00<00:00, 914.30it/s, loss=0.0887]
      From worker 5:	2023-06-21 09:17:52 [info     ] DQN_20230621091752: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 2.3355177783091135e-05, 'time_algorithm_update': 0.0010528805059030516, 'loss': 0.08727689919600246, 'time_step': 0.0010901317683928605, 'td_error': 0.33895740038788924} step=218
      From worker 5:	2023-06-21 09:17:52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_20230621091752/model_218.pt
      From worker 5:	2023-06-21 09:17:52 [debug    ] RoundIterator is selected.
      From worker 5:	2023-06-21 09:17:53 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091753
      From worker 5:	2023-06-21 09:17:53 [debug    ] Fitting scaler...              scaler=min_max
      From worker 5:	2023-06-21 09:17:53 [debug    ] Building models...
      From worker 5:	2023-06-21 09:17:53 [debug    ] Models have been built.
      From worker 5:	2023-06-21 09:17:53 [info    

Epoch 1/1: 100%|█| 95/95 [00:00<00:00, 231.40it/s, temp_loss=-.711, temp=1, cri
      From worker 7:	2023-06-21 09:17:55 [info     ] DiscreteSAC_20230621091754: epoch=1 step=95 epoch=1 metrics={'time_sample_batch': 5.0424274645353616e-05, 'time_algorithm_update': 0.0042391827231959296, 'temp_loss': -0.7389284298853263, 'temp': 1.0043758367237292, 'critic_loss': 4.360287598559731, 'actor_loss': -4.9657077889693415, 'time_step': 0.004313064876355623, 'td_error': 0.6025613924128718} step=95
      From worker 7:	2023-06-21 09:17:55 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteSAC_20230621091754/model_95.pt
      From worker 7:	2023-06-21 09:17:55 [debug    ] RoundIterator is selected.
      From worker 7:	2023-06-21 09:17:55 [info     ] Directory is created at d3rlpy_logs/DiscreteSAC_20230621091755
      From worker 7:	2023-06-21 09:17:55 [debug    ] Fitting scaler...              scaler=min_max
      From worker 7:	2023-06-21 09:17:55 [debug    ] Building models...
      

Epoch 1/1: 100%|███████████████| 218/218 [00:00<00:00, 877.64it/s, loss=0.0968]
      From worker 3:	2023-06-21 09:17:56 [info     ] DQN_20230621091756: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 2.6084961147483336e-05, 'time_algorithm_update': 0.001093139342211802, 'loss': 0.0951381272918314, 'time_step': 0.0011351053867865047, 'td_error': 0.3471383628173835} step=218
      From worker 3:	2023-06-21 09:17:56 [info     ] Model parameters are saved to d3rlpy_logs/DQN_20230621091756/model_218.pt
      From worker 3:	2023-06-21 09:17:56 [debug    ] RoundIterator is selected.
      From worker 2:	2023-06-21 09:17:56 [info     ] Directory is created at d3rlpy_logs/DiscreteBC_20230621091756
      From worker 2:	2023-06-21 09:17:56 [debug    ] Fitting scaler...              scaler=min_max
      From worker 2:	2023-06-21 09:17:56 [debug    ] Building models...
      From worker 2:	2023-06-21 09:17:56 [debug    ] Models have been built.
      From worker 2:	2023-06-21 09:17:56 [info

Epoch 1/1: 100%|███████████████| 218/218 [00:00<00:00, 698.20it/s, loss=0.0834]
Epoch 1/1: 100%|█| 109/109 [00:00<00:00, 213.08it/s, temp_loss=-.588, temp=0.99
      From worker 3:	2023-06-21 09:17:57 [info     ] DQN_20230621091757: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 3.408729483228211e-05, 'time_algorithm_update': 0.0013705765435455043, 'loss': 0.082053680357378, 'time_step': 0.0014254740618784493, 'td_error': 0.3401573140539434} step=218
      From worker 3:	2023-06-21 09:17:57 [info     ] Model parameters are saved to d3rlpy_logs/DQN_20230621091757/model_218.pt
      From worker 3:	2023-06-21 09:17:57 [debug    ] RoundIterator is selected.
      From worker 2:	2023-06-21 09:17:57 [info     ] DiscreteSAC_20230621091757: epoch=1 step=109 epoch=1 metrics={'time_sample_batch': 5.6034928068108514e-05, 'time_algorithm_update': 0.004601797926316567, 'temp_loss': -0.5967188704034758, 'temp': 0.9934305532262959, 'critic_loss': 4.554059888244769, 'actor_loss': -4.8859213207

      From worker 2:	2023-06-21 09:17:59 [debug    ] RoundIterator is selected.
Epoch 1/1: 100%|█| 95/95 [00:00<00:00, 226.11it/s, temp_loss=-.558, temp=0.991,
      From worker 3:	2023-06-21 09:17:59 [info     ] DiscreteSAC_20230621091759: epoch=1 step=95 epoch=1 metrics={'time_sample_batch': 5.3493600142629526e-05, 'time_algorithm_update': 0.004334371968319542, 'temp_loss': -0.6238991316726529, 'temp': 0.9914160069666411, 'critic_loss': 4.550972572753304, 'actor_loss': -4.8012980159960295, 'time_step': 0.004412668629696494, 'td_error': 0.3674936336468447} step=95
      From worker 3:	2023-06-21 09:17:59 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteSAC_20230621091759/model_95.pt
      From worker 3:	2023-06-21 09:17:59 [debug    ] RoundIterator is selected.
      From worker 6:	2023-06-21 09:17:59 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091759
      From worker 6:	2023-06-21 09:17:59 [debug    ] Fitting scaler...              scaler=min_max
      F

Epoch 1/1: 100%|█| 109/109 [00:00<00:00, 255.19it/s, temp_loss=-.523, temp=0.99
      From worker 6:	2023-06-21 09:18:01 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091801
      From worker 6:	2023-06-21 09:18:01 [debug    ] Fitting scaler...              scaler=min_max
      From worker 6:	2023-06-21 09:18:01 [debug    ] Building models...
      From worker 6:	2023-06-21 09:18:01 [debug    ] Models have been built.
      From worker 6:	2023-06-21 09:18:01 [info     ] Parameters are saved to d3rlpy_logs/DQN_20230621091801/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': 

Epoch 1/1: 100%|███████████████| 190/190 [00:00<00:00, 826.76it/s, loss=0.0938]
      From worker 4:	2023-06-21 09:18:03 [info     ] DQN_20230621091803: epoch=1 step=190 epoch=1 metrics={'time_sample_batch': 2.738927540026213e-05, 'time_algorithm_update': 0.0011602765635440224, 'loss': 0.09130570031702519, 'time_step': 0.0012044906616210938, 'td_error': 0.33305522559426054} step=190
      From worker 4:	2023-06-21 09:18:03 [info     ] Model parameters are saved to d3rlpy_logs/DQN_20230621091803/model_190.pt
      From worker 4:	2023-06-21 09:18:03 [debug    ] RoundIterator is selected.
      From worker 2:	2023-06-21 09:18:03 [debug    ] RoundIterator is selected.
      From worker 3:	2023-06-21 09:18:03 [info     ] Directory is created at d3rlpy_logs/DiscreteSAC_20230621091803
      From worker 3:	2023-06-21 09:18:03 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:18:03 [debug    ] Building models...
      From worker 3:	2023-06-21 09:18:03

      From worker 2:	2023-06-21 09:18:05 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091805
      From worker 2:	2023-06-21 09:18:05 [debug    ] Fitting scaler...              scaler=min_max
      From worker 2:	2023-06-21 09:18:05 [debug    ] Building models...
      From worker 2:	2023-06-21 09:18:05 [debug    ] Models have been built.
      From worker 2:	2023-06-21 09:18:05 [info     ] Parameters are saved to d3rlpy_logs/DQN_20230621091805/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': {

      From worker 4:	2023-06-21 09:18:07 [info     ] Parameters are saved to d3rlpy_logs/DiscreteBC_20230621091807/params.json params={'action_scaler': None, 'batch_size': 100, 'beta': 0.5, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 1.0, 'generated_maxlen': 100000, 'learning_rate': 0.001, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': {'type': 'min_max', 'params': {'maximum': array([[1., 1., 1., 1., 1., 1., 1.]], dtype=float32), 'minimum': array([[0., 0., 0., 0., 0., 0., 0.]], dtype=float32)}}, 'use_gpu': None, 'algorithm': 'DiscreteBC', 'observation_shape': (7,), 'action_size': 101}
Epoch 1/1: 100%|███████████████████| 78/78 [00:00<00:00, 884.13it/s, loss=2.33]
      From worker 4:	error in DiscreteBC2023-06-21 09:18:07 [debug    ] RoundIterator is selec

Epoch 1/1: 100%|█| 109/109 [00:00<00:00, 249.91it/s, temp_loss=-.473, temp=0.99
      From worker 6:	2023-06-21 09:18:08 [info     ] DiscreteSAC_20230621091808: epoch=1 step=109 epoch=1 metrics={'time_sample_batch': 4.713688421686855e-05, 'time_algorithm_update': 0.003924969139449093, 'temp_loss': -0.5328724964248409, 'temp': 0.9949889554889924, 'critic_loss': 4.685928180677082, 'actor_loss': -5.001051828401898, 'time_step': 0.003993285905330553, 'td_error': 0.34912000787258507} step=109
      From worker 6:	2023-06-21 09:18:08 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteSAC_20230621091808/model_109.pt
      From worker 6:	2023-06-21 09:18:08 [debug    ] RoundIterator is selected.
      From worker 3:	2023-06-21 09:18:09 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091809
      From worker 3:	2023-06-21 09:18:09 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:18:09 [debug    ] Building models...
      From wo

Epoch 1/1: 100%|███████████████| 218/218 [00:00<00:00, 900.34it/s, loss=0.0906]
      From worker 2:	2023-06-21 09:18:11 [info     ] DQN_20230621091810: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 2.4165582219395068e-05, 'time_algorithm_update': 0.0010677105789884515, 'loss': 0.08903334928044213, 'time_step': 0.0011068341928884523, 'td_error': 0.33816977977189183} step=218
      From worker 2:	2023-06-21 09:18:11 [info     ] Model parameters are saved to d3rlpy_logs/DQN_20230621091810/model_218.pt
      From worker 2:	2023-06-21 09:18:11 [debug    ] RoundIterator is selected.
      From worker 2:	2023-06-21 09:18:11 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091811
      From worker 2:	2023-06-21 09:18:11 [debug    ] Fitting scaler...              scaler=min_max
      From worker 2:	2023-06-21 09:18:11 [debug    ] Building models...
      From worker 2:	2023-06-21 09:18:11 [debug    ] Models have been built.
      From worker 2:	2023-06-21 09:18:11 [info    

Epoch 1/1: 100%|█████████████████| 218/218 [00:00<00:00, 562.95it/s, loss=3.58]
      From worker 7:	2023-06-21 09:18:12 [info     ] DiscreteCQL_20230621091812: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 3.067397196358497e-05, 'time_algorithm_update': 0.0017221323940732063, 'loss': 3.5521724431886588, 'time_step': 0.0017708004067797179, 'td_error': 0.4025814428985961} step=218
      From worker 7:	2023-06-21 09:18:12 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091812/model_218.pt
      From worker 7:	2023-06-21 09:18:12 [debug    ] RoundIterator is selected.
      From worker 3:	2023-06-21 09:18:12 [info     ] Directory is created at d3rlpy_logs/DiscreteSAC_20230621091812
      From worker 3:	2023-06-21 09:18:12 [debug    ] Fitting scaler...              scaler=min_max
      From worker 5:	2023-06-21 09:18:12 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091812
      From worker 5:	2023-06-21 09:18:12 [debug    ] Fitting scaler...

      From worker 6:	2023-06-21 09:18:13 [info     ] Directory is created at d3rlpy_logs/DiscreteSAC_20230621091813
      From worker 6:	2023-06-21 09:18:13 [debug    ] Fitting scaler...              scaler=min_max
      From worker 6:	2023-06-21 09:18:13 [debug    ] Building models...
      From worker 6:	2023-06-21 09:18:13 [debug    ] Models have been built.
      From worker 6:	2023-06-21 09:18:13 [info     ] Parameters are saved to d3rlpy_logs/DiscreteSAC_20230621091813/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0003, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 0.0001, 'weight_decay': 0, 'amsgrad': False}, 'batch_size': 64, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_learning_rate': 0.0003, 'critic_optim_factor

Epoch 1/1: 100%|█████████████████| 218/218 [00:00<00:00, 530.56it/s, loss=3.81]
      From worker 2:	2023-06-21 09:18:14 [info     ] DiscreteCQL_20230621091814: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 3.284817441887812e-05, 'time_algorithm_update': 0.0018273012353739607, 'loss': 3.778564421408767, 'time_step': 0.0018788105850919671, 'td_error': 0.3453880074458524} step=218
      From worker 2:	2023-06-21 09:18:14 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091814/model_218.pt
      From worker 2:	2023-06-21 09:18:14 [debug    ] RoundIterator is selected.
Epoch 1/1: 100%|█████████████| 246/246 [00:00<00:00, 752.75it/s, loss=1.06e+11]
      From worker 5:	2023-06-21 09:18:14 [info     ] DQN_20230621091814: epoch=1 step=246 epoch=1 metrics={'time_sample_batch': 3.2689513229742286e-05, 'time_algorithm_update': 0.0012699772671955387, 'loss': 104035806116.42276, 'time_step': 0.0013225572865183761, 'td_error': 7.85211915821232e+21} step=246

      From worker 2:	2023-06-21 09:18:15 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091815
      From worker 2:	2023-06-21 09:18:15 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:18:15 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091815
      From worker 3:	2023-06-21 09:18:15 [debug    ] Fitting scaler...              scaler=min_max
      From worker 2:	2023-06-21 09:18:15 [debug    ] Building models...
      From worker 2:	2023-06-21 09:18:15 [debug    ] Models have been built.
      From worker 3:	2023-06-21 09:18:15 [debug    ] Building models...
      From worker 2:	2023-06-21 09:18:15 [info     ] Parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091815/params.json params={'action_scaler': None, 'alpha': 1.0, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'l

      From worker 3:	2023-06-21 09:18:17 [info     ] Directory is created at d3rlpy_logs/DQN_20230621091817
      From worker 3:	2023-06-21 09:18:17 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:18:17 [debug    ] Building models...
      From worker 3:	2023-06-21 09:18:17 [debug    ] Models have been built.
      From worker 3:	2023-06-21 09:18:17 [info     ] Parameters are saved to d3rlpy_logs/DQN_20230621091817/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': {

Epoch 1/1: 100%|██████████████████| 218/218 [00:00<00:00, 346.58it/s, loss=3.7]
      From worker 2:	2023-06-21 09:18:17 [info     ] DiscreteBCQ_20230621091817: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 4.236523164521664e-05, 'time_algorithm_update': 0.0028101413621814974, 'loss': 3.6654750502437627, 'time_step': 0.0028769773080808306, 'td_error': 0.34223032280941124} step=218
      From worker 2:	2023-06-21 09:18:17 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteBCQ_20230621091817/model_218.pt
      From worker 2:	2023-06-21 09:18:17 [debug    ] RoundIterator is selected.
Epoch 1/1: 100%|█| 123/123 [00:00<00:00, 179.32it/s, temp_loss=-.57, temp=0.994
      From worker 4:	2023-06-21 09:18:17 [info     ] DiscreteSAC_20230621091817: epoch=1 step=123 epoch=1 metrics={'time_sample_batch': 6.329722520781727e-05, 'time_algorithm_update': 0.005471595903722252, 'temp_loss': -0.5742800933785919, 'temp': 0.9941157165581618, 'critic_loss': 4.289953095156972, 'actor_los

Epoch 1/1: 100%|███████████████████| 78/78 [00:00<00:00, 721.39it/s, loss=2.27]
Epoch 1/1: 100%|█████████████████| 218/218 [00:00<00:00, 515.78it/s, loss=3.98]
      From worker 5:	2023-06-21 09:18:19 [info     ] DiscreteCQL_20230621091818: epoch=1 step=218 epoch=1 metrics={'time_sample_batch': 3.342781591852871e-05, 'time_algorithm_update': 0.0018795695873575474, 'loss': 3.9530255116453956, 'time_step': 0.0019324689830115083, 'td_error': 0.3699182840698233} step=218
      From worker 5:	2023-06-21 09:18:19 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091818/model_218.pt
      From worker 5:	2023-06-21 09:18:19 [debug    ] RoundIterator is selected.
      From worker 5:	2023-06-21 09:18:19 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091819
      From worker 5:	2023-06-21 09:18:19 [debug    ] Fitting scaler...              scaler=min_max
      From worker 5:	2023-06-21 09:18:19 [debug    ] Building models...

      From worker 3:	2023-06-21 09:18:20 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091820
      From worker 3:	2023-06-21 09:18:20 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:18:20 [debug    ] Building models...
      From worker 3:	2023-06-21 09:18:20 [debug    ] Models have been built.
      From worker 3:	2023-06-21 09:18:20 [info     ] Parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091820/params.json params={'action_scaler': None, 'alpha': 1.0, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'rew

      From worker 5:	2023-06-21 09:18:21 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091821
      From worker 5:	2023-06-21 09:18:21 [debug    ] Fitting scaler...              scaler=min_max
      From worker 5:	2023-06-21 09:18:21 [debug    ] Building models...
      From worker 5:	2023-06-21 09:18:21 [debug    ] Models have been built.
      From worker 5:	2023-06-21 09:18:21 [info     ] Parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091821/params.json params={'action_scaler': None, 'alpha': 1.0, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'rew

Epoch 1/1: 100%|█| 109/109 [00:00<00:00, 205.43it/s, temp_loss=-4.59, temp=1.02
      From worker 5:	2023-06-21 09:18:22 [info     ] DiscreteSAC_20230621091822: epoch=1 step=109 epoch=1 metrics={'time_sample_batch': 5.916499216622169e-05, 'time_algorithm_update': 0.004772114097525221, 'temp_loss': -4.596942179793611, 'temp': 1.0167383994531194, 'critic_loss': 276650827249.90826, 'actor_loss': -487038654783.41284, 'time_step': 0.004856610516889379, 'td_error': 1.4968336996773644e+22} step=109
      From worker 5:	2023-06-21 09:18:22 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteSAC_20230621091822/model_109.pt
      From worker 5:	2023-06-21 09:18:22 [debug    ] RoundIterator is selected.
      From worker 3:	2023-06-21 09:18:22 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091822
      From worker 3:	2023-06-21 09:18:22 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:18:22 [debug    ] Building models...
 

      From worker 3:	2023-06-21 09:18:24 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091824
      From worker 3:	2023-06-21 09:18:24 [debug    ] Fitting scaler...              scaler=min_max
      From worker 3:	2023-06-21 09:18:24 [debug    ] Building models...
      From worker 3:	2023-06-21 09:18:24 [debug    ] Models have been built.
      From worker 3:	2023-06-21 09:18:24 [info     ] Parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091824/params.json params={'action_scaler': None, 'alpha': 1.0, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 6.25e-05, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'rew

      From worker 5:	2023-06-21 09:18:25 [debug    ] RoundIterator is selected.
Epoch 1/1: 100%|█████████████████| 246/246 [00:00<00:00, 573.03it/s, loss=3.69]
      From worker 3:	2023-06-21 09:18:25 [info     ] DiscreteCQL_20230621091825: epoch=1 step=246 epoch=1 metrics={'time_sample_batch': 2.762554137687373e-05, 'time_algorithm_update': 0.0016954652662199688, 'loss': 3.665842409056377, 'time_step': 0.0017403906922999436, 'td_error': 0.38482568770775694} step=246
      From worker 3:	2023-06-21 09:18:25 [info     ] Model parameters are saved to d3rlpy_logs/DiscreteCQL_20230621091825/model_246.pt
      From worker 3:	2023-06-21 09:18:25 [debug    ] RoundIterator is selected.
      From worker 5:	2023-06-21 09:18:26 [info     ] Directory is created at d3rlpy_logs/DiscreteCQL_20230621091826
      From worker 5:	2023-06-21 09:18:26 [debug    ] Fitting scaler...              scaler=min_max
      From worker 5:	2023-06-21 09:18:26 [debug    ] Building models...

Excessive output truncated after 524357 bytes.

      From worker 4:	2023-06-21 09:18:26 [info     ] Directory is created at d3rlpy_logs/DiscreteSAC_20230621091826
      From worker 4:	2023-06-21 09:18:26 [debug    ] Fitting scaler...              scaler=min_max
      From worker 4:	2023-06-21 09:18:27 [debug    ] Building models...
      From worker 4:	2023-06-21 09:18:27 [debug    ] Models have been built.
      From worker 4:	2023-06-21 09:18:27 [info     ] Parameters are saved to d3rlpy_logs/DiscreteSAC_20230621091826/params.json params={'action_scaler': None, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'actor_learning_rate': 0.0003, 'actor_optim_factory': {'optim_cls': 'Adam', 'betas': (0.9, 0.999), 'eps': 0.0001, 'weight_decay': 0, 'amsgrad': False}, 'batch_size': 64, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_learning_rate': 0.0003, 'critic_optim_factor

| Row | pipeline | td_error |
|----:|:---------|---------:|
| 1 | RobustScaler \|> PCA \|> DiscreteCQL | 0.348413 |
| 2 | RobustScaler \|> FastICA \|> DiscreteCQL | 0.375755 |
| 3 | PowerTransformer \|> PCA \|> DiscreteCQL | 0.360337 |
| 4 | PowerTransformer \|> FastICA \|> DiscreteCQL | 0.351133 |
| 5 | Normalizer \|> PCA \|> DiscreteCQL | 0.358127 |
| 6 | Normalizer \|> FastICA \|> DiscreteCQL | 0.386176 |
| 7 | Normalizer \|> Noop \|> DiscreteCQL | 0.37901 |
| 8 | StandardScaler \|> PCA \|> DiscreteCQL | 0.386691 |
| 9 | StandardScaler \|> Noop \|> DiscreteCQL | 0.381613 |
| 10 | MinMaxScaler \|> PCA \|> DiscreteCQL | 0.36608 |


### Results

In [99]:
# Rank pipelines by ascending TD error (lowest first)
sort!(dftable, :td_error, rev=false)
dftable

| Row | pipeline | td_error |
|----:|:---------|---------:|
| 1 | Noop \|> FastICA \|> DoubleDQN | 0.325789 |
| 2 | StandardScaler \|> FastICA \|> DQN | 0.329664 |
| 3 | PowerTransformer \|> FastICA \|> DQN | 0.330706 |
| 4 | PowerTransformer \|> FastICA \|> DiscreteSAC | 0.331318 |
| 5 | Normalizer \|> FastICA \|> DQN | 0.332194 |
| 6 | PowerTransformer \|> FastICA \|> DiscreteBCQ | 0.333078 |
| 7 | RobustScaler \|> FastICA \|> DQN | 0.33313 |
| 8 | StandardScaler \|> Noop \|> DoubleDQN | 0.33318 |
| 9 | StandardScaler \|> PCA \|> DoubleDQN | 0.333818 |
| 10 | RobustScaler \|> PCA \|> DoubleDQN | 0.334047 |
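
Each `pipeline` entry is a string of the form `scaler |> extractor |> learner`. A minimal sketch, assuming that three-stage string format, of splitting the stages apart and finding the best (lowest) TD error per learner with DataFrames.jl. The rows are the ten shown in the sorted table above, rebuilt by hand as a stand-in for the notebook's `dftable`; the column names `scaler`, `extractor`, `learner` and the variable `by_learner` are illustrative, not part of AutoOfflineRL.

```julia
using DataFrames

# Top-10 rows copied from the sorted table (stand-in for `dftable`)
df = DataFrame(
    pipeline = [
        "Noop |> FastICA |> DoubleDQN",
        "StandardScaler |> FastICA |> DQN",
        "PowerTransformer |> FastICA |> DQN",
        "PowerTransformer |> FastICA |> DiscreteSAC",
        "Normalizer |> FastICA |> DQN",
        "PowerTransformer |> FastICA |> DiscreteBCQ",
        "RobustScaler |> FastICA |> DQN",
        "StandardScaler |> Noop |> DoubleDQN",
        "StandardScaler |> PCA |> DoubleDQN",
        "RobustScaler |> PCA |> DoubleDQN",
    ],
    td_error = [0.325789, 0.329664, 0.330706, 0.331318, 0.332194,
                0.333078, 0.33313, 0.33318, 0.333818, 0.334047],
)

# Split "scaler |> extractor |> learner" into three hypothetical columns
stages = split.(df.pipeline, " |> ")
df.scaler    = getindex.(stages, 1)
df.extractor = getindex.(stages, 2)
df.learner   = getindex.(stages, 3)

# Lowest TD error and number of appearances per learner in the top 10
by_learner = combine(groupby(df, :learner),
                     :td_error => minimum => :best_td_error,
                     nrow => :count)
sort!(by_learner, :best_td_error)
println(by_learner)
```

On these rows, DoubleDQN and DQN each appear four times in the top 10, which is easier to see once the learner is its own column.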


##### Top 5 and bottom 5 pipelines by TD error

In [100]:
first(dftable,5)

| Row | pipeline | td_error |
|----:|:---------|---------:|
| 1 | Noop \|> FastICA \|> DoubleDQN | 0.325789 |
| 2 | StandardScaler \|> FastICA \|> DQN | 0.329664 |
| 3 | PowerTransformer \|> FastICA \|> DQN | 0.330706 |
| 4 | PowerTransformer \|> FastICA \|> DiscreteSAC | 0.331318 |
| 5 | Normalizer \|> FastICA \|> DQN | 0.332194 |


In [101]:
last(dftable,5)

| Row | pipeline | td_error |
|----:|:---------|---------:|
| 1 | Noop \|> FactorAnalysis \|> NFQ | 248772000000000.0 |
| 2 | PowerTransformer \|> Noop \|> DQN | 5.34366e+21 |
| 3 | PowerTransformer \|> Noop \|> DoubleDQN | 3.26964e+22 |
| 4 | RobustScaler \|> PCA \|> NFQ | 3.56353e+25 |
| 5 | RobustScaler \|> PCA \|> DiscreteSAC | 3.36113e+26 |
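
The bottom of the ranking contains pipelines whose TD error reached values of 1e+14 and beyond, consistent with the diverging losses visible in the training logs (e.g. `loss=1.06e+11` for a DQN run). A minimal sketch of separating such runs before comparing pipelines, using rows copied from the top-5 and bottom-5 tables. The cutoff `max_td = 1.0` and the helper `isconverged` are hypothetical choices for illustration, not values or functions from AutoOfflineRL.

```julia
using DataFrames

# Rows copied from the top-5 and bottom-5 tables
df = DataFrame(
    pipeline = [
        "Noop |> FastICA |> DoubleDQN",
        "StandardScaler |> FastICA |> DQN",
        "Noop |> FactorAnalysis |> NFQ",
        "PowerTransformer |> Noop |> DQN",
        "RobustScaler |> PCA |> DiscreteSAC",
    ],
    td_error = [0.325789, 0.329664, 2.48772e14, 5.34366e21, 3.36113e26],
)

# Hypothetical cutoff: anything above it, or NaN/Inf, counts as diverged
const max_td = 1.0
isconverged(x) = isfinite(x) && x <= max_td

converged = filter(:td_error => isconverged, df)
diverged  = filter(:td_error => !isconverged, df)

println("converged: ", nrow(converged), ", diverged: ", nrow(diverged))
println(converged)
```

A fixed threshold is the simplest rule; a relative one (e.g. a multiple of the median TD error) would avoid hard-coding a scale that depends on the reward range of the dataset.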
