# CartPole Environment

In [None]:
# |export
import gym
import torch.optim as optim
from torch.optim import Adam
from d3rlpy.algos import DQN, DoubleDQN
from d3rlpy.models.optimizers import OptimizerFactory
from d3rlpy.online.buffers import ReplayBuffer
from d3rlpy.online.explorers import LinearDecayEpsilonGreedy, ConstantEpsilonGreedy, NormalNoise

# Simple DQN

Load a pre-built environment with OpenAI Gym.

First, we initialize the environment, which is CartPole in this notebook. Then we choose:
- An optimizer (`optim_factory`) that updates the network weights during training. Several optimizers are available. [View](https://d3rlpy.readthedocs.io/en/stable/references/optimizers.html)
- A [replay buffer](https://d3rlpy.readthedocs.io/en/stable/references/generated/d3rlpy.online.buffers.ReplayBuffer.html?highlight=ReplayBuffer) that stores experiences.
- An explorer that handles the exploration/exploitation dilemma. Three explorers are available: [Constant explorer](https://d3rlpy.readthedocs.io/en/stable/references/generated/d3rlpy.online.explorers.ConstantEpsilonGreedy.html#d3rlpy.online.explorers.ConstantEpsilonGreedy), [LinearDecayEpsilonGreedy](https://d3rlpy.readthedocs.io/en/stable/references/generated/d3rlpy.online.explorers.LinearDecayEpsilonGreedy.html#d3rlpy.online.explorers.LinearDecayEpsilonGreedy), and [NormalNoise](https://d3rlpy.readthedocs.io/en/stable/references/generated/d3rlpy.online.explorers.NormalNoise.html#d3rlpy.online.explorers.NormalNoise).
- The model. Here we use [DQN](https://d3rlpy.readthedocs.io/en/stable/references/generated/d3rlpy.algos.DQN.html#d3rlpy.algos.DQN), for which we can set several hyperparameters: the batch size, the learning rate, the interval at which the target network is synchronized, the number of steps for the N-step TD calculation, the optimizer, the [network architecture](https://d3rlpy.readthedocs.io/en/stable/references/network_architectures.html) (`encoder_factory`), and the [scaler function](https://d3rlpy.readthedocs.io/en/stable/references/preprocessing.html). We can also set up a custom scaler to preprocess observations by extending the `Scaler` class and following its structure. [View](https://d3rlpy.readthedocs.io/en/v0.91/_modules/d3rlpy/preprocessing/scalers.html)
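
As a rough illustration of the structure a custom scaler follows, here is a minimal standalone sketch of a min-max scaler. It does not subclass d3rlpy's `Scaler`: the class name `MinMaxObservationScaler` is hypothetical, and the method names `fit`, `transform`, and `reverse_transform` are assumed to mirror the interface in the linked source. It works on plain Python lists instead of tensors to stay dependency-free.

```python
from typing import List, Sequence


class MinMaxObservationScaler:
    """Hypothetical min-max scaler; NOT a subclass of d3rlpy's Scaler."""

    def __init__(self) -> None:
        self.minimum: List[float] = []
        self.maximum: List[float] = []

    def fit(self, observations: Sequence[Sequence[float]]) -> None:
        # Compute the per-dimension minimum and maximum over all observations.
        columns = list(zip(*observations))
        self.minimum = [min(c) for c in columns]
        self.maximum = [max(c) for c in columns]

    def transform(self, x: Sequence[float]) -> List[float]:
        # Map each dimension to [0, 1]; constant dimensions map to 0.0.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(x, self.minimum, self.maximum)
        ]

    def reverse_transform(self, x: Sequence[float]) -> List[float]:
        # Undo `transform` to recover the original scale.
        return [
            v * (hi - lo) + lo
            for v, lo, hi in zip(x, self.minimum, self.maximum)
        ]


scaler = MinMaxObservationScaler()
scaler.fit([[0.0, -2.0], [4.0, 2.0], [2.0, 0.0]])
scaled = scaler.transform([2.0, 0.0])
restored = scaler.reverse_transform(scaled)
```

A real custom scaler would apply the same three operations to batched tensors and would be passed to the algorithm through its `scaler` argument.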


Once the model is set up, we fit it by calling the `fit_online` function, which takes the following steps:
1. [Initialize a replay buffer if none was specified](https://github.com/takuseno/d3rlpy/blob/master/d3rlpy/algos/base.py).
2. [Start training](https://github.com/takuseno/d3rlpy/blob/d6e87a2dd042653be8c8689568cb29ce59a92bf6/d3rlpy/online/iterators.py#L99), which proceeds as follows:
    1. Initialize the loggers.
    2. Initialize the algorithm parameters (fit the scaler, fit the action scaler, set up the algorithm).
    3. Check the shape of the observation space. If it is 3D, we are dealing with images, and the [stacking operation](https://github.com/takuseno/d3rlpy/blob/d6e87a2dd042653be8c8689568cb29ce59a92bf6/d3rlpy/preprocessing/stack.py#L40) is applied ((1,84,84) becomes (n,84,84), where n is the number of frames to stack).
    4. Set up the evaluation scorer.
    5. Get the first observation and initialize the rollout return (episode return) to 0.
    6. Start the training loop:
       - Image observations: add the observation to the stack.
       - Otherwise: cast the observation to float.
       - Select an action with the chosen explorer (exploration/exploitation):
           - reshape the observation (NumPy ndarray) from (n,84,84) to (1,n,84,84) by adding a batch dimension
           - predict the [action](https://github.com/takuseno/d3rlpy/blob/d6e87a2dd042653be8c8689568cb29ce59a92bf6/d3rlpy/online/explorers.py#L28) via the [algorithm](https://github.com/takuseno/d3rlpy/blob/v1.1.1/d3rlpy/algos/base.py); the observation is [converted to a tensor and passed through the transformers](https://github.com/takuseno/d3rlpy/blob/v1.1.1/d3rlpy/torch_utility.py)
       - Step the environment (get the next observation and reward) and update the rollout return.
       - Store the observation, together with its action and reward, in the buffer.
       - Sample a mini-batch.
       - Update the parameters.
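
The training loop described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not d3rlpy's implementation: `ToyEnv`, `ToyAlgo`, and `train_online` are hypothetical names, the environment is a trivial 5-step episode, and the stored tuple includes the next observation for readability.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical 2-action environment: every episode lasts 5 steps."""

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        reward = 1.0
        done = self.t >= 5
        return [float(self.t)], reward, done


class ToyAlgo:
    """Stand-in for the algorithm: the greedy action is always 0."""

    def __init__(self):
        self.n_updates = 0

    def predict(self, observation):
        return 0

    def update(self, batch):
        self.n_updates += 1  # placeholder for a gradient step


def train_online(env, algo, n_steps, epsilon, batch_size, update_interval, seed=0):
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)
    episode_returns = []
    observation = env.reset()  # first observation
    rollout_return = 0.0       # episode return starts at 0
    for step in range(1, n_steps + 1):
        # Exploration vs. exploitation with a constant epsilon.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = algo.predict(observation)
        next_observation, reward, done = env.step(action)
        rollout_return += reward
        buffer.append((observation, action, reward, next_observation, done))
        # Update every `update_interval` steps once enough data is stored.
        if step % update_interval == 0 and len(buffer) >= batch_size:
            algo.update(rng.sample(list(buffer), batch_size))
        if done:
            episode_returns.append(rollout_return)
            observation, rollout_return = env.reset(), 0.0
        else:
            observation = next_observation
    return episode_returns, len(buffer)


algo = ToyAlgo()
returns, stored = train_online(ToyEnv(), algo, n_steps=50, epsilon=0.3,
                               batch_size=8, update_interval=10)
```

With 50 steps and an update interval of 10, the sketch performs 5 parameter updates and completes 10 episodes.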
       



For offline training:

1. Execute the [fitter function](https://github.com/takuseno/d3rlpy/blob/v1.1.1/d3rlpy/base.py#L428).
2. Gather the transitions from the episodes of the provided `MDPDataset`.
3. Set up an iterator over mini-batches.
4. Initialize the logger.
5. Initialize the scalers (transformers).
6. Initialize the algorithm implementation.
7. [Process the observations](https://github.com/takuseno/d3rlpy/blob/1ac85b9955408b5e3e9f67ec6592828d8021885b/d3rlpy/base.py#L728) (frame stacking for images).
8. Initialize the evaluation metrics.
9. Training loop:
    - use the iterator to get transitions
    - update the parameters with a mini-batch [view](https://github.com/takuseno/d3rlpy/blob/1ac85b9955408b5e3e9f67ec6592828d8021885b/d3rlpy/base.py#L738) using the DQN algorithm [view](https://github.com/takuseno/d3rlpy/blob/1ac85b9955408b5e3e9f67ec6592828d8021885b/d3rlpy/algos/dqn.py#L129). Here the observations are converted to tensors [view](https://github.com/takuseno/d3rlpy/blob/1ac85b9955408b5e3e9f67ec6592828d8021885b/d3rlpy/torch_utility.py#L260)
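
The core of the offline procedure (gather transitions, iterate over mini-batches, update) can be sketched as follows. The names `iterate_minibatches` and `fit_offline` are hypothetical, and the sketch keeps the last partial batch of each epoch, which a real iterator may handle differently.

```python
import random


def iterate_minibatches(transitions, batch_size, rng):
    """Yield shuffled mini-batches; the last one may be smaller."""
    indices = list(range(len(transitions)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [transitions[i] for i in indices[start:start + batch_size]]


def fit_offline(episodes, n_epochs, batch_size, update_fn, seed=0):
    rng = random.Random(seed)
    # Flatten the episodes into a single list of transitions.
    transitions = [t for episode in episodes for t in episode]
    n_updates = 0
    for _ in range(n_epochs):
        for batch in iterate_minibatches(transitions, batch_size, rng):
            update_fn(batch)  # placeholder for the algorithm's update step
            n_updates += 1
    return n_updates


# Two toy episodes of (observation, action, reward) tuples.
episodes = [
    [("s0", 0, 1.0), ("s1", 1, 1.0)],
    [("s2", 0, 0.0), ("s3", 1, 1.0), ("s4", 0, 1.0)],
]
seen_batch_sizes = []
n_updates = fit_offline(episodes, n_epochs=2, batch_size=2,
                        update_fn=lambda batch: seen_batch_sizes.append(len(batch)))
```

Five transitions with a batch size of 2 give three mini-batches per epoch, so two epochs produce six updates.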

        

In [None]:
# setup environment
# training env
env = gym.make('CartPole-v0')
# evaluation env
eval_env = gym.make('CartPole-v0')

Initially, we use a simple DQN with a batch size of 32 and synchronize the target network every 100 iterations.

In [None]:
# Adam optimizer with weight decay
optim_factory = OptimizerFactory(Adam, weight_decay=1e-4)
# setup algorithm
dqn = DQN(batch_size=32,  # mini-batch size
          learning_rate=2.5e-4,  # learning rate
          target_update_interval=100,  # interval to synchronize the target network
          n_steps=1,  # N-step TD calculation
          optim_factory=optim_factory  # optimizer
         )
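
To make the role of `target_update_interval` concrete, here is a minimal sketch of a hard target-network update. The function `train_with_target_sync` and the dictionary-based "networks" are hypothetical stand-ins; the point is only that the target copy is refreshed every fixed number of updates and stays frozen in between.

```python
import copy


def train_with_target_sync(n_updates, target_update_interval):
    online = {"w": 0}                # stand-in for the online Q-network weights
    target = copy.deepcopy(online)   # target network starts as a copy
    sync_steps = []
    for step in range(1, n_updates + 1):
        online["w"] += 1             # placeholder for one gradient step
        if step % target_update_interval == 0:
            target = copy.deepcopy(online)  # hard update: copy all weights
            sync_steps.append(step)
    return online, target, sync_steps


online, target, sync_steps = train_with_target_sync(
    n_updates=250, target_update_interval=100
)
```

After 250 updates the target was synchronized at updates 100 and 200, so it lags the online weights by 50 updates.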

Set up a replay buffer that will store experiences.

In [None]:
# setup replay buffer
buffer = ReplayBuffer(maxlen=1000000, env=env)
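
Conceptually, a replay buffer with a `maxlen` behaves like a fixed-size FIFO queue from which mini-batches are sampled at random. The sketch below illustrates that idea only; `SimpleReplayBuffer` is a hypothetical class and does not reproduce d3rlpy's `ReplayBuffer`.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Hypothetical FIFO buffer; not d3rlpy's ReplayBuffer."""

    def __init__(self, maxlen):
        # A deque with maxlen drops the oldest item once it is full.
        self._storage = deque(maxlen=maxlen)

    def append(self, observation, action, reward, terminal):
        self._storage.append((observation, action, reward, terminal))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement.
        return rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


toy_buffer = SimpleReplayBuffer(maxlen=3)
for t in range(5):
    toy_buffer.append(observation=[float(t)], action=t % 2, reward=1.0, terminal=False)
oldest_kept = min(item[0][0] for item in toy_buffer.sample(3))
```

After five appends into a buffer of size 3, only the three most recent transitions remain.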

Then we set up an explorer to deal with the exploration/exploitation problem.
Here, we can use:

- Linear Decay Epsilon Greedy: the agent starts with 100% exploration and decays linearly to 10%
- Constant Epsilon Greedy: a fixed percentage of exploration
- Normal Noise: adds Gaussian noise to the predicted action (intended for continuous action spaces)

In [None]:
# setup explorers
explorer = LinearDecayEpsilonGreedy(start_epsilon=1.0,
                                    end_epsilon=0.1,
                                    duration=10000)
#explorer = ConstantEpsilonGreedy(0.3)
#explorer = NormalNoise(mean=0, std=0.1)
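
The linear decay schedule configured above can be written out explicitly. This is a sketch of the schedule as described (from `start_epsilon` down to `end_epsilon` over `duration` steps, then constant); the function name `linear_decay_epsilon` is hypothetical and is not part of d3rlpy.

```python
def linear_decay_epsilon(step, start_epsilon=1.0, end_epsilon=0.1, duration=10000):
    """Epsilon after `step` environment steps, held at `end_epsilon` afterwards."""
    progress = min(step / duration, 1.0)
    return start_epsilon + (end_epsilon - start_epsilon) * progress


eps_start = linear_decay_epsilon(0)
eps_mid = linear_decay_epsilon(5000)
eps_end = linear_decay_epsilon(20000)
```

With the values used in this notebook, epsilon is 1.0 at step 0, about 0.55 at step 5000, and about 0.1 from step 10000 onward.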

Train the simple DQN

In [None]:
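dqn.fit_online(
    env,  # environment
    buffer,  # replay buffer
    explorer=explorer,  # explorer
    eval_env=eval_env,  # evaluation environment
    # n_epochs=30,
    n_steps_per_epoch=1000,  # the number of steps per epoch
    update_interval=100,  # update the parameters every 100 steps
    eval_epsilon=0.3,  # epsilon used by the epsilon-greedy policy during evaluation
    save_metrics=True,  # save metrics and parameters to the log directory
    tensorboard_dir="runs",  # directory for TensorBoard logs
)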

2023-01-09 22:36.35 [info     ] Directory is created at d3rlpy_logs/DQN_online_20230109223635
2023-01-09 22:36.35 [debug    ] Building model...
2023-01-09 22:36.35 [debug    ] Model has been built.
2023-01-09 22:36.35 [info     ] Parameters are saved to d3rlpy_logs/DQN_online_20230109223635/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 0.00025, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'weight_decay': 0.0001}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': None, 'target_update_interval': 100, 'use_gpu': None, 'algorithm': 'DQN', 'observation_shape': (4,), 'action_size': 2}


  0%|          | 0/1000000 [00:00<?, ?it/s]

2023-01-09 22:36.35 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_1000.pt
2023-01-09 22:36.35 [info     ] DQN_online_20230109223635: epoch=1 step=1000 epoch=1 metrics={'time_inference': 0.00018756651878356934, 'time_environment_step': 1.2215137481689454e-05, 'time_step': 0.00022780752182006837, 'rollout_return': 27.555555555555557, 'time_sample_batch': 5.5742263793945315e-05, 'time_algorithm_update': 0.0013378620147705077, 'loss': 0.45788215696811674, 'evaluation': 10.1} step=1000
2023-01-09 22:36.35 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_2000.pt
2023-01-09 22:36.35 [info     ] DQN_online_20230109223635: epoch=2 step=2000 epoch=2 metrics={'time_inference': 0.00017923951148986816, 'time_environment_step': 1.2298345565795898e-05, 'time_step': 0.00021900510787963866, 'rollout_return': 16.74576271186441, 'time_sample_batch': 6.020069122314453e-05, 'time_algorithm_update': 0.0012368440628051757, 'loss': 0

2023-01-09 22:36.39 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_17000.pt
2023-01-09 22:36.39 [info     ] DQN_online_20230109223635: epoch=17 step=17000 epoch=17 metrics={'time_inference': 0.00018497991561889648, 'time_environment_step': 1.254892349243164e-05, 'time_step': 0.0002299017906188965, 'rollout_return': 10.923076923076923, 'time_sample_batch': 8.761882781982422e-05, 'time_algorithm_update': 0.0016250848770141602, 'loss': 0.06535573452711105, 'evaluation': 11.8} step=17000
2023-01-09 22:36.39 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_18000.pt
2023-01-09 22:36.39 [info     ] DQN_online_20230109223635: epoch=18 step=18000 epoch=18 metrics={'time_inference': 0.00017815589904785156, 'time_environment_step': 1.2163639068603516e-05, 'time_step': 0.00021811413764953613, 'rollout_return': 12.419753086419753, 'time_sample_batch': 6.601810455322265e-05, 'time_algorithm_update': 0.0012514114379882813, 'l

2023-01-09 22:36.43 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_33000.pt
2023-01-09 22:36.43 [info     ] DQN_online_20230109223635: epoch=33 step=33000 epoch=33 metrics={'time_inference': 0.0001876828670501709, 'time_environment_step': 1.2257575988769531e-05, 'time_step': 0.00022875380516052245, 'rollout_return': 27.11111111111111, 'time_sample_batch': 7.593631744384766e-05, 'time_algorithm_update': 0.0013556957244873046, 'loss': 0.20546391755342483, 'evaluation': 55.9} step=33000
2023-01-09 22:36.43 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_34000.pt
2023-01-09 22:36.43 [info     ] DQN_online_20230109223635: epoch=34 step=34000 epoch=34 metrics={'time_inference': 0.00017185258865356446, 'time_environment_step': 1.166248321533203e-05, 'time_step': 0.00020886659622192382, 'rollout_return': 65.86666666666666, 'time_sample_batch': 6.380081176757813e-05, 'time_algorithm_update': 0.0011276721954345704, 'los

2023-01-09 22:36.48 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_49000.pt
2023-01-09 22:36.48 [info     ] DQN_online_20230109223635: epoch=49 step=49000 epoch=49 metrics={'time_inference': 0.0001737215518951416, 'time_environment_step': 1.183319091796875e-05, 'time_step': 0.0002122178077697754, 'rollout_return': 56.88235294117647, 'time_sample_batch': 7.703304290771485e-05, 'time_algorithm_update': 0.0012346267700195312, 'loss': 0.31386263892054556, 'evaluation': 105.3} step=49000
2023-01-09 22:36.48 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_50000.pt
2023-01-09 22:36.48 [info     ] DQN_online_20230109223635: epoch=50 step=50000 epoch=50 metrics={'time_inference': 0.00017364954948425294, 'time_environment_step': 1.1925935745239258e-05, 'time_step': 0.00021228289604187012, 'rollout_return': 126.125, 'time_sample_batch': 8.220672607421875e-05, 'time_algorithm_update': 0.0012454986572265625, 'loss': 0.2290

2023-01-09 22:36.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_65000.pt
2023-01-09 22:36.52 [info     ] DQN_online_20230109223635: epoch=65 step=65000 epoch=65 metrics={'time_inference': 0.0001862013339996338, 'time_environment_step': 1.239919662475586e-05, 'time_step': 0.00022769975662231446, 'rollout_return': 57.166666666666664, 'time_sample_batch': 7.557868957519531e-05, 'time_algorithm_update': 0.001403045654296875, 'loss': 0.28561514914035796, 'evaluation': 43.6} step=65000
2023-01-09 22:36.53 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_66000.pt
2023-01-09 22:36.53 [info     ] DQN_online_20230109223635: epoch=66 step=66000 epoch=66 metrics={'time_inference': 0.00019883561134338378, 'time_environment_step': 1.267552375793457e-05, 'time_step': 0.00024305963516235352, 'rollout_return': 59.05882352941177, 'time_sample_batch': 8.673667907714844e-05, 'time_algorithm_update': 0.0015825510025024414, 'loss

2023-01-09 22:36.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_81000.pt
2023-01-09 22:36.58 [info     ] DQN_online_20230109223635: epoch=81 step=81000 epoch=81 metrics={'time_inference': 0.00017859315872192382, 'time_environment_step': 1.2067317962646484e-05, 'time_step': 0.0002181365489959717, 'rollout_return': 150.42857142857142, 'time_sample_batch': 8.757114410400391e-05, 'time_algorithm_update': 0.0012540340423583985, 'loss': 0.5376698791980743, 'evaluation': 133.2} step=81000
2023-01-09 22:36.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_82000.pt
2023-01-09 22:36.58 [info     ] DQN_online_20230109223635: epoch=82 step=82000 epoch=82 metrics={'time_inference': 0.00018567252159118654, 'time_environment_step': 1.2724161148071289e-05, 'time_step': 0.0002300877571105957, 'rollout_return': 95.54545454545455, 'time_sample_batch': 8.947849273681641e-05, 'time_algorithm_update': 0.0016419649124145507, 'lo

2023-01-09 22:37.04 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_97000.pt
2023-01-09 22:37.04 [info     ] DQN_online_20230109223635: epoch=97 step=97000 epoch=97 metrics={'time_inference': 0.0001948411464691162, 'time_environment_step': 1.2554407119750976e-05, 'time_step': 0.0002380518913269043, 'rollout_return': 181.83333333333334, 'time_sample_batch': 8.902549743652343e-05, 'time_algorithm_update': 0.0015372276306152345, 'loss': 0.5050291359424591, 'evaluation': 162.9} step=97000
2023-01-09 22:37.05 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_98000.pt
2023-01-09 22:37.05 [info     ] DQN_online_20230109223635: epoch=98 step=98000 epoch=98 metrics={'time_inference': 0.0001788954734802246, 'time_environment_step': 1.2130498886108398e-05, 'time_step': 0.0002195112705230713, 'time_sample_batch': 7.245540618896484e-05, 'time_algorithm_update': 0.0014015674591064454, 'loss': 0.3866982266306877, 'rollout_retur

2023-01-09 22:37.11 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_113000.pt
2023-01-09 22:37.11 [info     ] DQN_online_20230109223635: epoch=113 step=113000 epoch=113 metrics={'time_inference': 0.00018094706535339356, 'time_environment_step': 1.1962413787841797e-05, 'time_step': 0.00021935009956359862, 'rollout_return': 165.5, 'time_sample_batch': 7.121562957763672e-05, 'time_algorithm_update': 0.001202559471130371, 'loss': 0.5463820055127144, 'evaluation': 137.7} step=113000
2023-01-09 22:37.11 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_114000.pt
2023-01-09 22:37.11 [info     ] DQN_online_20230109223635: epoch=114 step=114000 epoch=114 metrics={'time_inference': 0.00017196393013000488, 'time_environment_step': 1.1724233627319335e-05, 'time_step': 0.0002099745273590088, 'rollout_return': 147.71428571428572, 'time_sample_batch': 6.90460205078125e-05, 'time_algorithm_update': 0.001220703125, 'loss': 0.3617

2023-01-09 22:37.17 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_129000.pt
2023-01-09 22:37.17 [info     ] DQN_online_20230109223635: epoch=129 step=129000 epoch=129 metrics={'time_inference': 0.00018306422233581543, 'time_environment_step': 1.2095451354980469e-05, 'time_step': 0.00022280240058898925, 'time_sample_batch': 7.19308853149414e-05, 'time_algorithm_update': 0.00131378173828125, 'loss': 0.3340592343360186, 'rollout_return': 193.2, 'evaluation': 139.3} step=129000
2023-01-09 22:37.18 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_130000.pt
2023-01-09 22:37.18 [info     ] DQN_online_20230109223635: epoch=130 step=130000 epoch=130 metrics={'time_inference': 0.0001771364212036133, 'time_environment_step': 1.183176040649414e-05, 'time_step': 0.00021671128273010253, 'time_sample_batch': 7.143020629882813e-05, 'time_algorithm_update': 0.0013532638549804688, 'loss': 0.4177528312429786, 'rollout_return': 1

2023-01-09 22:37.24 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_145000.pt
2023-01-09 22:37.24 [info     ] DQN_online_20230109223635: epoch=145 step=145000 epoch=145 metrics={'time_inference': 0.00019853496551513672, 'time_environment_step': 1.2722015380859375e-05, 'time_step': 0.0002443883419036865, 'rollout_return': 163.5, 'time_sample_batch': 0.00011272430419921875, 'time_algorithm_update': 0.0017381668090820312, 'loss': 0.39159379303455355, 'evaluation': 139.0} step=145000
2023-01-09 22:37.24 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_146000.pt
2023-01-09 22:37.24 [info     ] DQN_online_20230109223635: epoch=146 step=146000 epoch=146 metrics={'time_inference': 0.00017244362831115721, 'time_environment_step': 1.1710166931152344e-05, 'time_step': 0.00021246337890625, 'rollout_return': 153.14285714285714, 'time_sample_batch': 7.231235504150391e-05, 'time_algorithm_update': 0.0014252901077270509, 'loss'

2023-01-09 22:37.30 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_161000.pt
2023-01-09 22:37.30 [info     ] DQN_online_20230109223635: epoch=161 step=161000 epoch=161 metrics={'time_inference': 0.0001735372543334961, 'time_environment_step': 1.151275634765625e-05, 'time_step': 0.00021779227256774902, 'time_sample_batch': 7.200241088867188e-05, 'time_algorithm_update': 0.001877140998840332, 'loss': 0.438085014373064, 'rollout_return': 148.83333333333334, 'evaluation': 179.6} step=161000
2023-01-09 22:37.30 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_162000.pt
2023-01-09 22:37.30 [info     ] DQN_online_20230109223635: epoch=162 step=162000 epoch=162 metrics={'time_inference': 0.00017778563499450684, 'time_environment_step': 1.1743783950805664e-05, 'time_step': 0.00022287964820861817, 'rollout_return': 196.0, 'time_sample_batch': 8.080005645751953e-05, 'time_algorithm_update': 0.0019268989562988281, 'loss': 

2023-01-09 22:37.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_177000.pt
2023-01-09 22:37.37 [info     ] DQN_online_20230109223635: epoch=177 step=177000 epoch=177 metrics={'time_inference': 0.0001755087375640869, 'time_environment_step': 1.167917251586914e-05, 'time_step': 0.00022201871871948244, 'time_sample_batch': 7.79867172241211e-05, 'time_algorithm_update': 0.002080059051513672, 'loss': 0.46788078248500825, 'rollout_return': 193.8, 'evaluation': 154.3} step=177000
2023-01-09 22:37.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_178000.pt
2023-01-09 22:37.37 [info     ] DQN_online_20230109223635: epoch=178 step=178000 epoch=178 metrics={'time_inference': 0.00019464755058288573, 'time_environment_step': 1.2392044067382812e-05, 'time_step': 0.0002442045211791992, 'time_sample_batch': 9.32455062866211e-05, 'time_algorithm_update': 0.002195858955383301, 'loss': 0.40453337915241716, 'rollout_return': 1

2023-01-09 22:37.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_193000.pt
2023-01-09 22:37.44 [info     ] DQN_online_20230109223635: epoch=193 step=193000 epoch=193 metrics={'time_inference': 0.0001827831268310547, 'time_environment_step': 1.2050867080688477e-05, 'time_step': 0.0002308943271636963, 'time_sample_batch': 7.855892181396484e-05, 'time_algorithm_update': 0.0021694183349609377, 'loss': 0.48049931935966017, 'rollout_return': 192.4, 'evaluation': 192.4} step=193000
2023-01-09 22:37.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_194000.pt
2023-01-09 22:37.44 [info     ] DQN_online_20230109223635: epoch=194 step=194000 epoch=194 metrics={'time_inference': 0.00019425201416015624, 'time_environment_step': 1.2410879135131836e-05, 'time_step': 0.0002443222999572754, 'rollout_return': 200.0, 'time_sample_batch': 0.0001071929931640625, 'time_algorithm_update': 0.002243494987487793, 'loss': 0.7109367377

2023-01-09 22:37.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_209000.pt
2023-01-09 22:37.51 [info     ] DQN_online_20230109223635: epoch=209 step=209000 epoch=209 metrics={'time_inference': 0.00019406485557556152, 'time_environment_step': 1.2467384338378907e-05, 'time_step': 0.000240769624710083, 'rollout_return': 200.0, 'time_sample_batch': 8.578300476074219e-05, 'time_algorithm_update': 0.0019138574600219727, 'loss': 0.4906088388990611, 'evaluation': 177.4} step=209000
2023-01-09 22:37.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_210000.pt
2023-01-09 22:37.51 [info     ] DQN_online_20230109223635: epoch=210 step=210000 epoch=210 metrics={'time_inference': 0.00019077467918395997, 'time_environment_step': 1.225900650024414e-05, 'time_step': 0.0002386176586151123, 'rollout_return': 200.0, 'time_sample_batch': 0.00010192394256591797, 'time_algorithm_update': 0.0020517349243164063, 'loss': 0.4006022507

2023-01-09 22:37.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_225000.pt
2023-01-09 22:37.58 [info     ] DQN_online_20230109223635: epoch=225 step=225000 epoch=225 metrics={'time_inference': 0.0001810152530670166, 'time_environment_step': 1.1956214904785157e-05, 'time_step': 0.00022752666473388673, 'time_sample_batch': 7.443428039550781e-05, 'time_algorithm_update': 0.002018904685974121, 'loss': 0.596831574011594, 'rollout_return': 191.8, 'evaluation': 181.4} step=225000
2023-01-09 22:37.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_226000.pt
2023-01-09 22:37.58 [info     ] DQN_online_20230109223635: epoch=226 step=226000 epoch=226 metrics={'time_inference': 0.00017750167846679687, 'time_environment_step': 1.1658191680908204e-05, 'time_step': 0.0002219703197479248, 'time_sample_batch': 6.914138793945312e-05, 'time_algorithm_update': 0.00188901424407959, 'loss': 0.515447002556175, 'rollout_return': 195

2023-01-09 22:38.05 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_241000.pt
2023-01-09 22:38.05 [info     ] DQN_online_20230109223635: epoch=241 step=241000 epoch=241 metrics={'time_inference': 0.00017648005485534668, 'time_environment_step': 1.182699203491211e-05, 'time_step': 0.0002227213382720947, 'rollout_return': 184.0, 'time_sample_batch': 7.138252258300781e-05, 'time_algorithm_update': 0.0019754648208618166, 'loss': 0.8450881272554398, 'evaluation': 176.9} step=241000
2023-01-09 22:38.05 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_242000.pt
2023-01-09 22:38.05 [info     ] DQN_online_20230109223635: epoch=242 step=242000 epoch=242 metrics={'time_inference': 0.00018433642387390137, 'time_environment_step': 1.2018203735351563e-05, 'time_step': 0.00023157167434692384, 'time_sample_batch': 7.431507110595703e-05, 'time_algorithm_update': 0.0020740747451782225, 'loss': 0.46929092742502687, 'rollout_return

2023-01-09 22:38.12 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_257000.pt
2023-01-09 22:38.12 [info     ] DQN_online_20230109223635: epoch=257 step=257000 epoch=257 metrics={'time_inference': 0.00017391610145568848, 'time_environment_step': 1.1584758758544923e-05, 'time_step': 0.00022023558616638184, 'rollout_return': 187.0, 'time_sample_batch': 7.402896881103516e-05, 'time_algorithm_update': 0.0020780324935913085, 'loss': 0.4369251366704702, 'evaluation': 194.5} step=257000
2023-01-09 22:38.13 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_258000.pt
2023-01-09 22:38.13 [info     ] DQN_online_20230109223635: epoch=258 step=258000 epoch=258 metrics={'time_inference': 0.00018381500244140625, 'time_environment_step': 1.2010574340820312e-05, 'time_step': 0.00023155450820922852, 'rollout_return': 198.8, 'time_sample_batch': 7.60793685913086e-05, 'time_algorithm_update': 0.002129817008972168, 'loss': 0.358263548

2023-01-09 22:38.19 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_273000.pt
2023-01-09 22:38.19 [info     ] DQN_online_20230109223635: epoch=273 step=273000 epoch=273 metrics={'time_inference': 0.00019267940521240233, 'time_environment_step': 1.2428522109985351e-05, 'time_step': 0.00024285221099853516, 'rollout_return': 121.875, 'time_sample_batch': 0.00011038780212402344, 'time_algorithm_update': 0.002235102653503418, 'loss': 0.5138913692906499, 'evaluation': 138.8} step=273000
2023-01-09 22:38.19 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_274000.pt
2023-01-09 22:38.19 [info     ] DQN_online_20230109223635: epoch=274 step=274000 epoch=274 metrics={'time_inference': 0.0001838700771331787, 'time_environment_step': 1.2084245681762696e-05, 'time_step': 0.00023327970504760743, 'rollout_return': 160.66666666666666, 'time_sample_batch': 7.87496566772461e-05, 'time_algorithm_update': 0.002273821830749512, 'loss

2023-01-09 22:38.26 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_289000.pt
2023-01-09 22:38.26 [info     ] DQN_online_20230109223635: epoch=289 step=289000 epoch=289 metrics={'time_inference': 0.00017995834350585936, 'time_environment_step': 1.1889934539794921e-05, 'time_step': 0.00022687554359436036, 'rollout_return': 190.8, 'time_sample_batch': 8.208751678466797e-05, 'time_algorithm_update': 0.0020644426345825194, 'loss': 0.3841887105256319, 'evaluation': 191.8} step=289000
2023-01-09 22:38.27 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_290000.pt
2023-01-09 22:38.27 [info     ] DQN_online_20230109223635: epoch=290 step=290000 epoch=290 metrics={'time_inference': 0.00017874908447265624, 'time_environment_step': 1.1883020401000977e-05, 'time_step': 0.00022522878646850586, 'rollout_return': 158.57142857142858, 'time_sample_batch': 7.11202621459961e-05, 'time_algorithm_update': 0.0020264148712158202, 'loss

2023-01-09 22:38.33 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_305000.pt
2023-01-09 22:38.33 [info     ] DQN_online_20230109223635: epoch=305 step=305000 epoch=305 metrics={'time_inference': 0.00017596125602722167, 'time_environment_step': 1.1794567108154297e-05, 'time_step': 0.0002211744785308838, 'time_sample_batch': 7.004737854003907e-05, 'time_algorithm_update': 0.0019290685653686524, 'loss': 0.4756120373494923, 'rollout_return': 159.33333333333334, 'evaluation': 137.7} step=305000
2023-01-09 22:38.33 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_306000.pt
2023-01-09 22:38.33 [info     ] DQN_online_20230109223635: epoch=306 step=306000 epoch=306 metrics={'time_inference': 0.0001801133155822754, 'time_environment_step': 1.1943101882934571e-05, 'time_step': 0.00022634220123291015, 'time_sample_batch': 7.665157318115234e-05, 'time_algorithm_update': 0.0019866228103637695, 'loss': 0.5358440709300339, 'ro

2023-01-09 22:38.39 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_321000.pt
2023-01-09 22:38.39 [info     ] DQN_online_20230109223635: epoch=321 step=321000 epoch=321 metrics={'time_inference': 0.00019396400451660156, 'time_environment_step': 1.2417078018188477e-05, 'time_step': 0.00024278950691223144, 'rollout_return': 137.28571428571428, 'time_sample_batch': 8.318424224853515e-05, 'time_algorithm_update': 0.0021357059478759764, 'loss': 0.6792888099327683, 'evaluation': 133.8} step=321000
2023-01-09 22:38.39 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_322000.pt
2023-01-09 22:38.39 [info     ] DQN_online_20230109223635: epoch=322 step=322000 epoch=322 metrics={'time_inference': 0.00018259596824645997, 'time_environment_step': 1.2033939361572265e-05, 'time_step': 0.00023016142845153808, 'rollout_return': 156.28571428571428, 'time_sample_batch': 7.445812225341797e-05, 'time_algorithm_update': 0.002090430259

2023-01-09 22:38.45 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_337000.pt
2023-01-09 22:38.45 [info     ] DQN_online_20230109223635: epoch=337 step=337000 epoch=337 metrics={'time_inference': 0.00017354750633239745, 'time_environment_step': 1.1683225631713867e-05, 'time_step': 0.00021898794174194335, 'rollout_return': 113.11111111111111, 'time_sample_batch': 6.873607635498047e-05, 'time_algorithm_update': 0.001974344253540039, 'loss': 0.499597732629627, 'evaluation': 132.0} step=337000
2023-01-09 22:38.46 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_338000.pt
2023-01-09 22:38.46 [info     ] DQN_online_20230109223635: epoch=338 step=338000 epoch=338 metrics={'time_inference': 0.0002004845142364502, 'time_environment_step': 1.261591911315918e-05, 'time_step': 0.0002521779537200928, 'rollout_return': 131.375, 'time_sample_batch': 8.668899536132813e-05, 'time_algorithm_update': 0.002361297607421875, 'loss': 

2023-01-09 22:38.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_353000.pt
2023-01-09 22:38.51 [info     ] DQN_online_20230109223635: epoch=353 step=353000 epoch=353 metrics={'time_inference': 0.0001976490020751953, 'time_environment_step': 1.25274658203125e-05, 'time_step': 0.0002477104663848877, 'rollout_return': 147.0, 'time_sample_batch': 8.420944213867187e-05, 'time_algorithm_update': 0.002225971221923828, 'loss': 0.6567504635080695, 'evaluation': 124.1} step=353000
2023-01-09 22:38.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_354000.pt
2023-01-09 22:38.52 [info     ] DQN_online_20230109223635: epoch=354 step=354000 epoch=354 metrics={'time_inference': 0.00017860221862792968, 'time_environment_step': 1.192164421081543e-05, 'time_step': 0.00022608256340026856, 'rollout_return': 113.66666666666667, 'time_sample_batch': 7.30752944946289e-05, 'time_algorithm_update': 0.002121114730834961, 'loss': 0.85

2023-01-09 22:38.57 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_369000.pt
2023-01-09 22:38.57 [info     ] DQN_online_20230109223635: epoch=369 step=369000 epoch=369 metrics={'time_inference': 0.0001996793746948242, 'time_environment_step': 1.2667179107666016e-05, 'time_step': 0.0002505950927734375, 'time_sample_batch': 8.442401885986329e-05, 'time_algorithm_update': 0.002277064323425293, 'loss': 0.6029594798572362, 'rollout_return': 124.875, 'evaluation': 104.9} step=369000
2023-01-09 22:38.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_370000.pt
2023-01-09 22:38.58 [info     ] DQN_online_20230109223635: epoch=370 step=370000 epoch=370 metrics={'time_inference': 0.00017387795448303223, 'time_environment_step': 1.1576414108276367e-05, 'time_step': 0.00022155666351318358, 'rollout_return': 133.28571428571428, 'time_sample_batch': 7.190704345703126e-05, 'time_algorithm_update': 0.0022102117538452147, 'loss

2023-01-09 22:39.03 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_385000.pt
2023-01-09 22:39.03 [info     ] DQN_online_20230109223635: epoch=385 step=385000 epoch=385 metrics={'time_inference': 0.00018719959259033204, 'time_environment_step': 1.2168169021606445e-05, 'time_step': 0.00023703718185424805, 'time_sample_batch': 7.965564727783204e-05, 'time_algorithm_update': 0.002276706695556641, 'loss': 0.5697945578023791, 'rollout_return': 129.75, 'evaluation': 97.8} step=385000
2023-01-09 22:39.04 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_386000.pt
2023-01-09 22:39.04 [info     ] DQN_online_20230109223635: epoch=386 step=386000 epoch=386 metrics={'time_inference': 0.00018714284896850585, 'time_environment_step': 1.215362548828125e-05, 'time_step': 0.00023848962783813477, 'time_sample_batch': 8.738040924072266e-05, 'time_algorithm_update': 0.00243988037109375, 'loss': 0.460971007309854, 'rollout_return': 1

2023-01-09 22:39.10 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_401000.pt
2023-01-09 22:39.10 [info     ] DQN_online_20230109223635: epoch=401 step=401000 epoch=401 metrics={'time_inference': 0.00017652297019958495, 'time_environment_step': 1.1736869812011719e-05, 'time_step': 0.00022503399848937988, 'rollout_return': 150.28571428571428, 'time_sample_batch': 7.355213165283203e-05, 'time_algorithm_update': 0.002266550064086914, 'loss': 0.4450822897255421, 'evaluation': 131.4} step=401000
2023-01-09 22:39.10 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_402000.pt
2023-01-09 22:39.10 [info     ] DQN_online_20230109223635: epoch=402 step=402000 epoch=402 metrics={'time_inference': 0.0001770470142364502, 'time_environment_step': 1.1696338653564453e-05, 'time_step': 0.000225799560546875, 'rollout_return': 140.0, 'time_sample_batch': 8.928775787353516e-05, 'time_algorithm_update': 0.0022259950637817383, 'loss': 

2023-01-09 22:39.16 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_417000.pt
2023-01-09 22:39.16 [info     ] DQN_online_20230109223635: epoch=417 step=417000 epoch=417 metrics={'time_inference': 0.00017786264419555664, 'time_environment_step': 1.1818170547485352e-05, 'time_step': 0.00022659754753112793, 'rollout_return': 179.5, 'time_sample_batch': 7.65085220336914e-05, 'time_algorithm_update': 0.002268481254577637, 'loss': 0.8008457693271339, 'evaluation': 163.3} step=417000
2023-01-09 22:39.17 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_418000.pt
2023-01-09 22:39.17 [info     ] DQN_online_20230109223635: epoch=418 step=418000 epoch=418 metrics={'time_inference': 0.0001942460536956787, 'time_environment_step': 1.2360095977783203e-05, 'time_step': 0.0002449495792388916, 'time_sample_batch': 8.685588836669922e-05, 'time_algorithm_update': 0.002322745323181152, 'loss': 0.6557366695255041, 'rollout_return': 1

2023-01-09 22:39.23 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_433000.pt
2023-01-09 22:39.23 [info     ] DQN_online_20230109223635: epoch=433 step=433000 epoch=433 metrics={'time_inference': 0.0001873438358306885, 'time_environment_step': 1.1997461318969726e-05, 'time_step': 0.00023570752143859864, 'time_sample_batch': 0.00010230541229248047, 'time_algorithm_update': 0.0021637678146362305, 'loss': 0.7575558179058135, 'rollout_return': 200.0, 'evaluation': 166.8} step=433000
2023-01-09 22:39.24 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_434000.pt
2023-01-09 22:39.24 [info     ] DQN_online_20230109223635: epoch=434 step=434000 epoch=434 metrics={'time_inference': 0.00017546963691711425, 'time_environment_step': 1.1668682098388672e-05, 'time_step': 0.0002213268280029297, 'time_sample_batch': 7.228851318359375e-05, 'time_algorithm_update': 0.002020859718322754, 'loss': 0.7163553791120648, 'rollout_return'

2023-01-09 22:39.30 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_449000.pt
2023-01-09 22:39.30 [info     ] DQN_online_20230109223635: epoch=449 step=449000 epoch=449 metrics={'time_inference': 0.00017054224014282228, 'time_environment_step': 1.1362552642822266e-05, 'time_step': 0.00021529245376586913, 'time_sample_batch': 7.097721099853515e-05, 'time_algorithm_update': 0.0019713878631591798, 'loss': 0.5944237641990184, 'rollout_return': 194.4, 'evaluation': 182.9} step=449000
2023-01-09 22:39.30 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_450000.pt
2023-01-09 22:39.30 [info     ] DQN_online_20230109223635: epoch=450 step=450000 epoch=450 metrics={'time_inference': 0.0001730942726135254, 'time_environment_step': 1.1571407318115234e-05, 'time_step': 0.0002185811996459961, 'rollout_return': 189.6, 'time_sample_batch': 6.434917449951171e-05, 'time_algorithm_update': 0.002014422416687012, 'loss': 0.3757621187

2023-01-09 22:39.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_465000.pt
2023-01-09 22:39.37 [info     ] DQN_online_20230109223635: epoch=465 step=465000 epoch=465 metrics={'time_inference': 0.0001830294132232666, 'time_environment_step': 1.2040376663208007e-05, 'time_step': 0.0002318885326385498, 'rollout_return': 177.4, 'time_sample_batch': 7.66754150390625e-05, 'time_algorithm_update': 0.0022266387939453127, 'loss': 0.560998831037432, 'evaluation': 160.1} step=465000
2023-01-09 22:39.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_466000.pt
2023-01-09 22:39.37 [info     ] DQN_online_20230109223635: epoch=466 step=466000 epoch=466 metrics={'time_inference': 0.00017832589149475097, 'time_environment_step': 1.1874675750732421e-05, 'time_step': 0.0002259194850921631, 'rollout_return': 179.83333333333334, 'time_sample_batch': 7.648468017578125e-05, 'time_algorithm_update': 0.0021384000778198243, 'loss': 0

2023-01-09 22:39.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_481000.pt
2023-01-09 22:39.44 [info     ] DQN_online_20230109223635: epoch=481 step=481000 epoch=481 metrics={'time_inference': 0.0001789722442626953, 'time_environment_step': 1.1639118194580078e-05, 'time_step': 0.00022936439514160156, 'time_sample_batch': 6.995201110839843e-05, 'time_algorithm_update': 0.002477264404296875, 'loss': 0.8770531866699457, 'rollout_return': 165.66666666666666, 'evaluation': 105.5} step=481000
2023-01-09 22:39.45 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_482000.pt
2023-01-09 22:39.45 [info     ] DQN_online_20230109223635: epoch=482 step=482000 epoch=482 metrics={'time_inference': 0.00017382431030273438, 'time_environment_step': 1.1404037475585938e-05, 'time_step': 0.0002235255241394043, 'rollout_return': 148.66666666666666, 'time_sample_batch': 6.835460662841797e-05, 'time_algorithm_update': 0.002448463439941

2023-01-09 22:39.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_497000.pt
2023-01-09 22:39.51 [info     ] DQN_online_20230109223635: epoch=497 step=497000 epoch=497 metrics={'time_inference': 0.00019478821754455566, 'time_environment_step': 1.2853145599365234e-05, 'time_step': 0.00025484323501586916, 'rollout_return': 156.0, 'time_sample_batch': 9.987354278564453e-05, 'time_algorithm_update': 0.0031476736068725584, 'loss': 1.0415899747982622, 'evaluation': 154.1} step=497000
2023-01-09 22:39.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_498000.pt
2023-01-09 22:39.51 [info     ] DQN_online_20230109223635: epoch=498 step=498000 epoch=498 metrics={'time_inference': 0.00017383313179016114, 'time_environment_step': 1.143050193786621e-05, 'time_step': 0.00022320246696472167, 'time_sample_batch': 6.706714630126953e-05, 'time_algorithm_update': 0.0024310827255249025, 'loss': 0.4472131594084203, 'rollout_return

2023-01-09 22:39.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_513000.pt
2023-01-09 22:39.58 [info     ] DQN_online_20230109223635: epoch=513 step=513000 epoch=513 metrics={'time_inference': 0.00018831968307495117, 'time_environment_step': 1.2238264083862305e-05, 'time_step': 0.00024303436279296875, 'rollout_return': 146.57142857142858, 'time_sample_batch': 7.619857788085938e-05, 'time_algorithm_update': 0.00277254581451416, 'loss': 0.7289510399103165, 'evaluation': 133.6} step=513000
2023-01-09 22:39.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_514000.pt
2023-01-09 22:39.58 [info     ] DQN_online_20230109223635: epoch=514 step=514000 epoch=514 metrics={'time_inference': 0.0001808617115020752, 'time_environment_step': 1.169276237487793e-05, 'time_step': 0.0002332460880279541, 'rollout_return': 164.33333333333334, 'time_sample_batch': 7.581710815429688e-05, 'time_algorithm_update': 0.00262446403503417

2023-01-09 22:40.04 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_529000.pt
2023-01-09 22:40.04 [info     ] DQN_online_20230109223635: epoch=529 step=529000 epoch=529 metrics={'time_inference': 0.0001749703884124756, 'time_environment_step': 1.1432647705078125e-05, 'time_step': 0.00022421550750732422, 'time_sample_batch': 6.940364837646485e-05, 'time_algorithm_update': 0.0024098634719848635, 'loss': 0.33304900452494623, 'rollout_return': 159.16666666666666, 'evaluation': 127.8} step=529000
2023-01-09 22:40.04 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_530000.pt
2023-01-09 22:40.04 [info     ] DQN_online_20230109223635: epoch=530 step=530000 epoch=530 metrics={'time_inference': 0.00017456603050231933, 'time_environment_step': 1.1290788650512695e-05, 'time_step': 0.00022284507751464843, 'rollout_return': 141.14285714285714, 'time_sample_batch': 6.475448608398438e-05, 'time_algorithm_update': 0.002340865135

2023-01-09 22:40.10 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_545000.pt
2023-01-09 22:40.10 [info     ] DQN_online_20230109223635: epoch=545 step=545000 epoch=545 metrics={'time_inference': 0.000176605224609375, 'time_environment_step': 1.1433124542236328e-05, 'time_step': 0.00022633147239685059, 'time_sample_batch': 6.520748138427734e-05, 'time_algorithm_update': 0.0024698734283447265, 'loss': 0.6372325958684086, 'rollout_return': 154.0, 'evaluation': 137.6} step=545000
2023-01-09 22:40.10 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_546000.pt
2023-01-09 22:40.10 [info     ] DQN_online_20230109223635: epoch=546 step=546000 epoch=546 metrics={'time_inference': 0.0001748490333557129, 'time_environment_step': 1.1310338973999023e-05, 'time_step': 0.00022420620918273926, 'rollout_return': 139.75, 'time_sample_batch': 6.971359252929687e-05, 'time_algorithm_update': 0.002441716194152832, 'loss': 0.3625092747

2023-01-09 22:40.17 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_561000.pt
2023-01-09 22:40.17 [info     ] DQN_online_20230109223635: epoch=561 step=561000 epoch=561 metrics={'time_inference': 0.00019194412231445313, 'time_environment_step': 1.2348175048828125e-05, 'time_step': 0.00024588489532470704, 'time_sample_batch': 7.109642028808593e-05, 'time_algorithm_update': 0.002613258361816406, 'loss': 0.8716593526303769, 'rollout_return': 158.16666666666666, 'evaluation': 162.0} step=561000
2023-01-09 22:40.17 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_562000.pt
2023-01-09 22:40.17 [info     ] DQN_online_20230109223635: epoch=562 step=562000 epoch=562 metrics={'time_inference': 0.00019649767875671388, 'time_environment_step': 1.2669801712036132e-05, 'time_step': 0.0002505078315734863, 'time_sample_batch': 7.123947143554688e-05, 'time_algorithm_update': 0.0026254653930664062, 'loss': 0.6431202328763902, 'ro

2023-01-09 22:40.24 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_577000.pt
2023-01-09 22:40.24 [info     ] DQN_online_20230109223635: epoch=577 step=577000 epoch=577 metrics={'time_inference': 0.00019455385208129884, 'time_environment_step': 1.2657880783081055e-05, 'time_step': 0.000250485897064209, 'time_sample_batch': 7.49349594116211e-05, 'time_algorithm_update': 0.0028047561645507812, 'loss': 1.1519321285188198, 'rollout_return': 159.66666666666666, 'evaluation': 163.4} step=577000
2023-01-09 22:40.24 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_578000.pt
2023-01-09 22:40.24 [info     ] DQN_online_20230109223635: epoch=578 step=578000 epoch=578 metrics={'time_inference': 0.0001931154727935791, 'time_environment_step': 1.2612342834472656e-05, 'time_step': 0.00024867963790893555, 'rollout_return': 168.0, 'time_sample_batch': 7.774829864501954e-05, 'time_algorithm_update': 0.002780652046203613, 'loss': 0

2023-01-09 22:40.31 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_593000.pt
2023-01-09 22:40.31 [info     ] DQN_online_20230109223635: epoch=593 step=593000 epoch=593 metrics={'time_inference': 0.00017698073387145997, 'time_environment_step': 1.1433839797973633e-05, 'time_step': 0.0002281980514526367, 'time_sample_batch': 6.670951843261719e-05, 'time_algorithm_update': 0.0026154518127441406, 'loss': 0.44817529981955884, 'rollout_return': 168.8, 'evaluation': 138.0} step=593000
2023-01-09 22:40.32 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_594000.pt
2023-01-09 22:40.32 [info     ] DQN_online_20230109223635: epoch=594 step=594000 epoch=594 metrics={'time_inference': 0.0001751866340637207, 'time_environment_step': 1.1345386505126953e-05, 'time_step': 0.00022588372230529784, 'rollout_return': 157.14285714285714, 'time_sample_batch': 6.465911865234375e-05, 'time_algorithm_update': 0.0025835752487182615, 'loss

2023-01-09 22:40.38 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_609000.pt
2023-01-09 22:40.38 [info     ] DQN_online_20230109223635: epoch=609 step=609000 epoch=609 metrics={'time_inference': 0.00018131709098815919, 'time_environment_step': 1.1666774749755859e-05, 'time_step': 0.00023360681533813476, 'time_sample_batch': 7.164478302001953e-05, 'time_algorithm_update': 0.0026723623275756838, 'loss': 0.7493683557957411, 'rollout_return': 188.8, 'evaluation': 163.0} step=609000
2023-01-09 22:40.38 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_610000.pt
2023-01-09 22:40.38 [info     ] DQN_online_20230109223635: epoch=610 step=610000 epoch=610 metrics={'time_inference': 0.00018180108070373535, 'time_environment_step': 1.1668443679809571e-05, 'time_step': 0.0002338283061981201, 'rollout_return': 168.5, 'time_sample_batch': 6.949901580810547e-05, 'time_algorithm_update': 0.002643442153930664, 'loss': 0.138936155

2023-01-09 22:40.45 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_625000.pt
2023-01-09 22:40.45 [info     ] DQN_online_20230109223635: epoch=625 step=625000 epoch=625 metrics={'time_inference': 0.0001943526268005371, 'time_environment_step': 1.270270347595215e-05, 'time_step': 0.0002570874691009522, 'rollout_return': 182.0, 'time_sample_batch': 0.00012714862823486327, 'time_algorithm_update': 0.003424263000488281, 'loss': 0.7689243850298226, 'evaluation': 162.0} step=625000
2023-01-09 22:40.46 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_626000.pt
2023-01-09 22:40.46 [info     ] DQN_online_20230109223635: epoch=626 step=626000 epoch=626 metrics={'time_inference': 0.00017652106285095216, 'time_environment_step': 1.1377811431884766e-05, 'time_step': 0.00022661805152893066, 'time_sample_batch': 6.968975067138672e-05, 'time_algorithm_update': 0.0025105953216552736, 'loss': 0.6896963658742606, 'rollout_return':

2023-01-09 22:40.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_641000.pt
2023-01-09 22:40.52 [info     ] DQN_online_20230109223635: epoch=641 step=641000 epoch=641 metrics={'time_inference': 0.00019365859031677247, 'time_environment_step': 1.2415170669555664e-05, 'time_step': 0.0002491765022277832, 'rollout_return': 169.83333333333334, 'time_sample_batch': 8.63790512084961e-05, 'time_algorithm_update': 0.0028122425079345702, 'loss': 0.43798866122961044, 'evaluation': 124.9} step=641000
2023-01-09 22:40.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_642000.pt
2023-01-09 22:40.52 [info     ] DQN_online_20230109223635: epoch=642 step=642000 epoch=642 metrics={'time_inference': 0.00017835140228271483, 'time_environment_step': 1.1480331420898438e-05, 'time_step': 0.0002276031970977783, 'rollout_return': 144.28571428571428, 'time_sample_batch': 6.725788116455079e-05, 'time_algorithm_update': 0.00239815711975

2023-01-09 22:40.59 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_657000.pt
2023-01-09 22:40.59 [info     ] DQN_online_20230109223635: epoch=657 step=657000 epoch=657 metrics={'time_inference': 0.00018904399871826172, 'time_environment_step': 1.2083768844604492e-05, 'time_step': 0.00024378633499145508, 'time_sample_batch': 7.95602798461914e-05, 'time_algorithm_update': 0.0028041601181030273, 'loss': 0.2571576084941626, 'rollout_return': 150.16666666666666, 'evaluation': 148.5} step=657000
2023-01-09 22:40.59 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_658000.pt
2023-01-09 22:40.59 [info     ] DQN_online_20230109223635: epoch=658 step=658000 epoch=658 metrics={'time_inference': 0.0001863551139831543, 'time_environment_step': 1.1984586715698243e-05, 'time_step': 0.00023990631103515625, 'rollout_return': 181.5, 'time_sample_batch': 7.328987121582031e-05, 'time_algorithm_update': 0.002713465690612793, 'loss':

2023-01-09 22:41.06 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_673000.pt
2023-01-09 22:41.06 [info     ] DQN_online_20230109223635: epoch=673 step=673000 epoch=673 metrics={'time_inference': 0.00018697524070739746, 'time_environment_step': 1.2007236480712891e-05, 'time_step': 0.0002443642616271973, 'time_sample_batch': 8.747577667236328e-05, 'time_algorithm_update': 0.003079724311828613, 'loss': 1.5622222907841206, 'rollout_return': 160.33333333333334, 'evaluation': 142.3} step=673000
2023-01-09 22:41.06 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_674000.pt
2023-01-09 22:41.06 [info     ] DQN_online_20230109223635: epoch=674 step=674000 epoch=674 metrics={'time_inference': 0.00017663836479187012, 'time_environment_step': 1.138448715209961e-05, 'time_step': 0.00022732925415039064, 'time_sample_batch': 6.630420684814454e-05, 'time_algorithm_update': 0.002568364143371582, 'loss': 0.9291385794058442, 'roll

2023-01-09 22:41.13 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_689000.pt
2023-01-09 22:41.13 [info     ] DQN_online_20230109223635: epoch=689 step=689000 epoch=689 metrics={'time_inference': 0.00019180941581726073, 'time_environment_step': 1.2329816818237306e-05, 'time_step': 0.00025021743774414064, 'rollout_return': 164.16666666666666, 'time_sample_batch': 8.60452651977539e-05, 'time_algorithm_update': 0.003106975555419922, 'loss': 0.66676992084831, 'evaluation': 176.1} step=689000
2023-01-09 22:41.13 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_690000.pt
2023-01-09 22:41.13 [info     ] DQN_online_20230109223635: epoch=690 step=690000 epoch=690 metrics={'time_inference': 0.00018972206115722655, 'time_environment_step': 1.2095212936401368e-05, 'time_step': 0.0002467737197875977, 'time_sample_batch': 8.130073547363281e-05, 'time_algorithm_update': 0.0030276298522949217, 'loss': 0.1973099572584033, 'rollo

2023-01-09 22:41.20 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_705000.pt
2023-01-09 22:41.20 [info     ] DQN_online_20230109223635: epoch=705 step=705000 epoch=705 metrics={'time_inference': 0.00017954111099243163, 'time_environment_step': 1.1394262313842773e-05, 'time_step': 0.00023260664939880372, 'rollout_return': 157.0, 'time_sample_batch': 6.685256958007812e-05, 'time_algorithm_update': 0.0027987003326416016, 'loss': 0.30716795213520526, 'evaluation': 158.9} step=705000
2023-01-09 22:41.20 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_706000.pt
2023-01-09 22:41.20 [info     ] DQN_online_20230109223635: epoch=706 step=706000 epoch=706 metrics={'time_inference': 0.00018236422538757323, 'time_environment_step': 1.16729736328125e-05, 'time_step': 0.00023668456077575683, 'time_sample_batch': 7.154941558837891e-05, 'time_algorithm_update': 0.002864217758178711, 'loss': 0.5099402874708175, 'rollout_return'

2023-01-09 22:41.27 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_721000.pt
2023-01-09 22:41.27 [info     ] DQN_online_20230109223635: epoch=721 step=721000 epoch=721 metrics={'time_inference': 0.00017926549911499024, 'time_environment_step': 1.1383533477783204e-05, 'time_step': 0.0002318708896636963, 'time_sample_batch': 6.513595581054687e-05, 'time_algorithm_update': 0.0027683019638061524, 'loss': 0.7580431945621967, 'rollout_return': 174.83333333333334, 'evaluation': 159.3} step=721000
2023-01-09 22:41.27 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_722000.pt
2023-01-09 22:41.27 [info     ] DQN_online_20230109223635: epoch=722 step=722000 epoch=722 metrics={'time_inference': 0.00018403840065002442, 'time_environment_step': 1.1765241622924805e-05, 'time_step': 0.00024099969863891602, 'time_sample_batch': 7.386207580566407e-05, 'time_algorithm_update': 0.0031041622161865233, 'loss': 0.26661414708942177, '

2023-01-09 22:41.34 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_737000.pt
2023-01-09 22:41.34 [info     ] DQN_online_20230109223635: epoch=737 step=737000 epoch=737 metrics={'time_inference': 0.00020137906074523925, 'time_environment_step': 1.2801170349121094e-05, 'time_step': 0.00026305103302001954, 'time_sample_batch': 8.704662322998047e-05, 'time_algorithm_update': 0.003339815139770508, 'loss': 0.6222398219630122, 'rollout_return': 181.8, 'evaluation': 135.3} step=737000
2023-01-09 22:41.35 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_738000.pt
2023-01-09 22:41.35 [info     ] DQN_online_20230109223635: epoch=738 step=738000 epoch=738 metrics={'time_inference': 0.0001871979236602783, 'time_environment_step': 1.1694669723510743e-05, 'time_step': 0.00024211788177490233, 'rollout_return': 175.66666666666666, 'time_sample_batch': 6.895065307617187e-05, 'time_algorithm_update': 0.0029146671295166016, 'loss'

2023-01-09 22:41.41 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_753000.pt
2023-01-09 22:41.41 [info     ] DQN_online_20230109223635: epoch=753 step=753000 epoch=753 metrics={'time_inference': 0.00017841720581054688, 'time_environment_step': 1.1351585388183594e-05, 'time_step': 0.0002319653034210205, 'time_sample_batch': 6.513595581054687e-05, 'time_algorithm_update': 0.0028664588928222655, 'loss': 0.9100377211347223, 'rollout_return': 169.0, 'evaluation': 131.8} step=753000
2023-01-09 22:41.41 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_754000.pt
2023-01-09 22:41.41 [info     ] DQN_online_20230109223635: epoch=754 step=754000 epoch=754 metrics={'time_inference': 0.0001789093017578125, 'time_environment_step': 1.1362075805664062e-05, 'time_step': 0.00023253417015075683, 'time_sample_batch': 6.508827209472656e-05, 'time_algorithm_update': 0.002863478660583496, 'loss': 0.34568152111023664, 'rollout_return'

2023-01-09 22:41.48 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_769000.pt
2023-01-09 22:41.48 [info     ] DQN_online_20230109223635: epoch=769 step=769000 epoch=769 metrics={'time_inference': 0.00018253135681152343, 'time_environment_step': 1.1559486389160157e-05, 'time_step': 0.00023818302154541014, 'rollout_return': 176.83333333333334, 'time_sample_batch': 6.866455078125e-05, 'time_algorithm_update': 0.0030146360397338865, 'loss': 0.4580315725877881, 'evaluation': 137.0} step=769000
2023-01-09 22:41.48 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_770000.pt
2023-01-09 22:41.48 [info     ] DQN_online_20230109223635: epoch=770 step=770000 epoch=770 metrics={'time_inference': 0.00018099784851074218, 'time_environment_step': 1.1471271514892577e-05, 'time_step': 0.00023573231697082518, 'rollout_return': 173.5, 'time_sample_batch': 6.780624389648437e-05, 'time_algorithm_update': 0.0029486656188964845, 'loss':

2023-01-09 22:41.55 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_785000.pt
2023-01-09 22:41.55 [info     ] DQN_online_20230109223635: epoch=785 step=785000 epoch=785 metrics={'time_inference': 0.00017719364166259764, 'time_environment_step': 1.1316537857055665e-05, 'time_step': 0.00023200511932373047, 'time_sample_batch': 6.427764892578126e-05, 'time_algorithm_update': 0.0029938459396362306, 'loss': 0.4538849055767059, 'rollout_return': 156.0, 'evaluation': 116.6} step=785000
2023-01-09 22:41.55 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_786000.pt
2023-01-09 22:41.55 [info     ] DQN_online_20230109223635: epoch=786 step=786000 epoch=786 metrics={'time_inference': 0.00018040060997009277, 'time_environment_step': 1.1389255523681641e-05, 'time_step': 0.00023551535606384277, 'rollout_return': 147.57142857142858, 'time_sample_batch': 6.601810455322265e-05, 'time_algorithm_update': 0.003003978729248047, 'loss

2023-01-09 22:42.01 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_801000.pt
2023-01-09 22:42.01 [info     ] DQN_online_20230109223635: epoch=801 step=801000 epoch=801 metrics={'time_inference': 0.00017964577674865723, 'time_environment_step': 1.1446952819824219e-05, 'time_step': 0.00023351383209228516, 'time_sample_batch': 7.085800170898437e-05, 'time_algorithm_update': 0.0028720855712890624, 'loss': 0.5404747399501503, 'rollout_return': 152.33333333333334, 'evaluation': 129.4} step=801000
2023-01-09 22:42.02 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_802000.pt
2023-01-09 22:42.02 [info     ] DQN_online_20230109223635: epoch=802 step=802000 epoch=802 metrics={'time_inference': 0.0001789073944091797, 'time_environment_step': 1.136016845703125e-05, 'time_step': 0.00023196077346801758, 'rollout_return': 149.57142857142858, 'time_sample_batch': 6.73055648803711e-05, 'time_algorithm_update': 0.002806091308593

2023-01-09 22:42.08 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_817000.pt
2023-01-09 22:42.08 [info     ] DQN_online_20230109223635: epoch=817 step=817000 epoch=817 metrics={'time_inference': 0.0001799919605255127, 'time_environment_step': 1.1462211608886719e-05, 'time_step': 0.00023400712013244628, 'rollout_return': 171.0, 'time_sample_batch': 6.949901580810547e-05, 'time_algorithm_update': 0.0028850793838500976, 'loss': 0.7564242726191879, 'evaluation': 133.1} step=817000
2023-01-09 22:42.08 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_818000.pt
2023-01-09 22:42.08 [info     ] DQN_online_20230109223635: epoch=818 step=818000 epoch=818 metrics={'time_inference': 0.00017996311187744142, 'time_environment_step': 1.1536121368408204e-05, 'time_step': 0.00023397397994995116, 'rollout_return': 178.33333333333334, 'time_sample_batch': 7.288455963134765e-05, 'time_algorithm_update': 0.0028717041015625, 'loss': 

2023-01-09 22:42.15 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_833000.pt
2023-01-09 22:42.15 [info     ] DQN_online_20230109223635: epoch=833 step=833000 epoch=833 metrics={'time_inference': 0.0001894547939300537, 'time_environment_step': 1.1737585067749024e-05, 'time_step': 0.0002459642887115478, 'time_sample_batch': 7.421970367431641e-05, 'time_algorithm_update': 0.0030597448348999023, 'loss': 0.4721535909920931, 'rollout_return': 186.6, 'evaluation': 162.3} step=833000
2023-01-09 22:42.15 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_834000.pt
2023-01-09 22:42.15 [info     ] DQN_online_20230109223635: epoch=834 step=834000 epoch=834 metrics={'time_inference': 0.00018460464477539063, 'time_environment_step': 1.1571884155273438e-05, 'time_step': 0.00024030661582946777, 'rollout_return': 176.66666666666666, 'time_sample_batch': 7.22646713256836e-05, 'time_algorithm_update': 0.0030164480209350585, 'loss':

2023-01-09 22:42.22 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_849000.pt
2023-01-09 22:42.22 [info     ] DQN_online_20230109223635: epoch=849 step=849000 epoch=849 metrics={'time_inference': 0.00024326848983764648, 'time_environment_step': 1.5230894088745118e-05, 'time_step': 0.00044741034507751466, 'rollout_return': 167.5, 'time_sample_batch': 9.250640869140625e-05, 'time_algorithm_update': 0.017037343978881837, 'loss': 0.6158871959429234, 'evaluation': 165.0} step=849000
2023-01-09 22:42.22 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_850000.pt
2023-01-09 22:42.22 [info     ] DQN_online_20230109223635: epoch=850 step=850000 epoch=850 metrics={'time_inference': 0.00018378686904907225, 'time_environment_step': 1.1568784713745118e-05, 'time_step': 0.0002397603988647461, 'rollout_return': 170.83333333333334, 'time_sample_batch': 7.76529312133789e-05, 'time_algorithm_update': 0.0030456304550170897, 'loss':

2023-01-09 22:42.29 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_865000.pt
2023-01-09 22:42.29 [info     ] DQN_online_20230109223635: epoch=865 step=865000 epoch=865 metrics={'time_inference': 0.0001881260871887207, 'time_environment_step': 1.1936187744140625e-05, 'time_step': 0.0002456364631652832, 'time_sample_batch': 9.160041809082031e-05, 'time_algorithm_update': 0.0030938148498535155, 'loss': 0.21990213599056005, 'rollout_return': 156.66666666666666, 'evaluation': 147.2} step=865000
2023-01-09 22:42.30 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_866000.pt
2023-01-09 22:42.30 [info     ] DQN_online_20230109223635: epoch=866 step=866000 epoch=866 metrics={'time_inference': 0.0001925637722015381, 'time_environment_step': 1.2033462524414063e-05, 'time_step': 0.0002511520385742187, 'rollout_return': 170.5, 'time_sample_batch': 7.977485656738282e-05, 'time_algorithm_update': 0.0031919956207275392, 'loss':

2023-01-09 22:42.36 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_881000.pt
2023-01-09 22:42.36 [info     ] DQN_online_20230109223635: epoch=881 step=881000 epoch=881 metrics={'time_inference': 0.00019167351722717285, 'time_environment_step': 1.2044191360473634e-05, 'time_step': 0.0002498276233673096, 'rollout_return': 154.85714285714286, 'time_sample_batch': 7.643699645996094e-05, 'time_algorithm_update': 0.003164815902709961, 'loss': 0.33115827608853576, 'evaluation': 122.9} step=881000
2023-01-09 22:42.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_882000.pt
2023-01-09 22:42.37 [info     ] DQN_online_20230109223635: epoch=882 step=882000 epoch=882 metrics={'time_inference': 0.00018989372253417968, 'time_environment_step': 1.2056112289428711e-05, 'time_step': 0.0002490775585174561, 'rollout_return': 158.5, 'time_sample_batch': 0.0001207113265991211, 'time_algorithm_update': 0.0032251596450805662, 'loss'

2023-01-09 22:42.43 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_897000.pt
2023-01-09 22:42.43 [info     ] DQN_online_20230109223635: epoch=897 step=897000 epoch=897 metrics={'time_inference': 0.00019191908836364746, 'time_environment_step': 1.218271255493164e-05, 'time_step': 0.00024844765663146974, 'time_sample_batch': 7.457733154296875e-05, 'time_algorithm_update': 0.0029769182205200196, 'loss': 0.11884689573198556, 'rollout_return': 176.0, 'evaluation': 171.4} step=897000
2023-01-09 22:42.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_898000.pt
2023-01-09 22:42.44 [info     ] DQN_online_20230109223635: epoch=898 step=898000 epoch=898 metrics={'time_inference': 0.00019429922103881837, 'time_environment_step': 1.227569580078125e-05, 'time_step': 0.00025213551521301267, 'rollout_return': 177.66666666666666, 'time_sample_batch': 7.865428924560546e-05, 'time_algorithm_update': 0.003084874153137207, 'loss'

2023-01-09 22:42.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_913000.pt
2023-01-09 22:42.51 [info     ] DQN_online_20230109223635: epoch=913 step=913000 epoch=913 metrics={'time_inference': 0.0001886465549468994, 'time_environment_step': 1.1989116668701172e-05, 'time_step': 0.00024613094329833983, 'time_sample_batch': 7.431507110595703e-05, 'time_algorithm_update': 0.0031076669692993164, 'loss': 0.38588133454322815, 'rollout_return': 175.8, 'evaluation': 174.8} step=913000
2023-01-09 22:42.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_914000.pt
2023-01-09 22:42.51 [info     ] DQN_online_20230109223635: epoch=914 step=914000 epoch=914 metrics={'time_inference': 0.00019161033630371094, 'time_environment_step': 1.2107610702514649e-05, 'time_step': 0.0002492797374725342, 'rollout_return': 180.66666666666666, 'time_sample_batch': 7.431507110595703e-05, 'time_algorithm_update': 0.003104424476623535, 'loss'

2023-01-09 22:42.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_929000.pt
2023-01-09 22:42.58 [info     ] DQN_online_20230109223635: epoch=929 step=929000 epoch=929 metrics={'time_inference': 0.00018852043151855468, 'time_environment_step': 1.2020349502563477e-05, 'time_step': 0.0002460746765136719, 'rollout_return': 191.83333333333334, 'time_sample_batch': 7.500648498535157e-05, 'time_algorithm_update': 0.0031226396560668944, 'loss': 0.4890184126794338, 'evaluation': 141.6} step=929000
2023-01-09 22:42.58 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_930000.pt
2023-01-09 22:42.58 [info     ] DQN_online_20230109223635: epoch=930 step=930000 epoch=930 metrics={'time_inference': 0.0001873586177825928, 'time_environment_step': 1.1795282363891601e-05, 'time_step': 0.00024559545516967776, 'time_sample_batch': 9.04083251953125e-05, 'time_algorithm_update': 0.003199458122253418, 'loss': 0.5710880764760077, 'roll

2023-01-09 22:43.05 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_945000.pt
2023-01-09 22:43.05 [info     ] DQN_online_20230109223635: epoch=945 step=945000 epoch=945 metrics={'time_inference': 0.00019880890846252442, 'time_environment_step': 1.2511253356933594e-05, 'time_step': 0.0002622816562652588, 'time_sample_batch': 0.00010004043579101563, 'time_algorithm_update': 0.003546285629272461, 'loss': 0.5257206964306533, 'rollout_return': 178.2, 'evaluation': 149.2} step=945000
2023-01-09 22:43.06 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_946000.pt
2023-01-09 22:43.06 [info     ] DQN_online_20230109223635: epoch=946 step=946000 epoch=946 metrics={'time_inference': 0.00021265125274658203, 'time_environment_step': 1.3644218444824219e-05, 'time_step': 0.0002785642147064209, 'rollout_return': 165.57142857142858, 'time_sample_batch': 0.00010597705841064453, 'time_algorithm_update': 0.003551340103149414, 'loss'

2023-01-09 22:43.13 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_961000.pt
2023-01-09 22:43.13 [info     ] DQN_online_20230109223635: epoch=961 step=961000 epoch=961 metrics={'time_inference': 0.00020440101623535157, 'time_environment_step': 1.250910758972168e-05, 'time_step': 0.0002656097412109375, 'rollout_return': 177.16666666666666, 'time_sample_batch': 9.737014770507812e-05, 'time_algorithm_update': 0.003342151641845703, 'loss': 0.4362704297527671, 'evaluation': 178.5} step=961000
2023-01-09 22:43.13 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_962000.pt
2023-01-09 22:43.13 [info     ] DQN_online_20230109223635: epoch=962 step=962000 epoch=962 metrics={'time_inference': 0.00018666315078735352, 'time_environment_step': 1.1761903762817382e-05, 'time_step': 0.0002429778575897217, 'rollout_return': 178.0, 'time_sample_batch': 7.586479187011718e-05, 'time_algorithm_update': 0.0030449628829956055, 'loss': 

2023-01-09 22:43.20 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_977000.pt
2023-01-09 22:43.20 [info     ] DQN_online_20230109223635: epoch=977 step=977000 epoch=977 metrics={'time_inference': 0.00019909214973449707, 'time_environment_step': 1.2601613998413085e-05, 'time_step': 0.00025957322120666505, 'rollout_return': 184.16666666666666, 'time_sample_batch': 7.963180541992188e-05, 'time_algorithm_update': 0.0032491445541381835, 'loss': 0.09290997460484504, 'evaluation': 171.9} step=977000
2023-01-09 22:43.21 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_978000.pt
2023-01-09 22:43.21 [info     ] DQN_online_20230109223635: epoch=978 step=978000 epoch=978 metrics={'time_inference': 0.00019678974151611327, 'time_environment_step': 1.2453556060791016e-05, 'time_step': 0.0002572653293609619, 'time_sample_batch': 7.953643798828126e-05, 'time_algorithm_update': 0.003311443328857422, 'loss': 0.35147553142160176, '

2023-01-09 22:43.28 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_993000.pt
2023-01-09 22:43.28 [info     ] DQN_online_20230109223635: epoch=993 step=993000 epoch=993 metrics={'time_inference': 0.0001863868236541748, 'time_environment_step': 1.1698722839355468e-05, 'time_step': 0.00024347996711730958, 'time_sample_batch': 7.195472717285157e-05, 'time_algorithm_update': 0.0031319856643676758, 'loss': 0.6063910726457834, 'rollout_return': 183.4, 'evaluation': 163.4} step=993000
2023-01-09 22:43.28 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109223635/model_994000.pt
2023-01-09 22:43.28 [info     ] DQN_online_20230109223635: epoch=994 step=994000 epoch=994 metrics={'time_inference': 0.0001854395866394043, 'time_environment_step': 1.224660873413086e-05, 'time_step': 0.00024364304542541504, 'time_sample_batch': 7.46011734008789e-05, 'time_algorithm_update': 0.0031879663467407225, 'loss': 0.6674715025350452, 'rollout_return': 
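The per-epoch log lines above share a regular shape (`epoch=… step=… metrics={…} step=…`), so the metrics can be pulled out for plotting or comparison. Below is a minimal sketch using only the standard library. The helper name `parse_metrics` is hypothetical, the sample line is an abridged copy of the epoch 993 entry, and lines that the notebook output truncated are skipped rather than repaired.

```python
import ast
import re

# Matches a complete metrics line as printed by the training run above.
METRICS_RE = re.compile(
    r"epoch=(\d+) step=(\d+) epoch=\d+ metrics=(\{.*\}) step=\d+\s*$"
)


def parse_metrics(line):
    """Return (epoch, step, metrics_dict), or None if the line is incomplete."""
    match = METRICS_RE.search(line)
    if match is None:
        return None
    epoch, step, metrics = match.groups()
    # The metrics payload is a Python dict literal, so literal_eval is enough.
    return int(epoch), int(step), ast.literal_eval(metrics)


# Abridged copy of a log line from the run above (timing metrics omitted).
line = (
    "2023-01-09 22:43.28 [info     ] DQN_online_20230109223635: "
    "epoch=993 step=993000 epoch=993 "
    "metrics={'loss': 0.6063910726457834, 'rollout_return': 183.4, "
    "'evaluation': 163.4} step=993000"
)

epoch, step, metrics = parse_metrics(line)
print(epoch, step, metrics["evaluation"])

# A truncated line (as displayed in the notebook output) is skipped.
print(parse_metrics("epoch=994 step=994000 epoch=994 metrics={'loss':"))
```

Collecting the `evaluation` values this way makes it easier to compare this run with the N-step run that follows.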

# DQN Using N-steps and Noisy Networks

In [None]:
# setup environment
# training env
env = gym.make('CartPole-v0')
# evaluation env
eval_env = gym.make('CartPole-v0')

In [None]:
# use Adam with a non-zero weight decay
optim_factory = OptimizerFactory(Adam, weight_decay=1e-4)
# setup algorithm
dqn = DQN(batch_size=32,  # mini-batch size (transitions sampled per update)
          learning_rate=2.5e-4,  # learning rate
          target_update_interval=100,  # interval to synchronize the target network
          n_steps=4,  # N-step TD calculation
          optim_factory=optim_factory  # optimizer
         )
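
Setting `n_steps=4` replaces the one-step TD target with a 4-step return: the discounted sum of the next four rewards plus a discounted bootstrap value from the target network. The sketch below illustrates that calculation in plain Python. It is not d3rlpy's internal code, and the function name `n_step_target` and the bootstrap value of 10.0 are made up for the example; `gamma=0.99` matches the value reported in `params.json`.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99, terminal=False):
    """N-step TD target: sum_k gamma^k * r_k + gamma^n * bootstrap_value.

    `rewards` holds the n rewards observed after the current state.
    `bootstrap_value` stands in for max_a Q_target(s_{t+n}, a).
    The bootstrap term is dropped when the episode ended within the n steps.
    """
    target = 0.0
    for k, reward in enumerate(rewards):
        target += (gamma ** k) * reward
    if not terminal:
        target += (gamma ** len(rewards)) * bootstrap_value
    return target


# CartPole pays a reward of 1.0 per step, so a 4-step window looks like this.
rewards = [1.0, 1.0, 1.0, 1.0]
print(n_step_target(rewards, bootstrap_value=10.0))
print(n_step_target(rewards, bootstrap_value=10.0, terminal=True))
```

Longer windows propagate reward information faster, at the cost of targets that depend more heavily on the behaviour policy that collected the data.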

In [None]:
# setup replay buffer
buffer = ReplayBuffer(maxlen=1000000, env=env)

In [None]:
explorer = LinearDecayEpsilonGreedy(start_epsilon=1.0,
                                    end_epsilon=0.1,
                                    duration=10000)
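# NormalNoise adds Gaussian noise to continuous actions, so it does not suit CartPole's discrete action space
#explorer = NormalNoise(mean=0, std=0.1)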
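
With the settings above, the exploration rate falls linearly from 1.0 to 0.1 over the first 10,000 environment steps (the first 10 epochs at 1,000 steps per epoch) and then stays at 0.1. The sketch below reproduces that schedule as a standalone function. It is an illustration of the schedule, not d3rlpy's implementation, and `linear_decay_epsilon` is a hypothetical name.

```python
def linear_decay_epsilon(step, start_epsilon=1.0, end_epsilon=0.1, duration=10000):
    """Linearly interpolate epsilon from start to end over `duration` steps."""
    # Fraction of the decay completed, clamped so epsilon stays at end_epsilon.
    progress = min(step / duration, 1.0)
    return start_epsilon + (end_epsilon - start_epsilon) * progress


# Inspect the schedule at a few points during training.
for step in (0, 2500, 5000, 10000, 50000):
    print(step, round(linear_decay_epsilon(step), 3))
```

Because the decay finishes so early relative to the 1,000,000 training steps, almost all of the run uses the final exploration rate of 0.1.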

In [None]:
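dqn.fit_online(
    env,  # training environment
    buffer,  # replay buffer
    explorer=explorer,  # exploration strategy
    eval_env=eval_env,  # evaluation environment
    n_steps_per_epoch=1000,  # number of environment steps per epoch
    update_interval=100,  # number of environment steps between parameter updates
    eval_epsilon=0.3,  # epsilon used for epsilon-greedy action selection during evaluation
    save_metrics=True,  # record metrics under d3rlpy_logs
    tensorboard_dir="runs",  # directory for TensorBoard logs
)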

2023-01-09 22:45.41 [info     ] Directory is created at d3rlpy_logs/DQN_online_20230109224541
2023-01-09 22:45.41 [debug    ] Building model...
2023-01-09 22:45.41 [debug    ] Model has been built.
2023-01-09 22:45.41 [info     ] Parameters are saved to d3rlpy_logs/DQN_online_20230109224541/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 0.00025, 'n_critics': 1, 'n_frames': 1, 'n_steps': 4, 'optim_factory': {'optim_cls': 'Adam', 'weight_decay': 0.0001}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': None, 'target_update_interval': 100, 'use_gpu': None, 'algorithm': 'DQN', 'observation_shape': (4,), 'action_size': 2}


  0%|          | 0/1000000 [00:00<?, ?it/s]

2023-01-09 22:45.41 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_1000.pt
2023-01-09 22:45.41 [info     ] DQN_online_20230109224541: epoch=1 step=1000 epoch=1 metrics={'time_inference': 0.00017836833000183105, 'time_environment_step': 1.2113809585571288e-05, 'time_step': 0.00021889591217041015, 'rollout_return': 21.88888888888889, 'time_sample_batch': 7.402896881103516e-05, 'time_algorithm_update': 0.001340937614440918, 'loss': 3.1005072593688965, 'evaluation': 11.9} step=1000
2023-01-09 22:45.41 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_2000.pt
2023-01-09 22:45.41 [info     ] DQN_online_20230109224541: epoch=2 step=2000 epoch=2 metrics={'time_inference': 0.00017458009719848632, 'time_environment_step': 1.1927604675292968e-05, 'time_step': 0.0002142350673675537, 'rollout_return': 21.847826086956523, 'time_sample_batch': 8.685588836669922e-05, 'time_algorithm_update': 0.0012852668762207032, 'loss': 2.800

2023-01-09 22:45.45 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_17000.pt
2023-01-09 22:45.45 [info     ] DQN_online_20230109224541: epoch=17 step=17000 epoch=17 metrics={'time_inference': 0.0001729733943939209, 'time_environment_step': 1.205897331237793e-05, 'time_step': 0.00021257472038269042, 'rollout_return': 9.97, 'time_sample_batch': 6.797313690185547e-05, 'time_algorithm_update': 0.001219463348388672, 'loss': 1.198813933134079, 'evaluation': 10.5} step=17000
2023-01-09 22:45.45 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_18000.pt
2023-01-09 22:45.45 [info     ] DQN_online_20230109224541: epoch=18 step=18000 epoch=18 metrics={'time_inference': 0.00017379117012023926, 'time_environment_step': 1.2032032012939453e-05, 'time_step': 0.0002145075798034668, 'rollout_return': 10.309278350515465, 'time_sample_batch': 8.101463317871094e-05, 'time_algorithm_update': 0.0013301849365234375, 'loss': 1.0410839080

2023-01-09 22:45.49 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_33000.pt
2023-01-09 22:45.49 [info     ] DQN_online_20230109224541: epoch=33 step=33000 epoch=33 metrics={'time_inference': 0.00018792557716369629, 'time_environment_step': 1.2913465499877929e-05, 'time_step': 0.00023414850234985352, 'rollout_return': 10.956043956043956, 'time_sample_batch': 8.9263916015625e-05, 'time_algorithm_update': 0.001676487922668457, 'loss': 1.9890812993049622, 'evaluation': 11.6} step=33000
2023-01-09 22:45.49 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_34000.pt
2023-01-09 22:45.49 [info     ] DQN_online_20230109224541: epoch=34 step=34000 epoch=34 metrics={'time_inference': 0.00020075535774230957, 'time_environment_step': 1.326298713684082e-05, 'time_step': 0.0002518947124481201, 'rollout_return': 11.386363636363637, 'time_sample_batch': 0.00010564327239990235, 'time_algorithm_update': 0.002048039436340332, 'loss'

2023-01-09 22:45.53 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_49000.pt
2023-01-09 22:45.53 [info     ] DQN_online_20230109224541: epoch=49 step=49000 epoch=49 metrics={'time_inference': 0.00017322778701782228, 'time_environment_step': 1.1877059936523438e-05, 'time_step': 0.00021294093132019043, 'rollout_return': 14.183098591549296, 'time_sample_batch': 8.490085601806641e-05, 'time_algorithm_update': 0.0012920141220092774, 'loss': 2.21360239982605, 'evaluation': 15.2} step=49000
2023-01-09 22:45.53 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_50000.pt
2023-01-09 22:45.53 [info     ] DQN_online_20230109224541: epoch=50 step=50000 epoch=50 metrics={'time_inference': 0.00017465806007385253, 'time_environment_step': 1.2087583541870117e-05, 'time_step': 0.00021478009223937987, 'rollout_return': 15.53125, 'time_sample_batch': 7.376670837402343e-05, 'time_algorithm_update': 0.001283550262451172, 'loss': 2.0726

2023-01-09 22:45.57 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_65000.pt
2023-01-09 22:45.57 [info     ] DQN_online_20230109224541: epoch=65 step=65000 epoch=65 metrics={'time_inference': 0.00017030620574951173, 'time_environment_step': 1.1711835861206054e-05, 'time_step': 0.00020851612091064454, 'rollout_return': 37.03703703703704, 'time_sample_batch': 7.216930389404297e-05, 'time_algorithm_update': 0.0012253999710083007, 'loss': 1.8147225499153137, 'evaluation': 33.0} step=65000
2023-01-09 22:45.57 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_66000.pt
2023-01-09 22:45.57 [info     ] DQN_online_20230109224541: epoch=66 step=66000 epoch=66 metrics={'time_inference': 0.00017241239547729492, 'time_environment_step': 1.1824369430541992e-05, 'time_step': 0.00021209812164306642, 'rollout_return': 34.55172413793103, 'time_sample_batch': 7.2479248046875e-05, 'time_algorithm_update': 0.0013484477996826172, 'loss

2023-01-09 22:46.01 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_81000.pt
2023-01-09 22:46.01 [info     ] DQN_online_20230109224541: epoch=81 step=81000 epoch=81 metrics={'time_inference': 0.00017502880096435547, 'time_environment_step': 1.1977434158325196e-05, 'time_step': 0.00021352815628051757, 'rollout_return': 46.857142857142854, 'time_sample_batch': 7.536411285400391e-05, 'time_algorithm_update': 0.001204371452331543, 'loss': 2.8786449909210203, 'evaluation': 42.5} step=81000
2023-01-09 22:46.02 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_82000.pt
2023-01-09 22:46.02 [info     ] DQN_online_20230109224541: epoch=82 step=82000 epoch=82 metrics={'time_inference': 0.00017325186729431154, 'time_environment_step': 1.18865966796875e-05, 'time_step': 0.00021246743202209473, 'rollout_return': 44.90909090909091, 'time_sample_batch': 8.237361907958984e-05, 'time_algorithm_update': 0.001297163963317871, 'loss'

2023-01-09 22:46.06 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_97000.pt
2023-01-09 22:46.06 [info     ] DQN_online_20230109224541: epoch=97 step=97000 epoch=97 metrics={'time_inference': 0.0001688532829284668, 'time_environment_step': 1.1443614959716796e-05, 'time_step': 0.0002061314582824707, 'rollout_return': 48.75, 'time_sample_batch': 7.202625274658204e-05, 'time_algorithm_update': 0.0011924028396606446, 'loss': 2.406794810295105, 'evaluation': 47.3} step=97000
2023-01-09 22:46.06 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_98000.pt
2023-01-09 22:46.06 [info     ] DQN_online_20230109224541: epoch=98 step=98000 epoch=98 metrics={'time_inference': 0.00017382168769836427, 'time_environment_step': 1.1943340301513672e-05, 'rollout_return': 48.59090909090909, 'time_step': 0.0002138223648071289, 'time_sample_batch': 9.911060333251953e-05, 'time_algorithm_update': 0.001333928108215332, 'loss': 2.6434879899

2023-01-09 22:46.11 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_113000.pt
2023-01-09 22:46.11 [info     ] DQN_online_20230109224541: epoch=113 step=113000 epoch=113 metrics={'time_inference': 0.00017946958541870117, 'time_environment_step': 1.1955022811889649e-05, 'time_step': 0.0002191600799560547, 'rollout_return': 50.55, 'time_sample_batch': 8.318424224853515e-05, 'time_algorithm_update': 0.0013079643249511719, 'loss': 3.4370392560958862, 'evaluation': 50.4} step=113000
2023-01-09 22:46.11 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_114000.pt
2023-01-09 22:46.11 [info     ] DQN_online_20230109224541: epoch=114 step=114000 epoch=114 metrics={'time_inference': 0.00017742395401000976, 'time_environment_step': 1.205587387084961e-05, 'time_step': 0.00021838712692260741, 'rollout_return': 52.578947368421055, 'time_sample_batch': 8.792877197265625e-05, 'time_algorithm_update': 0.0014107465744018556, 'loss':

2023-01-09 22:46.16 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_129000.pt
2023-01-09 22:46.16 [info     ] DQN_online_20230109224541: epoch=129 step=129000 epoch=129 metrics={'time_inference': 0.00017609691619873048, 'time_environment_step': 1.188802719116211e-05, 'time_step': 0.00021704840660095214, 'rollout_return': 63.9375, 'time_sample_batch': 8.006095886230469e-05, 'time_algorithm_update': 0.0014345169067382813, 'loss': 2.990252900123596, 'evaluation': 59.2} step=129000
2023-01-09 22:46.16 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_130000.pt
2023-01-09 22:46.16 [info     ] DQN_online_20230109224541: epoch=130 step=130000 epoch=130 metrics={'time_inference': 0.00017351603507995605, 'time_environment_step': 1.174163818359375e-05, 'time_step': 0.00021431255340576173, 'rollout_return': 56.611111111111114, 'time_sample_batch': 7.748603820800781e-05, 'time_algorithm_update': 0.0014618635177612305, 'loss'

2023-01-09 22:46.20 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_145000.pt
2023-01-09 22:46.20 [info     ] DQN_online_20230109224541: epoch=145 step=145000 epoch=145 metrics={'time_inference': 0.00017284297943115235, 'time_environment_step': 1.1619091033935547e-05, 'time_step': 0.00021514630317687989, 'rollout_return': 72.28571428571429, 'time_sample_batch': 8.571147918701172e-05, 'time_algorithm_update': 0.0016301870346069336, 'loss': 2.8731896758079527, 'evaluation': 75.7} step=145000
2023-01-09 22:46.21 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_146000.pt
2023-01-09 22:46.21 [info     ] DQN_online_20230109224541: epoch=146 step=146000 epoch=146 metrics={'time_inference': 0.00017501091957092285, 'time_environment_step': 1.1728525161743164e-05, 'time_step': 0.000219635009765625, 'rollout_return': 73.35714285714286, 'time_sample_batch': 8.394718170166016e-05, 'time_algorithm_update': 0.00182917118072509

2023-01-09 22:46.25 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_161000.pt
2023-01-09 22:46.25 [info     ] DQN_online_20230109224541: epoch=161 step=161000 epoch=161 metrics={'time_inference': 0.00018044710159301758, 'time_environment_step': 1.2001752853393554e-05, 'time_step': 0.0002282733917236328, 'rollout_return': 74.92857142857143, 'time_sample_batch': 9.615421295166016e-05, 'time_algorithm_update': 0.002095818519592285, 'loss': 4.196649837493896, 'evaluation': 74.4} step=161000
2023-01-09 22:46.26 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_162000.pt
2023-01-09 22:46.26 [info     ] DQN_online_20230109224541: epoch=162 step=162000 epoch=162 metrics={'time_inference': 0.0001782362461090088, 'time_environment_step': 1.1932611465454101e-05, 'time_step': 0.00022435140609741212, 'rollout_return': 92.27272727272727, 'time_sample_batch': 8.382797241210938e-05, 'time_algorithm_update': 0.0019492626190185547

2023-01-09 22:46.31 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_177000.pt
2023-01-09 22:46.31 [info     ] DQN_online_20230109224541: epoch=177 step=177000 epoch=177 metrics={'time_inference': 0.000176851749420166, 'time_environment_step': 1.1669158935546876e-05, 'time_step': 0.0002232177257537842, 'rollout_return': 110.55555555555556, 'time_sample_batch': 8.323192596435547e-05, 'time_algorithm_update': 0.002054452896118164, 'loss': 3.4247088193893434, 'evaluation': 76.4} step=177000
2023-01-09 22:46.31 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_178000.pt
2023-01-09 22:46.31 [info     ] DQN_online_20230109224541: epoch=178 step=178000 epoch=178 metrics={'time_inference': 0.0001789100170135498, 'time_environment_step': 1.1648416519165039e-05, 'time_step': 0.00022557902336120607, 'rollout_return': 122.375, 'time_sample_batch': 8.23974609375e-05, 'time_algorithm_update': 0.0020665645599365233, 'loss': 3.09

2023-01-09 22:46.36 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_193000.pt
2023-01-09 22:46.36 [info     ] DQN_online_20230109224541: epoch=193 step=193000 epoch=193 metrics={'time_inference': 0.00019195342063903808, 'time_environment_step': 1.217961311340332e-05, 'time_step': 0.0002432863712310791, 'rollout_return': 151.83333333333334, 'time_sample_batch': 8.90970230102539e-05, 'time_algorithm_update': 0.0023497343063354492, 'loss': 3.7730384945869444, 'evaluation': 95.7} step=193000
2023-01-09 22:46.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_194000.pt
2023-01-09 22:46.37 [info     ] DQN_online_20230109224541: epoch=194 step=194000 epoch=194 metrics={'time_inference': 0.00018442893028259278, 'time_environment_step': 1.1858224868774415e-05, 'time_step': 0.0002330918312072754, 'rollout_return': 118.25, 'time_sample_batch': 8.289813995361328e-05, 'time_algorithm_update': 0.0021754741668701173, 'loss': 

2023-01-09 22:46.42 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_209000.pt
2023-01-09 22:46.42 [info     ] DQN_online_20230109224541: epoch=209 step=209000 epoch=209 metrics={'time_inference': 0.00018307900428771973, 'time_environment_step': 1.1915922164916992e-05, 'time_step': 0.00023303771018981935, 'time_sample_batch': 9.579658508300782e-05, 'time_algorithm_update': 0.0023252010345458985, 'loss': 2.5630555629730223, 'rollout_return': 145.28571428571428, 'evaluation': 93.6} step=209000
2023-01-09 22:46.43 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_210000.pt
2023-01-09 22:46.43 [info     ] DQN_online_20230109224541: epoch=210 step=210000 epoch=210 metrics={'time_inference': 0.00018039488792419433, 'time_environment_step': 1.1650323867797852e-05, 'time_step': 0.00022837448120117186, 'rollout_return': 151.0, 'time_sample_batch': 9.102821350097657e-05, 'time_algorithm_update': 0.0021869897842407226, 'loss

2023-01-09 22:46.48 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_225000.pt
2023-01-09 22:46.48 [info     ] DQN_online_20230109224541: epoch=225 step=225000 epoch=225 metrics={'time_inference': 0.00020177531242370606, 'time_environment_step': 1.2557506561279297e-05, 'time_step': 0.00025516176223754886, 'rollout_return': 135.75, 'time_sample_batch': 0.00010092258453369141, 'time_algorithm_update': 0.002502846717834473, 'loss': 3.837026023864746, 'evaluation': 133.5} step=225000
2023-01-09 22:46.49 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_226000.pt
2023-01-09 22:46.49 [info     ] DQN_online_20230109224541: epoch=226 step=226000 epoch=226 metrics={'time_inference': 0.00018798279762268066, 'time_environment_step': 1.2636423110961914e-05, 'time_step': 0.00023946642875671386, 'time_sample_batch': 0.00011463165283203125, 'time_algorithm_update': 0.002351069450378418, 'loss': 3.3007903933525085, 'rollout_retur

2023-01-09 22:46.55 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_241000.pt
2023-01-09 22:46.55 [info     ] DQN_online_20230109224541: epoch=241 step=241000 epoch=241 metrics={'time_inference': 0.00018727445602416992, 'time_environment_step': 1.1919260025024415e-05, 'time_step': 0.00023765063285827637, 'time_sample_batch': 8.599758148193359e-05, 'time_algorithm_update': 0.002388787269592285, 'loss': 3.3328821659088135, 'rollout_return': 200.0, 'evaluation': 161.6} step=241000
2023-01-09 22:46.55 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_242000.pt
2023-01-09 22:46.55 [info     ] DQN_online_20230109224541: epoch=242 step=242000 epoch=242 metrics={'time_inference': 0.00018614125251770018, 'time_environment_step': 1.1940240859985352e-05, 'time_step': 0.0002353391647338867, 'time_sample_batch': 9.300708770751954e-05, 'time_algorithm_update': 0.0022511720657348634, 'loss': 3.603861856460571, 'rollout_return':

2023-01-09 22:47.02 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_257000.pt
2023-01-09 22:47.02 [info     ] DQN_online_20230109224541: epoch=257 step=257000 epoch=257 metrics={'time_inference': 0.00018452048301696776, 'time_environment_step': 1.172041893005371e-05, 'time_step': 0.0002349393367767334, 'rollout_return': 200.0, 'time_sample_batch': 9.198188781738282e-05, 'time_algorithm_update': 0.002420639991760254, 'loss': 4.054210507869721, 'evaluation': 142.7} step=257000
2023-01-09 22:47.02 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_258000.pt
2023-01-09 22:47.02 [info     ] DQN_online_20230109224541: epoch=258 step=258000 epoch=258 metrics={'time_inference': 0.00019045114517211913, 'time_environment_step': 1.2040138244628906e-05, 'time_step': 0.0002461245059967041, 'rollout_return': 188.8, 'time_sample_batch': 9.703636169433594e-05, 'time_algorithm_update': 0.0028516054153442383, 'loss': 2.950032496452

2023-01-09 22:47.09 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_273000.pt
2023-01-09 22:47.09 [info     ] DQN_online_20230109224541: epoch=273 step=273000 epoch=273 metrics={'time_inference': 0.00019434309005737305, 'time_environment_step': 1.2166261672973634e-05, 'time_step': 0.0002509255409240723, 'time_sample_batch': 0.00014810562133789064, 'time_algorithm_update': 0.002864170074462891, 'loss': 3.797875261306763, 'rollout_return': 200.0, 'evaluation': 162.7} step=273000
2023-01-09 22:47.09 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_274000.pt
2023-01-09 22:47.09 [info     ] DQN_online_20230109224541: epoch=274 step=274000 epoch=274 metrics={'time_inference': 0.00018247532844543456, 'time_environment_step': 1.1736154556274414e-05, 'time_step': 0.0002338283061981201, 'time_sample_batch': 8.270740509033203e-05, 'time_algorithm_update': 0.0025220155715942384, 'loss': 4.059943270683289, 'rollout_return': 

2023-01-09 22:47.16 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_289000.pt
2023-01-09 22:47.16 [info     ] DQN_online_20230109224541: epoch=289 step=289000 epoch=289 metrics={'time_inference': 0.0001830291748046875, 'time_environment_step': 1.1815547943115234e-05, 'time_step': 0.00023456406593322753, 'rollout_return': 193.0, 'time_sample_batch': 8.85009765625e-05, 'time_algorithm_update': 0.0025296449661254884, 'loss': 3.8902641773223876, 'evaluation': 147.8} step=289000
2023-01-09 22:47.16 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_290000.pt
2023-01-09 22:47.16 [info     ] DQN_online_20230109224541: epoch=290 step=290000 epoch=290 metrics={'time_inference': 0.000184675931930542, 'time_environment_step': 1.1972427368164063e-05, 'time_step': 0.00024114799499511718, 'time_sample_batch': 8.955001831054688e-05, 'time_algorithm_update': 0.0029163599014282227, 'loss': 3.5890611886978148, 'rollout_return': 186

2023-01-09 22:47.23 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_305000.pt
2023-01-09 22:47.23 [info     ] DQN_online_20230109224541: epoch=305 step=305000 epoch=305 metrics={'time_inference': 0.00018380379676818847, 'time_environment_step': 1.1791229248046876e-05, 'time_step': 0.00023557829856872558, 'rollout_return': 163.28571428571428, 'time_sample_batch': 8.41379165649414e-05, 'time_algorithm_update': 0.0025544166564941406, 'loss': 3.850495481491089, 'evaluation': 146.3} step=305000
2023-01-09 22:47.23 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_306000.pt
2023-01-09 22:47.23 [info     ] DQN_online_20230109224541: epoch=306 step=306000 epoch=306 metrics={'time_inference': 0.0001849980354309082, 'time_environment_step': 1.1770963668823243e-05, 'time_step': 0.0002384359836578369, 'time_sample_batch': 8.511543273925781e-05, 'time_algorithm_update': 0.002710914611816406, 'loss': 3.8504348516464235, 'rollo

2023-01-09 22:47.30 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_321000.pt
2023-01-09 22:47.30 [info     ] DQN_online_20230109224541: epoch=321 step=321000 epoch=321 metrics={'time_inference': 0.000186007022857666, 'time_environment_step': 1.1971712112426758e-05, 'time_step': 0.0002393763065338135, 'time_sample_batch': 8.466243743896485e-05, 'time_algorithm_update': 0.0026027202606201173, 'loss': 4.051473689079285, 'rollout_return': 196.6, 'evaluation': 136.7} step=321000
2023-01-09 22:47.31 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_322000.pt
2023-01-09 22:47.31 [info     ] DQN_online_20230109224541: epoch=322 step=322000 epoch=322 metrics={'time_inference': 0.00018451929092407225, 'time_environment_step': 1.1766910552978515e-05, 'time_step': 0.00023855972290039062, 'rollout_return': 193.6, 'time_sample_batch': 0.00010042190551757812, 'time_algorithm_update': 0.002712392807006836, 'loss': 3.92647428512

2023-01-09 22:47.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_337000.pt
2023-01-09 22:47.37 [info     ] DQN_online_20230109224541: epoch=337 step=337000 epoch=337 metrics={'time_inference': 0.00019289708137512206, 'time_environment_step': 1.2107372283935546e-05, 'time_step': 0.0002465212345123291, 'time_sample_batch': 9.398460388183593e-05, 'time_algorithm_update': 0.0026479005813598634, 'loss': 4.158038783073425, 'rollout_return': 193.8, 'evaluation': 159.3} step=337000
2023-01-09 22:47.38 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_338000.pt
2023-01-09 22:47.38 [info     ] DQN_online_20230109224541: epoch=338 step=338000 epoch=338 metrics={'time_inference': 0.00018610143661499024, 'time_environment_step': 1.197361946105957e-05, 'time_step': 0.00024015235900878907, 'time_sample_batch': 8.983612060546875e-05, 'time_algorithm_update': 0.002732658386230469, 'loss': 4.597315526008606, 'rollout_return': 1

2023-01-09 22:47.45 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_353000.pt
2023-01-09 22:47.45 [info     ] DQN_online_20230109224541: epoch=353 step=353000 epoch=353 metrics={'time_inference': 0.00018410348892211915, 'time_environment_step': 1.1658906936645509e-05, 'time_step': 0.00023674559593200682, 'time_sample_batch': 8.44717025756836e-05, 'time_algorithm_update': 0.0026703596115112303, 'loss': 4.187082195281983, 'rollout_return': 199.0, 'evaluation': 180.9} step=353000
2023-01-09 22:47.45 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_354000.pt
2023-01-09 22:47.45 [info     ] DQN_online_20230109224541: epoch=354 step=354000 epoch=354 metrics={'time_inference': 0.00018303918838500976, 'time_environment_step': 1.1859416961669921e-05, 'time_step': 0.00023739218711853029, 'time_sample_batch': 9.708404541015625e-05, 'time_algorithm_update': 0.002803349494934082, 'loss': 4.42541241645813, 'rollout_return': 2

2023-01-09 22:47.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_369000.pt
2023-01-09 22:47.52 [info     ] DQN_online_20230109224541: epoch=369 step=369000 epoch=369 metrics={'time_inference': 0.00018582844734191896, 'time_environment_step': 1.1934518814086915e-05, 'time_step': 0.00023946857452392578, 'rollout_return': 190.0, 'time_sample_batch': 9.8419189453125e-05, 'time_algorithm_update': 0.002710866928100586, 'loss': 3.6942995309829714, 'evaluation': 183.2} step=369000
2023-01-09 22:47.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_370000.pt
2023-01-09 22:47.52 [info     ] DQN_online_20230109224541: epoch=370 step=370000 epoch=370 metrics={'time_inference': 0.0001836273670196533, 'time_environment_step': 1.179671287536621e-05, 'time_step': 0.00023650503158569337, 'time_sample_batch': 9.055137634277344e-05, 'time_algorithm_update': 0.0026720762252807617, 'loss': 3.652223563194275, 'rollout_return': 19

2023-01-09 22:47.59 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_385000.pt
2023-01-09 22:47.59 [info     ] DQN_online_20230109224541: epoch=385 step=385000 epoch=385 metrics={'time_inference': 0.00018597769737243653, 'time_environment_step': 1.1768102645874024e-05, 'time_step': 0.0002413604259490967, 'time_sample_batch': 9.634494781494141e-05, 'time_algorithm_update': 0.0029045820236206056, 'loss': 5.067155575752258, 'rollout_return': 200.0, 'evaluation': 183.8} step=385000
2023-01-09 22:48.00 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_386000.pt
2023-01-09 22:48.00 [info     ] DQN_online_20230109224541: epoch=386 step=386000 epoch=386 metrics={'time_inference': 0.00018500375747680664, 'time_environment_step': 1.1784791946411133e-05, 'time_step': 0.00023864126205444335, 'time_sample_batch': 9.586811065673829e-05, 'time_algorithm_update': 0.002727961540222168, 'loss': 4.373592233657837, 'rollout_return': 

2023-01-09 22:48.07 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_401000.pt
2023-01-09 22:48.07 [info     ] DQN_online_20230109224541: epoch=401 step=401000 epoch=401 metrics={'time_inference': 0.00018110418319702148, 'time_environment_step': 1.1623144149780273e-05, 'time_step': 0.0002346034049987793, 'time_sample_batch': 8.757114410400391e-05, 'time_algorithm_update': 0.0027690410614013674, 'loss': 4.256772041320801, 'rollout_return': 194.8, 'evaluation': 176.7} step=401000
2023-01-09 22:48.07 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_402000.pt
2023-01-09 22:48.07 [info     ] DQN_online_20230109224541: epoch=402 step=402000 epoch=402 metrics={'time_inference': 0.00018799161911010743, 'time_environment_step': 1.1976003646850586e-05, 'time_step': 0.00024151086807250977, 'time_sample_batch': 8.516311645507812e-05, 'time_algorithm_update': 0.0027014970779418944, 'loss': 4.208209156990051, 'rollout_return':

2023-01-09 22:48.14 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_417000.pt
2023-01-09 22:48.14 [info     ] DQN_online_20230109224541: epoch=417 step=417000 epoch=417 metrics={'time_inference': 0.0001874380111694336, 'time_environment_step': 1.182723045349121e-05, 'time_step': 0.0002419438362121582, 'rollout_return': 200.0, 'time_sample_batch': 0.00010526180267333984, 'time_algorithm_update': 0.0027949333190917967, 'loss': 4.5746063709259035, 'evaluation': 164.3} step=417000
2023-01-09 22:48.15 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_418000.pt
2023-01-09 22:48.15 [info     ] DQN_online_20230109224541: epoch=418 step=418000 epoch=418 metrics={'time_inference': 0.00018561387062072753, 'time_environment_step': 1.181793212890625e-05, 'time_step': 0.00023937368392944335, 'rollout_return': 200.0, 'time_sample_batch': 8.73565673828125e-05, 'time_algorithm_update': 0.002747797966003418, 'loss': 5.019265794754

2023-01-09 22:48.22 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_433000.pt
2023-01-09 22:48.22 [info     ] DQN_online_20230109224541: epoch=433 step=433000 epoch=433 metrics={'time_inference': 0.00018842601776123048, 'time_environment_step': 1.186060905456543e-05, 'time_step': 0.00024287867546081543, 'rollout_return': 200.0, 'time_sample_batch': 8.883476257324219e-05, 'time_algorithm_update': 0.00281980037689209, 'loss': 4.190344440937042, 'evaluation': 195.6} step=433000
2023-01-09 22:48.22 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_434000.pt
2023-01-09 22:48.22 [info     ] DQN_online_20230109224541: epoch=434 step=434000 epoch=434 metrics={'time_inference': 0.00019295620918273926, 'time_environment_step': 1.2111663818359374e-05, 'time_step': 0.00024890565872192385, 'rollout_return': 200.0, 'time_sample_batch': 9.293556213378907e-05, 'time_algorithm_update': 0.00290985107421875, 'loss': 3.8751113891601

2023-01-09 22:48.29 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_449000.pt
2023-01-09 22:48.29 [info     ] DQN_online_20230109224541: epoch=449 step=449000 epoch=449 metrics={'time_inference': 0.0001875758171081543, 'time_environment_step': 1.1751413345336914e-05, 'time_step': 0.00024197888374328613, 'rollout_return': 200.0, 'time_sample_batch': 8.27789306640625e-05, 'time_algorithm_update': 0.002847456932067871, 'loss': 5.062467432022094, 'evaluation': 181.1} step=449000
2023-01-09 22:48.30 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_450000.pt
2023-01-09 22:48.30 [info     ] DQN_online_20230109224541: epoch=450 step=450000 epoch=450 metrics={'time_inference': 0.00018807458877563476, 'time_environment_step': 1.1926651000976563e-05, 'time_step': 0.00024404525756835938, 'rollout_return': 200.0, 'time_sample_batch': 9.48190689086914e-05, 'time_algorithm_update': 0.0029592275619506835, 'loss': 4.584607887268

2023-01-09 22:48.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_465000.pt
2023-01-09 22:48.37 [info     ] DQN_online_20230109224541: epoch=465 step=465000 epoch=465 metrics={'time_inference': 0.0001818835735321045, 'time_environment_step': 1.1719226837158204e-05, 'time_step': 0.00023939704895019532, 'time_sample_batch': 0.00010936260223388672, 'time_algorithm_update': 0.0031317949295043947, 'loss': 4.235664105415344, 'rollout_return': 200.0, 'evaluation': 181.4} step=465000
2023-01-09 22:48.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_466000.pt
2023-01-09 22:48.37 [info     ] DQN_online_20230109224541: epoch=466 step=466000 epoch=466 metrics={'time_inference': 0.0001778581142425537, 'time_environment_step': 1.1528491973876953e-05, 'time_step': 0.000232147216796875, 'time_sample_batch': 8.440017700195312e-05, 'time_algorithm_update': 0.0028859376907348633, 'loss': 4.65201153755188, 'rollout_return': 19

2023-01-09 22:48.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_481000.pt
2023-01-09 22:48.44 [info     ] DQN_online_20230109224541: epoch=481 step=481000 epoch=481 metrics={'time_inference': 0.00017836904525756836, 'time_environment_step': 1.1627912521362305e-05, 'time_step': 0.00023387718200683594, 'rollout_return': 192.0, 'time_sample_batch': 8.494853973388672e-05, 'time_algorithm_update': 0.002977800369262695, 'loss': 4.250771117210388, 'evaluation': 177.2} step=481000
2023-01-09 22:48.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_482000.pt
2023-01-09 22:48.44 [info     ] DQN_online_20230109224541: epoch=482 step=482000 epoch=482 metrics={'time_inference': 0.00018012762069702148, 'time_environment_step': 1.1627674102783204e-05, 'time_step': 0.00023509407043457032, 'time_sample_batch': 9.11712646484375e-05, 'time_algorithm_update': 0.002916574478149414, 'loss': 5.084747934341431, 'rollout_return': 1

2023-01-09 22:48.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_497000.pt
2023-01-09 22:48.51 [info     ] DQN_online_20230109224541: epoch=497 step=497000 epoch=497 metrics={'time_inference': 0.00018908143043518068, 'time_environment_step': 1.2041807174682617e-05, 'time_step': 0.00024371600151062013, 'time_sample_batch': 9.541511535644531e-05, 'time_algorithm_update': 0.00279695987701416, 'loss': 5.587997031211853, 'rollout_return': 197.2, 'evaluation': 179.4} step=497000
2023-01-09 22:48.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_498000.pt
2023-01-09 22:48.52 [info     ] DQN_online_20230109224541: epoch=498 step=498000 epoch=498 metrics={'time_inference': 0.000185039758682251, 'time_environment_step': 1.1862754821777344e-05, 'time_step': 0.00023814916610717774, 'time_sample_batch': 8.819103240966796e-05, 'time_algorithm_update': 0.002692770957946777, 'loss': 5.4516857147216795, 'rollout_return': 19

2023-01-09 22:48.59 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_513000.pt
2023-01-09 22:48.59 [info     ] DQN_online_20230109224541: epoch=513 step=513000 epoch=513 metrics={'time_inference': 0.00018047261238098144, 'time_environment_step': 1.1698484420776367e-05, 'time_step': 0.00023380732536315918, 'time_sample_batch': 8.90970230102539e-05, 'time_algorithm_update': 0.002750706672668457, 'loss': 4.079363703727722, 'rollout_return': 192.4, 'evaluation': 176.1} step=513000
2023-01-09 22:48.59 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_514000.pt
2023-01-09 22:48.59 [info     ] DQN_online_20230109224541: epoch=514 step=514000 epoch=514 metrics={'time_inference': 0.00017851972579956054, 'time_environment_step': 1.1702775955200196e-05, 'time_step': 0.00023197650909423829, 'time_sample_batch': 0.00010330677032470703, 'time_algorithm_update': 0.0027477264404296873, 'loss': 4.323057413101196, 'rollout_return':

2023-01-09 22:49.06 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_529000.pt
2023-01-09 22:49.06 [info     ] DQN_online_20230109224541: epoch=529 step=529000 epoch=529 metrics={'time_inference': 0.00019081830978393555, 'time_environment_step': 1.244044303894043e-05, 'time_step': 0.0002459287643432617, 'time_sample_batch': 9.66787338256836e-05, 'time_algorithm_update': 0.002769660949707031, 'loss': 4.402134108543396, 'rollout_return': 191.4, 'evaluation': 158.5} step=529000
2023-01-09 22:49.07 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_530000.pt
2023-01-09 22:49.07 [info     ] DQN_online_20230109224541: epoch=530 step=530000 epoch=530 metrics={'time_inference': 0.0001853299140930176, 'time_environment_step': 1.2001276016235352e-05, 'time_step': 0.00024100828170776366, 'time_sample_batch': 0.00010492801666259766, 'time_algorithm_update': 0.0028977155685424804, 'loss': 6.39500207901001, 'rollout_return': 196

2023-01-09 22:49.14 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_545000.pt
2023-01-09 22:49.14 [info     ] DQN_online_20230109224541: epoch=545 step=545000 epoch=545 metrics={'time_inference': 0.00019085311889648437, 'time_environment_step': 1.2268543243408203e-05, 'time_step': 0.00024532628059387206, 'time_sample_batch': 8.571147918701172e-05, 'time_algorithm_update': 0.0027477025985717775, 'loss': 5.191701650619507, 'rollout_return': 200.0, 'evaluation': 171.5} step=545000
2023-01-09 22:49.14 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_546000.pt
2023-01-09 22:49.14 [info     ] DQN_online_20230109224541: epoch=546 step=546000 epoch=546 metrics={'time_inference': 0.00018599987030029297, 'time_environment_step': 1.2054920196533203e-05, 'time_step': 0.00024142956733703614, 'time_sample_batch': 0.00010879039764404297, 'time_algorithm_update': 0.0028665781021118162, 'loss': 5.579459381103516, 'rollout_return

2023-01-09 22:49.21 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_561000.pt
2023-01-09 22:49.21 [info     ] DQN_online_20230109224541: epoch=561 step=561000 epoch=561 metrics={'time_inference': 0.00018976807594299318, 'time_environment_step': 1.2161731719970703e-05, 'time_step': 0.00024407362937927247, 'time_sample_batch': 8.399486541748047e-05, 'time_algorithm_update': 0.0027505636215209963, 'loss': 6.173399972915649, 'rollout_return': 199.2, 'evaluation': 187.2} step=561000
2023-01-09 22:49.22 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_562000.pt
2023-01-09 22:49.22 [info     ] DQN_online_20230109224541: epoch=562 step=562000 epoch=562 metrics={'time_inference': 0.0001882448196411133, 'time_environment_step': 1.2154817581176758e-05, 'time_step': 0.0002439708709716797, 'time_sample_batch': 9.67264175415039e-05, 'time_algorithm_update': 0.0028844833374023437, 'loss': 5.074644231796265, 'rollout_return': 2

2023-01-09 22:49.29 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_577000.pt
2023-01-09 22:49.29 [info     ] DQN_online_20230109224541: epoch=577 step=577000 epoch=577 metrics={'time_inference': 0.0001810455322265625, 'time_environment_step': 1.2258052825927735e-05, 'time_step': 0.0002355172634124756, 'rollout_return': 198.2, 'time_sample_batch': 9.129047393798828e-05, 'time_algorithm_update': 0.002795243263244629, 'loss': 4.958232259750366, 'evaluation': 179.6} step=577000
2023-01-09 22:49.29 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_578000.pt
2023-01-09 22:49.29 [info     ] DQN_online_20230109224541: epoch=578 step=578000 epoch=578 metrics={'time_inference': 0.0001836111545562744, 'time_environment_step': 1.1922121047973633e-05, 'time_step': 0.00023804855346679687, 'rollout_return': 189.16666666666666, 'time_sample_batch': 8.950233459472656e-05, 'time_algorithm_update': 0.0028162479400634767, 'loss': 5

2023-01-09 22:49.36 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_593000.pt
2023-01-09 22:49.36 [info     ] DQN_online_20230109224541: epoch=593 step=593000 epoch=593 metrics={'time_inference': 0.00018761062622070312, 'time_environment_step': 1.2000560760498048e-05, 'time_step': 0.00024423956871032714, 'rollout_return': 200.0, 'time_sample_batch': 9.751319885253906e-05, 'time_algorithm_update': 0.003011465072631836, 'loss': 4.986162757873535, 'evaluation': 143.9} step=593000
2023-01-09 22:49.37 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_594000.pt
2023-01-09 22:49.37 [info     ] DQN_online_20230109224541: epoch=594 step=594000 epoch=594 metrics={'time_inference': 0.00018228912353515626, 'time_environment_step': 1.1806488037109376e-05, 'time_step': 0.0002366774082183838, 'rollout_return': 200.0, 'time_sample_batch': 8.449554443359374e-05, 'time_algorithm_update': 0.0028403043746948243, 'loss': 5.8174803256

2023-01-09 22:49.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_609000.pt
2023-01-09 22:49.44 [info     ] DQN_online_20230109224541: epoch=609 step=609000 epoch=609 metrics={'time_inference': 0.00019885945320129394, 'time_environment_step': 1.2631893157958984e-05, 'time_step': 0.00025702404975891114, 'rollout_return': 200.0, 'time_sample_batch': 9.372234344482422e-05, 'time_algorithm_update': 0.0030234098434448243, 'loss': 5.817115139961243, 'evaluation': 180.3} step=609000
2023-01-09 22:49.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_610000.pt
2023-01-09 22:49.44 [info     ] DQN_online_20230109224541: epoch=610 step=610000 epoch=610 metrics={'time_inference': 0.00019047522544860839, 'time_environment_step': 1.2289047241210937e-05, 'time_step': 0.00024688553810119627, 'rollout_return': 196.6, 'time_sample_batch': 9.233951568603516e-05, 'time_algorithm_update': 0.002927136421203613, 'loss': 5.533156657

2023-01-09 22:49.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_625000.pt
2023-01-09 22:49.52 [info     ] DQN_online_20230109224541: epoch=625 step=625000 epoch=625 metrics={'time_inference': 0.00018740081787109374, 'time_environment_step': 1.2143611907958984e-05, 'time_step': 0.00024396896362304687, 'time_sample_batch': 0.00011684894561767579, 'time_algorithm_update': 0.0029541730880737306, 'loss': 5.271264386177063, 'rollout_return': 200.0, 'evaluation': 173.4} step=625000
2023-01-09 22:49.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_626000.pt
2023-01-09 22:49.52 [info     ] DQN_online_20230109224541: epoch=626 step=626000 epoch=626 metrics={'time_inference': 0.00018711519241333007, 'time_environment_step': 1.2075424194335938e-05, 'time_step': 0.00024332690238952636, 'time_sample_batch': 9.028911590576172e-05, 'time_algorithm_update': 0.00295865535736084, 'loss': 6.043635702133178, 'rollout_return':

2023-01-09 22:50.00 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_641000.pt
2023-01-09 22:50.00 [info     ] DQN_online_20230109224541: epoch=641 step=641000 epoch=641 metrics={'time_inference': 0.00019131112098693848, 'time_environment_step': 1.241445541381836e-05, 'time_step': 0.00024714159965515136, 'rollout_return': 194.6, 'time_sample_batch': 8.754730224609375e-05, 'time_algorithm_update': 0.0028450489044189453, 'loss': 6.191790461540222, 'evaluation': 199.7} step=641000
2023-01-09 22:50.00 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_642000.pt
2023-01-09 22:50.00 [info     ] DQN_online_20230109224541: epoch=642 step=642000 epoch=642 metrics={'time_inference': 0.00018997883796691893, 'time_environment_step': 1.2375593185424804e-05, 'time_step': 0.0002465364933013916, 'rollout_return': 198.2, 'time_sample_batch': 8.65936279296875e-05, 'time_algorithm_update': 0.002938580513000488, 'loss': 6.290962862968

2023-01-09 22:50.07 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_657000.pt
2023-01-09 22:50.07 [info     ] DQN_online_20230109224541: epoch=657 step=657000 epoch=657 metrics={'time_inference': 0.0001833174228668213, 'time_environment_step': 1.1851787567138672e-05, 'time_step': 0.0002429804801940918, 'time_sample_batch': 8.862018585205078e-05, 'time_algorithm_update': 0.003364276885986328, 'loss': 4.973190879821777, 'rollout_return': 191.0, 'evaluation': 187.6} step=657000
2023-01-09 22:50.08 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_658000.pt
2023-01-09 22:50.08 [info     ] DQN_online_20230109224541: epoch=658 step=658000 epoch=658 metrics={'time_inference': 0.00018932723999023438, 'time_environment_step': 1.2122392654418945e-05, 'time_step': 0.0002486996650695801, 'rollout_return': 187.8, 'time_sample_batch': 8.203983306884766e-05, 'time_algorithm_update': 0.0032213211059570314, 'loss': 5.623164844512

2023-01-09 22:50.15 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_673000.pt
2023-01-09 22:50.15 [info     ] DQN_online_20230109224541: epoch=673 step=673000 epoch=673 metrics={'time_inference': 0.0001944401264190674, 'time_environment_step': 1.2450695037841797e-05, 'time_step': 0.00025360894203186034, 'rollout_return': 200.0, 'time_sample_batch': 8.98599624633789e-05, 'time_algorithm_update': 0.003177499771118164, 'loss': 7.069508457183838, 'evaluation': 189.8} step=673000
2023-01-09 22:50.16 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_674000.pt
2023-01-09 22:50.16 [info     ] DQN_online_20230109224541: epoch=674 step=674000 epoch=674 metrics={'time_inference': 0.00018381285667419433, 'time_environment_step': 1.1899232864379883e-05, 'time_step': 0.00024223542213439942, 'rollout_return': 200.0, 'time_sample_batch': 9.274482727050781e-05, 'time_algorithm_update': 0.003224635124206543, 'loss': 4.192519450187

2023-01-09 22:50.23 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_689000.pt
2023-01-09 22:50.23 [info     ] DQN_online_20230109224541: epoch=689 step=689000 epoch=689 metrics={'time_inference': 0.00018054890632629396, 'time_environment_step': 1.1585474014282227e-05, 'time_step': 0.00024019885063171386, 'time_sample_batch': 0.00011255741119384766, 'time_algorithm_update': 0.0033828020095825195, 'loss': 5.796890497207642, 'rollout_return': 198.2, 'evaluation': 197.1} step=689000
2023-01-09 22:50.23 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_690000.pt
2023-01-09 22:50.24 [info     ] DQN_online_20230109224541: epoch=690 step=690000 epoch=690 metrics={'time_inference': 0.0001784341335296631, 'time_environment_step': 1.1665105819702148e-05, 'time_step': 0.00023495650291442872, 'rollout_return': 190.4, 'time_sample_batch': 8.814334869384765e-05, 'time_algorithm_update': 0.003085160255432129, 'loss': 6.227920484

2023-01-09 22:50.31 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_705000.pt
2023-01-09 22:50.31 [info     ] DQN_online_20230109224541: epoch=705 step=705000 epoch=705 metrics={'time_inference': 0.00018850088119506836, 'time_environment_step': 1.2161493301391602e-05, 'time_step': 0.0002481176853179932, 'rollout_return': 195.83333333333334, 'time_sample_batch': 8.890628814697265e-05, 'time_algorithm_update': 0.0032797336578369142, 'loss': 5.784766149520874, 'evaluation': 185.3} step=705000
2023-01-09 22:50.31 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_706000.pt
2023-01-09 22:50.31 [info     ] DQN_online_20230109224541: epoch=706 step=706000 epoch=706 metrics={'time_inference': 0.0001854376792907715, 'time_environment_step': 1.1962890625e-05, 'time_step': 0.00024413490295410156, 'time_sample_batch': 9.455680847167969e-05, 'time_algorithm_update': 0.00323333740234375, 'loss': 7.885943841934204, 'rollout_retu

2023-01-09 22:50.38 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_721000.pt
2023-01-09 22:50.38 [info     ] DQN_online_20230109224541: epoch=721 step=721000 epoch=721 metrics={'time_inference': 0.00018956375122070312, 'time_environment_step': 1.231694221496582e-05, 'time_step': 0.0002493267059326172, 'time_sample_batch': 8.552074432373047e-05, 'time_algorithm_update': 0.0032730579376220705, 'loss': 5.96914222240448, 'rollout_return': 198.8, 'evaluation': 182.6} step=721000
2023-01-09 22:50.39 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_722000.pt
2023-01-09 22:50.39 [info     ] DQN_online_20230109224541: epoch=722 step=722000 epoch=722 metrics={'time_inference': 0.00019173502922058107, 'time_environment_step': 1.2232780456542969e-05, 'time_step': 0.00025186800956726076, 'time_sample_batch': 9.784698486328124e-05, 'time_algorithm_update': 0.003300929069519043, 'loss': 5.691387033462524, 'rollout_return': 19

2023-01-09 22:50.46 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_737000.pt
2023-01-09 22:50.46 [info     ] DQN_online_20230109224541: epoch=737 step=737000 epoch=737 metrics={'time_inference': 0.00019215798377990724, 'time_environment_step': 1.2317419052124023e-05, 'time_step': 0.00025220561027526855, 'time_sample_batch': 8.521080017089843e-05, 'time_algorithm_update': 0.003292703628540039, 'loss': 6.432808852195739, 'rollout_return': 186.8, 'evaluation': 179.0} step=737000
2023-01-09 22:50.47 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_738000.pt
2023-01-09 22:50.47 [info     ] DQN_online_20230109224541: epoch=738 step=738000 epoch=738 metrics={'time_inference': 0.0001871182918548584, 'time_environment_step': 1.2314081192016602e-05, 'time_step': 0.0002470061779022217, 'rollout_return': 200.0, 'time_sample_batch': 0.00010981559753417969, 'time_algorithm_update': 0.003266716003417969, 'loss': 5.50335981845

2023-01-09 22:50.54 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_753000.pt
2023-01-09 22:50.54 [info     ] DQN_online_20230109224541: epoch=753 step=753000 epoch=753 metrics={'time_inference': 0.0001963989734649658, 'time_environment_step': 1.2544155120849609e-05, 'time_step': 0.00025844001770019534, 'rollout_return': 200.0, 'time_sample_batch': 8.928775787353516e-05, 'time_algorithm_update': 0.0034348726272583007, 'loss': 9.19768762588501, 'evaluation': 188.9} step=753000
2023-01-09 22:50.55 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_754000.pt
2023-01-09 22:50.55 [info     ] DQN_online_20230109224541: epoch=754 step=754000 epoch=754 metrics={'time_inference': 0.00019380950927734374, 'time_environment_step': 1.2381553649902344e-05, 'time_step': 0.00025581884384155273, 'rollout_return': 197.83333333333334, 'time_sample_batch': 0.00010461807250976563, 'time_algorithm_update': 0.0034469842910766603, 'loss'

2023-01-09 22:51.02 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_769000.pt
2023-01-09 22:51.02 [info     ] DQN_online_20230109224541: epoch=769 step=769000 epoch=769 metrics={'time_inference': 0.00018949103355407715, 'time_environment_step': 1.2113332748413086e-05, 'time_step': 0.0002481491565704346, 'time_sample_batch': 8.3160400390625e-05, 'time_algorithm_update': 0.0031962871551513674, 'loss': 6.294850432872773, 'rollout_return': 195.2, 'evaluation': 181.7} step=769000
2023-01-09 22:51.03 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_770000.pt
2023-01-09 22:51.03 [info     ] DQN_online_20230109224541: epoch=770 step=770000 epoch=770 metrics={'time_inference': 0.00018793582916259765, 'time_environment_step': 1.20697021484375e-05, 'time_step': 0.0002460887432098389, 'time_sample_batch': 8.33749771118164e-05, 'time_algorithm_update': 0.003168678283691406, 'loss': 7.010326528549195, 'rollout_return': 198.8,

2023-01-09 22:51.10 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_785000.pt
2023-01-09 22:51.10 [info     ] DQN_online_20230109224541: epoch=785 step=785000 epoch=785 metrics={'time_inference': 0.0001865572929382324, 'time_environment_step': 1.193547248840332e-05, 'time_step': 0.00024269962310791016, 'time_sample_batch': 8.473396301269531e-05, 'time_algorithm_update': 0.0029911041259765626, 'loss': 7.725831365585327, 'rollout_return': 197.8, 'evaluation': 186.3} step=785000
2023-01-09 22:51.11 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_786000.pt
2023-01-09 22:51.11 [info     ] DQN_online_20230109224541: epoch=786 step=786000 epoch=786 metrics={'time_inference': 0.0001882476806640625, 'time_environment_step': 1.2068510055541992e-05, 'time_step': 0.00024599242210388185, 'time_sample_batch': 0.00011603832244873047, 'time_algorithm_update': 0.003077530860900879, 'loss': 8.369243550300599, 'rollout_return': 1

2023-01-09 22:51.18 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_801000.pt
2023-01-09 22:51.18 [info     ] DQN_online_20230109224541: epoch=801 step=801000 epoch=801 metrics={'time_inference': 0.0001953155994415283, 'time_environment_step': 1.2665748596191407e-05, 'time_step': 0.00025412869453430176, 'time_sample_batch': 8.25643539428711e-05, 'time_algorithm_update': 0.0031060695648193358, 'loss': 8.483943581581116, 'rollout_return': 193.0, 'evaluation': 179.8} step=801000
2023-01-09 22:51.19 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_802000.pt
2023-01-09 22:51.19 [info     ] DQN_online_20230109224541: epoch=802 step=802000 epoch=802 metrics={'time_inference': 0.00018849396705627441, 'time_environment_step': 1.2194156646728516e-05, 'time_step': 0.00024642181396484375, 'time_sample_batch': 8.897781372070312e-05, 'time_algorithm_update': 0.003117871284484863, 'loss': 7.4190352201461796, 'rollout_return': 

2023-01-09 22:51.26 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_817000.pt
2023-01-09 22:51.26 [info     ] DQN_online_20230109224541: epoch=817 step=817000 epoch=817 metrics={'time_inference': 0.000186049222946167, 'time_environment_step': 1.1985540390014649e-05, 'time_step': 0.00024498105049133303, 'time_sample_batch': 8.854866027832031e-05, 'time_algorithm_update': 0.0032550573348999025, 'loss': 8.096106100082398, 'rollout_return': 200.0, 'evaluation': 190.5} step=817000
2023-01-09 22:51.26 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_818000.pt
2023-01-09 22:51.26 [info     ] DQN_online_20230109224541: epoch=818 step=818000 epoch=818 metrics={'time_inference': 0.0001834733486175537, 'time_environment_step': 1.1785745620727539e-05, 'time_step': 0.00024051880836486816, 'time_sample_batch': 8.554458618164063e-05, 'time_algorithm_update': 0.003106856346130371, 'loss': 6.05350694656372, 'rollout_return': 199

2023-01-09 22:51.33 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_833000.pt
2023-01-09 22:51.33 [info     ] DQN_online_20230109224541: epoch=833 step=833000 epoch=833 metrics={'time_inference': 0.00018460822105407715, 'time_environment_step': 1.1867761611938477e-05, 'time_step': 0.0002428441047668457, 'rollout_return': 200.0, 'time_sample_batch': 8.749961853027344e-05, 'time_algorithm_update': 0.003200054168701172, 'loss': 9.041747283935546, 'evaluation': 190.7} step=833000
2023-01-09 22:51.34 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_834000.pt
2023-01-09 22:51.34 [info     ] DQN_online_20230109224541: epoch=834 step=834000 epoch=834 metrics={'time_inference': 0.00018550467491149902, 'time_environment_step': 1.1857032775878906e-05, 'time_step': 0.00024280142784118653, 'rollout_return': 196.5, 'time_sample_batch': 8.208751678466797e-05, 'time_algorithm_update': 0.003112387657165527, 'loss': 6.67336270809

2023-01-09 22:51.41 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_849000.pt
2023-01-09 22:51.41 [info     ] DQN_online_20230109224541: epoch=849 step=849000 epoch=849 metrics={'time_inference': 0.00018665361404418944, 'time_environment_step': 1.1910676956176758e-05, 'time_step': 0.000244340181350708, 'rollout_return': 194.2, 'time_sample_batch': 9.527206420898438e-05, 'time_algorithm_update': 0.0031368255615234373, 'loss': 5.152860760688782, 'evaluation': 184.7} step=849000
2023-01-09 22:51.41 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_850000.pt
2023-01-09 22:51.41 [info     ] DQN_online_20230109224541: epoch=850 step=850000 epoch=850 metrics={'time_inference': 0.00019010281562805177, 'time_environment_step': 1.2095212936401368e-05, 'time_step': 0.00024737930297851564, 'rollout_return': 200.0, 'time_sample_batch': 9.026527404785157e-05, 'time_algorithm_update': 0.0030548810958862306, 'loss': 7.0590854644

2023-01-09 22:51.49 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_865000.pt
2023-01-09 22:51.49 [info     ] DQN_online_20230109224541: epoch=865 step=865000 epoch=865 metrics={'time_inference': 0.0001890571117401123, 'time_environment_step': 1.2085199356079102e-05, 'time_step': 0.0002462668418884277, 'rollout_return': 193.2, 'time_sample_batch': 8.556842803955078e-05, 'time_algorithm_update': 0.0030519962310791016, 'loss': 6.821307873725891, 'evaluation': 185.2} step=865000
2023-01-09 22:51.49 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_866000.pt
2023-01-09 22:51.49 [info     ] DQN_online_20230109224541: epoch=866 step=866000 epoch=866 metrics={'time_inference': 0.0001896500587463379, 'time_environment_step': 1.1989355087280273e-05, 'time_step': 0.00024669408798217774, 'rollout_return': 196.2, 'time_sample_batch': 8.461475372314453e-05, 'time_algorithm_update': 0.0030107021331787108, 'loss': 5.68895900249

2023-01-09 22:51.56 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_881000.pt
2023-01-09 22:51.56 [info     ] DQN_online_20230109224541: epoch=881 step=881000 epoch=881 metrics={'time_inference': 0.00019763684272766112, 'time_environment_step': 1.2772321701049804e-05, 'time_step': 0.0002580442428588867, 'time_sample_batch': 8.740425109863282e-05, 'time_algorithm_update': 0.003233003616333008, 'loss': 8.201733160018922, 'rollout_return': 200.0, 'evaluation': 176.8} step=881000
2023-01-09 22:51.57 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_882000.pt
2023-01-09 22:51.57 [info     ] DQN_online_20230109224541: epoch=882 step=882000 epoch=882 metrics={'time_inference': 0.00020188021659851073, 'time_environment_step': 1.2864589691162109e-05, 'time_step': 0.00026397323608398437, 'time_sample_batch': 8.845329284667969e-05, 'time_algorithm_update': 0.0033812522888183594, 'loss': 6.41004753112793, 'rollout_return': 1

2023-01-09 22:52.04 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_897000.pt
2023-01-09 22:52.04 [info     ] DQN_online_20230109224541: epoch=897 step=897000 epoch=897 metrics={'time_inference': 0.00018625307083129883, 'time_environment_step': 1.2117147445678711e-05, 'time_step': 0.0002431948184967041, 'rollout_return': 200.0, 'time_sample_batch': 8.051395416259765e-05, 'time_algorithm_update': 0.0030629873275756837, 'loss': 9.85318489074707, 'evaluation': 181.4} step=897000
2023-01-09 22:52.05 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_898000.pt
2023-01-09 22:52.05 [info     ] DQN_online_20230109224541: epoch=898 step=898000 epoch=898 metrics={'time_inference': 0.00018366241455078125, 'time_environment_step': 1.1672258377075195e-05, 'time_step': 0.00024160146713256836, 'rollout_return': 199.2, 'time_sample_batch': 9.496212005615235e-05, 'time_algorithm_update': 0.0032076120376586916, 'loss': 6.3229380846

2023-01-09 22:52.12 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_913000.pt
2023-01-09 22:52.12 [info     ] DQN_online_20230109224541: epoch=913 step=913000 epoch=913 metrics={'time_inference': 0.00018874359130859376, 'time_environment_step': 1.1996030807495117e-05, 'time_step': 0.00024907565116882326, 'time_sample_batch': 9.603500366210938e-05, 'time_algorithm_update': 0.0033780336380004883, 'loss': 7.101761698722839, 'rollout_return': 200.0, 'evaluation': 186.1} step=913000
2023-01-09 22:52.12 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_914000.pt
2023-01-09 22:52.12 [info     ] DQN_online_20230109224541: epoch=914 step=914000 epoch=914 metrics={'time_inference': 0.0001868126392364502, 'time_environment_step': 1.1778354644775391e-05, 'time_step': 0.00024495697021484374, 'time_sample_batch': 8.211135864257812e-05, 'time_algorithm_update': 0.0032195329666137697, 'loss': 9.631559896469117, 'rollout_return':

2023-01-09 22:52.20 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_929000.pt
2023-01-09 22:52.20 [info     ] DQN_online_20230109224541: epoch=929 step=929000 epoch=929 metrics={'time_inference': 0.00018801641464233397, 'time_environment_step': 1.2455940246582031e-05, 'time_step': 0.0002459297180175781, 'time_sample_batch': 8.003711700439453e-05, 'time_algorithm_update': 0.0031229257583618164, 'loss': 10.175838565826416, 'rollout_return': 192.0, 'evaluation': 191.6} step=929000
2023-01-09 22:52.20 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_930000.pt
2023-01-09 22:52.20 [info     ] DQN_online_20230109224541: epoch=930 step=930000 epoch=930 metrics={'time_inference': 0.00019049906730651855, 'time_environment_step': 1.2019634246826172e-05, 'time_step': 0.00024882054328918455, 'rollout_return': 200.0, 'time_sample_batch': 8.578300476074219e-05, 'time_algorithm_update': 0.0031827688217163086, 'loss': 7.68853971

2023-01-09 22:52.27 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_945000.pt
2023-01-09 22:52.27 [info     ] DQN_online_20230109224541: epoch=945 step=945000 epoch=945 metrics={'time_inference': 0.0002013373374938965, 'time_environment_step': 1.2864112854003907e-05, 'time_step': 0.0002645902633666992, 'rollout_return': 198.5, 'time_sample_batch': 8.997917175292968e-05, 'time_algorithm_update': 0.0034978389739990234, 'loss': 7.626542234420777, 'evaluation': 189.0} step=945000
2023-01-09 22:52.28 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_946000.pt
2023-01-09 22:52.28 [info     ] DQN_online_20230109224541: epoch=946 step=946000 epoch=946 metrics={'time_inference': 0.00020232605934143065, 'time_environment_step': 1.2772321701049804e-05, 'time_step': 0.00026475286483764647, 'time_sample_batch': 8.873939514160156e-05, 'time_algorithm_update': 0.0034337759017944334, 'loss': 5.67203848361969, 'rollout_return': 1

2023-01-09 22:52.35 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_961000.pt
2023-01-09 22:52.35 [info     ] DQN_online_20230109224541: epoch=961 step=961000 epoch=961 metrics={'time_inference': 0.00018981719017028807, 'time_environment_step': 1.1979818344116211e-05, 'time_step': 0.00024959588050842287, 'rollout_return': 194.2, 'time_sample_batch': 8.3160400390625e-05, 'time_algorithm_update': 0.0033428430557250976, 'loss': 8.811341953277587, 'evaluation': 198.3} step=961000
2023-01-09 22:52.36 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_962000.pt
2023-01-09 22:52.36 [info     ] DQN_online_20230109224541: epoch=962 step=962000 epoch=962 metrics={'time_inference': 0.00019053053855895997, 'time_environment_step': 1.1989116668701172e-05, 'time_step': 0.0002504019737243652, 'rollout_return': 200.0, 'time_sample_batch': 8.35418701171875e-05, 'time_algorithm_update': 0.003353738784790039, 'loss': 10.179013895988

2023-01-09 22:52.43 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_977000.pt
2023-01-09 22:52.43 [info     ] DQN_online_20230109224541: epoch=977 step=977000 epoch=977 metrics={'time_inference': 0.00019168233871459962, 'time_environment_step': 1.2108325958251953e-05, 'time_step': 0.000252932071685791, 'time_sample_batch': 8.50677490234375e-05, 'time_algorithm_update': 0.0034669876098632813, 'loss': 7.352555441856384, 'rollout_return': 196.2, 'evaluation': 189.1} step=977000
2023-01-09 22:52.44 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_978000.pt
2023-01-09 22:52.44 [info     ] DQN_online_20230109224541: epoch=978 step=978000 epoch=978 metrics={'time_inference': 0.00018972635269165039, 'time_environment_step': 1.1942148208618164e-05, 'time_step': 0.00025107479095458986, 'rollout_return': 187.4, 'time_sample_batch': 0.00010101795196533203, 'time_algorithm_update': 0.0034870147705078126, 'loss': 6.3159824013

2023-01-09 22:52.51 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_993000.pt
2023-01-09 22:52.51 [info     ] DQN_online_20230109224541: epoch=993 step=993000 epoch=993 metrics={'time_inference': 0.00022044730186462403, 'time_environment_step': 1.3932943344116211e-05, 'time_step': 0.0002915012836456299, 'time_sample_batch': 9.78708267211914e-05, 'time_algorithm_update': 0.0039763212203979496, 'loss': 7.558268451690674, 'rollout_return': 190.2, 'evaluation': 188.0} step=993000
2023-01-09 22:52.52 [info     ] Model parameters are saved to d3rlpy_logs/DQN_online_20230109224541/model_994000.pt
2023-01-09 22:52.52 [info     ] DQN_online_20230109224541: epoch=994 step=994000 epoch=994 metrics={'time_inference': 0.00021530652046203614, 'time_environment_step': 1.361846923828125e-05, 'time_step': 0.00028557586669921873, 'rollout_return': 191.4, 'time_sample_batch': 9.989738464355469e-05, 'time_algorithm_update': 0.00401601791381836, 'loss': 7.0261929988861
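
Each complete metrics line in the log above has the shape `epoch=N step=N epoch=N metrics={...} step=N`, so the scores can be recovered from the text. The snippet below is a minimal sketch of that idea, not part of d3rlpy: `parse_metrics_line` and `LOG_PATTERN` are hypothetical names, and the sample line is a shortened copy of the epoch 865 entry with the timing metrics left out. Lines that were cut off in the output do not match the pattern and are skipped.

```python
import ast
import re

# Hypothetical helper, not a d3rlpy API: matches only complete metrics lines.
LOG_PATTERN = re.compile(
    r"epoch=(\d+) step=(\d+) epoch=\d+ metrics=(\{.*\}) step=\d+"
)


def parse_metrics_line(line):
    """Return (epoch, step, metrics dict), or None for other or truncated lines."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    epoch, step, metrics_text = match.groups()
    # The metrics are printed as a Python dict literal, so literal_eval is enough.
    metrics = ast.literal_eval(metrics_text)
    return int(epoch), int(step), metrics


# Shortened copy of the epoch 865 line above (timing metrics omitted).
sample = (
    "2023-01-09 22:51.49 [info     ] DQN_online_20230109224541: "
    "epoch=865 step=865000 epoch=865 metrics={'rollout_return': 193.2, "
    "'loss': 6.821307873725891, 'evaluation': 185.2} step=865000"
)
truncated = "epoch=850 step=850000 epoch=850 metrics={'loss': 7.0590854644"

epoch, step, metrics = parse_metrics_line(sample)
print(epoch, step, metrics["evaluation"])
print(parse_metrics_line(truncated))
```

Applying the helper to every line of a saved log and keeping the non-`None` results gives `(epoch, step, metrics)` tuples that are easy to plot or compare between the DQN and Double DQN runs.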

# Double DQN

In [None]:
# setup environment
# training env
env = gym.make('CartPole-v0')
# evaluation env
eval_env = gym.make('CartPole-v0')

In [None]:
# modify weight decay
optim_factory = OptimizerFactory(Adam, weight_decay=1e-4)
# setup algorithm
ddqn = DoubleDQN(
    batch_size=32,  # mini-batch size (transitions sampled per update)
    learning_rate=2.5e-4,  # learning rate
    target_update_interval=100,  # interval to synchronize the target network
    n_steps=1,  # N-step TD calculation
    optim_factory=optim_factory,  # optimizer
)
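
The only change from the Simple DQN section is the algorithm class. As a rough illustration of how the two differ, the sketch below computes the one-step TD target both ways in plain Python. It shows the general Double DQN idea (the online network picks the next action, the target network scores it) and is not d3rlpy's internal code; the function names and the example Q-values are made up for illustration.

```python
def dqn_target(reward, next_q_target, terminal, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates the next action."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, terminal, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if terminal:
        return reward
    # Index of the action the online network considers best in the next state.
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Illustrative Q-values for the two CartPole actions (invented numbers).
next_q_online = [1.0, 2.0]
next_q_target = [3.0, 0.5]

print(dqn_target(1.0, next_q_target, terminal=False))
print(double_dqn_target(1.0, next_q_online, next_q_target, terminal=False))
```

With these numbers the standard target uses the target network's largest value, while the Double DQN target uses the target network's value for the action preferred by the online network, which is how the method tends to reduce overestimated Q-values.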

In [None]:
# setup replay buffer
buffer = ReplayBuffer(maxlen=1000000, env=env)

In [None]:
explorer = LinearDecayEpsilonGreedy(start_epsilon=1.0,
                                    end_epsilon=0.1,
                                    duration=10000)
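# NormalNoise adds Gaussian noise to continuous actions, so it does not suit CartPole's discrete actions
# explorer = NormalNoise(mean=0, std=0.1)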
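
To make the schedule concrete, here is a small sketch of a linear epsilon decay using the same values as the explorer above. It is a standalone approximation of the behaviour, assuming epsilon falls linearly from `start_epsilon` to `end_epsilon` over `duration` steps and then stays constant; `linear_decay_epsilon` is a hypothetical helper, not the library's implementation.

```python
def linear_decay_epsilon(step, start_epsilon=1.0, end_epsilon=0.1, duration=10000):
    """Epsilon after `step` environment steps under a linear decay schedule."""
    if step >= duration:
        # After the decay window, epsilon stays at its final value.
        return end_epsilon
    fraction = step / duration
    return start_epsilon + fraction * (end_epsilon - start_epsilon)


for step in (0, 2500, 5000, 10000, 50000):
    print(step, round(linear_decay_epsilon(step), 3))
```

With `n_steps_per_epoch=1000`, a duration of 10000 steps means the decay finishes after the first 10 epochs, and the agent explores with probability 0.1 for the rest of training.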

In [None]:
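ddqn.fit_online(
    env,  # training environment
    buffer,  # replay buffer
    explorer=explorer,  # exploration strategy
    eval_env=eval_env,  # evaluation environment
    n_steps_per_epoch=1000,  # number of environment steps per epoch
    update_interval=100,  # environment steps between parameter updates
    eval_epsilon=0.3,  # epsilon used for epsilon-greedy actions during evaluation
    save_metrics=True,  # record metrics under d3rlpy_logs
    tensorboard_dir="runs",  # directory for TensorBoard logs
)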

2023-01-09 22:54.06 [info     ] Directory is created at d3rlpy_logs/DoubleDQN_online_20230109225406
2023-01-09 22:54.06 [debug    ] Building model...
2023-01-09 22:54.06 [debug    ] Model has been built.
2023-01-09 22:54.06 [info     ] Parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 0.00025, 'n_critics': 1, 'n_frames': 1, 'n_steps': 1, 'optim_factory': {'optim_cls': 'Adam', 'weight_decay': 0.0001}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': None, 'target_update_interval': 100, 'use_gpu': None, 'algorithm': 'DoubleDQN', 'observation_shape': (4,), 'action_size': 2}


  0%|          | 0/1000000 [00:00<?, ?it/s]

2023-01-09 22:54.07 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_1000.pt
2023-01-09 22:54.07 [info     ] DoubleDQN_online_20230109225406: epoch=1 step=1000 epoch=1 metrics={'time_inference': 0.0001834700107574463, 'time_environment_step': 1.2191295623779297e-05, 'time_step': 0.00022473311424255372, 'rollout_return': 21.456521739130434, 'time_sample_batch': 6.0439109802246094e-05, 'time_algorithm_update': 0.001414179801940918, 'loss': 0.4499122738838196, 'evaluation': 17.9} step=1000
2023-01-09 22:54.07 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_2000.pt
2023-01-09 22:54.07 [info     ] DoubleDQN_online_20230109225406: epoch=2 step=2000 epoch=2 metrics={'time_inference': 0.0001723635196685791, 'time_environment_step': 1.1724472045898438e-05, 'time_step': 0.00021134018898010253, 'rollout_return': 22.931818181818183, 'time_sample_batch': 5.958080291748047e-05, 'time_algorithm_update': 0.001273870

2023-01-09 22:54.11 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_17000.pt
2023-01-09 22:54.11 [info     ] DoubleDQN_online_20230109225406: epoch=17 step=17000 epoch=17 metrics={'time_inference': 0.0001748044490814209, 'time_environment_step': 1.2066841125488282e-05, 'time_step': 0.00021494364738464355, 'rollout_return': 12.717948717948717, 'time_sample_batch': 6.334781646728515e-05, 'time_algorithm_update': 0.0013026714324951172, 'loss': 0.05499728135764599, 'evaluation': 13.3} step=17000
2023-01-09 22:54.11 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_18000.pt
2023-01-09 22:54.11 [info     ] DoubleDQN_online_20230109225406: epoch=18 step=18000 epoch=18 metrics={'time_inference': 0.00018171048164367676, 'time_environment_step': 1.2233257293701172e-05, 'time_step': 0.00022275471687316896, 'rollout_return': 11.952380952380953, 'time_sample_batch': 7.030963897705078e-05, 'time_algorithm_update': 

2023-01-09 22:54.14 [info     ] DoubleDQN_online_20230109225406: epoch=32 step=32000 epoch=32 metrics={'time_inference': 0.00017589497566223143, 'time_environment_step': 1.2144565582275391e-05, 'time_step': 0.00021594095230102539, 'rollout_return': 9.92079207920792, 'time_sample_batch': 6.685256958007812e-05, 'time_algorithm_update': 0.0012719869613647462, 'loss': 0.2401246875524521, 'evaluation': 12.5} step=32000
2023-01-09 22:54.15 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_33000.pt
2023-01-09 22:54.15 [info     ] DoubleDQN_online_20230109225406: epoch=33 step=33000 epoch=33 metrics={'time_inference': 0.0001855642795562744, 'time_environment_step': 1.2236595153808593e-05, 'time_step': 0.00022681236267089845, 'rollout_return': 15.80952380952381, 'time_sample_batch': 7.014274597167968e-05, 'time_algorithm_update': 0.0013564825057983398, 'loss': 0.22410463392734528, 'evaluation': 20.5} step=33000
2023-01-09 22:54.15 [info     ] Model para

2023-01-09 22:54.19 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_48000.pt
2023-01-09 22:54.19 [info     ] DoubleDQN_online_20230109225406: epoch=48 step=48000 epoch=48 metrics={'time_inference': 0.00017384624481201172, 'time_environment_step': 1.1805772781372071e-05, 'time_step': 0.00021266841888427734, 'rollout_return': 61.11764705882353, 'time_sample_batch': 6.618499755859376e-05, 'time_algorithm_update': 0.0012651920318603516, 'loss': 0.1872604914009571, 'evaluation': 71.0} step=48000
2023-01-09 22:54.19 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_49000.pt
2023-01-09 22:54.19 [info     ] DoubleDQN_online_20230109225406: epoch=49 step=49000 epoch=49 metrics={'time_inference': 0.00017776083946228027, 'time_environment_step': 1.1902570724487305e-05, 'time_step': 0.00021811246871948243, 'rollout_return': 44.56521739130435, 'time_sample_batch': 8.335113525390626e-05, 'time_algorithm_update': 0.

2023-01-09 22:54.23 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_64000.pt
2023-01-09 22:54.23 [info     ] DoubleDQN_online_20230109225406: epoch=64 step=64000 epoch=64 metrics={'time_inference': 0.00017260360717773437, 'time_environment_step': 1.1660337448120116e-05, 'time_step': 0.00021242427825927734, 'rollout_return': 46.23809523809524, 'time_sample_batch': 8.404254913330078e-05, 'time_algorithm_update': 0.0013854265213012694, 'loss': 0.25447371080517767, 'evaluation': 33.2} step=64000
2023-01-09 22:54.24 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_65000.pt
2023-01-09 22:54.24 [info     ] DoubleDQN_online_20230109225406: epoch=65 step=65000 epoch=65 metrics={'time_inference': 0.00017548775672912597, 'time_environment_step': 1.1896133422851563e-05, 'time_step': 0.000214139461517334, 'rollout_return': 47.80952380952381, 'time_sample_batch': 6.821155548095704e-05, 'time_algorithm_update': 0.0

2023-01-09 22:54.29 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_80000.pt
2023-01-09 22:54.29 [info     ] DoubleDQN_online_20230109225406: epoch=80 step=80000 epoch=80 metrics={'time_inference': 0.0001741957664489746, 'time_environment_step': 1.1846065521240234e-05, 'time_step': 0.00021291685104370117, 'rollout_return': 102.77777777777777, 'time_sample_batch': 6.694793701171875e-05, 'time_algorithm_update': 0.0012774944305419921, 'loss': 0.2869940623641014, 'evaluation': 100.7} step=80000
2023-01-09 22:54.29 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_81000.pt
2023-01-09 22:54.29 [info     ] DoubleDQN_online_20230109225406: epoch=81 step=81000 epoch=81 metrics={'time_inference': 0.00018294239044189454, 'time_environment_step': 1.2303352355957031e-05, 'time_step': 0.00022377586364746092, 'rollout_return': 146.28571428571428, 'time_sample_batch': 9.1552734375e-05, 'time_algorithm_update': 0.001

2023-01-09 22:54.34 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_96000.pt
2023-01-09 22:54.34 [info     ] DoubleDQN_online_20230109225406: epoch=96 step=96000 epoch=96 metrics={'time_inference': 0.00018268346786499024, 'time_environment_step': 1.230764389038086e-05, 'time_step': 0.00022483968734741212, 'time_sample_batch': 7.803440093994141e-05, 'time_algorithm_update': 0.0014617204666137694, 'loss': 0.24570843055844308, 'rollout_return': 110.875, 'evaluation': 141.0} step=96000
2023-01-09 22:54.35 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_97000.pt
2023-01-09 22:54.35 [info     ] DoubleDQN_online_20230109225406: epoch=97 step=97000 epoch=97 metrics={'time_inference': 0.00016727757453918458, 'time_environment_step': 1.1260032653808594e-05, 'time_step': 0.00020522570610046387, 'rollout_return': 177.5, 'time_sample_batch': 8.404254913330078e-05, 'time_algorithm_update': 0.0013089179992675781, 

2023-01-09 22:54.40 [info     ] DoubleDQN_online_20230109225406: epoch=111 step=111000 epoch=111 metrics={'time_inference': 0.00016270542144775391, 'time_environment_step': 1.0998964309692383e-05, 'time_step': 0.00019992995262145996, 'rollout_return': 168.33333333333334, 'time_sample_batch': 6.582736968994141e-05, 'time_algorithm_update': 0.001298999786376953, 'loss': 0.5854161888360977, 'evaluation': 142.9} step=111000
2023-01-09 22:54.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_112000.pt
2023-01-09 22:54.40 [info     ] DoubleDQN_online_20230109225406: epoch=112 step=112000 epoch=112 metrics={'time_inference': 0.0001692802906036377, 'time_environment_step': 1.145482063293457e-05, 'time_step': 0.00020653104782104493, 'time_sample_batch': 6.54458999633789e-05, 'time_algorithm_update': 0.001219773292541504, 'loss': 0.4352745324373245, 'rollout_return': 115.625, 'evaluation': 158.5} step=112000
2023-01-09 22:54.41 [info     ] Model parame

2023-01-09 22:54.46 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_127000.pt
2023-01-09 22:54.46 [info     ] DoubleDQN_online_20230109225406: epoch=127 step=127000 epoch=127 metrics={'time_inference': 0.0001672072410583496, 'time_environment_step': 1.1282920837402343e-05, 'time_step': 0.00020603060722351074, 'rollout_return': 134.28571428571428, 'time_sample_batch': 7.064342498779297e-05, 'time_algorithm_update': 0.0014024972915649414, 'loss': 0.35680549275130036, 'evaluation': 142.8} step=127000
2023-01-09 22:54.46 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_128000.pt
2023-01-09 22:54.46 [info     ] DoubleDQN_online_20230109225406: epoch=128 step=128000 epoch=128 metrics={'time_inference': 0.00016584682464599609, 'time_environment_step': 1.1247873306274414e-05, 'time_step': 0.00020542621612548828, 'time_sample_batch': 7.023811340332032e-05, 'time_algorithm_update': 0.0014914989471435547, 'loss

2023-01-09 22:54.51 [info     ] DoubleDQN_online_20230109225406: epoch=142 step=142000 epoch=142 metrics={'time_inference': 0.00016584944725036622, 'time_environment_step': 1.121687889099121e-05, 'time_step': 0.00020744585990905762, 'rollout_return': 96.22222222222223, 'time_sample_batch': 6.81161880493164e-05, 'time_algorithm_update': 0.0016906261444091797, 'loss': 0.37970680352300407, 'evaluation': 173.2} step=142000
2023-01-09 22:54.52 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_143000.pt
2023-01-09 22:54.52 [info     ] DoubleDQN_online_20230109225406: epoch=143 step=143000 epoch=143 metrics={'time_inference': 0.00016579055786132812, 'time_environment_step': 1.1279582977294922e-05, 'time_step': 0.00020574617385864258, 'rollout_return': 184.16666666666666, 'time_sample_batch': 6.573200225830078e-05, 'time_algorithm_update': 0.0015330553054809571, 'loss': 0.5885104581713676, 'evaluation': 135.0} step=143000
2023-01-09 22:54.52 [info     

2023-01-09 22:54.57 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_158000.pt
2023-01-09 22:54.57 [info     ] DoubleDQN_online_20230109225406: epoch=158 step=158000 epoch=158 metrics={'time_inference': 0.00017234230041503906, 'time_environment_step': 1.1328220367431641e-05, 'time_step': 0.00021600770950317382, 'time_sample_batch': 7.059574127197266e-05, 'time_algorithm_update': 0.001881718635559082, 'loss': 0.25299255168065427, 'rollout_return': 170.0, 'evaluation': 168.3} step=158000
2023-01-09 22:54.58 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_159000.pt
2023-01-09 22:54.58 [info     ] DoubleDQN_online_20230109225406: epoch=159 step=159000 epoch=159 metrics={'time_inference': 0.00017511367797851563, 'time_environment_step': 1.1596918106079101e-05, 'time_step': 0.0002188851833343506, 'time_sample_batch': 6.630420684814454e-05, 'time_algorithm_update': 0.0018406391143798828, 'loss': 0.499352237

2023-01-09 22:55.03 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_174000.pt
2023-01-09 22:55.03 [info     ] DoubleDQN_online_20230109225406: epoch=174 step=174000 epoch=174 metrics={'time_inference': 0.0001713094711303711, 'time_environment_step': 1.1303424835205077e-05, 'time_step': 0.00021666717529296876, 'rollout_return': 150.5, 'time_sample_batch': 7.131099700927735e-05, 'time_algorithm_update': 0.0020457983016967775, 'loss': 0.2543452698737383, 'evaluation': 171.6} step=174000
2023-01-09 22:55.04 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_175000.pt
2023-01-09 22:55.04 [info     ] DoubleDQN_online_20230109225406: epoch=175 step=175000 epoch=175 metrics={'time_inference': 0.00017431974411010742, 'time_environment_step': 1.1474609375e-05, 'time_step': 0.00021924996376037598, 'rollout_return': 130.11111111111111, 'time_sample_batch': 6.642341613769532e-05, 'time_algorithm_update': 0.00197598

2023-01-09 22:55.10 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_190000.pt
2023-01-09 22:55.10 [info     ] DoubleDQN_online_20230109225406: epoch=190 step=190000 epoch=190 metrics={'time_inference': 0.0001721487045288086, 'time_environment_step': 1.1313199996948242e-05, 'time_step': 0.0002191143035888672, 'rollout_return': 158.14285714285714, 'time_sample_batch': 6.577968597412109e-05, 'time_algorithm_update': 0.0021506309509277343, 'loss': 0.6982223735190928, 'evaluation': 157.4} step=190000
2023-01-09 22:55.10 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_191000.pt
2023-01-09 22:55.10 [info     ] DoubleDQN_online_20230109225406: epoch=191 step=191000 epoch=191 metrics={'time_inference': 0.0001737658977508545, 'time_environment_step': 1.1350631713867187e-05, 'time_step': 0.00021950674057006836, 'time_sample_batch': 6.966590881347656e-05, 'time_algorithm_update': 0.0020839691162109373, 'loss': 

2023-01-09 22:55.15 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_206000.pt
2023-01-09 22:55.15 [info     ] DoubleDQN_online_20230109225406: epoch=206 step=206000 epoch=206 metrics={'time_inference': 0.0001700584888458252, 'time_environment_step': 1.1331796646118165e-05, 'time_step': 0.00021622157096862793, 'rollout_return': 118.11111111111111, 'time_sample_batch': 8.080005645751953e-05, 'time_algorithm_update': 0.0021183013916015623, 'loss': 0.35475118961185215, 'evaluation': 88.4} step=206000
2023-01-09 22:55.16 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_207000.pt
2023-01-09 22:55.16 [info     ] DoubleDQN_online_20230109225406: epoch=207 step=207000 epoch=207 metrics={'time_inference': 0.00017197537422180176, 'time_environment_step': 1.138162612915039e-05, 'time_step': 0.0002172813415527344, 'rollout_return': 120.0, 'time_sample_batch': 7.593631744384766e-05, 'time_algorithm_update': 0.0020

2023-01-09 22:55.21 [info     ] DoubleDQN_online_20230109225406: epoch=221 step=221000 epoch=221 metrics={'time_inference': 0.00016941022872924805, 'time_environment_step': 1.1220693588256837e-05, 'time_step': 0.00021559906005859375, 'rollout_return': 178.5, 'time_sample_batch': 7.796287536621094e-05, 'time_algorithm_update': 0.0021429538726806642, 'loss': 0.6222825072705745, 'evaluation': 162.1} step=221000
2023-01-09 22:55.21 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_222000.pt
2023-01-09 22:55.21 [info     ] DoubleDQN_online_20230109225406: epoch=222 step=222000 epoch=222 metrics={'time_inference': 0.0001786038875579834, 'time_environment_step': 1.1501312255859375e-05, 'time_step': 0.00022681689262390138, 'rollout_return': 140.85714285714286, 'time_sample_batch': 6.911754608154296e-05, 'time_algorithm_update': 0.0022749900817871094, 'loss': 0.6290378648787737, 'evaluation': 140.1} step=222000
2023-01-09 22:55.22 [info     ] Model para

2023-01-09 22:55.28 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_237000.pt
2023-01-09 22:55.28 [info     ] DoubleDQN_online_20230109225406: epoch=237 step=237000 epoch=237 metrics={'time_inference': 0.00017371821403503417, 'time_environment_step': 1.1282682418823242e-05, 'time_step': 0.00022148561477661132, 'time_sample_batch': 8.013248443603516e-05, 'time_algorithm_update': 0.002287149429321289, 'loss': 0.46875875247642396, 'rollout_return': 121.25, 'evaluation': 151.9} step=237000
2023-01-09 22:55.28 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_238000.pt
2023-01-09 22:55.28 [info     ] DoubleDQN_online_20230109225406: epoch=238 step=238000 epoch=238 metrics={'time_inference': 0.00017397046089172362, 'time_environment_step': 1.1312484741210938e-05, 'time_step': 0.00022243881225585937, 'rollout_return': 168.16666666666666, 'time_sample_batch': 8.318424224853515e-05, 'time_algorithm_update': 0.

2023-01-09 22:55.34 [info     ] DoubleDQN_online_20230109225406: epoch=252 step=252000 epoch=252 metrics={'time_inference': 0.00017683053016662599, 'time_environment_step': 1.1513710021972657e-05, 'time_step': 0.00022707128524780272, 'time_sample_batch': 9.26971435546875e-05, 'time_algorithm_update': 0.0024543523788452147, 'loss': 0.5353903338313103, 'rollout_return': 158.5, 'evaluation': 151.0} step=252000
2023-01-09 22:55.34 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_253000.pt
2023-01-09 22:55.34 [info     ] DoubleDQN_online_20230109225406: epoch=253 step=253000 epoch=253 metrics={'time_inference': 0.0001790950298309326, 'time_environment_step': 1.1471271514892577e-05, 'time_step': 0.00022929978370666505, 'time_sample_batch': 8.199214935302734e-05, 'time_algorithm_update': 0.0024645090103149413, 'loss': 0.5239745449274779, 'rollout_return': 128.875, 'evaluation': 102.9} step=253000
2023-01-09 22:55.34 [info     ] Model parameters are s

2023-01-09 22:55.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_268000.pt
2023-01-09 22:55.40 [info     ] DoubleDQN_online_20230109225406: epoch=268 step=268000 epoch=268 metrics={'time_inference': 0.00017607522010803223, 'time_environment_step': 1.1337518692016601e-05, 'time_step': 0.00022650718688964843, 'time_sample_batch': 7.495880126953124e-05, 'time_algorithm_update': 0.0025276899337768554, 'loss': 0.8023725837469101, 'rollout_return': 111.33333333333333, 'evaluation': 137.8} step=268000
2023-01-09 22:55.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_269000.pt
2023-01-09 22:55.40 [info     ] DoubleDQN_online_20230109225406: epoch=269 step=269000 epoch=269 metrics={'time_inference': 0.00017572379112243652, 'time_environment_step': 1.1329889297485352e-05, 'time_step': 0.00022580933570861818, 'rollout_return': 153.5, 'time_sample_batch': 6.632804870605468e-05, 'time_algorithm_update': 0.0

2023-01-09 22:55.46 [info     ] DoubleDQN_online_20230109225406: epoch=283 step=283000 epoch=283 metrics={'time_inference': 0.00018201327323913574, 'time_environment_step': 1.167917251586914e-05, 'time_step': 0.0002335209846496582, 'rollout_return': 117.44444444444444, 'time_sample_batch': 7.839202880859375e-05, 'time_algorithm_update': 0.002547430992126465, 'loss': 0.3397279029712081, 'evaluation': 111.3} step=283000
2023-01-09 22:55.46 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_284000.pt
2023-01-09 22:55.46 [info     ] DoubleDQN_online_20230109225406: epoch=284 step=284000 epoch=284 metrics={'time_inference': 0.00017620921134948732, 'time_environment_step': 1.132369041442871e-05, 'time_step': 0.0002269136905670166, 'time_sample_batch': 6.959438323974609e-05, 'time_algorithm_update': 0.0025427818298339845, 'loss': 0.5449631405994296, 'rollout_return': 118.0, 'evaluation': 122.3} step=284000
2023-01-09 22:55.47 [info     ] Model paramete

2023-01-09 22:55.52 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_299000.pt
2023-01-09 22:55.52 [info     ] DoubleDQN_online_20230109225406: epoch=299 step=299000 epoch=299 metrics={'time_inference': 0.0001730334758758545, 'time_environment_step': 1.1191844940185547e-05, 'time_step': 0.00022318458557128907, 'rollout_return': 136.14285714285714, 'time_sample_batch': 8.499622344970703e-05, 'time_algorithm_update': 0.0025133609771728514, 'loss': 0.4686615177430212, 'evaluation': 142.1} step=299000
2023-01-09 22:55.52 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_300000.pt
2023-01-09 22:55.52 [info     ] DoubleDQN_online_20230109225406: epoch=300 step=300000 epoch=300 metrics={'time_inference': 0.00017715620994567872, 'time_environment_step': 1.1574029922485352e-05, 'time_step': 0.0002284843921661377, 'rollout_return': 159.14285714285714, 'time_sample_batch': 6.916522979736329e-05, 'time_algorithm_u

2023-01-09 22:55.58 [info     ] DoubleDQN_online_20230109225406: epoch=314 step=314000 epoch=314 metrics={'time_inference': 0.00017430758476257325, 'time_environment_step': 1.1842966079711914e-05, 'time_step': 0.00022485184669494628, 'rollout_return': 184.0, 'time_sample_batch': 6.809234619140626e-05, 'time_algorithm_update': 0.002508091926574707, 'loss': 0.570948700606823, 'evaluation': 153.4} step=314000
2023-01-09 22:55.58 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_315000.pt
2023-01-09 22:55.58 [info     ] DoubleDQN_online_20230109225406: epoch=315 step=315000 epoch=315 metrics={'time_inference': 0.0001769123077392578, 'time_environment_step': 1.1463642120361328e-05, 'time_step': 0.0002288548946380615, 'time_sample_batch': 7.19308853149414e-05, 'time_algorithm_update': 0.0026561260223388673, 'loss': 0.8517921142280102, 'rollout_return': 184.2, 'evaluation': 182.3} step=315000
2023-01-09 22:55.59 [info     ] Model parameters are saved 

2023-01-09 22:56.05 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_330000.pt
2023-01-09 22:56.05 [info     ] DoubleDQN_online_20230109225406: epoch=330 step=330000 epoch=330 metrics={'time_inference': 0.00017049098014831543, 'time_environment_step': 1.1139392852783203e-05, 'time_step': 0.00022173404693603515, 'rollout_return': 187.66666666666666, 'time_sample_batch': 7.240772247314453e-05, 'time_algorithm_update': 0.0026596784591674805, 'loss': 0.5500595389865339, 'evaluation': 166.9} step=330000
2023-01-09 22:56.05 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_331000.pt
2023-01-09 22:56.05 [info     ] DoubleDQN_online_20230109225406: epoch=331 step=331000 epoch=331 metrics={'time_inference': 0.0001725142002105713, 'time_environment_step': 1.1236906051635742e-05, 'time_step': 0.00022390007972717286, 'time_sample_batch': 7.369518280029297e-05, 'time_algorithm_update': 0.00259552001953125, 'loss': 

2023-01-09 22:56.12 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_346000.pt
2023-01-09 22:56.12 [info     ] DoubleDQN_online_20230109225406: epoch=346 step=346000 epoch=346 metrics={'time_inference': 0.000176849365234375, 'time_environment_step': 1.1432409286499024e-05, 'time_step': 0.00022997069358825684, 'rollout_return': 170.0, 'time_sample_batch': 7.43865966796875e-05, 'time_algorithm_update': 0.0027675867080688477, 'loss': 0.8213896572589874, 'evaluation': 165.9} step=346000
2023-01-09 22:56.12 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_347000.pt
2023-01-09 22:56.12 [info     ] DoubleDQN_online_20230109225406: epoch=347 step=347000 epoch=347 metrics={'time_inference': 0.00017622923851013183, 'time_environment_step': 1.1394023895263672e-05, 'time_step': 0.0002279174327850342, 'rollout_return': 200.0, 'time_sample_batch': 6.833076477050782e-05, 'time_algorithm_update': 0.002652335166931152

2023-01-09 22:56.19 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_362000.pt
2023-01-09 22:56.19 [info     ] DoubleDQN_online_20230109225406: epoch=362 step=362000 epoch=362 metrics={'time_inference': 0.0001873183250427246, 'time_environment_step': 1.2039899826049805e-05, 'time_step': 0.00024575209617614746, 'time_sample_batch': 7.781982421875e-05, 'time_algorithm_update': 0.0031646966934204103, 'loss': 0.42025544196367265, 'rollout_return': 185.4, 'evaluation': 174.9} step=362000
2023-01-09 22:56.19 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_363000.pt
2023-01-09 22:56.19 [info     ] DoubleDQN_online_20230109225406: epoch=363 step=363000 epoch=363 metrics={'time_inference': 0.00017615842819213866, 'time_environment_step': 1.1267423629760742e-05, 'time_step': 0.0002310194969177246, 'time_sample_batch': 8.654594421386719e-05, 'time_algorithm_update': 0.002970242500305176, 'loss': 0.7096911014989

2023-01-09 22:56.25 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_378000.pt
2023-01-09 22:56.25 [info     ] DoubleDQN_online_20230109225406: epoch=378 step=378000 epoch=378 metrics={'time_inference': 0.00018173003196716308, 'time_environment_step': 1.1602401733398438e-05, 'time_step': 0.00023711276054382325, 'rollout_return': 173.16666666666666, 'time_sample_batch': 8.590221405029297e-05, 'time_algorithm_update': 0.002958512306213379, 'loss': 0.845474766753614, 'evaluation': 150.9} step=378000
2023-01-09 22:56.26 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_379000.pt
2023-01-09 22:56.26 [info     ] DoubleDQN_online_20230109225406: epoch=379 step=379000 epoch=379 metrics={'time_inference': 0.00017982029914855957, 'time_environment_step': 1.1534690856933594e-05, 'time_step': 0.00023433971405029298, 'rollout_return': 173.16666666666666, 'time_sample_batch': 7.252693176269531e-05, 'time_algorithm_u

2023-01-09 22:56.32 [info     ] DoubleDQN_online_20230109225406: epoch=393 step=393000 epoch=393 metrics={'time_inference': 0.00018429088592529296, 'time_environment_step': 1.1422157287597657e-05, 'time_step': 0.00023982357978820801, 'time_sample_batch': 7.009506225585938e-05, 'time_algorithm_update': 0.0030017852783203124, 'loss': 0.5056823723018169, 'rollout_return': 176.4, 'evaluation': 162.7} step=393000
2023-01-09 22:56.32 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_394000.pt
2023-01-09 22:56.32 [info     ] DoubleDQN_online_20230109225406: epoch=394 step=394000 epoch=394 metrics={'time_inference': 0.00018627190589904784, 'time_environment_step': 1.153111457824707e-05, 'time_step': 0.00024191951751708984, 'rollout_return': 173.5, 'time_sample_batch': 6.818771362304688e-05, 'time_algorithm_update': 0.0029746532440185548, 'loss': 0.6493188166990876, 'evaluation': 138.2} step=394000
2023-01-09 22:56.32 [info     ] Model parameters are sa

2023-01-09 22:56.38 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_409000.pt
2023-01-09 22:56.38 [info     ] DoubleDQN_online_20230109225406: epoch=409 step=409000 epoch=409 metrics={'time_inference': 0.00018595647811889648, 'time_environment_step': 1.2073993682861329e-05, 'time_step': 0.0002431812286376953, 'rollout_return': 164.16666666666666, 'time_sample_batch': 7.717609405517578e-05, 'time_algorithm_update': 0.0030527114868164062, 'loss': 1.0718899380415678, 'evaluation': 145.7} step=409000
2023-01-09 22:56.39 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_410000.pt
2023-01-09 22:56.39 [info     ] DoubleDQN_online_20230109225406: epoch=410 step=410000 epoch=410 metrics={'time_inference': 0.00018199348449707032, 'time_environment_step': 1.1345863342285156e-05, 'time_step': 0.00023724555969238282, 'rollout_return': 148.71428571428572, 'time_sample_batch': 6.918907165527343e-05, 'time_algorithm_

2023-01-09 22:56.45 [info     ] DoubleDQN_online_20230109225406: epoch=424 step=424000 epoch=424 metrics={'time_inference': 0.00017929005622863769, 'time_environment_step': 1.1221408843994141e-05, 'time_step': 0.00023432683944702148, 'rollout_return': 171.33333333333334, 'time_sample_batch': 6.835460662841797e-05, 'time_algorithm_update': 0.002998161315917969, 'loss': 0.4653616718947887, 'evaluation': 142.4} step=424000
2023-01-09 22:56.45 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_425000.pt
2023-01-09 22:56.45 [info     ] DoubleDQN_online_20230109225406: epoch=425 step=425000 epoch=425 metrics={'time_inference': 0.00018156909942626954, 'time_environment_step': 1.1255502700805665e-05, 'time_step': 0.00023582696914672851, 'rollout_return': 167.0, 'time_sample_batch': 6.768703460693359e-05, 'time_algorithm_update': 0.0029083728790283204, 'loss': 0.7059059605002403, 'evaluation': 156.7} step=425000
2023-01-09 22:56.46 [info     ] Model para

2023-01-09 22:56.51 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_440000.pt
2023-01-09 22:56.51 [info     ] DoubleDQN_online_20230109225406: epoch=440 step=440000 epoch=440 metrics={'time_inference': 0.00018167972564697265, 'time_environment_step': 1.1267662048339843e-05, 'time_step': 0.0002385694980621338, 'time_sample_batch': 7.567405700683593e-05, 'time_algorithm_update': 0.003152298927307129, 'loss': 1.1298734309151768, 'rollout_return': 169.66666666666666, 'evaluation': 178.5} step=440000
2023-01-09 22:56.52 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_441000.pt
2023-01-09 22:56.52 [info     ] DoubleDQN_online_20230109225406: epoch=441 step=441000 epoch=441 metrics={'time_inference': 0.00018068718910217284, 'time_environment_step': 1.1218070983886719e-05, 'time_step': 0.00023588132858276366, 'time_sample_batch': 6.911754608154296e-05, 'time_algorithm_update': 0.003009939193725586, 'loss': 

2023-01-09 22:56.58 [info     ] DoubleDQN_online_20230109225406: epoch=455 step=455000 epoch=455 metrics={'time_inference': 0.00022167372703552247, 'time_environment_step': 1.3597488403320312e-05, 'time_step': 0.00028996944427490235, 'rollout_return': 160.16666666666666, 'time_sample_batch': 0.0001171112060546875, 'time_algorithm_update': 0.0037547826766967775, 'loss': 0.9910888647660613, 'evaluation': 161.0} step=455000
2023-01-09 22:56.59 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_456000.pt
2023-01-09 22:56.59 [info     ] DoubleDQN_online_20230109225406: epoch=456 step=456000 epoch=456 metrics={'time_inference': 0.00020854878425598144, 'time_environment_step': 1.304173469543457e-05, 'time_step': 0.0002731337547302246, 'rollout_return': 163.5, 'time_sample_batch': 7.915496826171875e-05, 'time_algorithm_update': 0.0035516500473022463, 'loss': 0.6410579293966293, 'evaluation': 132.4} step=456000
2023-01-09 22:56.59 [info     ] Model param

2023-01-09 22:57.06 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_471000.pt
2023-01-09 22:57.06 [info     ] DoubleDQN_online_20230109225406: epoch=471 step=471000 epoch=471 metrics={'time_inference': 0.00019556498527526856, 'time_environment_step': 1.2275457382202149e-05, 'time_step': 0.00025891995429992677, 'rollout_return': 177.6, 'time_sample_batch': 9.241104125976563e-05, 'time_algorithm_update': 0.003579449653625488, 'loss': 0.45555243268609047, 'evaluation': 150.7} step=471000
2023-01-09 22:57.06 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_472000.pt
2023-01-09 22:57.06 [info     ] DoubleDQN_online_20230109225406: epoch=472 step=472000 epoch=472 metrics={'time_inference': 0.00019313716888427735, 'time_environment_step': 1.2005329132080078e-05, 'time_step': 0.0002559831142425537, 'rollout_return': 177.0, 'time_sample_batch': 8.1634521484375e-05, 'time_algorithm_update': 0.00358362197875976

2023-01-09 22:57.14 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_487000.pt
2023-01-09 22:57.14 [info     ] DoubleDQN_online_20230109225406: epoch=487 step=487000 epoch=487 metrics={'time_inference': 0.00020304679870605468, 'time_environment_step': 1.2499570846557617e-05, 'time_step': 0.0002663581371307373, 'rollout_return': 154.66666666666666, 'time_sample_batch': 7.674694061279296e-05, 'time_algorithm_update': 0.003517341613769531, 'loss': 0.3286363694816828, 'evaluation': 158.2} step=487000
2023-01-09 22:57.14 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_488000.pt
2023-01-09 22:57.14 [info     ] DoubleDQN_online_20230109225406: epoch=488 step=488000 epoch=488 metrics={'time_inference': 0.00019619083404541016, 'time_environment_step': 1.2120962142944336e-05, 'time_step': 0.0002576971054077148, 'rollout_return': 161.66666666666666, 'time_sample_batch': 7.264614105224609e-05, 'time_algorithm_up

2023-01-09 22:57.20 [info     ] DoubleDQN_online_20230109225406: epoch=502 step=502000 epoch=502 metrics={'time_inference': 0.00019134092330932618, 'time_environment_step': 1.1913061141967773e-05, 'time_step': 0.00025443148612976074, 'rollout_return': 150.85714285714286, 'time_sample_batch': 7.74383544921875e-05, 'time_algorithm_update': 0.003635716438293457, 'loss': 0.0721893236041069, 'evaluation': 143.7} step=502000
2023-01-09 22:57.21 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_503000.pt
2023-01-09 22:57.21 [info     ] DoubleDQN_online_20230109225406: epoch=503 step=503000 epoch=503 metrics={'time_inference': 0.00019208550453186034, 'time_environment_step': 1.1795997619628906e-05, 'time_step': 0.0002525825500488281, 'time_sample_batch': 7.126331329345703e-05, 'time_algorithm_update': 0.003394746780395508, 'loss': 0.8345454435795545, 'rollout_return': 136.57142857142858, 'evaluation': 165.9} step=503000
2023-01-09 22:57.21 [info     ] 

2023-01-09 22:57.27 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_518000.pt
2023-01-09 22:57.27 [info     ] DoubleDQN_online_20230109225406: epoch=518 step=518000 epoch=518 metrics={'time_inference': 0.00018932223320007325, 'time_environment_step': 1.1705875396728515e-05, 'time_step': 0.0002513759136199951, 'rollout_return': 150.85714285714286, 'time_sample_batch': 7.967948913574218e-05, 'time_algorithm_update': 0.0035695552825927733, 'loss': 0.4310992958024144, 'evaluation': 145.9} step=518000
2023-01-09 22:57.28 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_519000.pt
2023-01-09 22:57.28 [info     ] DoubleDQN_online_20230109225406: epoch=519 step=519000 epoch=519 metrics={'time_inference': 0.00019421052932739257, 'time_environment_step': 1.198291778564453e-05, 'time_step': 0.0002583589553833008, 'time_sample_batch': 9.279251098632812e-05, 'time_algorithm_update': 0.003702688217163086, 'loss': 0

2023-01-09 22:57.34 [info     ] DoubleDQN_online_20230109225406: epoch=533 step=533000 epoch=533 metrics={'time_inference': 0.00019129228591918944, 'time_environment_step': 1.17034912109375e-05, 'time_step': 0.0002545726299285889, 'rollout_return': 149.0, 'time_sample_batch': 7.157325744628906e-05, 'time_algorithm_update': 0.0037001848220825197, 'loss': 1.0401857504621148, 'evaluation': 125.2} step=533000
2023-01-09 22:57.34 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_534000.pt
2023-01-09 22:57.34 [info     ] DoubleDQN_online_20230109225406: epoch=534 step=534000 epoch=534 metrics={'time_inference': 0.000195037841796875, 'time_environment_step': 1.2001276016235352e-05, 'time_step': 0.0002592966556549072, 'time_sample_batch': 7.295608520507812e-05, 'time_algorithm_update': 0.0037253856658935546, 'loss': 0.7844118757173419, 'rollout_return': 160.66666666666666, 'evaluation': 131.3} step=534000
2023-01-09 22:57.35 [info     ] Model parameter

2023-01-09 22:57.41 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_549000.pt
2023-01-09 22:57.41 [info     ] DoubleDQN_online_20230109225406: epoch=549 step=549000 epoch=549 metrics={'time_inference': 0.0001902918815612793, 'time_environment_step': 1.1772871017456054e-05, 'time_step': 0.0002508068084716797, 'time_sample_batch': 7.076263427734375e-05, 'time_algorithm_update': 0.003447818756103516, 'loss': 0.6773225851356983, 'rollout_return': 139.28571428571428, 'evaluation': 140.7} step=549000
2023-01-09 22:57.41 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_550000.pt
2023-01-09 22:57.41 [info     ] DoubleDQN_online_20230109225406: epoch=550 step=550000 epoch=550 metrics={'time_inference': 0.00018567395210266114, 'time_environment_step': 1.159524917602539e-05, 'time_step': 0.0002470226287841797, 'rollout_return': 146.14285714285714, 'time_sample_batch': 8.647441864013672e-05, 'time_algorithm_upda

2023-01-09 22:57.47 [info     ] DoubleDQN_online_20230109225406: epoch=564 step=564000 epoch=564 metrics={'time_inference': 0.00018286871910095214, 'time_environment_step': 1.1547327041625977e-05, 'time_step': 0.0002453842163085937, 'time_sample_batch': 7.519721984863281e-05, 'time_algorithm_update': 0.0036911249160766603, 'loss': 0.5683859489858151, 'rollout_return': 145.0, 'evaluation': 135.1} step=564000
2023-01-09 22:57.47 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_565000.pt
2023-01-09 22:57.47 [info     ] DoubleDQN_online_20230109225406: epoch=565 step=565000 epoch=565 metrics={'time_inference': 0.00018458843231201173, 'time_environment_step': 1.1697530746459962e-05, 'time_step': 0.0002458932399749756, 'time_sample_batch': 7.047653198242188e-05, 'time_algorithm_update': 0.0035421371459960936, 'loss': 1.341087720915675, 'rollout_return': 158.0, 'evaluation': 119.2} step=565000
2023-01-09 22:57.48 [info     ] Model parameters are save

2023-01-09 22:57.54 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_580000.pt
2023-01-09 22:57.54 [info     ] DoubleDQN_online_20230109225406: epoch=580 step=580000 epoch=580 metrics={'time_inference': 0.0001930539608001709, 'time_environment_step': 1.2056112289428711e-05, 'time_step': 0.0002564685344696045, 'rollout_return': 154.14285714285714, 'time_sample_batch': 7.07864761352539e-05, 'time_algorithm_update': 0.0036713361740112306, 'loss': 0.7788363974541426, 'evaluation': 103.4} step=580000
2023-01-09 22:57.54 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_581000.pt
2023-01-09 22:57.54 [info     ] DoubleDQN_online_20230109225406: epoch=581 step=581000 epoch=581 metrics={'time_inference': 0.0001921391487121582, 'time_environment_step': 1.1955499649047852e-05, 'time_step': 0.0002588768005371094, 'rollout_return': 154.33333333333334, 'time_sample_batch': 8.945465087890625e-05, 'time_algorithm_upda

2023-01-09 22:58.00 [info     ] DoubleDQN_online_20230109225406: epoch=595 step=595000 epoch=595 metrics={'time_inference': 0.00018710970878601073, 'time_environment_step': 1.170802116394043e-05, 'time_step': 0.00024972963333129883, 'rollout_return': 171.0, 'time_sample_batch': 6.973743438720703e-05, 'time_algorithm_update': 0.003671073913574219, 'loss': 0.5080807892605662, 'evaluation': 153.0} step=595000
2023-01-09 22:58.01 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_596000.pt
2023-01-09 22:58.01 [info     ] DoubleDQN_online_20230109225406: epoch=596 step=596000 epoch=596 metrics={'time_inference': 0.0001889474391937256, 'time_environment_step': 1.1782407760620116e-05, 'time_step': 0.00025212860107421873, 'rollout_return': 156.28571428571428, 'time_sample_batch': 8.172988891601563e-05, 'time_algorithm_update': 0.0036926031112670898, 'loss': 1.0396483579650522, 'evaluation': 147.2} step=596000
2023-01-09 22:58.01 [info     ] Model parame

2023-01-09 22:58.08 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_611000.pt
2023-01-09 22:58.08 [info     ] DoubleDQN_online_20230109225406: epoch=611 step=611000 epoch=611 metrics={'time_inference': 0.00019487762451171874, 'time_environment_step': 1.227426528930664e-05, 'time_step': 0.00026201844215393064, 'rollout_return': 154.14285714285714, 'time_sample_batch': 7.812976837158203e-05, 'time_algorithm_update': 0.0039881706237792965, 'loss': 0.6542205046862364, 'evaluation': 145.6} step=611000
2023-01-09 22:58.08 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_612000.pt
2023-01-09 22:58.08 [info     ] DoubleDQN_online_20230109225406: epoch=612 step=612000 epoch=612 metrics={'time_inference': 0.00020506525039672851, 'time_environment_step': 1.2714624404907227e-05, 'time_step': 0.00027333784103393554, 'rollout_return': 167.83333333333334, 'time_sample_batch': 8.959770202636719e-05, 'time_algorithm_

2023-01-09 22:58.15 [info     ] DoubleDQN_online_20230109225406: epoch=626 step=626000 epoch=626 metrics={'time_inference': 0.00020022201538085938, 'time_environment_step': 1.2339591979980469e-05, 'time_step': 0.00026888179779052733, 'time_sample_batch': 8.988380432128906e-05, 'time_algorithm_update': 0.004111933708190918, 'loss': 0.7543365923687816, 'rollout_return': 141.0, 'evaluation': 152.5} step=626000
2023-01-09 22:58.15 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_627000.pt
2023-01-09 22:58.15 [info     ] DoubleDQN_online_20230109225406: epoch=627 step=627000 epoch=627 metrics={'time_inference': 0.00020016837120056153, 'time_environment_step': 1.233053207397461e-05, 'time_step': 0.00026897263526916504, 'rollout_return': 154.14285714285714, 'time_sample_batch': 8.790493011474609e-05, 'time_algorithm_update': 0.004132914543151856, 'loss': 0.6321686802431941, 'evaluation': 147.5} step=627000
2023-01-09 22:58.16 [info     ] Model parame

2023-01-09 22:58.22 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_642000.pt
2023-01-09 22:58.22 [info     ] DoubleDQN_online_20230109225406: epoch=642 step=642000 epoch=642 metrics={'time_inference': 0.00019532036781311034, 'time_environment_step': 1.2053966522216797e-05, 'time_step': 0.00026003098487854, 'rollout_return': 163.0, 'time_sample_batch': 7.278919219970703e-05, 'time_algorithm_update': 0.003815007209777832, 'loss': 0.508846478164196, 'evaluation': 133.9} step=642000
2023-01-09 22:58.22 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_643000.pt
2023-01-09 22:58.22 [info     ] DoubleDQN_online_20230109225406: epoch=643 step=643000 epoch=643 metrics={'time_inference': 0.00019599771499633788, 'time_environment_step': 1.207733154296875e-05, 'time_step': 0.0002620189189910889, 'rollout_return': 156.28571428571428, 'time_sample_batch': 7.562637329101562e-05, 'time_algorithm_update': 0.00392608

2023-01-09 22:58.29 [info     ] DoubleDQN_online_20230109225406: epoch=657 step=657000 epoch=657 metrics={'time_inference': 0.00019380640983581544, 'time_environment_step': 1.1885643005371093e-05, 'time_step': 0.0002604374885559082, 'time_sample_batch': 7.734298706054687e-05, 'time_algorithm_update': 0.004033088684082031, 'loss': 0.4213420182466507, 'rollout_return': 164.0, 'evaluation': 155.1} step=657000
2023-01-09 22:58.29 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_658000.pt
2023-01-09 22:58.29 [info     ] DoubleDQN_online_20230109225406: epoch=658 step=658000 epoch=658 metrics={'time_inference': 0.00019467306137084962, 'time_environment_step': 1.188945770263672e-05, 'time_step': 0.00026022124290466306, 'time_sample_batch': 7.212162017822266e-05, 'time_algorithm_update': 0.003912115097045898, 'loss': 0.36262237932533026, 'rollout_return': 152.83333333333334, 'evaluation': 143.4} step=658000
2023-01-09 22:58.30 [info     ] Model parame

2023-01-09 22:58.36 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_673000.pt
2023-01-09 22:58.36 [info     ] DoubleDQN_online_20230109225406: epoch=673 step=673000 epoch=673 metrics={'time_inference': 0.00019285845756530763, 'time_environment_step': 1.1907100677490234e-05, 'time_step': 0.0002560126781463623, 'time_sample_batch': 7.119178771972657e-05, 'time_algorithm_update': 0.003680729866027832, 'loss': 0.5375690955668688, 'rollout_return': 172.16666666666666, 'evaluation': 158.6} step=673000
2023-01-09 22:58.37 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_674000.pt
2023-01-09 22:58.37 [info     ] DoubleDQN_online_20230109225406: epoch=674 step=674000 epoch=674 metrics={'time_inference': 0.0001956338882446289, 'time_environment_step': 1.2117385864257812e-05, 'time_step': 0.00026147007942199706, 'rollout_return': 179.8, 'time_sample_batch': 7.443428039550781e-05, 'time_algorithm_update': 0.0039

2023-01-09 22:58.44 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_689000.pt
2023-01-09 22:58.44 [info     ] DoubleDQN_online_20230109225406: epoch=689 step=689000 epoch=689 metrics={'time_inference': 0.00020060515403747559, 'time_environment_step': 1.2336969375610351e-05, 'time_step': 0.00026615238189697267, 'rollout_return': 174.66666666666666, 'time_sample_batch': 9.329319000244141e-05, 'time_algorithm_update': 0.0038019657135009766, 'loss': 0.2116114368662238, 'evaluation': 141.6} step=689000
2023-01-09 22:58.44 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_690000.pt
2023-01-09 22:58.44 [info     ] DoubleDQN_online_20230109225406: epoch=690 step=690000 epoch=690 metrics={'time_inference': 0.00019586515426635743, 'time_environment_step': 1.2162446975708007e-05, 'time_step': 0.00025872349739074706, 'rollout_return': 169.5, 'time_sample_batch': 7.443428039550781e-05, 'time_algorithm_update': 0.0

2023-01-09 22:58.51 [info     ] DoubleDQN_online_20230109225406: epoch=704 step=704000 epoch=704 metrics={'time_inference': 0.00019565558433532715, 'time_environment_step': 1.1972188949584962e-05, 'time_step': 0.0002588653564453125, 'rollout_return': 152.71428571428572, 'time_sample_batch': 7.255077362060547e-05, 'time_algorithm_update': 0.003661966323852539, 'loss': 0.3501926245167851, 'evaluation': 144.5} step=704000
2023-01-09 22:58.51 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_705000.pt
2023-01-09 22:58.51 [info     ] DoubleDQN_online_20230109225406: epoch=705 step=705000 epoch=705 metrics={'time_inference': 0.00019406056404113768, 'time_environment_step': 1.190948486328125e-05, 'time_step': 0.0002570021152496338, 'time_sample_batch': 7.20977783203125e-05, 'time_algorithm_update': 0.0036649703979492188, 'loss': 0.4791562305763364, 'rollout_return': 162.83333333333334, 'evaluation': 139.9} step=705000
2023-01-09 22:58.52 [info     ] M

2023-01-09 22:58.58 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_720000.pt
2023-01-09 22:58.58 [info     ] DoubleDQN_online_20230109225406: epoch=720 step=720000 epoch=720 metrics={'time_inference': 0.00019185638427734374, 'time_environment_step': 1.1710882186889648e-05, 'time_step': 0.0002544431686401367, 'rollout_return': 146.28571428571428, 'time_sample_batch': 7.162094116210937e-05, 'time_algorithm_update': 0.003664374351501465, 'loss': 0.7040030913427472, 'evaluation': 130.9} step=720000
2023-01-09 22:58.58 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_721000.pt
2023-01-09 22:58.58 [info     ] DoubleDQN_online_20230109225406: epoch=721 step=721000 epoch=721 metrics={'time_inference': 0.00018934059143066407, 'time_environment_step': 1.1507987976074219e-05, 'time_step': 0.00025139641761779785, 'time_sample_batch': 7.326602935791016e-05, 'time_algorithm_update': 0.0036605358123779296, 'loss':

2023-01-09 22:59.05 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_736000.pt
2023-01-09 22:59.05 [info     ] DoubleDQN_online_20230109225406: epoch=736 step=736000 epoch=736 metrics={'time_inference': 0.0001962275505065918, 'time_environment_step': 1.2032985687255859e-05, 'time_step': 0.0002630596160888672, 'time_sample_batch': 7.593631744384766e-05, 'time_algorithm_update': 0.004020810127258301, 'loss': 0.7273302424699069, 'rollout_return': 146.5, 'evaluation': 135.6} step=736000
2023-01-09 22:59.06 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_737000.pt
2023-01-09 22:59.06 [info     ] DoubleDQN_online_20230109225406: epoch=737 step=737000 epoch=737 metrics={'time_inference': 0.00020383048057556153, 'time_environment_step': 1.2491941452026368e-05, 'time_step': 0.0002717897891998291, 'rollout_return': 160.0, 'time_sample_batch': 8.683204650878906e-05, 'time_algorithm_update': 0.004017305374145508

2023-01-09 22:59.13 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_752000.pt
2023-01-09 22:59.13 [info     ] DoubleDQN_online_20230109225406: epoch=752 step=752000 epoch=752 metrics={'time_inference': 0.00019617557525634767, 'time_environment_step': 1.2040376663208007e-05, 'time_step': 0.00026261329650878907, 'rollout_return': 181.66666666666666, 'time_sample_batch': 7.987022399902344e-05, 'time_algorithm_update': 0.003981447219848633, 'loss': 0.5092182662338018, 'evaluation': 151.6} step=752000
2023-01-09 22:59.13 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_753000.pt
2023-01-09 22:59.13 [info     ] DoubleDQN_online_20230109225406: epoch=753 step=753000 epoch=753 metrics={'time_inference': 0.00019831490516662598, 'time_environment_step': 1.2229681015014649e-05, 'time_step': 0.0002665815353393555, 'time_sample_batch': 7.472038269042968e-05, 'time_algorithm_update': 0.004130268096923828, 'loss': 

2023-01-09 22:59.20 [info     ] DoubleDQN_online_20230109225406: epoch=767 step=767000 epoch=767 metrics={'time_inference': 0.00020136618614196778, 'time_environment_step': 1.2217283248901368e-05, 'time_step': 0.0002703390121459961, 'time_sample_batch': 7.538795471191406e-05, 'time_algorithm_update': 0.004181814193725586, 'loss': 0.9038437332957983, 'rollout_return': 162.5, 'evaluation': 160.6} step=767000
2023-01-09 22:59.20 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_768000.pt
2023-01-09 22:59.20 [info     ] DoubleDQN_online_20230109225406: epoch=768 step=768000 epoch=768 metrics={'time_inference': 0.00020134663581848145, 'time_environment_step': 1.2364864349365235e-05, 'time_step': 0.00027164578437805176, 'time_sample_batch': 7.562637329101562e-05, 'time_algorithm_update': 0.0042955875396728516, 'loss': 0.501848840713501, 'rollout_return': 168.33333333333334, 'evaluation': 131.9} step=768000
2023-01-09 22:59.21 [info     ] Model parame

2023-01-09 22:59.27 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_783000.pt
2023-01-09 22:59.27 [info     ] DoubleDQN_online_20230109225406: epoch=783 step=783000 epoch=783 metrics={'time_inference': 0.00019245553016662597, 'time_environment_step': 1.1615514755249024e-05, 'time_step': 0.0002593824863433838, 'rollout_return': 150.42857142857142, 'time_sample_batch': 7.007122039794921e-05, 'time_algorithm_update': 0.00410616397857666, 'loss': 0.8959202725440264, 'evaluation': 129.7} step=783000
2023-01-09 22:59.28 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_784000.pt
2023-01-09 22:59.28 [info     ] DoubleDQN_online_20230109225406: epoch=784 step=784000 epoch=784 metrics={'time_inference': 0.00019804930686950683, 'time_environment_step': 1.201343536376953e-05, 'time_step': 0.0002665567398071289, 'time_sample_batch': 8.633136749267579e-05, 'time_algorithm_update': 0.0041522979736328125, 'loss': 1.

2023-01-09 22:59.34 [info     ] DoubleDQN_online_20230109225406: epoch=798 step=798000 epoch=798 metrics={'time_inference': 0.0001912109851837158, 'time_environment_step': 1.1546611785888672e-05, 'time_step': 0.000256361722946167, 'rollout_return': 160.85714285714286, 'time_sample_batch': 7.205009460449218e-05, 'time_algorithm_update': 0.003958225250244141, 'loss': 0.6790997139178216, 'evaluation': 139.3} step=798000
2023-01-09 22:59.35 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_799000.pt
2023-01-09 22:59.35 [info     ] DoubleDQN_online_20230109225406: epoch=799 step=799000 epoch=799 metrics={'time_inference': 0.00019247913360595704, 'time_environment_step': 1.1629104614257813e-05, 'time_step': 0.0002586636543273926, 'time_sample_batch': 8.14199447631836e-05, 'time_algorithm_update': 0.004021835327148437, 'loss': 0.5120035065338016, 'rollout_return': 163.33333333333334, 'evaluation': 125.2} step=799000
2023-01-09 22:59.35 [info     ] Mod

2023-01-09 22:59.42 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_814000.pt
2023-01-09 22:59.42 [info     ] DoubleDQN_online_20230109225406: epoch=814 step=814000 epoch=814 metrics={'time_inference': 0.00020236635208129881, 'time_environment_step': 1.205730438232422e-05, 'time_step': 0.00026912093162536623, 'rollout_return': 170.0, 'time_sample_batch': 9.577274322509765e-05, 'time_algorithm_update': 0.003975820541381836, 'loss': 0.7156571563333273, 'evaluation': 150.3} step=814000
2023-01-09 22:59.42 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_815000.pt
2023-01-09 22:59.42 [info     ] DoubleDQN_online_20230109225406: epoch=815 step=815000 epoch=815 metrics={'time_inference': 0.00019812417030334474, 'time_environment_step': 1.191854476928711e-05, 'time_step': 0.0002641661167144775, 'rollout_return': 159.33333333333334, 'time_sample_batch': 7.326602935791016e-05, 'time_algorithm_update': 0.00396

2023-01-09 22:59.49 [info     ] DoubleDQN_online_20230109225406: epoch=829 step=829000 epoch=829 metrics={'time_inference': 0.00019478750228881837, 'time_environment_step': 1.1817216873168945e-05, 'time_step': 0.0002619791030883789, 'time_sample_batch': 7.467269897460937e-05, 'time_algorithm_update': 0.004103612899780273, 'loss': 0.41383068934082984, 'rollout_return': 172.83333333333334, 'evaluation': 174.8} step=829000
2023-01-09 22:59.50 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_830000.pt
2023-01-09 22:59.50 [info     ] DoubleDQN_online_20230109225406: epoch=830 step=830000 epoch=830 metrics={'time_inference': 0.0002005941867828369, 'time_environment_step': 1.2119531631469726e-05, 'time_step': 0.0002680859565734863, 'time_sample_batch': 7.319450378417969e-05, 'time_algorithm_update': 0.00406949520111084, 'loss': 0.4335486309602857, 'rollout_return': 192.8, 'evaluation': 178.8} step=830000
2023-01-09 22:59.50 [info     ] Model paramete

2023-01-09 22:59.57 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_845000.pt
2023-01-09 22:59.57 [info     ] DoubleDQN_online_20230109225406: epoch=845 step=845000 epoch=845 metrics={'time_inference': 0.0002050924301147461, 'time_environment_step': 1.208949089050293e-05, 'time_step': 0.00027158951759338377, 'rollout_return': 165.33333333333334, 'time_sample_batch': 7.271766662597656e-05, 'time_algorithm_update': 0.003957414627075195, 'loss': 1.2584354620426894, 'evaluation': 170.6} step=845000
2023-01-09 22:59.58 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_846000.pt
2023-01-09 22:59.58 [info     ] DoubleDQN_online_20230109225406: epoch=846 step=846000 epoch=846 metrics={'time_inference': 0.00019870758056640626, 'time_environment_step': 1.1976242065429687e-05, 'time_step': 0.00026621246337890626, 'rollout_return': 169.0, 'time_sample_batch': 7.810592651367188e-05, 'time_algorithm_update': 0.0040

2023-01-09 23:00.04 [info     ] DoubleDQN_online_20230109225406: epoch=860 step=860000 epoch=860 metrics={'time_inference': 0.0002014780044555664, 'time_environment_step': 1.2218952178955078e-05, 'time_step': 0.00026797437667846677, 'time_sample_batch': 7.932186126708984e-05, 'time_algorithm_update': 0.0039321184158325195, 'loss': 0.677887829951942, 'rollout_return': 183.4, 'evaluation': 153.6} step=860000
2023-01-09 23:00.05 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_861000.pt
2023-01-09 23:00.05 [info     ] DoubleDQN_online_20230109225406: epoch=861 step=861000 epoch=861 metrics={'time_inference': 0.0002013874053955078, 'time_environment_step': 1.2162208557128906e-05, 'time_step': 0.00026565957069396974, 'rollout_return': 172.66666666666666, 'time_sample_batch': 7.369518280029297e-05, 'time_algorithm_update': 0.0037392139434814452, 'loss': 0.4398904822766781, 'evaluation': 171.5} step=861000
2023-01-09 23:00.05 [info     ] Model parame

2023-01-09 23:00.12 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_876000.pt
2023-01-09 23:00.12 [info     ] DoubleDQN_online_20230109225406: epoch=876 step=876000 epoch=876 metrics={'time_inference': 0.00019257616996765136, 'time_environment_step': 1.1947393417358398e-05, 'time_step': 0.0002541141510009766, 'rollout_return': 181.0, 'time_sample_batch': 7.221698760986328e-05, 'time_algorithm_update': 0.0035680532455444336, 'loss': 0.6973119458183646, 'evaluation': 157.8} step=876000
2023-01-09 23:00.13 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_877000.pt
2023-01-09 23:00.13 [info     ] DoubleDQN_online_20230109225406: epoch=877 step=877000 epoch=877 metrics={'time_inference': 0.00019731497764587403, 'time_environment_step': 1.182389259338379e-05, 'time_step': 0.0002594428062438965, 'rollout_return': 177.0, 'time_sample_batch': 7.076263427734375e-05, 'time_algorithm_update': 0.00360987186431884

2023-01-09 23:00.19 [info     ] DoubleDQN_online_20230109225406: epoch=891 step=891000 epoch=891 metrics={'time_inference': 0.000204512357711792, 'time_environment_step': 1.2088298797607421e-05, 'time_step': 0.0002690446376800537, 'time_sample_batch': 7.741451263427734e-05, 'time_algorithm_update': 0.003788876533508301, 'loss': 0.5383842095732689, 'rollout_return': 177.4, 'evaluation': 167.9} step=891000
2023-01-09 23:00.20 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_892000.pt
2023-01-09 23:00.20 [info     ] DoubleDQN_online_20230109225406: epoch=892 step=892000 epoch=892 metrics={'time_inference': 0.00019836044311523438, 'time_environment_step': 1.1977434158325196e-05, 'time_step': 0.0002648165225982666, 'rollout_return': 173.0, 'time_sample_batch': 7.374286651611328e-05, 'time_algorithm_update': 0.003999018669128418, 'loss': 0.5197048131376505, 'evaluation': 156.1} step=892000
2023-01-09 23:00.20 [info     ] Model parameters are saved t

2023-01-09 23:00.27 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_907000.pt
2023-01-09 23:00.27 [info     ] DoubleDQN_online_20230109225406: epoch=907 step=907000 epoch=907 metrics={'time_inference': 0.0001999397277832031, 'time_environment_step': 1.2099981307983398e-05, 'time_step': 0.0002660973072052002, 'time_sample_batch': 7.882118225097657e-05, 'time_algorithm_update': 0.0039403438568115234, 'loss': 0.06496593598276376, 'rollout_return': 162.0, 'evaluation': 148.5} step=907000
2023-01-09 23:00.27 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_908000.pt
2023-01-09 23:00.27 [info     ] DoubleDQN_online_20230109225406: epoch=908 step=908000 epoch=908 metrics={'time_inference': 0.0001964409351348877, 'time_environment_step': 1.187300682067871e-05, 'time_step': 0.00026306724548339844, 'time_sample_batch': 7.917881011962891e-05, 'time_algorithm_update': 0.004026985168457032, 'loss': 0.654389556683

2023-01-09 23:00.34 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_923000.pt
2023-01-09 23:00.34 [info     ] DoubleDQN_online_20230109225406: epoch=923 step=923000 epoch=923 metrics={'time_inference': 0.00019051623344421386, 'time_environment_step': 1.163339614868164e-05, 'time_step': 0.00025723719596862794, 'rollout_return': 147.0, 'time_sample_batch': 8.285045623779297e-05, 'time_algorithm_update': 0.004100179672241211, 'loss': 0.7862581051886082, 'evaluation': 137.5} step=923000
2023-01-09 23:00.35 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_924000.pt
2023-01-09 23:00.35 [info     ] DoubleDQN_online_20230109225406: epoch=924 step=924000 epoch=924 metrics={'time_inference': 0.00019389843940734863, 'time_environment_step': 1.172327995300293e-05, 'time_step': 0.00026145100593566895, 'time_sample_batch': 0.00010826587677001954, 'time_algorithm_update': 0.004130434989929199, 'loss': 1.46863377019

2023-01-09 23:00.41 [info     ] DoubleDQN_online_20230109225406: epoch=938 step=938000 epoch=938 metrics={'time_inference': 0.00018326544761657716, 'time_environment_step': 1.125955581665039e-05, 'time_step': 0.0002449667453765869, 'rollout_return': 178.83333333333334, 'time_sample_batch': 6.852149963378906e-05, 'time_algorithm_update': 0.0036962747573852537, 'loss': 0.7214938316494226, 'evaluation': 165.4} step=938000
2023-01-09 23:00.42 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_939000.pt
2023-01-09 23:00.42 [info     ] DoubleDQN_online_20230109225406: epoch=939 step=939000 epoch=939 metrics={'time_inference': 0.00018968892097473146, 'time_environment_step': 1.1741399765014648e-05, 'time_step': 0.0002522366046905518, 'rollout_return': 168.83333333333334, 'time_sample_batch': 7.290840148925782e-05, 'time_algorithm_update': 0.0036767959594726563, 'loss': 0.8748721834272146, 'evaluation': 160.7} step=939000
2023-01-09 23:00.42 [info     ]

2023-01-09 23:00.48 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_954000.pt
2023-01-09 23:00.48 [info     ] DoubleDQN_online_20230109225406: epoch=954 step=954000 epoch=954 metrics={'time_inference': 0.00019645309448242188, 'time_environment_step': 1.2047767639160155e-05, 'time_step': 0.0002601172924041748, 'rollout_return': 166.0, 'time_sample_batch': 7.479190826416015e-05, 'time_algorithm_update': 0.0037172317504882814, 'loss': 0.44719546642154456, 'evaluation': 174.1} step=954000
2023-01-09 23:00.49 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_955000.pt
2023-01-09 23:00.49 [info     ] DoubleDQN_online_20230109225406: epoch=955 step=955000 epoch=955 metrics={'time_inference': 0.0001872408390045166, 'time_environment_step': 1.1641740798950195e-05, 'time_step': 0.00025026273727416995, 'rollout_return': 165.66666666666666, 'time_sample_batch': 7.176399230957031e-05, 'time_algorithm_update': 0.00

2023-01-09 23:00.55 [info     ] DoubleDQN_online_20230109225406: epoch=969 step=969000 epoch=969 metrics={'time_inference': 0.00019206666946411133, 'time_environment_step': 1.2021780014038086e-05, 'time_step': 0.000256094217300415, 'time_sample_batch': 7.138252258300781e-05, 'time_algorithm_update': 0.0037734508514404297, 'loss': 0.6268476622179151, 'rollout_return': 169.5, 'evaluation': 155.6} step=969000
2023-01-09 23:00.55 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_970000.pt
2023-01-09 23:00.55 [info     ] DoubleDQN_online_20230109225406: epoch=970 step=970000 epoch=970 metrics={'time_inference': 0.00018709588050842284, 'time_environment_step': 1.1591672897338867e-05, 'time_step': 0.00024961614608764646, 'rollout_return': 155.66666666666666, 'time_sample_batch': 7.052421569824219e-05, 'time_algorithm_update': 0.00371248722076416, 'loss': 0.8857485834509135, 'evaluation': 158.1} step=970000
2023-01-09 23:00.56 [info     ] Model paramet

2023-01-09 23:01.02 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_985000.pt
2023-01-09 23:01.02 [info     ] DoubleDQN_online_20230109225406: epoch=985 step=985000 epoch=985 metrics={'time_inference': 0.00018503713607788085, 'time_environment_step': 1.1523246765136719e-05, 'time_step': 0.0002491042613983154, 'rollout_return': 161.14285714285714, 'time_sample_batch': 7.023811340332032e-05, 'time_algorithm_update': 0.0038583993911743162, 'loss': 0.8923199677839875, 'evaluation': 147.7} step=985000
2023-01-09 23:01.02 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109225406/model_986000.pt
2023-01-09 23:01.02 [info     ] DoubleDQN_online_20230109225406: epoch=986 step=986000 epoch=986 metrics={'time_inference': 0.00019034981727600097, 'time_environment_step': 1.1890649795532226e-05, 'time_step': 0.00025533246994018555, 'time_sample_batch': 7.264614105224609e-05, 'time_algorithm_update': 0.003881692886352539, 'loss':

2023-01-09 23:01.09 [info     ] DoubleDQN_online_20230109225406: epoch=1000 step=1000000 epoch=1000 metrics={'time_inference': 0.00019473671913146972, 'time_environment_step': 1.2192249298095703e-05, 'time_step': 0.00026212263107299805, 'time_sample_batch': 7.617473602294922e-05, 'time_algorithm_update': 0.004061436653137207, 'loss': 0.7044335409998894, 'rollout_return': 148.71428571428572, 'evaluation': 133.6} step=1000000
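
Each metrics line in the log above carries its values as a Python dict literal after `metrics=`. To plot or compare the `evaluation` score across epochs, the lines can be parsed with the standard library. The following is a minimal sketch, not part of d3rlpy: the helper names `parse_metrics` and `evaluation_curve` are hypothetical, and the regular expression assumes the line layout shown above. Lines that were cut off before the closing brace are skipped rather than guessed at.

```python
import ast
import re

# Assumed layout of a metrics line (taken from the log output above):
#   <timestamp> [info] <run name>: epoch=E step=S epoch=E metrics={...} step=S
METRICS_RE = re.compile(r"epoch=(\d+) step=(\d+) .*?metrics=(\{.*\})")


def parse_metrics(line):
    """Return a dict with epoch, step and all metrics, or None if not parseable."""
    match = METRICS_RE.search(line)
    if match is None:
        # Not a metrics line, or the dict was truncated before its closing brace.
        return None
    try:
        metrics = ast.literal_eval(match.group(3))
    except (ValueError, SyntaxError):
        return None
    return {"epoch": int(match.group(1)), "step": int(match.group(2)), **metrics}


def evaluation_curve(lines):
    """Collect (step, evaluation) pairs from every parseable metrics line."""
    curve = []
    for line in lines:
        record = parse_metrics(line)
        if record is not None and "evaluation" in record:
            curve.append((record["step"], record["evaluation"]))
    return curve
```

Passing the captured log lines to `evaluation_curve` yields a list of `(step, evaluation)` pairs that can be handed to any plotting library.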


# Double DQN + Using N-steps and Noisy Networks

In [None]:
# setup environment
# training env
env = gym.make('CartPole-v0')
# evaluation env
eval_env = gym.make('CartPole-v0')

In [None]:
# Adam optimizer with weight decay (L2 regularization)
optim_factory = OptimizerFactory(Adam, weight_decay=1e-4)
# setup algorithm
ddqn = DoubleDQN(
    batch_size=32,  # mini-batch size (transitions sampled per update)
    learning_rate=2.5e-4,  # learning rate
    target_update_interval=100,  # interval to synchronize the target network
    n_steps=4,  # N-step TD calculation
    optim_factory=optim_factory,  # optimizer
)
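
To make the effect of `n_steps=4` concrete, here is a minimal sketch of how an N-step Double DQN target can be computed for a single transition. It is an illustration under stated assumptions, not d3rlpy's internal code: the function name `n_step_double_dqn_target` and its arguments are hypothetical, and the Q-values are passed in as plain lists instead of network outputs. The online network selects the next action and the target network evaluates it, which is the defining step of Double DQN.

```python
from typing import Sequence


def n_step_double_dqn_target(
    rewards: Sequence[float],
    terminal: bool,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """N-step Double DQN target for one transition (illustrative only).

    rewards       -- the N rewards r_t, ..., r_{t+N-1}
    terminal      -- True if the episode ended within these N steps
    next_q_online -- online-network Q-values for state s_{t+N}
    next_q_target -- target-network Q-values for state s_{t+N}
    """
    # Discounted sum of the N observed rewards.
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if terminal:
        # No bootstrapping past the end of an episode.
        return n_step_return
    # Double DQN: the online network picks the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network supplies its value.
    return n_step_return + (gamma ** len(rewards)) * next_q_target[best_action]
```

With `n_steps=4` the bootstrap term is discounted by `gamma ** 4`, so more of the target comes from observed rewards and less from the current value estimate.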

In [None]:
# setup replay buffer
buffer = ReplayBuffer(maxlen=1000000, env=env)
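
As a rough picture of what `maxlen` controls, the sketch below implements a first-in, first-out buffer with uniform sampling. It is a simplified stand-in written for illustration, assuming transitions are stored as plain tuples. The class name `SimpleReplayBuffer` is hypothetical and this is not how d3rlpy's `ReplayBuffer` is implemented.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """FIFO transition store with uniform random sampling (illustrative only)."""

    def __init__(self, maxlen):
        # A deque with maxlen silently drops the oldest item once it is full.
        self._storage = deque(maxlen=maxlen)

    def append(self, observation, action, reward, next_observation, terminal):
        self._storage.append(
            (observation, action, reward, next_observation, terminal)
        )

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)
```

A large `maxlen` such as the one used above keeps old experience available for longer, at the cost of memory.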

In [None]:
# epsilon decays linearly from 1.0 to 0.1 over the first 10,000 steps
explorer = LinearDecayEpsilonGreedy(start_epsilon=1.0,
                                    end_epsilon=0.1,
                                    duration=10000)
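
The schedule configured above can be sketched in a few lines. This is an illustrative re-implementation under the assumption that epsilon is interpolated linearly and then held at `end_epsilon`. The function names `linear_decay_epsilon` and `epsilon_greedy` are hypothetical and are not d3rlpy functions.

```python
import random


def linear_decay_epsilon(step, start_epsilon=1.0, end_epsilon=0.1, duration=10000):
    """Epsilon at a given step: linear interpolation, then constant."""
    progress = min(step, duration) / duration
    return start_epsilon + (end_epsilon - start_epsilon) * progress


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Under this schedule the agent acts almost entirely at random at the start of training and settles at 10% random actions from step 10,000 onward.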

In [None]:
ddqn.fit_online(
    env,  # environment
    buffer,  # buffer
    explorer=explorer,  # exploration strategy
    eval_env=eval_env,  # eval environment
    n_steps_per_epoch=1000,  # the number of steps per epoch.
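    update_interval=100,  # environment steps between parameter updates
    eval_epsilon=0.3,  # epsilon used by the epsilon-greedy policy during evaluation
    save_metrics=True,  # write metrics to the log directory
    tensorboard_dir="runs",  # directory for TensorBoard logs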
)
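
The arguments above determine how many epochs and gradient updates the run performs. The sketch below works through that arithmetic, assuming one update each time the step counter is a multiple of `update_interval` and ignoring any warm-up period. The total of 1,000,000 steps is read off the progress bar printed by this run, and the helper name `training_schedule` is hypothetical.

```python
def training_schedule(n_steps, n_steps_per_epoch, update_interval):
    """Derive epoch and update counts from the fit_online settings (illustrative)."""
    return {
        "n_epochs": n_steps // n_steps_per_epoch,
        "updates_per_epoch": n_steps_per_epoch // update_interval,
        "total_updates": n_steps // update_interval,
    }


schedule = training_schedule(
    n_steps=1_000_000,       # total shown by the progress bar
    n_steps_per_epoch=1000,  # as passed to fit_online
    update_interval=100,     # as passed to fit_online
)
print(schedule)
```

Under these assumptions each epoch contains only a handful of gradient updates, which is worth keeping in mind when reading the per-epoch `loss` values in the log.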

2023-01-09 23:09.13 [info     ] Directory is created at d3rlpy_logs/DoubleDQN_online_20230109230913
2023-01-09 23:09.13 [debug    ] Building model...
2023-01-09 23:09.13 [debug    ] Model has been built.
2023-01-09 23:09.13 [info     ] Parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/params.json params={'action_scaler': None, 'batch_size': 32, 'encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'gamma': 0.99, 'generated_maxlen': 100000, 'learning_rate': 0.00025, 'n_critics': 1, 'n_frames': 1, 'n_steps': 4, 'optim_factory': {'optim_cls': 'Adam', 'weight_decay': 0.0001}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'real_ratio': 1.0, 'reward_scaler': None, 'scaler': None, 'target_update_interval': 100, 'use_gpu': None, 'algorithm': 'DoubleDQN', 'observation_shape': (4,), 'action_size': 2}


  0%|          | 0/1000000 [00:00<?, ?it/s]

2023-01-09 23:09.13 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_1000.pt
2023-01-09 23:09.13 [info     ] DoubleDQN_online_20230109230913: epoch=1 step=1000 epoch=1 metrics={'time_inference': 0.0001857912540435791, 'time_environment_step': 1.2485265731811523e-05, 'time_step': 0.00022872090339660646, 'rollout_return': 22.0, 'time_sample_batch': 7.038116455078125e-05, 'time_algorithm_update': 0.0014892339706420899, 'loss': 3.0368156671524047, 'evaluation': 12.0} step=1000
2023-01-09 23:09.14 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_2000.pt
2023-01-09 23:09.14 [info     ] DoubleDQN_online_20230109230913: epoch=2 step=2000 epoch=2 metrics={'time_inference': 0.00017947816848754881, 'time_environment_step': 1.2160062789916992e-05, 'time_step': 0.00022032904624938964, 'rollout_return': 22.08888888888889, 'time_sample_batch': 7.259845733642578e-05, 'time_algorithm_update': 0.0013334512710571288, 'l

2023-01-09 23:09.18 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_17000.pt
2023-01-09 23:09.18 [info     ] DoubleDQN_online_20230109230913: epoch=17 step=17000 epoch=17 metrics={'time_inference': 0.00018135547637939452, 'time_environment_step': 1.222991943359375e-05, 'time_step': 0.00022182083129882813, 'rollout_return': 29.5, 'time_sample_batch': 7.824897766113281e-05, 'time_algorithm_update': 0.001309037208557129, 'loss': 1.0468122839927674, 'evaluation': 56.1} step=17000
2023-01-09 23:09.18 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_18000.pt
2023-01-09 23:09.18 [info     ] DoubleDQN_online_20230109230913: epoch=18 step=18000 epoch=18 metrics={'time_inference': 0.0001739804744720459, 'time_environment_step': 1.1828184127807616e-05, 'time_step': 0.00021347761154174804, 'rollout_return': 32.61290322580645, 'time_sample_batch': 7.481575012207032e-05, 'time_algorithm_update': 0.0013258695602416

2023-01-09 23:09.22 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_33000.pt
2023-01-09 23:09.22 [info     ] DoubleDQN_online_20230109230913: epoch=33 step=33000 epoch=33 metrics={'time_inference': 0.0001750352382659912, 'time_environment_step': 1.1987924575805664e-05, 'time_step': 0.0002171926498413086, 'rollout_return': 21.020833333333332, 'time_sample_batch': 9.708404541015625e-05, 'time_algorithm_update': 0.001510310173034668, 'loss': 2.4684374928474426, 'evaluation': 23.0} step=33000
2023-01-09 23:09.22 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_34000.pt
2023-01-09 23:09.22 [info     ] DoubleDQN_online_20230109230913: epoch=34 step=34000 epoch=34 metrics={'time_inference': 0.0001757187843322754, 'time_environment_step': 1.2021541595458984e-05, 'time_step': 0.00021554994583129883, 'rollout_return': 22.74418604651163, 'time_sample_batch': 7.646083831787109e-05, 'time_algorithm_update': 0.001

2023-01-09 23:09.26 [info     ] DoubleDQN_online_20230109230913: epoch=48 step=48000 epoch=48 metrics={'time_inference': 0.00017456769943237306, 'time_environment_step': 1.1962413787841797e-05, 'time_step': 0.00021372294425964355, 'rollout_return': 31.65625, 'time_sample_batch': 7.259845733642578e-05, 'time_algorithm_update': 0.001271963119506836, 'loss': 2.624872159957886, 'evaluation': 29.7} step=48000
2023-01-09 23:09.26 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_49000.pt
2023-01-09 23:09.26 [info     ] DoubleDQN_online_20230109230913: epoch=49 step=49000 epoch=49 metrics={'time_inference': 0.0001838362216949463, 'time_environment_step': 1.2448310852050782e-05, 'time_step': 0.00022530126571655273, 'rollout_return': 31.741935483870968, 'time_sample_batch': 8.301734924316406e-05, 'time_algorithm_update': 0.0013725757598876953, 'loss': 2.4352133512496947, 'evaluation': 38.4} step=49000
2023-01-09 23:09.26 [info     ] Model parameters are

2023-01-09 23:09.30 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_64000.pt
2023-01-09 23:09.30 [info     ] DoubleDQN_online_20230109230913: epoch=64 step=64000 epoch=64 metrics={'time_inference': 0.00017878699302673338, 'time_environment_step': 1.1987447738647462e-05, 'time_step': 0.00021854138374328614, 'time_sample_batch': 7.946491241455078e-05, 'time_algorithm_update': 0.001327347755432129, 'loss': 2.537966215610504, 'rollout_return': 65.26666666666667, 'evaluation': 61.7} step=64000
2023-01-09 23:09.31 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_65000.pt
2023-01-09 23:09.31 [info     ] DoubleDQN_online_20230109230913: epoch=65 step=65000 epoch=65 metrics={'time_inference': 0.00018863868713378906, 'time_environment_step': 1.2913703918457032e-05, 'time_step': 0.00023246479034423829, 'rollout_return': 68.4375, 'time_sample_batch': 9.729862213134766e-05, 'time_algorithm_update': 0.001518130302

2023-01-09 23:09.35 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_80000.pt
2023-01-09 23:09.35 [info     ] DoubleDQN_online_20230109230913: epoch=80 step=80000 epoch=80 metrics={'time_inference': 0.0001734626293182373, 'time_environment_step': 1.1792421340942382e-05, 'time_step': 0.00021196556091308594, 'rollout_return': 51.15, 'time_sample_batch': 7.405281066894532e-05, 'time_algorithm_update': 0.0012502193450927735, 'loss': 3.179272735118866, 'evaluation': 62.5} step=80000
2023-01-09 23:09.35 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_81000.pt
2023-01-09 23:09.35 [info     ] DoubleDQN_online_20230109230913: epoch=81 step=81000 epoch=81 metrics={'time_inference': 0.00017566895484924316, 'time_environment_step': 1.2552738189697265e-05, 'time_step': 0.00021674442291259765, 'rollout_return': 49.75, 'time_sample_batch': 0.00010311603546142578, 'time_algorithm_update': 0.0013656854629516602, 'los

2023-01-09 23:09.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_96000.pt
2023-01-09 23:09.40 [info     ] DoubleDQN_online_20230109230913: epoch=96 step=96000 epoch=96 metrics={'time_inference': 0.0009012620449066162, 'time_environment_step': 1.2294530868530273e-05, 'time_step': 0.0009422705173492431, 'rollout_return': 65.33333333333333, 'time_sample_batch': 7.405281066894532e-05, 'time_algorithm_update': 0.0012753963470458984, 'loss': 3.1761186599731444, 'evaluation': 65.4} step=96000
2023-01-09 23:09.41 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_97000.pt
2023-01-09 23:09.41 [info     ] DoubleDQN_online_20230109230913: epoch=97 step=97000 epoch=97 metrics={'time_inference': 0.00017879605293273925, 'time_environment_step': 1.2028217315673829e-05, 'time_step': 0.00021884369850158693, 'rollout_return': 59.470588235294116, 'time_sample_batch': 8.578300476074219e-05, 'time_algorithm_update': 0.0

2023-01-09 23:09.45 [info     ] DoubleDQN_online_20230109230913: epoch=111 step=111000 epoch=111 metrics={'time_inference': 0.000179093599319458, 'time_environment_step': 1.1922597885131835e-05, 'time_step': 0.00022016167640686034, 'rollout_return': 63.6875, 'time_sample_batch': 9.064674377441406e-05, 'time_algorithm_update': 0.0014438390731811523, 'loss': 3.2672453999519346, 'evaluation': 68.9} step=111000
2023-01-09 23:09.45 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_112000.pt
2023-01-09 23:09.45 [info     ] DoubleDQN_online_20230109230913: epoch=112 step=112000 epoch=112 metrics={'time_inference': 0.00017733216285705566, 'time_environment_step': 1.1934280395507812e-05, 'time_step': 0.00021622228622436524, 'rollout_return': 70.14285714285714, 'time_sample_batch': 7.729530334472656e-05, 'time_algorithm_update': 0.0012538433074951172, 'loss': 3.591367077827454, 'evaluation': 59.3} step=112000
2023-01-09 23:09.45 [info     ] Model paramet

2023-01-09 23:09.50 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_127000.pt
2023-01-09 23:09.50 [info     ] DoubleDQN_online_20230109230913: epoch=127 step=127000 epoch=127 metrics={'time_inference': 0.00017598223686218263, 'time_environment_step': 1.181650161743164e-05, 'time_step': 0.00021689271926879884, 'rollout_return': 84.33333333333333, 'time_sample_batch': 8.134841918945312e-05, 'time_algorithm_update': 0.0014594316482543946, 'loss': 3.1525785088539124, 'evaluation': 103.7} step=127000
2023-01-09 23:09.50 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_128000.pt
2023-01-09 23:09.50 [info     ] DoubleDQN_online_20230109230913: epoch=128 step=128000 epoch=128 metrics={'time_inference': 0.0001766669750213623, 'time_environment_step': 1.185750961303711e-05, 'time_step': 0.00021796846389770508, 'rollout_return': 104.22222222222223, 'time_sample_batch': 9.763240814208984e-05, 'time_algorithm_upd

2023-01-09 23:09.55 [info     ] DoubleDQN_online_20230109230913: epoch=142 step=142000 epoch=142 metrics={'time_inference': 0.00018224000930786133, 'time_environment_step': 1.2058496475219726e-05, 'time_step': 0.00022648310661315917, 'time_sample_batch': 8.88824462890625e-05, 'time_algorithm_update': 0.0017416000366210938, 'loss': 3.291070079803467, 'rollout_return': 154.0, 'evaluation': 111.2} step=142000
2023-01-09 23:09.55 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_143000.pt
2023-01-09 23:09.55 [info     ] DoubleDQN_online_20230109230913: epoch=143 step=143000 epoch=143 metrics={'time_inference': 0.00019223380088806153, 'time_environment_step': 1.255178451538086e-05, 'time_step': 0.00023796725273132324, 'time_sample_batch': 9.593963623046874e-05, 'time_algorithm_update': 0.0017822980880737305, 'loss': 3.0596959590911865, 'rollout_return': 140.57142857142858, 'evaluation': 111.5} step=143000
2023-01-09 23:09.56 [info     ] Model parame

2023-01-09 23:10.01 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_158000.pt
2023-01-09 23:10.01 [info     ] DoubleDQN_online_20230109230913: epoch=158 step=158000 epoch=158 metrics={'time_inference': 0.00018558597564697266, 'time_environment_step': 1.2044429779052735e-05, 'time_step': 0.00023242473602294922, 'rollout_return': 163.85714285714286, 'time_sample_batch': 0.00012593269348144532, 'time_algorithm_update': 0.0019694805145263673, 'loss': 3.1797491788864134, 'evaluation': 120.5} step=158000
2023-01-09 23:10.01 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_159000.pt
2023-01-09 23:10.01 [info     ] DoubleDQN_online_20230109230913: epoch=159 step=159000 epoch=159 metrics={'time_inference': 0.0001789860725402832, 'time_environment_step': 1.1766910552978515e-05, 'time_step': 0.00022406840324401856, 'time_sample_batch': 9.975433349609374e-05, 'time_algorithm_update': 0.0018839359283447266, 'loss

2023-01-09 23:10.06 [info     ] DoubleDQN_online_20230109230913: epoch=173 step=173000 epoch=173 metrics={'time_inference': 0.00018999433517456055, 'time_environment_step': 1.2181520462036134e-05, 'time_step': 0.00023996543884277345, 'rollout_return': 89.25, 'time_sample_batch': 0.00012013912200927734, 'time_algorithm_update': 0.0022566556930541993, 'loss': 3.8203012466430666, 'evaluation': 53.0} step=173000
2023-01-09 23:10.06 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_174000.pt
2023-01-09 23:10.06 [info     ] DoubleDQN_online_20230109230913: epoch=174 step=174000 epoch=174 metrics={'time_inference': 0.00017961406707763672, 'time_environment_step': 1.1721134185791015e-05, 'time_step': 0.00022629833221435547, 'rollout_return': 86.0, 'time_sample_batch': 8.165836334228516e-05, 'time_algorithm_update': 0.002053499221801758, 'loss': 3.378064680099487, 'evaluation': 64.9} step=174000
2023-01-09 23:10.07 [info     ] Model parameters are saved

2023-01-09 23:10.12 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_189000.pt
2023-01-09 23:10.12 [info     ] DoubleDQN_online_20230109230913: epoch=189 step=189000 epoch=189 metrics={'time_inference': 0.00018816208839416503, 'time_environment_step': 1.2074708938598634e-05, 'time_step': 0.00023659348487854005, 'rollout_return': 171.33333333333334, 'time_sample_batch': 9.188652038574218e-05, 'time_algorithm_update': 0.0021744251251220705, 'loss': 2.8266414999961853, 'evaluation': 146.0} step=189000
2023-01-09 23:10.13 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_190000.pt
2023-01-09 23:10.13 [info     ] DoubleDQN_online_20230109230913: epoch=190 step=190000 epoch=190 metrics={'time_inference': 0.00018527817726135255, 'time_environment_step': 1.1926651000976563e-05, 'time_step': 0.00023508644104003906, 'time_sample_batch': 8.857250213623047e-05, 'time_algorithm_update': 0.0023432254791259767, 'loss

2023-01-09 23:10.18 [info     ] DoubleDQN_online_20230109230913: epoch=204 step=204000 epoch=204 metrics={'time_inference': 0.00018059325218200683, 'time_environment_step': 1.1676549911499024e-05, 'time_step': 0.00022719955444335937, 'rollout_return': 182.33333333333334, 'time_sample_batch': 8.940696716308594e-05, 'time_algorithm_update': 0.002079939842224121, 'loss': 3.4382946372032164, 'evaluation': 156.9} step=204000
2023-01-09 23:10.19 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_205000.pt
2023-01-09 23:10.19 [info     ] DoubleDQN_online_20230109230913: epoch=205 step=205000 epoch=205 metrics={'time_inference': 0.00018385601043701172, 'time_environment_step': 1.1833906173706054e-05, 'time_step': 0.000230712890625, 'time_sample_batch': 9.784698486328124e-05, 'time_algorithm_update': 0.002071237564086914, 'loss': 3.40903103351593, 'rollout_return': 180.6, 'evaluation': 166.4} step=205000
2023-01-09 23:10.19 [info     ] Model parameters a

2023-01-09 23:10.25 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_220000.pt
2023-01-09 23:10.25 [info     ] DoubleDQN_online_20230109230913: epoch=220 step=220000 epoch=220 metrics={'time_inference': 0.00018740153312683106, 'time_environment_step': 1.1933326721191406e-05, 'time_step': 0.00023462462425231933, 'rollout_return': 178.6, 'time_sample_batch': 8.282661437988281e-05, 'time_algorithm_update': 0.0021046161651611327, 'loss': 3.5696943283081053, 'evaluation': 141.2} step=220000
2023-01-09 23:10.26 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_221000.pt
2023-01-09 23:10.26 [info     ] DoubleDQN_online_20230109230913: epoch=221 step=221000 epoch=221 metrics={'time_inference': 0.00018483996391296386, 'time_environment_step': 1.1899232864379883e-05, 'time_step': 0.0002328813076019287, 'rollout_return': 174.66666666666666, 'time_sample_batch': 9.317398071289063e-05, 'time_algorithm_update': 0.00

2023-01-09 23:10.32 [info     ] DoubleDQN_online_20230109230913: epoch=235 step=235000 epoch=235 metrics={'time_inference': 0.00019912409782409668, 'time_environment_step': 1.2552261352539063e-05, 'time_step': 0.00025043082237243653, 'rollout_return': 159.5, 'time_sample_batch': 9.40084457397461e-05, 'time_algorithm_update': 0.002379703521728516, 'loss': 3.426342082023621, 'evaluation': 160.4} step=235000
2023-01-09 23:10.33 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_236000.pt
2023-01-09 23:10.33 [info     ] DoubleDQN_online_20230109230913: epoch=236 step=236000 epoch=236 metrics={'time_inference': 0.00020107221603393553, 'time_environment_step': 1.2827396392822266e-05, 'time_step': 0.0002573683261871338, 'rollout_return': 191.4, 'time_sample_batch': 0.00011534690856933594, 'time_algorithm_update': 0.002785992622375488, 'loss': 4.232034206390381, 'evaluation': 184.8} step=236000
2023-01-09 23:10.33 [info     ] Model parameters are saved 

2023-01-09 23:10.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_251000.pt
2023-01-09 23:10.40 [info     ] DoubleDQN_online_20230109230913: epoch=251 step=251000 epoch=251 metrics={'time_inference': 0.00018847203254699707, 'time_environment_step': 1.1876821517944335e-05, 'time_step': 0.00023885726928710938, 'rollout_return': 182.83333333333334, 'time_sample_batch': 8.747577667236328e-05, 'time_algorithm_update': 0.0024132728576660156, 'loss': 3.5724326610565185, 'evaluation': 167.5} step=251000
2023-01-09 23:10.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_252000.pt
2023-01-09 23:10.40 [info     ] DoubleDQN_online_20230109230913: epoch=252 step=252000 epoch=252 metrics={'time_inference': 0.00019219279289245606, 'time_environment_step': 1.310563087463379e-05, 'time_step': 0.0002434515953063965, 'time_sample_batch': 8.463859558105469e-05, 'time_algorithm_update': 0.0023661375045776365, 'loss':

2023-01-09 23:10.47 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_267000.pt
2023-01-09 23:10.47 [info     ] DoubleDQN_online_20230109230913: epoch=267 step=267000 epoch=267 metrics={'time_inference': 0.0001913173198699951, 'time_environment_step': 1.2041568756103516e-05, 'time_step': 0.0002439301013946533, 'rollout_return': 182.66666666666666, 'time_sample_batch': 8.726119995117188e-05, 'time_algorithm_update': 0.002610945701599121, 'loss': 3.626935029029846, 'evaluation': 192.9} step=267000
2023-01-09 23:10.47 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_268000.pt
2023-01-09 23:10.47 [info     ] DoubleDQN_online_20230109230913: epoch=268 step=268000 epoch=268 metrics={'time_inference': 0.00018691396713256837, 'time_environment_step': 1.1868953704833984e-05, 'time_step': 0.00023829913139343262, 'time_sample_batch': 8.220672607421875e-05, 'time_algorithm_update': 0.0025374889373779297, 'loss': 2

2023-01-09 23:10.55 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_283000.pt
2023-01-09 23:10.55 [info     ] DoubleDQN_online_20230109230913: epoch=283 step=283000 epoch=283 metrics={'time_inference': 0.00027306270599365234, 'time_environment_step': 1.2005329132080078e-05, 'time_step': 0.0003266756534576416, 'time_sample_batch': 8.921623229980468e-05, 'time_algorithm_update': 0.0027120828628540037, 'loss': 3.4831809401512146, 'rollout_return': 192.2, 'evaluation': 174.8} step=283000
2023-01-09 23:10.55 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_284000.pt
2023-01-09 23:10.55 [info     ] DoubleDQN_online_20230109230913: epoch=284 step=284000 epoch=284 metrics={'time_inference': 0.00018602848052978515, 'time_environment_step': 1.239943504333496e-05, 'time_step': 0.00023839998245239258, 'rollout_return': 200.0, 'time_sample_batch': 8.172988891601563e-05, 'time_algorithm_update': 0.0025806427001953

2023-01-09 23:11.02 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_299000.pt
2023-01-09 23:11.02 [info     ] DoubleDQN_online_20230109230913: epoch=299 step=299000 epoch=299 metrics={'time_inference': 0.00018516755104064941, 'time_environment_step': 1.2276411056518555e-05, 'time_step': 0.000241652250289917, 'time_sample_batch': 9.546279907226562e-05, 'time_algorithm_update': 0.003003716468811035, 'loss': 3.402239274978638, 'rollout_return': 200.0, 'evaluation': 167.7} step=299000
2023-01-09 23:11.02 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_300000.pt
2023-01-09 23:11.02 [info     ] DoubleDQN_online_20230109230913: epoch=300 step=300000 epoch=300 metrics={'time_inference': 0.00018754029273986816, 'time_environment_step': 1.2011289596557617e-05, 'time_step': 0.00024180912971496582, 'rollout_return': 190.2, 'time_sample_batch': 0.00010170936584472657, 'time_algorithm_update': 0.00277955532073974

2023-01-09 23:11.09 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_315000.pt
2023-01-09 23:11.09 [info     ] DoubleDQN_online_20230109230913: epoch=315 step=315000 epoch=315 metrics={'time_inference': 0.00018930745124816895, 'time_environment_step': 1.1802196502685547e-05, 'time_step': 0.00024474024772644045, 'rollout_return': 195.4, 'time_sample_batch': 8.854866027832031e-05, 'time_algorithm_update': 0.0029024839401245116, 'loss': 3.73120059967041, 'evaluation': 162.2} step=315000
2023-01-09 23:11.10 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_316000.pt
2023-01-09 23:11.10 [info     ] DoubleDQN_online_20230109230913: epoch=316 step=316000 epoch=316 metrics={'time_inference': 0.00021357154846191406, 'time_environment_step': 1.2887716293334961e-05, 'time_step': 0.0002751002311706543, 'rollout_return': 197.2, 'time_sample_batch': 0.00011587142944335938, 'time_algorithm_update': 0.0032070159912109

2023-01-09 23:11.17 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_331000.pt
2023-01-09 23:11.17 [info     ] DoubleDQN_online_20230109230913: epoch=331 step=331000 epoch=331 metrics={'time_inference': 0.00019534754753112792, 'time_environment_step': 1.1957406997680664e-05, 'time_step': 0.0002512869834899902, 'rollout_return': 192.66666666666666, 'time_sample_batch': 8.673667907714844e-05, 'time_algorithm_update': 0.002893805503845215, 'loss': 3.4000808238983153, 'evaluation': 167.3} step=331000
2023-01-09 23:11.18 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_332000.pt
2023-01-09 23:11.18 [info     ] DoubleDQN_online_20230109230913: epoch=332 step=332000 epoch=332 metrics={'time_inference': 0.00019392704963684082, 'time_environment_step': 1.2043476104736329e-05, 'time_step': 0.00024863886833190917, 'time_sample_batch': 9.274482727050781e-05, 'time_algorithm_update': 0.0027683258056640627, 'loss':

2023-01-09 23:11.25 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_347000.pt
2023-01-09 23:11.25 [info     ] DoubleDQN_online_20230109230913: epoch=347 step=347000 epoch=347 metrics={'time_inference': 0.00019114303588867187, 'time_environment_step': 1.1896371841430664e-05, 'time_step': 0.0002459051609039307, 'time_sample_batch': 8.661746978759766e-05, 'time_algorithm_update': 0.002805137634277344, 'loss': 3.4188203692436216, 'rollout_return': 200.0, 'evaluation': 185.1} step=347000
2023-01-09 23:11.25 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_348000.pt
2023-01-09 23:11.25 [info     ] DoubleDQN_online_20230109230913: epoch=348 step=348000 epoch=348 metrics={'time_inference': 0.0001942892074584961, 'time_environment_step': 1.1980056762695312e-05, 'time_step': 0.00025211143493652344, 'time_sample_batch': 9.105205535888671e-05, 'time_algorithm_update': 0.0030803680419921875, 'loss': 3.59109737873

2023-01-09 23:11.32 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_363000.pt
2023-01-09 23:11.32 [info     ] DoubleDQN_online_20230109230913: epoch=363 step=363000 epoch=363 metrics={'time_inference': 0.0001901564598083496, 'time_environment_step': 1.1922121047973633e-05, 'time_step': 0.0002477929592132568, 'rollout_return': 191.5, 'time_sample_batch': 8.563995361328125e-05, 'time_algorithm_update': 0.0030904293060302736, 'loss': 3.688403296470642, 'evaluation': 176.9} step=363000
2023-01-09 23:11.33 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_364000.pt
2023-01-09 23:11.33 [info     ] DoubleDQN_online_20230109230913: epoch=364 step=364000 epoch=364 metrics={'time_inference': 0.0001907320022583008, 'time_environment_step': 1.1891603469848633e-05, 'time_step': 0.0002459733486175537, 'time_sample_batch': 8.687973022460937e-05, 'time_algorithm_update': 0.002857351303100586, 'loss': 4.08044300079345

2023-01-09 23:11.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_379000.pt
2023-01-09 23:11.40 [info     ] DoubleDQN_online_20230109230913: epoch=379 step=379000 epoch=379 metrics={'time_inference': 0.00018828487396240235, 'time_environment_step': 1.2696027755737304e-05, 'time_step': 0.00024402427673339842, 'rollout_return': 188.83333333333334, 'time_sample_batch': 8.230209350585937e-05, 'time_algorithm_update': 0.0028436660766601564, 'loss': 3.684180569648743, 'evaluation': 155.8} step=379000
2023-01-09 23:11.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_380000.pt
2023-01-09 23:11.40 [info     ] DoubleDQN_online_20230109230913: epoch=380 step=380000 epoch=380 metrics={'time_inference': 0.00019233918190002442, 'time_environment_step': 1.1864423751831054e-05, 'time_step': 0.0002475900650024414, 'time_sample_batch': 8.208751678466797e-05, 'time_algorithm_update': 0.0028638124465942385, 'loss':

2023-01-09 23:11.47 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_395000.pt
2023-01-09 23:11.47 [info     ] DoubleDQN_online_20230109230913: epoch=395 step=395000 epoch=395 metrics={'time_inference': 0.00019293212890625, 'time_environment_step': 1.1927843093872071e-05, 'time_step': 0.0002492215633392334, 'time_sample_batch': 8.411407470703125e-05, 'time_algorithm_update': 0.002949428558349609, 'loss': 3.703619360923767, 'rollout_return': 194.2, 'evaluation': 175.7} step=395000
2023-01-09 23:11.48 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_396000.pt
2023-01-09 23:11.48 [info     ] DoubleDQN_online_20230109230913: epoch=396 step=396000 epoch=396 metrics={'time_inference': 0.00018899893760681154, 'time_environment_step': 1.1770009994506835e-05, 'time_step': 0.00024701833724975584, 'time_sample_batch': 0.00011937618255615234, 'time_algorithm_update': 0.0031238079071044924, 'loss': 4.1381116390228

2023-01-09 23:11.55 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_411000.pt
2023-01-09 23:11.55 [info     ] DoubleDQN_online_20230109230913: epoch=411 step=411000 epoch=411 metrics={'time_inference': 0.00018732523918151854, 'time_environment_step': 1.1630535125732421e-05, 'time_step': 0.00024382901191711425, 'time_sample_batch': 8.208751678466797e-05, 'time_algorithm_update': 0.003036952018737793, 'loss': 3.962173414230347, 'rollout_return': 200.0, 'evaluation': 200.0} step=411000
2023-01-09 23:11.55 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_412000.pt
2023-01-09 23:11.55 [info     ] DoubleDQN_online_20230109230913: epoch=412 step=412000 epoch=412 metrics={'time_inference': 0.00018888044357299804, 'time_environment_step': 1.1703968048095704e-05, 'time_step': 0.000244575023651123, 'time_sample_batch': 8.308887481689453e-05, 'time_algorithm_update': 0.002940464019775391, 'loss': 3.9484827280044

2023-01-09 23:12.03 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_427000.pt
2023-01-09 23:12.03 [info     ] DoubleDQN_online_20230109230913: epoch=427 step=427000 epoch=427 metrics={'time_inference': 0.00019323348999023438, 'time_environment_step': 1.2012243270874024e-05, 'time_step': 0.00025010919570922854, 'rollout_return': 191.5, 'time_sample_batch': 8.220672607421875e-05, 'time_algorithm_update': 0.002983880043029785, 'loss': 3.8573426842689513, 'evaluation': 171.3} step=427000
2023-01-09 23:12.03 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_428000.pt
2023-01-09 23:12.03 [info     ] DoubleDQN_online_20230109230913: epoch=428 step=428000 epoch=428 metrics={'time_inference': 0.00019185638427734374, 'time_environment_step': 1.1962413787841797e-05, 'time_step': 0.0002509903907775879, 'time_sample_batch': 8.58306884765625e-05, 'time_algorithm_update': 0.0032311201095581053, 'loss': 3.75198470354

2023-01-09 23:12.10 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_443000.pt
2023-01-09 23:12.10 [info     ] DoubleDQN_online_20230109230913: epoch=443 step=443000 epoch=443 metrics={'time_inference': 0.0001915743350982666, 'time_environment_step': 1.187443733215332e-05, 'time_step': 0.0002480847835540771, 'time_sample_batch': 8.373260498046875e-05, 'time_algorithm_update': 0.0029916763305664062, 'loss': 3.6077032804489138, 'rollout_return': 200.0, 'evaluation': 181.5} step=443000
2023-01-09 23:12.11 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_444000.pt
2023-01-09 23:12.11 [info     ] DoubleDQN_online_20230109230913: epoch=444 step=444000 epoch=444 metrics={'time_inference': 0.0001948530673980713, 'time_environment_step': 1.2052536010742187e-05, 'time_step': 0.000251856803894043, 'time_sample_batch': 8.060932159423829e-05, 'time_algorithm_update': 0.003008270263671875, 'loss': 3.617781853675842

2023-01-09 23:12.18 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_459000.pt
2023-01-09 23:12.18 [info     ] DoubleDQN_online_20230109230913: epoch=459 step=459000 epoch=459 metrics={'time_inference': 0.0001900634765625, 'time_environment_step': 1.1776924133300781e-05, 'time_step': 0.000247776985168457, 'time_sample_batch': 8.184909820556641e-05, 'time_algorithm_update': 0.0031184911727905273, 'loss': 3.396914482116699, 'rollout_return': 200.0, 'evaluation': 184.9} step=459000
2023-01-09 23:12.18 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_460000.pt
2023-01-09 23:12.18 [info     ] DoubleDQN_online_20230109230913: epoch=460 step=460000 epoch=460 metrics={'time_inference': 0.0001941385269165039, 'time_environment_step': 1.2105703353881836e-05, 'time_step': 0.0002535688877105713, 'time_sample_batch': 0.00010249614715576171, 'time_algorithm_update': 0.0032195568084716795, 'loss': 4.090750074386596,

2023-01-09 23:12.25 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_475000.pt
2023-01-09 23:12.25 [info     ] DoubleDQN_online_20230109230913: epoch=475 step=475000 epoch=475 metrics={'time_inference': 0.00018838000297546388, 'time_environment_step': 1.2235403060913087e-05, 'time_step': 0.0002468383312225342, 'rollout_return': 200.0, 'time_sample_batch': 7.967948913574218e-05, 'time_algorithm_update': 0.0031606674194335936, 'loss': 4.028665089607239, 'evaluation': 187.7} step=475000
2023-01-09 23:12.26 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_476000.pt
2023-01-09 23:12.26 [info     ] DoubleDQN_online_20230109230913: epoch=476 step=476000 epoch=476 metrics={'time_inference': 0.00019366526603698732, 'time_environment_step': 1.200413703918457e-05, 'time_step': 0.00025637030601501467, 'rollout_return': 199.2, 'time_sample_batch': 8.890628814697265e-05, 'time_algorithm_update': 0.00355820655822753

2023-01-09 23:12.33 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_491000.pt
2023-01-09 23:12.33 [info     ] DoubleDQN_online_20230109230913: epoch=491 step=491000 epoch=491 metrics={'time_inference': 0.00019516539573669433, 'time_environment_step': 1.209259033203125e-05, 'time_step': 0.0002536659240722656, 'rollout_return': 195.0, 'time_sample_batch': 7.941722869873047e-05, 'time_algorithm_update': 0.0031414031982421875, 'loss': 4.267070150375366, 'evaluation': 171.9} step=491000
2023-01-09 23:12.34 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_492000.pt
2023-01-09 23:12.34 [info     ] DoubleDQN_online_20230109230913: epoch=492 step=492000 epoch=492 metrics={'time_inference': 0.00018930768966674805, 'time_environment_step': 1.1748552322387696e-05, 'time_step': 0.00024656224250793455, 'rollout_return': 196.2, 'time_sample_batch': 8.027553558349609e-05, 'time_algorithm_update': 0.00308670997619628

2023-01-09 23:12.41 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_507000.pt
2023-01-09 23:12.41 [info     ] DoubleDQN_online_20230109230913: epoch=507 step=507000 epoch=507 metrics={'time_inference': 0.00018681836128234863, 'time_environment_step': 1.184535026550293e-05, 'time_step': 0.00024353289604187012, 'time_sample_batch': 7.913112640380859e-05, 'time_algorithm_update': 0.003061938285827637, 'loss': 4.600051283836365, 'rollout_return': 192.2, 'evaluation': 173.5} step=507000
2023-01-09 23:12.41 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_508000.pt
2023-01-09 23:12.41 [info     ] DoubleDQN_online_20230109230913: epoch=508 step=508000 epoch=508 metrics={'time_inference': 0.00020670366287231444, 'time_environment_step': 1.2447834014892578e-05, 'time_step': 0.0002697262763977051, 'rollout_return': 199.6, 'time_sample_batch': 0.00010392665863037109, 'time_algorithm_update': 0.00351974964141845

2023-01-09 23:12.49 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_523000.pt
2023-01-09 23:12.49 [info     ] DoubleDQN_online_20230109230913: epoch=523 step=523000 epoch=523 metrics={'time_inference': 0.00018725156784057617, 'time_environment_step': 1.1957406997680664e-05, 'time_step': 0.0002469542026519775, 'time_sample_batch': 8.463859558105469e-05, 'time_algorithm_update': 0.0033330440521240233, 'loss': 3.8862337708473205, 'rollout_return': 199.2, 'evaluation': 177.3} step=523000
2023-01-09 23:12.49 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_524000.pt
2023-01-09 23:12.49 [info     ] DoubleDQN_online_20230109230913: epoch=524 step=524000 epoch=524 metrics={'time_inference': 0.00018387985229492189, 'time_environment_step': 1.1666059494018554e-05, 'time_step': 0.00024370670318603514, 'time_sample_batch': 8.592605590820312e-05, 'time_algorithm_update': 0.003400254249572754, 'loss': 5.0496668815

2023-01-09 23:12.57 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_539000.pt
2023-01-09 23:12.57 [info     ] DoubleDQN_online_20230109230913: epoch=539 step=539000 epoch=539 metrics={'time_inference': 0.00018709182739257813, 'time_environment_step': 1.1814594268798829e-05, 'time_step': 0.0002461371421813965, 'rollout_return': 200.0, 'time_sample_batch': 8.52823257446289e-05, 'time_algorithm_update': 0.003292369842529297, 'loss': 4.32409520149231, 'evaluation': 195.4} step=539000
2023-01-09 23:12.57 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_540000.pt
2023-01-09 23:12.57 [info     ] DoubleDQN_online_20230109230913: epoch=540 step=540000 epoch=540 metrics={'time_inference': 0.00019235515594482423, 'time_environment_step': 1.2187004089355469e-05, 'time_step': 0.0002542393207550049, 'rollout_return': 198.6, 'time_sample_batch': 0.00010673999786376953, 'time_algorithm_update': 0.00346066951751709, 

2023-01-09 23:13.05 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_555000.pt
2023-01-09 23:13.05 [info     ] DoubleDQN_online_20230109230913: epoch=555 step=555000 epoch=555 metrics={'time_inference': 0.00019724273681640626, 'time_environment_step': 1.247572898864746e-05, 'time_step': 0.00025981855392456056, 'time_sample_batch': 0.00011332035064697266, 'time_algorithm_update': 0.003466963768005371, 'loss': 4.489846587181091, 'rollout_return': 200.0, 'evaluation': 175.2} step=555000
2023-01-09 23:13.06 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_556000.pt
2023-01-09 23:13.06 [info     ] DoubleDQN_online_20230109230913: epoch=556 step=556000 epoch=556 metrics={'time_inference': 0.00019542312622070312, 'time_environment_step': 1.2434959411621093e-05, 'time_step': 0.00025701045989990235, 'time_sample_batch': 8.83340835571289e-05, 'time_algorithm_update': 0.0034132480621337892, 'loss': 4.63117098808

2023-01-09 23:13.13 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_571000.pt
2023-01-09 23:13.13 [info     ] DoubleDQN_online_20230109230913: epoch=571 step=571000 epoch=571 metrics={'time_inference': 0.00020075917243957519, 'time_environment_step': 1.2500762939453126e-05, 'time_step': 0.00026476955413818357, 'time_sample_batch': 0.00010778903961181641, 'time_algorithm_update': 0.003611183166503906, 'loss': 3.251026678085327, 'rollout_return': 192.0, 'evaluation': 175.4} step=571000
2023-01-09 23:13.14 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_572000.pt
2023-01-09 23:13.14 [info     ] DoubleDQN_online_20230109230913: epoch=572 step=572000 epoch=572 metrics={'time_inference': 0.00019503283500671388, 'time_environment_step': 1.2183904647827149e-05, 'time_step': 0.0002568283081054687, 'rollout_return': 193.8, 'time_sample_batch': 8.893013000488281e-05, 'time_algorithm_update': 0.0034884452819824

2023-01-09 23:13.21 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_587000.pt
2023-01-09 23:13.21 [info     ] DoubleDQN_online_20230109230913: epoch=587 step=587000 epoch=587 metrics={'time_inference': 0.00020377373695373534, 'time_environment_step': 1.328277587890625e-05, 'time_step': 0.00026862430572509766, 'time_sample_batch': 9.491443634033204e-05, 'time_algorithm_update': 0.003611898422241211, 'loss': 3.810885453224182, 'rollout_return': 200.0, 'evaluation': 162.8} step=587000
2023-01-09 23:13.22 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_588000.pt
2023-01-09 23:13.22 [info     ] DoubleDQN_online_20230109230913: epoch=588 step=588000 epoch=588 metrics={'time_inference': 0.0002002434730529785, 'time_environment_step': 1.260972023010254e-05, 'time_step': 0.0002667481899261475, 'time_sample_batch': 0.00010805130004882812, 'time_algorithm_update': 0.003843951225280762, 'loss': 3.09583667516708

2023-01-09 23:13.29 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_603000.pt
2023-01-09 23:13.29 [info     ] DoubleDQN_online_20230109230913: epoch=603 step=603000 epoch=603 metrics={'time_inference': 0.00019875001907348634, 'time_environment_step': 1.242971420288086e-05, 'time_step': 0.0002598390579223633, 'time_sample_batch': 8.702278137207031e-05, 'time_algorithm_update': 0.0033569574356079102, 'loss': 4.9800379276275635, 'rollout_return': 199.4, 'evaluation': 174.1} step=603000
2023-01-09 23:13.30 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_604000.pt
2023-01-09 23:13.30 [info     ] DoubleDQN_online_20230109230913: epoch=604 step=604000 epoch=604 metrics={'time_inference': 0.00018802022933959962, 'time_environment_step': 1.1916160583496093e-05, 'time_step': 0.000246816873550415, 'time_sample_batch': 8.14199447631836e-05, 'time_algorithm_update': 0.003255891799926758, 'loss': 3.70633214712142

2023-01-09 23:13.37 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_619000.pt
2023-01-09 23:13.37 [info     ] DoubleDQN_online_20230109230913: epoch=619 step=619000 epoch=619 metrics={'time_inference': 0.00019280362129211426, 'time_environment_step': 1.210474967956543e-05, 'time_step': 0.00025340819358825686, 'time_sample_batch': 0.00010290145874023438, 'time_algorithm_update': 0.0033664464950561523, 'loss': 3.716341722011566, 'rollout_return': 200.0, 'evaluation': 190.0} step=619000
2023-01-09 23:13.37 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_620000.pt
2023-01-09 23:13.37 [info     ] DoubleDQN_online_20230109230913: epoch=620 step=620000 epoch=620 metrics={'time_inference': 0.00018434572219848633, 'time_environment_step': 1.1687278747558594e-05, 'time_step': 0.0002401111125946045, 'time_sample_batch': 8.034706115722656e-05, 'time_algorithm_update': 0.002999567985534668, 'loss': 3.37479920387

2023-01-09 23:13.45 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_635000.pt
2023-01-09 23:13.45 [info     ] DoubleDQN_online_20230109230913: epoch=635 step=635000 epoch=635 metrics={'time_inference': 0.00018472599983215333, 'time_environment_step': 1.164078712463379e-05, 'time_step': 0.00024209237098693847, 'rollout_return': 200.0, 'time_sample_batch': 8.096694946289063e-05, 'time_algorithm_update': 0.0031629562377929687, 'loss': 3.555555582046509, 'evaluation': 200.0} step=635000
2023-01-09 23:13.45 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_636000.pt
2023-01-09 23:13.45 [info     ] DoubleDQN_online_20230109230913: epoch=636 step=636000 epoch=636 metrics={'time_inference': 0.0001921234130859375, 'time_environment_step': 1.2013673782348634e-05, 'time_step': 0.0002485992908477783, 'rollout_return': 200.0, 'time_sample_batch': 8.544921875e-05, 'time_algorithm_update': 0.002982640266418457, 'los

2023-01-09 23:13.53 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_651000.pt
2023-01-09 23:13.53 [info     ] DoubleDQN_online_20230109230913: epoch=651 step=651000 epoch=651 metrics={'time_inference': 0.00019636321067810058, 'time_environment_step': 1.2403249740600587e-05, 'time_step': 0.00025554299354553224, 'rollout_return': 196.8, 'time_sample_batch': 8.511543273925781e-05, 'time_algorithm_update': 0.0031855106353759766, 'loss': 4.15745062828064, 'evaluation': 174.7} step=651000
2023-01-09 23:13.53 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_652000.pt
2023-01-09 23:13.53 [info     ] DoubleDQN_online_20230109230913: epoch=652 step=652000 epoch=652 metrics={'time_inference': 0.00019365477561950683, 'time_environment_step': 1.2201786041259765e-05, 'time_step': 0.00025255966186523435, 'rollout_return': 200.0, 'time_sample_batch': 9.152889251708984e-05, 'time_algorithm_update': 0.0031920909881591

2023-01-09 23:14.00 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_667000.pt
2023-01-09 23:14.00 [info     ] DoubleDQN_online_20230109230913: epoch=667 step=667000 epoch=667 metrics={'time_inference': 0.00019505596160888672, 'time_environment_step': 1.23291015625e-05, 'time_step': 0.0002541484832763672, 'time_sample_batch': 9.069442749023437e-05, 'time_algorithm_update': 0.0031743288040161134, 'loss': 4.320819568634033, 'rollout_return': 197.8, 'evaluation': 185.9} step=667000
2023-01-09 23:14.01 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_668000.pt
2023-01-09 23:14.01 [info     ] DoubleDQN_online_20230109230913: epoch=668 step=668000 epoch=668 metrics={'time_inference': 0.00018597030639648437, 'time_environment_step': 1.19781494140625e-05, 'time_step': 0.00024282121658325196, 'time_sample_batch': 8.039474487304687e-05, 'time_algorithm_update': 0.003041195869445801, 'loss': 3.6476507425308227, 

2023-01-09 23:14.08 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_683000.pt
2023-01-09 23:14.08 [info     ] DoubleDQN_online_20230109230913: epoch=683 step=683000 epoch=683 metrics={'time_inference': 0.0001859748363494873, 'time_environment_step': 1.1770963668823243e-05, 'time_step': 0.00024294185638427734, 'rollout_return': 200.0, 'time_sample_batch': 8.485317230224609e-05, 'time_algorithm_update': 0.0030904531478881834, 'loss': 3.998861861228943, 'evaluation': 197.7} step=683000
2023-01-09 23:14.09 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_684000.pt
2023-01-09 23:14.09 [info     ] DoubleDQN_online_20230109230913: epoch=684 step=684000 epoch=684 metrics={'time_inference': 0.0001911191940307617, 'time_environment_step': 1.2093067169189453e-05, 'time_step': 0.00024857258796691893, 'rollout_return': 200.0, 'time_sample_batch': 9.560585021972656e-05, 'time_algorithm_update': 0.00306339263916015

2023-01-09 23:14.16 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_699000.pt
2023-01-09 23:14.16 [info     ] DoubleDQN_online_20230109230913: epoch=699 step=699000 epoch=699 metrics={'time_inference': 0.00018819856643676757, 'time_environment_step': 1.1932134628295898e-05, 'time_step': 0.00025019288063049316, 'rollout_return': 200.0, 'time_sample_batch': 8.144378662109376e-05, 'time_algorithm_update': 0.003565049171447754, 'loss': 3.5758945465087892, 'evaluation': 196.0} step=699000
2023-01-09 23:14.16 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_700000.pt
2023-01-09 23:14.16 [info     ] DoubleDQN_online_20230109230913: epoch=700 step=700000 epoch=700 metrics={'time_inference': 0.00018660330772399904, 'time_environment_step': 1.1891603469848633e-05, 'time_step': 0.0002490837574005127, 'rollout_return': 200.0, 'time_sample_batch': 8.702278137207031e-05, 'time_algorithm_update': 0.0036197423934936

2023-01-09 23:14.24 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_715000.pt
2023-01-09 23:14.24 [info     ] DoubleDQN_online_20230109230913: epoch=715 step=715000 epoch=715 metrics={'time_inference': 0.00019022846221923828, 'time_environment_step': 1.2167930603027344e-05, 'time_step': 0.00025552654266357424, 'time_sample_batch': 9.450912475585937e-05, 'time_algorithm_update': 0.003825211524963379, 'loss': 4.178492736816406, 'rollout_return': 200.0, 'evaluation': 192.6} step=715000
2023-01-09 23:14.25 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_716000.pt
2023-01-09 23:14.25 [info     ] DoubleDQN_online_20230109230913: epoch=716 step=716000 epoch=716 metrics={'time_inference': 0.00019574260711669922, 'time_environment_step': 1.2516498565673828e-05, 'time_step': 0.0002616405487060547, 'time_sample_batch': 8.716583251953126e-05, 'time_algorithm_update': 0.0038186073303222655, 'loss': 5.07671861648

2023-01-09 23:14.32 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_731000.pt
2023-01-09 23:14.32 [info     ] DoubleDQN_online_20230109230913: epoch=731 step=731000 epoch=731 metrics={'time_inference': 0.00019234395027160645, 'time_environment_step': 1.2241601943969726e-05, 'time_step': 0.0002581336498260498, 'rollout_return': 200.0, 'time_sample_batch': 0.00010125637054443359, 'time_algorithm_update': 0.0038441896438598635, 'loss': 2.920514702796936, 'evaluation': 185.7} step=731000
2023-01-09 23:14.33 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_732000.pt
2023-01-09 23:14.33 [info     ] DoubleDQN_online_20230109230913: epoch=732 step=732000 epoch=732 metrics={'time_inference': 0.00019095325469970704, 'time_environment_step': 1.2035846710205078e-05, 'time_step': 0.0002565312385559082, 'rollout_return': 200.0, 'time_sample_batch': 0.0001071929931640625, 'time_algorithm_update': 0.0038845539093017

2023-01-09 23:14.40 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_747000.pt
2023-01-09 23:14.40 [info     ] DoubleDQN_online_20230109230913: epoch=747 step=747000 epoch=747 metrics={'time_inference': 0.00019070315361022948, 'time_environment_step': 1.2119293212890625e-05, 'time_step': 0.00025410747528076174, 'rollout_return': 200.0, 'time_sample_batch': 8.144378662109376e-05, 'time_algorithm_update': 0.003667187690734863, 'loss': 3.3488109350204467, 'evaluation': 194.6} step=747000
2023-01-09 23:14.41 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_748000.pt
2023-01-09 23:14.41 [info     ] DoubleDQN_online_20230109230913: epoch=748 step=748000 epoch=748 metrics={'time_inference': 0.00018902373313903808, 'time_environment_step': 1.1916637420654297e-05, 'time_step': 0.0002524435520172119, 'rollout_return': 200.0, 'time_sample_batch': 8.14199447631836e-05, 'time_algorithm_update': 0.00370538234710693

2023-01-09 23:14.48 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_763000.pt
2023-01-09 23:14.48 [info     ] DoubleDQN_online_20230109230913: epoch=763 step=763000 epoch=763 metrics={'time_inference': 0.00019078588485717775, 'time_environment_step': 1.195836067199707e-05, 'time_step': 0.000255979061126709, 'rollout_return': 200.0, 'time_sample_batch': 8.749961853027344e-05, 'time_algorithm_update': 0.0038667917251586914, 'loss': 3.7743481278419493, 'evaluation': 200.0} step=763000
2023-01-09 23:14.49 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_764000.pt
2023-01-09 23:14.49 [info     ] DoubleDQN_online_20230109230913: epoch=764 step=764000 epoch=764 metrics={'time_inference': 0.00018660831451416016, 'time_environment_step': 1.1763572692871095e-05, 'time_step': 0.0002507884502410889, 'rollout_return': 200.0, 'time_sample_batch': 9.329319000244141e-05, 'time_algorithm_update': 0.003763127326965332

2023-01-09 23:14.56 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_779000.pt
2023-01-09 23:14.56 [info     ] DoubleDQN_online_20230109230913: epoch=779 step=779000 epoch=779 metrics={'time_inference': 0.00018503332138061524, 'time_environment_step': 1.1540412902832032e-05, 'time_step': 0.00024687457084655764, 'time_sample_batch': 7.84158706665039e-05, 'time_algorithm_update': 0.0036208629608154297, 'loss': 4.422282767295838, 'rollout_return': 200.0, 'evaluation': 189.4} step=779000
2023-01-09 23:14.57 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_780000.pt
2023-01-09 23:14.57 [info     ] DoubleDQN_online_20230109230913: epoch=780 step=780000 epoch=780 metrics={'time_inference': 0.00018933510780334473, 'time_environment_step': 1.1774063110351563e-05, 'time_step': 0.0002525045871734619, 'time_sample_batch': 8.349418640136718e-05, 'time_algorithm_update': 0.003697848320007324, 'loss': 3.162087070941

2023-01-09 23:15.04 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_795000.pt
2023-01-09 23:15.04 [info     ] DoubleDQN_online_20230109230913: epoch=795 step=795000 epoch=795 metrics={'time_inference': 0.00018630528450012208, 'time_environment_step': 1.1661052703857421e-05, 'time_step': 0.00024711012840270994, 'time_sample_batch': 8.087158203125e-05, 'time_algorithm_update': 0.003497886657714844, 'loss': 4.03177580833435, 'rollout_return': 200.0, 'evaluation': 192.9} step=795000
2023-01-09 23:15.04 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_796000.pt
2023-01-09 23:15.04 [info     ] DoubleDQN_online_20230109230913: epoch=796 step=796000 epoch=796 metrics={'time_inference': 0.00018371868133544922, 'time_environment_step': 1.144862174987793e-05, 'time_step': 0.0002437429428100586, 'time_sample_batch': 7.987022399902344e-05, 'time_algorithm_update': 0.0034627437591552733, 'loss': 4.617037391662597,

2023-01-09 23:15.11 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_811000.pt
2023-01-09 23:15.11 [info     ] DoubleDQN_online_20230109230913: epoch=811 step=811000 epoch=811 metrics={'time_inference': 0.00018421363830566407, 'time_environment_step': 1.1577844619750977e-05, 'time_step': 0.0002444436550140381, 'time_sample_batch': 8.4686279296875e-05, 'time_algorithm_update': 0.0034598588943481447, 'loss': 3.4663763880729674, 'rollout_return': 200.0, 'evaluation': 190.4} step=811000
2023-01-09 23:15.12 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_812000.pt
2023-01-09 23:15.12 [info     ] DoubleDQN_online_20230109230913: epoch=812 step=812000 epoch=812 metrics={'time_inference': 0.00018073010444641114, 'time_environment_step': 1.1350393295288086e-05, 'time_step': 0.00024386167526245118, 'time_sample_batch': 9.489059448242188e-05, 'time_algorithm_update': 0.0037213802337646485, 'loss': 4.11899151802

2023-01-09 23:15.19 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_827000.pt
2023-01-09 23:15.19 [info     ] DoubleDQN_online_20230109230913: epoch=827 step=827000 epoch=827 metrics={'time_inference': 0.00017918705940246583, 'time_environment_step': 1.124882698059082e-05, 'time_step': 0.00023956775665283204, 'time_sample_batch': 8.273124694824219e-05, 'time_algorithm_update': 0.003540968894958496, 'loss': 3.0407947063446046, 'rollout_return': 200.0, 'evaluation': 192.4} step=827000
2023-01-09 23:15.19 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_828000.pt
2023-01-09 23:15.19 [info     ] DoubleDQN_online_20230109230913: epoch=828 step=828000 epoch=828 metrics={'time_inference': 0.0001870262622833252, 'time_environment_step': 1.17032527923584e-05, 'time_step': 0.00024732542037963866, 'time_sample_batch': 8.151531219482421e-05, 'time_algorithm_update': 0.0034420013427734373, 'loss': 3.8312332630157

2023-01-09 23:15.26 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_843000.pt
2023-01-09 23:15.26 [info     ] DoubleDQN_online_20230109230913: epoch=843 step=843000 epoch=843 metrics={'time_inference': 0.00018120217323303224, 'time_environment_step': 1.150345802307129e-05, 'time_step': 0.00023894691467285156, 'rollout_return': 200.0, 'time_sample_batch': 7.846355438232422e-05, 'time_algorithm_update': 0.0032375335693359377, 'loss': 3.6474560260772706, 'evaluation': 171.2} step=843000
2023-01-09 23:15.27 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_844000.pt
2023-01-09 23:15.27 [info     ] DoubleDQN_online_20230109230913: epoch=844 step=844000 epoch=844 metrics={'time_inference': 0.00017657184600830078, 'time_environment_step': 1.1107206344604493e-05, 'time_step': 0.0002365691661834717, 'rollout_return': 200.0, 'time_sample_batch': 8.237361907958984e-05, 'time_algorithm_update': 0.0035298824310302

2023-01-09 23:15.34 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_859000.pt
2023-01-09 23:15.34 [info     ] DoubleDQN_online_20230109230913: epoch=859 step=859000 epoch=859 metrics={'time_inference': 0.0001894681453704834, 'time_environment_step': 1.1969804763793945e-05, 'time_step': 0.0002502620220184326, 'rollout_return': 200.0, 'time_sample_batch': 0.0001007080078125, 'time_algorithm_update': 0.0033615827560424805, 'loss': 3.1263790488243104, 'evaluation': 197.5} step=859000
2023-01-09 23:15.34 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_860000.pt
2023-01-09 23:15.34 [info     ] DoubleDQN_online_20230109230913: epoch=860 step=860000 epoch=860 metrics={'time_inference': 0.00018137335777282715, 'time_environment_step': 1.1408090591430663e-05, 'time_step': 0.0002395923137664795, 'rollout_return': 200.0, 'time_sample_batch': 7.734298706054687e-05, 'time_algorithm_update': 0.003298020362854004, 

2023-01-09 23:15.42 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_875000.pt
2023-01-09 23:15.42 [info     ] DoubleDQN_online_20230109230913: epoch=875 step=875000 epoch=875 metrics={'time_inference': 0.00019992017745971679, 'time_environment_step': 1.267719268798828e-05, 'time_step': 0.00026531481742858885, 'rollout_return': 200.0, 'time_sample_batch': 8.847713470458985e-05, 'time_algorithm_update': 0.0036650657653808593, 'loss': 3.673934042453766, 'evaluation': 185.7} step=875000
2023-01-09 23:15.42 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_876000.pt
2023-01-09 23:15.42 [info     ] DoubleDQN_online_20230109230913: epoch=876 step=876000 epoch=876 metrics={'time_inference': 0.00019708585739135743, 'time_environment_step': 1.2467384338378907e-05, 'time_step': 0.0002585742473602295, 'rollout_return': 200.0, 'time_sample_batch': 8.33749771118164e-05, 'time_algorithm_update': 0.003389835357666015

2023-01-09 23:15.50 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_891000.pt
2023-01-09 23:15.50 [info     ] DoubleDQN_online_20230109230913: epoch=891 step=891000 epoch=891 metrics={'time_inference': 0.00019100618362426759, 'time_environment_step': 1.214003562927246e-05, 'time_step': 0.0002479531764984131, 'rollout_return': 200.0, 'time_sample_batch': 8.292198181152344e-05, 'time_algorithm_update': 0.0030246257781982424, 'loss': 2.9484012722969055, 'evaluation': 199.2} step=891000
2023-01-09 23:15.50 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_892000.pt
2023-01-09 23:15.50 [info     ] DoubleDQN_online_20230109230913: epoch=892 step=892000 epoch=892 metrics={'time_inference': 0.00018768930435180664, 'time_environment_step': 1.1968374252319335e-05, 'time_step': 0.000245408296585083, 'rollout_return': 200.0, 'time_sample_batch': 9.677410125732421e-05, 'time_algorithm_update': 0.00311431884765625,

2023-01-09 23:15.58 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_907000.pt
2023-01-09 23:15.58 [info     ] DoubleDQN_online_20230109230913: epoch=907 step=907000 epoch=907 metrics={'time_inference': 0.00019182157516479492, 'time_environment_step': 1.2255430221557617e-05, 'time_step': 0.0002483806610107422, 'rollout_return': 200.0, 'time_sample_batch': 9.167194366455078e-05, 'time_algorithm_update': 0.0029456615447998047, 'loss': 2.8476647973060607, 'evaluation': 198.8} step=907000
2023-01-09 23:15.58 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_908000.pt
2023-01-09 23:15.58 [info     ] DoubleDQN_online_20230109230913: epoch=908 step=908000 epoch=908 metrics={'time_inference': 0.00019393277168273926, 'time_environment_step': 1.2314319610595703e-05, 'time_step': 0.0002500438690185547, 'rollout_return': 200.0, 'time_sample_batch': 8.304119110107422e-05, 'time_algorithm_update': 0.0028992891311645

2023-01-09 23:16.06 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_923000.pt
2023-01-09 23:16.06 [info     ] DoubleDQN_online_20230109230913: epoch=923 step=923000 epoch=923 metrics={'time_inference': 0.00019301605224609375, 'time_environment_step': 1.2356042861938477e-05, 'time_step': 0.0002504916191101074, 'rollout_return': 200.0, 'time_sample_batch': 8.280277252197266e-05, 'time_algorithm_update': 0.003030991554260254, 'loss': 3.0850295066833495, 'evaluation': 200.0} step=923000
2023-01-09 23:16.06 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_924000.pt
2023-01-09 23:16.06 [info     ] DoubleDQN_online_20230109230913: epoch=924 step=924000 epoch=924 metrics={'time_inference': 0.00019269967079162597, 'time_environment_step': 1.2313127517700196e-05, 'time_step': 0.00025052499771118165, 'rollout_return': 200.0, 'time_sample_batch': 8.71896743774414e-05, 'time_algorithm_update': 0.00307366847991943

2023-01-09 23:16.14 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_939000.pt
2023-01-09 23:16.14 [info     ] DoubleDQN_online_20230109230913: epoch=939 step=939000 epoch=939 metrics={'time_inference': 0.0001855144500732422, 'time_environment_step': 1.186823844909668e-05, 'time_step': 0.00024166369438171386, 'rollout_return': 200.0, 'time_sample_batch': 8.230209350585937e-05, 'time_algorithm_update': 0.0030002355575561523, 'loss': 3.3416054964065554, 'evaluation': 196.1} step=939000
2023-01-09 23:16.14 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_940000.pt
2023-01-09 23:16.14 [info     ] DoubleDQN_online_20230109230913: epoch=940 step=940000 epoch=940 metrics={'time_inference': 0.00018558645248413085, 'time_environment_step': 1.1979818344116211e-05, 'time_step': 0.0002440354824066162, 'rollout_return': 200.0, 'time_sample_batch': 9.953975677490234e-05, 'time_algorithm_update': 0.00318193435668945

2023-01-09 23:16.22 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_955000.pt
2023-01-09 23:16.22 [info     ] DoubleDQN_online_20230109230913: epoch=955 step=955000 epoch=955 metrics={'time_inference': 0.00018744444847106933, 'time_environment_step': 1.2000322341918945e-05, 'time_step': 0.00024498820304870607, 'rollout_return': 200.0, 'time_sample_batch': 9.26971435546875e-05, 'time_algorithm_update': 0.003103470802307129, 'loss': 2.9173078775405883, 'evaluation': 193.8} step=955000
2023-01-09 23:16.22 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_956000.pt
2023-01-09 23:16.22 [info     ] DoubleDQN_online_20230109230913: epoch=956 step=956000 epoch=956 metrics={'time_inference': 0.00018494868278503418, 'time_environment_step': 1.1895179748535157e-05, 'time_step': 0.00024546337127685547, 'rollout_return': 200.0, 'time_sample_batch': 8.664131164550781e-05, 'time_algorithm_update': 0.0034299373626708

2023-01-09 23:16.30 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_971000.pt
2023-01-09 23:16.30 [info     ] DoubleDQN_online_20230109230913: epoch=971 step=971000 epoch=971 metrics={'time_inference': 0.00018904781341552734, 'time_environment_step': 1.2251853942871093e-05, 'time_step': 0.0002483832836151123, 'rollout_return': 200.0, 'time_sample_batch': 9.350776672363281e-05, 'time_algorithm_update': 0.0032314777374267576, 'loss': 3.2681543588638307, 'evaluation': 199.7} step=971000
2023-01-09 23:16.30 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_972000.pt
2023-01-09 23:16.30 [info     ] DoubleDQN_online_20230109230913: epoch=972 step=972000 epoch=972 metrics={'time_inference': 0.00018679213523864745, 'time_environment_step': 1.2262821197509765e-05, 'time_step': 0.0002442426681518555, 'rollout_return': 200.0, 'time_sample_batch': 8.265972137451172e-05, 'time_algorithm_update': 0.0030710935592651

2023-01-09 23:16.38 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_987000.pt
2023-01-09 23:16.38 [info     ] DoubleDQN_online_20230109230913: epoch=987 step=987000 epoch=987 metrics={'time_inference': 0.00019059658050537108, 'time_environment_step': 1.2299537658691406e-05, 'time_step': 0.00024830198287963866, 'rollout_return': 200.0, 'time_sample_batch': 8.296966552734375e-05, 'time_algorithm_update': 0.0030692338943481444, 'loss': 2.982913863658905, 'evaluation': 176.7} step=987000
2023-01-09 23:16.38 [info     ] Model parameters are saved to d3rlpy_logs/DoubleDQN_online_20230109230913/model_988000.pt
2023-01-09 23:16.38 [info     ] DoubleDQN_online_20230109230913: epoch=988 step=988000 epoch=988 metrics={'time_inference': 0.00020835351943969727, 'time_environment_step': 1.3186216354370118e-05, 'time_step': 0.000272294282913208, 'rollout_return': 200.0, 'time_sample_batch': 0.0001146554946899414, 'time_algorithm_update': 0.00347986221313476