# Solving RL problems with `ray[rllib]`

<img src="images/cartpole.jpg" width="500"></img>

## Step 1: Initialize `ray`

- `ray` is a Python package that provides distributed computing primitives. `rllib`, its reinforcement learning library, is built on top of `ray`.

In [1]:
import ray

ray.init()

{'node_ip_address': '192.168.0.98',
 'raylet_ip_address': '192.168.0.98',
 'redis_address': None,
 'object_store_address': '/tmp/ray/session_2022-12-23_16-35-03_047201_3487/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2022-12-23_16-35-03_047201_3487/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2022-12-23_16-35-03_047201_3487',
 'metrics_export_port': 65412,
 'gcs_address': '192.168.0.98:63541',
 'address': '192.168.0.98:63541',
 'node_id': 'c654f33cd9d7ab46008a4f0889879874b4cab53d76b0e12fecbd93fe'}

## Step 2: Run an **experiment** to solve RL problems

An experiment involves four things:
- An **RL environment** (e.g. `CartPole-v1`)
- An **RL algorithm** to learn in that environment (e.g. Proximal Policy Optimization (PPO))
- **Configuration** (algorithm config, experiment config, environment config, etc.)
- An **experiment runner** (called `tune`)

In [None]:
from ray import tune

tune.run(
    "PPO",
    config={
        "env": "CartPole-v1",
        # Other configuration goes here; defaults are used for anything not provided.
    },
)

[2m[36m(PPOTrainer pid=3658)[0m 2022-12-23 16:35:56,890	INFO trainer.py:2140 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
[2m[36m(PPOTrainer pid=3658)[0m 2022-12-23 16:35:56,890	INFO ppo.py:249 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
[2m[36m(PPOTrainer pid=3658)[0m 2022-12-23 16:35:56,891	INFO trainer.py:779 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.


Trial name,status,loc
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658








Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 4000
  custom_metrics: {}
  date: 2022-12-23_16-36-03
  done: false
  episode_len_mean: 22.666666666666668
  episode_media: {}
  episode_reward_max: 90.0
  episode_reward_mean: 22.666666666666668
  episode_reward_min: 9.0
  episodes_this_iter: 174
  episodes_total: 174
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.6677238941192627
          entropy_coeff: 0.0
          kl: 0.0263458751142025
          model: {}
          policy_loss: -0.031717997044324875
          total_loss: 222.7870330810547
          vf_explained_var: 0.019372833892703056
          vf_loss: 222.8134765625
    num_agent_steps_sampled: 4000
    num_agent_steps_trained: 4000
    num_steps_sampled: 4000
    num_steps_trained: 4000
    num_ste

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,1,3.32443,4000,22.6667,90,9,22.6667
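
The `learner_stats` in the first result block above can be cross-checked by hand. The sketch below assumes that the total loss is the policy loss plus the value-function loss (with an assumed weight of 1.0), plus the KL penalty `cur_kl_coeff * kl`, minus the entropy bonus `entropy_coeff * entropy`. That decomposition and the name `vf_loss_coeff` are assumptions about how the reported numbers combine, not something the log states; the numeric values are copied from the output.

```python
# Values copied from the first result block (agent_timesteps_total: 4000).
stats = {
    "cur_kl_coeff": 0.20000000298023224,
    "entropy": 0.6677238941192627,
    "entropy_coeff": 0.0,
    "kl": 0.0263458751142025,
    "policy_loss": -0.031717997044324875,
    "total_loss": 222.7870330810547,
    "vf_loss": 222.8134765625,
}

vf_loss_coeff = 1.0  # assumed weight on the value-function loss


def reconstruct_total_loss(s, vf_coeff):
    """Recombine the reported loss terms under the assumed decomposition."""
    return (
        s["policy_loss"]
        + vf_coeff * s["vf_loss"]
        + s["cur_kl_coeff"] * s["kl"]
        - s["entropy_coeff"] * s["entropy"]
    )


reconstructed = reconstruct_total_loss(stats, vf_loss_coeff)
difference = abs(reconstructed - stats["total_loss"])
print(f"reconstructed={reconstructed:.6f} reported={stats['total_loss']:.6f}")
```

Under these assumptions the reconstructed and reported values agree to well within 1e-3. The value-function term dominates the total, which is why `total_loss` and `vf_loss` track each other so closely throughout the log.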


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 12000
  custom_metrics: {}
  date: 2022-12-23_16-36-09
  done: false
  episode_len_mean: 67.2
  episode_media: {}
  episode_reward_max: 281.0
  episode_reward_mean: 67.2
  episode_reward_min: 10.0
  episodes_this_iter: 38
  episodes_total: 299
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5797845721244812
          entropy_coeff: 0.0
          kl: 0.010812313295900822
          model: {}
          policy_loss: -0.02183121256530285
          total_loss: 757.8624267578125
          vf_explained_var: 0.13413472473621368
          vf_loss: 757.8809814453125
    num_agent_steps_sampled: 12000
    num_agent_steps_trained: 12000
    num_steps_sampled: 12000
    num_steps_trained: 12000
    num_steps_trained_this_ite

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,3,9.39606,12000,67.2,281,10,67.2
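
The `cur_kl_coeff` values in the two result blocks above (0.2, then 0.3) are consistent with an adaptive KL penalty. The sketch below assumes a commonly used rule: multiply the coefficient by 1.5 when the measured KL exceeds twice a target, halve it when the KL falls below half the target, and leave it unchanged otherwise. The target of 0.01 and the function name `update_kl_coeff` are assumptions for illustration, not values read from the log.

```python
def update_kl_coeff(kl_coeff, sampled_kl, kl_target=0.01):
    """Return the next KL penalty coefficient under the assumed rule."""
    if sampled_kl > 2.0 * kl_target:
        # Policy moved too far: strengthen the penalty.
        return kl_coeff * 1.5
    if sampled_kl < 0.5 * kl_target:
        # Policy barely moved: relax the penalty.
        return kl_coeff * 0.5
    return kl_coeff


# First result block: cur_kl_coeff = 0.2, kl = 0.0263 (above 2 * 0.01).
after_first = update_kl_coeff(0.2, 0.0263458751142025)
# Second result block shown: cur_kl_coeff = 0.3, kl = 0.0108 (inside the band).
after_third = update_kl_coeff(0.3, 0.010812313295900822)
print(after_first, after_third)
```

Later result blocks report coefficients such as 0.15 and 0.075, which fits repeated halving while `kl` stays below half the assumed target.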


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 20000
  custom_metrics: {}
  date: 2022-12-23_16-36-15
  done: false
  episode_len_mean: 128.16
  episode_media: {}
  episode_reward_max: 456.0
  episode_reward_mean: 128.16
  episode_reward_min: 10.0
  episodes_this_iter: 16
  episodes_total: 334
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5438970327377319
          entropy_coeff: 0.0
          kl: 0.005624917335808277
          model: {}
          policy_loss: -0.010234670713543892
          total_loss: 693.485107421875
          vf_explained_var: 0.2024756669998169
          vf_loss: 693.4935302734375
    num_agent_steps_sampled: 20000
    num_agent_steps_trained: 20000
    num_steps_sampled: 20000
    num_steps_trained: 20000
    num_steps_trained_this_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,5,15.5475,20000,128.16,456,10,128.16


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,6,18.5533,24000,161.53,500,10,161.53


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 28000
  custom_metrics: {}
  date: 2022-12-23_16-36-21
  done: false
  episode_len_mean: 200.43
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 200.43
  episode_reward_min: 19.0
  episodes_this_iter: 10
  episodes_total: 356
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.15000000596046448
          cur_lr: 4.999999873689376e-05
          entropy: 0.5294512510299683
          entropy_coeff: 0.0
          kl: 0.00397925078868866
          model: {}
          policy_loss: -0.008955384604632854
          total_loss: 438.78582763671875
          vf_explained_var: 0.16676998138427734
          vf_loss: 438.7941589355469
    num_agent_steps_sampled: 28000
    num_agent_steps_trained: 28000
    num_steps_sampled: 28000
    num_steps_trained: 28000
    num_steps_trained_thi

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,8,24.6207,32000,230.13,500,19,230.13


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 36000
  custom_metrics: {}
  date: 2022-12-23_16-36-27
  done: false
  episode_len_mean: 265.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 265.0
  episode_reward_min: 19.0
  episodes_this_iter: 10
  episodes_total: 374
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.07500000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.487382709980011
          entropy_coeff: 0.0
          kl: 0.002882240107282996
          model: {}
          policy_loss: -0.0043022544123232365
          total_loss: 504.5002136230469
          vf_explained_var: 0.026964129880070686
          vf_loss: 504.5043029785156
    num_agent_steps_sampled: 36000
    num_agent_steps_trained: 36000
    num_steps_sampled: 36000
    num_steps_trained: 36000
    num_steps_trained_this

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,10,30.7445,40000,290.94,500,19,290.94


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 44000
  custom_metrics: {}
  date: 2022-12-23_16-36-34
  done: false
  episode_len_mean: 324.9
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 324.9
  episode_reward_min: 19.0
  episodes_this_iter: 9
  episodes_total: 391
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.01875000074505806
          cur_lr: 4.999999873689376e-05
          entropy: 0.5414067506790161
          entropy_coeff: 0.0
          kl: 0.004195119719952345
          model: {}
          policy_loss: -0.0025419823359698057
          total_loss: 468.4158935546875
          vf_explained_var: 0.08807938545942307
          vf_loss: 468.4183349609375
    num_agent_steps_sampled: 44000
    num_agent_steps_trained: 44000
    num_steps_sampled: 44000
    num_steps_trained: 44000
    num_steps_trained_this_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,11,33.7701,44000,324.9,500,19,324.9


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 52000
  custom_metrics: {}
  date: 2022-12-23_16-36-40
  done: false
  episode_len_mean: 377.26
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 377.26
  episode_reward_min: 56.0
  episodes_this_iter: 8
  episodes_total: 407
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.00937500037252903
          cur_lr: 4.999999873689376e-05
          entropy: 0.5010088682174683
          entropy_coeff: 0.0
          kl: 0.006605195347219706
          model: {}
          policy_loss: -0.00037172867450863123
          total_loss: 484.509765625
          vf_explained_var: 0.030472297221422195
          vf_loss: 484.51007080078125
    num_agent_steps_sampled: 52000
    num_agent_steps_trained: 52000
    num_steps_sampled: 52000
    num_steps_trained: 52000
    num_steps_trained_this

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,13,39.8612,52000,377.26,500,56,377.26


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 60000
  custom_metrics: {}
  date: 2022-12-23_16-36-46
  done: false
  episode_len_mean: 426.26
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 426.26
  episode_reward_min: 145.0
  episodes_this_iter: 8
  episodes_total: 423
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.00937500037252903
          cur_lr: 4.999999873689376e-05
          entropy: 0.5119425654411316
          entropy_coeff: 0.0
          kl: 0.002727864310145378
          model: {}
          policy_loss: -0.0022586756385862827
          total_loss: 531.2691650390625
          vf_explained_var: -0.018852492794394493
          vf_loss: 531.2714233398438
    num_agent_steps_sampled: 60000
    num_agent_steps_trained: 60000
    num_steps_sampled: 60000
    num_steps_trained: 60000
    num_steps_trained_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,15,45.9458,60000,426.26,500,145,426.26


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,16,49.0046,64000,444.84,500,152,444.84


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 68000
  custom_metrics: {}
  date: 2022-12-23_16-36-52
  done: false
  episode_len_mean: 458.9
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 458.9
  episode_reward_min: 85.0
  episodes_this_iter: 9
  episodes_total: 440
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0023437500931322575
          cur_lr: 4.999999873689376e-05
          entropy: 0.45621252059936523
          entropy_coeff: 0.0
          kl: 0.0029501868411898613
          model: {}
          policy_loss: -0.0014081724220886827
          total_loss: 563.0380249023438
          vf_explained_var: -0.06301168352365494
          vf_loss: 563.0394287109375
    num_agent_steps_sampled: 68000
    num_agent_steps_trained: 68000
    num_steps_sampled: 68000
    num_steps_trained: 68000
    num_steps_trained_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,18,55.0431,72000,472.96,500,85,472.96


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 76000
  custom_metrics: {}
  date: 2022-12-23_16-36-58
  done: false
  episode_len_mean: 478.36
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 478.36
  episode_reward_min: 85.0
  episodes_this_iter: 8
  episodes_total: 456
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0011718750465661287
          cur_lr: 4.999999873689376e-05
          entropy: 0.4239750802516937
          entropy_coeff: 0.0
          kl: 0.005886388476938009
          model: {}
          policy_loss: -0.0022193826735019684
          total_loss: 529.1740112304688
          vf_explained_var: -0.021180717274546623
          vf_loss: 529.1762084960938
    num_agent_steps_sampled: 76000
    num_agent_steps_trained: 76000
    num_steps_sampled: 76000
    num_steps_trained: 76000
    num_steps_trained

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,20,61.1105,80000,481.37,500,85,481.37


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 84000
  custom_metrics: {}
  date: 2022-12-23_16-37-04
  done: false
  episode_len_mean: 488.19
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 488.19
  episode_reward_min: 85.0
  episodes_this_iter: 8
  episodes_total: 472
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.3980620205402374
          entropy_coeff: 0.0
          kl: 0.0037860707379877567
          model: {}
          policy_loss: -0.0029902078676968813
          total_loss: 501.73126220703125
          vf_explained_var: 0.03084975853562355
          vf_loss: 501.73419189453125
    num_agent_steps_sampled: 84000
    num_agent_steps_trained: 84000
    num_steps_sampled: 84000
    num_steps_trained: 84000
    num_steps_traine

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,21,64.1397,84000,488.19,500,85,488.19


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 92000
  custom_metrics: {}
  date: 2022-12-23_16-37-10
  done: false
  episode_len_mean: 491.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 491.45
  episode_reward_min: 85.0
  episodes_this_iter: 8
  episodes_total: 488
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0001464843808207661
          cur_lr: 4.999999873689376e-05
          entropy: 0.39177200198173523
          entropy_coeff: 0.0
          kl: 0.006307192612439394
          model: {}
          policy_loss: -0.004075607750564814
          total_loss: 513.6663818359375
          vf_explained_var: 0.03715536370873451
          vf_loss: 513.67041015625
    num_agent_steps_sampled: 92000
    num_agent_steps_trained: 92000
    num_steps_sampled: 92000
    num_steps_trained: 92000
    num_steps_trained_thi

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,23,70.3336,92000,491.45,500,85,491.45


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 100000
  custom_metrics: {}
  date: 2022-12-23_16-37-16
  done: false
  episode_len_mean: 495.85
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 495.85
  episode_reward_min: 85.0
  episodes_this_iter: 8
  episodes_total: 504
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0001464843808207661
          cur_lr: 4.999999873689376e-05
          entropy: 0.38255375623703003
          entropy_coeff: 0.0
          kl: 0.004798873793333769
          model: {}
          policy_loss: -0.002857438288629055
          total_loss: 499.1186828613281
          vf_explained_var: -0.08942034095525742
          vf_loss: 499.1215515136719
    num_agent_steps_sampled: 100000
    num_agent_steps_trained: 100000
    num_steps_sampled: 100000
    num_steps_trained: 100000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,25,76.3871,100000,495.85,500,85,495.85


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,26,79.4227,104000,495.85,500,85,495.85


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 108000
  custom_metrics: {}
  date: 2022-12-23_16-37-22
  done: false
  episode_len_mean: 495.85
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 495.85
  episode_reward_min: 85.0
  episodes_this_iter: 8
  episodes_total: 520
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.662109520519152e-05
          cur_lr: 4.999999873689376e-05
          entropy: 0.3938816487789154
          entropy_coeff: 0.0
          kl: 0.0015072141541168094
          model: {}
          policy_loss: 0.00038365216460078955
          total_loss: 539.7266235351562
          vf_explained_var: -0.0835423469543457
          vf_loss: 539.7262573242188
    num_agent_steps_sampled: 108000
    num_agent_steps_trained: 108000
    num_steps_sampled: 108000
    num_steps_trained: 108000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,28,85.4967,112000,495.85,500,85,495.85


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 116000
  custom_metrics: {}
  date: 2022-12-23_16-37-29
  done: false
  episode_len_mean: 495.85
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 495.85
  episode_reward_min: 85.0
  episodes_this_iter: 8
  episodes_total: 536
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 9.15527380129788e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.38424333930015564
          entropy_coeff: 0.0
          kl: 0.006487591657787561
          model: {}
          policy_loss: -0.0018671861616894603
          total_loss: 507.647216796875
          vf_explained_var: -0.022645175457000732
          vf_loss: 507.6490478515625
    num_agent_steps_sampled: 116000
    num_agent_steps_trained: 116000
    num_steps_sampled: 116000
    num_steps_trained: 116000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,30,91.5883,120000,500,500,500,500
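
In the status row above the mean reward has reached 500, the maximum episode return in `CartPole-v1`, yet the trial keeps running because no stopping condition was given. A minimal sketch of adding one through the `stop` argument of `tune.run`, keyed on the `episode_reward_mean` metric that appears in the result blocks, follows. The threshold of 495, the iteration cap, and the helper name `run_until_solved` are illustrative choices, and `ray` is imported inside the function so that the settings can be inspected without launching a trial.

```python
# Stop as soon as either condition is met (thresholds are illustrative).
stop_criteria = {
    "episode_reward_mean": 495,  # metric name as it appears in the result blocks
    "training_iteration": 50,    # safety cap on the number of iterations
}

run_kwargs = {
    "config": {"env": "CartPole-v1"},
    "stop": stop_criteria,
}


def run_until_solved():
    """Launch the PPO trial with the stopping conditions above."""
    from ray import tune  # imported lazily; requires ray[rllib] to be installed

    return tune.run("PPO", **run_kwargs)


print(run_kwargs)
# Call run_until_solved() to start training.
```

With a reward threshold in place, the runner would end the trial shortly after the policy is solved instead of continuing to train.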


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 124000
  custom_metrics: {}
  date: 2022-12-23_16-37-35
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 552
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 9.15527380129788e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.4100276827812195
          entropy_coeff: 0.0
          kl: 0.002599272644147277
          model: {}
          policy_loss: -0.0005167347262613475
          total_loss: 534.5759887695312
          vf_explained_var: 0.02402246743440628
          vf_loss: 534.5764770507812
    num_agent_steps_sampled: 124000
    num_agent_steps_trained: 124000
    num_steps_sampled: 124000
    num_steps_trained: 124000
    num_steps_traine

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,31,94.6975,124000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 132000
  custom_metrics: {}
  date: 2022-12-23_16-37-41
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 568
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.28881845032447e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.4069482982158661
          entropy_coeff: 0.0
          kl: 0.005019642412662506
          model: {}
          policy_loss: 8.927865565055981e-05
          total_loss: 497.6102600097656
          vf_explained_var: -0.0994281992316246
          vf_loss: 497.6101379394531
    num_agent_steps_sampled: 132000
    num_agent_steps_trained: 132000
    num_steps_sampled: 132000
    num_steps_trained: 132000
    num_steps_trained

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,33,100.765,132000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 140000
  custom_metrics: {}
  date: 2022-12-23_16-37-47
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 584
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.144409225162235e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.3817431926727295
          entropy_coeff: 0.0
          kl: 0.0031131699215620756
          model: {}
          policy_loss: -0.0003612424770835787
          total_loss: 545.5170288085938
          vf_explained_var: -0.13232421875
          vf_loss: 545.5173950195312
    num_agent_steps_sampled: 140000
    num_agent_steps_trained: 140000
    num_steps_sampled: 140000
    num_steps_trained: 140000
    num_steps_trained_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,35,106.842,140000,500,500,500,500


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,36,109.881,144000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 148000
  custom_metrics: {}
  date: 2022-12-23_16-37-53
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 600
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.8610230629055877e-07
          cur_lr: 4.999999873689376e-05
          entropy: 0.3397999703884125
          entropy_coeff: 0.0
          kl: 0.002356970449909568
          model: {}
          policy_loss: 9.650844731368124e-05
          total_loss: 500.5468444824219
          vf_explained_var: -0.057527437806129456
          vf_loss: 500.54669189453125
    num_agent_steps_sampled: 148000
    num_agent_steps_trained: 148000
    num_steps_sampled: 148000
    num_steps_trained: 148000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,38,116.032,152000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 156000
  custom_metrics: {}
  date: 2022-12-23_16-37-59
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 616
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 7.152557657263969e-08
          cur_lr: 4.999999873689376e-05
          entropy: 0.3305110037326813
          entropy_coeff: 0.0
          kl: 0.003707656404003501
          model: {}
          policy_loss: -0.0004925053799524903
          total_loss: 542.630126953125
          vf_explained_var: -0.0788637027144432
          vf_loss: 542.630615234375
    num_agent_steps_sampled: 156000
    num_agent_steps_trained: 156000
    num_steps_sampled: 156000
    num_steps_trained: 156000
    num_steps_trained

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,40,122.091,160000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 164000
  custom_metrics: {}
  date: 2022-12-23_16-38-05
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 632
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.7881394143159923e-08
          cur_lr: 4.999999873689376e-05
          entropy: 0.31372615694999695
          entropy_coeff: 0.0
          kl: 0.0007560974918305874
          model: {}
          policy_loss: -0.001454669632948935
          total_loss: 440.6395263671875
          vf_explained_var: -0.10135575383901596
          vf_loss: 440.6409912109375
    num_agent_steps_sampled: 164000
    num_agent_steps_trained: 164000
    num_steps_sampled: 164000
    num_steps_trained: 164000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,41,125.106,164000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 172000
  custom_metrics: {}
  date: 2022-12-23_16-38-12
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 648
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 4.470348535789981e-09
          cur_lr: 4.999999873689376e-05
          entropy: 0.31648343801498413
          entropy_coeff: 0.0
          kl: 0.001370706595480442
          model: {}
          policy_loss: 0.002188962185755372
          total_loss: 590.8506469726562
          vf_explained_var: -0.33437660336494446
          vf_loss: 590.8485107421875
    num_agent_steps_sampled: 172000
    num_agent_steps_trained: 172000
    num_steps_sampled: 172000
    num_steps_trained: 172000
    num_steps_train

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,43,131.225,172000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 180000
  custom_metrics: {}
  date: 2022-12-23_16-38-18
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 664
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.2351742678949904e-09
          cur_lr: 4.999999873689376e-05
          entropy: 0.30818021297454834
          entropy_coeff: 0.0
          kl: 0.004482596181333065
          model: {}
          policy_loss: -0.0021901451982557774
          total_loss: 534.6130981445312
          vf_explained_var: -0.13855034112930298
          vf_loss: 534.6152954101562
    num_agent_steps_sampled: 180000
    num_agent_steps_trained: 180000
    num_steps_sampled: 180000
    num_steps_trained: 180000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,45,137.307,180000,500,500,500,500


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,46,140.364,184000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 188000
  custom_metrics: {}
  date: 2022-12-23_16-38-24
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 680
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.587935669737476e-10
          cur_lr: 4.999999873689376e-05
          entropy: 0.3042510747909546
          entropy_coeff: 0.0
          kl: 0.00245378608815372
          model: {}
          policy_loss: -0.001417657476849854
          total_loss: 499.1313781738281
          vf_explained_var: -0.06330925971269608
          vf_loss: 499.1327819824219
    num_agent_steps_sampled: 188000
    num_agent_steps_trained: 188000
    num_steps_sampled: 188000
    num_steps_trained: 188000
    num_steps_traine

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,48,146.509,192000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 196000
  custom_metrics: {}
  date: 2022-12-23_16-38-30
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 696
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.396983917434369e-10
          cur_lr: 4.999999873689376e-05
          entropy: 0.2733024060726166
          entropy_coeff: 0.0
          kl: 0.002296850783750415
          model: {}
          policy_loss: -0.0009159276378341019
          total_loss: 471.1797790527344
          vf_explained_var: 0.13280321657657623
          vf_loss: 471.18072509765625
    num_agent_steps_sampled: 196000
    num_agent_steps_trained: 196000
    num_steps_sampled: 196000
    num_steps_trained: 196000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,50,152.577,200000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 204000
  custom_metrics: {}
  date: 2022-12-23_16-38-36
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 712
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.4924597935859225e-11
          cur_lr: 4.999999873689376e-05
          entropy: 0.27746322751045227
          entropy_coeff: 0.0
          kl: 0.006151467561721802
          model: {}
          policy_loss: -0.0030704454984515905
          total_loss: 481.7795715332031
          vf_explained_var: -0.29841360449790955
          vf_loss: 481.78265380859375
    num_agent_steps_sampled: 204000
    num_agent_steps_trained: 204000
    num_steps_sampled: 204000
    num_steps_trained: 204000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,51,155.606,204000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 212000
  custom_metrics: {}
  date: 2022-12-23_16-38-42
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 728
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.7462298967929613e-11
          cur_lr: 4.999999873689376e-05
          entropy: 0.2986508011817932
          entropy_coeff: 0.0
          kl: 0.0013733146479353309
          model: {}
          policy_loss: 0.000683144957292825
          total_loss: 486.56427001953125
          vf_explained_var: -0.2059945911169052
          vf_loss: 486.5635986328125
    num_agent_steps_sampled: 212000
    num_agent_steps_trained: 212000
    num_steps_sampled: 212000
    num_steps_trained: 212000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,53,161.702,212000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 220000
  custom_metrics: {}
  date: 2022-12-23_16-38-48
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 744
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 4.365574741982403e-12
          cur_lr: 4.999999873689376e-05
          entropy: 0.2781405746936798
          entropy_coeff: 0.0
          kl: 0.003607149003073573
          model: {}
          policy_loss: -0.00031350748031400144
          total_loss: 477.109619140625
          vf_explained_var: -0.11387369781732559
          vf_loss: 477.1099548339844
    num_agent_steps_sampled: 220000
    num_agent_steps_trained: 220000
    num_steps_sampled: 220000
    num_steps_trained: 220000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,55,167.797,220000,500,500,500,500


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_de04c_00000,RUNNING,192.168.0.98:3658,56,170.827,224000,500,500,500,500


Result for PPO_CartPole-v1_de04c_00000:
  agent_timesteps_total: 228000
  custom_metrics: {}
  date: 2022-12-23_16-38-54
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 760
  experiment_id: 0edb2c66a3eb4d7484838f4f8775a744
  hostname: dl
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.1827873709912016e-12
          cur_lr: 4.999999873689376e-05
          entropy: 0.2417421042919159
          entropy_coeff: 0.0
          kl: 0.0024954811669886112
          model: {}
          policy_loss: 0.00023529196914751083
          total_loss: 499.0150451660156
          vf_explained_var: -0.09930148720741272
          vf_loss: 499.01483154296875
    num_agent_steps_sampled: 228000
    num_agent_steps_trained: 228000
    num_steps_sampled: 228000
    num_steps_trained: 228000
    num_steps_t

### Configuration

These configurations are applied in sequence, with each later layer overriding the ones before it:

1. [Common config](https://docs.ray.io/en/master/rllib-training.html#common-parameters)
2. [Algorithm specific config (overrides common config)](https://docs.ray.io/en/master/rllib-algorithms.html#ppo)
3. User defined config
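
The layering above can be pictured as successive dictionary updates. The following is a minimal sketch, not RLlib's actual merge code: `merge_configs` is a hypothetical helper, and the keys and values in the three dictionaries are illustrative placeholders rather than a complete or authoritative copy of the library defaults. It also merges only the top level, whereas real configurations may contain nested dictionaries.

```python
# Illustrative layering only: the three dicts stand in for the common config,
# the algorithm-specific config, and the user-defined config.
# The values are placeholders chosen for this example.
common_config = {"framework": "tf", "lr": 0.0001, "num_gpus": 0}
ppo_config = {"lr": 5e-05, "kl_coeff": 0.2}
user_config = {"env": "CartPole-v1", "num_gpus": 2}


def merge_configs(*layers):
    """Hypothetical helper: apply the layers in order, later keys win."""
    merged = {}
    for layer in layers:
        merged.update(layer)  # shallow update; nested dicts are not merged
    return merged


effective = merge_configs(common_config, ppo_config, user_config)
print(effective)
```

In this sketch the algorithm layer replaces the common `lr`, the user layer replaces `num_gpus`, and keys that appear in only one layer (`framework`, `kl_coeff`, `env`) pass through unchanged.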

### Anatomy of an experiment

<img src="images/ex/2.png" width="750"></img>

In [None]:
tune.run(
    "PPO",
    config={
        "env": "CartPole-v1",
        "evaluation_interval": 2,        # evaluate after every 2 training iterations
        "evaluation_num_episodes": 20,   # episodes to run in each evaluation
        "num_gpus": 2,                   # GPUs allocated to the trainer process
    },
)

[2m[36m(PPO pid=11117)[0m 2022-01-03 13:43:18,891	INFO trainer.py:722 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also want to then set `eager_tracing=True` in order to reach similar execution speed as with static-graph mode.
[2m[36m(PPO pid=11117)[0m 2022-01-03 13:43:18,891	INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
[2m[36m(PPO pid=11117)[0m 2022-01-03 13:43:18,891	INFO trainer.py:743 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.


Trial name,status,loc
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117




Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 4000
  custom_metrics: {}
  date: 2022-01-03_13-43-29
  done: false
  episode_len_mean: 22.429378531073446
  episode_media: {}
  episode_reward_max: 106.0
  episode_reward_mean: 22.429378531073446
  episode_reward_min: 8.0
  episodes_this_iter: 177
  episodes_total: 177
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.6658107042312622
          entropy_coeff: 0.0
          kl: 0.028826190158724785
          model: {}
          policy_loss: -0.04175363481044769
          total_loss: 234.28335571289062
          vf_explained_var: 0.026140272617340088
          vf_loss: 234.3193359375
    num_agent_steps_sampled: 4000
    num_agent_steps_trained: 4000
    num_steps_sampled: 4000
    num_steps_trained: 4000

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,1,6.7837,4000,22.4294,106,8,22.4294


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 8000
  custom_metrics: {}
  date: 2022-01-03_13-43-38
  done: false
  episode_len_mean: 43.34
  episode_media: {}
  episode_reward_max: 179.0
  episode_reward_mean: 43.34
  episode_reward_min: 9.0
  episodes_this_iter: 83
  episodes_total: 260
  evaluation:
    custom_metrics: {}
    episode_len_mean: 74.05
    episode_media: {}
    episode_reward_max: 234.0
    episode_reward_mean: 74.05
    episode_reward_min: 11.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 32
      - 118
      - 157
      - 50
      - 38
      - 46
      - 70
      - 18
      - 62
      - 11
      - 81
      - 18
      - 234
      - 24
      - 97
      - 171
      - 21
      - 34
      - 106
      - 93
      episode_reward:
      - 32.0
      - 118.0
      - 157.0
      - 50.0
      - 38.0
      - 46.0
      - 70.0
      - 18.0
      - 62.0
      - 11.0
      - 81.0
      - 18.0
      - 234.0
      - 24.0
      - 97.0
   

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,2,16.07,8000,43.34,179,9,43.34


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 12000
  custom_metrics: {}
  date: 2022-01-03_13-43-45
  done: false
  episode_len_mean: 66.58
  episode_media: {}
  episode_reward_max: 206.0
  episode_reward_mean: 66.58
  episode_reward_min: 12.0
  episodes_this_iter: 46
  episodes_total: 306
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5684974789619446
          entropy_coeff: 0.0
          kl: 0.010665263049304485
          model: {}
          policy_loss: -0.019676998257637024
          total_loss: 515.532958984375
          vf_explained_var: 0.09138352423906326
          vf_loss: 515.5494384765625
    num_agent_steps_sampled: 12000
    num_agent_steps_trained: 12000
    num_steps_sampled: 12000
    num_steps_trained: 12000
    num_steps_train

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,3,22.6103,12000,66.58,206,12,66.58


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 16000
  custom_metrics: {}
  date: 2022-01-03_13-44-02
  done: false
  episode_len_mean: 96.8
  episode_media: {}
  episode_reward_max: 360.0
  episode_reward_mean: 96.8
  episode_reward_min: 12.0
  episodes_this_iter: 22
  episodes_total: 328
  evaluation:
    custom_metrics: {}
    episode_len_mean: 293.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 293.0
    episode_reward_min: 91.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 91
      - 121
      - 221
      - 203
      - 271
      - 285
      - 325
      - 343
      - 275
      - 403
      - 385
      - 500
      - 268
      - 471
      - 299
      - 301
      - 307
      - 285
      - 238
      - 268
      episode_reward:
      - 91.0
      - 121.0
      - 221.0
      - 203.0
      - 271.0
      - 285.0
      - 325.0
      - 343.0
      - 275.0
      - 403.0
      - 385.0
      - 500.0
      - 268.0
      

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,4,39.7013,16000,96.8,360,12,96.8


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 20000
  custom_metrics: {}
  date: 2022-01-03_13-44-09
  done: false
  episode_len_mean: 131.82
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 131.82
  episode_reward_min: 14.0
  episodes_this_iter: 12
  episodes_total: 340
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.30000001192092896
          cur_lr: 4.999999873689376e-05
          entropy: 0.5397278070449829
          entropy_coeff: 0.0
          kl: 0.006242944393306971
          model: {}
          policy_loss: -0.01558975875377655
          total_loss: 694.2406616210938
          vf_explained_var: 0.2767491638660431
          vf_loss: 694.2542724609375
    num_agent_steps_sampled: 20000
    num_agent_steps_trained: 20000
    num_steps_sampled: 20000
    num_steps_trained: 20000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,5,46.359,20000,131.82,500,14,131.82


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 24000
  custom_metrics: {}
  date: 2022-01-03_13-44-28
  done: false
  episode_len_mean: 163.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 163.4
  episode_reward_min: 14.0
  episodes_this_iter: 10
  episodes_total: 350
  evaluation:
    custom_metrics: {}
    episode_len_mean: 365.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 365.55
    episode_reward_min: 133.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 406
      - 500
      - 281
      - 404
      - 400
      - 279
      - 271
      - 458
      - 441
      - 304
      - 440
      - 397
      - 296
      - 500
      - 133
      - 262
      - 414
      - 473
      - 270
      - 382
      episode_reward:
      - 406.0
      - 500.0
      - 281.0
      - 404.0
      - 400.0
      - 279.0
      - 271.0
      - 458.0
      - 441.0
      - 304.0
      - 440.0
      - 397.0
      - 296.0

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,6,65.3898,24000,163.4,500,14,163.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 28000
  custom_metrics: {}
  date: 2022-01-03_13-44-34
  done: false
  episode_len_mean: 193.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 193.4
  episode_reward_min: 14.0
  episodes_this_iter: 10
  episodes_total: 360
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.15000000596046448
          cur_lr: 4.999999873689376e-05
          entropy: 0.5351145267486572
          entropy_coeff: 0.0
          kl: 0.00829379539936781
          model: {}
          policy_loss: -0.011822070926427841
          total_loss: 362.6712341308594
          vf_explained_var: 0.2681409418582916
          vf_loss: 362.6817932128906
    num_agent_steps_sampled: 28000
    num_agent_steps_trained: 28000
    num_steps_sampled: 28000
    num_steps_trained: 28000
    num_steps_traine

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,7,71.9424,28000,193.4,500,14,193.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 32000
  custom_metrics: {}
  date: 2022-01-03_13-44-58
  done: false
  episode_len_mean: 225.28
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 225.28
  episode_reward_min: 14.0
  episodes_this_iter: 9
  episodes_total: 369
  evaluation:
    custom_metrics: {}
    episode_len_mean: 468.1
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 468.1
    episode_reward_min: 316.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 431
      - 443
      - 316
      - 343
      - 474
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 499
      - 500
      - 500
      - 356
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 431.0
      - 443.0
      - 316.0
      - 343.0
      - 474.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,8,95.2527,32000,225.28,500,14,225.28


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 36000
  custom_metrics: {}
  date: 2022-01-03_13-45-04
  done: false
  episode_len_mean: 260.1
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 260.1
  episode_reward_min: 19.0
  episodes_this_iter: 9
  episodes_total: 378
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.07500000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.48875856399536133
          entropy_coeff: 0.0
          kl: 0.005784942302852869
          model: {}
          policy_loss: -0.004727379884570837
          total_loss: 388.62091064453125
          vf_explained_var: 0.15254899859428406
          vf_loss: 388.6252136230469
    num_agent_steps_sampled: 36000
    num_agent_steps_trained: 36000
    num_steps_sampled: 36000
    num_steps_trained: 36000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,9,101.902,36000,260.1,500,19,260.1


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 40000
  custom_metrics: {}
  date: 2022-01-03_13-45-27
  done: false
  episode_len_mean: 294.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 294.48
  episode_reward_min: 19.0
  episodes_this_iter: 11
  episodes_total: 389
  evaluation:
    custom_metrics: {}
    episode_len_mean: 459.65
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 459.65
    episode_reward_min: 381.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 421
      - 500
      - 500
      - 500
      - 471
      - 500
      - 500
      - 456
      - 469
      - 458
      - 406
      - 396
      - 394
      - 449
      - 500
      - 381
      - 392
      - 500
      - 500
      - 500
      episode_reward:
      - 421.0
      - 500.0
      - 500.0
      - 500.0
      - 471.0
      - 500.0
      - 500.0
      - 456.0
      - 469.0
      - 458.0
      - 406.0
      - 396.0
      - 394

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,10,124.458,40000,294.48,500,19,294.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 44000
  custom_metrics: {}
  date: 2022-01-03_13-45-34
  done: false
  episode_len_mean: 324.49
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 324.49
  episode_reward_min: 27.0
  episodes_this_iter: 8
  episodes_total: 397
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.07500000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 0.5341718196868896
          entropy_coeff: 0.0
          kl: 0.008021753281354904
          model: {}
          policy_loss: -0.004115492105484009
          total_loss: 140.74203491210938
          vf_explained_var: 0.48715221881866455
          vf_loss: 140.74554443359375
    num_agent_steps_sampled: 44000
    num_agent_steps_trained: 44000
    num_steps_sampled: 44000
    num_steps_trained: 44000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,11,131.243,44000,324.49,500,27,324.49


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 48000
  custom_metrics: {}
  date: 2022-01-03_13-45-57
  done: false
  episode_len_mean: 357.11
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 357.11
  episode_reward_min: 13.0
  episodes_this_iter: 9
  episodes_total: 406
  evaluation:
    custom_metrics: {}
    episode_len_mean: 492.4
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 492.4
    episode_reward_min: 380.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 380
      - 500
      - 500
      - 468
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,12,154.502,48000,357.11,500,13,357.11


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 52000
  custom_metrics: {}
  date: 2022-01-03_13-46-04
  done: false
  episode_len_mean: 381.1
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 381.1
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 414
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.03750000149011612
          cur_lr: 4.999999873689376e-05
          entropy: 0.5165870785713196
          entropy_coeff: 0.0
          kl: 0.005959213245660067
          model: {}
          policy_loss: -0.0020698525477200747
          total_loss: 180.06182861328125
          vf_explained_var: 0.5193233489990234
          vf_loss: 180.0636749267578
    num_agent_steps_sampled: 52000
    num_agent_steps_trained: 52000
    num_steps_sampled: 52000
    num_steps_trained: 52000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,13,161.283,52000,381.1,500,13,381.1


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 56000
  custom_metrics: {}
  date: 2022-01-03_13-46-28
  done: false
  episode_len_mean: 406.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 406.44
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 422
  evaluation:
    custom_metrics: {}
    episode_len_mean: 495.85
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 495.85
    episode_reward_min: 417.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 417
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 417.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,14,185.714,56000,406.44,500,13,406.44


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 60000
  custom_metrics: {}
  date: 2022-01-03_13-46-35
  done: false
  episode_len_mean: 432.59
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 432.59
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 430
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.03750000149011612
          cur_lr: 4.999999873689376e-05
          entropy: 0.5273802876472473
          entropy_coeff: 0.0
          kl: 0.00733661325648427
          model: {}
          policy_loss: 0.0002818219072651118
          total_loss: 304.7541809082031
          vf_explained_var: 0.2888195216655731
          vf_loss: 304.7536315917969
    num_agent_steps_sampled: 60000
    num_agent_steps_trained: 60000
    num_steps_sampled: 60000
    num_steps_trained: 60000
    num_steps_train

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,15,192.548,60000,432.59,500,13,432.59


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 64000
  custom_metrics: {}
  date: 2022-01-03_13-46-56
  done: false
  episode_len_mean: 444.6
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 444.6
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 438
  evaluation:
    custom_metrics: {}
    episode_len_mean: 414.8
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 414.8
    episode_reward_min: 144.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 216
      - 417
      - 500
      - 500
      - 278
      - 500
      - 500
      - 386
      - 388
      - 379
      - 144
      - 367
      - 500
      - 500
      - 500
      - 500
      - 221
      - 500
      - 500
      - 500
      episode_reward:
      - 216.0
      - 417.0
      - 500.0
      - 500.0
      - 278.0
      - 500.0
      - 500.0
      - 386.0
      - 388.0
      - 379.0
      - 144.0
      - 367.0
      - 500.0
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,16,213.351,64000,444.6,500,13,444.6


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 68000
  custom_metrics: {}
  date: 2022-01-03_13-47-03
  done: false
  episode_len_mean: 453.71
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 453.71
  episode_reward_min: 13.0
  episodes_this_iter: 9
  episodes_total: 447
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.01875000074505806
          cur_lr: 4.999999873689376e-05
          entropy: 0.4969511032104492
          entropy_coeff: 0.0
          kl: 0.005400669761002064
          model: {}
          policy_loss: -0.011936291120946407
          total_loss: 270.889404296875
          vf_explained_var: 0.40124747157096863
          vf_loss: 270.9012451171875
    num_agent_steps_sampled: 68000
    num_agent_steps_trained: 68000
    num_steps_sampled: 68000
    num_steps_trained: 68000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,17,219.746,68000,453.71,500,13,453.71


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 72000
  custom_metrics: {}
  date: 2022-01-03_13-47-26
  done: false
  episode_len_mean: 462.61
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 462.61
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 455
  evaluation:
    custom_metrics: {}
    episode_len_mean: 493.9
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 493.9
    episode_reward_min: 378.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 378
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 378.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,18,243.262,72000,462.61,500,13,462.61


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 76000
  custom_metrics: {}
  date: 2022-01-03_13-47-32
  done: false
  episode_len_mean: 465.42
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 465.42
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 463
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.01875000074505806
          cur_lr: 4.999999873689376e-05
          entropy: 0.46219807863235474
          entropy_coeff: 0.0
          kl: 0.004400151781737804
          model: {}
          policy_loss: -0.002735947957262397
          total_loss: 140.81980895996094
          vf_explained_var: 0.56380695104599
          vf_loss: 140.82244873046875
    num_agent_steps_sampled: 76000
    num_agent_steps_trained: 76000
    num_steps_sampled: 76000
    num_steps_trained: 76000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,19,249.511,76000,465.42,500,13,465.42


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 80000
  custom_metrics: {}
  date: 2022-01-03_13-47-57
  done: false
  episode_len_mean: 468.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 468.27
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 471
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,20,273.924,80000,468.27,500,13,468.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 84000
  custom_metrics: {}
  date: 2022-01-03_13-48-03
  done: false
  episode_len_mean: 473.35
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 473.35
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 479
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.004687500186264515
          cur_lr: 4.999999873689376e-05
          entropy: 0.4432362914085388
          entropy_coeff: 0.0
          kl: 0.00505047058686614
          model: {}
          policy_loss: 0.0021547339856624603
          total_loss: 284.6266784667969
          vf_explained_var: 0.5598533153533936
          vf_loss: 284.6245422363281
    num_agent_steps_sampled: 84000
    num_agent_steps_trained: 84000
    num_steps_sampled: 84000
    num_steps_trained: 84000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,21,280.301,84000,473.35,500,13,473.35


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 88000
  custom_metrics: {}
  date: 2022-01-03_13-48-27
  done: false
  episode_len_mean: 481.06
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 481.06
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 487
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,22,303.856,88000,481.06,500,13,481.06


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 92000
  custom_metrics: {}
  date: 2022-01-03_13-48-33
  done: false
  episode_len_mean: 484.83
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 484.83
  episode_reward_min: 13.0
  episodes_this_iter: 8
  episodes_total: 495
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0023437500931322575
          cur_lr: 4.999999873689376e-05
          entropy: 0.4058391749858856
          entropy_coeff: 0.0
          kl: 0.0034382410813122988
          model: {}
          policy_loss: -0.0004056684556417167
          total_loss: 167.3172149658203
          vf_explained_var: 0.7687280774116516
          vf_loss: 167.31761169433594
    num_agent_steps_sampled: 92000
    num_agent_steps_trained: 92000
    num_steps_sampled: 92000
    num_steps_trained: 92000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,23,310.252,92000,484.83,500,13,484.83


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 96000
  custom_metrics: {}
  date: 2022-01-03_13-48-58
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 503
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,24,334.689,96000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 100000
  custom_metrics: {}
  date: 2022-01-03_13-49-04
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 511
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0005859375232830644
          cur_lr: 4.999999873689376e-05
          entropy: 0.3711017966270447
          entropy_coeff: 0.0
          kl: 0.00191035820171237
          model: {}
          policy_loss: 0.0005888226442039013
          total_loss: 270.68115234375
          vf_explained_var: 0.4633130133152008
          vf_loss: 270.6805419921875
    num_agent_steps_sampled: 100000
    num_agent_steps_trained: 100000
    num_steps_sampled: 100000
    num_steps_trained: 100000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,25,341.086,100000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 104000
  custom_metrics: {}
  date: 2022-01-03_13-49-27
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 519
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,26,364.052,104000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 108000
  custom_metrics: {}
  date: 2022-01-03_13-49-34
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 527
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.0001464843808207661
          cur_lr: 4.999999873689376e-05
          entropy: 0.36017468571662903
          entropy_coeff: 0.0
          kl: 0.0034243902191519737
          model: {}
          policy_loss: 0.0010374116245657206
          total_loss: 281.56884765625
          vf_explained_var: 0.37897738814353943
          vf_loss: 281.5677795410156
    num_agent_steps_sampled: 108000
    num_agent_steps_trained: 108000
    num_steps_sampled: 108000
    num_steps_trained: 108000
    num_ste

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,27,370.436,108000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 112000
  custom_metrics: {}
  date: 2022-01-03_13-49-57
  done: false
  episode_len_mean: 492.4
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.4
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 535
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,28,394.092,112000,492.4,500,136,492.4


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 116000
  custom_metrics: {}
  date: 2022-01-03_13-50-04
  done: false
  episode_len_mean: 492.87
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.87
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 543
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.662109520519152e-05
          cur_lr: 4.999999873689376e-05
          entropy: 0.3628605008125305
          entropy_coeff: 0.0
          kl: 0.0037267031148076057
          model: {}
          policy_loss: -0.0016175990458577871
          total_loss: 299.8031005859375
          vf_explained_var: 0.33579498529434204
          vf_loss: 299.80462646484375
    num_agent_steps_sampled: 116000
    num_agent_steps_trained: 116000
    num_steps_sampled: 116000
    num_steps_trained: 116000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,29,400.561,116000,492.87,500,136,492.87


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 120000
  custom_metrics: {}
  date: 2022-01-03_13-50-28
  done: false
  episode_len_mean: 494.29
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 494.29
  episode_reward_min: 136.0
  episodes_this_iter: 8
  episodes_total: 551
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,30,424.419,120000,494.29,500,136,494.29


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 124000
  custom_metrics: {}
  date: 2022-01-03_13-50-34
  done: false
  episode_len_mean: 499.09
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.09
  episode_reward_min: 409.0
  episodes_this_iter: 8
  episodes_total: 559
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 9.15527380129788e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.3726036250591278
          entropy_coeff: 0.0
          kl: 0.0030164276249706745
          model: {}
          policy_loss: 0.0016188398003578186
          total_loss: 363.93804931640625
          vf_explained_var: 0.2855881154537201
          vf_loss: 363.93646240234375
    num_agent_steps_sampled: 124000
    num_agent_steps_trained: 124000
    num_steps_sampled: 124000
    num_steps_trained: 124000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,31,430.737,124000,499.09,500,409,499.09


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 128000
  custom_metrics: {}
  date: 2022-01-03_13-50-58
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 567
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.6
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.6
    episode_reward_min: 452.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 452
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,32,454.403,128000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 132000
  custom_metrics: {}
  date: 2022-01-03_13-51-04
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 575
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.28881845032447e-06
          cur_lr: 4.999999873689376e-05
          entropy: 0.3741539716720581
          entropy_coeff: 0.0
          kl: 0.0032545658759772778
          model: {}
          policy_loss: -0.0036664640065282583
          total_loss: 258.59686279296875
          vf_explained_var: 0.35744550824165344
          vf_loss: 258.6005554199219
    num_agent_steps_sampled: 132000
    num_agent_steps_trained: 132000
    num_steps_sampled: 132000
    num_steps_trained: 132000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,33,460.956,132000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 136000
  custom_metrics: {}
  date: 2022-01-03_13-51-28
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 583
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,34,484.884,136000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 140000
  custom_metrics: {}
  date: 2022-01-03_13-51-35
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 591
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.722046125811175e-07
          cur_lr: 4.999999873689376e-05
          entropy: 0.3170822858810425
          entropy_coeff: 0.0
          kl: 0.0040565794333815575
          model: {}
          policy_loss: -0.0009620698401704431
          total_loss: 325.4109802246094
          vf_explained_var: 0.35483792424201965
          vf_loss: 325.4119567871094
    num_agent_steps_sampled: 140000
    num_agent_steps_trained: 140000
    num_steps_sampled: 140000
    num_steps_trained: 140000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,35,491.372,140000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 144000
  custom_metrics: {}
  date: 2022-01-03_13-51-59
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 599
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,36,515.601,144000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 148000
  custom_metrics: {}
  date: 2022-01-03_13-52-06
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 607
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.4305115314527939e-07
          cur_lr: 4.999999873689376e-05
          entropy: 0.3192770779132843
          entropy_coeff: 0.0
          kl: 0.0038329511880874634
          model: {}
          policy_loss: -0.0026558642275631428
          total_loss: 372.1199951171875
          vf_explained_var: 0.2135278582572937
          vf_loss: 372.1226501464844
    num_agent_steps_sampled: 148000
    num_agent_steps_trained: 148000
    num_steps_sampled: 148000
    num_steps_trained: 148000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,37,522.101,148000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 152000
  custom_metrics: {}
  date: 2022-01-03_13-52-30
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 615
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,38,546.256,152000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 156000
  custom_metrics: {}
  date: 2022-01-03_13-52-37
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 623
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.5762788286319847e-08
          cur_lr: 4.999999873689376e-05
          entropy: 0.33596229553222656
          entropy_coeff: 0.0
          kl: 0.003063222859054804
          model: {}
          policy_loss: -0.0007319195428863168
          total_loss: 502.2861328125
          vf_explained_var: -0.01001955009996891
          vf_loss: 502.2868347167969
    num_agent_steps_sampled: 156000
    num_agent_steps_trained: 156000
    num_steps_sampled: 156000
    num_steps_trained: 156000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,39,552.779,156000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 160000
  custom_metrics: {}
  date: 2022-01-03_13-53-00
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 631
  evaluation:
    custom_metrics: {}
    episode_len_mean: 490.95
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 490.95
    episode_reward_min: 319.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 319
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 319.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,40,576.486,160000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 164000
  custom_metrics: {}
  date: 2022-01-03_13-53-07
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 639
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.940697071579962e-09
          cur_lr: 4.999999873689376e-05
          entropy: 0.35207512974739075
          entropy_coeff: 0.0
          kl: 0.0028695587534457445
          model: {}
          policy_loss: -0.005121050402522087
          total_loss: 273.1157531738281
          vf_explained_var: 0.08673982322216034
          vf_loss: 273.1208801269531
    num_agent_steps_sampled: 164000
    num_agent_steps_trained: 164000
    num_steps_sampled: 164000
    num_steps_trained: 164000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,41,582.98,164000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 168000
  custom_metrics: {}
  date: 2022-01-03_13-53-30
  done: false
  episode_len_mean: 500.0
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 500.0
  episode_reward_min: 500.0
  episodes_this_iter: 8
  episodes_total: 647
  evaluation:
    custom_metrics: {}
    episode_len_mean: 471.9
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 471.9
    episode_reward_min: 242.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 254
      - 500
      - 500
      - 500
      - 500
      - 242
      - 473
      - 500
      - 500
      - 469
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 254.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 242.0
      - 473.0


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,42,606.409,168000,500,500,500,500


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 172000
  custom_metrics: {}
  date: 2022-01-03_13-53-37
  done: false
  episode_len_mean: 498.92
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.92
  episode_reward_min: 392.0
  episodes_this_iter: 9
  episodes_total: 656
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.2351742678949904e-09
          cur_lr: 4.999999873689376e-05
          entropy: 0.3735828697681427
          entropy_coeff: 0.0
          kl: 0.004065442830324173
          model: {}
          policy_loss: -0.0010430102702230215
          total_loss: 464.3570556640625
          vf_explained_var: 0.017288917675614357
          vf_loss: 464.3581237792969
    num_agent_steps_sampled: 172000
    num_agent_steps_trained: 172000
    num_steps_sampled: 172000
    num_steps_trained: 172000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,43,613.038,172000,498.92,500,392,498.92


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 176000
  custom_metrics: {}
  date: 2022-01-03_13-54-01
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 664
  evaluation:
    custom_metrics: {}
    episode_len_mean: 494.9
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 494.9
    episode_reward_min: 399.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 399
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 499
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 399.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,44,636.882,176000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 180000
  custom_metrics: {}
  date: 2022-01-03_13-54-07
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 672
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.587935669737476e-10
          cur_lr: 4.999999873689376e-05
          entropy: 0.3455185294151306
          entropy_coeff: 0.0
          kl: 0.0016453824937343597
          model: {}
          policy_loss: -0.001856086659245193
          total_loss: 263.1059265136719
          vf_explained_var: 0.15679922699928284
          vf_loss: 263.1077880859375
    num_agent_steps_sampled: 180000
    num_agent_steps_trained: 180000
    num_steps_sampled: 180000
    num_steps_trained: 180000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,45,643.397,180000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 184000
  custom_metrics: {}
  date: 2022-01-03_13-54-32
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 680
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,46,667.508,184000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 188000
  custom_metrics: {}
  date: 2022-01-03_13-54-38
  done: false
  episode_len_mean: 496.63
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.63
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 688
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.396983917434369e-10
          cur_lr: 4.999999873689376e-05
          entropy: 0.34701216220855713
          entropy_coeff: 0.0
          kl: 0.0024114095140248537
          model: {}
          policy_loss: 0.00015137766604311764
          total_loss: 370.8667297363281
          vf_explained_var: -0.011806507594883442
          vf_loss: 370.8665771484375
    num_agent_steps_sampled: 188000
    num_agent_steps_trained: 188000
    num_steps_sampled: 188000
    num_steps_trained: 188000
    

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,47,674.061,188000,496.63,500,376,496.63


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 192000
  custom_metrics: {}
  date: 2022-01-03_13-55-02
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 696
  evaluation:
    custom_metrics: {}
    episode_len_mean: 496.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 496.55
    episode_reward_min: 431.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 431
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 431.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,48,698.238,192000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 196000
  custom_metrics: {}
  date: 2022-01-03_13-55-09
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 704
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.4924597935859225e-11
          cur_lr: 4.999999873689376e-05
          entropy: 0.28589412569999695
          entropy_coeff: 0.0
          kl: 0.0026714280247688293
          model: {}
          policy_loss: -0.001123060705140233
          total_loss: 340.09228515625
          vf_explained_var: 0.23291447758674622
          vf_loss: 340.0933532714844
    num_agent_steps_sampled: 196000
    num_agent_steps_trained: 196000
    num_steps_sampled: 196000
    num_steps_trained: 196000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,49,704.719,196000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 200000
  custom_metrics: {}
  date: 2022-01-03_13-55-33
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 712
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,50,728.358,200000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 204000
  custom_metrics: {}
  date: 2022-01-03_13-55-39
  done: false
  episode_len_mean: 496.48
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.48
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 720
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.731149483964806e-12
          cur_lr: 4.999999873689376e-05
          entropy: 0.2651657164096832
          entropy_coeff: 0.0
          kl: 0.003235872834920883
          model: {}
          policy_loss: -0.00046385437599383295
          total_loss: 410.3289794921875
          vf_explained_var: 0.14666806161403656
          vf_loss: 410.32940673828125
    num_agent_steps_sampled: 204000
    num_agent_steps_trained: 204000
    num_steps_sampled: 204000
    num_steps_trained: 204000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,51,734.985,204000,496.48,500,376,496.48


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 208000
  custom_metrics: {}
  date: 2022-01-03_13-56-04
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 728
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,52,759.192,208000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 212000
  custom_metrics: {}
  date: 2022-01-03_13-56-10
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 736
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.1827873709912016e-12
          cur_lr: 4.999999873689376e-05
          entropy: 0.28173762559890747
          entropy_coeff: 0.0
          kl: 0.0032018960919231176
          model: {}
          policy_loss: -0.0011629423825070262
          total_loss: 304.3319091796875
          vf_explained_var: 0.37079381942749023
          vf_loss: 304.33306884765625
    num_agent_steps_sampled: 212000
    num_agent_steps_trained: 212000
    num_steps_sampled: 212000
    num_steps_trained: 212000
    

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,53,765.657,212000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 216000
  custom_metrics: {}
  date: 2022-01-03_13-56-34
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 744
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,54,789.91,216000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 220000
  custom_metrics: {}
  date: 2022-01-03_13-56-41
  done: false
  episode_len_mean: 496.27
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 496.27
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 752
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.456968427478004e-13
          cur_lr: 4.999999873689376e-05
          entropy: 0.3077785074710846
          entropy_coeff: 0.0
          kl: 0.001683449256233871
          model: {}
          policy_loss: -0.00023939917446114123
          total_loss: 297.7902526855469
          vf_explained_var: 0.4101290702819824
          vf_loss: 297.79052734375
    num_agent_steps_sampled: 220000
    num_agent_steps_trained: 220000
    num_steps_sampled: 220000
    num_steps_trained: 220000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,55,796.346,220000,496.27,500,376,496.27


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 224000
  custom_metrics: {}
  date: 2022-01-03_13-57-04
  done: false
  episode_len_mean: 497.99
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.99
  episode_reward_min: 376.0
  episodes_this_iter: 8
  episodes_total: 760
  evaluation:
    custom_metrics: {}
    episode_len_mean: 476.3
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 476.3
    episode_reward_min: 352.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 352
      - 500
      - 500
      - 500
      - 359
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 353
      - 500
      - 500
      - 500
      - 500
      - 500
      - 462
      episode_reward:
      - 500.0
      - 352.0
      - 500.0
      - 500.0
      - 500.0
      - 359.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,56,819.885,224000,497.99,500,376,497.99


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 228000
  custom_metrics: {}
  date: 2022-01-03_13-57-11
  done: false
  episode_len_mean: 498.75
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.75
  episode_reward_min: 459.0
  episodes_this_iter: 8
  episodes_total: 768
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.728484213739002e-13
          cur_lr: 4.999999873689376e-05
          entropy: 0.27726444602012634
          entropy_coeff: 0.0
          kl: 0.002490389160811901
          model: {}
          policy_loss: -0.0012705748667940497
          total_loss: 212.70944213867188
          vf_explained_var: 0.4870787262916565
          vf_loss: 212.71072387695312
    num_agent_steps_sampled: 228000
    num_agent_steps_trained: 228000
    num_steps_sampled: 228000
    num_steps_trained: 228000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,57,826.372,228000,498.75,500,459,498.75


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 232000
  custom_metrics: {}
  date: 2022-01-03_13-57-35
  done: false
  episode_len_mean: 497.89
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.89
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 776
  evaluation:
    custom_metrics: {}
    episode_len_mean: 496.05
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 496.05
    episode_reward_min: 447.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 447
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 484
      - 490
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 447.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,58,849.877,232000,497.89,500,446,497.89


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 236000
  custom_metrics: {}
  date: 2022-01-03_13-57-41
  done: false
  episode_len_mean: 497.62
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.62
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 784
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 6.821210534347505e-14
          cur_lr: 4.999999873689376e-05
          entropy: 0.2768877446651459
          entropy_coeff: 0.0
          kl: 0.002392576541751623
          model: {}
          policy_loss: -0.0008955386583693326
          total_loss: 288.79681396484375
          vf_explained_var: 0.2627524137496948
          vf_loss: 288.7977294921875
    num_agent_steps_sampled: 236000
    num_agent_steps_trained: 236000
    num_steps_sampled: 236000
    num_steps_trained: 236000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,59,856.478,236000,497.62,500,446,497.62


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 240000
  custom_metrics: {}
  date: 2022-01-03_13-58-05
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 792
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,60,880.211,240000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 244000
  custom_metrics: {}
  date: 2022-01-03_13-58-11
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 800
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.4106052671737525e-14
          cur_lr: 4.999999873689376e-05
          entropy: 0.27059662342071533
          entropy_coeff: 0.0
          kl: 0.0037121737841516733
          model: {}
          policy_loss: -0.0013116763439029455
          total_loss: 359.6751708984375
          vf_explained_var: 0.24808254837989807
          vf_loss: 359.6764831542969
    num_agent_steps_sampled: 244000
    num_agent_steps_trained: 244000
    num_steps_sampled: 244000
    num_steps_trained: 244000
    n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,61,886.737,244000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 248000
  custom_metrics: {}
  date: 2022-01-03_13-58-36
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 808
  evaluation:
    custom_metrics: {}
    episode_len_mean: 498.6
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 498.6
    episode_reward_min: 472.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 472
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 472.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,62,911.055,248000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 252000
  custom_metrics: {}
  date: 2022-01-03_13-58-42
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 816
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.526513167934381e-15
          cur_lr: 4.999999873689376e-05
          entropy: 0.29696959257125854
          entropy_coeff: 0.0
          kl: 0.003980488050729036
          model: {}
          policy_loss: -0.002883958863094449
          total_loss: 318.4349060058594
          vf_explained_var: 0.21819797158241272
          vf_loss: 318.43780517578125
    num_agent_steps_sampled: 252000
    num_agent_steps_trained: 252000
    num_steps_sampled: 252000
    num_steps_trained: 252000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,63,917.518,252000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 256000
  custom_metrics: {}
  date: 2022-01-03_13-59-06
  done: false
  episode_len_mean: 497.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.77
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 824
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,64,941.592,256000,497.77,500,446,497.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 260000
  custom_metrics: {}
  date: 2022-01-03_13-59-13
  done: false
  episode_len_mean: 497.98
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.98
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 832
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.1316282919835953e-15
          cur_lr: 4.999999873689376e-05
          entropy: 0.3001687526702881
          entropy_coeff: 0.0
          kl: 0.005532457958906889
          model: {}
          policy_loss: -0.0025179916992783546
          total_loss: 415.4772644042969
          vf_explained_var: 0.09152212738990784
          vf_loss: 415.4797668457031
    num_agent_steps_sampled: 260000
    num_agent_steps_trained: 260000
    num_steps_sampled: 260000
    num_steps_trained: 260000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,65,948.114,260000,497.98,500,446,497.98


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 264000
  custom_metrics: {}
  date: 2022-01-03_13-59-37
  done: false
  episode_len_mean: 497.98
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.98
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 840
  evaluation:
    custom_metrics: {}
    episode_len_mean: 494.65
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 494.65
    episode_reward_min: 399.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 494
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 399
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 494.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,66,972.208,264000,497.98,500,446,497.98


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 268000
  custom_metrics: {}
  date: 2022-01-03_13-59-44
  done: false
  episode_len_mean: 497.75
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 497.75
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 848
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.0658141459917976e-15
          cur_lr: 4.999999873689376e-05
          entropy: 0.3097001016139984
          entropy_coeff: 0.0
          kl: 0.002771975938230753
          model: {}
          policy_loss: -0.003007206367328763
          total_loss: 239.74151611328125
          vf_explained_var: 0.2675352692604065
          vf_loss: 239.74452209472656
    num_agent_steps_sampled: 268000
    num_agent_steps_trained: 268000
    num_steps_sampled: 268000
    num_steps_trained: 268000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,67,979.171,268000,497.75,500,446,497.75


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 272000
  custom_metrics: {}
  date: 2022-01-03_14-00-10
  done: false
  episode_len_mean: 498.16
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.16
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 856
  evaluation:
    custom_metrics: {}
    episode_len_mean: 499.1
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 499.1
    episode_reward_min: 484.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 484
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 498
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 484.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,68,1005.16,272000,498.16,500,446,498.16


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 276000
  custom_metrics: {}
  date: 2022-01-03_14-00-17
  done: false
  episode_len_mean: 498.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 498.45
  episode_reward_min: 446.0
  episodes_this_iter: 8
  episodes_total: 864
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.664535364979494e-16
          cur_lr: 4.999999873689376e-05
          entropy: 0.30517786741256714
          entropy_coeff: 0.0
          kl: 0.003519083373248577
          model: {}
          policy_loss: -0.003675314364954829
          total_loss: 180.7675018310547
          vf_explained_var: 0.4545365571975708
          vf_loss: 180.77117919921875
    num_agent_steps_sampled: 276000
    num_agent_steps_trained: 276000
    num_steps_sampled: 276000
    num_steps_trained: 276000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,69,1012,276000,498.45,500,446,498.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 280000
  custom_metrics: {}
  date: 2022-01-03_14-00-42
  done: false
  episode_len_mean: 499.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.45
  episode_reward_min: 473.0
  episodes_this_iter: 8
  episodes_total: 872
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,70,1036.44,280000,499.45,500,473,499.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 284000
  custom_metrics: {}
  date: 2022-01-03_14-00-48
  done: false
  episode_len_mean: 499.5
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.5
  episode_reward_min: 473.0
  episodes_this_iter: 8
  episodes_total: 880
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 6.661338412448735e-17
          cur_lr: 4.999999873689376e-05
          entropy: 0.33786264061927795
          entropy_coeff: 0.0
          kl: 0.0016626294236630201
          model: {}
          policy_loss: -0.0017250650562345982
          total_loss: 257.88360595703125
          vf_explained_var: 0.2730449438095093
          vf_loss: 257.88531494140625
    num_agent_steps_sampled: 284000
    num_agent_steps_trained: 284000
    num_steps_sampled: 284000
    num_steps_trained: 284000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,71,1043.02,284000,499.5,500,473,499.5


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 288000
  custom_metrics: {}
  date: 2022-01-03_14-01-13
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 888
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.55
    episode_reward_min: 451.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 451
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,72,1067.28,288000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 292000
  custom_metrics: {}
  date: 2022-01-03_14-01-20
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 896
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.6653346031121838e-17
          cur_lr: 4.999999873689376e-05
          entropy: 0.3299762010574341
          entropy_coeff: 0.0
          kl: 0.003860413795337081
          model: {}
          policy_loss: -0.0030521831940859556
          total_loss: 199.1850128173828
          vf_explained_var: 0.3740776479244232
          vf_loss: 199.1880645751953
    num_agent_steps_sampled: 292000
    num_agent_steps_trained: 292000
    num_steps_sampled: 292000
    num_steps_trained: 292000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,73,1074.24,292000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 296000
  custom_metrics: {}
  date: 2022-01-03_14-01-46
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 904
  evaluation:
    custom_metrics: {}
    episode_len_mean: 500.0
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 500.0
    episode_reward_min: 500.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,74,1100.56,296000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 300000
  custom_metrics: {}
  date: 2022-01-03_14-01-53
  done: false
  episode_len_mean: 499.77
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.77
  episode_reward_min: 477.0
  episodes_this_iter: 8
  episodes_total: 912
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 8.326673015560919e-18
          cur_lr: 4.999999873689376e-05
          entropy: 0.30977368354797363
          entropy_coeff: 0.0
          kl: 0.003388606710359454
          model: {}
          policy_loss: -0.001255176728591323
          total_loss: 306.4930725097656
          vf_explained_var: 0.17501209676265717
          vf_loss: 306.4943542480469
    num_agent_steps_sampled: 300000
    num_agent_steps_trained: 300000
    num_steps_sampled: 300000
    num_steps_trained: 300000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,75,1107.37,300000,499.77,500,477,499.77


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 304000
  custom_metrics: {}
  date: 2022-01-03_14-02-17
  done: false
  episode_len_mean: 499.08
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.08
  episode_reward_min: 456.0
  episodes_this_iter: 9
  episodes_total: 921
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.8
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.8
    episode_reward_min: 479.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 497
      - 500
      - 500
      - 500
      - 480
      - 479
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 497.0
      - 500.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,76,1131.99,304000,499.08,500,456,499.08


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 308000
  custom_metrics: {}
  date: 2022-01-03_14-02-24
  done: false
  episode_len_mean: 499.08
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 499.08
  episode_reward_min: 456.0
  episodes_this_iter: 8
  episodes_total: 929
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 2.0816682538902298e-18
          cur_lr: 4.999999873689376e-05
          entropy: 0.2569127678871155
          entropy_coeff: 0.0
          kl: 0.003834979608654976
          model: {}
          policy_loss: -0.0022537545301020145
          total_loss: 184.89852905273438
          vf_explained_var: 0.5317127108573914
          vf_loss: 184.90077209472656
    num_agent_steps_sampled: 308000
    num_agent_steps_trained: 308000
    num_steps_sampled: 308000
    num_steps_trained: 308000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,77,1138.27,308000,499.08,500,456,499.08


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 312000
  custom_metrics: {}
  date: 2022-01-03_14-02-48
  done: false
  episode_len_mean: 494.06
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 494.06
  episode_reward_min: 359.0
  episodes_this_iter: 9
  episodes_total: 938
  evaluation:
    custom_metrics: {}
    episode_len_mean: 484.45
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 484.45
    episode_reward_min: 407.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 477
      - 491
      - 500
      - 500
      - 468
      - 440
      - 500
      - 500
      - 421
      - 500
      - 500
      - 407
      - 485
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 477.0
      - 491.0
      - 500.0
      - 500.0
      - 468.0
      - 440.0
      - 500.0
      - 500.0
      - 42

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,78,1162.5,312000,494.06,500,359,494.06


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 316000
  custom_metrics: {}
  date: 2022-01-03_14-02-55
  done: false
  episode_len_mean: 492.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.45
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 946
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 5.204170634725574e-19
          cur_lr: 4.999999873689376e-05
          entropy: 0.2483358383178711
          entropy_coeff: 0.0
          kl: 0.003815557574853301
          model: {}
          policy_loss: -0.00015161468763835728
          total_loss: 176.90817260742188
          vf_explained_var: 0.5207358002662659
          vf_loss: 176.90834045410156
    num_agent_steps_sampled: 316000
    num_agent_steps_trained: 316000
    num_steps_sampled: 316000
    num_steps_trained: 316000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,79,1169.2,316000,492.45,500,359,492.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 320000
  custom_metrics: {}
  date: 2022-01-03_14-03-20
  done: false
  episode_len_mean: 492.45
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.45
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 954
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.55
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.55
    episode_reward_min: 451.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 451
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,80,1194.27,320000,492.45,500,359,492.45


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 324000
  custom_metrics: {}
  date: 2022-01-03_14-03-27
  done: false
  episode_len_mean: 492.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.44
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 962
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 1.3010426586813936e-19
          cur_lr: 4.999999873689376e-05
          entropy: 0.26068851351737976
          entropy_coeff: 0.0
          kl: 0.004601421300321817
          model: {}
          policy_loss: -0.00252857175655663
          total_loss: 327.0229797363281
          vf_explained_var: 0.399458646774292
          vf_loss: 327.0255126953125
    num_agent_steps_sampled: 324000
    num_agent_steps_trained: 324000
    num_steps_sampled: 324000
    num_steps_trained: 324000
    num_st

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,81,1200.94,324000,492.44,500,359,492.44


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 328000
  custom_metrics: {}
  date: 2022-01-03_14-03-51
  done: false
  episode_len_mean: 492.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.44
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 970
  evaluation:
    custom_metrics: {}
    episode_len_mean: 497.75
    episode_media: {}
    episode_reward_max: 500.0
    episode_reward_mean: 497.75
    episode_reward_min: 455.0
    episodes_this_iter: 20
    hist_stats:
      episode_lengths:
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 455
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      - 500
      episode_reward:
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 500.0
      - 455.0
      - 500.0
      - 500.0
      - 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,82,1225.19,328000,492.44,500,359,492.44


Result for PPO_CartPole-v1_b92be_00000:
  agent_timesteps_total: 332000
  custom_metrics: {}
  date: 2022-01-03_14-03-58
  done: false
  episode_len_mean: 492.44
  episode_media: {}
  episode_reward_max: 500.0
  episode_reward_mean: 492.44
  episode_reward_min: 359.0
  episodes_this_iter: 8
  episodes_total: 978
  experiment_id: b4dc5ffd0f634bb3b56cec8b19d70331
  hostname: devbox-x299
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 3.252606646703484e-20
          cur_lr: 4.999999873689376e-05
          entropy: 0.2761077880859375
          entropy_coeff: 0.0
          kl: 0.002532893093302846
          model: {}
          policy_loss: -0.0015050852671265602
          total_loss: 263.3819885253906
          vf_explained_var: 0.48334449529647827
          vf_loss: 263.3835144042969
    num_agent_steps_sampled: 332000
    num_agent_steps_trained: 332000
    num_steps_sampled: 332000
    num_steps_trained: 332000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_CartPole-v1_b92be_00000,RUNNING,192.168.0.90:11117,83,1231.98,332000,492.44,500,359,492.44

