# Teach the robot in the `BipedalWalker-v3` environment how to walk using `rllib`'s PPO implementation

Finally, the time has come to teach the robot how to walk. 

Are you ready? Let's go!

Before starting this exercise, check your `ray` version with the following code:

In [1]:
import ray
ray.__version__

'1.11.0'

If the version is greater than `1.11.0`, please downgrade the package to `1.11.0` first.

You can do that by running the following commands in the terminal.

```
pip uninstall ray
pip install "ray[rllib]==1.11.0"
```

The downgrade is needed because version `1.12.0` has a bug that prevents `BipedalWalker-v3` from learning.
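The version check can also be done programmatically. The following is a minimal sketch, not part of the original exercise: `parse_version` and `needs_downgrade` are hypothetical helper names, and the comparison uses integer tuples because a plain string comparison would wrongly rank `1.9.2` above `1.11.0`.

```python
PINNED = "1.11.0"  # the version this exercise expects


def parse_version(text):
    """Turn '1.11.0' into (1, 11, 0); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def needs_downgrade(installed, pinned=PINNED):
    """Return True when the installed version is newer than the pinned one."""
    return parse_version(installed) > parse_version(pinned)


print(needs_downgrade("1.12.0"))  # True
print(needs_downgrade("1.11.0"))  # False
```

In the notebook, you would call `needs_downgrade(ray.__version__)` after `import ray`.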

Next, import and initialize `ray`.

In [2]:
# Import and initialize ray in this cell
ray.init()

{'node_ip_address': '192.168.0.98',
 'raylet_ip_address': '192.168.0.98',
 'redis_address': None,
 'object_store_address': '/tmp/ray/session_2022-12-23_17-24-36_251314_158663/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2022-12-23_17-24-36_251314_158663/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2022-12-23_17-24-36_251314_158663',
 'metrics_export_port': 64706,
 'gcs_address': '192.168.0.98:60517',
 'address': '192.168.0.98:60517',
 'node_id': '2179fa9e7c608b3605f79664922df67944fa2aeabd883093aa3cc4df'}

To teach the robot how to walk, run an experiment (using `ray`'s experiment runner) with the following configuration:

- Set the algorithm to "PPO"
- Set the environment to "BipedalWalker-v3"

Those settings are required. In addition, set the following optional configuration:

- Set the number of training iterations between evaluations to 100
- Set the number of episodes used for each evaluation to 100

**WARNING: The `BipedalWalker-v3` environment is much harder than `CartPole-v1`. The robot needs many timesteps (about 5 million) to reach acceptable walking performance (a cumulative reward of about 250 per episode). With TensorFlow on a CPU, you may have to keep your computer running for about 4 hours.**
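To get a feel for those numbers, here is a rough sketch of the training budget. It assumes 4000 timesteps per training iteration, which is what the training log in this notebook reports; the variable names are illustrative. The `stop` dictionary at the end is an optional extra, not part of the exercise: `tune.run` accepts a `stop` argument and ends the trial when any one of the listed result fields reaches its threshold.

```python
TIMESTEPS_PER_ITER = 4000      # assumption: matches the log output of this run
TARGET_TIMESTEPS = 5_000_000   # rough budget from the warning above
TARGET_REWARD = 250            # acceptable mean cumulative reward per episode
EVAL_INTERVAL = 100            # training iterations between evaluations

# Number of training iterations needed to consume the timestep budget
iterations = TARGET_TIMESTEPS // TIMESTEPS_PER_ITER

# Number of evaluation rounds that happen within that budget
evaluations = iterations // EVAL_INTERVAL

print(iterations)   # 1250
print(evaluations)  # 12

# Optional stopping criteria; the trial stops when either threshold is reached.
stop = {
    "timesteps_total": TARGET_TIMESTEPS,
    "episode_reward_mean": TARGET_REWARD,
}

# Hypothetical usage (not run here):
# tune.run("PPO", config={...}, stop=stop, local_dir="walker_v3")
```

Without a `stop` argument, the experiment keeps running until you interrupt it.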

In [4]:
# Import the experiment runner below
from ray import tune

# Run the experiment with the required configuration
tune.run(
    "PPO", 
    config={
        "env": "BipedalWalker-v3", 
        "num_gpus": 2,                   # GPUs given to the trainer; set to 0 on a CPU-only machine
        "evaluation_interval": 100,      # number of training iterations between evaluations
        "evaluation_num_episodes": 100,  # number of episodes used for each evaluation
    }, 
    local_dir="walker_v3"
)

Trial name,status,loc
PPO_BipedalWalker-v3_25b72_00000,PENDING,


[2m[36m(PPOTrainer pid=158805)[0m 2022-12-23 17:28:02,025	INFO trainer.py:2140 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
[2m[36m(PPOTrainer pid=158805)[0m 2022-12-23 17:28:02,026	INFO ppo.py:249 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
[2m[36m(PPOTrainer pid=158805)[0m 2022-12-23 17:28:02,026	INFO trainer.py:779 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.


Trial name,status,loc
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805






Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 4000
  custom_metrics: {}
  date: 2022-12-23_17-28-16
  done: false
  episode_len_mean: 678.4
  episode_media: {}
  episode_reward_max: -101.33964626871236
  episode_reward_mean: -109.75189014949474
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 5
  episodes_total: 5
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.645051956176758
          entropy_coeff: 0.0
          kl: 0.0156696829944849
          model: {}
          policy_loss: -0.022245073691010475
          total_loss: 274.5101623535156
          vf_explained_var: -0.021566230803728104
          vf_loss: 274.5292663574219
        train: null
    num_agent_steps_sampled: 4000
    num_agent_steps_trained: 4000
    num_steps_sampled: 4000
    num_steps_trained: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,1,6.84059,4000,-109.752,-101.34,-125.646,678.4


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 8000
  custom_metrics: {}
  date: 2022-12-23_17-28-22
  done: false
  episode_len_mean: 422.2352941176471
  episode_media: {}
  episode_reward_max: -101.2622276331869
  episode_reward_mean: -110.29294449331815
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 12
  episodes_total: 17
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.692352771759033
          entropy_coeff: 0.0
          kl: 0.01690756529569626
          model: {}
          policy_loss: -0.026634719222784042
          total_loss: 691.4613647460938
          vf_explained_var: -0.3008078336715698
          vf_loss: 691.484619140625
        train: null
    num_agent_steps_sampled: 8000
    num_agent_steps_trained: 8000
    num_steps_sampled: 8000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,2,13.0829,8000,-110.293,-101.262,-125.646,422.235


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 12000
  custom_metrics: {}
  date: 2022-12-23_17-28-29
  done: false
  episode_len_mean: 523.35
  episode_media: {}
  episode_reward_max: -101.01919514975386
  episode_reward_mean: -109.73047982333135
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 20
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.704278945922852
          entropy_coeff: 0.0
          kl: 0.010567907243967056
          model: {}
          policy_loss: -0.0235713180154562
          total_loss: 75.7781753540039
          vf_explained_var: -0.41361796855926514
          vf_loss: 75.79962921142578
        train: null
    num_agent_steps_sampled: 12000
    num_agent_steps_trained: 12000
    num_steps_sampled: 12000
    num_steps_train

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,3,19.3464,12000,-109.73,-101.019,-125.646,523.35


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 16000
  custom_metrics: {}
  date: 2022-12-23_17-28-35
  done: false
  episode_len_mean: 461.96774193548384
  episode_media: {}
  episode_reward_max: -101.01919514975386
  episode_reward_mean: -110.43834433051995
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 11
  episodes_total: 31
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.697986602783203
          entropy_coeff: 0.0
          kl: 0.020429696887731552
          model: {}
          policy_loss: -0.03610014542937279
          total_loss: 578.8330078125
          vf_explained_var: -0.6338520646095276
          vf_loss: 578.8650512695312
        train: null
    num_agent_steps_sampled: 16000
    num_agent_steps_trained: 16000
    num_steps_sampled: 16000
    num_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,4,25.6357,16000,-110.438,-101.019,-125.646,461.968


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 20000
  custom_metrics: {}
  date: 2022-12-23_17-28-41
  done: false
  episode_len_mean: 517.3823529411765
  episode_media: {}
  episode_reward_max: -101.01919514975386
  episode_reward_mean: -110.4887255121062
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 34
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.538061618804932
          entropy_coeff: 0.0
          kl: 0.010549559257924557
          model: {}
          policy_loss: -0.02856048196554184
          total_loss: 52.932525634765625
          vf_explained_var: -0.5739486813545227
          vf_loss: 52.95897674560547
        train: null
    num_agent_steps_sampled: 20000
    num_agent_steps_trained: 20000
    num_steps_sampled: 20000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,5,31.8327,20000,-110.489,-101.019,-125.646,517.382


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 24000
  custom_metrics: {}
  date: 2022-12-23_17-28-48
  done: false
  episode_len_mean: 590.3421052631579
  episode_media: {}
  episode_reward_max: -101.01919514975386
  episode_reward_mean: -110.57435748232952
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 4
  episodes_total: 38
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.5306525230407715
          entropy_coeff: 0.0
          kl: 0.009493827819824219
          model: {}
          policy_loss: -0.023390917107462883
          total_loss: 52.365264892578125
          vf_explained_var: -0.23677678406238556
          vf_loss: 52.38675308227539
        train: null
    num_agent_steps_sampled: 24000
    num_agent_steps_trained: 24000
    num_steps_sampled: 24000
   

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,6,38.0849,24000,-110.574,-101.019,-125.646,590.342


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 28000
  custom_metrics: {}
  date: 2022-12-23_17-28-54
  done: false
  episode_len_mean: 664.219512195122
  episode_media: {}
  episode_reward_max: -101.01919514975386
  episode_reward_mean: -110.24733988385232
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 41
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.502223491668701
          entropy_coeff: 0.0
          kl: 0.009272604249417782
          model: {}
          policy_loss: 0.002136416034772992
          total_loss: 3.188262462615967
          vf_explained_var: -0.04488189518451691
          vf_loss: 3.1842713356018066
        train: null
    num_agent_steps_sampled: 28000
    num_agent_steps_trained: 28000
    num_steps_sampled: 28000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,7,44.3199,28000,-110.247,-101.019,-125.646,664.22


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 32000
  custom_metrics: {}
  date: 2022-12-23_17-29-00
  done: false
  episode_len_mean: 707.7441860465116
  episode_media: {}
  episode_reward_max: -101.01330546350157
  episode_reward_mean: -109.8995033452653
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 2
  episodes_total: 43
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.505726337432861
          entropy_coeff: 0.0
          kl: 0.011723184026777744
          model: {}
          policy_loss: -0.019915057346224785
          total_loss: 2.3570139408111572
          vf_explained_var: 0.04875466972589493
          vf_loss: 2.374584197998047
        train: null
    num_agent_steps_sampled: 32000
    num_agent_steps_trained: 32000
    num_steps_sampled: 32000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,8,50.5367,32000,-109.9,-101.013,-125.646,707.744


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 36000
  custom_metrics: {}
  date: 2022-12-23_17-29-06
  done: false
  episode_len_mean: 747.4
  episode_media: {}
  episode_reward_max: -98.12393183841229
  episode_reward_mean: -109.57519787152438
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 2
  episodes_total: 45
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.39167594909668
          entropy_coeff: 0.0
          kl: 0.015462150797247887
          model: {}
          policy_loss: -0.02336364984512329
          total_loss: 1.1338802576065063
          vf_explained_var: -0.009230677969753742
          vf_loss: 1.154151439666748
        train: null
    num_agent_steps_sampled: 36000
    num_agent_steps_trained: 36000
    num_steps_sampled: 36000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,9,56.7401,36000,-109.575,-98.1239,-125.646,747.4


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 40000
  custom_metrics: {}
  date: 2022-12-23_17-29-13
  done: false
  episode_len_mean: 785.5510204081633
  episode_media: {}
  episode_reward_max: -98.12393183841229
  episode_reward_mean: -109.10391027058873
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 4
  episodes_total: 49
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.346787929534912
          entropy_coeff: 0.0
          kl: 0.010472910478711128
          model: {}
          policy_loss: -0.021779850125312805
          total_loss: 65.93119049072266
          vf_explained_var: -0.21763400733470917
          vf_loss: 65.95087432861328
        train: null
    num_agent_steps_sampled: 40000
    num_agent_steps_trained: 40000
    num_steps_sampled: 40000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,10,62.9951,40000,-109.104,-98.1239,-125.646,785.551


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 44000
  custom_metrics: {}
  date: 2022-12-23_17-29-19
  done: false
  episode_len_mean: 832.5384615384615
  episode_media: {}
  episode_reward_max: -92.20223257929587
  episode_reward_mean: -108.33758692707475
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 52
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.207039833068848
          entropy_coeff: 0.0
          kl: 0.013340535573661327
          model: {}
          policy_loss: -0.012544560246169567
          total_loss: 2.334543228149414
          vf_explained_var: -0.01019146665930748
          vf_loss: 2.3444197177886963
        train: null
    num_agent_steps_sampled: 44000
    num_agent_steps_trained: 44000
    num_steps_sampled: 44000
    n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,11,69.253,44000,-108.338,-92.2022,-125.646,832.538


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 48000
  custom_metrics: {}
  date: 2022-12-23_17-29-25
  done: false
  episode_len_mean: 832.7857142857143
  episode_media: {}
  episode_reward_max: -92.20223257929587
  episode_reward_mean: -108.05404639794207
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 4
  episodes_total: 56
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.260312557220459
          entropy_coeff: 0.0
          kl: 0.01325264759361744
          model: {}
          policy_loss: -0.03027886338531971
          total_loss: 106.6024169921875
          vf_explained_var: -0.32580262422561646
          vf_loss: 106.63005065917969
        train: null
    num_agent_steps_sampled: 48000
    num_agent_steps_trained: 48000
    num_steps_sampled: 48000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,12,75.4661,48000,-108.054,-92.2022,-125.646,832.786


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 52000
  custom_metrics: {}
  date: 2022-12-23_17-29-31
  done: false
  episode_len_mean: 845.8474576271186
  episode_media: {}
  episode_reward_max: -92.20223257929587
  episode_reward_mean: -107.6359293292368
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 59
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.196269989013672
          entropy_coeff: 0.0
          kl: 0.007698612287640572
          model: {}
          policy_loss: -0.027825862169265747
          total_loss: 47.55764389038086
          vf_explained_var: -0.36368098855018616
          vf_loss: 47.58393096923828
        train: null
    num_agent_steps_sampled: 52000
    num_agent_steps_trained: 52000
    num_steps_sampled: 52000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,13,81.6626,52000,-107.636,-92.2022,-125.646,845.847


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 56000
  custom_metrics: {}
  date: 2022-12-23_17-29-38
  done: false
  episode_len_mean: 882.3387096774194
  episode_media: {}
  episode_reward_max: -92.20223257929587
  episode_reward_mean: -107.2313999005172
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 62
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.153186798095703
          entropy_coeff: 0.0
          kl: 0.011024028062820435
          model: {}
          policy_loss: -0.028049172833561897
          total_loss: 4.7348809242248535
          vf_explained_var: -0.14691314101219177
          vf_loss: 4.760725498199463
        train: null
    num_agent_steps_sampled: 56000
    num_agent_steps_trained: 56000
    num_steps_sampled: 56000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,14,87.9275,56000,-107.231,-92.2022,-125.646,882.339


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 60000
  custom_metrics: {}
  date: 2022-12-23_17-29-44
  done: false
  episode_len_mean: 904.765625
  episode_media: {}
  episode_reward_max: -83.99104351180475
  episode_reward_mean: -106.71932501567602
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 2
  episodes_total: 64
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.10328483581543
          entropy_coeff: 0.0
          kl: 0.012507213279604912
          model: {}
          policy_loss: -0.025799917057156563
          total_loss: 1.2651374340057373
          vf_explained_var: 0.11706218123435974
          vf_loss: 1.2884358167648315
        train: null
    num_agent_steps_sampled: 60000
    num_agent_steps_trained: 60000
    num_steps_sampled: 60000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,15,94.1998,60000,-106.719,-83.991,-125.646,904.766


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 64000
  custom_metrics: {}
  date: 2022-12-23_17-29-50
  done: false
  episode_len_mean: 923.0
  episode_media: {}
  episode_reward_max: -83.99104351180475
  episode_reward_mean: -106.11382712370833
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 4
  episodes_total: 68
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.118871212005615
          entropy_coeff: 0.0
          kl: 0.009027808904647827
          model: {}
          policy_loss: -0.024687567725777626
          total_loss: 53.5495719909668
          vf_explained_var: -0.126837357878685
          vf_loss: 53.57245635986328
        train: null
    num_agent_steps_sampled: 64000
    num_agent_steps_trained: 64000
    num_steps_sampled: 64000
    num_steps_trained

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,16,100.415,64000,-106.114,-83.991,-125.646,923


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 68000
  custom_metrics: {}
  date: 2022-12-23_17-29-56
  done: false
  episode_len_mean: 942.3428571428572
  episode_media: {}
  episode_reward_max: -83.99104351180475
  episode_reward_mean: -105.55590158544807
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 2
  episodes_total: 70
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.122050762176514
          entropy_coeff: 0.0
          kl: 0.008303681388497353
          model: {}
          policy_loss: -0.022225620225071907
          total_loss: 3.5172359943389893
          vf_explained_var: 0.018919847905635834
          vf_loss: 3.5378007888793945
        train: null
    num_agent_steps_sampled: 68000
    num_agent_steps_trained: 68000
    num_steps_sampled: 68000
    

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,17,106.648,68000,-105.556,-83.991,-125.646,942.343


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 72000
  custom_metrics: {}
  date: 2022-12-23_17-30-03
  done: false
  episode_len_mean: 969.3698630136986
  episode_media: {}
  episode_reward_max: -78.04261612102256
  episode_reward_mean: -104.63253795179156
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 73
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 5.056233882904053
          entropy_coeff: 0.0
          kl: 0.012547197751700878
          model: {}
          policy_loss: -0.026586515828967094
          total_loss: 1.4096653461456299
          vf_explained_var: 0.09894546866416931
          vf_loss: 1.433742642402649
        train: null
    num_agent_steps_sampled: 72000
    num_agent_steps_trained: 72000
    num_steps_sampled: 72000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,18,113.328,72000,-104.633,-78.0426,-125.646,969.37


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 76000
  custom_metrics: {}
  date: 2022-12-23_17-30-09
  done: false
  episode_len_mean: 986.1866666666666
  episode_media: {}
  episode_reward_max: -78.04261612102256
  episode_reward_mean: -104.18562738686231
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 2
  episodes_total: 75
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.963725566864014
          entropy_coeff: 0.0
          kl: 0.015596229583024979
          model: {}
          policy_loss: -0.02193654328584671
          total_loss: 0.8278713226318359
          vf_explained_var: 0.13307681679725647
          vf_loss: 0.8466886281967163
        train: null
    num_agent_steps_sampled: 76000
    num_agent_steps_trained: 76000
    num_steps_sampled: 76000
    nu

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,19,119.641,76000,-104.186,-78.0426,-125.646,986.187


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 80000
  custom_metrics: {}
  date: 2022-12-23_17-30-16
  done: false
  episode_len_mean: 1009.7948717948718
  episode_media: {}
  episode_reward_max: -78.04261612102256
  episode_reward_mean: -103.44984294275586
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 78
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.865558624267578
          entropy_coeff: 0.0
          kl: 0.01628044806420803
          model: {}
          policy_loss: -0.02020290493965149
          total_loss: 1.0671919584274292
          vf_explained_var: 0.1866530478000641
          vf_loss: 1.0841387510299683
        train: null
    num_agent_steps_sampled: 80000
    num_agent_steps_trained: 80000
    num_steps_sampled: 80000
    num

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,20,125.943,80000,-103.45,-78.0426,-125.646,1009.79


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 84000
  custom_metrics: {}
  date: 2022-12-23_17-30-22
  done: false
  episode_len_mean: 1024.55
  episode_media: {}
  episode_reward_max: -78.04261612102256
  episode_reward_mean: -102.91670263285671
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 2
  episodes_total: 80
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.826045036315918
          entropy_coeff: 0.0
          kl: 0.011342574842274189
          model: {}
          policy_loss: -0.01837044209241867
          total_loss: 1.095554232597351
          vf_explained_var: -0.04764169827103615
          vf_loss: 1.1116561889648438
        train: null
    num_agent_steps_sampled: 84000
    num_agent_steps_trained: 84000
    num_steps_sampled: 84000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,21,132.267,84000,-102.917,-78.0426,-125.646,1024.55


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 88000
  custom_metrics: {}
  date: 2022-12-23_17-30-28
  done: false
  episode_len_mean: 1045.3493975903614
  episode_media: {}
  episode_reward_max: -73.73051724980522
  episode_reward_mean: -101.9606771555184
  episode_reward_min: -125.64550395528546
  episodes_this_iter: 3
  episodes_total: 83
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.729432106018066
          entropy_coeff: 0.0
          kl: 0.016451548784971237
          model: {}
          policy_loss: -0.01875431463122368
          total_loss: 1.1680885553359985
          vf_explained_var: 0.118070088326931
          vf_loss: 1.183552622795105
        train: null
    num_agent_steps_sampled: 88000
    num_agent_steps_trained: 88000
    num_steps_sampled: 88000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,22,138.497,88000,-101.961,-73.7305,-125.646,1045.35


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 92000
  custom_metrics: {}
  date: 2022-12-23_17-30-34
  done: false
  episode_len_mean: 1047.2674418604652
  episode_media: {}
  episode_reward_max: -71.31383190269953
  episode_reward_mean: -101.64580852941666
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 86
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.80612850189209
          entropy_coeff: 0.0
          kl: 0.00841265358030796
          model: {}
          policy_loss: -0.01834314875304699
          total_loss: 21.60879898071289
          vf_explained_var: 0.25819385051727295
          vf_loss: 21.62546157836914
        train: null
    num_agent_steps_sampled: 92000
    num_agent_steps_trained: 92000
    num_steps_sampled: 92000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,23,144.709,92000,-101.646,-71.3138,-127.98,1047.27


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 96000
  custom_metrics: {}
  date: 2022-12-23_17-30-41
  done: false
  episode_len_mean: 1055.0
  episode_media: {}
  episode_reward_max: -68.67654774620566
  episode_reward_mean: -100.80301981724539
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 4
  episodes_total: 90
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.759394645690918
          entropy_coeff: 0.0
          kl: 0.011985072866082191
          model: {}
          policy_loss: -0.030410969629883766
          total_loss: 46.23482894897461
          vf_explained_var: 0.22080311179161072
          vf_loss: 46.262840270996094
        train: null
    num_agent_steps_sampled: 96000
    num_agent_steps_trained: 96000
    num_steps_sampled: 96000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,24,150.938,96000,-100.803,-68.6765,-127.98,1055


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 100000
  custom_metrics: {}
  date: 2022-12-23_17-30-47
  done: false
  episode_len_mean: 1066.8478260869565
  episode_media: {}
  episode_reward_max: -68.67654774620566
  episode_reward_mean: -100.25642361851125
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 92
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.6793293952941895
          entropy_coeff: 0.0
          kl: 0.010267348028719425
          model: {}
          policy_loss: -0.019951317459344864
          total_loss: 1.1288803815841675
          vf_explained_var: 0.30974724888801575
          vf_loss: 1.1467781066894531
        train: null
    num_agent_steps_sampled: 100000
    num_agent_steps_trained: 100000
    num_steps_sampled: 100000

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,25,157.183,100000,-100.256,-68.6765,-127.98,1066.85


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 104000
  custom_metrics: {}
  date: 2022-12-23_17-30-53
  done: false
  episode_len_mean: 1078.1914893617022
  episode_media: {}
  episode_reward_max: -65.7238838826139
  episode_reward_mean: -99.56250970400357
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 94
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.5209760665893555
          entropy_coeff: 0.0
          kl: 0.014213238842785358
          model: {}
          policy_loss: -0.028331246227025986
          total_loss: 0.5754063725471497
          vf_explained_var: 0.47236308455467224
          vf_loss: 0.6008949875831604
        train: null
    num_agent_steps_sampled: 104000
    num_agent_steps_trained: 104000
    num_steps_sampled: 104000
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,26,163.434,104000,-99.5625,-65.7239,-127.98,1078.19


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 108000
  custom_metrics: {}
  date: 2022-12-23_17-31-00
  done: false
  episode_len_mean: 1094.3298969072166
  episode_media: {}
  episode_reward_max: -62.79402244450651
  episode_reward_mean: -98.46022111033729
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 97
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.403848648071289
          entropy_coeff: 0.0
          kl: 0.01345518883317709
          model: {}
          policy_loss: -0.020581062883138657
          total_loss: 0.795676589012146
          vf_explained_var: 0.3905438780784607
          vf_loss: 0.8135666251182556
        train: null
    num_agent_steps_sampled: 108000
    num_agent_steps_trained: 108000
    num_steps_sampled: 108000
    

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,27,169.696,108000,-98.4602,-62.794,-127.98,1094.33


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 112000
  custom_metrics: {}
  date: 2022-12-23_17-31-06
  done: false
  episode_len_mean: 1109.5
  episode_media: {}
  episode_reward_max: -59.09859828475125
  episode_reward_mean: -97.38045160573704
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 100
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.326533317565918
          entropy_coeff: 0.0
          kl: 0.01756836287677288
          model: {}
          policy_loss: -0.023189380764961243
          total_loss: 0.5684767365455627
          vf_explained_var: 0.35760727524757385
          vf_loss: 0.5881524085998535
        train: null
    num_agent_steps_sampled: 112000
    num_agent_steps_trained: 112000
    num_steps_sampled: 112000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,28,175.926,112000,-97.3805,-59.0986,-127.98,1109.5


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 116000
  custom_metrics: {}
  date: 2022-12-23_17-31-12
  done: false
  episode_len_mean: 1124.91
  episode_media: {}
  episode_reward_max: -59.09859828475125
  episode_reward_mean: -96.5257652550617
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 102
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.149298667907715
          entropy_coeff: 0.0
          kl: 0.012953488156199455
          model: {}
          policy_loss: -0.013378625735640526
          total_loss: 0.5389046669006348
          vf_explained_var: 0.4351658225059509
          vf_loss: 0.5496925115585327
        train: null
    num_agent_steps_sampled: 116000
    num_agent_steps_trained: 116000
    num_steps_sampled: 116000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,29,182.199,116000,-96.5258,-59.0986,-127.98,1124.91


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 120000
  custom_metrics: {}
  date: 2022-12-23_17-31-18
  done: false
  episode_len_mean: 1140.21
  episode_media: {}
  episode_reward_max: -46.66297147874337
  episode_reward_mean: -95.19517193084558
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 104
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.139858245849609
          entropy_coeff: 0.0
          kl: 0.015605791471898556
          model: {}
          policy_loss: -0.02138504385948181
          total_loss: 0.5984724164009094
          vf_explained_var: 0.19251106679439545
          vf_loss: 0.6167363524436951
        train: null
    num_agent_steps_sampled: 120000
    num_agent_steps_trained: 120000
    num_steps_sampled: 120000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,30,188.474,120000,-95.1952,-46.663,-127.98,1140.21


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 124000
  custom_metrics: {}
  date: 2022-12-23_17-31-25
  done: false
  episode_len_mean: 1171.0
  episode_media: {}
  episode_reward_max: -46.66297147874337
  episode_reward_mean: -93.59568333889723
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 107
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.134120941162109
          entropy_coeff: 0.0
          kl: 0.0146876759827137
          model: {}
          policy_loss: -0.02018466405570507
          total_loss: 0.5085868835449219
          vf_explained_var: 0.3224993348121643
          vf_loss: 0.5258340239524841
        train: null
    num_agent_steps_sampled: 124000
    num_agent_steps_trained: 124000
    num_steps_sampled: 124000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,31,194.728,124000,-93.5957,-46.663,-127.98,1171


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 128000
  custom_metrics: {}
  date: 2022-12-23_17-31-31
  done: false
  episode_len_mean: 1217.43
  episode_media: {}
  episode_reward_max: -41.52519579624876
  episode_reward_mean: -91.81309835573323
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 110
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.143176078796387
          entropy_coeff: 0.0
          kl: 0.017285246402025223
          model: {}
          policy_loss: -0.02535649761557579
          total_loss: 0.7481164336204529
          vf_explained_var: 0.12397495657205582
          vf_loss: 0.7700158357620239
        train: null
    num_agent_steps_sampled: 128000
    num_agent_steps_trained: 128000
    num_steps_sampled: 128000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,32,200.976,128000,-91.8131,-41.5252,-127.98,1217.43


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 132000
  custom_metrics: {}
  date: 2022-12-23_17-31-37
  done: false
  episode_len_mean: 1232.87
  episode_media: {}
  episode_reward_max: -41.52519579624876
  episode_reward_mean: -90.545817950646
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 112
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.050921440124512
          entropy_coeff: 0.0
          kl: 0.017605522647500038
          model: {}
          policy_loss: -0.02758100815117359
          total_loss: 0.41643407940864563
          vf_explained_var: 0.19712138175964355
          vf_loss: 0.44049400091171265
        train: null
    num_agent_steps_sampled: 132000
    num_agent_steps_trained: 132000
    num_steps_sampled: 132000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,33,207.189,132000,-90.5458,-41.5252,-127.98,1232.87


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 136000
  custom_metrics: {}
  date: 2022-12-23_17-31-43
  done: false
  episode_len_mean: 1263.52
  episode_media: {}
  episode_reward_max: -37.0856413499644
  episode_reward_mean: -88.96229410310286
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 114
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 4.075223445892334
          entropy_coeff: 0.0
          kl: 0.014315873384475708
          model: {}
          policy_loss: -0.02392864041030407
          total_loss: 0.5152563452720642
          vf_explained_var: 0.1593097746372223
          vf_loss: 0.5363218188285828
        train: null
    num_agent_steps_sampled: 136000
    num_agent_steps_trained: 136000
    num_steps_sampled: 136000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,34,213.422,136000,-88.9623,-37.0856,-127.98,1263.52


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 140000
  custom_metrics: {}
  date: 2022-12-23_17-31-50
  done: false
  episode_len_mean: 1309.72
  episode_media: {}
  episode_reward_max: -36.032033176709
  episode_reward_mean: -86.7508708556427
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 117
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.9507052898406982
          entropy_coeff: 0.0
          kl: 0.01702706143260002
          model: {}
          policy_loss: -0.029691515490412712
          total_loss: 0.4654120206832886
          vf_explained_var: 0.2319677323102951
          vf_loss: 0.49169814586639404
        train: null
    num_agent_steps_sampled: 140000
    num_agent_steps_trained: 140000
    num_steps_sampled: 140000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,35,219.721,140000,-86.7509,-36.032,-127.98,1309.72


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 144000
  custom_metrics: {}
  date: 2022-12-23_17-31-56
  done: false
  episode_len_mean: 1324.83
  episode_media: {}
  episode_reward_max: -25.983555634621492
  episode_reward_mean: -84.55787905372136
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 120
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.8789021968841553
          entropy_coeff: 0.0
          kl: 0.017974689602851868
          model: {}
          policy_loss: -0.027225183323025703
          total_loss: 0.5114206671714783
          vf_explained_var: 0.17187879979610443
          vf_loss: 0.5350509285926819
        train: null
    num_agent_steps_sampled: 144000
    num_agent_steps_trained: 144000
    num_steps_sampled: 144000
    num_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,36,225.948,144000,-84.5579,-25.9836,-127.98,1324.83


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 148000
  custom_metrics: {}
  date: 2022-12-23_17-32-02
  done: false
  episode_len_mean: 1340.33
  episode_media: {}
  episode_reward_max: -22.01111738647646
  episode_reward_mean: -82.85340996584515
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 122
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.7430531978607178
          entropy_coeff: 0.0
          kl: 0.01624983362853527
          model: {}
          policy_loss: -0.032297149300575256
          total_loss: 0.6162409782409668
          vf_explained_var: 0.0824027955532074
          vf_loss: 0.6452882289886475
        train: null
    num_agent_steps_sampled: 148000
    num_agent_steps_trained: 148000
    num_steps_sampled: 148000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,37,232.182,148000,-82.8534,-22.0111,-127.98,1340.33


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 152000
  custom_metrics: {}
  date: 2022-12-23_17-32-08
  done: false
  episode_len_mean: 1355.7
  episode_media: {}
  episode_reward_max: -17.207103320685523
  episode_reward_mean: -80.94483107403941
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 124
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.6454689502716064
          entropy_coeff: 0.0
          kl: 0.01648136042058468
          model: {}
          policy_loss: -0.025614218786358833
          total_loss: 0.875985860824585
          vf_explained_var: 0.18415215611457825
          vf_loss: 0.898303747177124
        train: null
    num_agent_steps_sampled: 152000
    num_agent_steps_trained: 152000
    num_steps_sampled: 152000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,38,238.415,152000,-80.9448,-17.2071,-127.98,1355.7


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 156000
  custom_metrics: {}
  date: 2022-12-23_17-32-15
  done: false
  episode_len_mean: 1401.12
  episode_media: {}
  episode_reward_max: 8.365108984111494
  episode_reward_mean: -77.92002780272776
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 127
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.539142370223999
          entropy_coeff: 0.0
          kl: 0.019679401069879532
          model: {}
          policy_loss: -0.02874567173421383
          total_loss: 0.6950201988220215
          vf_explained_var: 0.3659529685974121
          vf_loss: 0.7198299169540405
        train: null
    num_agent_steps_sampled: 156000
    num_agent_steps_trained: 156000
    num_steps_sampled: 156000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,39,244.612,156000,-77.92,8.36511,-127.98,1401.12


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 160000
  custom_metrics: {}
  date: 2022-12-23_17-32-21
  done: false
  episode_len_mean: 1446.81
  episode_media: {}
  episode_reward_max: 12.654014368278256
  episode_reward_mean: -74.33992823301487
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 130
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.44712233543396
          entropy_coeff: 0.0
          kl: 0.020502088591456413
          model: {}
          policy_loss: -0.025623278692364693
          total_loss: 0.9350083470344543
          vf_explained_var: 0.38602742552757263
          vf_loss: 0.9565311670303345
        train: null
    num_agent_steps_sampled: 160000
    num_agent_steps_trained: 160000
    num_steps_sampled: 160000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,40,250.956,160000,-74.3399,12.654,-127.98,1446.81


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 164000
  custom_metrics: {}
  date: 2022-12-23_17-32-27
  done: false
  episode_len_mean: 1462.29
  episode_media: {}
  episode_reward_max: 24.35069172622824
  episode_reward_mean: -71.68991880876985
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 132
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.355738401412964
          entropy_coeff: 0.0
          kl: 0.01615133136510849
          model: {}
          policy_loss: -0.023596681654453278
          total_loss: 1.5008500814437866
          vf_explained_var: 0.3882867693901062
          vf_loss: 1.5212165117263794
        train: null
    num_agent_steps_sampled: 164000
    num_agent_steps_trained: 164000
    num_steps_sampled: 164000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,41,257.205,164000,-71.6899,24.3507,-127.98,1462.29


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 168000
  custom_metrics: {}
  date: 2022-12-23_17-32-34
  done: false
  episode_len_mean: 1477.59
  episode_media: {}
  episode_reward_max: 27.88887789489643
  episode_reward_mean: -68.99343499980982
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 134
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.2590012550354004
          entropy_coeff: 0.0
          kl: 0.019517716020345688
          model: {}
          policy_loss: -0.027942674234509468
          total_loss: 0.7930856943130493
          vf_explained_var: 0.3477388918399811
          vf_loss: 0.8171248435974121
        train: null
    num_agent_steps_sampled: 168000
    num_agent_steps_trained: 168000
    num_steps_sampled: 168000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,42,263.423,168000,-68.9934,27.8889,-127.98,1477.59


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 172000
  custom_metrics: {}
  date: 2022-12-23_17-32-40
  done: false
  episode_len_mean: 1477.59
  episode_media: {}
  episode_reward_max: 47.31160857421595
  episode_reward_mean: -64.70396911711491
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 137
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.178323984146118
          entropy_coeff: 0.0
          kl: 0.017454734072089195
          model: {}
          policy_loss: -0.020908204838633537
          total_loss: 1.1090140342712402
          vf_explained_var: 0.5400403141975403
          vf_loss: 1.1264312267303467
        train: null
    num_agent_steps_sampled: 172000
    num_agent_steps_trained: 172000
    num_steps_sampled: 172000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,43,269.639,172000,-64.704,47.3116,-127.98,1477.59


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 176000
  custom_metrics: {}
  date: 2022-12-23_17-32-46
  done: false
  episode_len_mean: 1493.17
  episode_media: {}
  episode_reward_max: 49.33984423151367
  episode_reward_mean: -60.26966257576341
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 140
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.009298324584961
          entropy_coeff: 0.0
          kl: 0.018369564786553383
          model: {}
          policy_loss: -0.02908375859260559
          total_loss: 1.3322762250900269
          vf_explained_var: 0.36092931032180786
          vf_loss: 1.357685923576355
        train: null
    num_agent_steps_sampled: 176000
    num_agent_steps_trained: 176000
    num_steps_sampled: 176000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,44,275.876,176000,-60.2697,49.3398,-127.98,1493.17


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 180000
  custom_metrics: {}
  date: 2022-12-23_17-32-52
  done: false
  episode_len_mean: 1493.17
  episode_media: {}
  episode_reward_max: 49.33984423151367
  episode_reward_mean: -57.25636387808469
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 142
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 3.016550064086914
          entropy_coeff: 0.0
          kl: 0.020736612379550934
          model: {}
          policy_loss: -0.023000719025731087
          total_loss: 1.176267385482788
          vf_explained_var: 0.4983614683151245
          vf_loss: 1.1951208114624023
        train: null
    num_agent_steps_sampled: 180000
    num_agent_steps_trained: 180000
    num_steps_sampled: 180000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,45,282.116,180000,-57.2564,49.3398,-127.98,1493.17


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 184000
  custom_metrics: {}
  date: 2022-12-23_17-32-59
  done: false
  episode_len_mean: 1493.17
  episode_media: {}
  episode_reward_max: 54.763589788761664
  episode_reward_mean: -54.33802675470178
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 144
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.8649802207946777
          entropy_coeff: 0.0
          kl: 0.01465429738163948
          model: {}
          policy_loss: -0.02600852958858013
          total_loss: 1.1912256479263306
          vf_explained_var: 0.5964446067810059
          vf_loss: 1.2143032550811768
        train: null
    num_agent_steps_sampled: 184000
    num_agent_steps_trained: 184000
    num_steps_sampled: 184000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,46,288.355,184000,-54.338,54.7636,-127.98,1493.17


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 188000
  custom_metrics: {}
  date: 2022-12-23_17-33-05
  done: false
  episode_len_mean: 1493.74
  episode_media: {}
  episode_reward_max: 58.196726671077236
  episode_reward_mean: -49.62837386651622
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 4
  episodes_total: 148
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.9135756492614746
          entropy_coeff: 0.0
          kl: 0.015298234298825264
          model: {}
          policy_loss: -0.02517312951385975
          total_loss: 76.9236831665039
          vf_explained_var: 0.24996121227741241
          vf_loss: 76.94579315185547
        train: null
    num_agent_steps_sampled: 188000
    num_agent_steps_trained: 188000
    num_steps_sampled: 188000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,47,294.54,188000,-49.6284,58.1967,-127.98,1493.74


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 192000
  custom_metrics: {}
  date: 2022-12-23_17-33-11
  done: false
  episode_len_mean: 1493.74
  episode_media: {}
  episode_reward_max: 58.196726671077236
  episode_reward_mean: -44.95274335252751
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 151
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.807393789291382
          entropy_coeff: 0.0
          kl: 0.014634872786700726
          model: {}
          policy_loss: -0.02270382270216942
          total_loss: 1.9731619358062744
          vf_explained_var: 0.48563703894615173
          vf_loss: 1.9929389953613281
        train: null
    num_agent_steps_sampled: 192000
    num_agent_steps_trained: 192000
    num_steps_sampled: 192000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,48,300.732,192000,-44.9527,58.1967,-127.98,1493.74


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 196000
  custom_metrics: {}
  date: 2022-12-23_17-33-17
  done: false
  episode_len_mean: 1493.74
  episode_media: {}
  episode_reward_max: 61.41466772591825
  episode_reward_mean: -41.86654138215867
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 153
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.7678608894348145
          entropy_coeff: 0.0
          kl: 0.017175907269120216
          model: {}
          policy_loss: -0.02367447316646576
          total_loss: 1.036445140838623
          vf_explained_var: 0.6352463960647583
          vf_loss: 1.0566843748092651
        train: null
    num_agent_steps_sampled: 196000
    num_agent_steps_trained: 196000
    num_steps_sampled: 196000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,49,306.944,196000,-41.8665,61.4147,-127.98,1493.74


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 200000
  custom_metrics: {}
  date: 2022-12-23_17-33-23
  done: false
  episode_len_mean: 1524.3
  episode_media: {}
  episode_reward_max: 61.41466772591825
  episode_reward_mean: -38.737129195633536
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 155
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.7678558826446533
          entropy_coeff: 0.0
          kl: 0.014752203598618507
          model: {}
          policy_loss: -0.02203713357448578
          total_loss: 1.317419171333313
          vf_explained_var: 0.5553975701332092
          vf_loss: 1.3365058898925781
        train: null
    num_agent_steps_sampled: 200000
    num_agent_steps_trained: 200000
    num_steps_sampled: 200000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,50,313.153,200000,-38.7371,61.4147,-127.98,1524.3


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 204000
  custom_metrics: {}
  date: 2022-12-23_17-33-30
  done: false
  episode_len_mean: 1539.61
  episode_media: {}
  episode_reward_max: 61.551451826769856
  episode_reward_mean: -33.84935219887304
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 158
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.7201650142669678
          entropy_coeff: 0.0
          kl: 0.01868390664458275
          model: {}
          policy_loss: -0.031237497925758362
          total_loss: 1.2586733102798462
          vf_explained_var: 0.5899474620819092
          vf_loss: 1.286173939704895
        train: null
    num_agent_steps_sampled: 204000
    num_agent_steps_trained: 204000
    num_steps_sampled: 204000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,51,319.406,204000,-33.8494,61.5515,-127.98,1539.61


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 208000
  custom_metrics: {}
  date: 2022-12-23_17-33-36
  done: false
  episode_len_mean: 1539.61
  episode_media: {}
  episode_reward_max: 65.32448174328037
  episode_reward_mean: -29.161712624717193
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 161
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.670811891555786
          entropy_coeff: 0.0
          kl: 0.017008623108267784
          model: {}
          policy_loss: -0.027855489403009415
          total_loss: 0.8931689262390137
          vf_explained_var: 0.6478950381278992
          vf_loss: 0.9176226854324341
        train: null
    num_agent_steps_sampled: 208000
    num_agent_steps_trained: 208000
    num_steps_sampled: 208000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,52,325.638,208000,-29.1617,65.3245,-127.98,1539.61


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 212000
  custom_metrics: {}
  date: 2022-12-23_17-33-42
  done: false
  episode_len_mean: 1539.61
  episode_media: {}
  episode_reward_max: 80.13907978402577
  episode_reward_mean: -25.84346662223099
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 163
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.569742202758789
          entropy_coeff: 0.0
          kl: 0.01608564890921116
          model: {}
          policy_loss: -0.023057473823428154
          total_loss: 1.176178216934204
          vf_explained_var: 0.6723731160163879
          vf_loss: 1.1960185766220093
        train: null
    num_agent_steps_sampled: 212000
    num_agent_steps_trained: 212000
    num_steps_sampled: 212000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,53,331.793,212000,-25.8435,80.1391,-127.98,1539.61


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 216000
  custom_metrics: {}
  date: 2022-12-23_17-33-48
  done: false
  episode_len_mean: 1539.61
  episode_media: {}
  episode_reward_max: 80.13907978402577
  episode_reward_mean: -22.486864970563094
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 165
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.667537212371826
          entropy_coeff: 0.0
          kl: 0.018682045862078667
          model: {}
          policy_loss: -0.027474420145154
          total_loss: 1.4568471908569336
          vf_explained_var: 0.5869460701942444
          vf_loss: 1.4805853366851807
        train: null
    num_agent_steps_sampled: 216000
    num_agent_steps_trained: 216000
    num_steps_sampled: 216000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,54,338.035,216000,-22.4869,80.1391,-127.98,1539.61


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 220000
  custom_metrics: {}
  date: 2022-12-23_17-33-55
  done: false
  episode_len_mean: 1555.02
  episode_media: {}
  episode_reward_max: 93.46700640210497
  episode_reward_mean: -17.173550528344176
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 168
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.519136667251587
          entropy_coeff: 0.0
          kl: 0.015298551879823208
          model: {}
          policy_loss: -0.022592667490243912
          total_loss: 1.8697651624679565
          vf_explained_var: 0.565716028213501
          vf_loss: 1.8892982006072998
        train: null
    num_agent_steps_sampled: 220000
    num_agent_steps_trained: 220000
    num_steps_sampled: 220000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,55,344.211,220000,-17.1736,93.467,-127.98,1555.02


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 224000
  custom_metrics: {}
  date: 2022-12-23_17-34-01
  done: false
  episode_len_mean: 1555.02
  episode_media: {}
  episode_reward_max: 93.46700640210497
  episode_reward_mean: -12.18085206060269
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 171
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.5125021934509277
          entropy_coeff: 0.0
          kl: 0.017615333199501038
          model: {}
          policy_loss: -0.024079903960227966
          total_loss: 2.126746892929077
          vf_explained_var: 0.6378647685050964
          vf_loss: 2.147303581237793
        train: null
    num_agent_steps_sampled: 224000
    num_agent_steps_trained: 224000
    num_steps_sampled: 224000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,56,350.528,224000,-12.1809,93.467,-127.98,1555.02


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 228000
  custom_metrics: {}
  date: 2022-12-23_17-34-07
  done: false
  episode_len_mean: 1555.02
  episode_media: {}
  episode_reward_max: 102.18910911667994
  episode_reward_mean: -8.497616550158453
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 173
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.4561827182769775
          entropy_coeff: 0.0
          kl: 0.016429683193564415
          model: {}
          policy_loss: -0.024742910638451576
          total_loss: 2.0931057929992676
          vf_explained_var: 0.6077880263328552
          vf_loss: 2.114562749862671
        train: null
    num_agent_steps_sampled: 228000
    num_agent_steps_trained: 228000
    num_steps_sampled: 228000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,57,356.727,228000,-8.49762,102.189,-127.98,1555.02


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 232000
  custom_metrics: {}
  date: 2022-12-23_17-34-13
  done: false
  episode_len_mean: 1555.02
  episode_media: {}
  episode_reward_max: 102.18910911667994
  episode_reward_mean: -4.814706611194834
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 175
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.47151255607605
          entropy_coeff: 0.0
          kl: 0.01663992553949356
          model: {}
          policy_loss: -0.0220294501632452
          total_loss: 1.5958036184310913
          vf_explained_var: 0.6473569273948669
          vf_loss: 1.6145050525665283
        train: null
    num_agent_steps_sampled: 232000
    num_agent_steps_trained: 232000
    num_steps_sampled: 232000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,58,362.934,232000,-4.81471,102.189,-127.98,1555.02


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 236000
  custom_metrics: {}
  date: 2022-12-23_17-34-20
  done: false
  episode_len_mean: 1555.02
  episode_media: {}
  episode_reward_max: 106.38122016854042
  episode_reward_mean: 0.550987442791384
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 178
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.3414409160614014
          entropy_coeff: 0.0
          kl: 0.01842312514781952
          model: {}
          policy_loss: -0.018493831157684326
          total_loss: 2.510244131088257
          vf_explained_var: 0.656004011631012
          vf_loss: 2.5250532627105713
        train: null
    num_agent_steps_sampled: 236000
    num_agent_steps_trained: 236000
    num_steps_sampled: 236000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,59,369.158,236000,0.550987,106.381,-127.98,1555.02


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 240000
  custom_metrics: {}
  date: 2022-12-23_17-34-26
  done: false
  episode_len_mean: 1555.02
  episode_media: {}
  episode_reward_max: 106.38122016854042
  episode_reward_mean: 5.8052545762566705
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 3
  episodes_total: 181
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.3234264850616455
          entropy_coeff: 0.0
          kl: 0.01764369010925293
          model: {}
          policy_loss: -0.025295550003647804
          total_loss: 1.9944006204605103
          vf_explained_var: 0.6982313990592957
          vf_loss: 2.016167402267456
        train: null
    num_agent_steps_sampled: 240000
    num_agent_steps_trained: 240000
    num_steps_sampled: 240000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,60,375.339,240000,5.80525,106.381,-127.98,1555.02


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 244000
  custom_metrics: {}
  date: 2022-12-23_17-34-32
  done: false
  episode_len_mean: 1555.02
  episode_media: {}
  episode_reward_max: 106.38122016854042
  episode_reward_mean: 9.291125895702551
  episode_reward_min: -127.98035087673117
  episodes_this_iter: 2
  episodes_total: 183
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.167717218399048
          entropy_coeff: 0.0
          kl: 0.018386106938123703
          model: {}
          policy_loss: -0.02489362098276615
          total_loss: 2.3401167392730713
          vf_explained_var: 0.6952233910560608
          vf_loss: 2.3613333702087402
        train: null
    num_agent_steps_sampled: 244000
    num_agent_steps_trained: 244000
    num_steps_sampled: 244000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,61,381.562,244000,9.29113,106.381,-127.98,1555.02


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 248000
  custom_metrics: {}
  date: 2022-12-23_17-34-38
  done: false
  episode_len_mean: 1570.01
  episode_media: {}
  episode_reward_max: 115.63895268679263
  episode_reward_mean: 13.677251114916457
  episode_reward_min: -108.29767615103164
  episodes_this_iter: 2
  episodes_total: 185
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.0794057846069336
          entropy_coeff: 0.0
          kl: 0.018527600914239883
          model: {}
          policy_loss: -0.026610208675265312
          total_loss: 1.7842766046524048
          vf_explained_var: 0.7412393093109131
          vf_loss: 1.8071812391281128
        train: null
    num_agent_steps_sampled: 248000
    num_agent_steps_trained: 248000
    num_steps_sampled: 248000
    num_ste

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,62,387.891,248000,13.6773,115.639,-108.298,1570.01


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 252000
  custom_metrics: {}
  date: 2022-12-23_17-34-45
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 116.53638200174446
  episode_reward_mean: 19.552030946431344
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 188
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.0519678592681885
          entropy_coeff: 0.0
          kl: 0.018491655588150024
          model: {}
          policy_loss: -0.02184155210852623
          total_loss: 2.3210480213165283
          vf_explained_var: 0.7104918360710144
          vf_loss: 2.3391916751861572
        train: null
    num_agent_steps_sampled: 252000
    num_agent_steps_trained: 252000
    num_steps_sampled: 252000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,63,394.094,252000,19.552,116.536,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 256000
  custom_metrics: {}
  date: 2022-12-23_17-34-51
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 123.75832014023563
  episode_reward_mean: 25.1721345908049
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 191
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.0434775352478027
          entropy_coeff: 0.0
          kl: 0.017398664727807045
          model: {}
          policy_loss: -0.017797963693737984
          total_loss: 1.721159815788269
          vf_explained_var: 0.7598265409469604
          vf_loss: 1.7354779243469238
        train: null
    num_agent_steps_sampled: 256000
    num_agent_steps_trained: 256000
    num_steps_sampled: 256000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,64,400.314,256000,25.1721,123.758,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 260000
  custom_metrics: {}
  date: 2022-12-23_17-34-57
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 123.75832014023563
  episode_reward_mean: 28.90521730354901
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 193
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.0044479370117188
          entropy_coeff: 0.0
          kl: 0.0197888370603323
          model: {}
          policy_loss: -0.022660456597805023
          total_loss: 2.114398717880249
          vf_explained_var: 0.7790555953979492
          vf_loss: 2.13310170173645
        train: null
    num_agent_steps_sampled: 260000
    num_agent_steps_trained: 260000
    num_steps_sampled: 260000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,65,406.551,260000,28.9052,123.758,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 264000
  custom_metrics: {}
  date: 2022-12-23_17-35-03
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 123.75832014023563
  episode_reward_mean: 32.40169491703493
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 195
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.8838040828704834
          entropy_coeff: 0.0
          kl: 0.018966548144817352
          model: {}
          policy_loss: -0.024374568834900856
          total_loss: 2.2215092182159424
          vf_explained_var: 0.6909325122833252
          vf_loss: 2.242090940475464
        train: null
    num_agent_steps_sampled: 264000
    num_agent_steps_trained: 264000
    num_steps_sampled: 264000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,66,412.814,264000,32.4017,123.758,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 268000
  custom_metrics: {}
  date: 2022-12-23_17-35-10
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 128.25306119199035
  episode_reward_mean: 38.050663699798655
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 198
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.8604434728622437
          entropy_coeff: 0.0
          kl: 0.018509436398744583
          model: {}
          policy_loss: -0.025587067008018494
          total_loss: 2.0098073482513428
          vf_explained_var: 0.7242030501365662
          vf_loss: 2.0316925048828125
        train: null
    num_agent_steps_sampled: 268000
    num_agent_steps_trained: 268000
    num_steps_sampled: 268000
    num_step

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,67,419.052,268000,38.0507,128.253,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 272000
  custom_metrics: {}
  date: 2022-12-23_17-35-16
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 139.90959892308334
  episode_reward_mean: 43.67661011173016
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 201
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.8749794960021973
          entropy_coeff: 0.0
          kl: 0.021059125661849976
          model: {}
          policy_loss: -0.02348347008228302
          total_loss: 2.4576053619384766
          vf_explained_var: 0.6108849048614502
          vf_loss: 2.476876974105835
        train: null
    num_agent_steps_sampled: 272000
    num_agent_steps_trained: 272000
    num_steps_sampled: 272000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,68,425.324,272000,43.6766,139.91,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 276000
  custom_metrics: {}
  date: 2022-12-23_17-35-22
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 139.90959892308334
  episode_reward_mean: 47.165209839357345
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 203
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.8213542699813843
          entropy_coeff: 0.0
          kl: 0.020204197615385056
          model: {}
          policy_loss: -0.024396749213337898
          total_loss: 2.1872122287750244
          vf_explained_var: 0.734703540802002
          vf_loss: 2.2075681686401367
        train: null
    num_agent_steps_sampled: 276000
    num_agent_steps_trained: 276000
    num_steps_sampled: 276000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,69,431.538,276000,47.1652,139.91,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 280000
  custom_metrics: {}
  date: 2022-12-23_17-35-28
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 139.90959892308334
  episode_reward_mean: 50.88908243934396
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 205
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.7097615003585815
          entropy_coeff: 0.0
          kl: 0.023235542699694633
          model: {}
          policy_loss: -0.028683258220553398
          total_loss: 1.8819326162338257
          vf_explained_var: 0.7688300013542175
          vf_loss: 1.9059687852859497
        train: null
    num_agent_steps_sampled: 280000
    num_agent_steps_trained: 280000
    num_steps_sampled: 280000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,70,437.727,280000,50.8891,139.91,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 284000
  custom_metrics: {}
  date: 2022-12-23_17-35-35
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 139.90959892308334
  episode_reward_mean: 56.29127113622545
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 208
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.7135632038116455
          entropy_coeff: 0.0
          kl: 0.019921600818634033
          model: {}
          policy_loss: -0.022566331550478935
          total_loss: 3.0583314895629883
          vf_explained_var: 0.7637818455696106
          vf_loss: 3.076913595199585
        train: null
    num_agent_steps_sampled: 284000
    num_agent_steps_trained: 284000
    num_steps_sampled: 284000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,71,443.996,284000,56.2913,139.91,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 288000
  custom_metrics: {}
  date: 2022-12-23_17-35-41
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 146.44794694091448
  episode_reward_mean: 61.902993809823435
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 211
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.6942790746688843
          entropy_coeff: 0.0
          kl: 0.018128881230950356
          model: {}
          policy_loss: -0.0234367772936821
          total_loss: 3.5720713138580322
          vf_explained_var: 0.6732159852981567
          vf_loss: 3.5918824672698975
        train: null
    num_agent_steps_sampled: 288000
    num_agent_steps_trained: 288000
    num_steps_sampled: 288000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,72,450.253,288000,61.903,146.448,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 292000
  custom_metrics: {}
  date: 2022-12-23_17-35-47
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 146.44794694091448
  episode_reward_mean: 65.30519018404311
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 213
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.5715563297271729
          entropy_coeff: 0.0
          kl: 0.021096108481287956
          model: {}
          policy_loss: -0.024431124329566956
          total_loss: 2.1531825065612793
          vf_explained_var: 0.7597988843917847
          vf_loss: 2.173394203186035
        train: null
    num_agent_steps_sampled: 292000
    num_agent_steps_trained: 292000
    num_steps_sampled: 292000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,73,456.513,292000,65.3052,146.448,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 296000
  custom_metrics: {}
  date: 2022-12-23_17-35-54
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 146.44794694091448
  episode_reward_mean: 68.90268660191263
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 215
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.6189743280410767
          entropy_coeff: 0.0
          kl: 0.01882951706647873
          model: {}
          policy_loss: -0.019374078139662743
          total_loss: 2.8728411197662354
          vf_explained_var: 0.7222643494606018
          vf_loss: 2.88844895362854
        train: null
    num_agent_steps_sampled: 296000
    num_agent_steps_trained: 296000
    num_steps_sampled: 296000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,74,462.786,296000,68.9027,146.448,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 300000
  custom_metrics: {}
  date: 2022-12-23_17-36-00
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 147.5775085584526
  episode_reward_mean: 74.31297412108539
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 218
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.4674354791641235
          entropy_coeff: 0.0
          kl: 0.02410992980003357
          model: {}
          policy_loss: -0.02380327135324478
          total_loss: 3.2609033584594727
          vf_explained_var: 0.7715415358543396
          vf_loss: 3.2798848152160645
        train: null
    num_agent_steps_sampled: 300000
    num_agent_steps_trained: 300000
    num_steps_sampled: 300000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,75,469.048,300000,74.313,147.578,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 304000
  custom_metrics: {}
  date: 2022-12-23_17-36-06
  done: false
  episode_len_mean: 1585.16
  episode_media: {}
  episode_reward_max: 147.5775085584526
  episode_reward_mean: 79.51966217386547
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 221
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.4236180782318115
          entropy_coeff: 0.0
          kl: 0.01943369396030903
          model: {}
          policy_loss: -0.020422637462615967
          total_loss: 2.4171853065490723
          vf_explained_var: 0.7138649821281433
          vf_loss: 2.4337210655212402
        train: null
    num_agent_steps_sampled: 304000
    num_agent_steps_trained: 304000
    num_steps_sampled: 304000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,76,475.32,304000,79.5197,147.578,-96.7176,1585.16


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 308000
  custom_metrics: {}
  date: 2022-12-23_17-36-12
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 147.5775085584526
  episode_reward_mean: 82.56886887198323
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 224
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.4040957689285278
          entropy_coeff: 0.0
          kl: 0.019727036356925964
          model: {}
          policy_loss: -0.03082900680601597
          total_loss: 99.31715393066406
          vf_explained_var: 0.6811762452125549
          vf_loss: 99.34403228759766
        train: null
    num_agent_steps_sampled: 308000
    num_agent_steps_trained: 308000
    num_steps_sampled: 308000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,77,481.513,308000,82.5689,147.578,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 312000
  custom_metrics: {}
  date: 2022-12-23_17-36-19
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 85.62501210035089
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 226
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.4444215297698975
          entropy_coeff: 0.0
          kl: 0.022354597225785255
          model: {}
          policy_loss: -0.025012420490384102
          total_loss: 2.617321491241455
          vf_explained_var: 0.6568194031715393
          vf_loss: 2.6378626823425293
        train: null
    num_agent_steps_sampled: 312000
    num_agent_steps_trained: 312000
    num_steps_sampled: 312000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,78,487.8,312000,85.625,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 316000
  custom_metrics: {}
  date: 2022-12-23_17-36-25
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 88.4342381807944
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 228
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.350753903388977
          entropy_coeff: 0.0
          kl: 0.021944966167211533
          model: {}
          policy_loss: -0.02694406732916832
          total_loss: 1.8033301830291748
          vf_explained_var: 0.7911669611930847
          vf_loss: 1.8258851766586304
        train: null
    num_agent_steps_sampled: 316000
    num_agent_steps_trained: 316000
    num_steps_sampled: 316000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,79,494.062,316000,88.4342,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 320000
  custom_metrics: {}
  date: 2022-12-23_17-36-31
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 92.52258279448994
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 231
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3991602659225464
          entropy_coeff: 0.0
          kl: 0.018160449340939522
          model: {}
          policy_loss: -0.024305401369929314
          total_loss: 3.5538933277130127
          vf_explained_var: 0.7065319418907166
          vf_loss: 3.574566602706909
        train: null
    num_agent_steps_sampled: 320000
    num_agent_steps_trained: 320000
    num_steps_sampled: 320000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,80,500.381,320000,92.5226,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 324000
  custom_metrics: {}
  date: 2022-12-23_17-36-38
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 96.17501986360239
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 234
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3791464567184448
          entropy_coeff: 0.0
          kl: 0.02081056497991085
          model: {}
          policy_loss: -0.024672552943229675
          total_loss: 4.321838855743408
          vf_explained_var: 0.7611939907073975
          vf_loss: 4.342349052429199
        train: null
    num_agent_steps_sampled: 324000
    num_agent_steps_trained: 324000
    num_steps_sampled: 324000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,81,506.648,324000,96.175,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 328000
  custom_metrics: {}
  date: 2022-12-23_17-36-45
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 98.16308212386956
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 236
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3699365854263306
          entropy_coeff: 0.0
          kl: 0.01970175839960575
          model: {}
          policy_loss: -0.020673198625445366
          total_loss: 2.919490337371826
          vf_explained_var: 0.730987012386322
          vf_loss: 2.936223268508911
        train: null
    num_agent_steps_sampled: 328000
    num_agent_steps_trained: 328000
    num_steps_sampled: 328000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,82,513.552,328000,98.1631,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 332000
  custom_metrics: {}
  date: 2022-12-23_17-36-52
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 100.39432461367156
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 238
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3472306728363037
          entropy_coeff: 0.0
          kl: 0.020647086203098297
          model: {}
          policy_loss: -0.02393530309200287
          total_loss: 2.4643235206604004
          vf_explained_var: 0.7962733507156372
          vf_loss: 2.4841291904449463
        train: null
    num_agent_steps_sampled: 332000
    num_agent_steps_trained: 332000
    num_steps_sampled: 332000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,83,520.717,332000,100.394,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 336000
  custom_metrics: {}
  date: 2022-12-23_17-36-59
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 103.6683621019167
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 241
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3977605104446411
          entropy_coeff: 0.0
          kl: 0.019256271421909332
          model: {}
          policy_loss: -0.027345949783921242
          total_loss: 2.164727210998535
          vf_explained_var: 0.7950323820114136
          vf_loss: 2.1882216930389404
        train: null
    num_agent_steps_sampled: 336000
    num_agent_steps_trained: 336000
    num_steps_sampled: 336000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,84,528.087,336000,103.668,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 340000
  custom_metrics: {}
  date: 2022-12-23_17-37-07
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 106.6904737763409
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 3
  episodes_total: 244
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3866848945617676
          entropy_coeff: 0.0
          kl: 0.022523801773786545
          model: {}
          policy_loss: -0.025492116808891296
          total_loss: 3.292675733566284
          vf_explained_var: 0.7744581699371338
          vf_loss: 3.3136630058288574
        train: null
    num_agent_steps_sampled: 340000
    num_agent_steps_trained: 340000
    num_steps_sampled: 340000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,85,536.373,340000,106.69,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 344000
  custom_metrics: {}
  date: 2022-12-23_17-37-16
  done: false
  episode_len_mean: 1577.76
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 108.29812727187385
  episode_reward_min: -96.71755117153315
  episodes_this_iter: 2
  episodes_total: 246
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.4367494583129883
          entropy_coeff: 0.0
          kl: 0.01683288998901844
          model: {}
          policy_loss: -0.021808264777064323
          total_loss: 2.326220750808716
          vf_explained_var: 0.7359540462493896
          vf_loss: 2.3446621894836426
        train: null
    num_agent_steps_sampled: 344000
    num_agent_steps_trained: 344000
    num_steps_sampled: 344000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,86,544.565,344000,108.298,153.361,-96.7176,1577.76


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 348000
  custom_metrics: {}
  date: 2022-12-23_17-37-23
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 153.36129082783216
  episode_reward_mean: 111.57426377302149
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 248
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3233288526535034
          entropy_coeff: 0.0
          kl: 0.019827691838145256
          model: {}
          policy_loss: -0.027133790776133537
          total_loss: 2.4900898933410645
          vf_explained_var: 0.7889662384986877
          vf_loss: 2.513258218765259
        train: null
    num_agent_steps_sampled: 348000
    num_agent_steps_trained: 348000
    num_steps_sampled: 348000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,87,552.139,348000,111.574,153.361,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 352000
  custom_metrics: {}
  date: 2022-12-23_17-37-32
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 153.7515850224378
  episode_reward_mean: 114.24704078841587
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 251
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3695552349090576
          entropy_coeff: 0.0
          kl: 0.021332886070013046
          model: {}
          policy_loss: -0.024914046749472618
          total_loss: 4.321210861206055
          vf_explained_var: 0.7350334525108337
          vf_loss: 4.341858386993408
        train: null
    num_agent_steps_sampled: 352000
    num_agent_steps_trained: 352000
    num_steps_sampled: 352000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,88,560.443,352000,114.247,153.752,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 356000
  custom_metrics: {}
  date: 2022-12-23_17-37-39
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 153.7515850224378
  episode_reward_mean: 117.08818344228996
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 254
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3729214668273926
          entropy_coeff: 0.0
          kl: 0.018078170716762543
          model: {}
          policy_loss: -0.023826247081160545
          total_loss: 3.563232898712158
          vf_explained_var: 0.7560592293739319
          vf_loss: 3.5834436416625977
        train: null
    num_agent_steps_sampled: 356000
    num_agent_steps_trained: 356000
    num_steps_sampled: 356000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,89,568.02,356000,117.088,153.752,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 360000
  custom_metrics: {}
  date: 2022-12-23_17-37-47
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 155.62657769973515
  episode_reward_mean: 118.99582648413642
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 256
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3756389617919922
          entropy_coeff: 0.0
          kl: 0.021005798131227493
          model: {}
          policy_loss: -0.023990830406546593
          total_loss: 3.2325079441070557
          vf_explained_var: 0.7960026860237122
          vf_loss: 3.2522976398468018
        train: null
    num_agent_steps_sampled: 360000
    num_agent_steps_trained: 360000
    num_steps_sampled: 360000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,90,576.197,360000,118.996,155.627,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 364000
  custom_metrics: {}
  date: 2022-12-23_17-37-55
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 155.62657769973515
  episode_reward_mean: 120.68293566715529
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 258
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.3116084337234497
          entropy_coeff: 0.0
          kl: 0.022327661514282227
          model: {}
          policy_loss: -0.022050932049751282
          total_loss: 2.945211410522461
          vf_explained_var: 0.8262830376625061
          vf_loss: 2.962797164916992
        train: null
    num_agent_steps_sampled: 364000
    num_agent_steps_trained: 364000
    num_steps_sampled: 364000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,91,583.787,364000,120.683,155.627,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 368000
  custom_metrics: {}
  date: 2022-12-23_17-38-03
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 155.62657769973515
  episode_reward_mean: 123.35554261483732
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 261
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.270633339881897
          entropy_coeff: 0.0
          kl: 0.02010677382349968
          model: {}
          policy_loss: -0.023874491453170776
          total_loss: 3.1358304023742676
          vf_explained_var: 0.7926644682884216
          vf_loss: 3.1556835174560547
        train: null
    num_agent_steps_sampled: 368000
    num_agent_steps_trained: 368000
    num_steps_sampled: 368000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,92,591.918,368000,123.356,155.627,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 372000
  custom_metrics: {}
  date: 2022-12-23_17-38-11
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 155.62657769973515
  episode_reward_mean: 125.67389370726707
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 264
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.2318545579910278
          entropy_coeff: 0.0
          kl: 0.02058662474155426
          model: {}
          policy_loss: -0.023503977805376053
          total_loss: 4.637133598327637
          vf_explained_var: 0.7566186189651489
          vf_loss: 4.656520843505859
        train: null
    num_agent_steps_sampled: 372000
    num_agent_steps_trained: 372000
    num_steps_sampled: 372000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,93,599.562,372000,125.674,155.627,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 376000
  custom_metrics: {}
  date: 2022-12-23_17-38-19
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 155.62657769973515
  episode_reward_mean: 127.31254201456981
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 266
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.2615786790847778
          entropy_coeff: 0.0
          kl: 0.019166110083460808
          model: {}
          policy_loss: -0.02306693233549595
          total_loss: 3.7450907230377197
          vf_explained_var: 0.7761012315750122
          vf_loss: 3.764324426651001
        train: null
    num_agent_steps_sampled: 376000
    num_agent_steps_trained: 376000
    num_steps_sampled: 376000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,94,607.631,376000,127.313,155.627,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 380000
  custom_metrics: {}
  date: 2022-12-23_17-38-27
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 155.62657769973515
  episode_reward_mean: 128.36415823091195
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 268
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.168962001800537
          entropy_coeff: 0.0
          kl: 0.019138526171445847
          model: {}
          policy_loss: -0.020162787288427353
          total_loss: 2.5402209758758545
          vf_explained_var: 0.7894414067268372
          vf_loss: 2.556555986404419
        train: null
    num_agent_steps_sampled: 380000
    num_agent_steps_trained: 380000
    num_steps_sampled: 380000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,95,615.329,380000,128.364,155.627,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 384000
  custom_metrics: {}
  date: 2022-12-23_17-38-35
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 165.31675665614569
  episode_reward_mean: 130.7380520477591
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 271
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1845260858535767
          entropy_coeff: 0.0
          kl: 0.021208051592111588
          model: {}
          policy_loss: -0.023954784497618675
          total_loss: 3.0957534313201904
          vf_explained_var: 0.7849385142326355
          vf_loss: 3.115467071533203
        train: null
    num_agent_steps_sampled: 384000
    num_agent_steps_trained: 384000
    num_steps_sampled: 384000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,96,623.728,384000,130.738,165.317,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 388000
  custom_metrics: {}
  date: 2022-12-23_17-38-43
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 165.31675665614569
  episode_reward_mean: 132.62327257312504
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 274
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.2083194255828857
          entropy_coeff: 0.0
          kl: 0.021694425493478775
          model: {}
          policy_loss: -0.01960534043610096
          total_loss: 4.451814651489258
          vf_explained_var: 0.7487953901290894
          vf_loss: 4.467080593109131
        train: null
    num_agent_steps_sampled: 388000
    num_agent_steps_trained: 388000
    num_steps_sampled: 388000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,97,631.321,388000,132.623,165.317,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 392000
  custom_metrics: {}
  date: 2022-12-23_17-38-51
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 165.31675665614569
  episode_reward_mean: 134.03289484391777
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 276
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.2029165029525757
          entropy_coeff: 0.0
          kl: 0.02198474109172821
          model: {}
          policy_loss: -0.02438454143702984
          total_loss: 2.8891921043395996
          vf_explained_var: 0.7424609065055847
          vf_loss: 2.9091796875
        train: null
    num_agent_steps_sampled: 392000
    num_agent_steps_trained: 392000
    num_steps_sampled: 392000
    num_steps_trained

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,98,639.461,392000,134.033,165.317,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 396000
  custom_metrics: {}
  date: 2022-12-23_17-38-58
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 165.31675665614569
  episode_reward_mean: 135.12990210404448
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 278
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.194416880607605
          entropy_coeff: 0.0
          kl: 0.023148750886321068
          model: {}
          policy_loss: -0.02270408906042576
          total_loss: 3.00585675239563
          vf_explained_var: 0.7824265360832214
          vf_loss: 3.0239310264587402
        train: null
    num_agent_steps_sampled: 396000
    num_agent_steps_trained: 396000
    num_steps_sampled: 396000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,99,646.964,396000,135.13,165.317,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 400000
  custom_metrics: {}
  date: 2022-12-23_17-43-34
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 165.31675665614569
  episode_reward_mean: 136.99515304034477
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 281
  evaluation:
    custom_metrics: {}
    episode_len_mean: 1564.59
    episode_media: {}
    episode_reward_max: 177.08494279565102
    episode_reward_mean: 151.6827130599735
    episode_reward_min: -81.63790952512372
    episodes_this_iter: 100
    hist_stats:
      episode_lengths:
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      - 1600
      -

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,100,922.63,400000,136.995,165.317,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 404000
  custom_metrics: {}
  date: 2022-12-23_17-43-41
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 167.1156446566322
  episode_reward_mean: 138.7016051483566
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 284
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.2115111351013184
          entropy_coeff: 0.0
          kl: 0.022446125745773315
          model: {}
          policy_loss: -0.027156470343470573
          total_loss: 3.898442506790161
          vf_explained_var: 0.747413694858551
          vf_loss: 3.921109676361084
        train: null
    num_agent_steps_sampled: 404000
    num_agent_steps_trained: 404000
    num_steps_sampled: 404000
    num_steps_trai

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,101,928.921,404000,138.702,167.116,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 408000
  custom_metrics: {}
  date: 2022-12-23_17-43-47
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 167.1156446566322
  episode_reward_mean: 139.38449779667155
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 286
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1165621280670166
          entropy_coeff: 0.0
          kl: 0.020652014762163162
          model: {}
          policy_loss: -0.026598898693919182
          total_loss: 4.07981014251709
          vf_explained_var: 0.8019489049911499
          vf_loss: 4.102278709411621
        train: null
    num_agent_steps_sampled: 408000
    num_agent_steps_trained: 408000
    num_steps_sampled: 408000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,102,935.566,408000,139.384,167.116,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 412000
  custom_metrics: {}
  date: 2022-12-23_17-43-54
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 167.85608853499915
  episode_reward_mean: 140.36763652350692
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 288
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1163411140441895
          entropy_coeff: 0.0
          kl: 0.01949864812195301
          model: {}
          policy_loss: -0.02145601622760296
          total_loss: 2.530954122543335
          vf_explained_var: 0.8255624175071716
          vf_loss: 2.5485105514526367
        train: null
    num_agent_steps_sampled: 412000
    num_agent_steps_trained: 412000
    num_steps_sampled: 412000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,103,941.914,412000,140.368,167.856,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 416000
  custom_metrics: {}
  date: 2022-12-23_17-44-00
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 169.6027647026398
  episode_reward_mean: 141.91821046443857
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 291
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1963956356048584
          entropy_coeff: 0.0
          kl: 0.020615125074982643
          model: {}
          policy_loss: -0.025517892092466354
          total_loss: 3.535935878753662
          vf_explained_var: 0.7365742921829224
          vf_loss: 3.55733060836792
        train: null
    num_agent_steps_sampled: 416000
    num_agent_steps_trained: 416000
    num_steps_sampled: 416000
    num_steps_tra

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,104,948.13,416000,141.918,169.603,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 420000
  custom_metrics: {}
  date: 2022-12-23_17-44-06
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 143.35606630700636
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 294
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.23232102394104
          entropy_coeff: 0.0
          kl: 0.021366065368056297
          model: {}
          policy_loss: -0.027586355805397034
          total_loss: 4.0263166427612305
          vf_explained_var: 0.7089212536811829
          vf_loss: 4.0496296882629395
        train: null
    num_agent_steps_sampled: 420000
    num_agent_steps_trained: 420000
    num_steps_sampled: 420000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,105,954.376,420000,143.356,170.759,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 424000
  custom_metrics: {}
  date: 2022-12-23_17-44-12
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 144.34756540276757
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 296
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1035417318344116
          entropy_coeff: 0.0
          kl: 0.02170182764530182
          model: {}
          policy_loss: -0.02481427788734436
          total_loss: 2.2184224128723145
          vf_explained_var: 0.8036614656448364
          vf_loss: 2.238896608352661
        train: null
    num_agent_steps_sampled: 424000
    num_agent_steps_trained: 424000
    num_steps_sampled: 424000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,106,960.651,424000,144.348,170.759,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 428000
  custom_metrics: {}
  date: 2022-12-23_17-44-19
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 144.98443563669466
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 2
  episodes_total: 298
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.0951378345489502
          entropy_coeff: 0.0
          kl: 0.02024080418050289
          model: {}
          policy_loss: -0.025222396478056908
          total_loss: 3.0589122772216797
          vf_explained_var: 0.7922682166099548
          vf_loss: 3.0800864696502686
        train: null
    num_agent_steps_sampled: 428000
    num_agent_steps_trained: 428000
    num_steps_sampled: 428000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,107,966.898,428000,144.984,170.759,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 432000
  custom_metrics: {}
  date: 2022-12-23_17-44-25
  done: false
  episode_len_mean: 1592.6
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 146.18773085577826
  episode_reward_min: -23.25915887827422
  episodes_this_iter: 3
  episodes_total: 301
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1457833051681519
          entropy_coeff: 0.0
          kl: 0.023061752319335938
          model: {}
          policy_loss: -0.027643360197544098
          total_loss: 3.659602642059326
          vf_explained_var: 0.7275583744049072
          vf_loss: 3.682633399963379
        train: null
    num_agent_steps_sampled: 432000
    num_agent_steps_trained: 432000
    num_steps_sampled: 432000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,108,973.135,432000,146.188,170.759,-23.2592,1592.6


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 436000
  custom_metrics: {}
  date: 2022-12-23_17-44-31
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 145.6774394291549
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 4
  episodes_total: 305
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1541049480438232
          entropy_coeff: 0.0
          kl: 0.01772129535675049
          model: {}
          policy_loss: -0.03146097809076309
          total_loss: 114.98363494873047
          vf_explained_var: 0.6160273551940918
          vf_loss: 115.01155090332031
        train: null
    num_agent_steps_sampled: 436000
    num_agent_steps_trained: 436000
    num_steps_sampled: 436000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,109,979.329,436000,145.677,170.759,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 440000
  custom_metrics: {}
  date: 2022-12-23_17-44-37
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 146.07965254063345
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 2
  episodes_total: 307
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.186484694480896
          entropy_coeff: 0.0
          kl: 0.023413734510540962
          model: {}
          policy_loss: -0.02492360770702362
          total_loss: 2.538810968399048
          vf_explained_var: 0.6059569120407104
          vf_loss: 2.559051752090454
        train: null
    num_agent_steps_sampled: 440000
    num_agent_steps_trained: 440000
    num_steps_sampled: 440000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,110,985.721,440000,146.08,170.759,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 444000
  custom_metrics: {}
  date: 2022-12-23_17-44-44
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 146.78379002423978
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 2
  episodes_total: 309
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1778619289398193
          entropy_coeff: 0.0
          kl: 0.02176639623939991
          model: {}
          policy_loss: -0.029138732701539993
          total_loss: 3.6291630268096924
          vf_explained_var: 0.7502771019935608
          vf_loss: 3.6539487838745117
        train: null
    num_agent_steps_sampled: 444000
    num_agent_steps_trained: 444000
    num_steps_sampled: 444000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,111,992.005,444000,146.784,170.759,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 448000
  custom_metrics: {}
  date: 2022-12-23_17-44-50
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 147.34300151162267
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 2
  episodes_total: 311
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1291580200195312
          entropy_coeff: 0.0
          kl: 0.022594213485717773
          model: {}
          policy_loss: -0.02546831965446472
          total_loss: 3.555968761444092
          vf_explained_var: 0.7942007184028625
          vf_loss: 3.576918363571167
        train: null
    num_agent_steps_sampled: 448000
    num_agent_steps_trained: 448000
    num_steps_sampled: 448000
    num_steps_t

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,112,998.277,448000,147.343,170.759,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 452000
  custom_metrics: {}
  date: 2022-12-23_17-44-56
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 170.75916797824772
  episode_reward_mean: 148.3552731192055
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 4
  episodes_total: 315
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.189015507698059
          entropy_coeff: 0.0
          kl: 0.021781712770462036
          model: {}
          policy_loss: -0.044324345886707306
          total_loss: 4.983988285064697
          vf_explained_var: 0.7012355327606201
          vf_loss: 5.023955345153809
        train: null
    num_agent_steps_sampled: 452000
    num_agent_steps_trained: 452000
    num_steps_sampled: 452000
    num_steps_tr

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,113,1004.46,452000,148.355,170.759,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 456000
  custom_metrics: {}
  date: 2022-12-23_17-45-03
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 174.62091577386317
  episode_reward_mean: 148.89910697797868
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 2
  episodes_total: 317
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1468409299850464
          entropy_coeff: 0.0
          kl: 0.022562937811017036
          model: {}
          policy_loss: -0.03289782255887985
          total_loss: 2.770758628845215
          vf_explained_var: 0.7525264620780945
          vf_loss: 2.7991440296173096
        train: null
    num_agent_steps_sampled: 456000
    num_agent_steps_trained: 456000
    num_steps_sampled: 456000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,114,1010.7,456000,148.899,174.621,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 460000
  custom_metrics: {}
  date: 2022-12-23_17-45-09
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 174.8014450286627
  episode_reward_mean: 149.37809416413458
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 2
  episodes_total: 319
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1646305322647095
          entropy_coeff: 0.0
          kl: 0.020159201696515083
          model: {}
          policy_loss: -0.026339324191212654
          total_loss: 2.694669246673584
          vf_explained_var: 0.8167151212692261
          vf_loss: 2.7169766426086426
        train: null
    num_agent_steps_sampled: 460000
    num_agent_steps_trained: 460000
    num_steps_sampled: 460000
    num_steps_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,115,1016.92,460000,149.378,174.801,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 464000
  custom_metrics: {}
  date: 2022-12-23_17-45-15
  done: false
  episode_len_mean: 1582.78
  episode_media: {}
  episode_reward_max: 180.52200362931202
  episode_reward_mean: 149.9950180276963
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 2
  episodes_total: 321
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1046898365020752
          entropy_coeff: 0.0
          kl: 0.022648053243756294
          model: {}
          policy_loss: -0.022482506930828094
          total_loss: 2.7533960342407227
          vf_explained_var: 0.8047184944152832
          vf_loss: 2.7713489532470703
        train: null
    num_agent_steps_sampled: 464000
    num_agent_steps_trained: 464000
    num_steps_sampled: 464000
    num_steps

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,116,1023.18,464000,149.995,180.522,-40.6988,1582.78


Result for PPO_BipedalWalker-v3_25b72_00000:
  agent_timesteps_total: 468000
  custom_metrics: {}
  date: 2022-12-23_17-45-21
  done: false
  episode_len_mean: 1590.18
  episode_media: {}
  episode_reward_max: 180.52200362931202
  episode_reward_mean: 152.88540523813523
  episode_reward_min: -40.69877418219788
  episodes_this_iter: 4
  episodes_total: 325
  experiment_id: 80c501b76ccb4ec99d8dc8072c7f6825
  hostname: dl
  info:
    learner:
      default_policy:
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 1.1084564924240112
          entropy_coeff: 0.0
          kl: 0.01912125013768673
          model: {}
          policy_loss: -0.03804410621523857
          total_loss: 4.275332450866699
          vf_explained_var: 0.7723701596260071
          vf_loss: 4.30955171585083
        train: null
    num_agent_steps_sampled: 468000
    num_agent_steps_trained: 468000
    num_steps_sampled: 468000
    num_steps_tra



Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,117,1029.45,468000,152.885,180.522,-40.6988,1590.18


2022-12-23 17:45:26,121	ERROR tune.py:635 -- Trials did not complete: [PPO_BipedalWalker-v3_25b72_00000]
2022-12-23 17:45:26,121	INFO tune.py:639 -- Total run time: 1046.11 seconds (1045.89 seconds for the tuning loop).


<ray.tune.analysis.experiment_analysis.ExperimentAnalysis at 0x7f35f90ad490>

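If your experiment is running and producing output like the above (similar to what we saw for `CartPole-v1`), then well done! The robot has started learning. The `Trials did not complete` message at the end of the output most likely just means the run was stopped before the trial finished, so it is not a sign that training failed.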

How can we tell whether it's actually learning? Move on to the next video, where we will discuss how to visualize the results of a running experiment in real time.
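
In the meantime, a rough check is possible from the status rows that the experiment prints after each iteration. The following is a minimal sketch, not part of the exercise: it assumes you copy a few of those rows into a string (here, the rows for iterations 112 to 117 from the output above), and the names `STATUS_ROWS` and `reward_by_iteration` are made up for illustration.

```python
import csv
import io

# Status rows copied verbatim from the printed output (iterations 112 to 117).
# The last row is pasted twice on purpose, because the log sometimes repeats a row.
STATUS_ROWS = """\
Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,112,998.277,448000,147.343,170.759,-40.6988,1582.78
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,113,1004.46,452000,148.355,170.759,-40.6988,1582.78
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,114,1010.7,456000,148.899,174.621,-40.6988,1582.78
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,115,1016.92,460000,149.378,174.801,-40.6988,1582.78
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,116,1023.18,464000,149.995,180.522,-40.6988,1582.78
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,117,1029.45,468000,152.885,180.522,-40.6988,1590.18
PPO_BipedalWalker-v3_25b72_00000,RUNNING,192.168.0.98:158805,117,1029.45,468000,152.885,180.522,-40.6988,1590.18
"""


def reward_by_iteration(text):
    """Return sorted (iteration, mean reward) pairs, ignoring repeated rows."""
    reader = csv.DictReader(io.StringIO(text))
    rewards = {}
    for row in reader:
        # Keying on the iteration number collapses duplicate status rows.
        rewards[int(row["iter"])] = float(row["reward"])
    return sorted(rewards.items())


pairs = reward_by_iteration(STATUS_ROWS)
first_iter, first_reward = pairs[0]
last_iter, last_reward = pairs[-1]

# Average change in mean episode reward per training iteration.
gain_per_iter = (last_reward - first_reward) / (last_iter - first_iter)

print(f"iterations {first_iter} to {last_iter}: "
      f"mean reward {first_reward} -> {last_reward} "
      f"({gain_per_iter:+.3f} per iteration)")
```

A positive gain per iteration suggests the mean episode reward is still climbing, although in this run it is well short of the ~ 250 target mentioned earlier. This only looks at a handful of rows, so treat it as a sanity check rather than a replacement for proper visualization.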