## Duckietown Import

##### To avoid conflicts with the project files, we recommend running this notebook in a separate directory where you can safely clone the Duckietown repository

In [1]:
import os

In [15]:
if not os.path.isdir('gym-duckietown') and not os.path.isdir('../gym-duckietown'):
    branch = "master"
    !git clone --branch {branch} https://github.com/duckietown/gym-duckietown.git
    !pip install -e gym-duckietown

In [2]:
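if os.path.basename(os.getcwd()) != 'gym-duckietown':
    # Compare only the last path component so the check works with both / and \ separators
    os.chdir('gym-duckietown' if os.path.isdir('gym-duckietown') else '../gym-duckietown')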

## Initialize environment

#### Environment Information:
- action space: We use the original action space provided by the Duckietown environment: a Box space holding two values, one command per wheel (left and right), each in the range [-1, 1].
- observation space: The original Duckietown observation is a 480x640x3 (height x width x channels) RGB image. We modify it with a set of wrappers, so the new observation space has shape 120x160x1: the image is converted to grayscale, and uninformative regions, such as the background, are cropped out.
- reward: We also use wrappers to modify the reward. Our aim is for the agent to drive as far as possible without leaving the track, so the agent is rewarded for staying far from the edge of the track and for covering a substantial distance since the previous frame.
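The wrappers themselves are not shown in this notebook, so the following is only a minimal NumPy sketch of the kind of transformations described above, assuming 480x640x3 RGB input frames. The crop height `CROP_TOP`, the nearest-neighbour resize, the inputs `dist_to_edge`, `prev_pos`, `cur_pos`, and the weights in `shaped_reward` are all hypothetical choices for illustration, not the values or signals used by the actual wrappers.

```python
import numpy as np

# Hypothetical parameters: the notebook's real wrappers may crop and resize differently.
CROP_TOP = 160           # rows removed from the top of the frame (mostly background)
OUT_H, OUT_W = 120, 160  # target observation size


def preprocess_observation(obs: np.ndarray) -> np.ndarray:
    """Map an RGB frame of shape (H, W, 3) to a grayscale (OUT_H, OUT_W, 1) uint8 image."""
    # Luma-weighted sum of the R, G, B channels (ITU-R BT.601 coefficients)
    gray = obs.astype(np.float32) @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Crop away the top of the image, which mostly shows background
    gray = gray[CROP_TOP:, :]
    # Nearest-neighbour resize: sample evenly spaced rows and columns
    rows = np.linspace(0, gray.shape[0] - 1, OUT_H).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, OUT_W).astype(int)
    small = np.clip(gray[np.ix_(rows, cols)], 0, 255)
    # Add a trailing channel axis to get shape (OUT_H, OUT_W, 1)
    return small[..., np.newaxis].astype(np.uint8)


def shaped_reward(dist_to_edge: float, prev_pos, cur_pos,
                  edge_weight: float = 1.0, progress_weight: float = 1.0) -> float:
    """Hypothetical shaping: reward distance from the track edge plus progress since the last frame."""
    # Euclidean distance travelled between two consecutive frames
    progress = float(np.linalg.norm(np.asarray(cur_pos) - np.asarray(prev_pos)))
    return edge_weight * dist_to_edge + progress_weight * progress
```

In a Gym setup, functions like these would typically live inside an observation wrapper and a reward wrapper, so the PPO model only ever sees the processed observation and shaped reward.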

In [3]:
import gym
import numpy as np
import torch as th
import matplotlib.pyplot as plt
import gym_duckietown
from env import launch_env

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

In [4]:
env = launch_env()

In [1]:
%load_ext tensorboard
tb_logger = os.path.join("..", "log")  # TODO: point this at your log directory
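Stable-Baselines3 writes each `learn()` call to a new numbered subdirectory of `tensorboard_log` (the training log further down reports `Logging to ..\log\PPO_4`). The helper below is a hypothetical convenience, not part of Stable-Baselines3: it picks the most recent `PPO_<N>` run directory, which can then be passed to `%tensorboard --logdir` or inspected in another way.

```python
import os
import re


def latest_run_dir(log_dir, prefix="PPO"):
    """Return the path of the highest-numbered <prefix>_<N> subdirectory, or None."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)$")
    best = None
    for name in os.listdir(log_dir):
        match = pattern.match(name)
        # Only consider directories that follow the <prefix>_<N> naming scheme
        if match and os.path.isdir(os.path.join(log_dir, name)):
            run_id = int(match.group(1))
            if best is None or run_id > best[0]:
                best = (run_id, os.path.join(log_dir, name))
    return None if best is None else best[1]
```

The numeric comparison matters: sorting the names as strings would place `PPO_10` before `PPO_4`.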

#### Model Information:
PPO (Proximal Policy Optimization) is a policy gradient method for reinforcement learning developed by OpenAI. Its aim is to improve training stability by avoiding overly large policy updates. To do so, it computes the ratio between the probability of an action under the current policy and under the old policy, which measures how far the policy has moved, and clips this ratio to the range [1−ϵ, 1+ϵ].
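The clipping corresponds to PPO's clipped surrogate objective:

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}$$

The NumPy sketch below computes this objective (negated, since it is minimized) and the fraction of clipped samples for one batch. It is an illustration of the technique, not Stable-Baselines3's internal code, although the default `clip_range=0.2` matches the `clip_range` value printed in the training log.

```python
import numpy as np


def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_range=0.2):
    """Return (loss, clip_fraction) for PPO's clipped surrogate objective."""
    log_prob_new = np.asarray(log_prob_new, dtype=np.float64)
    log_prob_old = np.asarray(log_prob_old, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed from log-probabilities
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    # Pessimistic (element-wise minimum) objective, negated so it can be minimized
    loss = -np.mean(np.minimum(unclipped, clipped))
    # Share of samples whose ratio left the interval [1 - eps, 1 + eps]
    clip_fraction = np.mean(np.abs(ratio - 1.0) > clip_range)
    return float(loss), float(clip_fraction)


# Usage: one sample with ratio 1.5 and A = +1, one with ratio 0.5 and A = -1
loss, frac = ppo_clip_loss(np.log([1.5, 0.5]), [0.0, 0.0], [1.0, -1.0])
# loss is approximately -0.2 (terms 1.2 and -0.8), frac == 1.0
```

Taking the minimum means the policy gains nothing from pushing the ratio beyond the clip range when the advantage is positive, while a bad action (negative advantage) is still fully penalized.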

In [6]:
ppo_model = PPO(policy='MlpPolicy',
                env=env,
                learning_rate=5.e-5,
                gamma=0.99,
                tensorboard_log=tb_logger,
                verbose=1)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
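The `gamma=0.99` argument is the discount factor: a reward received k steps in the future is weighted by γ^k, so the effective horizon is on the order of 1/(1−γ) = 100 steps. The plain-Python sketch below computes discounted returns for one episode to make this concrete. Stable-Baselines3's PPO additionally uses generalized advantage estimation (`gae_lambda`, 0.95 by default), which this sketch deliberately omits.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Effective horizon: rewards beyond roughly 1 / (1 - gamma) steps contribute little
horizon = 1.0 / (1.0 - 0.99)
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`.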


## Learning

In [13]:
checkpoint_callback = CheckpointCallback(
    save_freq=10000,
    save_path=os.path.join("..", "saves"),  # TODO: adjust the save directory
    name_prefix="ppo_model",
)
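With a single environment, `CheckpointCallback` saves every `save_freq` environment steps, and PPO collects full rollouts of `n_steps` (2048 by default, visible as the `total_timesteps` increments in the log) until `total_timesteps` is reached. The hypothetical helper below uses that arithmetic to plan the run. The file-name pattern `<name_prefix>_<steps>_steps.zip` reflects how recent Stable-Baselines3 versions name checkpoints; check it against your installed version.

```python
import math


def training_schedule(total_timesteps, n_steps=2048, save_freq=10_000,
                      name_prefix="ppo_model"):
    """Estimate PPO iterations, collected steps, and checkpoint file names (single env)."""
    # PPO keeps collecting complete rollouts of n_steps until total_timesteps is reached
    iterations = math.ceil(total_timesteps / n_steps)
    collected = iterations * n_steps
    # A checkpoint is written every save_freq environment steps
    checkpoints = [f"{name_prefix}_{k * save_freq}_steps.zip"
                   for k in range(1, collected // save_freq + 1)]
    return iterations, collected, checkpoints


iterations, collected, checkpoints = training_schedule(2_000_000)
# 977 iterations, 2,000,896 collected steps, 200 checkpoint files
```

At the 33 to 49 fps reported in the log below, 2,000,000 steps correspond to very roughly 11 to 17 hours of training.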

In [14]:
ppo_model.learn(total_timesteps=2000000, callback=checkpoint_callback)

Logging to ..\log\PPO_4
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 73.6     |
|    ep_rew_mean     | 49.9     |
| time/              |          |
|    fps             | 49       |
|    iterations      | 1        |
|    time_elapsed    | 41       |
|    total_timesteps | 2048     |
---------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 97.4       |
|    ep_rew_mean          | 46.7       |
| time/                   |            |
|    fps                  | 44         |
|    iterations           | 2          |
|    time_elapsed         | 92         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.00841886 |
|    clip_fraction        | 0.0678     |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.82      |
|    explained_variance   | 3.15e-05   |

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 78.7        |
|    ep_rew_mean          | 59.3        |
| time/                   |             |
|    fps                  | 40          |
|    iterations           | 11          |
|    time_elapsed         | 551         |
|    total_timesteps      | 22528       |
| train/                  |             |
|    approx_kl            | 0.009965686 |
|    clip_fraction        | 0.111       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.79       |
|    explained_variance   | -0.000611   |
|    learning_rate        | 5e-05       |
|    loss                 | 26.3        |
|    n_updates            | 140         |
|    policy_gradient_loss | -0.00449    |
|    std                  | 0.977       |
|    value_loss           | 73.3        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 113          |
|    ep_rew_mean          | 95.1         |
| time/                   |              |
|    fps                  | 40           |
|    iterations           | 20           |
|    time_elapsed         | 1013         |
|    total_timesteps      | 40960        |
| train/                  |              |
|    approx_kl            | 0.0062146224 |
|    clip_fraction        | 0.0771       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.76        |
|    explained_variance   | 0.000518     |
|    learning_rate        | 5e-05        |
|    loss                 | 42.3         |
|    n_updates            | 230          |
|    policy_gradient_loss | -0.00508     |
|    std                  | 0.962        |
|    value_loss           | 131          |
------------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 160        |
|    ep_rew_mean          | 124        |
| time/                   |            |
|    fps                  | 40         |
|    iterations           | 29         |
|    time_elapsed         | 1473       |
|    total_timesteps      | 59392      |
| train/                  |            |
|    approx_kl            | 0.00854611 |
|    clip_fraction        | 0.0835     |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.72      |
|    explained_variance   | 0.00725    |
|    learning_rate        | 5e-05      |
|    loss                 | 83.2       |
|    n_updates            | 320        |
|    policy_gradient_loss | -0.0114    |
|    std                  | 0.942      |
|    value_loss           | 163        |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 152         |
|    ep_rew_mean          | 179         |
| time/                   |             |
|    fps                  | 40          |
|    iterations           | 38          |
|    time_elapsed         | 1926        |
|    total_timesteps      | 77824       |
| train/                  |             |
|    approx_kl            | 0.018366337 |
|    clip_fraction        | 0.298       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.69       |
|    explained_variance   | 4.33e-05    |
|    learning_rate        | 5e-05       |
|    loss                 | 117         |
|    n_updates            | 410         |
|    policy_gradient_loss | 0.0171      |
|    std                  | 0.927       |
|    value_loss           | 271         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 153         |
|    ep_rew_mean          | 205         |
| time/                   |             |
|    fps                  | 40          |
|    iterations           | 47          |
|    time_elapsed         | 2383        |
|    total_timesteps      | 96256       |
| train/                  |             |
|    approx_kl            | 0.026922513 |
|    clip_fraction        | 0.119       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.67       |
|    explained_variance   | 0.0103      |
|    learning_rate        | 5e-05       |
|    loss                 | 82.5        |
|    n_updates            | 500         |
|    policy_gradient_loss | -0.00958    |
|    std                  | 0.92        |
|    value_loss           | 180         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 186         |
|    ep_rew_mean          | 262         |
| time/                   |             |
|    fps                  | 40          |
|    iterations           | 56          |
|    time_elapsed         | 2840        |
|    total_timesteps      | 114688      |
| train/                  |             |
|    approx_kl            | 0.015660046 |
|    clip_fraction        | 0.25        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.64       |
|    explained_variance   | 0.00132     |
|    learning_rate        | 5e-05       |
|    loss                 | 148         |
|    n_updates            | 590         |
|    policy_gradient_loss | 0.00276     |
|    std                  | 0.903       |
|    value_loss           | 265         |
-----------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 153        |
|    ep_rew_mean          | 243        |
| time/                   |            |
|    fps                  | 40         |
|    iterations           | 65         |
|    time_elapsed         | 3295       |
|    total_timesteps      | 133120     |
| train/                  |            |
|    approx_kl            | 0.01430521 |
|    clip_fraction        | 0.172      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.61      |
|    explained_variance   | 0.000685   |
|    learning_rate        | 5e-05      |
|    loss                 | 404        |
|    n_updates            | 680        |
|    policy_gradient_loss | -0.0109    |
|    std                  | 0.892      |
|    value_loss           | 631        |
----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 107          |
|    ep_rew_mean          | 177          |
| time/                   |              |
|    fps                  | 40           |
|    iterations           | 74           |
|    time_elapsed         | 3752         |
|    total_timesteps      | 151552       |
| train/                  |              |
|    approx_kl            | 0.0052669924 |
|    clip_fraction        | 0.0555       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.61        |
|    explained_variance   | 0.0882       |
|    learning_rate        | 5e-05        |
|    loss                 | 170          |
|    n_updates            | 770          |
|    policy_gradient_loss | -0.0117      |
|    std                  | 0.892        |
|    value_loss           | 447          |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 159          |
|    ep_rew_mean          | 273          |
| time/                   |              |
|    fps                  | 40           |
|    iterations           | 83           |
|    time_elapsed         | 4206         |
|    total_timesteps      | 169984       |
| train/                  |              |
|    approx_kl            | 0.0076771574 |
|    clip_fraction        | 0.0883       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.6         |
|    explained_variance   | 0.0391       |
|    learning_rate        | 5e-05        |
|    loss                 | 233          |
|    n_updates            | 860          |
|    policy_gradient_loss | -0.0118      |
|    std                  | 0.886        |
|    value_loss           | 516          |
------------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 180        |
|    ep_rew_mean          | 324        |
| time/                   |            |
|    fps                  | 40         |
|    iterations           | 92         |
|    time_elapsed         | 4664       |
|    total_timesteps      | 188416     |
| train/                  |            |
|    approx_kl            | 0.00975789 |
|    clip_fraction        | 0.127      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.58      |
|    explained_variance   | 0.00992    |
|    learning_rate        | 5e-05      |
|    loss                 | 238        |
|    n_updates            | 950        |
|    policy_gradient_loss | -0.0118    |
|    std                  | 0.877      |
|    value_loss           | 466        |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 181         |
|    ep_rew_mean          | 320         |
| time/                   |             |
|    fps                  | 40          |
|    iterations           | 101         |
|    time_elapsed         | 5120        |
|    total_timesteps      | 206848      |
| train/                  |             |
|    approx_kl            | 0.012391649 |
|    clip_fraction        | 0.133       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.55       |
|    explained_variance   | 0.0205      |
|    learning_rate        | 5e-05       |
|    loss                 | 286         |
|    n_updates            | 1040        |
|    policy_gradient_loss | -0.0138     |
|    std                  | 0.865       |
|    value_loss           | 662         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 191         |
|    ep_rew_mean          | 353         |
| time/                   |             |
|    fps                  | 40          |
|    iterations           | 110         |
|    time_elapsed         | 5574        |
|    total_timesteps      | 225280      |
| train/                  |             |
|    approx_kl            | 0.008634215 |
|    clip_fraction        | 0.103       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.53       |
|    explained_variance   | 0.00474     |
|    learning_rate        | 5e-05       |
|    loss                 | 245         |
|    n_updates            | 1130        |
|    policy_gradient_loss | -0.0142     |
|    std                  | 0.858       |
|    value_loss           | 719         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 202         |
|    ep_rew_mean          | 379         |
| time/                   |             |
|    fps                  | 40          |
|    iterations           | 119         |
|    time_elapsed         | 6025        |
|    total_timesteps      | 243712      |
| train/                  |             |
|    approx_kl            | 0.012177844 |
|    clip_fraction        | 0.102       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.51       |
|    explained_variance   | 0.0708      |
|    learning_rate        | 5e-05       |
|    loss                 | 721         |
|    n_updates            | 1220        |
|    policy_gradient_loss | -0.0125     |
|    std                  | 0.851       |
|    value_loss           | 746         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 169         |
|    ep_rew_mean          | 334         |
| time/                   |             |
|    fps                  | 39          |
|    iterations           | 128         |
|    time_elapsed         | 6667        |
|    total_timesteps      | 262144      |
| train/                  |             |
|    approx_kl            | 0.019241031 |
|    clip_fraction        | 0.171       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.49       |
|    explained_variance   | 0.0435      |
|    learning_rate        | 5e-05       |
|    loss                 | 239         |
|    n_updates            | 1310        |
|    policy_gradient_loss | -0.0145     |
|    std                  | 0.84        |
|    value_loss           | 452         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 189         |
|    ep_rew_mean          | 360         |
| time/                   |             |
|    fps                  | 38          |
|    iterations           | 137         |
|    time_elapsed         | 7297        |
|    total_timesteps      | 280576      |
| train/                  |             |
|    approx_kl            | 0.011799414 |
|    clip_fraction        | 0.129       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.46       |
|    explained_variance   | 0.0621      |
|    learning_rate        | 5e-05       |
|    loss                 | 294         |
|    n_updates            | 1400        |
|    policy_gradient_loss | -0.0153     |
|    std                  | 0.829       |
|    value_loss           | 688         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 175         |
|    ep_rew_mean          | 372         |
| time/                   |             |
|    fps                  | 37          |
|    iterations           | 146         |
|    time_elapsed         | 7942        |
|    total_timesteps      | 299008      |
| train/                  |             |
|    approx_kl            | 0.009420915 |
|    clip_fraction        | 0.112       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.45       |
|    explained_variance   | 0.0424      |
|    learning_rate        | 5e-05       |
|    loss                 | 439         |
|    n_updates            | 1490        |
|    policy_gradient_loss | -0.0157     |
|    std                  | 0.824       |
|    value_loss           | 1.13e+03    |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 186         |
|    ep_rew_mean          | 369         |
| time/                   |             |
|    fps                  | 36          |
|    iterations           | 155         |
|    time_elapsed         | 8583        |
|    total_timesteps      | 317440      |
| train/                  |             |
|    approx_kl            | 0.010959344 |
|    clip_fraction        | 0.107       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.44       |
|    explained_variance   | 0.0567      |
|    learning_rate        | 5e-05       |
|    loss                 | 300         |
|    n_updates            | 1580        |
|    policy_gradient_loss | -0.0127     |
|    std                  | 0.821       |
|    value_loss           | 807         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 192         |
|    ep_rew_mean          | 408         |
| time/                   |             |
|    fps                  | 36          |
|    iterations           | 164         |
|    time_elapsed         | 9229        |
|    total_timesteps      | 335872      |
| train/                  |             |
|    approx_kl            | 0.011572201 |
|    clip_fraction        | 0.129       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.42       |
|    explained_variance   | 0.0639      |
|    learning_rate        | 5e-05       |
|    loss                 | 365         |
|    n_updates            | 1670        |
|    policy_gradient_loss | -0.0134     |
|    std                  | 0.811       |
|    value_loss           | 758         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 209         |
|    ep_rew_mean          | 417         |
| time/                   |             |
|    fps                  | 35          |
|    iterations           | 173         |
|    time_elapsed         | 9873        |
|    total_timesteps      | 354304      |
| train/                  |             |
|    approx_kl            | 0.011625752 |
|    clip_fraction        | 0.117       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.4        |
|    explained_variance   | 0.0804      |
|    learning_rate        | 5e-05       |
|    loss                 | 116         |
|    n_updates            | 1760        |
|    policy_gradient_loss | -0.0127     |
|    std                  | 0.803       |
|    value_loss           | 688         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 205         |
|    ep_rew_mean          | 418         |
| time/                   |             |
|    fps                  | 35          |
|    iterations           | 182         |
|    time_elapsed         | 10494       |
|    total_timesteps      | 372736      |
| train/                  |             |
|    approx_kl            | 0.023580238 |
|    clip_fraction        | 0.18        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.38       |
|    explained_variance   | 0.03        |
|    learning_rate        | 5e-05       |
|    loss                 | 312         |
|    n_updates            | 1850        |
|    policy_gradient_loss | -0.0118     |
|    std                  | 0.796       |
|    value_loss           | 727         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 208         |
|    ep_rew_mean          | 409         |
| time/                   |             |
|    fps                  | 35          |
|    iterations           | 191         |
|    time_elapsed         | 11116       |
|    total_timesteps      | 391168      |
| train/                  |             |
|    approx_kl            | 0.011923961 |
|    clip_fraction        | 0.127       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.36       |
|    explained_variance   | 0.0228      |
|    learning_rate        | 5e-05       |
|    loss                 | 299         |
|    n_updates            | 1940        |
|    policy_gradient_loss | -0.0132     |
|    std                  | 0.786       |
|    value_loss           | 597         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 222         |
|    ep_rew_mean          | 449         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 200         |
|    time_elapsed         | 11737       |
|    total_timesteps      | 409600      |
| train/                  |             |
|    approx_kl            | 0.012148202 |
|    clip_fraction        | 0.131       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.34       |
|    explained_variance   | 0.0335      |
|    learning_rate        | 5e-05       |
|    loss                 | 999         |
|    n_updates            | 2030        |
|    policy_gradient_loss | -0.0122     |
|    std                  | 0.781       |
|    value_loss           | 1.07e+03    |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 210         |
|    ep_rew_mean          | 444         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 209         |
|    time_elapsed         | 12374       |
|    total_timesteps      | 428032      |
| train/                  |             |
|    approx_kl            | 0.012500701 |
|    clip_fraction        | 0.138       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.33       |
|    explained_variance   | 0.0273      |
|    learning_rate        | 5e-05       |
|    loss                 | 304         |
|    n_updates            | 2120        |
|    policy_gradient_loss | -0.0139     |
|    std                  | 0.774       |
|    value_loss           | 759         |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 188          |
|    ep_rew_mean          | 412          |
| time/                   |              |
|    fps                  | 34           |
|    iterations           | 218          |
|    time_elapsed         | 13001        |
|    total_timesteps      | 446464       |
| train/                  |              |
|    approx_kl            | 0.0078098946 |
|    clip_fraction        | 0.0951       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.32        |
|    explained_variance   | 0.061        |
|    learning_rate        | 5e-05        |
|    loss                 | 430          |
|    n_updates            | 2210         |
|    policy_gradient_loss | -0.0135      |
|    std                  | 0.772        |
|    value_loss           | 1.07e+03     |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 199         |
|    ep_rew_mean          | 438         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 227         |
|    time_elapsed         | 13647       |
|    total_timesteps      | 464896      |
| train/                  |             |
|    approx_kl            | 0.009803809 |
|    clip_fraction        | 0.107       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.31       |
|    explained_variance   | 0.0339      |
|    learning_rate        | 5e-05       |
|    loss                 | 892         |
|    n_updates            | 2300        |
|    policy_gradient_loss | -0.013      |
|    std                  | 0.77        |
|    value_loss           | 988         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 204         |
|    ep_rew_mean          | 431         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 236         |
|    time_elapsed         | 14284       |
|    total_timesteps      | 483328      |
| train/                  |             |
|    approx_kl            | 0.009910657 |
|    clip_fraction        | 0.0849      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.29       |
|    explained_variance   | 0.051       |
|    learning_rate        | 5e-05       |
|    loss                 | 361         |
|    n_updates            | 2390        |
|    policy_gradient_loss | -0.00944    |
|    std                  | 0.762       |
|    value_loss           | 1.09e+03    |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 224         |
|    ep_rew_mean          | 466         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 245         |
|    time_elapsed         | 14927       |
|    total_timesteps      | 501760      |
| train/                  |             |
|    approx_kl            | 0.011163006 |
|    clip_fraction        | 0.125       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.3        |
|    explained_variance   | -0.0082     |
|    learning_rate        | 5e-05       |
|    loss                 | 295         |
|    n_updates            | 2480        |
|    policy_gradient_loss | -0.00948    |
|    std                  | 0.766       |
|    value_loss           | 727         |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 183          |
|    ep_rew_mean          | 408          |
| time/                   |              |
|    fps                  | 33           |
|    iterations           | 254          |
|    time_elapsed         | 15551        |
|    total_timesteps      | 520192       |
| train/                  |              |
|    approx_kl            | 0.0107572265 |
|    clip_fraction        | 0.137        |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.29        |
|    explained_variance   | 0.0349       |
|    learning_rate        | 5e-05        |
|    loss                 | 376          |
|    n_updates            | 2570         |
|    policy_gradient_loss | -0.00866     |
|    std                  | 0.76         |
|    value_loss           | 792          |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 206         |
|    ep_rew_mean          | 455         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 263         |
|    time_elapsed         | 16173       |
|    total_timesteps      | 538624      |
| train/                  |             |
|    approx_kl            | 0.010196021 |
|    clip_fraction        | 0.131       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.29       |
|    explained_variance   | 0.0459      |
|    learning_rate        | 5e-05       |
|    loss                 | 368         |
|    n_updates            | 2660        |
|    policy_gradient_loss | -0.00679    |
|    std                  | 0.76        |
|    value_loss           | 735         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 255         |
|    ep_rew_mean          | 535         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 272         |
|    time_elapsed         | 16800       |
|    total_timesteps      | 557056      |
| train/                  |             |
|    approx_kl            | 0.008383846 |
|    clip_fraction        | 0.0944      |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.28       |
|    explained_variance   | 0.0596      |
|    learning_rate        | 5e-05       |
|    loss                 | 393         |
|    n_updates            | 2750        |
|    policy_gradient_loss | -0.009      |
|    std                  | 0.756       |
|    value_loss           | 940         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 225         |
|    ep_rew_mean          | 483         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 281         |
|    time_elapsed         | 17426       |
|    total_timesteps      | 575488      |
| train/                  |             |
|    approx_kl            | 0.010010066 |
|    clip_fraction        | 0.113       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.27       |
|    explained_variance   | 0.0117      |
|    learning_rate        | 5e-05       |
|    loss                 | 454         |
|    n_updates            | 2840        |
|    policy_gradient_loss | -0.0121     |
|    std                  | 0.754       |
|    value_loss           | 865         |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 220        |
|    ep_rew_mean          | 448        |
| time/                   |            |
|    fps                  | 32         |
|    iterations           | 290        |
|    time_elapsed         | 18048      |
|    total_timesteps      | 593920     |
| train/                  |            |
|    approx_kl            | 0.01474163 |
|    clip_fraction        | 0.137      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.26      |
|    explained_variance   | -0.013     |
|    learning_rate        | 5e-05      |
|    loss                 | 322        |
|    n_updates            | 2930       |
|    policy_gradient_loss | -0.0119    |
|    std                  | 0.749      |
|    value_loss           | 946        |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 208         |
|    ep_rew_mean          | 440         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 299         |
|    time_elapsed         | 18674       |
|    total_timesteps      | 612352      |
| train/                  |             |
|    approx_kl            | 0.009303792 |
|    clip_fraction        | 0.111       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.25       |
|    explained_variance   | 0.0378      |
|    learning_rate        | 5e-05       |
|    loss                 | 521         |
|    n_updates            | 3020        |
|    policy_gradient_loss | -0.0119     |
|    std                  | 0.746       |
|    value_loss           | 888         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 257         |
|    ep_rew_mean          | 533         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 308         |
|    time_elapsed         | 19324       |
|    total_timesteps      | 630784      |
| train/                  |             |
|    approx_kl            | 0.020937944 |
|    clip_fraction        | 0.208       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.25       |
|    explained_variance   | 0.0049      |
|    learning_rate        | 5e-05       |
|    loss                 | 251         |
|    n_updates            | 3110        |
|    policy_gradient_loss | -0.0112     |
|    std                  | 0.745       |
|    value_loss           | 552         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 241         |
|    ep_rew_mean          | 502         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 317         |
|    time_elapsed         | 19946       |
|    total_timesteps      | 649216      |
| train/                  |             |
|    approx_kl            | 0.020567726 |
|    clip_fraction        | 0.15        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.24       |
|    explained_variance   | 0.0121      |
|    learning_rate        | 5e-05       |
|    loss                 | 379         |
|    n_updates            | 3200        |
|    policy_gradient_loss | -0.0143     |
|    std                  | 0.741       |
|    value_loss           | 710         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 217         |
|    ep_rew_mean          | 469         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 326         |
|    time_elapsed         | 20571       |
|    total_timesteps      | 667648      |
| train/                  |             |
|    approx_kl            | 0.020424917 |
|    clip_fraction        | 0.209       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.23       |
|    explained_variance   | 0.0276      |
|    learning_rate        | 5e-05       |
|    loss                 | 177         |
|    n_updates            | 3290        |
|    policy_gradient_loss | -0.00866    |
|    std                  | 0.74        |
|    value_loss           | 508         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 235         |
|    ep_rew_mean          | 509         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 335         |
|    time_elapsed         | 21193       |
|    total_timesteps      | 686080      |
| train/                  |             |
|    approx_kl            | 0.019021956 |
|    clip_fraction        | 0.157       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.22       |
|    explained_variance   | 0.0216      |
|    learning_rate        | 5e-05       |
|    loss                 | 374         |
|    n_updates            | 3380        |
|    policy_gradient_loss | -0.0116     |
|    std                  | 0.735       |
|    value_loss           | 614         |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 215          |
|    ep_rew_mean          | 477          |
| time/                   |              |
|    fps                  | 32           |
|    iterations           | 344          |
|    time_elapsed         | 21829        |
|    total_timesteps      | 704512       |
| train/                  |              |
|    approx_kl            | 0.0103052445 |
|    clip_fraction        | 0.0845       |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.21        |
|    explained_variance   | 0.0383       |
|    learning_rate        | 5e-05        |
|    loss                 | 574          |
|    n_updates            | 3470         |
|    policy_gradient_loss | -0.0107      |
|    std                  | 0.732        |
|    value_loss           | 1.25e+03     |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 214          |
|    ep_rew_mean          | 496          |
| time/                   |              |
|    fps                  | 32           |
|    iterations           | 353          |
|    time_elapsed         | 22459        |
|    total_timesteps      | 722944       |
| train/                  |              |
|    approx_kl            | 0.0148128355 |
|    clip_fraction        | 0.144        |
|    clip_range           | 0.2          |
|    entropy_loss         | -2.19        |
|    explained_variance   | 0.0234       |
|    learning_rate        | 5e-05        |
|    loss                 | 681          |
|    n_updates            | 3560         |
|    policy_gradient_loss | -0.00892     |
|    std                  | 0.723        |
|    value_loss           | 1e+03        |
------------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 209        |
|    ep_rew_mean          | 466        |
| time/                   |            |
|    fps                  | 32         |
|    iterations           | 362        |
|    time_elapsed         | 23093      |
|    total_timesteps      | 741376     |
| train/                  |            |
|    approx_kl            | 0.01808909 |
|    clip_fraction        | 0.192      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.18      |
|    explained_variance   | 0.0175     |
|    learning_rate        | 5e-05      |
|    loss                 | 289        |
|    n_updates            | 3650       |
|    policy_gradient_loss | -0.0048    |
|    std                  | 0.719      |
|    value_loss           | 807        |
----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 253        |
|    ep_rew_mean          | 544        |
| time/                   |            |
|    fps                  | 31         |
|    iterations           | 371        |
|    time_elapsed         | 23751      |
|    total_timesteps      | 759808     |
| train/                  |            |
|    approx_kl            | 0.01670705 |
|    clip_fraction        | 0.143      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.18      |
|    explained_variance   | 0.0216     |
|    learning_rate        | 5e-05      |
|    loss                 | 751        |
|    n_updates            | 3740       |
|    policy_gradient_loss | -0.0106    |
|    std                  | 0.718      |
|    value_loss           | 1.32e+03   |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 221         |
|    ep_rew_mean          | 502         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 380         |
|    time_elapsed         | 24391       |
|    total_timesteps      | 778240      |
| train/                  |             |
|    approx_kl            | 0.010625251 |
|    clip_fraction        | 0.12        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.17       |
|    explained_variance   | 0.0226      |
|    learning_rate        | 5e-05       |
|    loss                 | 625         |
|    n_updates            | 3830        |
|    policy_gradient_loss | -0.0121     |
|    std                  | 0.716       |
|    value_loss           | 1.27e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 223         |
|    ep_rew_mean          | 504         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 389         |
|    time_elapsed         | 25025       |
|    total_timesteps      | 796672      |
| train/                  |             |
|    approx_kl            | 0.008436054 |
|    clip_fraction        | 0.11        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.16       |
|    explained_variance   | 0.0368      |
|    learning_rate        | 5e-05       |
|    loss                 | 475         |
|    n_updates            | 3920        |
|    policy_gradient_loss | -0.0127     |
|    std                  | 0.713       |
|    value_loss           | 1.17e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 210         |
|    ep_rew_mean          | 477         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 398         |
|    time_elapsed         | 25658       |
|    total_timesteps      | 815104      |
| train/                  |             |
|    approx_kl            | 0.024700858 |
|    clip_fraction        | 0.199       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.17       |
|    explained_variance   | 0.0732      |
|    learning_rate        | 5e-05       |
|    loss                 | 409         |
|    n_updates            | 4010        |
|    policy_gradient_loss | -0.000254   |
|    std                  | 0.716       |
|    value_loss           | 734         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 238         |
|    ep_rew_mean          | 510         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 407         |
|    time_elapsed         | 26307       |
|    total_timesteps      | 833536      |
| train/                  |             |
|    approx_kl            | 0.012402972 |
|    clip_fraction        | 0.129       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.16       |
|    explained_variance   | 0.051       |
|    learning_rate        | 5e-05       |
|    loss                 | 928         |
|    n_updates            | 4100        |
|    policy_gradient_loss | -0.0101     |
|    std                  | 0.713       |
|    value_loss           | 990         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 215         |
|    ep_rew_mean          | 495         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 416         |
|    time_elapsed         | 26940       |
|    total_timesteps      | 851968      |
| train/                  |             |
|    approx_kl            | 0.021150593 |
|    clip_fraction        | 0.203       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.18       |
|    explained_variance   | 0.0237      |
|    learning_rate        | 5e-05       |
|    loss                 | 455         |
|    n_updates            | 4190        |
|    policy_gradient_loss | -0.00878    |
|    std                  | 0.719       |
|    value_loss           | 594         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 235         |
|    ep_rew_mean          | 532         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 425         |
|    time_elapsed         | 27578       |
|    total_timesteps      | 870400      |
| train/                  |             |
|    approx_kl            | 0.013013127 |
|    clip_fraction        | 0.154       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.18       |
|    explained_variance   | 0.00761     |
|    learning_rate        | 5e-05       |
|    loss                 | 550         |
|    n_updates            | 4280        |
|    policy_gradient_loss | -0.00787    |
|    std                  | 0.72        |
|    value_loss           | 702         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 197         |
|    ep_rew_mean          | 419         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 434         |
|    time_elapsed         | 28199       |
|    total_timesteps      | 888832      |
| train/                  |             |
|    approx_kl            | 0.013296264 |
|    clip_fraction        | 0.172       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.17       |
|    explained_variance   | 0.0311      |
|    learning_rate        | 5e-05       |
|    loss                 | 350         |
|    n_updates            | 4370        |
|    policy_gradient_loss | -0.0115     |
|    std                  | 0.717       |
|    value_loss           | 724         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 211         |
|    ep_rew_mean          | 452         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 443         |
|    time_elapsed         | 28794       |
|    total_timesteps      | 907264      |
| train/                  |             |
|    approx_kl            | 0.014079695 |
|    clip_fraction        | 0.153       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.15       |
|    explained_variance   | 0.0375      |
|    learning_rate        | 5e-05       |
|    loss                 | 450         |
|    n_updates            | 4460        |
|    policy_gradient_loss | -0.011      |
|    std                  | 0.71        |
|    value_loss           | 1.3e+03     |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 173         |
|    ep_rew_mean          | 417         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 452         |
|    time_elapsed         | 29388       |
|    total_timesteps      | 925696      |
| train/                  |             |
|    approx_kl            | 0.013383334 |
|    clip_fraction        | 0.158       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.15       |
|    explained_variance   | 0.0434      |
|    learning_rate        | 5e-05       |
|    loss                 | 257         |
|    n_updates            | 4550        |
|    policy_gradient_loss | -0.0135     |
|    std                  | 0.71        |
|    value_loss           | 852         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 190         |
|    ep_rew_mean          | 469         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 461         |
|    time_elapsed         | 29986       |
|    total_timesteps      | 944128      |
| train/                  |             |
|    approx_kl            | 0.019019824 |
|    clip_fraction        | 0.162       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.15       |
|    explained_variance   | 0.0187      |
|    learning_rate        | 5e-05       |
|    loss                 | 469         |
|    n_updates            | 4640        |
|    policy_gradient_loss | -0.0173     |
|    std                  | 0.708       |
|    value_loss           | 1.17e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 195         |
|    ep_rew_mean          | 450         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 470         |
|    time_elapsed         | 30580       |
|    total_timesteps      | 962560      |
| train/                  |             |
|    approx_kl            | 0.010272415 |
|    clip_fraction        | 0.113       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.15       |
|    explained_variance   | 0.0245      |
|    learning_rate        | 5e-05       |
|    loss                 | 347         |
|    n_updates            | 4730        |
|    policy_gradient_loss | -0.0127     |
|    std                  | 0.71        |
|    value_loss           | 1.1e+03     |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 198         |
|    ep_rew_mean          | 470         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 479         |
|    time_elapsed         | 31173       |
|    total_timesteps      | 980992      |
| train/                  |             |
|    approx_kl            | 0.020264179 |
|    clip_fraction        | 0.177       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.16       |
|    explained_variance   | 0.0328      |
|    learning_rate        | 5e-05       |
|    loss                 | 366         |
|    n_updates            | 4820        |
|    policy_gradient_loss | -0.0142     |
|    std                  | 0.714       |
|    value_loss           | 1.1e+03     |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 189         |
|    ep_rew_mean          | 449         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 488         |
|    time_elapsed         | 31771       |
|    total_timesteps      | 999424      |
| train/                  |             |
|    approx_kl            | 0.013978276 |
|    clip_fraction        | 0.133       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.15       |
|    explained_variance   | 0.0462      |
|    learning_rate        | 5e-05       |
|    loss                 | 709         |
|    n_updates            | 4910        |
|    policy_gradient_loss | -0.0156     |
|    std                  | 0.711       |
|    value_loss           | 1.26e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 216         |
|    ep_rew_mean          | 482         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 497         |
|    time_elapsed         | 32363       |
|    total_timesteps      | 1017856     |
| train/                  |             |
|    approx_kl            | 0.023654178 |
|    clip_fraction        | 0.277       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.14       |
|    explained_variance   | 0.00165     |
|    learning_rate        | 5e-05       |
|    loss                 | 455         |
|    n_updates            | 5000        |
|    policy_gradient_loss | -0.000423   |
|    std                  | 0.707       |
|    value_loss           | 934         |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 224        |
|    ep_rew_mean          | 504        |
| time/                   |            |
|    fps                  | 31         |
|    iterations           | 506        |
|    time_elapsed         | 32954      |
|    total_timesteps      | 1036288    |
| train/                  |            |
|    approx_kl            | 0.01226971 |
|    clip_fraction        | 0.132      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.15      |
|    explained_variance   | -0.0289    |
|    learning_rate        | 5e-05      |
|    loss                 | 601        |
|    n_updates            | 5090       |
|    policy_gradient_loss | -0.0127    |
|    std                  | 0.708      |
|    value_loss           | 929        |
----------------------------------------
---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 244       |
|    ep_rew_mean          | 534       |
| time/                   |           |
|    fps                  | 31        |
|    iterations           | 515       |
|    time_elapsed         | 33544     |
|    total_timesteps      | 1054720   |
| train/                  |           |
|    approx_kl            | 0.0205364 |
|    clip_fraction        | 0.214     |
|    clip_range           | 0.2       |
|    entropy_loss         | -2.13     |
|    explained_variance   | 0.00577   |
|    learning_rate        | 5e-05     |
|    loss                 | 769       |
|    n_updates            | 5180      |
|    policy_gradient_loss | -0.0122   |
|    std                  | 0.704     |
|    value_loss           | 900       |
---------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 226         |
|    ep_rew_mean          | 502         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 524         |
|    time_elapsed         | 34133       |
|    total_timesteps      | 1073152     |
| train/                  |             |
|    approx_kl            | 0.024063498 |
|    clip_fraction        | 0.232       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.15       |
|    explained_variance   | 0.0144      |
|    learning_rate        | 5e-05       |
|    loss                 | 285         |
|    n_updates            | 5270        |
|    policy_gradient_loss | -0.0147     |
|    std                  | 0.709       |
|    value_loss           | 789         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 217         |
|    ep_rew_mean          | 487         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 533         |
|    time_elapsed         | 34722       |
|    total_timesteps      | 1091584     |
| train/                  |             |
|    approx_kl            | 0.025173128 |
|    clip_fraction        | 0.247       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.13       |
|    explained_variance   | 0.0316      |
|    learning_rate        | 5e-05       |
|    loss                 | 140         |
|    n_updates            | 5360        |
|    policy_gradient_loss | -0.00548    |
|    std                  | 0.704       |
|    value_loss           | 655         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 191         |
|    ep_rew_mean          | 425         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 542         |
|    time_elapsed         | 35315       |
|    total_timesteps      | 1110016     |
| train/                  |             |
|    approx_kl            | 0.013838652 |
|    clip_fraction        | 0.163       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.12       |
|    explained_variance   | 0.0756      |
|    learning_rate        | 5e-05       |
|    loss                 | 493         |
|    n_updates            | 5450        |
|    policy_gradient_loss | -0.0129     |
|    std                  | 0.699       |
|    value_loss           | 876         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 204         |
|    ep_rew_mean          | 469         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 551         |
|    time_elapsed         | 35905       |
|    total_timesteps      | 1128448     |
| train/                  |             |
|    approx_kl            | 0.016782317 |
|    clip_fraction        | 0.188       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.1        |
|    explained_variance   | 0.0476      |
|    learning_rate        | 5e-05       |
|    loss                 | 484         |
|    n_updates            | 5540        |
|    policy_gradient_loss | -0.0117     |
|    std                  | 0.693       |
|    value_loss           | 1.32e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 207         |
|    ep_rew_mean          | 471         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 560         |
|    time_elapsed         | 36498       |
|    total_timesteps      | 1146880     |
| train/                  |             |
|    approx_kl            | 0.016391411 |
|    clip_fraction        | 0.179       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.09       |
|    explained_variance   | 0.013       |
|    learning_rate        | 5e-05       |
|    loss                 | 348         |
|    n_updates            | 5630        |
|    policy_gradient_loss | -0.00788    |
|    std                  | 0.69        |
|    value_loss           | 1.21e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 211         |
|    ep_rew_mean          | 478         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 569         |
|    time_elapsed         | 37090       |
|    total_timesteps      | 1165312     |
| train/                  |             |
|    approx_kl            | 0.013680907 |
|    clip_fraction        | 0.169       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.08       |
|    explained_variance   | -0.00324    |
|    learning_rate        | 5e-05       |
|    loss                 | 364         |
|    n_updates            | 5720        |
|    policy_gradient_loss | -0.0124     |
|    std                  | 0.686       |
|    value_loss           | 680         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 205         |
|    ep_rew_mean          | 464         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 578         |
|    time_elapsed         | 37563       |
|    total_timesteps      | 1183744     |
| train/                  |             |
|    approx_kl            | 0.028199434 |
|    clip_fraction        | 0.148       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.08       |
|    explained_variance   | 0.0133      |
|    learning_rate        | 5e-05       |
|    loss                 | 438         |
|    n_updates            | 5810        |
|    policy_gradient_loss | -0.0119     |
|    std                  | 0.686       |
|    value_loss           | 1.06e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 203         |
|    ep_rew_mean          | 463         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 587         |
|    time_elapsed         | 38034       |
|    total_timesteps      | 1202176     |
| train/                  |             |
|    approx_kl            | 0.016367134 |
|    clip_fraction        | 0.239       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.08       |
|    explained_variance   | 0.0154      |
|    learning_rate        | 5e-05       |
|    loss                 | 138         |
|    n_updates            | 5900        |
|    policy_gradient_loss | -0.00784    |
|    std                  | 0.684       |
|    value_loss           | 776         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 200         |
|    ep_rew_mean          | 467         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 596         |
|    time_elapsed         | 38494       |
|    total_timesteps      | 1220608     |
| train/                  |             |
|    approx_kl            | 0.027690748 |
|    clip_fraction        | 0.242       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.08       |
|    explained_variance   | 0.0142      |
|    learning_rate        | 5e-05       |
|    loss                 | 149         |
|    n_updates            | 5990        |
|    policy_gradient_loss | -0.00478    |
|    std                  | 0.687       |
|    value_loss           | 650         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 225         |
|    ep_rew_mean          | 513         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 605         |
|    time_elapsed         | 38941       |
|    total_timesteps      | 1239040     |
| train/                  |             |
|    approx_kl            | 0.045194604 |
|    clip_fraction        | 0.216       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.08       |
|    explained_variance   | 0.0397      |
|    learning_rate        | 5e-05       |
|    loss                 | 340         |
|    n_updates            | 6080        |
|    policy_gradient_loss | -0.00563    |
|    std                  | 0.687       |
|    value_loss           | 974         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 212         |
|    ep_rew_mean          | 491         |
| time/                   |             |
|    fps                  | 31          |
|    iterations           | 614         |
|    time_elapsed         | 39402       |
|    total_timesteps      | 1257472     |
| train/                  |             |
|    approx_kl            | 0.019077372 |
|    clip_fraction        | 0.156       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.07       |
|    explained_variance   | 0.0181      |
|    learning_rate        | 5e-05       |
|    loss                 | 333         |
|    n_updates            | 6170        |
|    policy_gradient_loss | -0.00643    |
|    std                  | 0.682       |
|    value_loss           | 1.04e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 225         |
|    ep_rew_mean          | 510         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 623         |
|    time_elapsed         | 39855       |
|    total_timesteps      | 1275904     |
| train/                  |             |
|    approx_kl            | 0.019042786 |
|    clip_fraction        | 0.218       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.07       |
|    explained_variance   | 0.0125      |
|    learning_rate        | 5e-05       |
|    loss                 | 301         |
|    n_updates            | 6260        |
|    policy_gradient_loss | -0.00269    |
|    std                  | 0.684       |
|    value_loss           | 767         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 235         |
|    ep_rew_mean          | 515         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 632         |
|    time_elapsed         | 40314       |
|    total_timesteps      | 1294336     |
| train/                  |             |
|    approx_kl            | 0.022643145 |
|    clip_fraction        | 0.209       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.05       |
|    explained_variance   | 0.0065      |
|    learning_rate        | 5e-05       |
|    loss                 | 572         |
|    n_updates            | 6350        |
|    policy_gradient_loss | -0.0142     |
|    std                  | 0.677       |
|    value_loss           | 1.08e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 208         |
|    ep_rew_mean          | 407         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 641         |
|    time_elapsed         | 40766       |
|    total_timesteps      | 1312768     |
| train/                  |             |
|    approx_kl            | 0.021765321 |
|    clip_fraction        | 0.172       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.04       |
|    explained_variance   | 0.0774      |
|    learning_rate        | 5e-05       |
|    loss                 | 463         |
|    n_updates            | 6440        |
|    policy_gradient_loss | -0.0134     |
|    std                  | 0.673       |
|    value_loss           | 747         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 218         |
|    ep_rew_mean          | 444         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 650         |
|    time_elapsed         | 41218       |
|    total_timesteps      | 1331200     |
| train/                  |             |
|    approx_kl            | 0.014157001 |
|    clip_fraction        | 0.148       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.03       |
|    explained_variance   | 0.0285      |
|    learning_rate        | 5e-05       |
|    loss                 | 463         |
|    n_updates            | 6530        |
|    policy_gradient_loss | -0.0113     |
|    std                  | 0.669       |
|    value_loss           | 1.03e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 223         |
|    ep_rew_mean          | 472         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 659         |
|    time_elapsed         | 41668       |
|    total_timesteps      | 1349632     |
| train/                  |             |
|    approx_kl            | 0.014347095 |
|    clip_fraction        | 0.167       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.03       |
|    explained_variance   | 0.0145      |
|    learning_rate        | 5e-05       |
|    loss                 | 242         |
|    n_updates            | 6620        |
|    policy_gradient_loss | -0.0117     |
|    std                  | 0.669       |
|    value_loss           | 846         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 227         |
|    ep_rew_mean          | 500         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 668         |
|    time_elapsed         | 42116       |
|    total_timesteps      | 1368064     |
| train/                  |             |
|    approx_kl            | 0.011894728 |
|    clip_fraction        | 0.147       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.01       |
|    explained_variance   | 0.0399      |
|    learning_rate        | 5e-05       |
|    loss                 | 376         |
|    n_updates            | 6710        |
|    policy_gradient_loss | -0.0113     |
|    std                  | 0.662       |
|    value_loss           | 1.2e+03     |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 237         |
|    ep_rew_mean          | 502         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 677         |
|    time_elapsed         | 42567       |
|    total_timesteps      | 1386496     |
| train/                  |             |
|    approx_kl            | 0.015745511 |
|    clip_fraction        | 0.215       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.01       |
|    explained_variance   | 0.0467      |
|    learning_rate        | 5e-05       |
|    loss                 | 353         |
|    n_updates            | 6800        |
|    policy_gradient_loss | -0.0084     |
|    std                  | 0.664       |
|    value_loss           | 685         |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 245        |
|    ep_rew_mean          | 526        |
| time/                   |            |
|    fps                  | 32         |
|    iterations           | 686        |
|    time_elapsed         | 43007      |
|    total_timesteps      | 1404928    |
| train/                  |            |
|    approx_kl            | 0.01877147 |
|    clip_fraction        | 0.198      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2         |
|    explained_variance   | 0.0483     |
|    learning_rate        | 5e-05      |
|    loss                 | 328        |
|    n_updates            | 6890       |
|    policy_gradient_loss | -0.011     |
|    std                  | 0.661      |
|    value_loss           | 866        |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 247         |
|    ep_rew_mean          | 568         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 695         |
|    time_elapsed         | 43439       |
|    total_timesteps      | 1423360     |
| train/                  |             |
|    approx_kl            | 0.026800547 |
|    clip_fraction        | 0.243       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2          |
|    explained_variance   | -0.00374    |
|    learning_rate        | 5e-05       |
|    loss                 | 220         |
|    n_updates            | 6980        |
|    policy_gradient_loss | -0.00498    |
|    std                  | 0.662       |
|    value_loss           | 607         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 227         |
|    ep_rew_mean          | 505         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 704         |
|    time_elapsed         | 43873       |
|    total_timesteps      | 1441792     |
| train/                  |             |
|    approx_kl            | 0.018526241 |
|    clip_fraction        | 0.149       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2          |
|    explained_variance   | 0.0231      |
|    learning_rate        | 5e-05       |
|    loss                 | 859         |
|    n_updates            | 7070        |
|    policy_gradient_loss | -0.017      |
|    std                  | 0.659       |
|    value_loss           | 1.02e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 209         |
|    ep_rew_mean          | 489         |
| time/                   |             |
|    fps                  | 32          |
|    iterations           | 713         |
|    time_elapsed         | 44308       |
|    total_timesteps      | 1460224     |
| train/                  |             |
|    approx_kl            | 0.009942916 |
|    clip_fraction        | 0.115       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2          |
|    explained_variance   | 0.0444      |
|    learning_rate        | 5e-05       |
|    loss                 | 404         |
|    n_updates            | 7160        |
|    policy_gradient_loss | -0.0099     |
|    std                  | 0.657       |
|    value_loss           | 1.49e+03    |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 217         |
|    ep_rew_mean          | 501         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 722         |
|    time_elapsed         | 44742       |
|    total_timesteps      | 1478656     |
| train/                  |             |
|    approx_kl            | 0.015440727 |
|    clip_fraction        | 0.156       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.99       |
|    explained_variance   | 0.0083      |
|    learning_rate        | 5e-05       |
|    loss                 | 491         |
|    n_updates            | 7250        |
|    policy_gradient_loss | -0.0152     |
|    std                  | 0.656       |
|    value_loss           | 781         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 227         |
|    ep_rew_mean          | 516         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 731         |
|    time_elapsed         | 45175       |
|    total_timesteps      | 1497088     |
| train/                  |             |
|    approx_kl            | 0.016416954 |
|    clip_fraction        | 0.175       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.99       |
|    explained_variance   | 0.0327      |
|    learning_rate        | 5e-05       |
|    loss                 | 662         |
|    n_updates            | 7340        |
|    policy_gradient_loss | -0.00576    |
|    std                  | 0.655       |
|    value_loss           | 1.05e+03    |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 194        |
|    ep_rew_mean          | 467        |
| time/                   |            |
|    fps                  | 33         |
|    iterations           | 740        |
|    time_elapsed         | 45610      |
|    total_timesteps      | 1515520    |
| train/                  |            |
|    approx_kl            | 0.04248943 |
|    clip_fraction        | 0.274      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.97      |
|    explained_variance   | 0.0205     |
|    learning_rate        | 5e-05      |
|    loss                 | 411        |
|    n_updates            | 7430       |
|    policy_gradient_loss | -0.00706   |
|    std                  | 0.651      |
|    value_loss           | 767        |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 229         |
|    ep_rew_mean          | 505         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 750         |
|    time_elapsed         | 46088       |
|    total_timesteps      | 1536000     |
| train/                  |             |
|    approx_kl            | 0.017012168 |
|    clip_fraction        | 0.221       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.97       |
|    explained_variance   | 0.0404      |
|    learning_rate        | 5e-05       |
|    loss                 | 379         |
|    n_updates            | 7530        |
|    policy_gradient_loss | -1.99e-05   |
|    std                  | 0.651       |
|    value_loss           | 1.03e+03    |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 208        |
|    ep_rew_mean          | 468        |
| time/                   |            |
|    fps                  | 33         |
|    iterations           | 759        |
|    time_elapsed         | 46521      |
|    total_timesteps      | 1554432    |
| train/                  |            |
|    approx_kl            | 0.02476797 |
|    clip_fraction        | 0.328      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.97      |
|    explained_variance   | 0.0216     |
|    learning_rate        | 5e-05      |
|    loss                 | 430        |
|    n_updates            | 7620       |
|    policy_gradient_loss | 0.0129     |
|    std                  | 0.651      |
|    value_loss           | 1.04e+03   |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 213         |
|    ep_rew_mean          | 487         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 768         |
|    time_elapsed         | 46955       |
|    total_timesteps      | 1572864     |
| train/                  |             |
|    approx_kl            | 0.019862749 |
|    clip_fraction        | 0.137       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.97       |
|    explained_variance   | 0.083       |
|    learning_rate        | 5e-05       |
|    loss                 | 468         |
|    n_updates            | 7710        |
|    policy_gradient_loss | -0.00906    |
|    std                  | 0.649       |
|    value_loss           | 798         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 200         |
|    ep_rew_mean          | 481         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 777         |
|    time_elapsed         | 47400       |
|    total_timesteps      | 1591296     |
| train/                  |             |
|    approx_kl            | 0.013487209 |
|    clip_fraction        | 0.157       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.97       |
|    explained_variance   | 0.0585      |
|    learning_rate        | 5e-05       |
|    loss                 | 644         |
|    n_updates            | 7800        |
|    policy_gradient_loss | -0.0148     |
|    std                  | 0.652       |
|    value_loss           | 986         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 209         |
|    ep_rew_mean          | 483         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 786         |
|    time_elapsed         | 47851       |
|    total_timesteps      | 1609728     |
| train/                  |             |
|    approx_kl            | 0.043586485 |
|    clip_fraction        | 0.299       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.97       |
|    explained_variance   | -0.00281    |
|    learning_rate        | 5e-05       |
|    loss                 | 543         |
|    n_updates            | 7890        |
|    policy_gradient_loss | 0.0038      |
|    std                  | 0.652       |
|    value_loss           | 872         |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 226        |
|    ep_rew_mean          | 506        |
| time/                   |            |
|    fps                  | 33         |
|    iterations           | 795        |
|    time_elapsed         | 48305      |
|    total_timesteps      | 1628160    |
| train/                  |            |
|    approx_kl            | 0.06726641 |
|    clip_fraction        | 0.279      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.97      |
|    explained_variance   | 0.0192     |
|    learning_rate        | 5e-05      |
|    loss                 | 163        |
|    n_updates            | 7980       |
|    policy_gradient_loss | -0.00261   |
|    std                  | 0.652      |
|    value_loss           | 417        |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 245         |
|    ep_rew_mean          | 535         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 804         |
|    time_elapsed         | 48755       |
|    total_timesteps      | 1646592     |
| train/                  |             |
|    approx_kl            | 0.029648496 |
|    clip_fraction        | 0.258       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.96       |
|    explained_variance   | 0.00596     |
|    learning_rate        | 5e-05       |
|    loss                 | 705         |
|    n_updates            | 8070        |
|    policy_gradient_loss | -0.00881    |
|    std                  | 0.648       |
|    value_loss           | 1.33e+03    |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 175        |
|    ep_rew_mean          | 446        |
| time/                   |            |
|    fps                  | 33         |
|    iterations           | 813        |
|    time_elapsed         | 49206      |
|    total_timesteps      | 1665024    |
| train/                  |            |
|    approx_kl            | 0.07661872 |
|    clip_fraction        | 0.263      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.98      |
|    explained_variance   | 0.00203    |
|    learning_rate        | 5e-05      |
|    loss                 | 649        |
|    n_updates            | 8160       |
|    policy_gradient_loss | 0.00321    |
|    std                  | 0.652      |
|    value_loss           | 1.41e+03   |
----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 131        |
|    ep_rew_mean          | 260        |
| time/                   |            |
|    fps                  | 33         |
|    iterations           | 822        |
|    time_elapsed         | 49664      |
|    total_timesteps      | 1683456    |
| train/                  |            |
|    approx_kl            | 0.26357907 |
|    clip_fraction        | 0.303      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.97      |
|    explained_variance   | 0.0186     |
|    learning_rate        | 5e-05      |
|    loss                 | 243        |
|    n_updates            | 8250       |
|    policy_gradient_loss | 0.0233     |
|    std                  | 0.651      |
|    value_loss           | 637        |
----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 49.4        |
|    ep_rew_mean          | 103         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 831         |
|    time_elapsed         | 50155       |
|    total_timesteps      | 1701888     |
| train/                  |             |
|    approx_kl            | 0.018010162 |
|    clip_fraction        | 0.134       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.97       |
|    explained_variance   | 0.31        |
|    learning_rate        | 5e-05       |
|    loss                 | 312         |
|    n_updates            | 8340        |
|    policy_gradient_loss | -0.0164     |
|    std                  | 0.652       |
|    value_loss           | 694         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 70          |
|    ep_rew_mean          | 151         |
| time/                   |             |
|    fps                  | 33          |
|    iterations           | 840         |
|    time_elapsed         | 50626       |
|    total_timesteps      | 1720320     |
| train/                  |             |
|    approx_kl            | 0.011138463 |
|    clip_fraction        | 0.102       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.96       |
|    explained_variance   | 0.426       |
|    learning_rate        | 5e-05       |
|    loss                 | 368         |
|    n_updates            | 8430        |
|    policy_gradient_loss | -0.0152     |
|    std                  | 0.648       |
|    value_loss           | 769         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 94          |
|    ep_rew_mean          | 196         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 849         |
|    time_elapsed         | 51090       |
|    total_timesteps      | 1738752     |
| train/                  |             |
|    approx_kl            | 0.013818212 |
|    clip_fraction        | 0.11        |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.95       |
|    explained_variance   | 0.303       |
|    learning_rate        | 5e-05       |
|    loss                 | 370         |
|    n_updates            | 8520        |
|    policy_gradient_loss | -0.0167     |
|    std                  | 0.644       |
|    value_loss           | 1.13e+03    |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 145         |
|    ep_rew_mean          | 338         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 858         |
|    time_elapsed         | 51554       |
|    total_timesteps      | 1757184     |
| train/                  |             |
|    approx_kl            | 0.017342221 |
|    clip_fraction        | 0.2         |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.95       |
|    explained_variance   | 0.0801      |
|    learning_rate        | 5e-05       |
|    loss                 | 453         |
|    n_updates            | 8610        |
|    policy_gradient_loss | -0.0137     |
|    std                  | 0.644       |
|    value_loss           | 965         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 222         |
|    ep_rew_mean          | 494         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 867         |
|    time_elapsed         | 52020       |
|    total_timesteps      | 1775616     |
| train/                  |             |
|    approx_kl            | 0.021835431 |
|    clip_fraction        | 0.176       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.95       |
|    explained_variance   | 0.00144     |
|    learning_rate        | 5e-05       |
|    loss                 | 546         |
|    n_updates            | 8700        |
|    policy_gradient_loss | -0.00803    |
|    std                  | 0.644       |
|    value_loss           | 928         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 177         |
|    ep_rew_mean          | 410         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 876         |
|    time_elapsed         | 52485       |
|    total_timesteps      | 1794048     |
| train/                  |             |
|    approx_kl            | 0.016320039 |
|    clip_fraction        | 0.174       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.93       |
|    explained_variance   | 0.0125      |
|    learning_rate        | 5e-05       |
|    loss                 | 418         |
|    n_updates            | 8790        |
|    policy_gradient_loss | -0.0138     |
|    std                  | 0.637       |
|    value_loss           | 887         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 202         |
|    ep_rew_mean          | 470         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 885         |
|    time_elapsed         | 52950       |
|    total_timesteps      | 1812480     |
| train/                  |             |
|    approx_kl            | 0.023514703 |
|    clip_fraction        | 0.257       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.93       |
|    explained_variance   | -0.0011     |
|    learning_rate        | 5e-05       |
|    loss                 | 570         |
|    n_updates            | 8880        |
|    policy_gradient_loss | 0.00131     |
|    std                  | 0.636       |
|    value_loss           | 592         |
-----------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 202        |
|    ep_rew_mean          | 466        |
| time/                   |            |
|    fps                  | 34         |
|    iterations           | 894        |
|    time_elapsed         | 53411      |
|    total_timesteps      | 1830912    |
| train/                  |            |
|    approx_kl            | 0.04046198 |
|    clip_fraction        | 0.252      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.92      |
|    explained_variance   | 0.00354    |
|    learning_rate        | 5e-05      |
|    loss                 | 264        |
|    n_updates            | 8970       |
|    policy_gradient_loss | -0.00779   |
|    std                  | 0.635      |
|    value_loss           | 753        |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 222         |
|    ep_rew_mean          | 504         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 903         |
|    time_elapsed         | 53868       |
|    total_timesteps      | 1849344     |
| train/                  |             |
|    approx_kl            | 0.022903103 |
|    clip_fraction        | 0.229       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.93       |
|    explained_variance   | 0.0011      |
|    learning_rate        | 5e-05       |
|    loss                 | 203         |
|    n_updates            | 9060        |
|    policy_gradient_loss | -0.00303    |
|    std                  | 0.637       |
|    value_loss           | 775         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 213         |
|    ep_rew_mean          | 482         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 912         |
|    time_elapsed         | 54332       |
|    total_timesteps      | 1867776     |
| train/                  |             |
|    approx_kl            | 0.011409141 |
|    clip_fraction        | 0.134       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.93       |
|    explained_variance   | 0.0306      |
|    learning_rate        | 5e-05       |
|    loss                 | 492         |
|    n_updates            | 9150        |
|    policy_gradient_loss | -0.0139     |
|    std                  | 0.638       |
|    value_loss           | 1.26e+03    |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 211         |
|    ep_rew_mean          | 496         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 921         |
|    time_elapsed         | 54773       |
|    total_timesteps      | 1886208     |
| train/                  |             |
|    approx_kl            | 0.019344483 |
|    clip_fraction        | 0.181       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.92       |
|    explained_variance   | 0.0158      |
|    learning_rate        | 5e-05       |
|    loss                 | 621         |
|    n_updates            | 9240        |
|    policy_gradient_loss | -0.0122     |
|    std                  | 0.636       |
|    value_loss           | 971         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 208         |
|    ep_rew_mean          | 494         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 930         |
|    time_elapsed         | 55138       |
|    total_timesteps      | 1904640     |
| train/                  |             |
|    approx_kl            | 0.030672066 |
|    clip_fraction        | 0.229       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.92       |
|    explained_variance   | 0.0368      |
|    learning_rate        | 5e-05       |
|    loss                 | 395         |
|    n_updates            | 9330        |
|    policy_gradient_loss | -0.00747    |
|    std                  | 0.636       |
|    value_loss           | 728         |
-----------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 221        |
|    ep_rew_mean          | 481        |
| time/                   |            |
|    fps                  | 34         |
|    iterations           | 939        |
|    time_elapsed         | 55515      |
|    total_timesteps      | 1923072    |
| train/                  |            |
|    approx_kl            | 0.01719563 |
|    clip_fraction        | 0.215      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.92      |
|    explained_variance   | 0.0179     |
|    learning_rate        | 5e-05      |
|    loss                 | 448        |
|    n_updates            | 9420       |
|    policy_gradient_loss | -0.00868   |
|    std                  | 0.634      |
|    value_loss           | 902        |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 226         |
|    ep_rew_mean          | 495         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 948         |
|    time_elapsed         | 55902       |
|    total_timesteps      | 1941504     |
| train/                  |             |
|    approx_kl            | 0.014943941 |
|    clip_fraction        | 0.129       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.92       |
|    explained_variance   | 0.0534      |
|    learning_rate        | 5e-05       |
|    loss                 | 540         |
|    n_updates            | 9510        |
|    policy_gradient_loss | -0.0143     |
|    std                  | 0.634       |
|    value_loss           | 1.18e+03    |
-----------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 222        |
|    ep_rew_mean          | 486        |
| time/                   |            |
|    fps                  | 34         |
|    iterations           | 957        |
|    time_elapsed         | 56291      |
|    total_timesteps      | 1959936    |
| train/                  |            |
|    approx_kl            | 0.07146259 |
|    clip_fraction        | 0.269      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.93      |
|    explained_variance   | -0.0101    |
|    learning_rate        | 5e-05      |
|    loss                 | 325        |
|    n_updates            | 9600       |
|    policy_gradient_loss | 0.00575    |
|    std                  | 0.641      |
|    value_loss           | 768        |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 132         |
|    ep_rew_mean          | 276         |
| time/                   |             |
|    fps                  | 34          |
|    iterations           | 967         |
|    time_elapsed         | 56723       |
|    total_timesteps      | 1980416     |
| train/                  |             |
|    approx_kl            | 0.013269299 |
|    clip_fraction        | 0.138       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.95       |
|    explained_variance   | 0.0637      |
|    learning_rate        | 5e-05       |
|    loss                 | 366         |
|    n_updates            | 9700        |
|    policy_gradient_loss | -0.00924    |
|    std                  | 0.645       |
|    value_loss           | 875         |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 140         |
|    ep_rew_mean          | 302         |
| time/                   |             |
|    fps                  | 35          |
|    iterations           | 976         |
|    time_elapsed         | 57104       |
|    total_timesteps      | 1998848     |
| train/                  |             |
|    approx_kl            | 0.033680595 |
|    clip_fraction        | 0.278       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.96       |
|    explained_variance   | 0.0908      |
|    learning_rate        | 5e-05       |
|    loss                 | 481         |
|    n_updates            | 9790        |
|    policy_gradient_loss | 0.0216      |
|    std                  | 0.648       |
|    value_loss           | 1.32e+03    |
-----------------------------------------

<stable_baselines3.ppo.ppo.PPO at 0x1f4b6e872b0>
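The `train/` entries in the log above are diagnostics of PPO's clipped surrogate objective. To make them easier to read, here is a NumPy sketch that recomputes three of them for a single batch: `clip_fraction` (share of samples whose probability ratio left [1−ϵ, 1+ϵ]), `approx_kl` (an estimate of the KL divergence between the old and new policy) and `explained_variance` (how much of the return variance the value head predicts). As far as we know, it mirrors the formulas Stable-Baselines3 uses. The function names and the NumPy port are our own illustration, not the library's code.

```python
import numpy as np


def ppo_clip_diagnostics(new_log_prob, old_log_prob, advantages, clip_range=0.2):
    """Clipped surrogate loss plus clip_fraction / approx_kl for one batch."""
    log_ratio = np.asarray(new_log_prob) - np.asarray(old_log_prob)
    ratio = np.exp(log_ratio)  # r_t = pi_new(a|s) / pi_old(a|s)
    adv = np.asarray(advantages)

    # Clipped surrogate objective: keep the pessimistic (minimum) of the two terms.
    unclipped = adv * ratio
    clipped = adv * np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    policy_loss = -np.mean(np.minimum(unclipped, clipped))

    # Share of samples whose ratio fell outside [1 - clip_range, 1 + clip_range].
    clip_fraction = float(np.mean(np.abs(ratio - 1.0) > clip_range))

    # Estimator of KL(old || new): mean of (r - 1) - log r, which is never negative.
    approx_kl = float(np.mean((ratio - 1.0) - log_ratio))
    return float(policy_loss), clip_fraction, approx_kl


def explained_variance(values_pred, returns):
    """1 - Var[returns - predictions] / Var[returns]; 1 is perfect, 0 matches a constant guess."""
    returns = np.asarray(returns, dtype=np.float64)
    var_returns = np.var(returns)
    if var_returns == 0:
        return float("nan")
    return float(1.0 - np.var(returns - np.asarray(values_pred)) / var_returns)
```

In this section's logs, the two largest `approx_kl` values (0.264 at iteration 822 and 0.071 at iteration 957) occur together with `clip_fraction` above 0.26. `explained_variance` stays close to 0 in most iterations, which by the definition above means the value head explains little of the return variance at those points.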

## Model Loading

In [5]:
env = launch_env()
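# Load a checkpoint written during training; os.path.join keeps the path portable across operating systems.
ppo_model = PPO.load(os.path.join("..", "saves", "ppo_model_1420000_steps"))  # TODO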
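The checkpoint name is hard-coded and marked TODO. We assume the checkpoints were written by `CheckpointCallback` with `name_prefix="ppo_model"`. That prefix is inferred from the file name and is not shown in this section. Under that assumption, files follow the pattern `ppo_model_<timesteps>_steps.zip`. The hypothetical helper `pick_latest_checkpoint` below selects the newest one by parsing the step count. Sorting the names as strings would not work: it would rank `..._200000_steps` after `..._1420000_steps`.

```python
import re


def pick_latest_checkpoint(filenames, name_prefix="ppo_model"):
    """Return the checkpoint file name with the largest step count, or None if none match.

    Assumes names of the form '<name_prefix>_<timesteps>_steps.zip'.
    """
    pattern = re.compile(rf"^{re.escape(name_prefix)}_(\d+)_steps\.zip$")
    best_steps, best_name = -1, None
    for name in filenames:
        match = pattern.match(name)
        # Compare step counts as integers, not as strings.
        if match is not None and int(match.group(1)) > best_steps:
            best_steps, best_name = int(match.group(1)), name
    return best_name
```

In the notebook, it could be called as `PPO.load(os.path.join(save_dir, pick_latest_checkpoint(os.listdir(save_dir))))`, where `save_dir` is the checkpoint folder.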

## Model Logging

In [9]:
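# Pass the value of tb_logger: a bare `tb_logger` would be read as a directory literally named "tb_logger".
# Forward slashes also work on Windows and avoid backslash-escaping issues in the magic's argument parsing.
tb_logdir = tb_logger.replace("\\", "/")
%tensorboard --logdir {tb_logdir}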

## Model Testing

In [None]:
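cutoff = 1000  # maximum number of steps per test episode
with th.no_grad():  # inference only, no gradients needed
    while True:  # keeps running episodes until the cell is interrupted
        obs = env.reset()
        env.render()
        rewards = []
        steps = 0
        while True:
            # deterministic=True uses the mean of the Gaussian policy instead of sampling an action
            action, _states = ppo_model.predict(obs, deterministic=True)
            obs, rew, done, info = env.step(action)
            rewards.append(rew)
            env.render()
            steps += 1
            if done or steps >= cutoff:
                break
        # Note: np.mean(rewards) is the average reward per step, not the total episode return
        print("mean episode reward:", np.mean(rewards))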

mean episode reward: 1.7281790107668522
mean episode reward: 1.8654211431881356
mean episode reward: 1.4751107256472027
mean episode reward: 1.4896763040586714
mean episode reward: 1.5874962692922605
mean episode reward: 1.9338546759989308
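The values printed by the test loop are per-step averages (`np.mean(rewards)`). In Stable-Baselines3, `ep_rew_mean` in the training logs is the mean undiscounted episode return over recent episodes, so the two numbers are not directly comparable. To compare them, sum each episode's rewards instead. Here is a minimal sketch with a hypothetical helper `episode_stats` that reports both quantities:

```python
import numpy as np


def episode_stats(rewards):
    """Summarize one episode from its list of per-step rewards.

    Returns (episode_return, mean_reward_per_step, length). The return is the
    quantity comparable to ep_rew_mean; the per-step mean is what the test cell prints.
    """
    r = np.asarray(rewards, dtype=np.float64)
    if r.size == 0:
        return 0.0, float("nan"), 0
    return float(r.sum()), float(r.mean()), int(r.size)
```

Inside the test loop, `ret, per_step, length = episode_stats(rewards)` could replace the final `print` argument, which makes it explicit which metric is being reported.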
