# Lunar Lander Box2D solution with PPO (Proximal Policy Optimization)
-----
#### Karim Kanji & Sebastian Fallström
#### Prescriptive Analytics, IA-20

In [7]:
# Pip Installs
!pip install gymnasium
!pip install stable_baselines3
!pip install ufal.pybox2d
!pip install tensorflow



In [8]:
import gymnasium as gym
# Create the Gym environment
env = gym.make("LunarLander-v2")

In [9]:
# Import PPO from SB3 (Stable-Baselines3)
# The source below was used for the hyperparameters and other borrowed code
# Source: https://huggingface.co/niftymark/ppo-LunarLander-v2

from stable_baselines3 import PPO

# Define the model
model = PPO(
    policy='MlpPolicy',
    env=env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1
)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
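The hyperparameters above determine how the training log further down should be read. The following is a minimal sketch of that arithmetic, not part of the original notebook: the helper names (`timesteps_at`, `n_updates_at`) are made up for illustration, `n_envs = 1` is assumed from the single environment being wrapped in a `DummyVecEnv`, and the `n_updates` rule is inferred from the logged values rather than taken from the library documentation.

```python
import math

# Values copied from the PPO(...) call above.
n_steps = 1024
batch_size = 64
n_epochs = 4
gamma = 0.999
n_envs = 1  # assumption: one environment inside the DummyVecEnv

# Transitions collected per iteration (one rollout).
rollout_size = n_steps * n_envs
# Each epoch splits the rollout into minibatches of batch_size.
minibatches_per_epoch = rollout_size // batch_size
# Gradient steps performed after each rollout.
gradient_steps_per_iteration = minibatches_per_epoch * n_epochs


def timesteps_at(iteration):
    """total_timesteps reported in the table for a given iteration."""
    return iteration * rollout_size


def n_updates_at(iteration):
    """n_updates reported for a given iteration.

    Inferred from the logs: the counter grows by n_epochs per iteration,
    and the table shows the training stats of the previous iteration.
    """
    return (iteration - 1) * n_epochs


# Rollouts needed to reach the requested 1,000,000 timesteps.
iterations_needed = math.ceil(1_000_000 / rollout_size)
# Rough number of future steps that still matter under discounting.
effective_horizon = 1.0 / (1.0 - gamma)

print(minibatches_per_epoch, gradient_steps_per_iteration)
print(timesteps_at(11), n_updates_at(11))
print(iterations_needed, round(effective_horizon))
```

These numbers can be checked against the tables below, for example `total_timesteps` and `n_updates` at iteration 11. With `gamma=0.999` the effective horizon is on the order of a thousand steps, so rewards late in an episode still carry weight early on.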


In [8]:
# Train the model for 1,000,000 timesteps
model.learn(total_timesteps=1000000)

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 82.9     |
|    ep_rew_mean     | -154     |
| time/              |          |
|    fps             | 1715     |
|    iterations      | 1        |
|    time_elapsed    | 0        |
|    total_timesteps | 1024     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 87          |
|    ep_rew_mean          | -188        |
| time/                   |             |
|    fps                  | 1424        |
|    iterations           | 2           |
|    time_elapsed         | 1           |
|    total_timesteps      | 2048        |
| train/                  |             |
|    approx_kl            | 0.001029172 |
|    clip_fraction        | 0           |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.39       |
|    explained_variance   | -0.0037     |
|    learning_rate        | 0.

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 94          |
|    ep_rew_mean          | -143        |
| time/                   |             |
|    fps                  | 1229        |
|    iterations           | 11          |
|    time_elapsed         | 9           |
|    total_timesteps      | 11264       |
| train/                  |             |
|    approx_kl            | 0.002016448 |
|    clip_fraction        | 0           |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.37       |
|    explained_variance   | -0.0118     |
|    learning_rate        | 0.0003      |
|    loss                 | 2.56e+03    |
|    n_updates            | 40          |
|    policy_gradient_loss | -0.00327    |
|    value_loss           | 3.89e+03    |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 93.2

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 106          |
|    ep_rew_mean          | -127         |
| time/                   |              |
|    fps                  | 1264         |
|    iterations           | 21           |
|    time_elapsed         | 17           |
|    total_timesteps      | 21504        |
| train/                  |              |
|    approx_kl            | 0.0038545767 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.34        |
|    explained_variance   | -0.00158     |
|    learning_rate        | 0.0003       |
|    loss                 | 638          |
|    n_updates            | 80           |
|    policy_gradient_loss | -0.00398     |
|    value_loss           | 1.16e+03     |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 105          |
|    ep_rew_mean          | -80.9        |
| time/                   |              |
|    fps                  | 1273         |
|    iterations           | 31           |
|    time_elapsed         | 24           |
|    total_timesteps      | 31744        |
| train/                  |              |
|    approx_kl            | 8.881139e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.31        |
|    explained_variance   | -0.00862     |
|    learning_rate        | 0.0003       |
|    loss                 | 514          |
|    n_updates            | 120          |
|    policy_gradient_loss | 0.000225     |
|    value_loss           | 1.11e+03     |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 132          |
|    ep_rew_mean          | -77.4        |
| time/                   |              |
|    fps                  | 1273         |
|    iterations           | 41           |
|    time_elapsed         | 32           |
|    total_timesteps      | 41984        |
| train/                  |              |
|    approx_kl            | 0.0012555704 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.34        |
|    explained_variance   | 0.00671      |
|    learning_rate        | 0.0003       |
|    loss                 | 520          |
|    n_updates            | 160          |
|    policy_gradient_loss | -0.00176     |
|    value_loss           | 574          |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 121         |
|    ep_rew_mean          | -53.1       |
| time/                   |             |
|    fps                  | 1285        |
|    iterations           | 51          |
|    time_elapsed         | 40          |
|    total_timesteps      | 52224       |
| train/                  |             |
|    approx_kl            | 0.004533755 |
|    clip_fraction        | 0.0105      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.15       |
|    explained_variance   | -0.00217    |
|    learning_rate        | 0.0003      |
|    loss                 | 897         |
|    n_updates            | 200         |
|    policy_gradient_loss | -0.00269    |
|    value_loss           | 1.43e+03    |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 115 

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 155         |
|    ep_rew_mean          | -31.4       |
| time/                   |             |
|    fps                  | 1284        |
|    iterations           | 61          |
|    time_elapsed         | 48          |
|    total_timesteps      | 62464       |
| train/                  |             |
|    approx_kl            | 0.002880517 |
|    clip_fraction        | 0.00293     |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.3        |
|    explained_variance   | 0.00518     |
|    learning_rate        | 0.0003      |
|    loss                 | 125         |
|    n_updates            | 240         |
|    policy_gradient_loss | -0.000696   |
|    value_loss           | 165         |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 166 

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 145          |
|    ep_rew_mean          | -19          |
| time/                   |              |
|    fps                  | 1292         |
|    iterations           | 71           |
|    time_elapsed         | 56           |
|    total_timesteps      | 72704        |
| train/                  |              |
|    approx_kl            | 0.0008023738 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.16        |
|    explained_variance   | -0.000492    |
|    learning_rate        | 0.0003       |
|    loss                 | 430          |
|    n_updates            | 280          |
|    policy_gradient_loss | 0.000269     |
|    value_loss           | 889          |
------------------------------------------

-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 163           |
|    ep_rew_mean          | -24.3         |
| time/                   |               |
|    fps                  | 1290          |
|    iterations           | 81            |
|    time_elapsed         | 64            |
|    total_timesteps      | 82944         |
| train/                  |               |
|    approx_kl            | 0.00022402755 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -1.22         |
|    explained_variance   | 0.0294        |
|    learning_rate        | 0.0003        |
|    loss                 | 429           |
|    n_updates            | 320           |
|    policy_gradient_loss | -0.000732     |
|    value_loss           | 911           |
-------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 222          |
|    ep_rew_mean          | -27.9        |
| time/                   |              |
|    fps                  | 1295         |
|    iterations           | 90           |
|    time_elapsed         | 71           |
|    total_timesteps      | 92160        |
| train/                  |              |
|    approx_kl            | 2.829032e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.21        |
|    explained_variance   | 0.233        |
|    learning_rate        | 0.0003       |
|    loss                 | 366          |
|    n_updates            | 356          |
|    policy_gradient_loss | -0.000183    |
|    value_loss           | 840          |
------------------------------------------

-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 287           |
|    ep_rew_mean          | -20.9         |
| time/                   |               |
|    fps                  | 1289          |
|    iterations           | 100           |
|    time_elapsed         | 79            |
|    total_timesteps      | 102400        |
| train/                  |               |
|    approx_kl            | 0.00033349148 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -1.22         |
|    explained_variance   | 0.303         |
|    learning_rate        | 0.0003        |
|    loss                 | 272           |
|    n_updates            | 396           |
|    policy_gradient_loss | -0.000291     |
|    value_loss           | 803           |
-------------------------------------------

-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 358           |
|    ep_rew_mean          | -15.6         |
| time/                   |               |
|    fps                  | 1293          |
|    iterations           | 110           |
|    time_elapsed         | 87            |
|    total_timesteps      | 112640        |
| train/                  |               |
|    approx_kl            | 0.00034177175 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -1.07         |
|    explained_variance   | 0.554         |
|    learning_rate        | 0.0003        |
|    loss                 | 321           |
|    n_updates            | 436           |
|    policy_gradient_loss | -0.000463     |
|    value_loss           | 658           |
-------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 347         |
|    ep_rew_mean          | -2.5        |
| time/                   |             |
|    fps                  | 1287        |
|    iterations           | 120         |
|    time_elapsed         | 95          |
|    total_timesteps      | 122880      |
| train/                  |             |
|    approx_kl            | 0.008257494 |
|    clip_fraction        | 0.0344      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.26       |
|    explained_variance   | 0.757       |
|    learning_rate        | 0.0003      |
|    loss                 | 24.2        |
|    n_updates            | 476         |
|    policy_gradient_loss | -0.00281    |
|    value_loss           | 55.1        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 346 

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 381         |
|    ep_rew_mean          | 12.7        |
| time/                   |             |
|    fps                  | 1288        |
|    iterations           | 130         |
|    time_elapsed         | 103         |
|    total_timesteps      | 133120      |
| train/                  |             |
|    approx_kl            | 0.002492336 |
|    clip_fraction        | 0.00171     |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.17       |
|    explained_variance   | 0.83        |
|    learning_rate        | 0.0003      |
|    loss                 | 154         |
|    n_updates            | 516         |
|    policy_gradient_loss | -0.00243    |
|    value_loss           | 213         |
-----------------------------------------
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 38

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 446          |
|    ep_rew_mean          | 23.2         |
| time/                   |              |
|    fps                  | 1284         |
|    iterations           | 140          |
|    time_elapsed         | 111          |
|    total_timesteps      | 143360       |
| train/                  |              |
|    approx_kl            | 0.0024337529 |
|    clip_fraction        | 0.00464      |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.19        |
|    explained_variance   | 0.928        |
|    learning_rate        | 0.0003       |
|    loss                 | 21           |
|    n_updates            | 556          |
|    policy_gradient_loss | -0.00107     |
|    value_loss           | 77.6         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 472          |
|    ep_rew_mean          | 33.1         |
| time/                   |              |
|    fps                  | 1287         |
|    iterations           | 150          |
|    time_elapsed         | 119          |
|    total_timesteps      | 153600       |
| train/                  |              |
|    approx_kl            | 0.0010286269 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.05        |
|    explained_variance   | 0.22         |
|    learning_rate        | 0.0003       |
|    loss                 | 442          |
|    n_updates            | 596          |
|    policy_gradient_loss | -0.0016      |
|    value_loss           | 1.1e+03      |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 511         |
|    ep_rew_mean          | 38.7        |
| time/                   |             |
|    fps                  | 1272        |
|    iterations           | 160         |
|    time_elapsed         | 128         |
|    total_timesteps      | 163840      |
| train/                  |             |
|    approx_kl            | 0.008431789 |
|    clip_fraction        | 0.0776      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.09       |
|    explained_variance   | 0.895       |
|    learning_rate        | 0.0003      |
|    loss                 | 22.2        |
|    n_updates            | 636         |
|    policy_gradient_loss | -0.00612    |
|    value_loss           | 56.8        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 511   

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 566          |
|    ep_rew_mean          | 38.9         |
| time/                   |              |
|    fps                  | 1273         |
|    iterations           | 170          |
|    time_elapsed         | 136          |
|    total_timesteps      | 174080       |
| train/                  |              |
|    approx_kl            | 0.0039019687 |
|    clip_fraction        | 0.0293       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.997       |
|    explained_variance   | 0.816        |
|    learning_rate        | 0.0003       |
|    loss                 | 59.3         |
|    n_updates            | 676          |
|    policy_gradient_loss | -0.00299     |
|    value_loss           | 173          |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 631         |
|    ep_rew_mean          | 47.4        |
| time/                   |             |
|    fps                  | 1272        |
|    iterations           | 180         |
|    time_elapsed         | 144         |
|    total_timesteps      | 184320      |
| train/                  |             |
|    approx_kl            | 0.006253388 |
|    clip_fraction        | 0.0242      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.12       |
|    explained_variance   | 0.837       |
|    learning_rate        | 0.0003      |
|    loss                 | 38.7        |
|    n_updates            | 716         |
|    policy_gradient_loss | -0.00211    |
|    value_loss           | 96.4        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 639 

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 622         |
|    ep_rew_mean          | 46          |
| time/                   |             |
|    fps                  | 1273        |
|    iterations           | 190         |
|    time_elapsed         | 152         |
|    total_timesteps      | 194560      |
| train/                  |             |
|    approx_kl            | 0.006835377 |
|    clip_fraction        | 0.0139      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.05       |
|    explained_variance   | 0.506       |
|    learning_rate        | 0.0003      |
|    loss                 | 97.7        |
|    n_updates            | 756         |
|    policy_gradient_loss | -0.00739    |
|    value_loss           | 345         |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 631   

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 634        |
|    ep_rew_mean          | 45         |
| time/                   |            |
|    fps                  | 1272       |
|    iterations           | 200        |
|    time_elapsed         | 160        |
|    total_timesteps      | 204800     |
| train/                  |            |
|    approx_kl            | 0.01005663 |
|    clip_fraction        | 0.0828     |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.02      |
|    explained_variance   | 0.659      |
|    learning_rate        | 0.0003     |
|    loss                 | 200        |
|    n_updates            | 796        |
|    policy_gradient_loss | -0.00897   |
|    value_loss           | 258        |
----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 634          |
|    ep_re

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 665         |
|    ep_rew_mean          | 49.2        |
| time/                   |             |
|    fps                  | 1274        |
|    iterations           | 210         |
|    time_elapsed         | 168         |
|    total_timesteps      | 215040      |
| train/                  |             |
|    approx_kl            | 0.012603398 |
|    clip_fraction        | 0.05        |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.02       |
|    explained_variance   | 0.889       |
|    learning_rate        | 0.0003      |
|    loss                 | 7.49        |
|    n_updates            | 836         |
|    policy_gradient_loss | -0.00153    |
|    value_loss           | 16.2        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 674 

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 711        |
|    ep_rew_mean          | 47.8       |
| time/                   |            |
|    fps                  | 1268       |
|    iterations           | 220        |
|    time_elapsed         | 177        |
|    total_timesteps      | 225280     |
| train/                  |            |
|    approx_kl            | 0.00799489 |
|    clip_fraction        | 0.0598     |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.875     |
|    explained_variance   | 0.938      |
|    learning_rate        | 0.0003     |
|    loss                 | 6.71       |
|    n_updates            | 876        |
|    policy_gradient_loss | -0.00517   |
|    value_loss           | 22.8       |
----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 711          |
|    ep_re

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 754          |
|    ep_rew_mean          | 48.1         |
| time/                   |              |
|    fps                  | 1268         |
|    iterations           | 230          |
|    time_elapsed         | 185          |
|    total_timesteps      | 235520       |
| train/                  |              |
|    approx_kl            | 0.0031121732 |
|    clip_fraction        | 0.00879      |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.976       |
|    explained_variance   | 0.927        |
|    learning_rate        | 0.0003       |
|    loss                 | 10.9         |
|    n_updates            | 916          |
|    policy_gradient_loss | -0.00284     |
|    value_loss           | 26.3         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 785          |
|    ep_rew_mean          | 52           |
| time/                   |              |
|    fps                  | 1270         |
|    iterations           | 240          |
|    time_elapsed         | 193          |
|    total_timesteps      | 245760       |
| train/                  |              |
|    approx_kl            | 0.0049230405 |
|    clip_fraction        | 0.0576       |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.03        |
|    explained_variance   | 0.548        |
|    learning_rate        | 0.0003       |
|    loss                 | 97.8         |
|    n_updates            | 956          |
|    policy_gradient_loss | -0.00102     |
|    value_loss           | 306          |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 785         |
|    ep_rew_mean          | 50.6        |
| time/                   |             |
|    fps                  | 1270        |
|    iterations           | 250         |
|    time_elapsed         | 201         |
|    total_timesteps      | 256000      |
| train/                  |             |
|    approx_kl            | 0.010095851 |
|    clip_fraction        | 0.0432      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1          |
|    explained_variance   | 0.928       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.22        |
|    n_updates            | 996         |
|    policy_gradient_loss | -0.00323    |
|    value_loss           | 21.1        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 785 

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 787          |
|    ep_rew_mean          | 55.6         |
| time/                   |              |
|    fps                  | 1270         |
|    iterations           | 260          |
|    time_elapsed         | 209          |
|    total_timesteps      | 266240       |
| train/                  |              |
|    approx_kl            | 0.0058667064 |
|    clip_fraction        | 0.0557       |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.04        |
|    explained_variance   | 0.946        |
|    learning_rate        | 0.0003       |
|    loss                 | 6.91         |
|    n_updates            | 1036         |
|    policy_gradient_loss | -0.00183     |
|    value_loss           | 21.7         |
------------------------------------------

-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 813           |
|    ep_rew_mean          | 59.4          |
| time/                   |               |
|    fps                  | 1268          |
|    iterations           | 270           |
|    time_elapsed         | 217           |
|    total_timesteps      | 276480        |
| train/                  |               |
|    approx_kl            | 0.00015584624 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -0.986        |
|    explained_variance   | 0.536         |
|    learning_rate        | 0.0003        |
|    loss                 | 309           |
|    n_updates            | 1076          |
|    policy_gradient_loss | 5.86e-05      |
|    value_loss           | 1.47e+03      |
-------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 797         |
|    ep_rew_mean          | 59.7        |
| time/                   |             |
|    fps                  | 1259        |
|    iterations           | 280         |
|    time_elapsed         | 227         |
|    total_timesteps      | 286720      |
| train/                  |             |
|    approx_kl            | 0.000842249 |
|    clip_fraction        | 0           |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.845      |
|    explained_variance   | 0.658       |
|    learning_rate        | 0.0003      |
|    loss                 | 82.9        |
|    n_updates            | 1116        |
|    policy_gradient_loss | -2.62e-05   |
|    value_loss           | 284         |
-----------------------------------------
----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 791     

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 759         |
|    ep_rew_mean          | 59.3        |
| time/                   |             |
|    fps                  | 1254        |
|    iterations           | 290         |
|    time_elapsed         | 236         |
|    total_timesteps      | 296960      |
| train/                  |             |
|    approx_kl            | 0.006177157 |
|    clip_fraction        | 0.0154      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.84       |
|    explained_variance   | 0.635       |
|    learning_rate        | 0.0003      |
|    loss                 | 66.3        |
|    n_updates            | 1156        |
|    policy_gradient_loss | -0.00343    |
|    value_loss           | 189         |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 764 

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 760          |
|    ep_rew_mean          | 62.5         |
| time/                   |              |
|    fps                  | 1252         |
|    iterations           | 300          |
|    time_elapsed         | 245          |
|    total_timesteps      | 307200       |
| train/                  |              |
|    approx_kl            | 0.0006659835 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.832       |
|    explained_variance   | 0.379        |
|    learning_rate        | 0.0003       |
|    loss                 | 357          |
|    n_updates            | 1196         |
|    policy_gradient_loss | -0.000158    |
|    value_loss           | 550          |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 761          |
|    ep_rew_mean          | 69           |
| time/                   |              |
|    fps                  | 1247         |
|    iterations           | 310          |
|    time_elapsed         | 254          |
|    total_timesteps      | 317440       |
| train/                  |              |
|    approx_kl            | 0.0073845526 |
|    clip_fraction        | 0.0754       |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.05        |
|    explained_variance   | 0.982        |
|    learning_rate        | 0.0003       |
|    loss                 | 2            |
|    n_updates            | 1236         |
|    policy_gradient_loss | -0.00112     |
|    value_loss           | 8.14         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 768         |
|    ep_rew_mean          | 73.9        |
| time/                   |             |
|    fps                  | 1246        |
|    iterations           | 320         |
|    time_elapsed         | 262         |
|    total_timesteps      | 327680      |
| train/                  |             |
|    approx_kl            | 0.008669017 |
|    clip_fraction        | 0.0183      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.06       |
|    explained_variance   | 0.936       |
|    learning_rate        | 0.0003      |
|    loss                 | 4.57        |
|    n_updates            | 1276        |
|    policy_gradient_loss | -0.00203    |
|    value_loss           | 14.7        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 768          |
|    ep_rew_mean          | 63.4         |
| time/                   |              |
|    fps                  | 1241         |
|    iterations           | 330          |
|    time_elapsed         | 272          |
|    total_timesteps      | 337920       |
| train/                  |              |
|    approx_kl            | 0.0009595796 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.978       |
|    explained_variance   | 0.948        |
|    learning_rate        | 0.0003       |
|    loss                 | 2.76         |
|    n_updates            | 1316         |
|    policy_gradient_loss | -0.00146     |
|    value_loss           | 21.5         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 773         |
|    ep_rew_mean          | 62.6        |
| time/                   |             |
|    fps                  | 1239        |
|    iterations           | 340         |
|    time_elapsed         | 280         |
|    total_timesteps      | 348160      |
| train/                  |             |
|    approx_kl            | 0.022587642 |
|    clip_fraction        | 0.0864      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.907      |
|    explained_variance   | 0.986       |
|    learning_rate        | 0.0003      |
|    loss                 | 1.02        |
|    n_updates            | 1356        |
|    policy_gradient_loss | -0.00306    |
|    value_loss           | 4.66        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 788          |
|    ep_rew_mean          | 67.5         |
| time/                   |              |
|    fps                  | 1236         |
|    iterations           | 350          |
|    time_elapsed         | 289          |
|    total_timesteps      | 358400       |
| train/                  |              |
|    approx_kl            | 0.0037637306 |
|    clip_fraction        | 0.0107       |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.01        |
|    explained_variance   | 0.908        |
|    learning_rate        | 0.0003       |
|    loss                 | 10.7         |
|    n_updates            | 1396         |
|    policy_gradient_loss | -0.00273     |
|    value_loss           | 16.8         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 813          |
|    ep_rew_mean          | 72           |
| time/                   |              |
|    fps                  | 1233         |
|    iterations           | 360          |
|    time_elapsed         | 298          |
|    total_timesteps      | 368640       |
| train/                  |              |
|    approx_kl            | 0.0058238697 |
|    clip_fraction        | 0.00879      |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.911       |
|    explained_variance   | 0.876        |
|    learning_rate        | 0.0003       |
|    loss                 | 33.8         |
|    n_updates            | 1436         |
|    policy_gradient_loss | -0.00366     |
|    value_loss           | 33.4         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 827         |
|    ep_rew_mean          | 72.1        |
| time/                   |             |
|    fps                  | 1233        |
|    iterations           | 370         |
|    time_elapsed         | 307         |
|    total_timesteps      | 378880      |
| train/                  |             |
|    approx_kl            | 0.005231522 |
|    clip_fraction        | 0.0381      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.953      |
|    explained_variance   | 0.959       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.78        |
|    n_updates            | 1476        |
|    policy_gradient_loss | -0.00189    |
|    value_loss           | 13.6        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 847          |
|    ep_rew_mean          | 76.9         |
| time/                   |              |
|    fps                  | 1230         |
|    iterations           | 380          |
|    time_elapsed         | 316          |
|    total_timesteps      | 389120       |
| train/                  |              |
|    approx_kl            | 0.0032152778 |
|    clip_fraction        | 0.0061       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.807       |
|    explained_variance   | 0.924        |
|    learning_rate        | 0.0003       |
|    loss                 | 7.29         |
|    n_updates            | 1516         |
|    policy_gradient_loss | -0.000683    |
|    value_loss           | 39.9         |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 846         |
|    ep_rew_mean          | 81.3        |
| time/                   |             |
|    fps                  | 1229        |
|    iterations           | 390         |
|    time_elapsed         | 324         |
|    total_timesteps      | 399360      |
| train/                  |             |
|    approx_kl            | 0.015824784 |
|    clip_fraction        | 0.0623      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.942      |
|    explained_variance   | 0.85        |
|    learning_rate        | 0.0003      |
|    loss                 | 10.1        |
|    n_updates            | 1556        |
|    policy_gradient_loss | -0.00327    |
|    value_loss           | 31.1        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 852          |
|    ep_rew_mean          | 84.1         |
| time/                   |              |
|    fps                  | 1226         |
|    iterations           | 400          |
|    time_elapsed         | 334          |
|    total_timesteps      | 409600       |
| train/                  |              |
|    approx_kl            | 0.0044319015 |
|    clip_fraction        | 0.0132       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.916       |
|    explained_variance   | 0.951        |
|    learning_rate        | 0.0003       |
|    loss                 | 13.8         |
|    n_updates            | 1596         |
|    policy_gradient_loss | 8.87e-06     |
|    value_loss           | 26           |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 843         |
|    ep_rew_mean          | 91.1        |
| time/                   |             |
|    fps                  | 1225        |
|    iterations           | 410         |
|    time_elapsed         | 342         |
|    total_timesteps      | 419840      |
| train/                  |             |
|    approx_kl            | 0.011853765 |
|    clip_fraction        | 0.111       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.905      |
|    explained_variance   | 0.935       |
|    learning_rate        | 0.0003      |
|    loss                 | 6.2         |
|    n_updates            | 1636        |
|    policy_gradient_loss | -0.00584    |
|    value_loss           | 12.5        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 844         |
|    ep_rew_mean          | 97.4        |
| time/                   |             |
|    fps                  | 1223        |
|    iterations           | 420         |
|    time_elapsed         | 351         |
|    total_timesteps      | 430080      |
| train/                  |             |
|    approx_kl            | 0.010334045 |
|    clip_fraction        | 0.0859      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.968      |
|    explained_variance   | 0.923       |
|    learning_rate        | 0.0003      |
|    loss                 | 43.6        |
|    n_updates            | 1676        |
|    policy_gradient_loss | -0.00472    |
|    value_loss           | 82.9        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 839         |
|    ep_rew_mean          | 98.3        |
| time/                   |             |
|    fps                  | 1222        |
|    iterations           | 430         |
|    time_elapsed         | 360         |
|    total_timesteps      | 440320      |
| train/                  |             |
|    approx_kl            | 0.009978552 |
|    clip_fraction        | 0.0703      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.09       |
|    explained_variance   | 0.979       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.84        |
|    n_updates            | 1716        |
|    policy_gradient_loss | -0.00698    |
|    value_loss           | 12.6        |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 815          |
|    ep_rew_mean          | 97.3         |
| time/                   |              |
|    fps                  | 1220         |
|    iterations           | 440          |
|    time_elapsed         | 369          |
|    total_timesteps      | 450560       |
| train/                  |              |
|    approx_kl            | 0.0040140217 |
|    clip_fraction        | 0.00806      |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.954       |
|    explained_variance   | 0.933        |
|    learning_rate        | 0.0003       |
|    loss                 | 7.97         |
|    n_updates            | 1756         |
|    policy_gradient_loss | -0.00198     |
|    value_loss           | 34.4         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 801          |
|    ep_rew_mean          | 103          |
| time/                   |              |
|    fps                  | 1219         |
|    iterations           | 450          |
|    time_elapsed         | 377          |
|    total_timesteps      | 460800       |
| train/                  |              |
|    approx_kl            | 4.709745e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.01        |
|    explained_variance   | 0.669        |
|    learning_rate        | 0.0003       |
|    loss                 | 277          |
|    n_updates            | 1796         |
|    policy_gradient_loss | -0.000109    |
|    value_loss           | 732          |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 794          |
|    ep_rew_mean          | 101          |
| time/                   |              |
|    fps                  | 1217         |
|    iterations           | 460          |
|    time_elapsed         | 386          |
|    total_timesteps      | 471040       |
| train/                  |              |
|    approx_kl            | 0.0051938845 |
|    clip_fraction        | 0.0205       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.984       |
|    explained_variance   | 0.979        |
|    learning_rate        | 0.0003       |
|    loss                 | 3.7          |
|    n_updates            | 1836         |
|    policy_gradient_loss | -0.00171     |
|    value_loss           | 12.8         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 818          |
|    ep_rew_mean          | 105          |
| time/                   |              |
|    fps                  | 1215         |
|    iterations           | 470          |
|    time_elapsed         | 396          |
|    total_timesteps      | 481280       |
| train/                  |              |
|    approx_kl            | 0.0099761095 |
|    clip_fraction        | 0.0503       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.984       |
|    explained_variance   | 0.99         |
|    learning_rate        | 0.0003       |
|    loss                 | 3.24         |
|    n_updates            | 1876         |
|    policy_gradient_loss | -0.00306     |
|    value_loss           | 6.65         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 817          |
|    ep_rew_mean          | 105          |
| time/                   |              |
|    fps                  | 1215         |
|    iterations           | 480          |
|    time_elapsed         | 404          |
|    total_timesteps      | 491520       |
| train/                  |              |
|    approx_kl            | 0.0035414656 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.973       |
|    explained_variance   | 0.828        |
|    learning_rate        | 0.0003       |
|    loss                 | 91.8         |
|    n_updates            | 1916         |
|    policy_gradient_loss | -0.00131     |
|    value_loss           | 270          |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 823          |
|    ep_rew_mean          | 102          |
| time/                   |              |
|    fps                  | 1213         |
|    iterations           | 490          |
|    time_elapsed         | 413          |
|    total_timesteps      | 501760       |
| train/                  |              |
|    approx_kl            | 0.0006508348 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.983       |
|    explained_variance   | 0.587        |
|    learning_rate        | 0.0003       |
|    loss                 | 249          |
|    n_updates            | 1956         |
|    policy_gradient_loss | 0.000868     |
|    value_loss           | 520          |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 837          |
|    ep_rew_mean          | 102          |
| time/                   |              |
|    fps                  | 1213         |
|    iterations           | 500          |
|    time_elapsed         | 421          |
|    total_timesteps      | 512000       |
| train/                  |              |
|    approx_kl            | 0.0009774535 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.848       |
|    explained_variance   | 0.583        |
|    learning_rate        | 0.0003       |
|    loss                 | 758          |
|    n_updates            | 1996         |
|    policy_gradient_loss | 0.000688     |
|    value_loss           | 571          |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 851          |
|    ep_rew_mean          | 107          |
| time/                   |              |
|    fps                  | 1211         |
|    iterations           | 510          |
|    time_elapsed         | 431          |
|    total_timesteps      | 522240       |
| train/                  |              |
|    approx_kl            | 0.0069079413 |
|    clip_fraction        | 0.0312       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.826       |
|    explained_variance   | 0.973        |
|    learning_rate        | 0.0003       |
|    loss                 | 1.86         |
|    n_updates            | 2036         |
|    policy_gradient_loss | -0.00352     |
|    value_loss           | 13.5         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 867          |
|    ep_rew_mean          | 115          |
| time/                   |              |
|    fps                  | 1211         |
|    iterations           | 520          |
|    time_elapsed         | 439          |
|    total_timesteps      | 532480       |
| train/                  |              |
|    approx_kl            | 0.0040782522 |
|    clip_fraction        | 0.0347       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.734       |
|    explained_variance   | 0.971        |
|    learning_rate        | 0.0003       |
|    loss                 | 2.45         |
|    n_updates            | 2076         |
|    policy_gradient_loss | -0.00121     |
|    value_loss           | 10.1         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 859          |
|    ep_rew_mean          | 130          |
| time/                   |              |
|    fps                  | 1209         |
|    iterations           | 530          |
|    time_elapsed         | 448          |
|    total_timesteps      | 542720       |
| train/                  |              |
|    approx_kl            | 0.0015195265 |
|    clip_fraction        | 0.00708      |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.744       |
|    explained_variance   | 0.801        |
|    learning_rate        | 0.0003       |
|    loss                 | 40.2         |
|    n_updates            | 2116         |
|    policy_gradient_loss | -0.00194     |
|    value_loss           | 73.5         |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 781          |
|    ep_rew_mean          | 158          |
| time/                   |              |
|    fps                  | 1209         |
|    iterations           | 540          |
|    time_elapsed         | 457          |
|    total_timesteps      | 552960       |
| train/                  |              |
|    approx_kl            | 0.0041205226 |
|    clip_fraction        | 0.0496       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.797       |
|    explained_variance   | 0.888        |
|    learning_rate        | 0.0003       |
|    loss                 | 7.74         |
|    n_updates            | 2156         |
|    policy_gradient_loss | -0.00322     |
|    value_loss           | 24           |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 685         |
|    ep_rew_mean          | 179         |
| time/                   |             |
|    fps                  | 1208        |
|    iterations           | 550         |
|    time_elapsed         | 465         |
|    total_timesteps      | 563200      |
| train/                  |             |
|    approx_kl            | 0.005044096 |
|    clip_fraction        | 0.0359      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.807      |
|    explained_variance   | 0.919       |
|    learning_rate        | 0.0003      |
|    loss                 | 7           |
|    n_updates            | 2196        |
|    policy_gradient_loss | -0.00289    |
|    value_loss           | 18.5        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 631         |
|    ep_rew_mean          | 199         |
| time/                   |             |
|    fps                  | 1202        |
|    iterations           | 560         |
|    time_elapsed         | 476         |
|    total_timesteps      | 573440      |
| train/                  |             |
|    approx_kl            | 0.003387144 |
|    clip_fraction        | 0.0181      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.824      |
|    explained_variance   | 0.933       |
|    learning_rate        | 0.0003      |
|    loss                 | 8.38        |
|    n_updates            | 2236        |
|    policy_gradient_loss | -0.00114    |
|    value_loss           | 30.3        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 545         |
|    ep_rew_mean          | 220         |
| time/                   |             |
|    fps                  | 1189        |
|    iterations           | 570         |
|    time_elapsed         | 490         |
|    total_timesteps      | 583680      |
| train/                  |             |
|    approx_kl            | 0.005654469 |
|    clip_fraction        | 0.0305      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.833      |
|    explained_variance   | 0.92        |
|    learning_rate        | 0.0003      |
|    loss                 | 11.4        |
|    n_updates            | 2276        |
|    policy_gradient_loss | -0.00174    |
|    value_loss           | 25.9        |
-----------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 509         |
|    ep_rew_mean          | 226         |
| time/                   |             |
|    fps                  | 1183        |
|    iterations           | 580         |
|    time_elapsed         | 501         |
|    total_timesteps      | 593920      |
| train/                  |             |
|    approx_kl            | 0.004308505 |
|    clip_fraction        | 0.0483      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.87       |
|    explained_variance   | 0.943       |
|    learning_rate        | 0.0003      |
|    loss                 | 8.71        |
|    n_updates            | 2316        |
|    policy_gradient_loss | -0.00436    |
|    value_loss           | 21          |
-----------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 499          |
|    ep_rew_mean          | 231          |
| time/                   |              |
|    fps                  | 1175         |
|    iterations           | 590          |
|    time_elapsed         | 514          |
|    total_timesteps      | 604160       |
| train/                  |              |
|    approx_kl            | 0.0009105522 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.663       |
|    explained_variance   | 0.412        |
|    learning_rate        | 0.0003       |
|    loss                 | 445          |
|    n_updates            | 2356         |
|    policy_gradient_loss | -0.0013      |
|    value_loss           | 1.12e+03     |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 490          |
|    ep_rew_mean          | 225          |
| time/                   |              |
|    fps                  | 1168         |
|    iterations           | 600          |
|    time_elapsed         | 525          |
|    total_timesteps      | 614400       |
| train/                  |              |
|    approx_kl            | 0.0002679294 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.663       |
|    explained_variance   | 0.326        |
|    learning_rate        | 0.0003       |
|    loss                 | 138          |
|    n_updates            | 2396         |
|    policy_gradient_loss | -0.00171     |
|    value_loss           | 882          |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 455          |
|    ep_rew_mean          | 216          |
| time/                   |              |
|    fps                  | 1166         |
|    iterations           | 610          |
|    time_elapsed         | 535          |
|    total_timesteps      | 624640       |
| train/                  |              |
|    approx_kl            | 0.0003392371 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.829       |
|    explained_variance   | 0.439        |
|    learning_rate        | 0.0003       |
|    loss                 | 60.6         |
|    n_updates            | 2436         |
|    policy_gradient_loss | 0.000223     |
|    value_loss           | 831          |
------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 424          |
|    ep_rew_mean          | 213          |
| time/                   |              |
|    fps                  | 1162         |
|    iterations           | 620          |
|    time_elapsed         | 546          |
|    total_timesteps      | 634880       |
| train/                  |              |
|    approx_kl            | 0.0019429123 |
|    clip_fraction        | 0.00122      |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.831       |
|    explained_variance   | 0.424        |
|    learning_rate        | 0.0003       |
|    loss                 | 79.1         |
|    n_updates            | 2476         |
|    policy_gradient_loss | -0.00156     |
|    value_loss           | 982          |
------------------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 417         |
|    ep_rew_mean          | 213         |
| time/                   |             |
|    fps                  | 1159        |
|    iterations           | 630         |
|    time_elapsed         | 556         |
|    total_timesteps      | 645120      |
| train/                  |             |
|    approx_kl            | 0.009043449 |
|    clip_fraction        | 0.0842      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.811      |
|    explained_variance   | 0.964       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.76        |
|    n_updates            | 2516        |
|    policy_gradient_loss | -0.00609    |
|    value_loss           | 13.4        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 431         |
|    ep_rew_mean          | 214         |
| time/                   |             |
|    fps                  | 1157        |
|    iterations           | 640         |
|    time_elapsed         | 565         |
|    total_timesteps      | 655360      |
| train/                  |             |
|    approx_kl            | 0.004931789 |
|    clip_fraction        | 0.0244      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.747      |
|    explained_variance   | 0.956       |
|    learning_rate        | 0.0003      |
|    loss                 | 8.1         |
|    n_updates            | 2556        |
|    policy_gradient_loss | -0.000531   |
|    value_loss           | 17.5        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 443          |
|    ep_rew_mean          | 222          |
| time/                   |              |
|    fps                  | 1155         |
|    iterations           | 650          |
|    time_elapsed         | 576          |
|    total_timesteps      | 665600       |
| train/                  |              |
|    approx_kl            | 9.177672e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.839       |
|    explained_variance   | 0.483        |
|    learning_rate        | 0.0003       |
|    loss                 | 103          |
|    n_updates            | 2596         |
|    policy_gradient_loss | -0.000526    |
|    value_loss           | 751          |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 473          |
|    ep_rew_mean          | 219          |
| time/                   |              |
|    fps                  | 1153         |
|    iterations           | 660          |
|    time_elapsed         | 586          |
|    total_timesteps      | 675840       |
| train/                  |              |
|    approx_kl            | 0.0059345826 |
|    clip_fraction        | 0.0647       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.785       |
|    explained_variance   | 0.923        |
|    learning_rate        | 0.0003       |
|    loss                 | 7.08         |
|    n_updates            | 2636         |
|    policy_gradient_loss | -0.00308     |
|    value_loss           | 24.9         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 498          |
|    ep_rew_mean          | 216          |
| time/                   |              |
|    fps                  | 1151         |
|    iterations           | 670          |
|    time_elapsed         | 595          |
|    total_timesteps      | 686080       |
| train/                  |              |
|    approx_kl            | 0.0024625608 |
|    clip_fraction        | 0.0139       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.825       |
|    explained_variance   | 0.876        |
|    learning_rate        | 0.0003       |
|    loss                 | 6.3          |
|    n_updates            | 2676         |
|    policy_gradient_loss | -0.00363     |
|    value_loss           | 15.3         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 531         |
|    ep_rew_mean          | 215         |
| time/                   |             |
|    fps                  | 1149        |
|    iterations           | 680         |
|    time_elapsed         | 605         |
|    total_timesteps      | 696320      |
| train/                  |             |
|    approx_kl            | 0.008399909 |
|    clip_fraction        | 0.0347      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.863      |
|    explained_variance   | 0.941       |
|    learning_rate        | 0.0003      |
|    loss                 | 4.57        |
|    n_updates            | 2716        |
|    policy_gradient_loss | -0.00589    |
|    value_loss           | 11.7        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 566         |
|    ep_rew_mean          | 208         |
| time/                   |             |
|    fps                  | 1149        |
|    iterations           | 690         |
|    time_elapsed         | 614         |
|    total_timesteps      | 706560      |
| train/                  |             |
|    approx_kl            | 0.004333582 |
|    clip_fraction        | 0.021       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.741      |
|    explained_variance   | 0.924       |
|    learning_rate        | 0.0003      |
|    loss                 | 13.5        |
|    n_updates            | 2756        |
|    policy_gradient_loss | -0.00363    |
|    value_loss           | 32.8        |
-----------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 576        |
|    ep_rew_mean          | 205        |
| time/                   |            |
|    fps                  | 1147       |
|    iterations           | 700        |
|    time_elapsed         | 624        |
|    total_timesteps      | 716800     |
| train/                  |            |
|    approx_kl            | 0.00468608 |
|    clip_fraction        | 0.012      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.75      |
|    explained_variance   | 0.907      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.35       |
|    n_updates            | 2796       |
|    policy_gradient_loss | 0.00149    |
|    value_loss           | 23.4       |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 587         |
|    ep_rew_mean          | 204         |
| time/                   |             |
|    fps                  | 1143        |
|    iterations           | 710         |
|    time_elapsed         | 635         |
|    total_timesteps      | 727040      |
| train/                  |             |
|    approx_kl            | 0.009984198 |
|    clip_fraction        | 0.0332      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.714      |
|    explained_variance   | 0.964       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.46        |
|    n_updates            | 2836        |
|    policy_gradient_loss | -0.0038     |
|    value_loss           | 6.69        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 552          |
|    ep_rew_mean          | 209          |
| time/                   |              |
|    fps                  | 1143         |
|    iterations           | 720          |
|    time_elapsed         | 644          |
|    total_timesteps      | 737280       |
| train/                  |              |
|    approx_kl            | 0.0019133692 |
|    clip_fraction        | 0.000244     |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.831       |
|    explained_variance   | 0.457        |
|    learning_rate        | 0.0003       |
|    loss                 | 283          |
|    n_updates            | 2876         |
|    policy_gradient_loss | -0.00174     |
|    value_loss           | 908          |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 546          |
|    ep_rew_mean          | 204          |
| time/                   |              |
|    fps                  | 1141         |
|    iterations           | 730          |
|    time_elapsed         | 654          |
|    total_timesteps      | 747520       |
| train/                  |              |
|    approx_kl            | 0.0055747246 |
|    clip_fraction        | 0.0352       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.825       |
|    explained_variance   | 0.95         |
|    learning_rate        | 0.0003       |
|    loss                 | 3.72         |
|    n_updates            | 2916         |
|    policy_gradient_loss | -0.00032     |
|    value_loss           | 10           |
------------------------------------------

-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 501           |
|    ep_rew_mean          | 208           |
| time/                   |               |
|    fps                  | 1141          |
|    iterations           | 740           |
|    time_elapsed         | 663           |
|    total_timesteps      | 757760        |
| train/                  |               |
|    approx_kl            | 0.00030402694 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -0.808        |
|    explained_variance   | 0.431         |
|    learning_rate        | 0.0003        |
|    loss                 | 146           |
|    n_updates            | 2956          |
|    policy_gradient_loss | 0.000281      |
|    value_loss           | 1.05e+03      |
-------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 473          |
|    ep_rew_mean          | 204          |
| time/                   |              |
|    fps                  | 1139         |
|    iterations           | 750          |
|    time_elapsed         | 673          |
|    total_timesteps      | 768000       |
| train/                  |              |
|    approx_kl            | 0.0014997281 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.732       |
|    explained_variance   | 0.654        |
|    learning_rate        | 0.0003       |
|    loss                 | 204          |
|    n_updates            | 2996         |
|    policy_gradient_loss | -0.000466    |
|    value_loss           | 825          |
------------------------------------------

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 461        |
|    ep_rew_mean          | 206        |
| time/                   |            |
|    fps                  | 1138       |
|    iterations           | 760        |
|    time_elapsed         | 683        |
|    total_timesteps      | 778240     |
| train/                  |            |
|    approx_kl            | 0.00875631 |
|    clip_fraction        | 0.0532     |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.684     |
|    explained_variance   | 0.909      |
|    learning_rate        | 0.0003     |
|    loss                 | 8.26       |
|    n_updates            | 3036       |
|    policy_gradient_loss | -0.00489   |
|    value_loss           | 45.7       |
----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 436         |
|    ep_rew_mean          | 212         |
| time/                   |             |
|    fps                  | 1137        |
|    iterations           | 770         |
|    time_elapsed         | 692         |
|    total_timesteps      | 788480      |
| train/                  |             |
|    approx_kl            | 0.002201971 |
|    clip_fraction        | 0.00317     |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.749      |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0003      |
|    loss                 | 8.63        |
|    n_updates            | 3076        |
|    policy_gradient_loss | -0.000835   |
|    value_loss           | 37.7        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 419         |
|    ep_rew_mean          | 221         |
| time/                   |             |
|    fps                  | 1136        |
|    iterations           | 780         |
|    time_elapsed         | 702         |
|    total_timesteps      | 798720      |
| train/                  |             |
|    approx_kl            | 0.003576046 |
|    clip_fraction        | 0.0151      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.732      |
|    explained_variance   | 0.988       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.64        |
|    n_updates            | 3116        |
|    policy_gradient_loss | -0.00119    |
|    value_loss           | 7.73        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 424          |
|    ep_rew_mean          | 230          |
| time/                   |              |
|    fps                  | 1135         |
|    iterations           | 790          |
|    time_elapsed         | 712          |
|    total_timesteps      | 808960       |
| train/                  |              |
|    approx_kl            | 0.0030333463 |
|    clip_fraction        | 0.0347       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.689       |
|    explained_variance   | 0.989        |
|    learning_rate        | 0.0003       |
|    loss                 | 4.39         |
|    n_updates            | 3156         |
|    policy_gradient_loss | -4.13e-05    |
|    value_loss           | 7.66         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 404          |
|    ep_rew_mean          | 221          |
| time/                   |              |
|    fps                  | 1134         |
|    iterations           | 800          |
|    time_elapsed         | 722          |
|    total_timesteps      | 819200       |
| train/                  |              |
|    approx_kl            | 0.0005472242 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.767       |
|    explained_variance   | 0.727        |
|    learning_rate        | 0.0003       |
|    loss                 | 27.5         |
|    n_updates            | 3196         |
|    policy_gradient_loss | -0.00116     |
|    value_loss           | 162          |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 392          |
|    ep_rew_mean          | 217          |
| time/                   |              |
|    fps                  | 1130         |
|    iterations           | 810          |
|    time_elapsed         | 733          |
|    total_timesteps      | 829440       |
| train/                  |              |
|    approx_kl            | 0.0047514504 |
|    clip_fraction        | 0.0234       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.651       |
|    explained_variance   | 0.96         |
|    learning_rate        | 0.0003       |
|    loss                 | 5.27         |
|    n_updates            | 3236         |
|    policy_gradient_loss | 0.0005       |
|    value_loss           | 31.1         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 391          |
|    ep_rew_mean          | 216          |
| time/                   |              |
|    fps                  | 1130         |
|    iterations           | 820          |
|    time_elapsed         | 743          |
|    total_timesteps      | 839680       |
| train/                  |              |
|    approx_kl            | 0.0024634793 |
|    clip_fraction        | 0.0227       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.711       |
|    explained_variance   | 0.973        |
|    learning_rate        | 0.0003       |
|    loss                 | 10.7         |
|    n_updates            | 3276         |
|    policy_gradient_loss | -0.000225    |
|    value_loss           | 20.5         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 396         |
|    ep_rew_mean          | 213         |
| time/                   |             |
|    fps                  | 1125        |
|    iterations           | 830         |
|    time_elapsed         | 755         |
|    total_timesteps      | 849920      |
| train/                  |             |
|    approx_kl            | 0.003983734 |
|    clip_fraction        | 0.0132      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.653      |
|    explained_variance   | 0.97        |
|    learning_rate        | 0.0003      |
|    loss                 | 7.41        |
|    n_updates            | 3316        |
|    policy_gradient_loss | -0.00174    |
|    value_loss           | 18.1        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 406         |
|    ep_rew_mean          | 222         |
| time/                   |             |
|    fps                  | 1124        |
|    iterations           | 840         |
|    time_elapsed         | 765         |
|    total_timesteps      | 860160      |
| train/                  |             |
|    approx_kl            | 0.006030214 |
|    clip_fraction        | 0.0266      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.734      |
|    explained_variance   | 0.973       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.01        |
|    n_updates            | 3356        |
|    policy_gradient_loss | -0.00197    |
|    value_loss           | 9.62        |
-----------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 426         |
|    ep_rew_mean          | 227         |
| time/                   |             |
|    fps                  | 1121        |
|    iterations           | 850         |
|    time_elapsed         | 775         |
|    total_timesteps      | 870400      |
| train/                  |             |
|    approx_kl            | 0.056064557 |
|    clip_fraction        | 0.128       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.571      |
|    explained_variance   | 0.983       |
|    learning_rate        | 0.0003      |
|    loss                 | 6.33        |
|    n_updates            | 3396        |
|    policy_gradient_loss | -0.00864    |
|    value_loss           | 16.3        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 424          |
|    ep_rew_mean          | 230          |
| time/                   |              |
|    fps                  | 1119         |
|    iterations           | 860          |
|    time_elapsed         | 786          |
|    total_timesteps      | 880640       |
| train/                  |              |
|    approx_kl            | 0.0028809006 |
|    clip_fraction        | 0.0283       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.786       |
|    explained_variance   | 0.987        |
|    learning_rate        | 0.0003       |
|    loss                 | 3.15         |
|    n_updates            | 3436         |
|    policy_gradient_loss | -0.00168     |
|    value_loss           | 6.42         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 402          |
|    ep_rew_mean          | 235          |
| time/                   |              |
|    fps                  | 1116         |
|    iterations           | 870          |
|    time_elapsed         | 797          |
|    total_timesteps      | 890880       |
| train/                  |              |
|    approx_kl            | 0.0044847475 |
|    clip_fraction        | 0.0352       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.795       |
|    explained_variance   | 0.989        |
|    learning_rate        | 0.0003       |
|    loss                 | 7.64         |
|    n_updates            | 3476         |
|    policy_gradient_loss | -0.0021      |
|    value_loss           | 10.3         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 397          |
|    ep_rew_mean          | 235          |
| time/                   |              |
|    fps                  | 1117         |
|    iterations           | 880          |
|    time_elapsed         | 806          |
|    total_timesteps      | 901120       |
| train/                  |              |
|    approx_kl            | 0.0019274353 |
|    clip_fraction        | 0.0022       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.765       |
|    explained_variance   | 0.617        |
|    learning_rate        | 0.0003       |
|    loss                 | 157          |
|    n_updates            | 3516         |
|    policy_gradient_loss | 0.00125      |
|    value_loss           | 655          |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 367          |
|    ep_rew_mean          | 233          |
| time/                   |              |
|    fps                  | 1115         |
|    iterations           | 890          |
|    time_elapsed         | 816          |
|    total_timesteps      | 911360       |
| train/                  |              |
|    approx_kl            | 0.0017304963 |
|    clip_fraction        | 0.00415      |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.693       |
|    explained_variance   | 0.543        |
|    learning_rate        | 0.0003       |
|    loss                 | 455          |
|    n_updates            | 3556         |
|    policy_gradient_loss | 0.000161     |
|    value_loss           | 1.03e+03     |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 369         |
|    ep_rew_mean          | 229         |
| time/                   |             |
|    fps                  | 1114        |
|    iterations           | 900         |
|    time_elapsed         | 827         |
|    total_timesteps      | 921600      |
| train/                  |             |
|    approx_kl            | 0.006193763 |
|    clip_fraction        | 0.00806     |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.702      |
|    explained_variance   | 0.841       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.47        |
|    n_updates            | 3596        |
|    policy_gradient_loss | -0.0014     |
|    value_loss           | 29.7        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 365          |
|    ep_rew_mean          | 226          |
| time/                   |              |
|    fps                  | 1112         |
|    iterations           | 910          |
|    time_elapsed         | 837          |
|    total_timesteps      | 931840       |
| train/                  |              |
|    approx_kl            | 0.0003029345 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.73        |
|    explained_variance   | 0.665        |
|    learning_rate        | 0.0003       |
|    loss                 | 500          |
|    n_updates            | 3636         |
|    policy_gradient_loss | -0.00126     |
|    value_loss           | 1.22e+03     |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 357          |
|    ep_rew_mean          | 225          |
| time/                   |              |
|    fps                  | 1110         |
|    iterations           | 920          |
|    time_elapsed         | 848          |
|    total_timesteps      | 942080       |
| train/                  |              |
|    approx_kl            | 0.0026747247 |
|    clip_fraction        | 0.0134       |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.659       |
|    explained_variance   | 0.975        |
|    learning_rate        | 0.0003       |
|    loss                 | 5.83         |
|    n_updates            | 3676         |
|    policy_gradient_loss | 0.000797     |
|    value_loss           | 22.8         |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 352          |
|    ep_rew_mean          | 221          |
| time/                   |              |
|    fps                  | 1109         |
|    iterations           | 930          |
|    time_elapsed         | 858          |
|    total_timesteps      | 952320       |
| train/                  |              |
|    approx_kl            | 0.0017186718 |
|    clip_fraction        | 0.00415      |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.777       |
|    explained_variance   | 0.609        |
|    learning_rate        | 0.0003       |
|    loss                 | 35.9         |
|    n_updates            | 3716         |
|    policy_gradient_loss | -0.00349     |
|    value_loss           | 774          |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 353         |
|    ep_rew_mean          | 220         |
| time/                   |             |
|    fps                  | 1108        |
|    iterations           | 940         |
|    time_elapsed         | 868         |
|    total_timesteps      | 962560      |
| train/                  |             |
|    approx_kl            | 0.004992444 |
|    clip_fraction        | 0.0503      |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.683      |
|    explained_variance   | 0.978       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.86        |
|    n_updates            | 3756        |
|    policy_gradient_loss | 0.000723    |
|    value_loss           | 20.1        |
-----------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 363          |
|    ep_rew_mean          | 229          |
| time/                   |              |
|    fps                  | 1106         |
|    iterations           | 950          |
|    time_elapsed         | 878          |
|    total_timesteps      | 972800       |
| train/                  |              |
|    approx_kl            | 0.0011070948 |
|    clip_fraction        | 0.000977     |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.753       |
|    explained_variance   | 0.543        |
|    learning_rate        | 0.0003       |
|    loss                 | 183          |
|    n_updates            | 3796         |
|    policy_gradient_loss | -0.00119     |
|    value_loss           | 968          |
------------------------------------------

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 373          |
|    ep_rew_mean          | 227          |
| time/                   |              |
|    fps                  | 1107         |
|    iterations           | 960          |
|    time_elapsed         | 887          |
|    total_timesteps      | 983040       |
| train/                  |              |
|    approx_kl            | 0.0012780866 |
|    clip_fraction        | 0.000977     |
|    clip_range           | 0.2          |
|    entropy_loss         | -0.66        |
|    explained_variance   | 0.904        |
|    learning_rate        | 0.0003       |
|    loss                 | 22.3         |
|    n_updates            | 3836         |
|    policy_gradient_loss | -0.000991    |
|    value_loss           | 83.5         |
------------------------------------------

-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 356         |
|    ep_rew_mean          | 218         |
| time/                   |             |
|    fps                  | 1106        |
|    iterations           | 970         |
|    time_elapsed         | 897         |
|    total_timesteps      | 993280      |
| train/                  |             |
|    approx_kl            | 0.001355611 |
|    clip_fraction        | 0           |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.691      |
|    explained_variance   | 0.8         |
|    learning_rate        | 0.0003      |
|    loss                 | 29.6        |
|    n_updates            | 3876        |
|    policy_gradient_loss | -0.000673   |
|    value_loss           | 146         |
-----------------------------------------

<stable_baselines3.ppo.ppo.PPO at 0x218d856ec10>

In [9]:
# Save the model under the given name
model_name = "run3"
model.save(model_name)


In [11]:
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
import gymnasium as gym

# Load the saved model
path = "C:/Users/karim/Documents/Skola/Preskriptiv analytik/run3.zip"
model = PPO.load(path)

# Create a fresh environment; render_mode="human" draws each frame during step()
eval_env = gym.make("LunarLander-v2", render_mode="human")
# Gymnasium's reset() returns (observation, info)
obs, info = eval_env.reset()

done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    # step() returns (obs, reward, terminated, truncated, info);
    # the episode is over as soon as either flag is set
    obs, reward, terminated, truncated, info = eval_env.step(action)
    done = terminated or truncated

# Close the environment
eval_env.close()


In [12]:
from stable_baselines3.common.env_util import make_vec_env

# Non-rendered environment for evaluation
eval_env = make_vec_env('LunarLander-v2', n_envs=1)


try:
    # Evaluate the policy over 100 episodes
    mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=100, deterministic=True)
    print(f"Mean Reward: {mean_reward}, Std Reward: {std_reward}")
except Exception as e:
    print(f"An error occurred during evaluation: {e}")

Mean Reward: 256.94645629999997, Std Reward: 23.868989568029615


## Report on Using the PPO Method for Lunar Lander

#### Introduction
In this project we used Proximal Policy Optimization (PPO), a widely used reinforcement learning method, to train an agent to land in the Lunar Lander environment. PPO is known for its efficiency and for its balance between exploration and exploitation.

#### Method
The PPO model uses MlpPolicy (defined at the start of the code; Mlp stands for Multi-Layer Perceptron), a feed-forward neural network that maps observations to actions, to optimize the agent's decision-making. In our case, the agent's goal is to land a spacecraft successfully on a designated landing pad.
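
To make the policy update more concrete, here is a minimal sketch of the clipped surrogate objective that PPO maximizes for each sample. It is an illustration, not the Stable-Baselines3 implementation: the function name `clipped_surrogate` and the sample values are hypothetical, and `clip_range=0.2` is taken from the training log above.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- probability of the action under the new policy divided by
                 its probability under the old policy
    advantage -- advantage estimate for that action
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical (ratio, advantage) pairs, chosen only for illustration
samples = [(1.5, 2.0), (0.5, -1.0), (1.1, 1.0)]
values = [clipped_surrogate(r, a) for r, a in samples]

# The objective for a batch is the mean over its samples
objective = sum(values) / len(values)
print(values, objective)
```

The clipping removes the incentive to move the ratio far from 1 in a single update, which is what keeps each new policy close to the previous one.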

#### Training Results
After training for 1,000,000 timesteps, the agent was evaluated over 100 episodes. The summary statistics are:

Mean reward: 257.308 <br>
Standard deviation of reward: 21.6988

These results indicate high and reliable performance. A mean reward above 250 suggests that the agent consistently lands the spacecraft safely and efficiently. The standard deviation is relatively low, less than 10% of the mean, which shows that the agent's performance is stable and predictable across episodes.
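
As a rough illustration of how such summary statistics come about, the sketch below computes the mean and the population standard deviation of a list of per-episode returns. The episode returns are hypothetical, not our measured data, and the helper `summarize` is our own name. The 200-point threshold is the score at which the Gymnasium documentation considers an episode a solution.

```python
import statistics

# Score at which the Gymnasium docs consider a Lunar Lander episode solved
SOLVED_THRESHOLD = 200.0


def summarize(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)  # same convention as numpy.std
    return mean, std


# Hypothetical returns from four evaluation episodes
episode_returns = [250.0, 270.0, 230.0, 260.0]
mean_reward, std_reward = summarize(episode_returns)

# A conservative check: the mean stays above the threshold
# even after subtracting one standard deviation
print(mean_reward, std_reward, mean_reward - std_reward > SOLVED_THRESHOLD)
```

Comparing `mean_reward - std_reward` with the threshold is a stricter criterion than the mean alone, because it penalizes agents whose performance varies widely between episodes.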

#### Conclusion
Using PPO in this project proved successful. The agent learned to handle the challenges of the Lunar Lander environment effectively, which resulted in high and consistent rewards. This success underlines the strength of PPO on complex reinforcement learning tasks.

#### Sources

OpenAI. (2023). Response to a question on, among other things, ["Proximal Policy Optimization and Multi-Layer Perceptron in Reinforcement Learning"]. ChatGPT version 4/3.5.
<br><br>
niftymark. (n.d.). ppo-LunarLander-v2. Hugging Face. Retrieved December 4, 2023, from https://huggingface.co/niftymark/ppo-LunarLander-v2
<br><br>
Rem4rkable. (n.d.). Solving-the-LunarLanderContinuous-v2. GitHub. Retrieved December 4, 2023, from https://github.com/Rem4rkable/Solving-the-LunarLanderContinuous-v2/blob/main/RL_Project_notebook.ipynb
<br><br>
Farama Foundation. (n.d.). Lunar Lander. Gymnasium by Farama. Retrieved December 4, 2023, from https://gymnasium.farama.org/environments/box2d/lunar_lander/