# Controlling Burgers' Equation with Reinforcement Learning
This notebook demonstrates how reinforcement learning can be used to control Burgers' equation, a nonlinear PDE. It relies on the reinforcement learning framework [stable_baselines3](https://github.com/DLR-RM/stable-baselines3) and the differentiable PDE solver [Φ<sub>Flow</sub>](https://github.com/tum-pbs/PhiFlow). [PPO](https://arxiv.org/abs/1707.06347v2) was selected as the reinforcement learning algorithm.
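
For reference, the one-dimensional viscous Burgers' equation with an added forcing term can be written as shown here. The notation is an assumption made for illustration and is not taken from the solver's documentation: $u$ is the velocity field, $\nu$ is the viscosity (the `viscosity` parameter set further down), and $f$ is the external force, which is presumably the quantity the agent controls and the training log reports as `forces`.

$$
\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2} + f(x, t)
$$

The term $u \, \partial u / \partial x$ is what makes the equation nonlinear.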

In [1]:
%load_ext autoreload
%autoreload 2
import sys; sys.path.append('../src'); sys.path.append('../PDE-Control/PhiFlow'); sys.path.append('../PDE-Control/src')
from phi.flow import *
import matplotlib.pyplot as plt
import burgers_plots as bplt
from experiment import BurgersTraining


Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.



## Reinforcement Learning Initialization

In [2]:
domain = Domain([32], box=box[0:1]) # 1d grid resolution and physical size
viscosity = 0.003 # viscosity constant for Burgers' equation
step_count = 32 # length of each trajectory
dt = 0.03 # time step size
diffusion_substeps = 1 # how many diffusion steps to perform at each solver step

n_envs = 10 # number of environments to train on in parallel
final_reward_factor = step_count # weight of the penalty applied if the agent does not reach the goal
steps_per_rollout = step_count * 10 # number of steps to collect per environment between agent updates
training_timesteps = steps_per_rollout * 1000 # length of the actual training, in steps
n_epochs = 10 # number of epochs to perform during each agent update
learning_rate = 1e-4 # learning rate for agent updates
batch_size = 128 # batch size for agent updates
test_path = 'forced-burgers-clash' # path of the test set used for comparison with the CFE method
test_range = range(100) # indices of the test samples inside the dataset
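
As a sanity check on these settings, the following sketch works out a few quantities they imply. It assumes that `steps_per_rollout` counts steps per environment, which is consistent with the `total_timesteps` value of 3200 reported after the first iteration in the training log, and that each epoch sweeps the collected samples once in minibatches of `batch_size`. The derived variable names are introduced here for illustration only.

```python
import math

# Values copied from the hyperparameter cell.
step_count = 32
dt = 0.03
n_envs = 10
steps_per_rollout = step_count * 10
n_epochs = 10
batch_size = 128

# Physical time covered by one trajectory.
horizon = step_count * dt

# Samples gathered across all environments before each agent update.
samples_per_rollout = steps_per_rollout * n_envs

# Complete trajectories contained in one rollout.
episodes_per_rollout = samples_per_rollout // step_count

# Assumption: one pass over the rollout data per epoch, in minibatches.
minibatches_per_epoch = math.ceil(samples_per_rollout / batch_size)
minibatch_steps_per_update = n_epochs * minibatches_per_epoch

print(f"simulated time per trajectory: {horizon:.2f}")
print(f"samples per rollout:           {samples_per_rollout}")
print(f"trajectories per rollout:      {episodes_per_rollout}")
print(f"minibatches per epoch:         {minibatches_per_epoch}")
print(f"minibatch steps per update:    {minibatch_steps_per_update}")
```

With the values above, every agent update is based on 100 complete trajectories.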

To start training, we create a trainer object, which manages the environment and the agent internally. It also creates a directory for storing models, logs, and hyperparameters, so training can be continued at any later point using the same configuration. If the model folder specified in `exp_name` already exists, the agent within is loaded. Otherwise, a new agent is created.

In [3]:
trainer = BurgersTraining(
    exp_name='../networks/rl-models/ControlBurgersBench',
    domain=domain,
    viscosity=viscosity,
    step_count=step_count,
    dt=dt,
    diffusion_substeps=diffusion_substeps,
    n_envs=n_envs,
    final_reward_factor=final_reward_factor,
    steps_per_rollout=steps_per_rollout,
    n_epochs=n_epochs,
    learning_rate=learning_rate,
    batch_size=batch_size,
    test_path=test_path,
    test_range=test_range,
)

Using new running mean for reward
Creating new agent...
Using given reward rms
Using cuda device


Now we are set up to start training the agent. The next line will take quite some time to execute, so grab a coffee or take your dog for a walk.

`n_rollouts` denotes the length of the training, measured in rollouts (the `iterations` counter in the log output).

`save_freq` specifies the number of rollouts after which the stored model is overwritten.
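
To get a feeling for how long the run takes, the sketch below estimates the total number of environment steps and the wall-clock time. It assumes the throughput of roughly 253 steps per second that the log of this run settles at; on other hardware the number will differ. It also assumes that a checkpoint is written every `save_freq` rollouts, which matches the "Storing agent and hyperparameters to disk..." message at iteration 50 but is not verified against the `BurgersTraining` implementation.

```python
# Values copied from the cells of this notebook; fps is read off the training log.
n_envs = 10
steps_per_rollout = 320
n_rollouts = 1000
save_freq = 50
fps = 253  # approximate steady-state throughput reported in the log

# Total environment steps simulated over the whole training run.
total_timesteps = n_rollouts * steps_per_rollout * n_envs

# Rough wall-clock estimate, assuming constant throughput.
estimated_hours = total_timesteps / fps / 3600

# Assumption: the stored model is overwritten every save_freq rollouts.
checkpoint_rollouts = list(range(save_freq, n_rollouts + 1, save_freq))

print(f"total timesteps:  {total_timesteps}")
print(f"estimated time:   {estimated_hours:.1f} h")
print(f"checkpoints at:   {checkpoint_rollouts[:3]} ... {checkpoint_rollouts[-1]}")
```

Because the model is overwritten rather than versioned, an interrupted run loses at most `save_freq` rollouts of progress under this assumption.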

In [4]:
trainer.train(n_rollouts=1000, save_freq=50)

Logging to ../networks/rl-models/ControlBurgersBench/tensorboard-log/PPO_1
Storing agent and hyperparameters to disk...



Default upsampling behavior when mode=linear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.



Forces on test set: 34499.577050
-----------------------------------
| forces             | 1143.5386  |
| rew_unnormalized   | -262437.3  |
| rollout/           |            |
|    ep_len_mean     | 32.0       |
|    ep_rew_mean     | -0.5929396 |
| test set forces    | 3.45e+04   |
| time/              |            |
|    fps             | 664        |
|    iterations      | 1          |
|    time_elapsed    | 4          |
|    total_timesteps | 3200       |
-----------------------------------
Forces on test set: 35935.269890
-----------------------------------------
| forces                  | 1124.006    |
| rew_unnormalized        | -245304.72  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 0.13344853  |
| test set forces         | 3.59e+04    |
| time/                   |             |
|    fps                  | 362         |
|    iterations           | 2           |
|    time_elapsed         | 17          |
|   

Forces on test set: 37768.909721
-----------------------------------------
| forces                  | 1143.9934   |
| rew_unnormalized        | -269292.6   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | -0.19531393 |
| test set forces         | 3.78e+04    |
| time/                   |             |
|    fps                  | 269         |
|    iterations           | 10          |
|    time_elapsed         | 118         |
|    total_timesteps      | 32000       |
| train/                  |             |
|    approx_kl            | 0.006911783 |
|    clip_fraction        | 0.0355      |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.4       |
|    explained_variance   | 0.941       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.123       |
|    n_updates            | 90          |
|    policy_gradient_loss | -0.00897    |
|    std                  | 1           |
|

Forces on test set: 38034.041977
------------------------------------------
| forces                  | 1143.6226    |
| rew_unnormalized        | -258097.95   |
| rollout/                |              |
|    ep_len_mean          | 32.0         |
|    ep_rew_mean          | -0.10245581  |
| test set forces         | 3.8e+04      |
| time/                   |              |
|    fps                  | 240          |
|    iterations           | 18           |
|    time_elapsed         | 239          |
|    total_timesteps      | 57600        |
| train/                  |              |
|    approx_kl            | 0.0048261476 |
|    clip_fraction        | 0.0571       |
|    clip_range           | 0.2          |
|    entropy_loss         | -45.4        |
|    explained_variance   | 0.979        |
|    learning_rate        | 0.0001       |
|    loss                 | 0.0861       |
|    n_updates            | 170          |
|    policy_gradient_loss | -0.0102      |
|    std             

Forces on test set: 41008.844330
-----------------------------------------
| forces                  | 1069.7668   |
| rew_unnormalized        | -154138.19  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3442922   |
| test set forces         | 4.1e+04     |
| time/                   |             |
|    fps                  | 234         |
|    iterations           | 26          |
|    time_elapsed         | 355         |
|    total_timesteps      | 83200       |
| train/                  |             |
|    approx_kl            | 0.012759478 |
|    clip_fraction        | 0.114       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.3       |
|    explained_variance   | 0.975       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0811      |
|    n_updates            | 250         |
|    policy_gradient_loss | -0.0151     |
|    std                  | 0.997       |
|

Forces on test set: 45455.907166
-----------------------------------------
| forces                  | 1030.1382   |
| rew_unnormalized        | -105703.75  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.9346317   |
| test set forces         | 4.55e+04    |
| time/                   |             |
|    fps                  | 240         |
|    iterations           | 34          |
|    time_elapsed         | 452         |
|    total_timesteps      | 108800      |
| train/                  |             |
|    approx_kl            | 0.012881973 |
|    clip_fraction        | 0.125       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.3       |
|    explained_variance   | 0.975       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0619      |
|    n_updates            | 330         |
|    policy_gradient_loss | -0.0128     |
|    std                  | 0.996       |
|

Forces on test set: 48106.263611
-----------------------------------------
| forces                  | 1031.9454   |
| rew_unnormalized        | -81075.36   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.1391346   |
| test set forces         | 4.81e+04    |
| time/                   |             |
|    fps                  | 243         |
|    iterations           | 42          |
|    time_elapsed         | 551         |
|    total_timesteps      | 134400      |
| train/                  |             |
|    approx_kl            | 0.014851035 |
|    clip_fraction        | 0.145       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.2       |
|    explained_variance   | 0.968       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0113      |
|    n_updates            | 410         |
|    policy_gradient_loss | -0.0148     |
|    std                  | 0.994       |
|

Forces on test set: 52012.845703
Storing agent and hyperparameters to disk...
-----------------------------------------
| forces                  | 1009.07776  |
| rew_unnormalized        | -55486.73   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.4273918   |
| test set forces         | 5.2e+04     |
| time/                   |             |
|    fps                  | 245         |
|    iterations           | 50          |
|    time_elapsed         | 651         |
|    total_timesteps      | 160000      |
| train/                  |             |
|    approx_kl            | 0.012567021 |
|    clip_fraction        | 0.153       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45.1       |
|    explained_variance   | 0.951       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00741     |
|    n_updates            | 490         |
|    policy_gradient_loss | -0.0123     

Forces on test set: 54380.164307
------------------------------------------
| forces                  | 995.5925     |
| rew_unnormalized        | -30908.064   |
| rollout/                |              |
|    ep_len_mean          | 32.0         |
|    ep_rew_mean          | 2.7773101    |
| test set forces         | 5.44e+04     |
| time/                   |              |
|    fps                  | 246          |
|    iterations           | 58           |
|    time_elapsed         | 752          |
|    total_timesteps      | 185600       |
| train/                  |              |
|    approx_kl            | 0.0110943625 |
|    clip_fraction        | 0.154        |
|    clip_range           | 0.2          |
|    entropy_loss         | -45          |
|    explained_variance   | 0.918        |
|    learning_rate        | 0.0001       |
|    loss                 | -0.00961     |
|    n_updates            | 570          |
|    policy_gradient_loss | -0.0108      |
|    std             

Forces on test set: 54701.208862
-----------------------------------------
| forces                  | 1007.28876  |
| rew_unnormalized        | -29751.643  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.6353376   |
| test set forces         | 5.47e+04    |
| time/                   |             |
|    fps                  | 247         |
|    iterations           | 66          |
|    time_elapsed         | 853         |
|    total_timesteps      | 211200      |
| train/                  |             |
|    approx_kl            | 0.012641535 |
|    clip_fraction        | 0.128       |
|    clip_range           | 0.2         |
|    entropy_loss         | -45         |
|    explained_variance   | 0.938       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00106     |
|    n_updates            | 650         |
|    policy_gradient_loss | -0.00779    |
|    std                  | 0.986       |
|

Forces on test set: 54328.049194
-----------------------------------------
| forces                  | 1018.1816   |
| rew_unnormalized        | -31016.93   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.4531558   |
| test set forces         | 5.43e+04    |
| time/                   |             |
|    fps                  | 248         |
|    iterations           | 74          |
|    time_elapsed         | 954         |
|    total_timesteps      | 236800      |
| train/                  |             |
|    approx_kl            | 0.012628674 |
|    clip_fraction        | 0.126       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.9       |
|    explained_variance   | 0.927       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0127      |
|    n_updates            | 730         |
|    policy_gradient_loss | -0.00626    |
|    std                  | 0.984       |
|

Forces on test set: 54713.904297
-----------------------------------------
| forces                  | 1006.4338   |
| rew_unnormalized        | -28478.213  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.3880422   |
| test set forces         | 5.47e+04    |
| time/                   |             |
|    fps                  | 248         |
|    iterations           | 82          |
|    time_elapsed         | 1054        |
|    total_timesteps      | 262400      |
| train/                  |             |
|    approx_kl            | 0.020385046 |
|    clip_fraction        | 0.174       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.9       |
|    explained_variance   | 0.933       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.011       |
|    n_updates            | 810         |
|    policy_gradient_loss | -0.00965    |
|    std                  | 0.986       |
|

Forces on test set: 54687.619751
-----------------------------------------
| forces                  | 1015.74994  |
| rew_unnormalized        | -22660.332  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.4318      |
| test set forces         | 5.47e+04    |
| time/                   |             |
|    fps                  | 249         |
|    iterations           | 90          |
|    time_elapsed         | 1154        |
|    total_timesteps      | 288000      |
| train/                  |             |
|    approx_kl            | 0.010921692 |
|    clip_fraction        | 0.165       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.9       |
|    explained_variance   | 0.943       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00761     |
|    n_updates            | 890         |
|    policy_gradient_loss | -0.0127     |
|    std                  | 0.986       |
|

Forces on test set: 55087.384888
-----------------------------------------
| forces                  | 1017.4531   |
| rew_unnormalized        | -21406.26   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.3615592   |
| test set forces         | 5.51e+04    |
| time/                   |             |
|    fps                  | 250         |
|    iterations           | 98          |
|    time_elapsed         | 1251        |
|    total_timesteps      | 313600      |
| train/                  |             |
|    approx_kl            | 0.013481216 |
|    clip_fraction        | 0.144       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.8       |
|    explained_variance   | 0.925       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00219     |
|    n_updates            | 970         |
|    policy_gradient_loss | -0.00927    |
|    std                  | 0.983       |
|

Forces on test set: 56718.713257
-----------------------------------------
| forces                  | 1016.5042   |
| rew_unnormalized        | -18079.254  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.3681402   |
| test set forces         | 5.67e+04    |
| time/                   |             |
|    fps                  | 251         |
|    iterations           | 106         |
|    time_elapsed         | 1350        |
|    total_timesteps      | 339200      |
| train/                  |             |
|    approx_kl            | 0.011640339 |
|    clip_fraction        | 0.182       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.8       |
|    explained_variance   | 0.935       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00769    |
|    n_updates            | 1050        |
|    policy_gradient_loss | -0.0113     |
|    std                  | 0.981       |
|

Forces on test set: 56248.429077
------------------------------------------
| forces                  | 1008.6495    |
| rew_unnormalized        | -17805.03    |
| rollout/                |              |
|    ep_len_mean          | 32.0         |
|    ep_rew_mean          | 2.2933826    |
| test set forces         | 5.62e+04     |
| time/                   |              |
|    fps                  | 251          |
|    iterations           | 114          |
|    time_elapsed         | 1449         |
|    total_timesteps      | 364800       |
| train/                  |              |
|    approx_kl            | 0.0107088685 |
|    clip_fraction        | 0.191        |
|    clip_range           | 0.2          |
|    entropy_loss         | -44.6        |
|    explained_variance   | 0.941        |
|    learning_rate        | 0.0001       |
|    loss                 | 0.00246      |
|    n_updates            | 1130         |
|    policy_gradient_loss | -0.00821     |
|    std             

Forces on test set: 58235.223389
-----------------------------------------
| forces                  | 990.4509    |
| rew_unnormalized        | -16512.818  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.259225    |
| test set forces         | 5.82e+04    |
| time/                   |             |
|    fps                  | 252         |
|    iterations           | 122         |
|    time_elapsed         | 1548        |
|    total_timesteps      | 390400      |
| train/                  |             |
|    approx_kl            | 0.021078987 |
|    clip_fraction        | 0.162       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.6       |
|    explained_variance   | 0.938       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0168     |
|    n_updates            | 1210        |
|    policy_gradient_loss | -0.0118     |
|    std                  | 0.975       |
|

Forces on test set: 56420.298462
----------------------------------------
| forces                  | 1031.2086  |
| rew_unnormalized        | -22072.803 |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.999515   |
| test set forces         | 5.64e+04   |
| time/                   |            |
|    fps                  | 252        |
|    iterations           | 130        |
|    time_elapsed         | 1648       |
|    total_timesteps      | 416000     |
| train/                  |            |
|    approx_kl            | 0.01949708 |
|    clip_fraction        | 0.139      |
|    clip_range           | 0.2        |
|    entropy_loss         | -44.6      |
|    explained_variance   | 0.929      |
|    learning_rate        | 0.0001     |
|    loss                 | 0.0119     |
|    n_updates            | 1290       |
|    policy_gradient_loss | -0.00928   |
|    std                  | 0.974      |
|    value_loss         

Forces on test set: 57457.980225
-----------------------------------------
| forces                  | 1006.4871   |
| rew_unnormalized        | -20153.592  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.9988867   |
| test set forces         | 5.75e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 138         |
|    time_elapsed         | 1741        |
|    total_timesteps      | 441600      |
| train/                  |             |
|    approx_kl            | 0.011595811 |
|    clip_fraction        | 0.166       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.5       |
|    explained_variance   | 0.94        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0131     |
|    n_updates            | 1370        |
|    policy_gradient_loss | -0.0126     |
|    std                  | 0.973       |
|

Forces on test set: 56423.652344
-----------------------------------------
| forces                  | 992.80255   |
| rew_unnormalized        | -14774.901  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 2.1308756   |
| test set forces         | 5.64e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 146         |
|    time_elapsed         | 1840        |
|    total_timesteps      | 467200      |
| train/                  |             |
|    approx_kl            | 0.010091806 |
|    clip_fraction        | 0.17        |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.4       |
|    explained_variance   | 0.942       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0191     |
|    n_updates            | 1450        |
|    policy_gradient_loss | -0.0126     |
|    std                  | 0.97        |
|

Forces on test set: 55795.845947
-----------------------------------------
| forces                  | 1011.21906  |
| rew_unnormalized        | -17015.082  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.993299    |
| test set forces         | 5.58e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 154         |
|    time_elapsed         | 1943        |
|    total_timesteps      | 492800      |
| train/                  |             |
|    approx_kl            | 0.017846962 |
|    clip_fraction        | 0.197       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.4       |
|    explained_variance   | 0.93        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00133    |
|    n_updates            | 1530        |
|    policy_gradient_loss | -0.00972    |
|    std                  | 0.971       |
|

Forces on test set: 56843.846680
-----------------------------------------
| forces                  | 1032.489    |
| rew_unnormalized        | -16211.277  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.9708449   |
| test set forces         | 5.68e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 162         |
|    time_elapsed         | 2046        |
|    total_timesteps      | 518400      |
| train/                  |             |
|    approx_kl            | 0.011231517 |
|    clip_fraction        | 0.16        |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.3       |
|    explained_variance   | 0.934       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0139      |
|    n_updates            | 1610        |
|    policy_gradient_loss | -0.00959    |
|    std                  | 0.968       |
|

Forces on test set: 57626.617798
-----------------------------------------
| forces                  | 997.34436   |
| rew_unnormalized        | -15055.935  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.9652945   |
| test set forces         | 5.76e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 170         |
|    time_elapsed         | 2149        |
|    total_timesteps      | 544000      |
| train/                  |             |
|    approx_kl            | 0.011051084 |
|    clip_fraction        | 0.17        |
|    clip_range           | 0.2         |
|    entropy_loss         | -44.2       |
|    explained_variance   | 0.928       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0178      |
|    n_updates            | 1690        |
|    policy_gradient_loss | -0.00945    |
|    std                  | 0.964       |
|

Forces on test set: 58000.231934
-----------------------------------------
| forces                  | 995.7951    |
| rew_unnormalized        | -14863.285  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.9270074   |
| test set forces         | 5.8e+04     |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 178         |
|    time_elapsed         | 2249        |
|    total_timesteps      | 569600      |
| train/                  |             |
|    approx_kl            | 0.014382574 |
|    clip_fraction        | 0.181       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44         |
|    explained_variance   | 0.93        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0119     |
|    n_updates            | 1770        |
|    policy_gradient_loss | -0.0122     |
|    std                  | 0.959       |
|

Forces on test set: 60233.930420
-----------------------------------------
| forces                  | 1030.4128   |
| rew_unnormalized        | -14213.822  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.909802    |
| test set forces         | 6.02e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 186         |
|    time_elapsed         | 2351        |
|    total_timesteps      | 595200      |
| train/                  |             |
|    approx_kl            | 0.017401634 |
|    clip_fraction        | 0.187       |
|    clip_range           | 0.2         |
|    entropy_loss         | -44         |
|    explained_variance   | 0.95        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.000976   |
|    n_updates            | 1850        |
|    policy_gradient_loss | -0.0125     |
|    std                  | 0.957       |
|

Forces on test set: 59131.929321
-----------------------------------------
| forces                  | 1038.4541   |
| rew_unnormalized        | -15040.388  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.8345613   |
| test set forces         | 5.91e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 194         |
|    time_elapsed         | 2452        |
|    total_timesteps      | 620800      |
| train/                  |             |
|    approx_kl            | 0.014970579 |
|    clip_fraction        | 0.207       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.9       |
|    explained_variance   | 0.937       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0203     |
|    n_updates            | 1930        |
|    policy_gradient_loss | -0.00993    |
|    std                  | 0.956       |
|

Forces on test set: 57888.904541
-----------------------------------------
| forces                  | 1011.03455  |
| rew_unnormalized        | -10859.996  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.9716722   |
| test set forces         | 5.79e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 202         |
|    time_elapsed         | 2553        |
|    total_timesteps      | 646400      |
| train/                  |             |
|    approx_kl            | 0.022821523 |
|    clip_fraction        | 0.21        |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.8       |
|    explained_variance   | 0.936       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00324     |
|    n_updates            | 2010        |
|    policy_gradient_loss | -0.0123     |
|    std                  | 0.952       |
|

Forces on test set: 58321.644165
-----------------------------------------
| forces                  | 998.6562    |
| rew_unnormalized        | -13681.411  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.8133535   |
| test set forces         | 5.83e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 210         |
|    time_elapsed         | 2654        |
|    total_timesteps      | 672000      |
| train/                  |             |
|    approx_kl            | 0.016018454 |
|    clip_fraction        | 0.193       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.8       |
|    explained_variance   | 0.948       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0267     |
|    n_updates            | 2090        |
|    policy_gradient_loss | -0.0112     |
|    std                  | 0.951       |
|

Forces on test set: 58298.250488
-----------------------------------------
| forces                  | 1015.2179   |
| rew_unnormalized        | -13641.728  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.7791415   |
| test set forces         | 5.83e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 218         |
|    time_elapsed         | 2755        |
|    total_timesteps      | 697600      |
| train/                  |             |
|    approx_kl            | 0.014804111 |
|    clip_fraction        | 0.189       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.8       |
|    explained_variance   | 0.94        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00209    |
|    n_updates            | 2170        |
|    policy_gradient_loss | -0.00837    |
|    std                  | 0.951       |
|

Forces on test set: 59643.970947
-----------------------------------------
| forces                  | 1001.3223   |
| rew_unnormalized        | -12018.874  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.818507    |
| test set forces         | 5.96e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 226         |
|    time_elapsed         | 2854        |
|    total_timesteps      | 723200      |
| train/                  |             |
|    approx_kl            | 0.021450926 |
|    clip_fraction        | 0.208       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.7       |
|    explained_variance   | 0.934       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0108     |
|    n_updates            | 2250        |
|    policy_gradient_loss | -0.0121     |
|    std                  | 0.948       |
|

Forces on test set: 60919.820312
-----------------------------------------
| forces                  | 1038.224    |
| rew_unnormalized        | -14097.124  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6922957   |
| test set forces         | 6.09e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 234         |
|    time_elapsed         | 2954        |
|    total_timesteps      | 748800      |
| train/                  |             |
|    approx_kl            | 0.018892664 |
|    clip_fraction        | 0.214       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.5       |
|    explained_variance   | 0.906       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0128      |
|    n_updates            | 2330        |
|    policy_gradient_loss | -0.0116     |
|    std                  | 0.944       |
|

Forces on test set: 61946.069092
-----------------------------------------
| forces                  | 997.65674   |
| rew_unnormalized        | -10612.898  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.8225788   |
| test set forces         | 6.19e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 242         |
|    time_elapsed         | 3054        |
|    total_timesteps      | 774400      |
| train/                  |             |
|    approx_kl            | 0.023399372 |
|    clip_fraction        | 0.218       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.6       |
|    explained_variance   | 0.948       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0256     |
|    n_updates            | 2410        |
|    policy_gradient_loss | -0.0137     |
|    std                  | 0.946       |
|

Forces on test set: 62379.603760
Storing agent and hyperparameters to disk...
-----------------------------------------
| forces                  | 1037.8622   |
| rew_unnormalized        | -13690.124  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6498032   |
| test set forces         | 6.24e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 250         |
|    time_elapsed         | 3155        |
|    total_timesteps      | 800000      |
| train/                  |             |
|    approx_kl            | 0.016071016 |
|    clip_fraction        | 0.213       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.6       |
|    explained_variance   | 0.924       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0139     |
|    n_updates            | 2490        |
|    policy_gradient_loss | -0.0118     |

Forces on test set: 62267.546143
-----------------------------------------
| forces                  | 1028.8326   |
| rew_unnormalized        | -13371.125  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6364236   |
| test set forces         | 6.23e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 258         |
|    time_elapsed         | 3250        |
|    total_timesteps      | 825600      |
| train/                  |             |
|    approx_kl            | 0.014808809 |
|    clip_fraction        | 0.235       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.5       |
|    explained_variance   | 0.876       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0145      |
|    n_updates            | 2570        |
|    policy_gradient_loss | -0.0107     |
|    std                  | 0.944       |
|

Forces on test set: 61588.528442
-----------------------------------------
| forces                  | 1047.9968   |
| rew_unnormalized        | -11716.48   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6887305   |
| test set forces         | 6.16e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 266         |
|    time_elapsed         | 3351        |
|    total_timesteps      | 851200      |
| train/                  |             |
|    approx_kl            | 0.019769335 |
|    clip_fraction        | 0.227       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.5       |
|    explained_variance   | 0.891       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00718     |
|    n_updates            | 2650        |
|    policy_gradient_loss | -0.00929    |
|    std                  | 0.943       |
|

Forces on test set: 61942.572388
-----------------------------------------
| forces                  | 1013.8906   |
| rew_unnormalized        | -12007.12   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6483514   |
| test set forces         | 6.19e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 274         |
|    time_elapsed         | 3450        |
|    total_timesteps      | 876800      |
| train/                  |             |
|    approx_kl            | 0.017476918 |
|    clip_fraction        | 0.227       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.4       |
|    explained_variance   | 0.937       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0167     |
|    n_updates            | 2730        |
|    policy_gradient_loss | -0.0125     |
|    std                  | 0.941       |
|

Forces on test set: 61311.696411
-----------------------------------------
| forces                  | 1023.78986  |
| rew_unnormalized        | -11166.521  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6658843   |
| test set forces         | 6.13e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 282         |
|    time_elapsed         | 3550        |
|    total_timesteps      | 902400      |
| train/                  |             |
|    approx_kl            | 0.020089455 |
|    clip_fraction        | 0.268       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.4       |
|    explained_variance   | 0.921       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00873     |
|    n_updates            | 2810        |
|    policy_gradient_loss | -0.0155     |
|    std                  | 0.942       |
|

Forces on test set: 62065.280151
-----------------------------------------
| forces                  | 1032.0697   |
| rew_unnormalized        | -11312.254  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.633925    |
| test set forces         | 6.21e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 290         |
|    time_elapsed         | 3654        |
|    total_timesteps      | 928000      |
| train/                  |             |
|    approx_kl            | 0.016604315 |
|    clip_fraction        | 0.231       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.4       |
|    explained_variance   | 0.95        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0304     |
|    n_updates            | 2890        |
|    policy_gradient_loss | -0.0142     |
|    std                  | 0.941       |
|

Forces on test set: 61035.338257
-----------------------------------------
| forces                  | 990.60345   |
| rew_unnormalized        | -10113.039  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6730032   |
| test set forces         | 6.1e+04     |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 298         |
|    time_elapsed         | 3758        |
|    total_timesteps      | 953600      |
| train/                  |             |
|    approx_kl            | 0.014472025 |
|    clip_fraction        | 0.239       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.4       |
|    explained_variance   | 0.943       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00513    |
|    n_updates            | 2970        |
|    policy_gradient_loss | -0.00895    |
|    std                  | 0.94        |
|

Forces on test set: 61135.081055
-----------------------------------------
| forces                  | 999.5343    |
| rew_unnormalized        | -10844.735  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6126926   |
| test set forces         | 6.11e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 306         |
|    time_elapsed         | 3861        |
|    total_timesteps      | 979200      |
| train/                  |             |
|    approx_kl            | 0.024083966 |
|    clip_fraction        | 0.235       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.4       |
|    explained_variance   | 0.946       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0301     |
|    n_updates            | 3050        |
|    policy_gradient_loss | -0.014      |
|    std                  | 0.941       |
|

Forces on test set: 61278.164185
----------------------------------------
| forces                  | 977.27     |
| rew_unnormalized        | -9713.045  |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.6513919  |
| test set forces         | 6.13e+04   |
| time/                   |            |
|    fps                  | 253        |
|    iterations           | 314        |
|    time_elapsed         | 3963       |
|    total_timesteps      | 1004800    |
| train/                  |            |
|    approx_kl            | 0.01953765 |
|    clip_fraction        | 0.217      |
|    clip_range           | 0.2        |
|    entropy_loss         | -43.4      |
|    explained_variance   | 0.906      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.00561   |
|    n_updates            | 3130       |
|    policy_gradient_loss | -0.011     |
|    std                  | 0.94       |
|    value_loss         

Forces on test set: 61298.771118
-----------------------------------------
| forces                  | 1013.5543   |
| rew_unnormalized        | -10487.84   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.5900123   |
| test set forces         | 6.13e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 322         |
|    time_elapsed         | 4065        |
|    total_timesteps      | 1030400     |
| train/                  |             |
|    approx_kl            | 0.017061409 |
|    clip_fraction        | 0.201       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.3       |
|    explained_variance   | 0.913       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0305     |
|    n_updates            | 3210        |
|    policy_gradient_loss | -0.00939    |
|    std                  | 0.938       |
|

Forces on test set: 60518.136963
-----------------------------------------
| forces                  | 1008.8982   |
| rew_unnormalized        | -10253.534  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.5822963   |
| test set forces         | 6.05e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 330         |
|    time_elapsed         | 4165        |
|    total_timesteps      | 1056000     |
| train/                  |             |
|    approx_kl            | 0.021453524 |
|    clip_fraction        | 0.24        |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.2       |
|    explained_variance   | 0.876       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0165     |
|    n_updates            | 3290        |
|    policy_gradient_loss | -0.0158     |
|    std                  | 0.936       |
|

Forces on test set: 60655.077271
-----------------------------------------
| forces                  | 978.9204    |
| rew_unnormalized        | -9105.179   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6264514   |
| test set forces         | 6.07e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 338         |
|    time_elapsed         | 4263        |
|    total_timesteps      | 1081600     |
| train/                  |             |
|    approx_kl            | 0.022509348 |
|    clip_fraction        | 0.233       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43.2       |
|    explained_variance   | 0.917       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00682    |
|    n_updates            | 3370        |
|    policy_gradient_loss | -0.0107     |
|    std                  | 0.933       |
|

Forces on test set: 60387.431763
-----------------------------------------
| forces                  | 1008.2902   |
| rew_unnormalized        | -8793.52    |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6253476   |
| test set forces         | 6.04e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 346         |
|    time_elapsed         | 4362        |
|    total_timesteps      | 1107200     |
| train/                  |             |
|    approx_kl            | 0.014978203 |
|    clip_fraction        | 0.207       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43         |
|    explained_variance   | 0.941       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0121     |
|    n_updates            | 3450        |
|    policy_gradient_loss | -0.0152     |
|    std                  | 0.929       |
|

Forces on test set: 61088.339722
-----------------------------------------
| forces                  | 1000.6028   |
| rew_unnormalized        | -8796.596   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6073239   |
| test set forces         | 6.11e+04    |
| time/                   |             |
|    fps                  | 253         |
|    iterations           | 354         |
|    time_elapsed         | 4461        |
|    total_timesteps      | 1132800     |
| train/                  |             |
|    approx_kl            | 0.026534487 |
|    clip_fraction        | 0.243       |
|    clip_range           | 0.2         |
|    entropy_loss         | -43         |
|    explained_variance   | 0.941       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.017       |
|    n_updates            | 3530        |
|    policy_gradient_loss | -0.0162     |
|    std                  | 0.927       |
|

Forces on test set: 61471.020386
-----------------------------------------
| forces                  | 986.1168    |
| rew_unnormalized        | -8015.205   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.6345559   |
| test set forces         | 6.15e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 362         |
|    time_elapsed         | 4559        |
|    total_timesteps      | 1158400     |
| train/                  |             |
|    approx_kl            | 0.032685667 |
|    clip_fraction        | 0.25        |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.8       |
|    explained_variance   | 0.948       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0114     |
|    n_updates            | 3610        |
|    policy_gradient_loss | -0.00792    |
|    std                  | 0.924       |
|

Forces on test set: 60981.037476
-----------------------------------------
| forces                  | 1008.97144  |
| rew_unnormalized        | -9580.403   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.5282431   |
| test set forces         | 6.1e+04     |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 370         |
|    time_elapsed         | 4659        |
|    total_timesteps      | 1184000     |
| train/                  |             |
|    approx_kl            | 0.021116395 |
|    clip_fraction        | 0.241       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.7       |
|    explained_variance   | 0.942       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0101     |
|    n_updates            | 3690        |
|    policy_gradient_loss | -0.0126     |
|    std                  | 0.921       |
|

Forces on test set: 60113.958618
----------------------------------------
| forces                  | 989.2896   |
| rew_unnormalized        | -8583.708  |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.5689775  |
| test set forces         | 6.01e+04   |
| time/                   |            |
|    fps                  | 254        |
|    iterations           | 378        |
|    time_elapsed         | 4758       |
|    total_timesteps      | 1209600    |
| train/                  |            |
|    approx_kl            | 0.01681896 |
|    clip_fraction        | 0.252      |
|    clip_range           | 0.2        |
|    entropy_loss         | -42.7      |
|    explained_variance   | 0.923      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.00899   |
|    n_updates            | 3770       |
|    policy_gradient_loss | -0.0196    |
|    std                  | 0.92       |
|    value_loss         

Forces on test set: 60962.301758
----------------------------------------
| forces                  | 996.8571   |
| rew_unnormalized        | -8310.204  |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.5693669  |
| test set forces         | 6.1e+04    |
| time/                   |            |
|    fps                  | 254        |
|    iterations           | 386        |
|    time_elapsed         | 4858       |
|    total_timesteps      | 1235200    |
| train/                  |            |
|    approx_kl            | 0.02833517 |
|    clip_fraction        | 0.245      |
|    clip_range           | 0.2        |
|    entropy_loss         | -42.6      |
|    explained_variance   | 0.947      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.0181    |
|    n_updates            | 3850       |
|    policy_gradient_loss | -0.0181    |
|    std                  | 0.917      |
|    value_loss         

Forces on test set: 60840.259033
-----------------------------------------
| forces                  | 987.2076    |
| rew_unnormalized        | -11288.935  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3783555   |
| test set forces         | 6.08e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 394         |
|    time_elapsed         | 4958        |
|    total_timesteps      | 1260800     |
| train/                  |             |
|    approx_kl            | 0.024062905 |
|    clip_fraction        | 0.25        |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.4       |
|    explained_variance   | 0.931       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0303     |
|    n_updates            | 3930        |
|    policy_gradient_loss | -0.0173     |
|    std                  | 0.91        |
|

Forces on test set: 60376.163574
-----------------------------------------
| forces                  | 986.69666   |
| rew_unnormalized        | -8563.119   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.5254756   |
| test set forces         | 6.04e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 402         |
|    time_elapsed         | 5059        |
|    total_timesteps      | 1286400     |
| train/                  |             |
|    approx_kl            | 0.020885339 |
|    clip_fraction        | 0.245       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.3       |
|    explained_variance   | 0.927       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00241     |
|    n_updates            | 4010        |
|    policy_gradient_loss | -0.0186     |
|    std                  | 0.909       |
|

Forces on test set: 59995.359131
-----------------------------------------
| forces                  | 1010.8938   |
| rew_unnormalized        | -10371.553  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4023663   |
| test set forces         | 6e+04       |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 410         |
|    time_elapsed         | 5159        |
|    total_timesteps      | 1312000     |
| train/                  |             |
|    approx_kl            | 0.018523373 |
|    clip_fraction        | 0.24        |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.3       |
|    explained_variance   | 0.918       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00159    |
|    n_updates            | 4090        |
|    policy_gradient_loss | -0.0127     |
|    std                  | 0.909       |
|

Forces on test set: 61459.472778
-----------------------------------------
| forces                  | 983.3234    |
| rew_unnormalized        | -9562.045   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4364067   |
| test set forces         | 6.15e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 418         |
|    time_elapsed         | 5257        |
|    total_timesteps      | 1337600     |
| train/                  |             |
|    approx_kl            | 0.020330189 |
|    clip_fraction        | 0.231       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.2       |
|    explained_variance   | 0.942       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00906     |
|    n_updates            | 4170        |
|    policy_gradient_loss | -0.0127     |
|    std                  | 0.905       |
|

Forces on test set: 61751.202026
----------------------------------------
| forces                  | 1014.78564 |
| rew_unnormalized        | -10249.351 |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.3800184  |
| test set forces         | 6.18e+04   |
| time/                   |            |
|    fps                  | 254        |
|    iterations           | 426        |
|    time_elapsed         | 5357       |
|    total_timesteps      | 1363200    |
| train/                  |            |
|    approx_kl            | 0.01857396 |
|    clip_fraction        | 0.245      |
|    clip_range           | 0.2        |
|    entropy_loss         | -42.1      |
|    explained_variance   | 0.926      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.0329    |
|    n_updates            | 4250       |
|    policy_gradient_loss | -0.0119    |
|    std                  | 0.904      |
|    value_loss         

Forces on test set: 61994.127075
-----------------------------------------
| forces                  | 989.98987   |
| rew_unnormalized        | -9439.925   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4161      |
| test set forces         | 6.2e+04     |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 434         |
|    time_elapsed         | 5457        |
|    total_timesteps      | 1388800     |
| train/                  |             |
|    approx_kl            | 0.017720867 |
|    clip_fraction        | 0.255       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.1       |
|    explained_variance   | 0.914       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00456    |
|    n_updates            | 4330        |
|    policy_gradient_loss | -0.0136     |
|    std                  | 0.905       |
|

Forces on test set: 61630.996338
-----------------------------------------
| forces                  | 967.7898    |
| rew_unnormalized        | -7166.373   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.5447806   |
| test set forces         | 6.16e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 442         |
|    time_elapsed         | 5560        |
|    total_timesteps      | 1414400     |
| train/                  |             |
|    approx_kl            | 0.019007344 |
|    clip_fraction        | 0.236       |
|    clip_range           | 0.2         |
|    entropy_loss         | -42.1       |
|    explained_variance   | 0.902       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0228     |
|    n_updates            | 4410        |
|    policy_gradient_loss | -0.0169     |
|    std                  | 0.904       |
|

Forces on test set: 61874.072876
Storing agent and hyperparameters to disk...
----------------------------------------
| forces                  | 1009.34033 |
| rew_unnormalized        | -9041.079  |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.414372   |
| test set forces         | 6.19e+04   |
| time/                   |            |
|    fps                  | 254        |
|    iterations           | 450        |
|    time_elapsed         | 5664       |
|    total_timesteps      | 1440000    |
| train/                  |            |
|    approx_kl            | 0.0390221  |
|    clip_fraction        | 0.249      |
|    clip_range           | 0.2        |
|    entropy_loss         | -41.9      |
|    explained_variance   | 0.917      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.0132    |
|    n_updates            | 4490       |
|    policy_gradient_loss | -0.0144    |
|    std            

Forces on test set: 62870.810303
-----------------------------------------
| forces                  | 1005.79205  |
| rew_unnormalized        | -8855.395   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4134946   |
| test set forces         | 6.29e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 458         |
|    time_elapsed         | 5766        |
|    total_timesteps      | 1465600     |
| train/                  |             |
|    approx_kl            | 0.027919011 |
|    clip_fraction        | 0.244       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.8       |
|    explained_variance   | 0.932       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.016      |
|    n_updates            | 4570        |
|    policy_gradient_loss | -0.0122     |
|    std                  | 0.895       |
|

Forces on test set: 63001.527100
-----------------------------------------
| forces                  | 990.65845   |
| rew_unnormalized        | -8698.352   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4118842   |
| test set forces         | 6.3e+04     |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 466         |
|    time_elapsed         | 5868        |
|    total_timesteps      | 1491200     |
| train/                  |             |
|    approx_kl            | 0.021523282 |
|    clip_fraction        | 0.236       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.7       |
|    explained_variance   | 0.927       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0191     |
|    n_updates            | 4650        |
|    policy_gradient_loss | -0.016      |
|    std                  | 0.894       |
|

Forces on test set: 61693.287964
-----------------------------------------
| forces                  | 949.0543    |
| rew_unnormalized        | -7952.2637  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4481952   |
| test set forces         | 6.17e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 474         |
|    time_elapsed         | 5968        |
|    total_timesteps      | 1516800     |
| train/                  |             |
|    approx_kl            | 0.026095707 |
|    clip_fraction        | 0.221       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.7       |
|    explained_variance   | 0.924       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0037     |
|    n_updates            | 4730        |
|    policy_gradient_loss | -0.0162     |
|    std                  | 0.891       |
|

Forces on test set: 63439.662598
-----------------------------------------
| forces                  | 1006.21893  |
| rew_unnormalized        | -9424.845   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.340242    |
| test set forces         | 6.34e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 482         |
|    time_elapsed         | 6067        |
|    total_timesteps      | 1542400     |
| train/                  |             |
|    approx_kl            | 0.028719718 |
|    clip_fraction        | 0.272       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.5       |
|    explained_variance   | 0.849       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00991    |
|    n_updates            | 4810        |
|    policy_gradient_loss | -0.0167     |
|    std                  | 0.887       |
|

Forces on test set: 64068.235718
-----------------------------------------
| forces                  | 984.82806   |
| rew_unnormalized        | -7992.15    |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4222362   |
| test set forces         | 6.41e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 490         |
|    time_elapsed         | 6169        |
|    total_timesteps      | 1568000     |
| train/                  |             |
|    approx_kl            | 0.019091502 |
|    clip_fraction        | 0.246       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.4       |
|    explained_variance   | 0.916       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0211     |
|    n_updates            | 4890        |
|    policy_gradient_loss | -0.0187     |
|    std                  | 0.883       |
|

Forces on test set: 63952.112427
-----------------------------------------
| forces                  | 989.4144    |
| rew_unnormalized        | -8510.708   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3764429   |
| test set forces         | 6.4e+04     |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 498         |
|    time_elapsed         | 6269        |
|    total_timesteps      | 1593600     |
| train/                  |             |
|    approx_kl            | 0.025983002 |
|    clip_fraction        | 0.276       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.3       |
|    explained_variance   | 0.929       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0331     |
|    n_updates            | 4970        |
|    policy_gradient_loss | -0.0177     |
|    std                  | 0.882       |
|

Forces on test set: 64900.836914
-----------------------------------------
| forces                  | 983.15625   |
| rew_unnormalized        | -8532.921   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3630481   |
| test set forces         | 6.49e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 506         |
|    time_elapsed         | 6368        |
|    total_timesteps      | 1619200     |
| train/                  |             |
|    approx_kl            | 0.027385706 |
|    clip_fraction        | 0.266       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.2       |
|    explained_variance   | 0.887       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00055    |
|    n_updates            | 5050        |
|    policy_gradient_loss | -0.0188     |
|    std                  | 0.879       |
|

Forces on test set: 63995.139526
-----------------------------------------
| forces                  | 993.7971    |
| rew_unnormalized        | -8721.684   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3393025   |
| test set forces         | 6.4e+04     |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 514         |
|    time_elapsed         | 6467        |
|    total_timesteps      | 1644800     |
| train/                  |             |
|    approx_kl            | 0.024285894 |
|    clip_fraction        | 0.297       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.1       |
|    explained_variance   | 0.934       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0141     |
|    n_updates            | 5130        |
|    policy_gradient_loss | -0.0129     |
|    std                  | 0.877       |
|

Forces on test set: 63550.245361
-----------------------------------------
| forces                  | 990.478     |
| rew_unnormalized        | -7562.5044  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4071785   |
| test set forces         | 6.36e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 522         |
|    time_elapsed         | 6564        |
|    total_timesteps      | 1670400     |
| train/                  |             |
|    approx_kl            | 0.022211434 |
|    clip_fraction        | 0.274       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.2       |
|    explained_variance   | 0.919       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.012      |
|    n_updates            | 5210        |
|    policy_gradient_loss | -0.0188     |
|    std                  | 0.878       |
|

Forces on test set: 64153.151978
-----------------------------------------
| forces                  | 1003.64197  |
| rew_unnormalized        | -7753.228   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3836684   |
| test set forces         | 6.42e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 530         |
|    time_elapsed         | 6663        |
|    total_timesteps      | 1696000     |
| train/                  |             |
|    approx_kl            | 0.030206775 |
|    clip_fraction        | 0.274       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.1       |
|    explained_variance   | 0.929       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0348     |
|    n_updates            | 5290        |
|    policy_gradient_loss | -0.0156     |
|    std                  | 0.877       |
|

Forces on test set: 64362.123901
-----------------------------------------
| forces                  | 991.96075   |
| rew_unnormalized        | -7641.7563  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3811421   |
| test set forces         | 6.44e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 538         |
|    time_elapsed         | 6763        |
|    total_timesteps      | 1721600     |
| train/                  |             |
|    approx_kl            | 0.021531321 |
|    clip_fraction        | 0.255       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.1       |
|    explained_variance   | 0.949       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00335    |
|    n_updates            | 5370        |
|    policy_gradient_loss | -0.0165     |
|    std                  | 0.876       |
|

Forces on test set: 63731.033569
-----------------------------------------
| forces                  | 966.3315    |
| rew_unnormalized        | -6930.74    |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4203606   |
| test set forces         | 6.37e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 546         |
|    time_elapsed         | 6864        |
|    total_timesteps      | 1747200     |
| train/                  |             |
|    approx_kl            | 0.021555629 |
|    clip_fraction        | 0.272       |
|    clip_range           | 0.2         |
|    entropy_loss         | -41.1       |
|    explained_variance   | 0.931       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0274     |
|    n_updates            | 5450        |
|    policy_gradient_loss | -0.019      |
|    std                  | 0.876       |
|

Forces on test set: 63121.158569
-----------------------------------------
| forces                  | 982.954     |
| rew_unnormalized        | -7465.8564  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.372971    |
| test set forces         | 6.31e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 554         |
|    time_elapsed         | 6961        |
|    total_timesteps      | 1772800     |
| train/                  |             |
|    approx_kl            | 0.021636968 |
|    clip_fraction        | 0.274       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.9       |
|    explained_variance   | 0.891       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00722    |
|    n_updates            | 5530        |
|    policy_gradient_loss | -0.0144     |
|    std                  | 0.872       |
|

Forces on test set: 62947.254761
----------------------------------------
| forces                  | 1004.50665 |
| rew_unnormalized        | -8359.69   |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.3006144  |
| test set forces         | 6.29e+04   |
| time/                   |            |
|    fps                  | 254        |
|    iterations           | 562        |
|    time_elapsed         | 7058       |
|    total_timesteps      | 1798400    |
| train/                  |            |
|    approx_kl            | 0.02833052 |
|    clip_fraction        | 0.3        |
|    clip_range           | 0.2        |
|    entropy_loss         | -41        |
|    explained_variance   | 0.942      |
|    learning_rate        | 0.0001     |
|    loss                 | 0.0233     |
|    n_updates            | 5610       |
|    policy_gradient_loss | -0.016     |
|    std                  | 0.872      |
|    value_loss         

Forces on test set: 62500.076538
-----------------------------------------
| forces                  | 981.90344   |
| rew_unnormalized        | -8669.52    |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.268746    |
| test set forces         | 6.25e+04    |
| time/                   |             |
|    fps                  | 254         |
|    iterations           | 570         |
|    time_elapsed         | 7155        |
|    total_timesteps      | 1824000     |
| train/                  |             |
|    approx_kl            | 0.021854956 |
|    clip_fraction        | 0.275       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.9       |
|    explained_variance   | 0.949       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00868    |
|    n_updates            | 5690        |
|    policy_gradient_loss | -0.0175     |
|    std                  | 0.87        |
|

Forces on test set: 62707.134766
-----------------------------------------
| forces                  | 984.03546   |
| rew_unnormalized        | -7847.891   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3174529   |
| test set forces         | 6.27e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 578         |
|    time_elapsed         | 7253        |
|    total_timesteps      | 1849600     |
| train/                  |             |
|    approx_kl            | 0.028131848 |
|    clip_fraction        | 0.307       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.8       |
|    explained_variance   | 0.915       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0242     |
|    n_updates            | 5770        |
|    policy_gradient_loss | -0.0173     |
|    std                  | 0.868       |
|

Forces on test set: 63311.498657
----------------------------------------
| forces                  | 965.0399   |
| rew_unnormalized        | -7119.497  |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.3604907  |
| test set forces         | 6.33e+04   |
| time/                   |            |
|    fps                  | 255        |
|    iterations           | 586        |
|    time_elapsed         | 7351       |
|    total_timesteps      | 1875200    |
| train/                  |            |
|    approx_kl            | 0.02609378 |
|    clip_fraction        | 0.269      |
|    clip_range           | 0.2        |
|    entropy_loss         | -40.7      |
|    explained_variance   | 0.935      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.0144    |
|    n_updates            | 5850       |
|    policy_gradient_loss | -0.017     |
|    std                  | 0.864      |
|    value_loss         

Forces on test set: 63264.786133
-----------------------------------------
| forces                  | 947.2966    |
| rew_unnormalized        | -6602.484   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3890121   |
| test set forces         | 6.33e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 594         |
|    time_elapsed         | 7447        |
|    total_timesteps      | 1900800     |
| train/                  |             |
|    approx_kl            | 0.030395068 |
|    clip_fraction        | 0.322       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.6       |
|    explained_variance   | 0.908       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0168     |
|    n_updates            | 5930        |
|    policy_gradient_loss | -0.0139     |
|    std                  | 0.864       |
|

Forces on test set: 64375.143311
-----------------------------------------
| forces                  | 991.0455    |
| rew_unnormalized        | -9096.782   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1989106   |
| test set forces         | 6.44e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 602         |
|    time_elapsed         | 7544        |
|    total_timesteps      | 1926400     |
| train/                  |             |
|    approx_kl            | 0.014647961 |
|    clip_fraction        | 0.269       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.5       |
|    explained_variance   | 0.936       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0407     |
|    n_updates            | 6010        |
|    policy_gradient_loss | -0.0173     |
|    std                  | 0.859       |
|

Forces on test set: 64480.298096
-----------------------------------------
| forces                  | 976.6852    |
| rew_unnormalized        | -8772.966   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2138202   |
| test set forces         | 6.45e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 610         |
|    time_elapsed         | 7642        |
|    total_timesteps      | 1952000     |
| train/                  |             |
|    approx_kl            | 0.031029224 |
|    clip_fraction        | 0.31        |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.4       |
|    explained_variance   | 0.889       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0218     |
|    n_updates            | 6090        |
|    policy_gradient_loss | -0.0179     |
|    std                  | 0.857       |
|

Forces on test set: 63167.204346
-----------------------------------------
| forces                  | 991.8211    |
| rew_unnormalized        | -8044.7583  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2583765   |
| test set forces         | 6.32e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 618         |
|    time_elapsed         | 7741        |
|    total_timesteps      | 1977600     |
| train/                  |             |
|    approx_kl            | 0.023992198 |
|    clip_fraction        | 0.293       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.2       |
|    explained_variance   | 0.928       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0166     |
|    n_updates            | 6170        |
|    policy_gradient_loss | -0.0185     |
|    std                  | 0.852       |
|

Forces on test set: 62916.540527
-----------------------------------------
| forces                  | 936.0259    |
| rew_unnormalized        | -5970.285   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.4039803   |
| test set forces         | 6.29e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 626         |
|    time_elapsed         | 7840        |
|    total_timesteps      | 2003200     |
| train/                  |             |
|    approx_kl            | 0.028715167 |
|    clip_fraction        | 0.313       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.1       |
|    explained_variance   | 0.811       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00765    |
|    n_updates            | 6250        |
|    policy_gradient_loss | -0.0168     |
|    std                  | 0.849       |
|

Forces on test set: 62662.813232
-----------------------------------------
| forces                  | 955.8914    |
| rew_unnormalized        | -6564.95    |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3512857   |
| test set forces         | 6.27e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 634         |
|    time_elapsed         | 7937        |
|    total_timesteps      | 2028800     |
| train/                  |             |
|    approx_kl            | 0.020425655 |
|    clip_fraction        | 0.295       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40.1       |
|    explained_variance   | 0.943       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00903    |
|    n_updates            | 6330        |
|    policy_gradient_loss | -0.0161     |
|    std                  | 0.849       |
|

Forces on test set: 62811.533447
-----------------------------------------
| forces                  | 954.46686   |
| rew_unnormalized        | -8404.211   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2060232   |
| test set forces         | 6.28e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 642         |
|    time_elapsed         | 8034        |
|    total_timesteps      | 2054400     |
| train/                  |             |
|    approx_kl            | 0.017900782 |
|    clip_fraction        | 0.299       |
|    clip_range           | 0.2         |
|    entropy_loss         | -40         |
|    explained_variance   | 0.891       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0107     |
|    n_updates            | 6410        |
|    policy_gradient_loss | -0.0206     |
|    std                  | 0.847       |
|

Forces on test set: 63461.697388
Storing agent and hyperparameters to disk...
-----------------------------------------
| forces                  | 967.62463   |
| rew_unnormalized        | -7588.34    |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2599288   |
| test set forces         | 6.35e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 650         |
|    time_elapsed         | 8132        |
|    total_timesteps      | 2080000     |
| train/                  |             |
|    approx_kl            | 0.027128486 |
|    clip_fraction        | 0.29        |
|    clip_range           | 0.2         |
|    entropy_loss         | -39.9       |
|    explained_variance   | 0.918       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0363     |
|    n_updates            | 6490        |
|    policy_gradient_loss | -0.0162     |

Forces on test set: 64270.312500
-----------------------------------------
| forces                  | 971.2527    |
| rew_unnormalized        | -7404.8066  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.265747    |
| test set forces         | 6.43e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 658         |
|    time_elapsed         | 8231        |
|    total_timesteps      | 2105600     |
| train/                  |             |
|    approx_kl            | 0.029462729 |
|    clip_fraction        | 0.305       |
|    clip_range           | 0.2         |
|    entropy_loss         | -39.7       |
|    explained_variance   | 0.933       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0159     |
|    n_updates            | 6570        |
|    policy_gradient_loss | -0.0176     |
|    std                  | 0.84        |
|

Forces on test set: 63899.554443
-----------------------------------------
| forces                  | 966.0108    |
| rew_unnormalized        | -7137.4175  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.278124    |
| test set forces         | 6.39e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 666         |
|    time_elapsed         | 8329        |
|    total_timesteps      | 2131200     |
| train/                  |             |
|    approx_kl            | 0.025592184 |
|    clip_fraction        | 0.326       |
|    clip_range           | 0.2         |
|    entropy_loss         | -39.7       |
|    explained_variance   | 0.859       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0305     |
|    n_updates            | 6650        |
|    policy_gradient_loss | -0.0181     |
|    std                  | 0.839       |
|

Forces on test set: 63100.742188
-----------------------------------------
| forces                  | 966.4618    |
| rew_unnormalized        | -7280.135   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.25922     |
| test set forces         | 6.31e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 674         |
|    time_elapsed         | 8429        |
|    total_timesteps      | 2156800     |
| train/                  |             |
|    approx_kl            | 0.020002257 |
|    clip_fraction        | 0.321       |
|    clip_range           | 0.2         |
|    entropy_loss         | -39.6       |
|    explained_variance   | 0.87        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0147     |
|    n_updates            | 6730        |
|    policy_gradient_loss | -0.0122     |
|    std                  | 0.837       |
|

Forces on test set: 62526.998535
-----------------------------------------
| forces                  | 916.6219    |
| rew_unnormalized        | -7189.2017  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.258786    |
| test set forces         | 6.25e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 682         |
|    time_elapsed         | 8529        |
|    total_timesteps      | 2182400     |
| train/                  |             |
|    approx_kl            | 0.029938882 |
|    clip_fraction        | 0.323       |
|    clip_range           | 0.2         |
|    entropy_loss         | -39.4       |
|    explained_variance   | 0.762       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.0103      |
|    n_updates            | 6810        |
|    policy_gradient_loss | -0.00987    |
|    std                  | 0.83        |
|

Forces on test set: 64056.698730
-----------------------------------------
| forces                  | 955.6866    |
| rew_unnormalized        | -6494.6826  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.3054734   |
| test set forces         | 6.41e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 690         |
|    time_elapsed         | 8628        |
|    total_timesteps      | 2208000     |
| train/                  |             |
|    approx_kl            | 0.027333278 |
|    clip_fraction        | 0.307       |
|    clip_range           | 0.2         |
|    entropy_loss         | -39.3       |
|    explained_variance   | 0.865       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0203     |
|    n_updates            | 6890        |
|    policy_gradient_loss | -0.0202     |
|    std                  | 0.829       |
|

Forces on test set: 64422.234497
----------------------------------------
| forces                  | 975.89026  |
| rew_unnormalized        | -7078.9814 |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.2530105  |
| test set forces         | 6.44e+04   |
| time/                   |            |
|    fps                  | 255        |
|    iterations           | 698        |
|    time_elapsed         | 8727       |
|    total_timesteps      | 2233600    |
| train/                  |            |
|    approx_kl            | 0.03457527 |
|    clip_fraction        | 0.316      |
|    clip_range           | 0.2        |
|    entropy_loss         | -39.1      |
|    explained_variance   | 0.883      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.00428   |
|    n_updates            | 6970       |
|    policy_gradient_loss | -0.0183    |
|    std                  | 0.824      |
|    value_loss         

Forces on test set: 64621.833252
----------------------------------------
| forces                  | 948.25476  |
| rew_unnormalized        | -6105.7495 |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.3223641  |
| test set forces         | 6.46e+04   |
| time/                   |            |
|    fps                  | 255        |
|    iterations           | 706        |
|    time_elapsed         | 8826       |
|    total_timesteps      | 2259200    |
| train/                  |            |
|    approx_kl            | 0.03314711 |
|    clip_fraction        | 0.332      |
|    clip_range           | 0.2        |
|    entropy_loss         | -39.1      |
|    explained_variance   | 0.872      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.00868   |
|    n_updates            | 7050       |
|    policy_gradient_loss | -0.0165    |
|    std                  | 0.823      |
|    value_loss         

Forces on test set: 63877.656494
-----------------------------------------
| forces                  | 935.4193    |
| rew_unnormalized        | -6857.9443  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.255986    |
| test set forces         | 6.39e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 714         |
|    time_elapsed         | 8925        |
|    total_timesteps      | 2284800     |
| train/                  |             |
|    approx_kl            | 0.038895167 |
|    clip_fraction        | 0.321       |
|    clip_range           | 0.2         |
|    entropy_loss         | -38.9       |
|    explained_variance   | 0.9         |
|    learning_rate        | 0.0001      |
|    loss                 | -0.039      |
|    n_updates            | 7130        |
|    policy_gradient_loss | -0.0206     |
|    std                  | 0.818       |
|

Forces on test set: 65124.347168
-----------------------------------------
| forces                  | 980.02686   |
| rew_unnormalized        | -6730.338   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2589967   |
| test set forces         | 6.51e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 722         |
|    time_elapsed         | 9024        |
|    total_timesteps      | 2310400     |
| train/                  |             |
|    approx_kl            | 0.022043528 |
|    clip_fraction        | 0.317       |
|    clip_range           | 0.2         |
|    entropy_loss         | -38.8       |
|    explained_variance   | 0.77        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0184     |
|    n_updates            | 7210        |
|    policy_gradient_loss | -0.0146     |
|    std                  | 0.816       |
|

Forces on test set: 64469.297974
-----------------------------------------
| forces                  | 964.608     |
| rew_unnormalized        | -6265.5923  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2890887   |
| test set forces         | 6.45e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 730         |
|    time_elapsed         | 9124        |
|    total_timesteps      | 2336000     |
| train/                  |             |
|    approx_kl            | 0.035120122 |
|    clip_fraction        | 0.308       |
|    clip_range           | 0.2         |
|    entropy_loss         | -38.6       |
|    explained_variance   | 0.958       |
|    learning_rate        | 0.0001      |
|    loss                 | 0.00497     |
|    n_updates            | 7290        |
|    policy_gradient_loss | -0.0196     |
|    std                  | 0.811       |
|

Forces on test set: 64412.090088
----------------------------------------
| forces                  | 962.3379   |
| rew_unnormalized        | -6353.6323 |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.2752327  |
| test set forces         | 6.44e+04   |
| time/                   |            |
|    fps                  | 255        |
|    iterations           | 738        |
|    time_elapsed         | 9225       |
|    total_timesteps      | 2361600    |
| train/                  |            |
|    approx_kl            | 0.03441546 |
|    clip_fraction        | 0.297      |
|    clip_range           | 0.2        |
|    entropy_loss         | -38.5      |
|    explained_variance   | 0.938      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.0233    |
|    n_updates            | 7370       |
|    policy_gradient_loss | -0.0156    |
|    std                  | 0.809      |
|    value_loss         

Forces on test set: 65133.815918
-----------------------------------------
| forces                  | 954.69257   |
| rew_unnormalized        | -6680.088   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2421726   |
| test set forces         | 6.51e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 746         |
|    time_elapsed         | 9325        |
|    total_timesteps      | 2387200     |
| train/                  |             |
|    approx_kl            | 0.028184952 |
|    clip_fraction        | 0.313       |
|    clip_range           | 0.2         |
|    entropy_loss         | -38.5       |
|    explained_variance   | 0.799       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00386    |
|    n_updates            | 7450        |
|    policy_gradient_loss | -0.0198     |
|    std                  | 0.809       |
|

Forces on test set: 64715.579224
----------------------------------------
| forces                  | 956.09534  |
| rew_unnormalized        | -7021.2046 |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.2074254  |
| test set forces         | 6.47e+04   |
| time/                   |            |
|    fps                  | 256        |
|    iterations           | 754        |
|    time_elapsed         | 9423       |
|    total_timesteps      | 2412800    |
| train/                  |            |
|    approx_kl            | 0.03225807 |
|    clip_fraction        | 0.271      |
|    clip_range           | 0.2        |
|    entropy_loss         | -38.4      |
|    explained_variance   | 0.945      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.0427    |
|    n_updates            | 7530       |
|    policy_gradient_loss | -0.0203    |
|    std                  | 0.805      |
|    value_loss         

Forces on test set: 64663.016479
-----------------------------------------
| forces                  | 977.462     |
| rew_unnormalized        | -7533.537   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1587645   |
| test set forces         | 6.47e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 762         |
|    time_elapsed         | 9523        |
|    total_timesteps      | 2438400     |
| train/                  |             |
|    approx_kl            | 0.027924748 |
|    clip_fraction        | 0.304       |
|    clip_range           | 0.2         |
|    entropy_loss         | -38.3       |
|    explained_variance   | 0.91        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0333     |
|    n_updates            | 7610        |
|    policy_gradient_loss | -0.0172     |
|    std                  | 0.804       |
|

Forces on test set: 64515.574585
-----------------------------------------
| forces                  | 942.915     |
| rew_unnormalized        | -6365.9434  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2477053   |
| test set forces         | 6.45e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 770         |
|    time_elapsed         | 9623        |
|    total_timesteps      | 2464000     |
| train/                  |             |
|    approx_kl            | 0.030174747 |
|    clip_fraction        | 0.332       |
|    clip_range           | 0.2         |
|    entropy_loss         | -38.1       |
|    explained_variance   | 0.888       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0406     |
|    n_updates            | 7690        |
|    policy_gradient_loss | -0.0238     |
|    std                  | 0.8         |
|

Forces on test set: 64781.955200
-----------------------------------------
| forces                  | 944.0209    |
| rew_unnormalized        | -5967.0215  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2740842   |
| test set forces         | 6.48e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 778         |
|    time_elapsed         | 9722        |
|    total_timesteps      | 2489600     |
| train/                  |             |
|    approx_kl            | 0.019821748 |
|    clip_fraction        | 0.294       |
|    clip_range           | 0.2         |
|    entropy_loss         | -38.1       |
|    explained_variance   | 0.955       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0172     |
|    n_updates            | 7770        |
|    policy_gradient_loss | -0.0205     |
|    std                  | 0.799       |
|

Forces on test set: 66333.735107
----------------------------------------
| forces                  | 959.53595  |
| rew_unnormalized        | -6242.228  |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.2446871  |
| test set forces         | 6.63e+04   |
| time/                   |            |
|    fps                  | 256        |
|    iterations           | 786        |
|    time_elapsed         | 9821       |
|    total_timesteps      | 2515200    |
| train/                  |            |
|    approx_kl            | 0.03166013 |
|    clip_fraction        | 0.34       |
|    clip_range           | 0.2        |
|    entropy_loss         | -38        |
|    explained_variance   | 0.739      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.035     |
|    n_updates            | 7850       |
|    policy_gradient_loss | -0.0208    |
|    std                  | 0.796      |
|    value_loss         

Forces on test set: 66223.759644
-----------------------------------------
| forces                  | 977.18024   |
| rew_unnormalized        | -6513.0337  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2155644   |
| test set forces         | 6.62e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 794         |
|    time_elapsed         | 9919        |
|    total_timesteps      | 2540800     |
| train/                  |             |
|    approx_kl            | 0.012599927 |
|    clip_fraction        | 0.293       |
|    clip_range           | 0.2         |
|    entropy_loss         | -37.9       |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.03       |
|    n_updates            | 7930        |
|    policy_gradient_loss | -0.0198     |
|    std                  | 0.793       |
|

Forces on test set: 65745.286865
-----------------------------------------
| forces                  | 952.72186   |
| rew_unnormalized        | -6695.448   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1938045   |
| test set forces         | 6.57e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 802         |
|    time_elapsed         | 10016       |
|    total_timesteps      | 2566400     |
| train/                  |             |
|    approx_kl            | 0.025276031 |
|    clip_fraction        | 0.337       |
|    clip_range           | 0.2         |
|    entropy_loss         | -37.8       |
|    explained_variance   | 0.795       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.015      |
|    n_updates            | 8010        |
|    policy_gradient_loss | -0.016      |
|    std                  | 0.792       |
|

Forces on test set: 65448.494263
-----------------------------------------
| forces                  | 941.2317    |
| rew_unnormalized        | -5952.1143  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2500672   |
| test set forces         | 6.54e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 810         |
|    time_elapsed         | 10113       |
|    total_timesteps      | 2592000     |
| train/                  |             |
|    approx_kl            | 0.019230265 |
|    clip_fraction        | 0.357       |
|    clip_range           | 0.2         |
|    entropy_loss         | -37.6       |
|    explained_variance   | 0.802       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0238     |
|    n_updates            | 8090        |
|    policy_gradient_loss | -0.0172     |
|    std                  | 0.788       |
|

Forces on test set: 65359.868164
-----------------------------------------
| forces                  | 931.1305    |
| rew_unnormalized        | -5845.8926  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2528378   |
| test set forces         | 6.54e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 818         |
|    time_elapsed         | 10215       |
|    total_timesteps      | 2617600     |
| train/                  |             |
|    approx_kl            | 0.023500308 |
|    clip_fraction        | 0.329       |
|    clip_range           | 0.2         |
|    entropy_loss         | -37.5       |
|    explained_variance   | 0.935       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.015      |
|    n_updates            | 8170        |
|    policy_gradient_loss | -0.0225     |
|    std                  | 0.784       |
|

Forces on test set: 64606.786377
-----------------------------------------
| forces                  | 946.63245   |
| rew_unnormalized        | -6974.775   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1508192   |
| test set forces         | 6.46e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 826         |
|    time_elapsed         | 10316       |
|    total_timesteps      | 2643200     |
| train/                  |             |
|    approx_kl            | 0.021149002 |
|    clip_fraction        | 0.359       |
|    clip_range           | 0.2         |
|    entropy_loss         | -37.3       |
|    explained_variance   | 0.924       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0337     |
|    n_updates            | 8250        |
|    policy_gradient_loss | -0.0175     |
|    std                  | 0.78        |
|

Forces on test set: 65758.780884
-----------------------------------------
| forces                  | 935.5033    |
| rew_unnormalized        | -5945.1826  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2325315   |
| test set forces         | 6.58e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 834         |
|    time_elapsed         | 10416       |
|    total_timesteps      | 2668800     |
| train/                  |             |
|    approx_kl            | 0.011448293 |
|    clip_fraction        | 0.32        |
|    clip_range           | 0.2         |
|    entropy_loss         | -37.2       |
|    explained_variance   | 0.471       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0117     |
|    n_updates            | 8330        |
|    policy_gradient_loss | -0.0188     |
|    std                  | 0.777       |
|

Forces on test set: 65208.139404
-----------------------------------------
| forces                  | 944.71606   |
| rew_unnormalized        | -5776.45    |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.241011    |
| test set forces         | 6.52e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 842         |
|    time_elapsed         | 10516       |
|    total_timesteps      | 2694400     |
| train/                  |             |
|    approx_kl            | 0.026629556 |
|    clip_fraction        | 0.34        |
|    clip_range           | 0.2         |
|    entropy_loss         | -37.1       |
|    explained_variance   | 0.923       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0259     |
|    n_updates            | 8410        |
|    policy_gradient_loss | -0.0183     |
|    std                  | 0.774       |
|

Forces on test set: 64873.525269
Storing agent and hyperparameters to disk...
-----------------------------------------
| forces                  | 952.73694   |
| rew_unnormalized        | -5858.212   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.228068    |
| test set forces         | 6.49e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 850         |
|    time_elapsed         | 10617       |
|    total_timesteps      | 2720000     |
| train/                  |             |
|    approx_kl            | 0.022854365 |
|    clip_fraction        | 0.317       |
|    clip_range           | 0.2         |
|    entropy_loss         | -37         |
|    explained_variance   | 0.757       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0188     |
|    n_updates            | 8490        |
|    policy_gradient_loss | -0.0189     

Forces on test set: 66559.593262
-----------------------------------------
| forces                  | 938.0429    |
| rew_unnormalized        | -5695.7896  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2362018   |
| test set forces         | 6.66e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 858         |
|    time_elapsed         | 10719       |
|    total_timesteps      | 2745600     |
| train/                  |             |
|    approx_kl            | 0.023820352 |
|    clip_fraction        | 0.319       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.9       |
|    explained_variance   | 0.923       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0343     |
|    n_updates            | 8570        |
|    policy_gradient_loss | -0.02       |
|    std                  | 0.768       |
|

Forces on test set: 66192.506836
----------------------------------------
| forces                  | 932.2715   |
| rew_unnormalized        | -5053.622  |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.2865474  |
| test set forces         | 6.62e+04   |
| time/                   |            |
|    fps                  | 256        |
|    iterations           | 866        |
|    time_elapsed         | 10822      |
|    total_timesteps      | 2771200    |
| train/                  |            |
|    approx_kl            | 0.02808817 |
|    clip_fraction        | 0.378      |
|    clip_range           | 0.2        |
|    entropy_loss         | -36.7      |
|    explained_variance   | 0.794      |
|    learning_rate        | 0.0001     |
|    loss                 | -9.81e-05  |
|    n_updates            | 8650       |
|    policy_gradient_loss | -0.016     |
|    std                  | 0.765      |
|    value_loss         

Forces on test set: 67395.594360
-----------------------------------------
| forces                  | 953.4827    |
| rew_unnormalized        | -6689.0737  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1378088   |
| test set forces         | 6.74e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 874         |
|    time_elapsed         | 10924       |
|    total_timesteps      | 2796800     |
| train/                  |             |
|    approx_kl            | 0.030806424 |
|    clip_fraction        | 0.363       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.6       |
|    explained_variance   | 0.941       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0243     |
|    n_updates            | 8730        |
|    policy_gradient_loss | -0.0157     |
|    std                  | 0.763       |
|

Forces on test set: 67823.075806
-----------------------------------------
| forces                  | 972.0584    |
| rew_unnormalized        | -5972.341   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1948876   |
| test set forces         | 6.78e+04    |
| time/                   |             |
|    fps                  | 256         |
|    iterations           | 882         |
|    time_elapsed         | 11024       |
|    total_timesteps      | 2822400     |
| train/                  |             |
|    approx_kl            | 0.031099726 |
|    clip_fraction        | 0.372       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.5       |
|    explained_variance   | 0.889       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.015      |
|    n_updates            | 8810        |
|    policy_gradient_loss | -0.0202     |
|    std                  | 0.761       |
|

Forces on test set: 68016.065552
-----------------------------------------
| forces                  | 960.95306   |
| rew_unnormalized        | -5652.712   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2174965   |
| test set forces         | 6.8e+04     |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 890         |
|    time_elapsed         | 11125       |
|    total_timesteps      | 2848000     |
| train/                  |             |
|    approx_kl            | 0.037399434 |
|    clip_fraction        | 0.343       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.5       |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0383     |
|    n_updates            | 8890        |
|    policy_gradient_loss | -0.018      |
|    std                  | 0.759       |
|

Forces on test set: 67098.126587
-----------------------------------------
| forces                  | 930.9766    |
| rew_unnormalized        | -5328.5576  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2405893   |
| test set forces         | 6.71e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 898         |
|    time_elapsed         | 11225       |
|    total_timesteps      | 2873600     |
| train/                  |             |
|    approx_kl            | 0.031241065 |
|    clip_fraction        | 0.378       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.3       |
|    explained_variance   | 0.912       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0253     |
|    n_updates            | 8970        |
|    policy_gradient_loss | -0.0206     |
|    std                  | 0.756       |
|

Forces on test set: 67186.339233
----------------------------------------
| forces                  | 933.1809   |
| rew_unnormalized        | -5710.4873 |
| rollout/                |            |
|    ep_len_mean          | 32.0       |
|    ep_rew_mean          | 1.2012774  |
| test set forces         | 6.72e+04   |
| time/                   |            |
|    fps                  | 255        |
|    iterations           | 906        |
|    time_elapsed         | 11327      |
|    total_timesteps      | 2899200    |
| train/                  |            |
|    approx_kl            | 0.04332125 |
|    clip_fraction        | 0.349      |
|    clip_range           | 0.2        |
|    entropy_loss         | -36.2      |
|    explained_variance   | 0.944      |
|    learning_rate        | 0.0001     |
|    loss                 | -0.0495    |
|    n_updates            | 9050       |
|    policy_gradient_loss | -0.0235    |
|    std                  | 0.754      |
|    value_loss         

Forces on test set: 67610.647583
-----------------------------------------
| forces                  | 945.1387    |
| rew_unnormalized        | -5687.7036  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1979417   |
| test set forces         | 6.76e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 914         |
|    time_elapsed         | 11429       |
|    total_timesteps      | 2924800     |
| train/                  |             |
|    approx_kl            | 0.040395964 |
|    clip_fraction        | 0.367       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.2       |
|    explained_variance   | 0.94        |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0514     |
|    n_updates            | 9130        |
|    policy_gradient_loss | -0.0206     |
|    std                  | 0.754       |
|

Forces on test set: 67618.192871
-----------------------------------------
| forces                  | 948.1902    |
| rew_unnormalized        | -4662.6626  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2846338   |
| test set forces         | 6.76e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 922         |
|    time_elapsed         | 11531       |
|    total_timesteps      | 2950400     |
| train/                  |             |
|    approx_kl            | 0.028346455 |
|    clip_fraction        | 0.343       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.2       |
|    explained_variance   | 0.931       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0363     |
|    n_updates            | 9210        |
|    policy_gradient_loss | -0.0218     |
|    std                  | 0.754       |
|

Forces on test set: 67183.422974
-----------------------------------------
| forces                  | 952.2335    |
| rew_unnormalized        | -6528.2236  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1108124   |
| test set forces         | 6.72e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 930         |
|    time_elapsed         | 11635       |
|    total_timesteps      | 2976000     |
| train/                  |             |
|    approx_kl            | 0.027594278 |
|    clip_fraction        | 0.332       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.1       |
|    explained_variance   | 0.966       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0394     |
|    n_updates            | 9290        |
|    policy_gradient_loss | -0.027      |
|    std                  | 0.751       |
|

Forces on test set: 67836.489502
-----------------------------------------
| forces                  | 940.526     |
| rew_unnormalized        | -5039.5234  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2402651   |
| test set forces         | 6.78e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 938         |
|    time_elapsed         | 11741       |
|    total_timesteps      | 3001600     |
| train/                  |             |
|    approx_kl            | 0.030883757 |
|    clip_fraction        | 0.343       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36.1       |
|    explained_variance   | 0.932       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0161     |
|    n_updates            | 9370        |
|    policy_gradient_loss | -0.025      |
|    std                  | 0.75        |
|

Forces on test set: 66722.537476
-----------------------------------------
| forces                  | 916.9893    |
| rew_unnormalized        | -4964.0728  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2418433   |
| test set forces         | 6.67e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 946         |
|    time_elapsed         | 11845       |
|    total_timesteps      | 3027200     |
| train/                  |             |
|    approx_kl            | 0.027110288 |
|    clip_fraction        | 0.339       |
|    clip_range           | 0.2         |
|    entropy_loss         | -36         |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0201     |
|    n_updates            | 9450        |
|    policy_gradient_loss | -0.0186     |
|    std                  | 0.748       |
|

Forces on test set: 66865.092896
-----------------------------------------
| forces                  | 917.8571    |
| rew_unnormalized        | -4674.128   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2632284   |
| test set forces         | 6.69e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 954         |
|    time_elapsed         | 11949       |
|    total_timesteps      | 3052800     |
| train/                  |             |
|    approx_kl            | 0.041056648 |
|    clip_fraction        | 0.393       |
|    clip_range           | 0.2         |
|    entropy_loss         | -35.9       |
|    explained_variance   | 0.891       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0265     |
|    n_updates            | 9530        |
|    policy_gradient_loss | -0.0196     |
|    std                  | 0.745       |
|

Forces on test set: 66006.633545
-----------------------------------------
| forces                  | 942.2854    |
| rew_unnormalized        | -4735.7236  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2524091   |
| test set forces         | 6.6e+04     |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 962         |
|    time_elapsed         | 12054       |
|    total_timesteps      | 3078400     |
| train/                  |             |
|    approx_kl            | 0.036437035 |
|    clip_fraction        | 0.36        |
|    clip_range           | 0.2         |
|    entropy_loss         | -35.8       |
|    explained_variance   | 0.943       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0202     |
|    n_updates            | 9610        |
|    policy_gradient_loss | -0.0228     |
|    std                  | 0.742       |
|

Forces on test set: 67034.364624
-----------------------------------------
| forces                  | 982.74475   |
| rew_unnormalized        | -5660.987   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.1619529   |
| test set forces         | 6.7e+04     |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 970         |
|    time_elapsed         | 12157       |
|    total_timesteps      | 3104000     |
| train/                  |             |
|    approx_kl            | 0.042912778 |
|    clip_fraction        | 0.345       |
|    clip_range           | 0.2         |
|    entropy_loss         | -35.7       |
|    explained_variance   | 0.953       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0406     |
|    n_updates            | 9690        |
|    policy_gradient_loss | -0.0257     |
|    std                  | 0.74        |
|

Forces on test set: 68347.049438
-----------------------------------------
| forces                  | 955.6172    |
| rew_unnormalized        | -4863.553   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.2308255   |
| test set forces         | 6.83e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 978         |
|    time_elapsed         | 12261       |
|    total_timesteps      | 3129600     |
| train/                  |             |
|    approx_kl            | 0.041225195 |
|    clip_fraction        | 0.38        |
|    clip_range           | 0.2         |
|    entropy_loss         | -35.6       |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.0383     |
|    n_updates            | 9770        |
|    policy_gradient_loss | -0.0208     |
|    std                  | 0.738       |
|

Forces on test set: 68810.727295
-----------------------------------------
| forces                  | 990.0867    |
| rew_unnormalized        | -6608.841   |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.0637099   |
| test set forces         | 6.88e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 986         |
|    time_elapsed         | 12365       |
|    total_timesteps      | 3155200     |
| train/                  |             |
|    approx_kl            | 0.049877357 |
|    clip_fraction        | 0.379       |
|    clip_range           | 0.2         |
|    entropy_loss         | -35.4       |
|    explained_variance   | 0.951       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00366    |
|    n_updates            | 9850        |
|    policy_gradient_loss | -0.023      |
|    std                  | 0.734       |
|

Forces on test set: 67934.624512
-----------------------------------------
| forces                  | 936.6354    |
| rew_unnormalized        | -5056.4937  |
| rollout/                |             |
|    ep_len_mean          | 32.0        |
|    ep_rew_mean          | 1.203294    |
| test set forces         | 6.79e+04    |
| time/                   |             |
|    fps                  | 255         |
|    iterations           | 994         |
|    time_elapsed         | 12469       |
|    total_timesteps      | 3180800     |
| train/                  |             |
|    approx_kl            | 0.038654134 |
|    clip_fraction        | 0.332       |
|    clip_range           | 0.2         |
|    entropy_loss         | -35.3       |
|    explained_variance   | 0.971       |
|    learning_rate        | 0.0001      |
|    loss                 | -0.00474    |
|    n_updates            | 9930        |
|    policy_gradient_loss | -0.0256     |
|    std                  | 0.731       |
|
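
The `approx_kl` and `clip_fraction` rows in the tables above are diagnostics of the PPO update. As far as we understand the stable_baselines3 implementation, both are derived from the probability ratio between the new and old policy. The following is a minimal plain-Python sketch of those formulas, not the library's own tensor code; the log-probabilities are made-up values for illustration.

```python
import math

def ppo_diagnostics(log_prob_new, log_prob_old, clip_range=0.2):
    """Return (approx_kl, clip_fraction) for one minibatch of log-probabilities."""
    log_ratios = [new - old for new, old in zip(log_prob_new, log_prob_old)]
    ratios = [math.exp(lr) for lr in log_ratios]
    # KL estimate: mean((ratio - 1) - log(ratio)); each term is non-negative.
    approx_kl = sum((r - 1.0) - lr for r, lr in zip(ratios, log_ratios)) / len(ratios)
    # Share of samples whose ratio left [1 - clip_range, 1 + clip_range].
    clip_fraction = sum(abs(r - 1.0) > clip_range for r in ratios) / len(ratios)
    return approx_kl, clip_fraction

# Hypothetical log-probabilities, for illustration only.
old = [-1.0, -1.0, -1.0, -1.0]
new = [-1.0, -0.7, -1.4, -1.05]
approx_kl, clip_fraction = ppo_diagnostics(new, old)
print(approx_kl, clip_fraction)
```

Read this way, a `clip_fraction` of about 0.35 with `clip_range` 0.2 would mean that roughly a third of the sampled actions changed probability by more than 20% during the update.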

Let's see how the reward evolved during training:

In [None]:
trainer.plot()
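
The logged reward fluctuates from one iteration to the next (`rew_unnormalized` moves between roughly -4900 and -6600 in the tables above), so a moving average can make the trend easier to read. The helper below is a hypothetical sketch and not part of the `trainer` API; the input values are copied from the log output above.

```python
def moving_average(values, window):
    """Average each value with up to `window - 1` preceding values."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# rew_unnormalized values taken from the log tables above
rewards = [-4863.553, -6608.841, -5056.4937]
print(moving_average(rewards, window=2))
```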

Run the cell below to open the TensorBoard logs. If necessary, replace the path with the one leading to the log folder of your experiment.

In [None]:
%load_ext tensorboard
%tensorboard --logdir rl-models/ControlBurgersBench3/tensorboard-log

Now we can take a look at how the agent is performing:

In [None]:
env = trainer.env

obs = env.reset()
bplt.burgers_figure('RL Reconstruction')
# Plot the initial velocity profile (first entry of the observation batch)
plt.plot(obs[0][:,0], color=bplt.gradient_color(0, step_count+1), linewidth=0.8)
plt.legend(['Initial state in dark red, final state in dark blue'])
plt.ylim(-2, 2)
# Roll out the deterministic policy and plot every intermediate state
for frame in range(1, step_count):
    act = trainer.predict(obs, deterministic=True)
    obs, _, _, _ = env.step(act)
    plt.plot(obs[0][:,0], color=bplt.gradient_color(frame, step_count+1), linewidth=0.8)
# Overlay the goal state for comparison
plt.plot(env.goal_state.velocity.data[0,:,0])
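
To complement the plot with a single number, one could measure how far the last simulated velocity profile is from the goal. The sketch below uses plain lists with made-up values. In the notebook, the inputs would presumably be `obs[0][:,0]` and `env.goal_state.velocity.data[0,:,0]`, which assumes that both arrays share the same grid layout. `mean_abs_error` is a hypothetical helper, not part of the experiment code.

```python
def mean_abs_error(final_state, goal_state):
    """Mean absolute difference between two velocity profiles of equal length."""
    if len(final_state) != len(goal_state):
        raise ValueError("profiles must have the same number of grid points")
    total = sum(abs(f - g) for f, g in zip(final_state, goal_state))
    return total / len(final_state)

# Made-up velocity profiles on a 3-point grid, for illustration only.
final_state = [0.5, -0.25, 0.0]
goal_state = [0.25, -0.25, 0.5]
error = mean_abs_error(final_state, goal_state)
print(error)
```

A value close to zero would indicate that the agent steered the system to the target, while the force totals reported in the training log indicate how much control effort that took.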