## **ROBOTICS 24 - AI LEARNS TO WALK:**
### **In this section, the reinforcement learning model is trained and evaluated.**
---

### **USING REINFORCEMENT LEARNING ON ANTBULLETENV-V0**

In [12]:
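import time

import gym
import numpy as np
import pybullet
import pybullet_envs  # imported for its side effect: registers the Bullet envs with gym
import torch.nn as nn

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from tqdm import tqdm

env = gym.make("AntBulletEnv-v0")

# Separate 128x128 ReLU networks for the policy (pi) and the value function (vf).
policy_kwargs = dict(activation_fn=nn.ReLU, net_arch=[dict(pi=[128, 128], vf=[128, 128])])
model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)

max_average_reward = -np.inf
# Each "episode" here is one training round of about 10,000 timesteps,
# not a single environment episode.
num_episodes = 100

print("Starting Training...")
for i in tqdm(range(1, num_episodes + 1)):
    print(f"\nTraining Episode {i} started.")

    # learn() resets num_timesteps on every call by default, so the logged total
    # restarts each round; pass reset_num_timesteps=False to keep a running count.
    model.learn(total_timesteps=10000)

    # Evaluate the current policy over 10 episodes on the same environment.
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"Episode: {i}, Mean Reward: {mean_reward:.2f}, Std Reward: {std_reward:.2f}, Total Timesteps: {model.num_timesteps}, Time: {time.asctime()}")

    # Periodic checkpoint; with num_episodes = 100 this fires only on the last round.
    if i % 100 == 0:
        model.save(f"AntBulletEnv-v0_PPO_{i}")
        print(f"Model saved at Episode {i}.")

    # Keep a copy of the best-scoring model seen so far.
    if mean_reward > max_average_reward:
        max_average_reward = mean_reward
        print(f"New best model with Mean Reward: {mean_reward:.2f}. Saving model.")
        model.save("AntBulletEnv-v0_PPO_Best")

print("Training completed.")
env.close()
pybullet.disconnect()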
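
The training output reports `Total Timesteps: 10240` after every round even though `total_timesteps=10000` is requested. A likely explanation is that PPO only stops at rollout boundaries, so the request is rounded up to a whole number of rollouts. The sketch below reproduces that arithmetic in plain Python. It assumes the Stable-Baselines3 PPO defaults of `n_steps=2048` and `n_epochs=10` with a single environment; the function names are hypothetical helpers, not part of the library.

```python
import math


def timesteps_per_learn_call(total_timesteps, n_steps=2048, n_envs=1):
    """Timesteps collected by one learn() call, assuming whole rollouts only."""
    rollout_size = n_steps * n_envs
    return math.ceil(total_timesteps / rollout_size) * rollout_size


def n_updates_logged(round_index, iteration, rollouts_per_round=5, n_epochs=10):
    """Expected n_updates in the table printed at a given round and iteration.

    Assumes n_updates keeps accumulating across learn() calls and grows by
    n_epochs after each completed rollout.
    """
    completed_rollouts = (round_index - 1) * rollouts_per_round + (iteration - 1)
    return completed_rollouts * n_epochs


print(timesteps_per_learn_call(10_000))             # 10240
print(n_updates_logged(round_index=8, iteration=2))  # 360
```

Both values agree with the log: every round ends at 10240 timesteps, and the second table of round 8 shows `n_updates` of 360. Over 100 rounds this setup would therefore train for roughly 1,024,000 timesteps in total.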

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.

Starting Training...

  0%|          | 0/100 [00:00<?, ?it/s]


Training Episode 1 started.
-----------------------------
| time/              |      |
|    fps             | 542  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 460         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.009790984 |
|    clip_fraction        | 0.102       |
|    clip_range           | 0.2         |
|    entropy_loss         | -11.3       |
|    explained_variance   | 0.0223      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.599       |
|    n_updates            | 10          |
|    policy_gradient_loss | -0.0113     |
|    std                  | 0.993       |
|    value_loss           | 7           |
-----

  1%|          | 1/100 [00:25<42:10, 25.56s/it]

Episode: 1, Mean Reward: 10.62, Std Reward: 4.02, Total Timesteps: 10240, Time: Tue Jan 28 02:28:45 2025
New best model with Mean Reward: 10.62. Saving model.

Training Episode 2 started.
-----------------------------
| time/              |      |
|    fps             | 534  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 455         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.007976798 |
|    clip_fraction        | 0.0945      |
|    clip_range           | 0.2         |
|    entropy_loss         | -11.3       |
|    explained_variance   | -1.12       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.761       |
|    n_updates

  2%|▏         | 2/100 [00:51<42:16, 25.88s/it]

Episode: 2, Mean Reward: 17.78, Std Reward: 4.11, Total Timesteps: 10240, Time: Tue Jan 28 02:29:11 2025
New best model with Mean Reward: 17.78. Saving model.

Training Episode 3 started.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 266      |
|    ep_rew_mean     | 127      |
| time/              |          |
|    fps             | 546      |
|    iterations      | 1        |
|    time_elapsed    | 3        |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 266         |
|    ep_rew_mean          | 127         |
| time/                   |             |
|    fps                  | 478         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.010884056 |
|    clip_

  3%|▎         | 3/100 [01:17<41:58, 25.97s/it]

Episode: 3, Mean Reward: 78.75, Std Reward: 44.83, Total Timesteps: 10240, Time: Tue Jan 28 02:29:37 2025
New best model with Mean Reward: 78.75. Saving model.

Training Episode 4 started.
-----------------------------
| time/              |      |
|    fps             | 530  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 466         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.011789501 |
|    clip_fraction        | 0.142       |
|    clip_range           | 0.2         |
|    entropy_loss         | -11.1       |
|    explained_variance   | 0.387       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.301       |
|    n_update

  4%|▍         | 4/100 [04:02<2:08:58, 80.61s/it]

Episode: 4, Mean Reward: 2723.15, Std Reward: 372.14, Total Timesteps: 10240, Time: Tue Jan 28 02:32:21 2025
New best model with Mean Reward: 2723.15. Saving model.

Training Episode 5 started.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 620      |
|    ep_rew_mean     | 317      |
| time/              |          |
|    fps             | 552      |
|    iterations      | 1        |
|    time_elapsed    | 3        |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 620         |
|    ep_rew_mean          | 317         |
| time/                   |             |
|    fps                  | 468         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.012265472 |
|   

  5%|▌         | 5/100 [06:43<2:53:54, 109.84s/it]

Episode: 5, Mean Reward: 4846.30, Std Reward: 1397.03, Total Timesteps: 10240, Time: Tue Jan 28 02:35:03 2025
New best model with Mean Reward: 4846.30. Saving model.

Training Episode 6 started.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 707      |
|    ep_rew_mean     | 343      |
| time/              |          |
|    fps             | 514      |
|    iterations      | 1        |
|    time_elapsed    | 3        |
|    total_timesteps | 2048     |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 707         |
|    ep_rew_mean          | 343         |
| time/                   |             |
|    fps                  | 382         |
|    iterations           | 2           |
|    time_elapsed         | 10          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.012139073 |
|  

  6%|▌         | 6/100 [09:38<3:26:54, 132.06s/it]

Episode: 6, Mean Reward: 5974.41, Std Reward: 758.80, Total Timesteps: 10240, Time: Tue Jan 28 02:37:58 2025
New best model with Mean Reward: 5974.41. Saving model.

Training Episode 7 started.
-----------------------------
| time/              |      |
|    fps             | 529  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 467         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.012667354 |
|    clip_fraction        | 0.15        |
|    clip_range           | 0.2         |
|    entropy_loss         | -10.6       |
|    explained_variance   | 0.825       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.288       |
|    n_u

  7%|▋         | 7/100 [12:25<3:42:05, 143.28s/it]

Episode: 7, Mean Reward: 5131.49, Std Reward: 1111.45, Total Timesteps: 10240, Time: Tue Jan 28 02:40:45 2025

Training Episode 8 started.
-----------------------------
| time/              |      |
|    fps             | 533  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 460         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.011537492 |
|    clip_fraction        | 0.183       |
|    clip_range           | 0.2         |
|    entropy_loss         | -10.5       |
|    explained_variance   | 0.113       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.63        |
|    n_updates            | 360         |
|    policy_gradient_

  8%|▊         | 8/100 [15:05<3:47:55, 148.64s/it]

Episode: 8, Mean Reward: 3599.73, Std Reward: 625.96, Total Timesteps: 10240, Time: Tue Jan 28 02:43:25 2025

Training Episode 9 started.
-----------------------------
| time/              |      |
|    fps             | 546  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 481         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.013354063 |
|    clip_fraction        | 0.19        |
|    clip_range           | 0.2         |
|    entropy_loss         | -10.3       |
|    explained_variance   | 0.342       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.404       |
|    n_updates            | 410         |
|    policy_gradient_l

  9%|▉         | 9/100 [17:39<3:48:11, 150.46s/it]

Episode: 9, Mean Reward: 4549.70, Std Reward: 785.58, Total Timesteps: 10240, Time: Tue Jan 28 02:45:59 2025

Training Episode 10 started.
-----------------------------
| time/              |      |
|    fps             | 532  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 467        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.01260944 |
|    clip_fraction        | 0.156      |
|    clip_range           | 0.2        |
|    entropy_loss         | -10.1      |
|    explained_variance   | 0.608      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.211      |
|    n_updates            | 460        |
|    policy_gradient_loss | -0.0229 

 10%|█         | 10/100 [20:20<3:50:09, 153.44s/it]

Episode: 10, Mean Reward: 6185.36, Std Reward: 590.56, Total Timesteps: 10240, Time: Tue Jan 28 02:48:39 2025
New best model with Mean Reward: 6185.36. Saving model.

Training Episode 11 started.
-----------------------------
| time/              |      |
|    fps             | 571  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 494         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.019123645 |
|    clip_fraction        | 0.18        |
|    clip_range           | 0.2         |
|    entropy_loss         | -10.1       |
|    explained_variance   | 0.463       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.259       |
|    n

 11%|█         | 11/100 [22:59<3:50:29, 155.39s/it]

Episode: 11, Mean Reward: 4559.98, Std Reward: 884.98, Total Timesteps: 10240, Time: Tue Jan 28 02:51:19 2025

Training Episode 12 started.
-----------------------------
| time/              |      |
|    fps             | 542  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 466         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.017196454 |
|    clip_fraction        | 0.184       |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.88       |
|    explained_variance   | 0.863       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.296       |
|    n_updates            | 560         |
|    policy_gradient

 12%|█▏        | 12/100 [25:38<3:49:11, 156.27s/it]

Episode: 12, Mean Reward: 6501.94, Std Reward: 962.05, Total Timesteps: 10240, Time: Tue Jan 28 02:53:57 2025
New best model with Mean Reward: 6501.94. Saving model.

Training Episode 13 started.
-----------------------------
| time/              |      |
|    fps             | 527  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 453         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.017762372 |
|    clip_fraction        | 0.196       |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.71       |
|    explained_variance   | 0.878       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.302       |
|    n

 13%|█▎        | 13/100 [28:21<3:49:36, 158.35s/it]

Episode: 13, Mean Reward: 5483.86, Std Reward: 916.65, Total Timesteps: 10240, Time: Tue Jan 28 02:56:41 2025

Training Episode 14 started.
-----------------------------
| time/              |      |
|    fps             | 503  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 444         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.015022311 |
|    clip_fraction        | 0.191       |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.51       |
|    explained_variance   | -0.273      |
|    learning_rate        | 0.0003      |
|    loss                 | 0.657       |
|    n_updates            | 660         |
|    policy_gradient

 14%|█▍        | 14/100 [31:43<4:05:58, 171.61s/it]

Episode: 14, Mean Reward: 6512.86, Std Reward: 306.83, Total Timesteps: 10240, Time: Tue Jan 28 03:00:03 2025
New best model with Mean Reward: 6512.86. Saving model.

Training Episode 15 started.
-----------------------------
| time/              |      |
|    fps             | 522  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 454         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.012602566 |
|    clip_fraction        | 0.177       |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.38       |
|    explained_variance   | 0.749       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.33        |
|    n

 15%|█▌        | 15/100 [34:20<3:57:04, 167.35s/it]

Episode: 15, Mean Reward: 6082.87, Std Reward: 443.80, Total Timesteps: 10240, Time: Tue Jan 28 03:02:40 2025

Training Episode 16 started.
-----------------------------
| time/              |      |
|    fps             | 561  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 473         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.020512624 |
|    clip_fraction        | 0.275       |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.2        |
|    explained_variance   | 0.697       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.454       |
|    n_updates            | 760         |
|    policy_gradient

 16%|█▌        | 16/100 [37:00<3:50:58, 164.98s/it]

Episode: 16, Mean Reward: 6848.73, Std Reward: 906.50, Total Timesteps: 10240, Time: Tue Jan 28 03:05:20 2025
New best model with Mean Reward: 6848.73. Saving model.

Training Episode 17 started.
-----------------------------
| time/              |      |
|    fps             | 556  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 397         |
|    iterations           | 2           |
|    time_elapsed         | 10          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.017035048 |
|    clip_fraction        | 0.214       |
|    clip_range           | 0.2         |
|    entropy_loss         | -9.06       |
|    explained_variance   | 0.38        |
|    learning_rate        | 0.0003      |
|    loss                 | 0.79        |
|    n

 17%|█▋        | 17/100 [39:42<3:46:55, 164.05s/it]

Episode: 17, Mean Reward: 7245.61, Std Reward: 746.61, Total Timesteps: 10240, Time: Tue Jan 28 03:08:02 2025
New best model with Mean Reward: 7245.61. Saving model.

Training Episode 18 started.
-----------------------------
| time/              |      |
|    fps             | 528  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 458         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.020714743 |
|    clip_fraction        | 0.254       |
|    clip_range           | 0.2         |
|    entropy_loss         | -8.85       |
|    explained_variance   | 0.633       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.317       |
|    n

 18%|█▊        | 18/100 [5:34:10<123:32:25, 5423.72s/it]

Episode: 18, Mean Reward: 7053.59, Std Reward: 1019.28, Total Timesteps: 10240, Time: Tue Jan 28 08:02:29 2025

Training Episode 19 started.
-----------------------------
| time/              |      |
|    fps             | 538  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 453         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.024106264 |
|    clip_fraction        | 0.274       |
|    clip_range           | 0.2         |
|    entropy_loss         | -8.68       |
|    explained_variance   | 0.889       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.515       |
|    n_updates            | 910         |
|    policy_gradien

 19%|█▉        | 19/100 [5:36:52<86:28:53, 3843.63s/it] 

Episode: 19, Mean Reward: 6861.67, Std Reward: 986.75, Total Timesteps: 10240, Time: Tue Jan 28 08:05:12 2025

Training Episode 20 started.
-----------------------------
| time/              |      |
|    fps             | 506  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 448         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.023647925 |
|    clip_fraction        | 0.217       |
|    clip_range           | 0.2         |
|    entropy_loss         | -8.48       |
|    explained_variance   | 0.715       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.381       |
|    n_updates            | 960         |
|    policy_gradient

 20%|██        | 20/100 [5:39:32<60:50:02, 2737.54s/it]

Episode: 20, Mean Reward: 7115.31, Std Reward: 820.98, Total Timesteps: 10240, Time: Tue Jan 28 08:07:52 2025

Training Episode 21 started.
-----------------------------
| time/              |      |
|    fps             | 551  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 487        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.02489267 |
|    clip_fraction        | 0.288      |
|    clip_range           | 0.2        |
|    entropy_loss         | -8.33      |
|    explained_variance   | 0.792      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.308      |
|    n_updates            | 1010       |
|    policy_gradient_loss | -0.0245

 21%|██        | 21/100 [7:39:03<89:16:44, 4068.41s/it]

Episode: 21, Mean Reward: 6557.49, Std Reward: 414.67, Total Timesteps: 10240, Time: Tue Jan 28 10:07:23 2025

Training Episode 22 started.
-----------------------------
| time/              |      |
|    fps             | 525  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 433         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.026365094 |
|    clip_fraction        | 0.307       |
|    clip_range           | 0.2         |
|    entropy_loss         | -8.15       |
|    explained_variance   | 0.639       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.401       |
|    n_updates            | 1060        |
|    policy_gradient

 22%|██▏       | 22/100 [7:41:43<62:43:51, 2895.28s/it]

Episode: 22, Mean Reward: 8260.47, Std Reward: 1144.76, Total Timesteps: 10240, Time: Tue Jan 28 10:10:02 2025
New best model with Mean Reward: 8260.47. Saving model.

Training Episode 23 started.
-----------------------------
| time/              |      |
|    fps             | 543  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 452        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.02404967 |
|    clip_fraction        | 0.261      |
|    clip_range           | 0.2        |
|    entropy_loss         | -7.98      |
|    explained_variance   | 0.895      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.226      |
|    n_updates     

 23%|██▎       | 23/100 [7:44:20<44:21:05, 2073.57s/it]

Episode: 23, Mean Reward: 7874.56, Std Reward: 354.12, Total Timesteps: 10240, Time: Tue Jan 28 10:12:40 2025

Training Episode 24 started.
-----------------------------
| time/              |      |
|    fps             | 566  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 486         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.018213924 |
|    clip_fraction        | 0.202       |
|    clip_range           | 0.2         |
|    entropy_loss         | -7.88       |
|    explained_variance   | 0.642       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.747       |
|    n_updates            | 1160        |
|    policy_gradient

 24%|██▍       | 24/100 [7:46:55<31:37:27, 1497.99s/it]

Episode: 24, Mean Reward: 7679.68, Std Reward: 864.06, Total Timesteps: 10240, Time: Tue Jan 28 10:15:15 2025

Training Episode 25 started.
-----------------------------
| time/              |      |
|    fps             | 510  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
---------------------------------------
| time/                   |           |
|    fps                  | 447       |
|    iterations           | 2         |
|    time_elapsed         | 9         |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.0645103 |
|    clip_fraction        | 0.352     |
|    clip_range           | 0.2       |
|    entropy_loss         | -7.71     |
|    explained_variance   | 0.345     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.265     |
|    n_updates            | 1210      |
|    policy_gradient_loss | -0.0125   |
|    std  

 25%|██▌       | 25/100 [7:49:33<22:49:43, 1095.78s/it]

Episode: 25, Mean Reward: 8388.25, Std Reward: 463.36, Total Timesteps: 10240, Time: Tue Jan 28 10:17:52 2025
New best model with Mean Reward: 8388.25. Saving model.

Training Episode 26 started.
-----------------------------
| time/              |      |
|    fps             | 514  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 463         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.024560848 |
|    clip_fraction        | 0.243       |
|    clip_range           | 0.2         |
|    entropy_loss         | -7.53       |
|    explained_variance   | 0.865       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.318       |
|    n

 26%|██▌       | 26/100 [7:52:09<16:43:44, 813.85s/it] 

Episode: 26, Mean Reward: 9069.57, Std Reward: 477.97, Total Timesteps: 10240, Time: Tue Jan 28 10:20:28 2025
New best model with Mean Reward: 9069.57. Saving model.

Training Episode 27 started.
-----------------------------
| time/              |      |
|    fps             | 534  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
---------------------------------------
| time/                   |           |
|    fps                  | 462       |
|    iterations           | 2         |
|    time_elapsed         | 8         |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.0231244 |
|    clip_fraction        | 0.256     |
|    clip_range           | 0.2       |
|    entropy_loss         | -7.34     |
|    explained_variance   | 0.803     |
|    learning_rate        | 0.0003    |
|    loss                 | 0.85      |
|    n_updates            | 1310  

 27%|██▋       | 27/100 [7:54:45<12:30:10, 616.58s/it]

Episode: 27, Mean Reward: 7583.21, Std Reward: 1145.86, Total Timesteps: 10240, Time: Tue Jan 28 10:23:05 2025

Training Episode 28 started.
-----------------------------
| time/              |      |
|    fps             | 549  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 464         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.035544336 |
|    clip_fraction        | 0.283       |
|    clip_range           | 0.2         |
|    entropy_loss         | -7.25       |
|    explained_variance   | 0.946       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.21        |
|    n_updates            | 1360        |
|    policy_gradien

 28%|██▊       | 28/100 [8:09:38<13:59:21, 699.47s/it]

Episode: 28, Mean Reward: 8412.71, Std Reward: 931.45, Total Timesteps: 10240, Time: Tue Jan 28 10:37:58 2025

Training Episode 29 started.
-----------------------------
| time/              |      |
|    fps             | 498  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 418         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.027906088 |
|    clip_fraction        | 0.257       |
|    clip_range           | 0.2         |
|    entropy_loss         | -7.05       |
|    explained_variance   | 0.478       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.412       |
|    n_updates            | 1410        |
|    policy_gradient

 29%|██▉       | 29/100 [8:12:23<10:38:03, 539.20s/it]

Episode: 29, Mean Reward: 8438.98, Std Reward: 886.69, Total Timesteps: 10240, Time: Tue Jan 28 10:40:43 2025

Training Episode 30 started.
-----------------------------
| time/              |      |
|    fps             | 499  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 436        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.03527712 |
|    clip_fraction        | 0.317      |
|    clip_range           | 0.2        |
|    entropy_loss         | -6.79      |
|    explained_variance   | 0.826      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.307      |
|    n_updates            | 1460       |
|    policy_gradient_loss | -0.0322

 30%|███       | 30/100 [8:21:54<10:40:16, 548.80s/it]

Episode: 30, Mean Reward: 8440.65, Std Reward: 1143.42, Total Timesteps: 10240, Time: Tue Jan 28 10:50:14 2025

Training Episode 31 started.
-----------------------------
| time/              |      |
|    fps             | 544  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 474        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.03230104 |
|    clip_fraction        | 0.287      |
|    clip_range           | 0.2        |
|    entropy_loss         | -6.7       |
|    explained_variance   | 0.634      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.2        |
|    n_updates            | 1510       |
|    policy_gradient_loss | -0.040

 31%|███       | 31/100 [8:24:29<8:15:18, 430.71s/it]

Episode: 31, Mean Reward: 7968.11, Std Reward: 1008.58, Total Timesteps: 10240, Time: Tue Jan 28 10:52:49 2025

Training Episode 32 started.
-----------------------------
| time/              |      |
|    fps             | 544  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 464         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.030596111 |
|    clip_fraction        | 0.295       |
|    clip_range           | 0.2         |
|    entropy_loss         | -6.51       |
|    explained_variance   | 0.791       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.307       |
|    n_updates            | 1560        |
|    policy_gradien

 32%|███▏      | 32/100 [8:27:07<6:35:15, 348.76s/it]

Episode: 32, Mean Reward: 9497.88, Std Reward: 522.37, Total Timesteps: 10240, Time: Tue Jan 28 10:55:27 2025
New best model with Mean Reward: 9497.88. Saving model.

Training Episode 33 started.
-----------------------------
| time/              |      |
|    fps             | 523  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 420         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.028134469 |
|    clip_fraction        | 0.267       |
|    clip_range           | 0.2         |
|    entropy_loss         | -6.31       |
|    explained_variance   | 0.863       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.236       |
|    n

 33%|███▎      | 33/100 [8:29:45<5:25:34, 291.56s/it]

Episode: 33, Mean Reward: 7903.75, Std Reward: 814.11, Total Timesteps: 10240, Time: Tue Jan 28 10:58:05 2025

Training Episode 34 started.
-----------------------------
| time/              |      |
|    fps             | 516  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 434         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.027089842 |
|    clip_fraction        | 0.268       |
|    clip_range           | 0.2         |
|    entropy_loss         | -6.08       |
|    explained_variance   | 0.907       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.291       |
|    n_updates            | 1660        |
|    policy_gradient

 34%|███▍      | 34/100 [8:32:23<4:36:42, 251.55s/it]

Episode: 34, Mean Reward: 8701.78, Std Reward: 504.30, Total Timesteps: 10240, Time: Tue Jan 28 11:00:43 2025

Training Episode 35 started.
-----------------------------
| time/              |      |
|    fps             | 558  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 472         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.046297472 |
|    clip_fraction        | 0.333       |
|    clip_range           | 0.2         |
|    entropy_loss         | -5.88       |
|    explained_variance   | 0.95        |
|    learning_rate        | 0.0003      |
|    loss                 | 0.198       |
|    n_updates            | 1710        |
|    policy_gradient

 35%|███▌      | 35/100 [9:00:10<12:12:22, 676.04s/it]

Episode: 35, Mean Reward: 8632.55, Std Reward: 910.11, Total Timesteps: 10240, Time: Tue Jan 28 11:28:30 2025

Training Episode 36 started.
-----------------------------
| time/              |      |
|    fps             | 317  |
|    iterations      | 1    |
|    time_elapsed    | 6    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 268         |
|    iterations           | 2           |
|    time_elapsed         | 15          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.030683262 |
|    clip_fraction        | 0.266       |
|    clip_range           | 0.2         |
|    entropy_loss         | -5.77       |
|    explained_variance   | 0.885       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.382       |
|    n_updates            | 1760        |
|    policy_gradient

 36%|███▌      | 36/100 [9:03:31<9:29:17, 533.71s/it]

Episode: 36, Mean Reward: 8887.43, Std Reward: 723.66, Total Timesteps: 10240, Time: Tue Jan 28 11:31:51 2025

Training Episode 37 started.
-----------------------------
| time/              |      |
|    fps             | 522  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 461        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.03622918 |
|    clip_fraction        | 0.302      |
|    clip_range           | 0.2        |
|    entropy_loss         | -5.63      |
|    explained_variance   | 0.854      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.52       |
|    n_updates            | 1810       |
|    policy_gradient_loss | -0.0476

 37%|███▋      | 37/100 [9:06:15<7:23:47, 422.65s/it]

Episode: 37, Mean Reward: 8937.07, Std Reward: 1559.59, Total Timesteps: 10240, Time: Tue Jan 28 11:34:35 2025

Training Episode 38 started.
-----------------------------
| time/              |      |
|    fps             | 501  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 454        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.03843465 |
|    clip_fraction        | 0.322      |
|    clip_range           | 0.2        |
|    entropy_loss         | -5.54      |
|    explained_variance   | 0.887      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.917      |
|    n_updates            | 1860       |
|    policy_gradient_loss | -0.048

 38%|███▊      | 38/100 [9:09:43<6:10:10, 358.24s/it]

Episode: 38, Mean Reward: 7583.48, Std Reward: 1732.06, Total Timesteps: 10240, Time: Tue Jan 28 11:38:03 2025

Training Episode 39 started.
-----------------------------
| time/              |      |
|    fps             | 371  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 337         |
|    iterations           | 2           |
|    time_elapsed         | 12          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.031622883 |
|    clip_fraction        | 0.32        |
|    clip_range           | 0.2         |
|    entropy_loss         | -5.4        |
|    explained_variance   | 0.82        |
|    learning_rate        | 0.0003      |
|    loss                 | 1.84        |
|    n_updates            | 1910        |
|    policy_gradien

 39%|███▉      | 39/100 [9:12:38<5:08:30, 303.45s/it]

Episode: 39, Mean Reward: 8871.97, Std Reward: 2312.50, Total Timesteps: 10240, Time: Tue Jan 28 11:40:58 2025

Training Episode 40 started.
-----------------------------
| time/              |      |
|    fps             | 489  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 423        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.03945414 |
|    clip_fraction        | 0.344      |
|    clip_range           | 0.2        |
|    entropy_loss         | -5.26      |
|    explained_variance   | 0.919      |
|    learning_rate        | 0.0003     |
|    loss                 | 0.974      |
|    n_updates            | 1960       |
|    policy_gradient_loss | -0.041

 40%|████      | 40/100 [9:15:37<4:25:52, 265.87s/it]

Episode: 40, Mean Reward: 9451.94, Std Reward: 2737.11, Total Timesteps: 10240, Time: Tue Jan 28 11:43:56 2025

Training Episode 41 started.
-----------------------------
| time/              |      |
|    fps             | 388  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 331         |
|    iterations           | 2           |
|    time_elapsed         | 12          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.052205555 |
|    clip_fraction        | 0.406       |
|    clip_range           | 0.2         |
|    entropy_loss         | -5.11       |
|    explained_variance   | 0.926       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.818       |
|    n_updates            | 2010        |
|    policy_gradien

 41%|████      | 41/100 [9:18:16<3:50:02, 233.94s/it]

Episode: 41, Mean Reward: 11403.86, Std Reward: 1622.19, Total Timesteps: 10240, Time: Tue Jan 28 11:46:36 2025
New best model with Mean Reward: 11403.86. Saving model.

Training Episode 42 started.
-----------------------------
| time/              |      |
|    fps             | 494  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 442         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.038520712 |
|    clip_fraction        | 0.335       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.97       |
|    explained_variance   | 0.918       |
|    learning_rate        | 0.0003      |
|    loss                 | 1           |
|  

 42%|████▏     | 42/100 [9:20:53<3:23:40, 210.70s/it]

Episode: 42, Mean Reward: 13501.01, Std Reward: 1749.77, Total Timesteps: 10240, Time: Tue Jan 28 11:49:12 2025
New best model with Mean Reward: 13501.01. Saving model.

Training Episode 43 started.
-----------------------------
| time/              |      |
|    fps             | 561  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 496         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.054635033 |
|    clip_fraction        | 0.371       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.85       |
|    explained_variance   | 0.906       |
|    learning_rate        | 0.0003      |
|    loss                 | 1.03        |
|  

 43%|████▎     | 43/100 [9:24:25<3:20:39, 211.22s/it]

Episode: 43, Mean Reward: 9998.84, Std Reward: 1824.24, Total Timesteps: 10240, Time: Tue Jan 28 11:52:45 2025

Training Episode 44 started.
-----------------------------
| time/              |      |
|    fps             | 521  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 458        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.03267718 |
|    clip_fraction        | 0.333      |
|    clip_range           | 0.2        |
|    entropy_loss         | -4.82      |
|    explained_variance   | 0.83       |
|    learning_rate        | 0.0003     |
|    loss                 | 1.57       |
|    n_updates            | 2160       |
|    policy_gradient_loss | -0.035

 44%|████▍     | 44/100 [9:26:59<3:01:13, 194.17s/it]

Episode: 44, Mean Reward: 13685.94, Std Reward: 1779.10, Total Timesteps: 10240, Time: Tue Jan 28 11:55:19 2025
New best model with Mean Reward: 13685.94. Saving model.

Training Episode 45 started.
-----------------------------
| time/              |      |
|    fps             | 563  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 483        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.04557129 |
|    clip_fraction        | 0.362      |
|    clip_range           | 0.2        |
|    entropy_loss         | -4.67      |
|    explained_variance   | 0.871      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.88       |
|    n_updates   

 45%|████▌     | 45/100 [9:29:30<2:46:06, 181.21s/it]

Episode: 45, Mean Reward: 12347.40, Std Reward: 2755.31, Total Timesteps: 10240, Time: Tue Jan 28 11:57:50 2025

Training Episode 46 started.
-----------------------------
| time/              |      |
|    fps             | 564  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 485         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.046152845 |
|    clip_fraction        | 0.394       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.54       |
|    explained_variance   | 0.863       |
|    learning_rate        | 0.0003      |
|    loss                 | 4.51        |
|    n_updates            | 2260        |
|    policy_gradie

 46%|████▌     | 46/100 [9:32:03<2:35:25, 172.69s/it]

Episode: 46, Mean Reward: 15428.97, Std Reward: 1628.05, Total Timesteps: 10240, Time: Tue Jan 28 12:00:23 2025
New best model with Mean Reward: 15428.97. Saving model.

Training Episode 47 started.
-----------------------------
| time/              |      |
|    fps             | 563  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 480        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.04904305 |
|    clip_fraction        | 0.416      |
|    clip_range           | 0.2        |
|    entropy_loss         | -4.43      |
|    explained_variance   | 0.838      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.13       |
|    n_updates   

 47%|████▋     | 47/100 [9:34:35<2:27:05, 166.51s/it]

Episode: 47, Mean Reward: 16699.19, Std Reward: 460.98, Total Timesteps: 10240, Time: Tue Jan 28 12:02:55 2025
New best model with Mean Reward: 16699.19. Saving model.

Training Episode 48 started.
-----------------------------
| time/              |      |
|    fps             | 521  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 441         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.055321038 |
|    clip_fraction        | 0.429       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.37       |
|    explained_variance   | 0.914       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.41        |
|   

 48%|████▊     | 48/100 [14:59:05<86:03:15, 5957.60s/it]

Episode: 48, Mean Reward: 18099.83, Std Reward: 2975.92, Total Timesteps: 10240, Time: Tue Jan 28 17:27:25 2025
New best model with Mean Reward: 18099.83. Saving model.

Training Episode 49 started.
-----------------------------
| time/              |      |
|    fps             | 510  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 442        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.04943386 |
|    clip_fraction        | 0.424      |
|    clip_range           | 0.2        |
|    entropy_loss         | -4.22      |
|    explained_variance   | 0.919      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.84       |
|    n_updates   

 49%|████▉     | 49/100 [15:01:53<59:47:26, 4220.53s/it]

Episode: 49, Mean Reward: 18876.33, Std Reward: 78.65, Total Timesteps: 10240, Time: Tue Jan 28 17:30:13 2025
New best model with Mean Reward: 18876.33. Saving model.

Training Episode 50 started.
-----------------------------
| time/              |      |
|    fps             | 510  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 441         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.060576845 |
|    clip_fraction        | 0.416       |
|    clip_range           | 0.2         |
|    entropy_loss         | -4.02       |
|    explained_variance   | 0.927       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.45        |
|    

 50%|█████     | 50/100 [15:04:23<41:39:28, 2999.37s/it]

Episode: 50, Mean Reward: 19146.04, Std Reward: 290.18, Total Timesteps: 10240, Time: Tue Jan 28 17:32:43 2025
New best model with Mean Reward: 19146.04. Saving model.

Training Episode 51 started.
-----------------------------
| time/              |      |
|    fps             | 572  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 497         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.050254844 |
|    clip_fraction        | 0.415       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.87       |
|    explained_variance   | 0.907       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.6         |
|   

 51%|█████     | 51/100 [15:14:33<31:04:15, 2282.77s/it]

Episode: 51, Mean Reward: 17679.76, Std Reward: 99.32, Total Timesteps: 10240, Time: Tue Jan 28 17:42:53 2025

Training Episode 52 started.
-----------------------------
| time/              |      |
|    fps             | 536  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 439        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.06731857 |
|    clip_fraction        | 0.455      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.7       |
|    explained_variance   | 0.941      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.45       |
|    n_updates            | 2560       |
|    policy_gradient_loss | -0.0448

 52%|█████▏    | 52/100 [15:17:04<21:54:27, 1643.07s/it]

Episode: 52, Mean Reward: 20488.32, Std Reward: 150.54, Total Timesteps: 10240, Time: Tue Jan 28 17:45:24 2025
New best model with Mean Reward: 20488.32. Saving model.

Training Episode 53 started.
-----------------------------
| time/              |      |
|    fps             | 582  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 507        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.05994716 |
|    clip_fraction        | 0.415      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.54      |
|    explained_variance   | 0.944      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.85       |
|    n_updates    

 53%|█████▎    | 53/100 [18:00:31<53:25:34, 4092.23s/it]

Episode: 53, Mean Reward: 19853.26, Std Reward: 4309.32, Total Timesteps: 10240, Time: Tue Jan 28 20:28:51 2025

Training Episode 54 started.
-----------------------------
| time/              |      |
|    fps             | 551  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 455        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.05433664 |
|    clip_fraction        | 0.451      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.47      |
|    explained_variance   | 0.94       |
|    learning_rate        | 0.0003     |
|    loss                 | 1.42       |
|    n_updates            | 2660       |
|    policy_gradient_loss | -0.04

 54%|█████▍    | 54/100 [18:03:07<37:12:04, 2911.40s/it]

Episode: 54, Mean Reward: 19744.92, Std Reward: 4363.13, Total Timesteps: 10240, Time: Tue Jan 28 20:31:27 2025

Training Episode 55 started.
-----------------------------
| time/              |      |
|    fps             | 527  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 468        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.07720062 |
|    clip_fraction        | 0.496      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.4       |
|    explained_variance   | 0.96       |
|    learning_rate        | 0.0003     |
|    loss                 | 1.14       |
|    n_updates            | 2710       |
|    policy_gradient_loss | -0.04

 55%|█████▌    | 55/100 [19:16:39<42:01:13, 3361.64s/it]

Episode: 55, Mean Reward: 18671.61, Std Reward: 5910.99, Total Timesteps: 10240, Time: Tue Jan 28 21:44:59 2025

Training Episode 56 started.
-----------------------------
| time/              |      |
|    fps             | 475  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 412        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.05796849 |
|    clip_fraction        | 0.486      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.25      |
|    explained_variance   | 0.943      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.25       |
|    n_updates            | 2760       |
|    policy_gradient_loss | -0.03

 56%|█████▌    | 56/100 [19:19:20<29:21:02, 2401.41s/it]

Episode: 56, Mean Reward: 22003.30, Std Reward: 121.14, Total Timesteps: 10240, Time: Tue Jan 28 21:47:40 2025
New best model with Mean Reward: 22003.30. Saving model.

Training Episode 57 started.
-----------------------------
| time/              |      |
|    fps             | 541  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 449         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.066027656 |
|    clip_fraction        | 0.488       |
|    clip_range           | 0.2         |
|    entropy_loss         | -3.15       |
|    explained_variance   | 0.954       |
|    learning_rate        | 0.0003      |
|    loss                 | 1.56        |
|   

 57%|█████▋    | 57/100 [19:22:14<20:42:11, 1733.30s/it]

Episode: 57, Mean Reward: 21706.37, Std Reward: 193.60, Total Timesteps: 10240, Time: Tue Jan 28 21:50:34 2025

Training Episode 58 started.
-----------------------------
| time/              |      |
|    fps             | 555  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 458        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.05388329 |
|    clip_fraction        | 0.416      |
|    clip_range           | 0.2        |
|    entropy_loss         | -3.04      |
|    explained_variance   | 0.927      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.49       |
|    n_updates            | 2860       |
|    policy_gradient_loss | -0.037

 58%|█████▊    | 58/100 [19:24:55<14:43:01, 1261.47s/it]

Episode: 58, Mean Reward: 22454.72, Std Reward: 108.68, Total Timesteps: 10240, Time: Tue Jan 28 21:53:15 2025
New best model with Mean Reward: 22454.72. Saving model.

Training Episode 59 started.
-----------------------------
| time/              |      |
|    fps             | 511  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 388         |
|    iterations           | 2           |
|    time_elapsed         | 10          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.041563578 |
|    clip_fraction        | 0.37        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.97       |
|    explained_variance   | 0.909       |
|    learning_rate        | 0.0003      |
|    loss                 | 1.21        |
|   

 59%|█████▉    | 59/100 [19:28:10<10:43:28, 941.67s/it]

Episode: 59, Mean Reward: 22494.68, Std Reward: 848.79, Total Timesteps: 10240, Time: Tue Jan 28 21:56:30 2025
New best model with Mean Reward: 22494.68. Saving model.

Training Episode 60 started.
-----------------------------
| time/              |      |
|    fps             | 375  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 345         |
|    iterations           | 2           |
|    time_elapsed         | 11          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.060816072 |
|    clip_fraction        | 0.445       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.91       |
|    explained_variance   | 0.945       |
|    learning_rate        | 0.0003      |
|    loss                 | 1.03        |

 60%|██████    | 60/100 [19:30:58<7:52:57, 709.45s/it] 

Episode: 60, Mean Reward: 23579.19, Std Reward: 66.72, Total Timesteps: 10240, Time: Tue Jan 28 21:59:18 2025
New best model with Mean Reward: 23579.19. Saving model.

Training Episode 61 started.
-----------------------------
| time/              |      |
|    fps             | 521  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 474        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.06695082 |
|    clip_fraction        | 0.495      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.85      |
|    explained_variance   | 0.939      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.74       |

 61%|██████    | 61/100 [19:33:29<5:52:16, 541.95s/it]

Episode: 61, Mean Reward: 23237.84, Std Reward: 150.32, Total Timesteps: 10240, Time: Tue Jan 28 22:01:49 2025

Training Episode 62 started.
-----------------------------
| time/              |      |
|    fps             | 534  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 475         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.091904834 |
|    clip_fraction        | 0.565       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.7        |
|    explained_variance   | 0.928       |
|    learning_rate        | 0.0003      |
|    loss                 | 1.58        |
|    n_updates            | 3060        |

 62%|██████▏   | 62/100 [19:36:17<4:32:07, 429.67s/it]

Episode: 62, Mean Reward: 23624.80, Std Reward: 975.05, Total Timesteps: 10240, Time: Tue Jan 28 22:04:37 2025
New best model with Mean Reward: 23624.80. Saving model.

Training Episode 63 started.
-----------------------------
| time/              |      |
|    fps             | 569  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 495        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.10996432 |
|    clip_fraction        | 0.503      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.59      |
|    explained_variance   | 0.956      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.1        |

 63%|██████▎   | 63/100 [19:38:58<3:35:13, 349.01s/it]

Episode: 63, Mean Reward: 22980.03, Std Reward: 2406.33, Total Timesteps: 10240, Time: Tue Jan 28 22:07:17 2025

Training Episode 64 started.
-----------------------------
| time/              |      |
|    fps             | 529  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
---------------------------------------
| time/                   |           |
|    fps                  | 454       |
|    iterations           | 2         |
|    time_elapsed         | 9         |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.0579198 |
|    clip_fraction        | 0.423     |
|    clip_range           | 0.2       |
|    entropy_loss         | -2.48     |
|    explained_variance   | 0.947     |
|    learning_rate        | 0.0003    |
|    loss                 | 1.95      |
|    n_updates            | 3160      |
|    policy_gradient_loss | -0.0497   |

 64%|██████▍   | 64/100 [19:41:29<2:53:48, 289.69s/it]

Episode: 64, Mean Reward: 24232.51, Std Reward: 193.09, Total Timesteps: 10240, Time: Tue Jan 28 22:09:49 2025
New best model with Mean Reward: 24232.51. Saving model.

Training Episode 65 started.
-----------------------------
| time/              |      |
|    fps             | 552  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 489         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.061709568 |
|    clip_fraction        | 0.44        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.33       |
|    explained_variance   | 0.955       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.21        |

 65%|██████▌   | 65/100 [19:43:59<2:24:37, 247.92s/it]

Episode: 65, Mean Reward: 24199.03, Std Reward: 66.90, Total Timesteps: 10240, Time: Tue Jan 28 22:12:19 2025

Training Episode 66 started.
-----------------------------
| time/              |      |
|    fps             | 538  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 468        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.11225942 |
|    clip_fraction        | 0.536      |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.16      |
|    explained_variance   | 0.978      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.13       |
|    n_updates            | 3260       |
|    policy_gradient_loss | -0.0305

 66%|██████▌   | 66/100 [19:46:32<2:04:20, 219.43s/it]

Episode: 66, Mean Reward: 24710.55, Std Reward: 157.23, Total Timesteps: 10240, Time: Tue Jan 28 22:14:52 2025
New best model with Mean Reward: 24710.55. Saving model.

Training Episode 67 started.
-----------------------------
| time/              |      |
|    fps             | 547  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 467         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.045769513 |
|    clip_fraction        | 0.333       |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.07       |
|    explained_variance   | 0.913       |
|    learning_rate        | 0.0003      |
|    loss                 | 2.61        |

 67%|██████▋   | 67/100 [19:49:03<1:49:22, 198.86s/it]

Episode: 67, Mean Reward: 24927.78, Std Reward: 192.59, Total Timesteps: 10240, Time: Tue Jan 28 22:17:23 2025
New best model with Mean Reward: 24927.78. Saving model.

Training Episode 68 started.
-----------------------------
| time/              |      |
|    fps             | 573  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 497         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.076976225 |
|    clip_fraction        | 0.51        |
|    clip_range           | 0.2         |
|    entropy_loss         | -2.02       |
|    explained_variance   | 0.966       |
|    learning_rate        | 0.0003      |
|    loss                 | 1.43        |

 68%|██████▊   | 68/100 [19:51:38<1:38:59, 185.61s/it]

Episode: 68, Mean Reward: 24016.38, Std Reward: 285.10, Total Timesteps: 10240, Time: Tue Jan 28 22:19:58 2025

Training Episode 69 started.
-----------------------------
| time/              |      |
|    fps             | 436  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 385         |
|    iterations           | 2           |
|    time_elapsed         | 10          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.038991608 |
|    clip_fraction        | 0.373       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.91       |
|    explained_variance   | 0.912       |
|    learning_rate        | 0.0003      |
|    loss                 | 6.51        |
|    n_updates            | 3410        |

 69%|██████▉   | 69/100 [19:54:25<1:33:00, 180.02s/it]

Episode: 69, Mean Reward: 24818.25, Std Reward: 507.05, Total Timesteps: 10240, Time: Tue Jan 28 22:22:45 2025

Training Episode 70 started.
-----------------------------
| time/              |      |
|    fps             | 573  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 499        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.12486246 |
|    clip_fraction        | 0.521      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.73      |
|    explained_variance   | 0.974      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.45       |
|    n_updates            | 3460       |
|    policy_gradient_loss | -0.031

 70%|███████   | 70/100 [19:57:01<1:26:22, 172.76s/it]

Episode: 70, Mean Reward: 23990.39, Std Reward: 304.88, Total Timesteps: 10240, Time: Tue Jan 28 22:25:20 2025

Training Episode 71 started.
-----------------------------
| time/              |      |
|    fps             | 526  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 448         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.123719946 |
|    clip_fraction        | 0.593       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.72       |
|    explained_variance   | 0.968       |
|    learning_rate        | 0.0003      |
|    loss                 | 0.944       |
|    n_updates            | 3510        |

 71%|███████   | 71/100 [19:59:43<1:21:59, 169.65s/it]

Episode: 71, Mean Reward: 24649.96, Std Reward: 464.42, Total Timesteps: 10240, Time: Tue Jan 28 22:28:03 2025

Training Episode 72 started.
-----------------------------
| time/              |      |
|    fps             | 494  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 421        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.17729913 |
|    clip_fraction        | 0.563      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.68      |
|    explained_variance   | 0.979      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.13       |
|    n_updates            | 3560       |
|    policy_gradient_loss | -0.018

 72%|███████▏  | 72/100 [20:02:26<1:18:10, 167.53s/it]

Episode: 72, Mean Reward: 24767.72, Std Reward: 316.49, Total Timesteps: 10240, Time: Tue Jan 28 22:30:45 2025

Training Episode 73 started.
-----------------------------
| time/              |      |
|    fps             | 541  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 450        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.08734022 |
|    clip_fraction        | 0.495      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.56      |
|    explained_variance   | 0.973      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.95       |
|    n_updates            | 3610       |
|    policy_gradient_loss | -0.037

 73%|███████▎  | 73/100 [20:05:09<1:14:48, 166.23s/it]

Episode: 73, Mean Reward: 24546.36, Std Reward: 2039.34, Total Timesteps: 10240, Time: Tue Jan 28 22:33:29 2025

Training Episode 74 started.
-----------------------------
| time/              |      |
|    fps             | 536  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 431        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.10786237 |
|    clip_fraction        | 0.56       |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.53      |
|    explained_variance   | 0.958      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.32       |
|    n_updates            | 3660       |
|    policy_gradient_loss | -0.02

 74%|███████▍  | 74/100 [20:08:15<1:14:36, 172.18s/it]

Episode: 74, Mean Reward: 25735.51, Std Reward: 382.82, Total Timesteps: 10240, Time: Tue Jan 28 22:36:35 2025
New best model with Mean Reward: 25735.51. Saving model.

Training Episode 75 started.
-----------------------------
| time/              |      |
|    fps             | 461  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 429         |
|    iterations           | 2           |
|    time_elapsed         | 9           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.039600536 |
|    clip_fraction        | 0.366       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.5        |
|    explained_variance   | 0.916       |
|    learning_rate        | 0.0003      |
|    loss                 | 13.2        |

 75%|███████▌  | 75/100 [20:11:00<1:10:48, 169.95s/it]

Episode: 75, Mean Reward: 24365.21, Std Reward: 5496.30, Total Timesteps: 10240, Time: Tue Jan 28 22:39:19 2025

Training Episode 76 started.
-----------------------------
| time/              |      |
|    fps             | 360  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 297        |
|    iterations           | 2          |
|    time_elapsed         | 13         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.07214923 |
|    clip_fraction        | 0.419      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.45      |
|    explained_variance   | 0.944      |
|    learning_rate        | 0.0003     |
|    loss                 | 5.99       |
|    n_updates            | 3760       |
|    policy_gradient_loss | -0.05

 76%|███████▌  | 76/100 [20:13:56<1:08:45, 171.90s/it]

Episode: 76, Mean Reward: 23889.03, Std Reward: 5127.13, Total Timesteps: 10240, Time: Tue Jan 28 22:42:16 2025

Training Episode 77 started.
-----------------------------
| time/              |      |
|    fps             | 466  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 385        |
|    iterations           | 2          |
|    time_elapsed         | 10         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.07248554 |
|    clip_fraction        | 0.463      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.29      |
|    explained_variance   | 0.965      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.25       |
|    n_updates            | 3810       |
|    policy_gradient_loss | -0.04

 77%|███████▋  | 77/100 [20:16:54<1:06:37, 173.82s/it]

Episode: 77, Mean Reward: 24705.62, Std Reward: 3528.46, Total Timesteps: 10240, Time: Tue Jan 28 22:45:14 2025

Training Episode 78 started.
-----------------------------
| time/              |      |
|    fps             | 563  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
---------------------------------------
| time/                   |           |
|    fps                  | 438       |
|    iterations           | 2         |
|    time_elapsed         | 9         |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.1323882 |
|    clip_fraction        | 0.533     |
|    clip_range           | 0.2       |
|    entropy_loss         | -1.13     |
|    explained_variance   | 0.958     |
|    learning_rate        | 0.0003    |
|    loss                 | 2.87      |
|    n_updates            | 3860      |
|    policy_gradient_loss | -0.0344   |

 78%|███████▊  | 78/100 [20:20:08<1:05:52, 179.66s/it]

Episode: 78, Mean Reward: 26431.71, Std Reward: 153.04, Total Timesteps: 10240, Time: Tue Jan 28 22:48:27 2025
New best model with Mean Reward: 26431.71. Saving model.

Training Episode 79 started.
-----------------------------
| time/              |      |
|    fps             | 435  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 362        |
|    iterations           | 2          |
|    time_elapsed         | 11         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.11901548 |
|    clip_fraction        | 0.571      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.08      |
|    explained_variance   | 0.973      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.05       |

 79%|███████▉  | 79/100 [20:23:21<1:04:19, 183.78s/it]

Episode: 79, Mean Reward: 25829.05, Std Reward: 1712.33, Total Timesteps: 10240, Time: Tue Jan 28 22:51:41 2025

Training Episode 80 started.
-----------------------------
| time/              |      |
|    fps             | 387  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 317        |
|    iterations           | 2          |
|    time_elapsed         | 12         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.09670059 |
|    clip_fraction        | 0.53       |
|    clip_range           | 0.2        |
|    entropy_loss         | -1         |
|    explained_variance   | 0.932      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.46       |
|    n_updates            | 3960       |
|    policy_gradient_loss | -0.03

 80%|████████  | 80/100 [20:26:26<1:01:25, 184.26s/it]

Episode: 80, Mean Reward: 24394.86, Std Reward: 4542.25, Total Timesteps: 10240, Time: Tue Jan 28 22:54:46 2025

Training Episode 81 started.
-----------------------------
| time/              |      |
|    fps             | 474  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 400        |
|    iterations           | 2          |
|    time_elapsed         | 10         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.08609937 |
|    clip_fraction        | 0.501      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.928     |
|    explained_variance   | 0.945      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.93       |
|    n_updates            | 4010       |
|    policy_gradient_loss | -0.02

 81%|████████  | 81/100 [20:29:41<59:19, 187.33s/it]  

Episode: 81, Mean Reward: 24041.74, Std Reward: 5230.50, Total Timesteps: 10240, Time: Tue Jan 28 22:58:01 2025

Training Episode 82 started.
-----------------------------
| time/              |      |
|    fps             | 385  |
|    iterations      | 1    |
|    time_elapsed    | 5    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 338        |
|    iterations           | 2          |
|    time_elapsed         | 12         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.14494517 |
|    clip_fraction        | 0.417      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.838     |
|    explained_variance   | 0.946      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.91       |
|    n_updates            | 4060       |
|    policy_gradient_loss | -0.04

 82%|████████▏ | 82/100 [20:32:51<56:28, 188.25s/it]

Episode: 82, Mean Reward: 24285.99, Std Reward: 4675.22, Total Timesteps: 10240, Time: Tue Jan 28 23:01:11 2025

Training Episode 83 started.
-----------------------------
| time/              |      |
|    fps             | 485  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 415        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.13529137 |
|    clip_fraction        | 0.544      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.69      |
|    explained_variance   | 0.972      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.07       |
|    n_updates            | 4110       |
|    policy_gradient_loss | -0.02

 83%|████████▎ | 83/100 [20:35:54<52:51, 186.57s/it]

Episode: 83, Mean Reward: 25602.88, Std Reward: 2359.21, Total Timesteps: 10240, Time: Tue Jan 28 23:04:14 2025

Training Episode 84 started.
-----------------------------
| time/              |      |
|    fps             | 417  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 412        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.15036972 |
|    clip_fraction        | 0.628      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.619     |
|    explained_variance   | 0.963      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.05       |
|    n_updates            | 4160       |
|    policy_gradient_loss | 0.004

 84%|████████▍ | 84/100 [20:38:50<48:56, 183.53s/it]

Episode: 84, Mean Reward: 26398.33, Std Reward: 295.14, Total Timesteps: 10240, Time: Tue Jan 28 23:07:10 2025

Training Episode 85 started.
-----------------------------
| time/              |      |
|    fps             | 530  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 453        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.08907469 |
|    clip_fraction        | 0.538      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.58      |
|    explained_variance   | 0.96       |
|    learning_rate        | 0.0003     |
|    loss                 | 4.59       |
|    n_updates            | 4210       |
|    policy_gradient_loss | -0.033

 85%|████████▌ | 85/100 [20:42:00<46:20, 185.35s/it]

Episode: 85, Mean Reward: 26361.39, Std Reward: 245.60, Total Timesteps: 10240, Time: Tue Jan 28 23:10:20 2025

Training Episode 86 started.
-----------------------------
| time/              |      |
|    fps             | 433  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
---------------------------------------
| time/                   |           |
|    fps                  | 402       |
|    iterations           | 2         |
|    time_elapsed         | 10        |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.0705942 |
|    clip_fraction        | 0.507     |
|    clip_range           | 0.2       |
|    entropy_loss         | -0.503    |
|    explained_variance   | 0.964     |
|    learning_rate        | 0.0003    |
|    loss                 | 2.06      |
|    n_updates            | 4260      |
|    policy_gradient_loss | -0.0375   |

 86%|████████▌ | 86/100 [20:45:06<43:18, 185.64s/it]

Episode: 86, Mean Reward: 24961.79, Std Reward: 5731.70, Total Timesteps: 10240, Time: Tue Jan 28 23:13:26 2025

Training Episode 87 started.
-----------------------------
| time/              |      |
|    fps             | 467  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 395        |
|    iterations           | 2          |
|    time_elapsed         | 10         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.14580593 |
|    clip_fraction        | 0.526      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.494     |
|    explained_variance   | 0.976      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.23       |
|    n_updates            | 4310       |
|    policy_gradient_loss | -0.03

 87%|████████▋ | 87/100 [20:48:09<40:03, 184.86s/it]

Episode: 87, Mean Reward: 26403.23, Std Reward: 497.01, Total Timesteps: 10240, Time: Tue Jan 28 23:16:29 2025

Training Episode 88 started.
-----------------------------
| time/              |      |
|    fps             | 453  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 378         |
|    iterations           | 2           |
|    time_elapsed         | 10          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.082687706 |
|    clip_fraction        | 0.431       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.425      |
|    explained_variance   | 0.942       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.87        |
|    n_updates            | 4360        |

 88%|████████▊ | 88/100 [20:51:17<37:09, 185.80s/it]

Episode: 88, Mean Reward: 26555.42, Std Reward: 271.62, Total Timesteps: 10240, Time: Tue Jan 28 23:19:37 2025
New best model with Mean Reward: 26555.42. Saving model.

Training Episode 89 started.
-----------------------------
| time/              |      |
|    fps             | 476  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 406         |
|    iterations           | 2           |
|    time_elapsed         | 10          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.086645745 |
|    clip_fraction        | 0.502       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.364      |
|    explained_variance   | 0.975       |
|    learning_rate        | 0.0003      |
|    loss                 | 3.22        |

 89%|████████▉ | 89/100 [20:54:29<34:24, 187.65s/it]

Episode: 89, Mean Reward: 24941.87, Std Reward: 5327.63, Total Timesteps: 10240, Time: Tue Jan 28 23:22:49 2025

Training Episode 90 started.
-----------------------------
| time/              |      |
|    fps             | 427  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 383        |
|    iterations           | 2          |
|    time_elapsed         | 10         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.16223201 |
|    clip_fraction        | 0.58       |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.302     |
|    explained_variance   | 0.946      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.51       |
|    n_updates            | 4460       |
|    policy_gradient_loss | -0.02

 90%|█████████ | 90/100 [20:57:31<30:57, 185.71s/it]

Episode: 90, Mean Reward: 25623.78, Std Reward: 3541.98, Total Timesteps: 10240, Time: Tue Jan 28 23:25:50 2025

Training Episode 91 started.
-----------------------------
| time/              |      |
|    fps             | 548  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 466        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.09075736 |
|    clip_fraction        | 0.527      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.164     |
|    explained_variance   | 0.951      |
|    learning_rate        | 0.0003     |
|    loss                 | 5.16       |
|    n_updates            | 4510       |
|    policy_gradient_loss | -0.03

 91%|█████████ | 91/100 [20:59:56<26:03, 173.78s/it]

Episode: 91, Mean Reward: 24125.85, Std Reward: 6929.90, Total Timesteps: 10240, Time: Tue Jan 28 23:28:16 2025

Training Episode 92 started.
-----------------------------
| time/              |      |
|    fps             | 557  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 479         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.075420566 |
|    clip_fraction        | 0.433       |
|    clip_range           | 0.2         |
|    entropy_loss         | -0.107      |
|    explained_variance   | 0.948       |
|    learning_rate        | 0.0003      |
|    loss                 | 5.42        |
|    n_updates            | 4560        |
|    policy_gradie

 92%|█████████▏| 92/100 [21:07:35<34:33, 259.20s/it]

Episode: 92, Mean Reward: 25210.64, Std Reward: 3549.29, Total Timesteps: 10240, Time: Tue Jan 28 23:35:55 2025

Training Episode 93 started.
-----------------------------
| time/              |      |
|    fps             | 529  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 448        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.23842713 |
|    clip_fraction        | 0.52       |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.0568    |
|    explained_variance   | 0.937      |
|    learning_rate        | 0.0003     |
|    loss                 | 5.36       |
|    n_updates            | 4610       |
|    policy_gradient_loss | -0.04

 93%|█████████▎| 93/100 [21:10:22<27:00, 231.49s/it]

Episode: 93, Mean Reward: 24062.19, Std Reward: 5890.88, Total Timesteps: 10240, Time: Tue Jan 28 23:38:42 2025

Training Episode 94 started.
-----------------------------
| time/              |      |
|    fps             | 565  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 493        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.17076567 |
|    clip_fraction        | 0.598      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.0142    |
|    explained_variance   | 0.979      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.75       |
|    n_updates            | 4660       |
|    policy_gradient_loss | -0.01

 94%|█████████▍| 94/100 [21:12:53<20:44, 207.47s/it]

Episode: 94, Mean Reward: 24296.99, Std Reward: 4926.79, Total Timesteps: 10240, Time: Tue Jan 28 23:41:13 2025

Training Episode 95 started.
-----------------------------
| time/              |      |
|    fps             | 577  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
-----------------------------------------
| time/                   |             |
|    fps                  | 502         |
|    iterations           | 2           |
|    time_elapsed         | 8           |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.055132233 |
|    clip_fraction        | 0.443       |
|    clip_range           | 0.2         |
|    entropy_loss         | 0.0318      |
|    explained_variance   | 0.906       |
|    learning_rate        | 0.0003      |
|    loss                 | 13.3        |
|    n_updates            | 4710        |
|    policy_gradie

 95%|█████████▌| 95/100 [21:15:36<16:09, 193.98s/it]

Episode: 95, Mean Reward: 27252.33, Std Reward: 156.14, Total Timesteps: 10240, Time: Tue Jan 28 23:43:56 2025
New best model with Mean Reward: 27252.33. Saving model.

Training Episode 96 started.
-----------------------------
| time/              |      |
|    fps             | 479  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 395        |
|    iterations           | 2          |
|    time_elapsed         | 10         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.25605333 |
|    clip_fraction        | 0.579      |
|    clip_range           | 0.2        |
|    entropy_loss         | 0.12       |
|    explained_variance   | 0.965      |
|    learning_rate        | 0.0003     |
|    loss                 | 3.31       |
|    n_updates    

 96%|█████████▌| 96/100 [21:18:33<12:35, 188.83s/it]

Episode: 96, Mean Reward: 18758.76, Std Reward: 8943.69, Total Timesteps: 10240, Time: Tue Jan 28 23:46:52 2025

Training Episode 97 started.
-----------------------------
| time/              |      |
|    fps             | 556  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 476        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.09111738 |
|    clip_fraction        | 0.426      |
|    clip_range           | 0.2        |
|    entropy_loss         | 0.0957     |
|    explained_variance   | 0.957      |
|    learning_rate        | 0.0003     |
|    loss                 | 2.74       |
|    n_updates            | 4810       |
|    policy_gradient_loss | -0.03

 97%|█████████▋| 97/100 [21:21:10<08:58, 179.36s/it]

Episode: 97, Mean Reward: 20556.61, Std Reward: 9425.71, Total Timesteps: 10240, Time: Tue Jan 28 23:49:30 2025

Training Episode 98 started.
-----------------------------
| time/              |      |
|    fps             | 564  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
---------------------------------------
| time/                   |           |
|    fps                  | 491       |
|    iterations           | 2         |
|    time_elapsed         | 8         |
|    total_timesteps      | 4096      |
| train/                  |           |
|    approx_kl            | 0.1406777 |
|    clip_fraction        | 0.511     |
|    clip_range           | 0.2       |
|    entropy_loss         | 0.101     |
|    explained_variance   | 0.987     |
|    learning_rate        | 0.0003    |
|    loss                 | 3.41      |
|    n_updates            | 4860      |
|    policy_gradient_loss | -0.0339   |
|    std

 98%|█████████▊| 98/100 [21:23:48<05:46, 173.05s/it]

Episode: 98, Mean Reward: 22455.14, Std Reward: 7610.76, Total Timesteps: 10240, Time: Tue Jan 28 23:52:08 2025

Training Episode 99 started.
-----------------------------
| time/              |      |
|    fps             | 557  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 463        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.08963385 |
|    clip_fraction        | 0.391      |
|    clip_range           | 0.2        |
|    entropy_loss         | 0.188      |
|    explained_variance   | 0.977      |
|    learning_rate        | 0.0003     |
|    loss                 | 4.61       |
|    n_updates            | 4910       |
|    policy_gradient_loss | -0.04

 99%|█████████▉| 99/100 [21:26:29<02:49, 169.33s/it]

Episode: 99, Mean Reward: 22837.91, Std Reward: 7830.82, Total Timesteps: 10240, Time: Tue Jan 28 23:54:49 2025

Training Episode 100 started.
-----------------------------
| time/              |      |
|    fps             | 534  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 480        |
|    iterations           | 2          |
|    time_elapsed         | 8          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.13267602 |
|    clip_fraction        | 0.566      |
|    clip_range           | 0.2        |
|    entropy_loss         | 0.322      |
|    explained_variance   | 0.996      |
|    learning_rate        | 0.0003     |
|    loss                 | 1.37       |
|    n_updates            | 4960       |
|    policy_gradient_loss | -0.0

100%|██████████| 100/100 [21:29:04<00:00, 773.44s/it]

Episode: 100, Mean Reward: 20909.02, Std Reward: 7116.64, Total Timesteps: 10240, Time: Tue Jan 28 23:57:24 2025
Model saved at Episode 100.
Training completed.
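
A note on the `Total Timesteps: 10240` figure that every summary line reports, even though each `model.learn` call asks for 10000 steps. The sketch below assumes PPO's default rollout length of 2048 steps with a single environment (consistent with the 2048 and 4096 values in the tables above) and that `learn` resets the step counter on every call, which is the default (`reset_num_timesteps=True`). The helper name `steps_collected` is hypothetical and only illustrates the arithmetic.

```python
import math

def steps_collected(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> int:
    """Timesteps gathered when collection only stops at whole-rollout boundaries."""
    rollout = n_steps * n_envs  # assumed PPO defaults: 2048 steps, 1 environment
    return math.ceil(total_timesteps / rollout) * rollout

per_call = steps_collected(10_000)
print(per_call)        # 10240: five full rollouts of 2048 steps
print(per_call * 100)  # 1024000: steps over all 100 learn() calls
```

If a cumulative count in the log is preferred, passing `reset_num_timesteps=False` to `model.learn` should keep `model.num_timesteps` growing across calls instead of restarting at zero.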

error: Not connected to physics server.
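
The trailing error most likely comes from the last line of the cell: `env.close()` appears to shut down the physics client already, so the explicit `pybullet.disconnect()` that follows finds nothing to disconnect. Training and saving had finished by then, so the saved models are unaffected. A minimal sketch of a tolerant disconnect is shown below; `disconnect_once` and `fake_disconnect` are hypothetical names, and the stand-in function only imitates an already-closed client so the example runs without PyBullet.

```python
def disconnect_once(disconnect, already_closed_errors=(Exception,)):
    """Call `disconnect`; return False instead of raising if the client is already gone."""
    try:
        disconnect()
    except already_closed_errors:
        return False
    return True

# Stand-in for a physics client that env.close() has already shut down.
def fake_disconnect():
    raise RuntimeError("Not connected to physics server.")

print(disconnect_once(fake_disconnect))  # False
print(disconnect_once(lambda: None))     # True
```

In the notebook this could be called as `disconnect_once(pybullet.disconnect, (pybullet.error,))`, assuming `pybullet.error` is the exception raised here. The simpler option is to drop the explicit `pybullet.disconnect()` and rely on `env.close()` alone.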