# _Reinforcement learning_ basics

## CartPole

CartPole is a classic control problem in which a pole must be kept upright on a cart that can move back and forth.

[![CartPole](https://gymnasium.farama.org/_images/cart_pole.gif)](https://gymnasium.farama.org/)

### The Problem
- **State**: 4 continuous values (cart position, cart velocity, pole angle, pole angular velocity)
- **Actions**: 2 discrete actions (push the cart left or right)
- **Reward**: +1 for every time step in which the pole stays upright
- **Goal**: Keep the pole upright for as long as possible (at most 500 time steps)

### Setup
We use:
- [Gymnasium](https://gymnasium.farama.org/index.html): A framework for RL environments (originally developed at OpenAI as Gym)
- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/#): High-quality PyTorch implementations of RL algorithms

In [1]:
import gymnasium as gym
import numpy as np
import pandas as pd
import plotly.express as px
import torch
# from matplotlib import animation


# print(f"CUDA available: {torch.cuda.is_available()}")

### Step 1: Exploring the Environment

In [2]:
# Create the CartPole environment
env = gym.make("CartPole-v1", render_mode="rgb_array")

# Reset environment to get initial state
state, info = env.reset(seed=42)

print("=== CartPole Environment ===")
print(f"State space: {env.observation_space}")
print(f"Action space: {env.action_space}")
print(f"\nInitial state: {state}")
print("\nState components:")
print(f"  [0] Cart Position: {state[0]:.3f}")
print(f"  [1] Cart Velocity: {state[1]:.3f}")
print(f"  [2] Pole Angle: {state[2]:.3f}")
print(f"  [3] Pole Angular Velocity: {state[3]:.3f}")
print("\nPossible actions:")
print("  0: Push cart to the LEFT")
print("  1: Push cart to the RIGHT")

=== CartPole Environment ===
State space: Box([-4.8               -inf -0.41887903        -inf], [4.8               inf 0.41887903        inf], (4,), float32)
Action space: Discrete(2)

Initial state: [ 0.0273956  -0.00611216  0.03585979  0.0197368 ]

State components:
  [0] Cart Position: 0.027
  [1] Cart Velocity: -0.006
  [2] Pole Angle: 0.036
  [3] Pole Angular Velocity: 0.020

Possible actions:
  0: Push cart to the LEFT
  1: Push cart to the RIGHT
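
The observation bounds printed above (±4.8 for the cart position, ±0.419 rad for the pole angle) are wider than the range in which an episode actually continues. According to the Gymnasium documentation for CartPole-v1, an episode terminates once the cart leaves ±2.4 or the pole tilts more than ±12°. The sketch below restates that rule in plain Python; `is_terminated` is a hypothetical helper written for illustration, not part of Gymnasium.

```python
import math

# Termination thresholds as documented for Gymnasium's CartPole-v1.
X_THRESHOLD = 2.4                         # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360  # pole angle limit: 12 degrees in radians


def is_terminated(state):
    """Return True if the state lies outside the documented failure thresholds."""
    cart_position, _, pole_angle, _ = state
    return abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD


initial_state = [0.0273956, -0.00611216, 0.03585979, 0.0197368]  # printed above
print(is_terminated(initial_state))          # False: well inside both limits
print(is_terminated([0.0, 0.0, 0.25, 0.0]))  # True: 0.25 rad exceeds 12 degrees
print(round(2 * THETA_THRESHOLD, 4))         # 0.4189, the angle bound of the Box
```

The last line shows that the angle bound in the observation space is twice the termination threshold, so a terminal observation still fits inside the `Box`.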


### Step 2: Baseline agent

Before training an intelligent model, we first look at how a **random agent** (one that takes random actions) performs. This gives us a baseline.

In [3]:
# Test random agent
def evaluate_random_agent(env, n_episodes=10, seed=42):
    """
    Evaluate a random agent that takes random actions.

    Args:
        env: Gymnasium environment
        n_episodes: Number of episodes to run
        seed: Random seed for reproducibility

    Returns:
        List of episode rewards
    """
    episode_rewards = []

    for episode in range(n_episodes):
        state, info = env.reset(seed=seed + episode)
        episode_reward = 0
        terminated = False
        truncated = False

        while not (terminated or truncated):
            # Sample a uniformly random action
            action = env.action_space.sample()
            state, reward, terminated, truncated, info = env.step(action)
            episode_reward += reward

        episode_rewards.append(episode_reward)

    return episode_rewards


# Evaluate random agent
random_rewards = evaluate_random_agent(env, n_episodes=100)

print("=== Random Agent Performance ===")
print(f"Average reward: {np.mean(random_rewards):.2f} ± {np.std(random_rewards):.2f}")
print(f"Min reward: {np.min(random_rewards):.2f}")
print(f"Max reward: {np.max(random_rewards):.2f}")

px.histogram(random_rewards, nbins=20, title="Random Agent: Reward Distribution").add_vline(
    x=np.mean(random_rewards),
    line_dash="dash",
    line_color="red",
    annotation_text=f"Mean: {np.mean(random_rewards):.1f}",
).show()

=== Random Agent Performance ===
Average reward: 21.69 ± 10.88
Min reward: 9.00
Max reward: 65.00


In [4]:
px.line(
    y=random_rewards,
    title="Random Agent: Reward per Episode",
    labels={"x": "Episode", "y": "Reward"},
).add_hline(
    y=np.mean(random_rewards),
    line_dash="dash",
    line_color="red",
    annotation_text=f"Mean: {np.mean(random_rewards):.1f}",
).show()


### Step 3: Training with a Deep Q-Network (DQN)

Next we train a **Deep Q-Network (DQN)** to learn an intelligent policy. DQN is a value-based method that uses a neural network to approximate the optimal $Q$-function $Q^*(s,a)$.
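
To make "approximating $Q^*(s,a)$" concrete, here is a minimal sketch of the one-step target that Q-learning regresses towards: $r + \gamma \max_{a'} Q(s', a')$, with no bootstrapping when the next state is terminal. This is an illustration in plain Python, not Stable-Baselines3's internal code; the function names `td_target` and `td_error` are hypothetical.

```python
def td_target(reward, next_q_values, terminated, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    If the episode terminated, there is no next state to bootstrap from,
    so the target is just the immediate reward.
    """
    if terminated:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_sa, target):
    """Difference between the target and the current estimate Q(s, a)."""
    return target - q_sa


# Hypothetical numbers: reward +1, next-state Q-values for actions 0 and 1.
target = td_target(reward=1.0, next_q_values=[2.0, 3.0], terminated=False)
print(target)                 # 1 + 0.99 * 3
print(td_error(3.5, target))  # positive: the current estimate is too low
print(td_target(1.0, [2.0, 3.0], terminated=True))
```

In DQN the network is trained to shrink this error on mini-batches of stored transitions, with `next_q_values` typically coming from a periodically updated copy of the network.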

In [5]:
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

In [6]:
# Create a fresh environment for training
env = gym.make("CartPole-v1")

# Create DQN model with better hyperparameters
# The neural network will learn Q(s,a) for each state-action pair
model = DQN(
    "MlpPolicy",  # Multi-Layer Perceptron policy network
    env,
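    learning_rate=1e-3,
    buffer_size=50000,  # Replay buffer capacity (number of transitions)
    learning_starts=1000,  # Collect 1000 steps of experience before the first update
    batch_size=64,  # Transitions sampled from the buffer per gradient update
    tau=1.0,  # 1.0 = hard update: copy the weights to the target network
    gamma=0.99,  # Discount factor
    train_freq=4,  # Train once every 4 environment steps
    target_update_interval=250,  # Update the target network every 250 steps
    exploration_fraction=0.1,  # Anneal epsilon over the first 10% of training
    exploration_initial_eps=1.0,
    exploration_final_eps=0.02,  # Epsilon after annealing
    verbose=1,  # Show training progress
    tensorboard_log=None,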
)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
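
The three `exploration_*` arguments together define an epsilon-greedy schedule. The sketch below assumes a linear decay from `exploration_initial_eps` to `exploration_final_eps` over the first `exploration_fraction` of training, which is how I understand Stable-Baselines3 applies them. `epsilon_at` is a hypothetical helper, and the default of 100000 total time steps is taken from the training call in the next cell.

```python
def epsilon_at(step, total_timesteps=100_000, fraction=0.1,
               eps_start=1.0, eps_end=0.02):
    """Linearly anneal epsilon over the first `fraction` of training."""
    progress = step / total_timesteps
    if progress > fraction:
        return eps_end
    return eps_start + progress * (eps_end - eps_start) / fraction


for step in (0, 124, 4584, 9765, 10857):
    print(step, round(epsilon_at(step), 3))
```

For steps 124, 4584, 9765 and 10857 this gives 0.988, 0.551, 0.043 and 0.02, which matches the `exploration_rate` values in the training log. From step 10000 onwards the agent acts randomly only 2% of the time.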


In [7]:
# Train the agent for longer
model.learn(total_timesteps=100000, progress_bar=True)

# Evaluate the trained model (wrap env with Monitor to avoid warning)
eval_env = Monitor(gym.make("CartPole-v1"))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=100, deterministic=True)
eval_env.close()

print("\n=== Trained DQN Agent Performance ===")
print(f"Mean reward: {mean_reward:.2f} ± {std_reward:.2f}")
print("\nComparison:")
print(f"  Random Agent: {np.mean(random_rewards):.2f} ± {np.std(random_rewards):.2f}")
print(f"  Trained DQN:  {mean_reward:.2f} ± {std_reward:.2f}")
if mean_reward > np.mean(random_rewards):
    print(f"  Improvement:  {((mean_reward / np.mean(random_rewards)) - 1) * 100:.1f}%")
else:
    print(f"  Performance:  {(mean_reward / np.mean(random_rewards)) * 100:.1f}% of random agent")
    print("  ⚠️  Model needs more training or hyperparameter tuning!")

Output()

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 31       |
|    ep_rew_mean      | 31       |
|    exploration_rate | 0.988    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 1531     |
|    time_elapsed     | 0        |
|    total_timesteps  | 124      |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 26.1     |
|    ep_rew_mean      | 26.1     |
|    exploration_rate | 0.98     |
| time/               |          |
|    episodes         | 8        |
|    fps              | 2191     |
|    time_elapsed     | 0        |
|    total_timesteps  | 209      |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.4     |
|    ep_rew_mean      | 23.4     |
|    exploration_rate | 0.972    |
| time/               |          |
----------------------------------

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.5     |
|    ep_rew_mean      | 23.5     |
|    exploration_rate | 0.954    |
| time/               |          |
|    episodes         | 20       |
|    fps              | 3103     |
|    time_elapsed     | 0        |
|    total_timesteps  | 470      |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.7     |
|    ep_rew_mean      | 23.7     |
|    exploration_rate | 0.944    |
| time/               |          |
|    episodes         | 24       |
|    fps              | 3440     |
|    time_elapsed     | 0        |
|    total_timesteps  | 569      |
----------------------------------

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 25       |
|    ep_rew_mean      | 25       |
|    exploration_rate | 0.892    |
| time/               |          |
|    episodes         | 44       |
|    fps              | 1228     |
|    time_elapsed     | 0        |
|    total_timesteps  | 1102     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0545   |
|    n_updates        | 25       |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 24.7     |
|    ep_rew_mean      | 24.7     |
|    exploration_rate | 0.884    |
| time/               |          |
|    episodes         | 48       |
|    fps              | 974      |
|    time_elapsed     | 1        |
|    total_timesteps  | 1187     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0109   |
|    n_updates        | 46       |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 24.3     |
|    ep_rew_mean      | 24.3     |
|    exploration_rate | 0.876    |
| time/               |          |
|    episodes         | 52       |
|    fps              | 843      |
|    time_elapsed     | 1        |
|    total_timesteps  | 1262     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.428    |
|    n_updates        | 65       |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 25.3     |
|    ep_rew_mean      | 25.3     |
|    exploration_rate | 0.861    |
| time/               |          |
|    episodes         | 56       |
|    fps              | 778      |
|    time_elapsed     | 1        |
|    total_timesteps  | 1416     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0639   |
|    n_updates        | 103      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 24.6     |
|    ep_rew_mean      | 24.6     |
|    exploration_rate | 0.855    |
| time/               |          |
|    episodes         | 60       |
|    fps              | 763      |
|    time_elapsed     | 1        |
|    total_timesteps  | 1478     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0501   |
|    n_updates        | 119      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.7     |
|    ep_rew_mean      | 23.7     |
|    exploration_rate | 0.833    |
| time/               |          |
|    episodes         | 72       |
|    fps              | 716      |
|    time_elapsed     | 2        |
|    total_timesteps  | 1703     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.169    |
|    n_updates        | 175      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.4     |
|    ep_rew_mean      | 23.4     |
|    exploration_rate | 0.826    |
| time/               |          |
|    episodes         | 76       |
|    fps              | 711      |
|    time_elapsed     | 2        |
|    total_timesteps  | 1775     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.221    |
|    n_updates        | 193      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22.9     |
|    ep_rew_mean      | 22.9     |
|    exploration_rate | 0.821    |
| time/               |          |
|    episodes         | 80       |
|    fps              | 683      |
|    time_elapsed     | 2        |
|    total_timesteps  | 1831     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.11     |
|    n_updates        | 207      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22.8     |
|    ep_rew_mean      | 22.8     |
|    exploration_rate | 0.812    |
| time/               |          |
|    episodes         | 84       |
|    fps              | 660      |
|    time_elapsed     | 2        |
|    total_timesteps  | 1915     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.206    |
|    n_updates        | 228      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22.5     |
|    ep_rew_mean      | 22.5     |
|    exploration_rate | 0.806    |
| time/               |          |
|    episodes         | 88       |
|    fps              | 613      |
|    time_elapsed     | 3        |
|    total_timesteps  | 1984     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0418   |
|    n_updates        | 245      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22.2     |
|    ep_rew_mean      | 22.2     |
|    exploration_rate | 0.799    |
| time/               |          |
|    episodes         | 92       |
|    fps              | 603      |
|    time_elapsed     | 3        |
|    total_timesteps  | 2047     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.112    |
|    n_updates        | 261      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22.1     |
|    ep_rew_mean      | 22.1     |
|    exploration_rate | 0.792    |
| time/               |          |
|    episodes         | 96       |
|    fps              | 595      |
|    time_elapsed     | 3        |
|    total_timesteps  | 2124     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0669   |
|    n_updates        | 280      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 21.9     |
|    ep_rew_mean      | 21.9     |
|    exploration_rate | 0.785    |
| time/               |          |
|    episodes         | 100      |
|    fps              | 591      |
|    time_elapsed     | 3        |
|    total_timesteps  | 2194     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.139    |
|    n_updates        | 298      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 21.5     |
|    ep_rew_mean      | 21.5     |
|    exploration_rate | 0.778    |
| time/               |          |
|    episodes         | 104      |
|    fps              | 595      |
|    time_elapsed     | 3        |
|    total_timesteps  | 2270     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.189    |
|    n_updates        | 317      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 21.8     |
|    ep_rew_mean      | 21.8     |
|    exploration_rate | 0.766    |
| time/               |          |
|    episodes         | 108      |
|    fps              | 591      |
|    time_elapsed     | 4        |
|    total_timesteps  | 2388     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0661   |
|    n_updates        | 346      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22       |
|    ep_rew_mean      | 22       |
|    exploration_rate | 0.757    |
| time/               |          |
|    episodes         | 112      |
|    fps              | 580      |
|    time_elapsed     | 4        |
|    total_timesteps  | 2482     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0618   |
|    n_updates        | 370      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22.5     |
|    ep_rew_mean      | 22.5     |
|    exploration_rate | 0.741    |
| time/               |          |
|    episodes         | 116      |
|    fps              | 593      |
|    time_elapsed     | 4        |
|    total_timesteps  | 2647     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0783   |
|    n_updates        | 411      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 22.6     |
|    ep_rew_mean      | 22.6     |
|    exploration_rate | 0.732    |
| time/               |          |
|    episodes         | 120      |
|    fps              | 594      |
|    time_elapsed     | 4        |
|    total_timesteps  | 2732     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.205    |
|    n_updates        | 432      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.7     |
|    ep_rew_mean      | 23.7     |
|    exploration_rate | 0.712    |
| time/               |          |
|    episodes         | 124      |
|    fps              | 579      |
|    time_elapsed     | 5        |
|    total_timesteps  | 2943     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.186    |
|    n_updates        | 485      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 24.1     |
|    ep_rew_mean      | 24.1     |
|    exploration_rate | 0.696    |
| time/               |          |
|    episodes         | 128      |
|    fps              | 579      |
|    time_elapsed     | 5        |
|    total_timesteps  | 3103     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.221    |
|    n_updates        | 525      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.7     |
|    ep_rew_mean      | 23.7     |
|    exploration_rate | 0.689    |
| time/               |          |
|    episodes         | 132      |
|    fps              | 566      |
|    time_elapsed     | 5        |
|    total_timesteps  | 3169     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.098    |
|    n_updates        | 542      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 24.6     |
|    ep_rew_mean      | 24.6     |
|    exploration_rate | 0.672    |
| time/               |          |
|    episodes         | 136      |
|    fps              | 553      |
|    time_elapsed     | 6        |
|    total_timesteps  | 3344     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.233    |
|    n_updates        | 585      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 25.1     |
|    ep_rew_mean      | 25.1     |
|    exploration_rate | 0.656    |
| time/               |          |
|    episodes         | 140      |
|    fps              | 508      |
|    time_elapsed     | 6        |
|    total_timesteps  | 3507     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.825    |
|    n_updates        | 626      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 25.6     |
|    ep_rew_mean      | 25.6     |
|    exploration_rate | 0.641    |
| time/               |          |
|    episodes         | 144      |
|    fps              | 486      |
|    time_elapsed     | 7        |
|    total_timesteps  | 3664     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.111    |
|    n_updates        | 665      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 26.4     |
|    ep_rew_mean      | 26.4     |
|    exploration_rate | 0.625    |
| time/               |          |
|    episodes         | 148      |
|    fps              | 487      |
|    time_elapsed     | 7        |
|    total_timesteps  | 3827     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.953    |
|    n_updates        | 706      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 27.5     |
|    ep_rew_mean      | 27.5     |
|    exploration_rate | 0.607    |
| time/               |          |
|    episodes         | 152      |
|    fps              | 482      |
|    time_elapsed     | 8        |
|    total_timesteps  | 4011     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.658    |
|    n_updates        | 752      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 31.7     |
|    ep_rew_mean      | 31.7     |
|    exploration_rate | 0.551    |
| time/               |          |
|    episodes         | 156      |
|    fps              | 470      |
|    time_elapsed     | 9        |
|    total_timesteps  | 4584     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.312    |
|    n_updates        | 895      |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 36.5     |
|    ep_rew_mean      | 36.5     |
|    exploration_rate | 0.498    |
| time/               |          |
|    episodes         | 160      |
|    fps              | 469      |
|    time_elapsed     | 10       |
|    total_timesteps  | 5124     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.363    |
|    n_updates        | 1030     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 39       |
|    ep_rew_mean      | 39       |
|    exploration_rate | 0.466    |
| time/               |          |
|    episodes         | 164      |
|    fps              | 469      |
|    time_elapsed     | 11       |
|    total_timesteps  | 5451     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.159    |
|    n_updates        | 1112     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 45.8     |
|    ep_rew_mean      | 45.8     |
|    exploration_rate | 0.393    |
| time/               |          |
|    episodes         | 168      |
|    fps              | 462      |
|    time_elapsed     | 13       |
|    total_timesteps  | 6197     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.271    |
|    n_updates        | 1299     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 54.4     |
|    ep_rew_mean      | 54.4     |
|    exploration_rate | 0.3      |
| time/               |          |
|    episodes         | 172      |
|    fps              | 472      |
|    time_elapsed     | 15       |
|    total_timesteps  | 7138     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.282    |
|    n_updates        | 1534     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 60.6     |
|    ep_rew_mean      | 60.6     |
|    exploration_rate | 0.232    |
| time/               |          |
|    episodes         | 176      |
|    fps              | 479      |
|    time_elapsed     | 16       |
|    total_timesteps  | 7832     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.382    |
|    n_updates        | 1707     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 69.2     |
|    ep_rew_mean      | 69.2     |
|    exploration_rate | 0.143    |
| time/               |          |
|    episodes         | 180      |
|    fps              | 487      |
|    time_elapsed     | 17       |
|    total_timesteps  | 8747     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.439    |
|    n_updates        | 1936     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 78.5     |
|    ep_rew_mean      | 78.5     |
|    exploration_rate | 0.043    |
| time/               |          |
|    episodes         | 184      |
|    fps              | 491      |
|    time_elapsed     | 19       |
|    total_timesteps  | 9765     |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.477    |
|    n_updates        | 2191     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 88.7     |
|    ep_rew_mean      | 88.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 188      |
|    fps              | 496      |
|    time_elapsed     | 21       |
|    total_timesteps  | 10857    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0332   |
|    n_updates        | 2464     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 110      |
|    ep_rew_mean      | 110      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 196      |
|    fps              | 509      |
|    time_elapsed     | 25       |
|    total_timesteps  | 13166    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.627    |
|    n_updates        | 3041     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 118      |
|    ep_rew_mean      | 118      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 200      |
|    fps              | 488      |
|    time_elapsed     | 28       |
|    total_timesteps  | 13999    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0229   |
|    n_updates        | 3249     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 128      |
|    ep_rew_mean      | 128      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 204      |
|    fps              | 493      |
|    time_elapsed     | 30       |
|    total_timesteps  | 15041    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.3      |
|    n_updates        | 3510     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 138      |
|    ep_rew_mean      | 138      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 208      |
|    fps              | 494      |
|    time_elapsed     | 32       |
|    total_timesteps  | 16233    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0187   |
|    n_updates        | 3808     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 149      |
|    ep_rew_mean      | 149      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 212      |
|    fps              | 501      |
|    time_elapsed     | 34       |
|    total_timesteps  | 17369    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.571    |
|    n_updates        | 4092     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 156      |
|    ep_rew_mean      | 156      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 216      |
|    fps              | 505      |
|    time_elapsed     | 36       |
|    total_timesteps  | 18258    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.829    |
|    n_updates        | 4314     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 161      |
|    ep_rew_mean      | 161      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 220      |
|    fps              | 507      |
|    time_elapsed     | 37       |
|    total_timesteps  | 18847    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0218   |
|    n_updates        | 4461     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 165      |
|    ep_rew_mean      | 165      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 224      |
|    fps              | 507      |
|    time_elapsed     | 38       |
|    total_timesteps  | 19434    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.858    |
|    n_updates        | 4608     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 170      |
|    ep_rew_mean      | 170      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 228      |
|    fps              | 506      |
|    time_elapsed     | 39       |
|    total_timesteps  | 20053    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0388   |
|    n_updates        | 4763     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 175      |
|    ep_rew_mean      | 175      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 232      |
|    fps              | 509      |
|    time_elapsed     | 40       |
|    total_timesteps  | 20674    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.823    |
|    n_updates        | 4918     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 183      |
|    ep_rew_mean      | 183      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 236      |
|    fps              | 519      |
|    time_elapsed     | 41       |
|    total_timesteps  | 21644    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.818    |
|    n_updates        | 5160     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 187      |
|    ep_rew_mean      | 187      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 240      |
|    fps              | 522      |
|    time_elapsed     | 42       |
|    total_timesteps  | 22159    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.831    |
|    n_updates        | 5289     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 190      |
|    ep_rew_mean      | 190      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 244      |
|    fps              | 519      |
|    time_elapsed     | 43       |
|    total_timesteps  | 22700    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0383   |
|    n_updates        | 5424     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 193      |
|    ep_rew_mean      | 193      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 248      |
|    fps              | 518      |
|    time_elapsed     | 44       |
|    total_timesteps  | 23118    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.07     |
|    n_updates        | 5529     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 196      |
|    ep_rew_mean      | 196      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 252      |
|    fps              | 518      |
|    time_elapsed     | 45       |
|    total_timesteps  | 23634    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0156   |
|    n_updates        | 5658     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 194      |
|    ep_rew_mean      | 194      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 256      |
|    fps              | 520      |
|    time_elapsed     | 46       |
|    total_timesteps  | 23977    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.77     |
|    n_updates        | 5744     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 189      |
|    ep_rew_mean      | 189      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 260      |
|    fps              | 521      |
|    time_elapsed     | 46       |
|    total_timesteps  | 24053    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.75     |
|    n_updates        | 5763     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 188      |
|    ep_rew_mean      | 188      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 264      |
|    fps              | 520      |
|    time_elapsed     | 46       |
|    total_timesteps  | 24215    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0102   |
|    n_updates        | 5803     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 183      |
|    ep_rew_mean      | 183      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 268      |
|    fps              | 520      |
|    time_elapsed     | 46       |
|    total_timesteps  | 24450    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0226   |
|    n_updates        | 5862     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 176      |
|    ep_rew_mean      | 176      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 272      |
|    fps              | 520      |
|    time_elapsed     | 47       |
|    total_timesteps  | 24690    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0154   |
|    n_updates        | 5922     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 173      |
|    ep_rew_mean      | 173      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 276      |
|    fps              | 515      |
|    time_elapsed     | 48       |
|    total_timesteps  | 25089    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0198   |
|    n_updates        | 6022     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 169      |
|    ep_rew_mean      | 169      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 280      |
|    fps              | 515      |
|    time_elapsed     | 49       |
|    total_timesteps  | 25629    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0476   |
|    n_updates        | 6157     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 163      |
|    ep_rew_mean      | 163      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 284      |
|    fps              | 513      |
|    time_elapsed     | 50       |
|    total_timesteps  | 26101    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0151   |
|    n_updates        | 6275     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 156      |
|    ep_rew_mean      | 156      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 288      |
|    fps              | 512      |
|    time_elapsed     | 51       |
|    total_timesteps  | 26475    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0316   |
|    n_updates        | 6368     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 145      |
|    ep_rew_mean      | 145      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 292      |
|    fps              | 513      |
|    time_elapsed     | 52       |
|    total_timesteps  | 26825    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0203   |
|    n_updates        | 6456     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 141      |
|    ep_rew_mean      | 141      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 296      |
|    fps              | 511      |
|    time_elapsed     | 53       |
|    total_timesteps  | 27248    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.19     |
|    n_updates        | 6561     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 135      |
|    ep_rew_mean      | 135      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 300      |
|    fps              | 505      |
|    time_elapsed     | 54       |
|    total_timesteps  | 27483    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.864    |
|    n_updates        | 6620     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 127      |
|    ep_rew_mean      | 127      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 304      |
|    fps              | 501      |
|    time_elapsed     | 55       |
|    total_timesteps  | 27719    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0232   |
|    n_updates        | 6679     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 118      |
|    ep_rew_mean      | 118      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 308      |
|    fps              | 491      |
|    time_elapsed     | 57       |
|    total_timesteps  | 28044    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0191   |
|    n_updates        | 6760     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 110      |
|    ep_rew_mean      | 110      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 312      |
|    fps              | 487      |
|    time_elapsed     | 58       |
|    total_timesteps  | 28409    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.902    |
|    n_updates        | 6852     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 105      |
|    ep_rew_mean      | 105      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 316      |
|    fps              | 487      |
|    time_elapsed     | 59       |
|    total_timesteps  | 28754    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0854   |
|    n_updates        | 6938     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 101      |
|    ep_rew_mean      | 101      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 320      |
|    fps              | 482      |
|    time_elapsed     | 60       |
|    total_timesteps  | 28943    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.15     |
|    n_updates        | 6985     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 97       |
|    ep_rew_mean      | 97       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 324      |
|    fps              | 480      |
|    time_elapsed     | 60       |
|    total_timesteps  | 29130    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.08     |
|    n_updates        | 7032     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 93.4     |
|    ep_rew_mean      | 93.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 328      |
|    fps              | 475      |
|    time_elapsed     | 61       |
|    total_timesteps  | 29392    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0304   |
|    n_updates        | 7097     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 90.2     |
|    ep_rew_mean      | 90.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 332      |
|    fps              | 471      |
|    time_elapsed     | 63       |
|    total_timesteps  | 29694    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.27     |
|    n_updates        | 7173     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 84.8     |
|    ep_rew_mean      | 84.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 336      |
|    fps              | 470      |
|    time_elapsed     | 64       |
|    total_timesteps  | 30121    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.05     |
|    n_updates        | 7280     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 83.4     |
|    ep_rew_mean      | 83.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 340      |
|    fps              | 471      |
|    time_elapsed     | 64       |
|    total_timesteps  | 30503    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0889   |
|    n_updates        | 7375     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.9     |
|    ep_rew_mean      | 82.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 344      |
|    fps              | 471      |
|    time_elapsed     | 65       |
|    total_timesteps  | 30992    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0175   |
|    n_updates        | 7497     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 83.3     |
|    ep_rew_mean      | 83.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 348      |
|    fps              | 473      |
|    time_elapsed     | 66       |
|    total_timesteps  | 31444    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.85     |
|    n_updates        | 7610     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 81.9     |
|    ep_rew_mean      | 81.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 352      |
|    fps              | 472      |
|    time_elapsed     | 67       |
|    total_timesteps  | 31828    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.02     |
|    n_updates        | 7706     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.9     |
|    ep_rew_mean      | 82.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 356      |
|    fps              | 465      |
|    time_elapsed     | 69       |
|    total_timesteps  | 32264    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.31     |
|    n_updates        | 7815     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 85.4     |
|    ep_rew_mean      | 85.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 360      |
|    fps              | 461      |
|    time_elapsed     | 70       |
|    total_timesteps  | 32596    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0129   |
|    n_updates        | 7898     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 87.5     |
|    ep_rew_mean      | 87.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 364      |
|    fps              | 460      |
|    time_elapsed     | 71       |
|    total_timesteps  | 32966    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.61     |
|    n_updates        | 7991     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 88.3     |
|    ep_rew_mean      | 88.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 368      |
|    fps              | 459      |
|    time_elapsed     | 72       |
|    total_timesteps  | 33280    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 3.19     |
|    n_updates        | 8069     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 89.1     |
|    ep_rew_mean      | 89.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 372      |
|    fps              | 459      |
|    time_elapsed     | 73       |
|    total_timesteps  | 33599    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0132   |
|    n_updates        | 8149     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 89.3     |
|    ep_rew_mean      | 89.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 376      |
|    fps              | 460      |
|    time_elapsed     | 73       |
|    total_timesteps  | 34021    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.22     |
|    n_updates        | 8255     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 87.5     |
|    ep_rew_mean      | 87.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 380      |
|    fps              | 460      |
|    time_elapsed     | 74       |
|    total_timesteps  | 34374    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0263   |
|    n_updates        | 8343     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 87.2     |
|    ep_rew_mean      | 87.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 384      |
|    fps              | 457      |
|    time_elapsed     | 76       |
|    total_timesteps  | 34825    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.27     |
|    n_updates        | 8456     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 89       |
|    ep_rew_mean      | 89       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 388      |
|    fps              | 448      |
|    time_elapsed     | 78       |
|    total_timesteps  | 35379    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0262   |
|    n_updates        | 8594     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 90.9     |
|    ep_rew_mean      | 90.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 392      |
|    fps              | 447      |
|    time_elapsed     | 80       |
|    total_timesteps  | 35918    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0139   |
|    n_updates        | 8729     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 89.5     |
|    ep_rew_mean      | 89.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 396      |
|    fps              | 448      |
|    time_elapsed     | 80       |
|    total_timesteps  | 36194    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0272   |
|    n_updates        | 8798     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 90.7     |
|    ep_rew_mean      | 90.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 400      |
|    fps              | 448      |
|    time_elapsed     | 81       |
|    total_timesteps  | 36554    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0405   |
|    n_updates        | 8888     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 92.9     |
|    ep_rew_mean      | 92.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 404      |
|    fps              | 449      |
|    time_elapsed     | 82       |
|    total_timesteps  | 37009    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0401   |
|    n_updates        | 9002     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 93.5     |
|    ep_rew_mean      | 93.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 408      |
|    fps              | 450      |
|    time_elapsed     | 82       |
|    total_timesteps  | 37394    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.31     |
|    n_updates        | 9098     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 92.4     |
|    ep_rew_mean      | 92.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 412      |
|    fps              | 450      |
|    time_elapsed     | 83       |
|    total_timesteps  | 37647    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0403   |
|    n_updates        | 9161     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 90.6     |
|    ep_rew_mean      | 90.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 416      |
|    fps              | 450      |
|    time_elapsed     | 83       |
|    total_timesteps  | 37818    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.11     |
|    n_updates        | 9204     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 89.6     |
|    ep_rew_mean      | 89.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 420      |
|    fps              | 450      |
|    time_elapsed     | 84       |
|    total_timesteps  | 37906    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.09     |
|    n_updates        | 9226     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 90.1     |
|    ep_rew_mean      | 90.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 424      |
|    fps              | 450      |
|    time_elapsed     | 84       |
|    total_timesteps  | 38142    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 3.44     |
|    n_updates        | 9285     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 90       |
|    ep_rew_mean      | 90       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 428      |
|    fps              | 450      |
|    time_elapsed     | 85       |
|    total_timesteps  | 38387    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.947    |
|    n_updates        | 9346     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 88.6     |
|    ep_rew_mean      | 88.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 432      |
|    fps              | 450      |
|    time_elapsed     | 85       |
|    total_timesteps  | 38556    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.28     |
|    n_updates        | 9388     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 85.1     |
|    ep_rew_mean      | 85.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 436      |
|    fps              | 451      |
|    time_elapsed     | 85       |
|    total_timesteps  | 38630    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.37     |
|    n_updates        | 9407     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.1     |
|    ep_rew_mean      | 82.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 440      |
|    fps              | 451      |
|    time_elapsed     | 85       |
|    total_timesteps  | 38711    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0312   |
|    n_updates        | 9427     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 78.9     |
|    ep_rew_mean      | 78.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 444      |
|    fps              | 451      |
|    time_elapsed     | 86       |
|    total_timesteps  | 38884    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.48     |
|    n_updates        | 9470     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 76.1     |
|    ep_rew_mean      | 76.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 448      |
|    fps              | 451      |
|    time_elapsed     | 86       |
|    total_timesteps  | 39053    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.3      |
|    n_updates        | 9513     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 74.7     |
|    ep_rew_mean      | 74.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 452      |
|    fps              | 448      |
|    time_elapsed     | 87       |
|    total_timesteps  | 39300    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.12     |
|    n_updates        | 9574     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 71.3     |
|    ep_rew_mean      | 71.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 456      |
|    fps              | 445      |
|    time_elapsed     | 88       |
|    total_timesteps  | 39390    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.98     |
|    n_updates        | 9597     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 68.8     |
|    ep_rew_mean      | 68.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 460      |
|    fps              | 445      |
|    time_elapsed     | 88       |
|    total_timesteps  | 39478    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.14     |
|    n_updates        | 9619     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 67.7     |
|    ep_rew_mean      | 67.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 464      |
|    fps              | 446      |
|    time_elapsed     | 88       |
|    total_timesteps  | 39737    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.25     |
|    n_updates        | 9684     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 67.9     |
|    ep_rew_mean      | 67.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 468      |
|    fps              | 445      |
|    time_elapsed     | 89       |
|    total_timesteps  | 40066    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.926    |
|    n_updates        | 9766     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 66.3     |
|    ep_rew_mean      | 66.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 472      |
|    fps              | 445      |
|    time_elapsed     | 90       |
|    total_timesteps  | 40226    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.08     |
|    n_updates        | 9806     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 62.9     |
|    ep_rew_mean      | 62.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 476      |
|    fps              | 445      |
|    time_elapsed     | 90       |
|    total_timesteps  | 40310    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 4.97     |
|    n_updates        | 9827     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 63.1     |
|    ep_rew_mean      | 63.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 480      |
|    fps              | 444      |
|    time_elapsed     | 91       |
|    total_timesteps  | 40689    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.1      |
|    n_updates        | 9922     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 60.9     |
|    ep_rew_mean      | 60.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 484      |
|    fps              | 445      |
|    time_elapsed     | 91       |
|    total_timesteps  | 40918    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.11     |
|    n_updates        | 9979     |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 59.7     |
|    ep_rew_mean      | 59.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 488      |
|    fps              | 445      |
|    time_elapsed     | 92       |
|    total_timesteps  | 41351    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0829   |
|    n_updates        | 10087    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 59.1     |
|    ep_rew_mean      | 59.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 492      |
|    fps              | 447      |
|    time_elapsed     | 93       |
|    total_timesteps  | 41828    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.07     |
|    n_updates        | 10206    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 60.9     |
|    ep_rew_mean      | 60.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 496      |
|    fps              | 447      |
|    time_elapsed     | 94       |
|    total_timesteps  | 42284    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.17     |
|    n_updates        | 10320    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 62.1     |
|    ep_rew_mean      | 62.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 500      |
|    fps              | 447      |
|    time_elapsed     | 95       |
|    total_timesteps  | 42761    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0367   |
|    n_updates        | 10440    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 62.5     |
|    ep_rew_mean      | 62.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 504      |
|    fps              | 446      |
|    time_elapsed     | 96       |
|    total_timesteps  | 43262    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0495   |
|    n_updates        | 10565    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 63.4     |
|    ep_rew_mean      | 63.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 508      |
|    fps              | 446      |
|    time_elapsed     | 98       |
|    total_timesteps  | 43729    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0212   |
|    n_updates        | 10682    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 65.6     |
|    ep_rew_mean      | 65.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 512      |
|    fps              | 445      |
|    time_elapsed     | 99       |
|    total_timesteps  | 44210    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.04     |
|    n_updates        | 10802    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 74.8     |
|    ep_rew_mean      | 74.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 520      |
|    fps              | 445      |
|    time_elapsed     | 101      |
|    total_timesteps  | 45391    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.1      |
|    n_updates        | 11097    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 78.9     |
|    ep_rew_mean      | 78.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 524      |
|    fps              | 447      |
|    time_elapsed     | 102      |
|    total_timesteps  | 46030    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.38     |
|    n_updates        | 11257    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 87.2     |
|    ep_rew_mean      | 87.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 528      |
|    fps              | 446      |
|    time_elapsed     | 105      |
|    total_timesteps  | 47111    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.025    |
|    n_updates        | 11527    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 91.9     |
|    ep_rew_mean      | 91.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 532      |
|    fps              | 446      |
|    time_elapsed     | 106      |
|    total_timesteps  | 47743    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.976    |
|    n_updates        | 11685    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 96.7     |
|    ep_rew_mean      | 96.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 536      |
|    fps              | 444      |
|    time_elapsed     | 108      |
|    total_timesteps  | 48295    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0254   |
|    n_updates        | 11823    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 101      |
|    ep_rew_mean      | 101      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 540      |
|    fps              | 445      |
|    time_elapsed     | 109      |
|    total_timesteps  | 48818    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0172   |
|    n_updates        | 11954    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 104      |
|    ep_rew_mean      | 104      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 544      |
|    fps              | 444      |
|    time_elapsed     | 111      |
|    total_timesteps  | 49320    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.935    |
|    n_updates        | 12079    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 106      |
|    ep_rew_mean      | 106      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 548      |
|    fps              | 442      |
|    time_elapsed     | 112      |
|    total_timesteps  | 49701    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0273   |
|    n_updates        | 12175    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 109      |
|    ep_rew_mean      | 109      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 552      |
|    fps              | 443      |
|    time_elapsed     | 113      |
|    total_timesteps  | 50220    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0162   |
|    n_updates        | 12304    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 114      |
|    ep_rew_mean      | 114      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 556      |
|    fps              | 443      |
|    time_elapsed     | 114      |
|    total_timesteps  | 50751    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.48     |
|    n_updates        | 12437    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 120      |
|    ep_rew_mean      | 120      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 560      |
|    fps              | 442      |
|    time_elapsed     | 116      |
|    total_timesteps  | 51454    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.35     |
|    n_updates        | 12613    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 124      |
|    ep_rew_mean      | 124      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 564      |
|    fps              | 444      |
|    time_elapsed     | 117      |
|    total_timesteps  | 52107    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.29     |
|    n_updates        | 12776    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 133      |
|    ep_rew_mean      | 133      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 568      |
|    fps              | 443      |
|    time_elapsed     | 120      |
|    total_timesteps  | 53352    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0172   |
|    n_updates        | 13087    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 145      |
|    ep_rew_mean      | 145      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 572      |
|    fps              | 435      |
|    time_elapsed     | 125      |
|    total_timesteps  | 54754    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0356   |
|    n_updates        | 13438    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 160      |
|    ep_rew_mean      | 160      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 576      |
|    fps              | 412      |
|    time_elapsed     | 136      |
|    total_timesteps  | 56262    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.48     |
|    n_updates        | 13815    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 174      |
|    ep_rew_mean      | 174      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 580      |
|    fps              | 407      |
|    time_elapsed     | 142      |
|    total_timesteps  | 58045    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.03     |
|    n_updates        | 14261    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 182      |
|    ep_rew_mean      | 182      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 584      |
|    fps              | 399      |
|    time_elapsed     | 147      |
|    total_timesteps  | 59105    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.376    |
|    n_updates        | 14526    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 188      |
|    ep_rew_mean      | 188      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 588      |
|    fps              | 397      |
|    time_elapsed     | 151      |
|    total_timesteps  | 60132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0143   |
|    n_updates        | 14782    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 203      |
|    ep_rew_mean      | 203      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 592      |
|    fps              | 395      |
|    time_elapsed     | 157      |
|    total_timesteps  | 62132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.5      |
|    n_updates        | 15282    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 218      |
|    ep_rew_mean      | 218      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 596      |
|    fps              | 396      |
|    time_elapsed     | 161      |
|    total_timesteps  | 64132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0222   |
|    n_updates        | 15782    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 234      |
|    ep_rew_mean      | 234      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 600      |
|    fps              | 397      |
|    time_elapsed     | 166      |
|    total_timesteps  | 66132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0228   |
|    n_updates        | 16282    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 249      |
|    ep_rew_mean      | 249      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 604      |
|    fps              | 402      |
|    time_elapsed     | 169      |
|    total_timesteps  | 68132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0318   |
|    n_updates        | 16782    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 264      |
|    ep_rew_mean      | 264      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 608      |
|    fps              | 402      |
|    time_elapsed     | 174      |
|    total_timesteps  | 70132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0518   |
|    n_updates        | 17282    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 279      |
|    ep_rew_mean      | 279      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 612      |
|    fps              | 402      |
|    time_elapsed     | 179      |
|    total_timesteps  | 72132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0297   |
|    n_updates        | 17782    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 294      |
|    ep_rew_mean      | 294      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 616      |
|    fps              | 401      |
|    time_elapsed     | 184      |
|    total_timesteps  | 74132    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0411   |
|    n_updates        | 18282    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 303      |
|    ep_rew_mean      | 303      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 620      |
|    fps              | 402      |
|    time_elapsed     | 188      |
|    total_timesteps  | 75660    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0225   |
|    n_updates        | 18664    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 307      |
|    ep_rew_mean      | 307      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 624      |
|    fps              | 399      |
|    time_elapsed     | 192      |
|    total_timesteps  | 76685    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.512    |
|    n_updates        | 18921    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 299      |
|    ep_rew_mean      | 299      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 628      |
|    fps              | 399      |
|    time_elapsed     | 192      |
|    total_timesteps  | 77056    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0784   |
|    n_updates        | 19013    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 294      |
|    ep_rew_mean      | 294      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 632      |
|    fps              | 399      |
|    time_elapsed     | 192      |
|    total_timesteps  | 77181    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0514   |
|    n_updates        | 19045    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 294      |
|    ep_rew_mean      | 294      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 636      |
|    fps              | 400      |
|    time_elapsed     | 193      |
|    total_timesteps  | 77694    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0196   |
|    n_updates        | 19173    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 291      |
|    ep_rew_mean      | 291      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 640      |
|    fps              | 400      |
|    time_elapsed     | 194      |
|    total_timesteps  | 77873    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0321   |
|    n_updates        | 19218    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 288      |
|    ep_rew_mean      | 288      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 644      |
|    fps              | 398      |
|    time_elapsed     | 196      |
|    total_timesteps  | 78165    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.046    |
|    n_updates        | 19291    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 285      |
|    ep_rew_mean      | 285      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 648      |
|    fps              | 398      |
|    time_elapsed     | 196      |
|    total_timesteps  | 78240    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0208   |
|    n_updates        | 19309    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 281      |
|    ep_rew_mean      | 281      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 652      |
|    fps              | 398      |
|    time_elapsed     | 196      |
|    total_timesteps  | 78345    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0363   |
|    n_updates        | 19336    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 279      |
|    ep_rew_mean      | 279      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 656      |
|    fps              | 398      |
|    time_elapsed     | 197      |
|    total_timesteps  | 78615    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0394   |
|    n_updates        | 19403    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 272      |
|    ep_rew_mean      | 272      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 660      |
|    fps              | 398      |
|    time_elapsed     | 197      |
|    total_timesteps  | 78686    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0249   |
|    n_updates        | 19421    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 266      |
|    ep_rew_mean      | 266      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 664      |
|    fps              | 398      |
|    time_elapsed     | 197      |
|    total_timesteps  | 78751    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 3.08     |
|    n_updates        | 19437    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 255      |
|    ep_rew_mean      | 255      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 668      |
|    fps              | 399      |
|    time_elapsed     | 197      |
|    total_timesteps  | 78821    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.022    |
|    n_updates        | 19455    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 241      |
|    ep_rew_mean      | 241      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 672      |
|    fps              | 399      |
|    time_elapsed     | 197      |
|    total_timesteps  | 78884    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0598   |
|    n_updates      
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 227      |
|    ep_rew_mean      | 227      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 676      |
|    fps              | 399      |
|    time_elapsed     | 197      |
|    total_timesteps  | 78949    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0331   |
|    n_updates        | 19487    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 210      |
|    ep_rew_mean      | 210      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 680      |
|    fps              | 399      |
|    time_elapsed     | 197      |
|    total_timesteps  | 79008    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0536   |
|    n_updates        | 19501    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 200      |
|    ep_rew_mean      | 200      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 684      |
|    fps              | 399      |
|    time_elapsed     | 197      |
|    total_timesteps  | 79074    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.769    |
|    n_updates        | 19518    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 190      |
|    ep_rew_mean      | 190      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 688      |
|    fps              | 399      |
|    time_elapsed     | 198      |
|    total_timesteps  | 79133    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.65     |
|    n_updates        | 19533    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 171      |
|    ep_rew_mean      | 171      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 692      |
|    fps              | 399      |
|    time_elapsed     | 198      |
|    total_timesteps  | 79200    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.4      |
|    n_updates        | 19549    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 151      |
|    ep_rew_mean      | 151      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 696      |
|    fps              | 398      |
|    time_elapsed     | 198      |
|    total_timesteps  | 79260    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.67     |
|    n_updates        | 19564    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 132      |
|    ep_rew_mean      | 132      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 700      |
|    fps              | 398      |
|    time_elapsed     | 198      |
|    total_timesteps  | 79324    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 4.5      |
|    n_updates        | 19580    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 112      |
|    ep_rew_mean      | 112      |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 704      |
|    fps              | 397      |
|    time_elapsed     | 199      |
|    total_timesteps  | 79382    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.59     |
|    n_updates        | 19595    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 93.1     |
|    ep_rew_mean      | 93.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 708      |
|    fps              | 397      |
|    time_elapsed     | 199      |
|    total_timesteps  | 79439    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0294   |
|    n_updates        | 19609    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 73.8     |
|    ep_rew_mean      | 73.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 712      |
|    fps              | 396      |
|    time_elapsed     | 200      |
|    total_timesteps  | 79509    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0455   |
|    n_updates        | 19627    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 54.4     |
|    ep_rew_mean      | 54.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 716      |
|    fps              | 396      |
|    time_elapsed     | 200      |
|    total_timesteps  | 79570    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.63     |
|    n_updates        | 19642    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 39.7     |
|    ep_rew_mean      | 39.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 720      |
|    fps              | 396      |
|    time_elapsed     | 200      |
|    total_timesteps  | 79633    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.27     |
|    n_updates        | 19658    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 30.1     |
|    ep_rew_mean      | 30.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 724      |
|    fps              | 396      |
|    time_elapsed     | 201      |
|    total_timesteps  | 79693    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.57     |
|    n_updates        | 19673    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 27.5     |
|    ep_rew_mean      | 27.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 728      |
|    fps              | 396      |
|    time_elapsed     | 201      |
|    total_timesteps  | 79807    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0486   |
|    n_updates        | 19701    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 26.9     |
|    ep_rew_mean      | 26.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 732      |
|    fps              | 396      |
|    time_elapsed     | 201      |
|    total_timesteps  | 79866    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0306   |
|    n_updates        | 19716    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 23.7     |
|    ep_rew_mean      | 23.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 736      |
|    fps              | 396      |
|    time_elapsed     | 202      |
|    total_timesteps  | 80067    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0452   |
|    n_updates        | 19766    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 24.2     |
|    ep_rew_mean      | 24.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 740      |
|    fps              | 396      |
|    time_elapsed     | 202      |
|    total_timesteps  | 80296    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.045    |
|    n_updates        | 19823    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 25.1     |
|    ep_rew_mean      | 25.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 744      |
|    fps              | 396      |
|    time_elapsed     | 203      |
|    total_timesteps  | 80675    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0349   |
|    n_updates        | 19918    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 28.2     |
|    ep_rew_mean      | 28.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 748      |
|    fps              | 395      |
|    time_elapsed     | 204      |
|    total_timesteps  | 81059    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.648    |
|    n_updates        | 20014    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 31.1     |
|    ep_rew_mean      | 31.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 752      |
|    fps              | 395      |
|    time_elapsed     | 205      |
|    total_timesteps  | 81451    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0511   |
|    n_updates        | 20112    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 32.3     |
|    ep_rew_mean      | 32.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 756      |
|    fps              | 395      |
|    time_elapsed     | 206      |
|    total_timesteps  | 81843    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.067    |
|    n_updates        | 20210    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 35.7     |
|    ep_rew_mean      | 35.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 760      |
|    fps              | 393      |
|    time_elapsed     | 208      |
|    total_timesteps  | 82255    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.72     |
|    n_updates        | 20313    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 39.3     |
|    ep_rew_mean      | 39.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 764      |
|    fps              | 395      |
|    time_elapsed     | 209      |
|    total_timesteps  | 82685    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0807   |
|    n_updates        | 20421    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 42.9     |
|    ep_rew_mean      | 42.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 768      |
|    fps              | 396      |
|    time_elapsed     | 209      |
|    total_timesteps  | 83115    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0892   |
|    n_updates        | 20528    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 46.6     |
|    ep_rew_mean      | 46.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 772      |
|    fps              | 397      |
|    time_elapsed     | 210      |
|    total_timesteps  | 83541    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.965    |
|    n_updates        | 20635    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 50.3     |
|    ep_rew_mean      | 50.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 776      |
|    fps              | 397      |
|    time_elapsed     | 211      |
|    total_timesteps  | 83980    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.106    |
|    n_updates        | 20744    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 54.3     |
|    ep_rew_mean      | 54.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 780      |
|    fps              | 398      |
|    time_elapsed     | 211      |
|    total_timesteps  | 84441    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.108    |
|    n_updates        | 20860    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 58.3     |
|    ep_rew_mean      | 58.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 784      |
|    fps              | 398      |
|    time_elapsed     | 213      |
|    total_timesteps  | 84905    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.69     |
|    n_updates        | 20976    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 62.3     |
|    ep_rew_mean      | 62.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 788      |
|    fps              | 397      |
|    time_elapsed     | 214      |
|    total_timesteps  | 85362    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.03     |
|    n_updates        | 21090    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 66.1     |
|    ep_rew_mean      | 66.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 792      |
|    fps              | 397      |
|    time_elapsed     | 215      |
|    total_timesteps  | 85808    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.03     |
|    n_updates        | 21201    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 70.1     |
|    ep_rew_mean      | 70.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 796      |
|    fps              | 399      |
|    time_elapsed     | 216      |
|    total_timesteps  | 86266    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.104    |
|    n_updates        | 21316    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 73.8     |
|    ep_rew_mean      | 73.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 800      |
|    fps              | 399      |
|    time_elapsed     | 216      |
|    total_timesteps  | 86704    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0426   |
|    n_updates        | 21425    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 77.5     |
|    ep_rew_mean      | 77.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 804      |
|    fps              | 400      |
|    time_elapsed     | 217      |
|    total_timesteps  | 87137    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0568   |
|    n_updates        | 21534    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 80.2     |
|    ep_rew_mean      | 80.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 808      |
|    fps              | 401      |
|    time_elapsed     | 218      |
|    total_timesteps  | 87463    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0633   |
|    n_updates        | 21615    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.7     |
|    ep_rew_mean      | 82.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 812      |
|    fps              | 401      |
|    time_elapsed     | 218      |
|    total_timesteps  | 87777    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 3.72     |
|    n_updates        | 21694    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.5     |
|    ep_rew_mean      | 82.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 816      |
|    fps              | 401      |
|    time_elapsed     | 218      |
|    total_timesteps  | 87820    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.071    |
|    n_updates      

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.1     |
|    ep_rew_mean      | 82.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 824      |
|    fps              | 401      |
|    time_elapsed     | 218      |
|    total_timesteps  | 87903    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0644   |
|    n_updates        | 21725    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 81.4     |
|    ep_rew_mean      | 81.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 828      |
|    fps              | 401      |
|    time_elapsed     | 219      |
|    total_timesteps  | 87946    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0846   |
|    n_updates      

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.1     |
|    ep_rew_mean      | 82.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 832      |
|    fps              | 401      |
|    time_elapsed     | 219      |
|    total_timesteps  | 88079    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0593   |
|    n_updates        | 21769    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 80.5     |
|    ep_rew_mean      | 80.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 836      |
|    fps              | 401      |
|    time_elapsed     | 219      |
|    total_timesteps  | 88122    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 3.1      |
|    n_updates        | 21780    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 78.7     |
|    ep_rew_mean      | 78.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 840      |
|    fps              | 401      |
|    time_elapsed     | 219      |
|    total_timesteps  | 88167    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0547   |
|    n_updates        | 21791    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 75.4     |
|    ep_rew_mean      | 75.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 844      |
|    fps              | 401      |
|    time_elapsed     | 219      |
|    total_timesteps  | 88212    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0456   |
|    n_updates      

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 71.9     |
|    ep_rew_mean      | 71.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 848      |
|    fps              | 401      |
|    time_elapsed     | 219      |
|    total_timesteps  | 88252    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.61     |
|    n_updates        | 21812    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 68.5     |
|    ep_rew_mean      | 68.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 852      |
|    fps              | 401      |
|    time_elapsed     | 219      |
|    total_timesteps  | 88300    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0644   |
|    n_updates        | 21824    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 65.8     |
|    ep_rew_mean      | 65.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 856      |
|    fps              | 401      |
|    time_elapsed     | 220      |
|    total_timesteps  | 88427    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.13     |
|    n_updates        | 21856    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 63       |
|    ep_rew_mean      | 63       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 860      |
|    fps              | 401      |
|    time_elapsed     | 220      |
|    total_timesteps  | 88554    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0442   |
|    n_updates        | 21888    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 60       |
|    ep_rew_mean      | 60       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 864      |
|    fps              | 401      |
|    time_elapsed     | 220      |
|    total_timesteps  | 88680    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0546   |
|    n_updates        | 21919    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 56.1     |
|    ep_rew_mean      | 56.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 868      |
|    fps              | 401      |
|    time_elapsed     | 220      |
|    total_timesteps  | 88726    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0809   |
|    n_updates      

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 53.2     |
|    ep_rew_mean      | 53.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 872      |
|    fps              | 401      |
|    time_elapsed     | 221      |
|    total_timesteps  | 88859    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0727   |
|    n_updates        | 21964    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 52.7     |
|    ep_rew_mean      | 52.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 876      |
|    fps              | 403      |
|    time_elapsed     | 221      |
|    total_timesteps  | 89253    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.155    |
|    n_updates        | 22063    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 52.4     |
|    ep_rew_mean      | 52.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 880      |
|    fps              | 401      |
|    time_elapsed     | 223      |
|    total_timesteps  | 89683    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0688   |
|    n_updates        | 22170    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 52.5     |
|    ep_rew_mean      | 52.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 884      |
|    fps              | 401      |
|    time_elapsed     | 224      |
|    total_timesteps  | 90160    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.075    |
|    n_updates        | 22289    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 52.7     |
|    ep_rew_mean      | 52.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 888      |
|    fps              | 401      |
|    time_elapsed     | 225      |
|    total_timesteps  | 90630    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.15     |
|    n_updates        | 22407    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 53       |
|    ep_rew_mean      | 53       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 892      |
|    fps              | 400      |
|    time_elapsed     | 227      |
|    total_timesteps  | 91106    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0719   |
|    n_updates        | 22526    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 52.9     |
|    ep_rew_mean      | 52.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 896      |
|    fps              | 400      |
|    time_elapsed     | 228      |
|    total_timesteps  | 91553    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0689   |
|    n_updates        | 22638    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 52.7     |
|    ep_rew_mean      | 52.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 900      |
|    fps              | 400      |
|    time_elapsed     | 229      |
|    total_timesteps  | 91971    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 3        |
|    n_updates        | 22742    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 50.6     |
|    ep_rew_mean      | 50.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 904      |
|    fps              | 401      |
|    time_elapsed     | 229      |
|    total_timesteps  | 92200    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0415   |
|    n_updates        | 22799    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 49.6     |
|    ep_rew_mean      | 49.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 908      |
|    fps              | 400      |
|    time_elapsed     | 230      |
|    total_timesteps  | 92427    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0486   |
|    n_updates        | 22856    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 47.8     |
|    ep_rew_mean      | 47.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 912      |
|    fps              | 401      |
|    time_elapsed     | 230      |
|    total_timesteps  | 92558    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0404   |
|    n_updates        | 22889    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 49.5     |
|    ep_rew_mean      | 49.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 916      |
|    fps              | 400      |
|    time_elapsed     | 231      |
|    total_timesteps  | 92770    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.101    |
|    n_updates        | 22942    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 52.1     |
|    ep_rew_mean      | 52.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 920      |
|    fps              | 399      |
|    time_elapsed     | 233      |
|    total_timesteps  | 93072    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.135    |
|    n_updates        | 23017    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 53.8     |
|    ep_rew_mean      | 53.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 924      |
|    fps              | 399      |
|    time_elapsed     | 233      |
|    total_timesteps  | 93283    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0521   |
|    n_updates        | 23070    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 53.8     |
|    ep_rew_mean      | 53.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 928      |
|    fps              | 399      |
|    time_elapsed     | 233      |
|    total_timesteps  | 93328    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.16     |
|    n_updates        | 23081    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 54.7     |
|    ep_rew_mean      | 54.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 932      |
|    fps              | 400      |
|    time_elapsed     | 233      |
|    total_timesteps  | 93546    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0475   |
|    n_updates        | 23136    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 56.4     |
|    ep_rew_mean      | 56.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 936      |
|    fps              | 400      |
|    time_elapsed     | 234      |
|    total_timesteps  | 93763    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.37     |
|    n_updates        | 23190    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 58.2     |
|    ep_rew_mean      | 58.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 940      |
|    fps              | 400      |
|    time_elapsed     | 234      |
|    total_timesteps  | 93987    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.19     |
|    n_updates        | 23246    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 61       |
|    ep_rew_mean      | 61       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 944      |
|    fps              | 400      |
|    time_elapsed     | 235      |
|    total_timesteps  | 94310    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.06     |
|    n_updates        | 23327    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 64.9     |
|    ep_rew_mean      | 64.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 948      |
|    fps              | 401      |
|    time_elapsed     | 236      |
|    total_timesteps  | 94744    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0559   |
|    n_updates        | 23435    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 69       |
|    ep_rew_mean      | 69       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 952      |
|    fps              | 401      |
|    time_elapsed     | 236      |
|    total_timesteps  | 95202    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.047    |
|    n_updates        | 23550    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 72.4     |
|    ep_rew_mean      | 72.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 956      |
|    fps              | 402      |
|    time_elapsed     | 237      |
|    total_timesteps  | 95670    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 6.47     |
|    n_updates        | 23667    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 75.9     |
|    ep_rew_mean      | 75.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 960      |
|    fps              | 402      |
|    time_elapsed     | 238      |
|    total_timesteps  | 96142    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 6.57     |
|    n_updates        | 23785    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 77.9     |
|    ep_rew_mean      | 77.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 964      |
|    fps              | 403      |
|    time_elapsed     | 239      |
|    total_timesteps  | 96469    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 1.17     |
|    n_updates        | 23867    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 79.6     |
|    ep_rew_mean      | 79.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 968      |
|    fps              | 402      |
|    time_elapsed     | 239      |
|    total_timesteps  | 96685    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0562   |
|    n_updates        | 23921    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.6     |
|    ep_rew_mean      | 82.6     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 972      |
|    fps              | 402      |
|    time_elapsed     | 241      |
|    total_timesteps  | 97123    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0665   |
|    n_updates        | 24030    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 82.2     |
|    ep_rew_mean      | 82.2     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 976      |
|    fps              | 402      |
|    time_elapsed     | 242      |
|    total_timesteps  | 97476    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0523   |
|    n_updates        | 24118    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 79.5     |
|    ep_rew_mean      | 79.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 980      |
|    fps              | 402      |
|    time_elapsed     | 242      |
|    total_timesteps  | 97632    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 5.54     |
|    n_updates        | 24157    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 76.3     |
|    ep_rew_mean      | 76.3     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 984      |
|    fps              | 402      |
|    time_elapsed     | 242      |
|    total_timesteps  | 97788    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0291   |
|    n_updates        | 24196    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 73.1     |
|    ep_rew_mean      | 73.1     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 988      |
|    fps              | 402      |
|    time_elapsed     | 243      |
|    total_timesteps  | 97944    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0477   |
|    n_updates        | 24235    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 69.7     |
|    ep_rew_mean      | 69.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 992      |
|    fps              | 403      |
|    time_elapsed     | 243      |
|    total_timesteps  | 98075    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.17     |
|    n_updates        | 24268    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 66.7     |
|    ep_rew_mean      | 66.7     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 996      |
|    fps              | 403      |
|    time_elapsed     | 243      |
|    total_timesteps  | 98221    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0266   |
|    n_updates        | 24305    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 63.8     |
|    ep_rew_mean      | 63.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1000     |
|    fps              | 403      |
|    time_elapsed     | 243      |
|    total_timesteps  | 98352    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.19     |
|    n_updates        | 24337    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 62.8     |
|    ep_rew_mean      | 62.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1004     |
|    fps              | 403      |
|    time_elapsed     | 244      |
|    total_timesteps  | 98483    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0211   |
|    n_updates        | 24370    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 62       |
|    ep_rew_mean      | 62       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1008     |
|    fps              | 402      |
|    time_elapsed     | 244      |
|    total_timesteps  | 98628    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0272   |
|    n_updates        | 24406    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 62.9     |
|    ep_rew_mean      | 62.9     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1012     |
|    fps              | 402      |
|    time_elapsed     | 245      |
|    total_timesteps  | 98851    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.04     |
|    n_updates        | 24462    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 64.5     |
|    ep_rew_mean      | 64.5     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1016     |
|    fps              | 402      |
|    time_elapsed     | 246      |
|    total_timesteps  | 99218    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 2.19     |
|    n_updates        | 24554    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 63.4     |
|    ep_rew_mean      | 63.4     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1020     |
|    fps              | 402      |
|    time_elapsed     | 247      |
|    total_timesteps  | 99408    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0351   |
|    n_updates        | 24601    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 63.8     |
|    ep_rew_mean      | 63.8     |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1024     |
|    fps              | 402      |
|    time_elapsed     | 247      |
|    total_timesteps  | 99662    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0571   |
|    n_updates        | 24665    |
----------------------------------


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 66       |
|    ep_rew_mean      | 66       |
|    exploration_rate | 0.02     |
| time/               |          |
|    episodes         | 1028     |
|    fps              | 402      |
|    time_elapsed     | 248      |
|    total_timesteps  | 99929    |
| train/              |          |
|    learning_rate    | 0.001    |
|    loss             | 0.0913   |
|    n_updates        | 24732    |
----------------------------------



=== Trained DQN Agent Performance ===
Mean reward: 42.14 ± 3.44

Comparison:
  Random Agent: 21.69 ± 10.88
  Trained DQN:  42.14 ± 3.44
  Improvement:  94.3%
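
The improvement figure above is consistent with a plain relative gain over the random baseline. The sketch below reproduces it from the rounded means printed in the output; the helper name `relative_improvement` is hypothetical, and the notebook's own formula is assumed rather than confirmed.

```python
# Means copied from the evaluation output above (rounded as printed).
random_mean = 21.69
dqn_mean = 42.14


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


improvement = relative_improvement(random_mean, dqn_mean)
print(f"Improvement: {improvement:.1f}%")
```

A mean of about 42 steps is still far below the 500-step maximum, so a large percentage gain over random does not by itself mean the task is solved.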


### Step 4: Visualizing the Trained Agent

Let's see how the trained agent performs by running one episode and recording it.

In [10]:
# Visualize trained agent
def visualize_agent(model, env, n_steps=500):
    """
    Run agent in environment and collect frames for visualization.

    Args:
        model: Trained RL model
        env: Gymnasium environment
        n_steps: Maximum number of steps

    Returns:
        frames, rewards_list, actions_list (one entry per step of the episode)
    """
    frames = []
    rewards_list = []
    actions_list = []

    state, info = env.reset(seed=42)
    frames.append(env.render())

    for _ in range(n_steps):
        # Get action from trained policy (deterministic)
        action, _states = model.predict(state, deterministic=True)
        actions_list.append(int(action))

        # Take action in environment (Gymnasium returns separate terminated/truncated flags)
        state, reward, terminated, truncated, info = env.step(action)
        rewards_list.append(reward)
        frames.append(env.render())

        if terminated or truncated:
            break

    return frames, rewards_list, actions_list


# Create environment with rendering
env_render = gym.make("CartPole-v1", render_mode="rgb_array")
frames, rewards_list, actions_list = visualize_agent(model, env_render, n_steps=500)
env_render.close()

print(f"\nEpisode lasted {len(rewards_list)} steps")
print(f"Total reward: {sum(rewards_list):.0f}")
print(f"Action distribution: LEFT={actions_list.count(0)}, RIGHT={actions_list.count(1)}")

# If performance is poor, remind user to retrain
if sum(rewards_list) < 100:
    print("\n⚠️  Poor performance detected!")
    print("Make sure you've run the training cell (cell 10) with the updated hyperparameters.")
    print("The model should achieve close to 500 steps after proper training.")



Episode lasted 39 steps
Total reward: 39
Action distribution: LEFT=17, RIGHT=22

⚠️  Poor performance detected!
Make sure you've run the training cell (cell 10) with the updated hyperparameters.
The model should achieve close to 500 steps after proper training.





### Analysis: How Does DQN Work?

DQN learns a **Q-function** that estimates the expected return for every state-action pair. Let's visualize it.

In [None]:
# Analyze Q-values for different states
def analyze_q_values(model, env, n_samples=100):
    """
    Sample initial states via env.reset() and analyze their Q-values.

    Note: CartPole draws each component of the initial state uniformly from
    (-0.05, 0.05), so the sampled states only cover a narrow region near
    the upright position.

    Args:
        model: Trained DQN model
        env: Gymnasium environment
        n_samples: Number of states to sample

    Returns:
        states, q_values_left, q_values_right, chosen_actions
    """
    states = []
    q_values_left = []
    q_values_right = []
    chosen_actions = []

    for _ in range(n_samples):
        state, _ = env.reset()
        states.append(state)

        # Build the observation tensor on the same device as the Q-network
        obs = torch.as_tensor(state, dtype=torch.float32, device=model.device).unsqueeze(0)

        # Get Q-values for both actions
        with torch.no_grad():
            q_values = model.q_net(obs)
        q_values_left.append(q_values[0, 0].item())
        q_values_right.append(q_values[0, 1].item())
        chosen_actions.append(torch.argmax(q_values).item())

    return np.array(states), q_values_left, q_values_right, chosen_actions


# Analyze Q-values
states, q_left, q_right, actions = analyze_q_values(model, env, n_samples=200)

print("\n=== Q-Value Analysis ===")
print(f"Average Q-value for LEFT: {np.mean(q_left):.2f}")
print(f"Average Q-value for RIGHT: {np.mean(q_right):.2f}")
print(f"Q-value range: [{min(q_left + q_right):.2f}, {max(q_left + q_right):.2f}]")

# Q-values vs Cart Position
df_pos = pd.DataFrame(
    {
        "Cart Position": np.concatenate([states[:, 0], states[:, 0]]),
        "Q-value": q_left + q_right,
        "Action": ["LEFT"] * len(q_left) + ["RIGHT"] * len(q_right),
    }
)
px.scatter(
    df_pos,
    x="Cart Position",
    y="Q-value",
    color="Action",
    title="Q-values vs Cart Position",
    color_discrete_map={"LEFT": "blue", "RIGHT": "red"},
).show()

# Q-values vs Pole Angle
df_angle = pd.DataFrame(
    {
        "Pole Angle (radians)": np.concatenate([states[:, 2], states[:, 2]]),
        "Q-value": q_left + q_right,
        "Action": ["LEFT"] * len(q_left) + ["RIGHT"] * len(q_right),
    }
)
px.scatter(
    df_angle,
    x="Pole Angle (radians)",
    y="Q-value",
    color="Action",
    title="Q-values vs Pole Angle",
    color_discrete_map={"LEFT": "blue", "RIGHT": "red"},
).show()

# Action Preference vs Pole Angle
q_diff = np.array(q_right) - np.array(q_left)
df_diff = pd.DataFrame(
    {
        "Pole Angle (radians)": states[:, 2],
        "Q(RIGHT) - Q(LEFT)": q_diff,
        "Chosen Action": ["LEFT" if a == 0 else "RIGHT" for a in actions],
    }
)
px.scatter(
    df_diff,
    x="Pole Angle (radians)",
    y="Q(RIGHT) - Q(LEFT)",
    color="Chosen Action",
    title="Action Preference vs Pole Angle",
    color_discrete_map={"LEFT": "blue", "RIGHT": "red"},
).show()

# Action Distribution
action_counts = pd.DataFrame(
    {"Action": ["LEFT", "RIGHT"], "Frequency": [actions.count(0), actions.count(1)]}
)
px.bar(
    action_counts,
    x="Action",
    y="Frequency",
    title="Action Distribution in Sampled States",
    color="Action",
    color_discrete_map={"LEFT": "blue", "RIGHT": "red"},
).show()
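
The Q-values plotted above are learned by regressing toward a one-step bootstrap target. The sketch below shows that target for a single transition in plain Python; the function name `td_target`, the example Q-values, and `gamma=0.99` are illustrative assumptions, not values taken from the trained model.

```python
def td_target(reward, next_q_values, terminated, gamma=0.99):
    """One-step target: r + gamma * max_a' Q(s', a'), or just r at a terminal state."""
    if terminated:
        # No future return after a terminal state, so nothing to bootstrap from
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical Q-value estimates for the next state: [Q(s', LEFT), Q(s', RIGHT)]
next_q = [10.0, 12.0]

print(td_target(1.0, next_q, terminated=False))  # 1 + 0.99 * 12
print(td_target(1.0, next_q, terminated=True))   # reward only
```

In DQN the `next_q_values` come from a periodically updated target network rather than from the network being trained, which helps stabilize learning.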


## Comparison with Other RL Algorithms

Let's now also train **PPO (Proximal Policy Optimization)**, a policy-based method, to see how different RL algorithms perform on the same problem.

In [None]:
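# Import PPO and the evaluation helpers used below
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor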

# Create fresh environment
env_ppo = gym.make("CartPole-v1")

# Create PPO model
# PPO learns a policy π(a|s) directly, not Q-values
model_ppo = PPO(
    "MlpPolicy",
    env_ppo,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    verbose=0,
)

print("=== Training PPO ===")
print("Training for 50,000 timesteps...")

# Train PPO
model_ppo.learn(total_timesteps=50000, progress_bar=True)

print("\n✓ PPO Training completed!")
print("\nEvaluating PPO model...")

# Evaluate PPO (wrap env with Monitor to avoid warning)
eval_env_ppo = Monitor(gym.make("CartPole-v1"))
mean_reward_ppo, std_reward_ppo = evaluate_policy(
    model_ppo, eval_env_ppo, n_eval_episodes=100, deterministic=True
)
eval_env_ppo.close()

print(f"\n=== Algorithm Comparison ===")
print(f"Random Agent: {np.mean(random_rewards):.2f} ± {np.std(random_rewards):.2f}")
print(f"DQN Agent:    {mean_reward:.2f} ± {std_reward:.2f}")
print(f"PPO Agent:    {mean_reward_ppo:.2f} ± {std_reward_ppo:.2f}")

# Visualize comparison
df_comparison = pd.DataFrame(
    {
        "Algorithm": ["Random", "DQN", "PPO"],
        "Mean Reward": [np.mean(random_rewards), mean_reward, mean_reward_ppo],
        "Std": [np.std(random_rewards), std_reward, std_reward_ppo],
    }
)
fig = px.bar(
    df_comparison,
    x="Algorithm",
    y="Mean Reward",
    error_y="Std",
    title="Algorithm Performance Comparison on CartPole-v1",
    color="Algorithm",
    color_discrete_map={"Random": "gray", "DQN": "blue", "PPO": "green"},
    text=[f"{m:.1f}±{s:.1f}" for m, s in zip(df_comparison["Mean Reward"], df_comparison["Std"])],
)
fig.add_hline(y=500, line_dash="dash", line_color="red", annotation_text="Maximum possible (500)")
fig.update_layout(yaxis_range=[0, 550])
fig.show()
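
The `clip_range=0.2` argument in the PPO configuration above corresponds to the epsilon in PPO's clipped surrogate objective. The sketch below evaluates that objective for a single sample in plain Python; the function name `clipped_surrogate` and the input numbers are made up for illustration, and this is not how Stable-Baselines3 structures its own loss code.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """PPO clipped objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)


# ratio = pi_new(a|s) / pi_old(a|s); the values below are illustrative only
print(clipped_surrogate(1.5, advantage=2.0))   # capped at 1.2 * 2.0
print(clipped_surrogate(0.5, advantage=-1.0))  # capped at 0.8 * -1.0
print(clipped_surrogate(1.1, advantage=2.0))   # inside the clip range, unchanged
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is why PPO can reuse each rollout for several epochs (`n_epochs`).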

## A More Complex RL Environment: LunarLander

Let's now look at a harder problem: **LunarLander-v2**. Here a lunar lander has to touch down safely on a landing pad.

![LunarLander](https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/lunarLander.gif)

*Source: [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course)*

### The Problem
- **State**: 8 values (x/y position, x/y velocity, angle, angular velocity, and two leg-contact flags)
- **Actions**: 4 discrete actions (do nothing, fire left engine, fire main engine, fire right engine)
- **Rewards**:
  - +100 to +140 for a successful landing
  - -100 for a crash
  - Small negative rewards for fuel consumption
  - Positive rewards for getting closer to the landing zone
- **Goal**: Land safely with minimal fuel consumption
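
To make the fuel-related penalties above concrete, the sketch below totals the fuel cost of an action sequence. The per-frame costs (0.3 for the main engine, 0.03 for a side engine) are taken from the Gymnasium LunarLander documentation as recalled here and should be treated as assumptions; `fuel_penalty` is a hypothetical helper, not part of the environment's API.

```python
MAIN_ENGINE_COST = 0.3   # assumed per-frame penalty for firing the main engine
SIDE_ENGINE_COST = 0.03  # assumed per-frame penalty for firing a side engine


def fuel_penalty(actions):
    """Total (negative) reward contribution from engine use.

    Actions follow the notebook's encoding:
    0 = do nothing, 1 = left engine, 2 = main engine, 3 = right engine.
    """
    main_frames = sum(1 for a in actions if a == 2)
    side_frames = sum(1 for a in actions if a in (1, 3))
    return -(main_frames * MAIN_ENGINE_COST + side_frames * SIDE_ENGINE_COST)


# Two main-engine frames and two side-engine frames
print(f"{fuel_penalty([0, 2, 2, 1, 3]):.2f}")
```

Under these assumptions the main engine is ten times more expensive per frame than a side engine, so an efficient policy fires it sparingly.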


In [None]:
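# Create LunarLander environment
# Requires the Box2D extra (e.g. `pip install "gymnasium[box2d]"`).
# Note: Gymnasium 1.0+ registers this environment as "LunarLander-v3".
env_lunar = gym.make("LunarLander-v2")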

# Explore the environment
state, info = env_lunar.reset(seed=42)

print("=== LunarLander-v2 Environment ===")
print(f"State space: {env_lunar.observation_space}")
print(f"Action space: {env_lunar.action_space}")
print(f"\nInitial state shape: {state.shape}")
print(f"State: {state}")
print("\nState components:")
print("  [0] X position")
print("  [1] Y position")
print("  [2] X velocity")
print("  [3] Y velocity")
print("  [4] Angle")
print("  [5] Angular velocity")
print("  [6] Left leg contact (0=no, 1=yes)")
print("  [7] Right leg contact (0=no, 1=yes)")
print("\nActions:")
print("  0: Do nothing")
print("  1: Fire left engine")
print("  2: Fire main engine")
print("  3: Fire right engine")

# Test random agent on LunarLander
print("\n=== Testing Random Agent ===")
random_rewards_lunar = evaluate_random_agent(env_lunar, n_episodes=20, seed=42)
print(f"Random Agent: {np.mean(random_rewards_lunar):.2f} ± {np.std(random_rewards_lunar):.2f}")
print("(Note: Negative rewards mean crashes!)")


### Training PPO on LunarLander

For this more complex problem we use PPO. This version of LunarLander has a continuous observation space but a discrete action space, and PPO supports both discrete and continuous actions.

In [None]:
# Train PPO on LunarLander
print("=== Training PPO on LunarLander ===")
print("This will take a few minutes...")

model_lunar = PPO(
    "MlpPolicy",
    env_lunar,
    learning_rate=3e-4,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    clip_range=0.2,
    verbose=0,
)

# Train for longer since this is more complex
model_lunar.learn(total_timesteps=500000, progress_bar=True)

print("\n✓ Training completed!")

# Evaluate (wrap env with Monitor to avoid warning)
eval_env_lunar = Monitor(gym.make("LunarLander-v2"))
mean_reward_lunar, std_reward_lunar = evaluate_policy(
    model_lunar, eval_env_lunar, n_eval_episodes=50, deterministic=True
)
eval_env_lunar.close()

print(f"\n=== LunarLander Results ===")
print(f"Random Agent: {np.mean(random_rewards_lunar):.2f} ± {np.std(random_rewards_lunar):.2f}")
print(f"Trained PPO:  {mean_reward_lunar:.2f} ± {std_reward_lunar:.2f}")
print(f"\nNote: Score > 200 is considered solved!")
status = "SOLVED ✓" if mean_reward_lunar > 200 else "Needs more training"
print(f"Status: {status}")

# Visualize performance
df_lunar = pd.DataFrame(
    {
        "Algorithm": ["Random", "PPO"],
        "Mean Reward": [np.mean(random_rewards_lunar), mean_reward_lunar],
        "Std": [np.std(random_rewards_lunar), std_reward_lunar],
    }
)
fig = px.bar(
    df_lunar,
    x="Algorithm",
    y="Mean Reward",
    error_y="Std",
    title="LunarLander-v2 Performance",
    color="Algorithm",
    color_discrete_map={"Random": "gray", "PPO": "green"},
)
fig.add_hline(y=200, line_dash="dash", line_color="red", annotation_text="Solved threshold (200)")
fig.add_hline(y=0, line_color="black")
fig.show()
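
The "solved" check above looks only at the mean reward. With a large standard deviation, a stricter variant that subtracts one standard deviation may be more informative. The sketch below shows both; the `conservative` rule and the helper name `is_solved` are assumptions introduced here for illustration, not part of the notebook or of Gymnasium.

```python
def is_solved(mean_reward, std_reward, threshold=200.0, conservative=False):
    """Return True if the score exceeds the threshold.

    conservative=False: compare the mean reward (as in the notebook).
    conservative=True:  compare mean - std (assumed stricter variant).
    """
    score = mean_reward - std_reward if conservative else mean_reward
    return score > threshold


# Hypothetical evaluation result: good on average, but noisy
print(is_solved(250.0, 60.0))                     # mean alone passes
print(is_solved(250.0, 60.0, conservative=True))  # 190 does not pass
```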

### Visualizing the Trained LunarLander Agent

In [None]:
# Visualize trained LunarLander agent
env_lunar_render = gym.make("LunarLander-v2", render_mode="rgb_array")
# LunarLander episodes are truncated at 1000 steps, so allow the full episode length
frames_lunar, rewards_lunar, actions_lunar = visualize_agent(
    model_lunar, env_lunar_render, n_steps=1000
)
env_lunar_render.close()

print(f"\n=== Episode Analysis ===")
print(f"Episode length: {len(rewards_lunar)} steps")
print(f"Total reward: {sum(rewards_lunar):.1f}")
print(
    f"Final outcome: {'SUCCESS ✓' if sum(rewards_lunar) > 200 else 'CRASH' if sum(rewards_lunar) < 0 else 'PARTIAL'}"
)
print(f"\nAction usage:")
action_names = ["Do nothing", "Left engine", "Main engine", "Right engine"]
for action_id, action_name in enumerate(action_names):
    count = actions_lunar.count(action_id)
    percentage = (count / len(actions_lunar)) * 100
    print(f"  {action_name}: {count} times ({percentage:.1f}%)")

# Show action distribution
df_actions = pd.DataFrame(
    {"Action": action_names, "Count": [actions_lunar.count(i) for i in range(4)]}
)
px.bar(
    df_actions,
    x="Action",
    y="Count",
    title=f"LunarLander Action Distribution (Total Reward: {sum(rewards_lunar):.1f})",
).show()
