## Reinforcement Learning Project: Highway-Env

This project is based on the implementation by [Edouard Leurent](https://github.com/Farama-Foundation/HighwayEnv/blob/master/scripts/sb3_highway_dqn.ipynb).

### 1. Importing Libraries

In [1]:
# Full-module imports
import highway_env
import pathlib
# Aliased import
import gymnasium as gym
# Specific imports
from stable_baselines3 import DQN
from tqdm.notebook import trange

In [2]:
%load_ext tensorboard

### 2. Defining Constants and Paths

In [2]:
# Constants
BASE_MODEL_LIB = "stable_baselines3"
MODEL_TYPE = "DQN"
MODEL_VERSION = "v2"
MODEL_POLICY = "MlpPolicy"
ENVIRONMENT_NAME = "highway-v0"
RENDER_MODE = "rgb_array"

In [3]:
# Paths
CURRENT_FILE_DIR_PATH = pathlib.Path().parent.absolute()
REPOSITORY_PATH = CURRENT_FILE_DIR_PATH.parent.absolute()
LOG_DIR_PATH = REPOSITORY_PATH.joinpath("logs", BASE_MODEL_LIB, MODEL_TYPE)
VIDEOS_DIR_PATH = REPOSITORY_PATH.joinpath("videos", BASE_MODEL_LIB, MODEL_TYPE)
MODELS_DIR_PATH = REPOSITORY_PATH.joinpath("highway_models", MODEL_TYPE, BASE_MODEL_LIB, MODEL_VERSION)
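
The constants above are only path objects; nothing is created on disk at this point. A minimal sketch, assuming you want the directories to exist before training or recording starts. The helper `ensure_dirs` and the temporary demo root are illustrative and not part of the original notebook:

```python
import pathlib
import tempfile


def ensure_dirs(*dir_paths: pathlib.Path) -> None:
    """Create each directory (and any missing parents) if it does not exist yet."""
    for dir_path in dir_paths:
        dir_path.mkdir(parents=True, exist_ok=True)


# Demo under a temporary root so the sketch does not touch the real repository.
demo_root = pathlib.Path(tempfile.mkdtemp())
demo_logs = demo_root.joinpath("logs", "stable_baselines3", "DQN")
demo_videos = demo_root.joinpath("videos", "stable_baselines3", "DQN")
ensure_dirs(demo_logs, demo_videos)
print(demo_logs.is_dir(), demo_videos.is_dir())
```

In the notebook itself, the call would presumably be `ensure_dirs(LOG_DIR_PATH, VIDEOS_DIR_PATH, MODELS_DIR_PATH)`.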

### 3. Defining Functions

In [24]:
def train_dqn_model(
        model_policy: str,
        environment: gym.Env,
        tb_log_dir_path: pathlib.Path = LOG_DIR_PATH,
        model_dir_path: pathlib.Path = MODELS_DIR_PATH,
        verbose: int = 1,
        training_steps: int = 10000
) -> DQN:
    """
    Trains a DQN model with the given policy on the given environment and saves it.

    :param model_policy: The policy to use for the model (e.g. "MlpPolicy").
    :param environment: The environment instance on which the model is trained.
    :param tb_log_dir_path: The directory in which the TensorBoard logs are stored.
    :param model_dir_path: The directory in which the model is saved.
    :param verbose: The verbosity level of the output.
    :param training_steps: The number of training steps.

    :return: The trained model.
    """
    model = DQN(model_policy,
                environment,
                policy_kwargs=dict(net_arch=[256, 256]),
                learning_rate=5e-4,
                buffer_size=15000,
                learning_starts=200,
                batch_size=32,
                gamma=0.8,
                train_freq=1,
                gradient_steps=1,
                target_update_interval=50,
                exploration_fraction=0.7,
                verbose=verbose,
                tensorboard_log=str(tb_log_dir_path))
    # Train the model
    model.learn(training_steps)
    # Save the model; the file name encodes type, policy, version and step count
    model.save(model_dir_path.joinpath(f'model_{MODEL_TYPE}_{model_policy}_{MODEL_VERSION}_{training_steps}'))
    # Return the trained model
    return model
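
The `exploration_fraction=0.7` argument controls how quickly the agent moves from random to greedy actions. The sketch below reproduces the linear decay in plain Python. It assumes the Stable-Baselines3 default start and end values for epsilon (1.0 and 0.05), which this notebook does not override; `linear_epsilon` is an illustrative helper, not a library function.

```python
def linear_epsilon(step: int, total_steps: int,
                   exploration_fraction: float = 0.7,
                   initial_eps: float = 1.0,   # assumed library default
                   final_eps: float = 0.05) -> float:  # assumed library default
    """Linearly decay epsilon over the first `exploration_fraction` of training."""
    progress = step / total_steps
    if progress >= exploration_fraction:
        # After the exploration phase, epsilon stays at its final value.
        return final_eps
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction


for step in (0, 48, 70, 100):
    print(step, round(linear_epsilon(step, total_steps=100), 3))
```

For a 100-step run this gives about 0.349 after 48 steps, which is consistent with the `exploration_rate` reported in the 100-step example log in the training section.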

In [8]:
def record_env(env: gym.Env, dqn_model: DQN, video_output_folder: pathlib.Path, training: int = 10000) -> None:
    """
    Records videos of the model acting in the environment.

    :param env: Environment to record.
    :param dqn_model: Model that selects the actions.
    :param video_output_folder: Directory in which the videos are saved.
    :param training: Number of training steps of the model (used in the folder name).
    :return: None
    """
    # Create the video recorder; every episode is recorded
    video_folder = video_output_folder.joinpath(f'video_{MODEL_TYPE}_{MODEL_POLICY}_{MODEL_VERSION}_{training}')
    wrapped_env = gym.wrappers.RecordVideo(env, video_folder=str(video_folder), episode_trigger=lambda x: True)
    env.unwrapped.set_record_video_wrapper(wrapped_env)

    # Run three test episodes with the greedy (deterministic) policy
    for episode in trange(3, desc='Test episodes'):
        obs, info = wrapped_env.reset()
        done = truncated = False
        while not (done or truncated):
            action = dqn_model.predict(obs, deterministic=True)[0]
            obs, reward, done, truncated, info = wrapped_env.step(action)
    wrapped_env.close()

### 4. Defining the Environments

In [5]:
env_train = gym.make(ENVIRONMENT_NAME)
env_eval = gym.make(ENVIRONMENT_NAME, render_mode=RENDER_MODE)



### 5. TensorBoard Logging

In [47]:
%tensorboard --logdir "./logs/stable_baselines3/DQN"

Launching TensorBoard...

### 6. Training the Model

In [20]:
# Example training run with 100 steps
# model_100 = train_dqn_model(MODEL_POLICY, env_train, training_steps=100)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Logging to D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\logs\stable_baselines3\DQN\DQN_2
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 12       |
|    ep_rew_mean      | 8.8      |
|    exploration_rate | 0.349    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 2        |
|    time_elapsed     | 20       |
|    total_timesteps  | 48       |
----------------------------------


In [10]:
# Training-step budgets for the individual runs
train_step_list_v1 = [500, 1000, 2000, 4000, 8000, 16000]
train_step_list_v2 = [32000, 48000, 64000]

In [23]:
# Container for the trained models
model_list = []

In [25]:
# Automated training with the defined step budgets (v1)
for train_steps in train_step_list_v1:
    model = train_dqn_model(MODEL_POLICY, env_train, training_steps=train_steps)
    model_list.append(model)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Logging to D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\logs\stable_baselines3\DQN\DQN_9
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 7.25     |
|    ep_rew_mean      | 5.27     |
|    exploration_rate | 0.999    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 2        |
|    time_elapsed     | 11       |
|    total_timesteps  | 29       |
----------------------------------
----------------------------------
| rollout/            |          |
|    ep_len_mean      | 9.5      |
|    ep_rew_mean      | 7.03     |
|    exploration_rate | 0.997    |
| time/               |          |
|    episodes         | 8        |
|    fps              | 2        |
|    time_elapsed     | 32       |
|    total_timesteps  | 76       |
----------------------------------
----------------------------------
| rollout/ 

KeyboardInterrupt: 

In [None]:
# Automated training with the defined step budgets (v2)
for train_steps in train_step_list_v2:
    model = train_dqn_model(MODEL_POLICY, env_train, training_steps=train_steps)
    model_list.append(model)
    # Stopped at 64000 steps because training took too long and the results no longer improved significantly (duration: 8 h)
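
The training logs in this notebook report roughly 2 steps per second (`fps`) on CPU, which makes the cost of each step budget easy to estimate before starting a run. A rough sketch, assuming the throughput stays constant over a run; `estimate_hours` is an illustrative helper:

```python
def estimate_hours(training_steps: int, fps: float = 2.0) -> float:
    """Estimate wall-clock training time in hours at a constant step rate."""
    return training_steps / fps / 3600


for steps in (500, 16000, 32000, 64000):
    print(f"{steps:>6} steps: ~{estimate_hours(steps):.1f} h")
```

At this rate 64000 steps come to roughly 9 hours, in line with the duration of about 8 h noted for that run.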

### 7. Evaluating the Model
#### 7.1. Loading the Model

In [6]:
# Select the model by its number of training steps
training_steps: int = 32000
# Load the model
model_dqn_sb3 = DQN.load(MODELS_DIR_PATH.joinpath(f'model_{MODEL_TYPE}_{MODEL_POLICY}_{MODEL_VERSION}_{training_steps}'))

#### 7.2. Rendering the Model

In [None]:
# Render one episode with the loaded model
obs, info = env_eval.reset()
done = truncated = False
while not (done or truncated):
    action = model_dqn_sb3.predict(obs, deterministic=True)[0]
    obs, reward, done, truncated, info = env_eval.step(action)
    env_eval.render()
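
The loop above only renders a single episode. To compare checkpoints numerically, the same loop can accumulate the reward and the episode length. The sketch below assumes only the Gymnasium-style `reset`/`step` signatures used in this notebook; `CountdownEnv` and `always_zero` are toy stand-ins for `env_eval` and the model's `predict` call, so the numbers are illustrative only.

```python
from typing import Callable, Tuple


class CountdownEnv:
    """Toy stand-in for a Gymnasium environment: ends after `horizon` steps."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Tuple[int, dict]:
        self.t = 0
        return self.t, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.t += 1
        reward = 1.0 if action == 0 else 0.5
        terminated = False                   # this toy episode never "crashes"
        truncated = self.t >= self.horizon   # time limit reached
        return self.t, reward, terminated, truncated, {}


def run_episode(env, policy: Callable[[int], int]) -> Tuple[float, int]:
    """Run one episode and return (total reward, episode length)."""
    obs, info = env.reset()
    done = truncated = False
    total_reward, length = 0.0, 0
    while not (done or truncated):
        action = policy(obs)
        obs, reward, done, truncated, info = env.step(action)
        total_reward += reward
        length += 1
    return total_reward, length


def always_zero(obs: int) -> int:
    """Hypothetical policy that always picks action 0."""
    return 0


print(run_episode(CountdownEnv(horizon=5), always_zero))
```

With the real objects, `policy` would be something like `lambda obs: model_dqn_sb3.predict(obs, deterministic=True)[0]`, and averaging over several episodes gives a figure comparable to `ep_rew_mean` in the training logs.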

#### 7.3. Recording Videos of the Model
##### Manually recording a video of the model

In [9]:
# Record a video of the model
record_env(env_eval, model_dqn_sb3, VIDEOS_DIR_PATH, training=training_steps)


Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-0.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-0.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-0.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-1.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-1.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-1.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-2.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-2.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_32000\rl-video-episode-2.mp4


##### Automated video recording for all models

In [11]:
# Container for the loaded models (training steps -> model)
loaded_model_list = {}

In [12]:
# Automated loading of the models
for training_steps in train_step_list_v1:
    try:
        # Load the model
        model_dqn_sb3 = DQN.load(MODELS_DIR_PATH.joinpath(f'model_{MODEL_TYPE}_{MODEL_POLICY}_{MODEL_VERSION}_{training_steps}'))
        loaded_model_list[training_steps] = model_dqn_sb3
    except Exception:
        print(f"The model with {training_steps} training steps could not be loaded.")
loaded_model_list

{500: <stable_baselines3.dqn.dqn.DQN at 0x21c4a6faf70>,
 1000: <stable_baselines3.dqn.dqn.DQN at 0x21c4070e7f0>,
 2000: <stable_baselines3.dqn.dqn.DQN at 0x21c40985130>,
 4000: <stable_baselines3.dqn.dqn.DQN at 0x21c40985fa0>,
 8000: <stable_baselines3.dqn.dqn.DQN at 0x21c4a742580>,
 16000: <stable_baselines3.dqn.dqn.DQN at 0x21c4a758250>}
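
Instead of catching load errors, the available checkpoints can also be determined up front from the file names. A minimal sketch, assuming the models were saved as `.zip` archives (Stable-Baselines3 appends that suffix when the save path has none). `checkpoint_path` and `available_checkpoints` are illustrative helpers, and the demo uses empty placeholder files in a temporary directory rather than real models.

```python
import pathlib
import tempfile
from typing import Dict, Iterable


def checkpoint_path(models_dir: pathlib.Path, model_type: str, policy: str,
                    version: str, training_steps: int) -> pathlib.Path:
    """Build the checkpoint path following the naming scheme of this notebook."""
    return models_dir.joinpath(f"model_{model_type}_{policy}_{version}_{training_steps}.zip")


def available_checkpoints(models_dir: pathlib.Path, step_list: Iterable[int],
                          model_type: str = "DQN", policy: str = "MlpPolicy",
                          version: str = "v2") -> Dict[int, pathlib.Path]:
    """Map each training-step count to its checkpoint file, skipping missing ones."""
    found = {}
    for steps in step_list:
        path = checkpoint_path(models_dir, model_type, policy, version, steps)
        if path.is_file():
            found[steps] = path
    return found


# Demo with placeholder files (not real models).
demo_dir = pathlib.Path(tempfile.mkdtemp())
for steps in (500, 1000):
    checkpoint_path(demo_dir, "DQN", "MlpPolicy", "v2", steps).touch()

found = available_checkpoints(demo_dir, [500, 1000, 2000])
print(sorted(found))
```

Only the step counts whose files exist are returned, so a subsequent loading loop can iterate over `found.items()` without a `try`/`except`.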

In [13]:
# Automated video recording for each loaded model
for key, value in loaded_model_list.items():
    record_env(env_eval, value, VIDEOS_DIR_PATH, training=key)


Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-0.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-0.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-0.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-1.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-1.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-1.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-2.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-2.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_500\rl-video-episode-2.mp4



Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-0.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-0.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-0.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-1.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-1.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-1.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-2.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-2.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_1000\rl-video-episode-2.mp4



Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-0.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-0.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-0.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-1.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-1.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-1.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-2.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-2.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_2000\rl-video-episode-2.mp4



Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-0.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-0.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-0.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-1.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-1.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-1.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-2.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-2.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_4000\rl-video-episode-2.mp4



Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-0.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-0.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-0.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-1.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-1.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-1.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-2.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-2.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_8000\rl-video-episode-2.mp4



Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-0.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-0.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-0.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-1.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-1.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-1.mp4
Moviepy - Building video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-2.mp4.
Moviepy - Writing video D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-2.mp4





Moviepy - Done !
Moviepy - video ready D:\DHBW\JetBrains\Hand-on-Reinforced-Learning\videos\stable_baselines3\DQN\video_DQN_MlpPolicy_v2_16000\rl-video-episode-2.mp4
