# Visual evaluation

### Imports

## IMPORTANT: This notebook was run in Colab and might not run on a local machine in this configuration!

DQN models can be downloaded at: https://drive.google.com/drive/folders/1lGXklN8rJmDaZjx3BCEQuaX0mqTSGc2y?usp=sharing
SAC models can be downloaded at: https://drive.google.com/drive/folders/1NfbXXmA2UX4tO_jsM420l2cu1Qn3wCRH?usp=sharing 

In [1]:
# Install environment and agent
!pip install highway-env
# TODO: we use the bleeding edge version because the current stable version does not support the latest gym>=0.21 versions. Revert back to stable at the next SB3 release.
#!pip install git+https://github.com/DLR-RM/stable-baselines3
!pip install stable_baselines3[extra]

# Environment
import gymnasium as gym
import highway_env

# Agent
from stable_baselines3 import DQN, SAC

# Visualization utils
import sys
from tqdm.notebook import trange
!pip install tensorboardx gym pyvirtualdisplay
!apt-get install -y xvfb ffmpeg
!git clone https://github.com/Farama-Foundation/HighwayEnv.git 2> /dev/null
sys.path.insert(0, '/content/HighwayEnv/scripts/')
from utils import record_videos, show_videos
!pip install moviepy

Collecting highway-env
  Downloading highway_env-1.8.2-py3-none-any.whl (104 kB)
Collecting gymnasium>=0.27 (from highway-env)
  Downloading gymnasium-0.29.1-py3-none-any.whl (953 kB)
Collecting farama-notifications>=0.0.1 (from gymnasium>=0.27->highway-env)
  Downloading Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)
Installing collected packages: farama-notifications, gymnasium, highway-env
Successfully installed farama-notifications-0.0.4 gymnasium-0.29.1 highway-env-1.8.2
Collecting stable_baselines3[extra]
  Downloading stable_baselines3-2.3.2-py3-none-any.whl (182 kB)
Collecting shimmy[atari]~=1.3.0 (from st

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Loading and evaluating progress of trained models
This notebook compares our trained models visually. We want to compare their final performance and to analyze how performance develops over training steps, so we trained the same model configuration for both DQN and SAC with different numbers of total timesteps. For DQN, the progress tracking is only a side effect: because the exploration_rate is relative to the total number of steps (as mentioned in the report), the training curves differ between runs with different totals, which yields interesting observations about performance.
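
To make the effect of a relative exploration rate concrete, here is a minimal sketch of a linear epsilon schedule whose decay length is a fraction of the total training steps. The values `exploration_fraction=0.1`, `initial_eps=1.0` and `final_eps=0.05` are the Stable-Baselines3 DQN defaults and are only an assumption here; the settings used in the report may differ. `epsilon_at` is a hypothetical helper, not a library function.

```python
def epsilon_at(step, total_timesteps, exploration_fraction=0.1,
               initial_eps=1.0, final_eps=0.05):
    """Linearly decay epsilon over the first `exploration_fraction` of training."""
    decay_steps = exploration_fraction * total_timesteps
    progress = min(step / decay_steps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


# Exploration rate after 1 000 environment steps for each training budget
for total in (10_000, 20_000, 40_000, 100_000, 200_000):
    print(f"{total:>7} total steps: epsilon = {epsilon_at(1_000, total):.3f}")
```

Under these assumed values, the 10 000-step run has already reached its final epsilon after 1 000 steps, while the 200 000-step run is still exploring almost fully at the same point. This is why runs with different totals are not simply prefixes of one another.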

### DQN

In [4]:
config = {
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # weights for RGB conversion
        "scaling": 1.75,
    }
}
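
As a small illustration of what the `weights` entry does, the sketch below collapses one RGB pixel into a single gray value with a weighted sum. `to_gray` is a hypothetical helper written for this example; it is not the environment's own implementation.

```python
WEIGHTS = [0.2989, 0.5870, 0.1140]  # same values as config["observation"]["weights"]


def to_gray(rgb_pixel, weights=WEIGHTS):
    """Collapse one (R, G, B) pixel into a single luminance value."""
    return sum(w * c for w, c in zip(weights, rgb_pixel))


print(to_gray((255, 255, 255)))  # white stays close to 255 because the weights sum to about 1
print(to_gray((0, 255, 0)))      # green carries the largest weight
print(to_gray((0, 0, 255)))      # blue carries the smallest weight
```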

#### DQN 10 000

Loading model

In [5]:
model_path = "/content/drive/MyDrive/DQN_models/dqn_final_10000_steps_RL.zip"
model = DQN.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>
Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>
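
The `Exception: Can't get attribute '_function_setstate'` lines are printed while loading, but the model still loads and the following cells run. They likely come from a cloudpickle version mismatch between the training and evaluation environments when the saved schedule functions are deserialized (an assumption, we have not verified the cause). A minimal sketch of a possible workaround, assuming the models are only used for inference, is to pass replacement objects through the `custom_objects` argument of `load`. `load_for_inference` is a hypothetical helper, and the placeholder schedules are not the ones used in training.

```python
def load_for_inference(algo_cls, model_path):
    """Load a saved SB3 model while skipping the pickled schedule functions.

    `algo_cls` is expected to be an SB3 algorithm class such as DQN or SAC.
    The constant schedules below are placeholders (an assumption): they are
    meant for model.predict() only, not for resuming training.
    """
    custom_objects = {
        "lr_schedule": lambda _progress: 0.0,
        "exploration_schedule": lambda _progress: 0.0,  # only stored by DQN
    }
    return algo_cls.load(model_path, custom_objects=custom_objects)


# Usage in this notebook (not executed here):
# model = load_for_inference(DQN, model_path)
```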


Creating env and visualizations

In [6]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4
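
The evaluation cells in this notebook all repeat the same two-seed rollout. As a sketch, the loop could be factored into a helper that also returns the episode return and length, which gives a rough number to go with each video. `run_episode` is a hypothetical name; it only assumes the Gymnasium `reset`/`step` API and the `model.predict` call already used above.

```python
def run_episode(env, model, seed):
    """Roll out one deterministic episode and return (total_reward, n_steps)."""
    obs, info = env.reset(seed=seed)
    done = truncated = False
    total_reward, n_steps = 0.0, 0
    while not (done or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, truncated, info = env.step(action)
        total_reward += float(reward)
        n_steps += 1
    return total_reward, n_steps


# Usage with the env and model defined in the cells above (not executed here):
# for seed in (200, 22):
#     print(seed, run_episode(env, model, seed))
```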


#### DQN 20 000

In [7]:
model_path = "/content/drive/MyDrive/DQN_models/dqn_final_20000_steps_RL.zip"
model = DQN.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>
Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [8]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4


#### DQN 40 000

In [9]:
model_path = "/content/drive/MyDrive/DQN_models/dqn_final_40000_steps_RL.zip"
model = DQN.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>
Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [10]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4


#### DQN 100 000

In [11]:
model_path = "/content/drive/MyDrive/DQN_models/dqn_final_100000_steps_RL.zip"
model = DQN.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>
Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [12]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4


#### DQN 200 000

In [13]:
model_path = "/content/drive/MyDrive/DQN_models/dqn_final_200000_steps_RL.zip"
model = DQN.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>
Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [14]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4




### SAC
Note that although SAC was trained on an environment configuration with borders around the road that end the episode when the agent hits them, the model is still evaluated on the normal open environment. The borders are a training-only measure intended to teach the agent to stay on the road.

In [15]:
config = {
    "action": {"type": 'ContinuousAction'},
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # weights for RGB conversion
        "scaling": 1.75,
    }
}



#### SAC 10 000

Loading model

In [16]:
model_path = "/content/drive/MyDrive/SAC_models/sac_final_10000_steps.zip"
model = SAC.load(model_path)

Creating env and visualizations

In [17]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4




#### SAC 20 000

In [18]:
model_path = "/content/drive/MyDrive/SAC_models/sac_final_20000_steps.zip"
model = SAC.load(model_path)

In [19]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4




#### SAC 40 000

In [20]:
model_path = "/content/drive/MyDrive/SAC_models/sac_final_40000_steps.zip"
model = SAC.load(model_path)



In [21]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4


#### SAC 100 000

In [22]:
model_path = "/content/drive/MyDrive/SAC_models/sac_final_100000.0_steps.zip"
model = SAC.load(model_path)



In [23]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4


### SAC 100 000 runs with changes to the reward function
The default high speed reward is 0.4 and the default collision reward is -1.

#### Configuration: env.config["high_speed_reward"] = 0.8 & env.config["collision_reward"] = -2
The hope was to motivate the agent to drive at higher speeds while still preventing it from crashing too often by focusing too much on the high speed reward.
However, the agent still behaved much as before, potentially because the collision penalty was increased too strongly here.
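
To see how the two settings interact, here is a simplified sketch of how, to our understanding, highway-env combines the reward terms: a weighted sum of a crash indicator, a right-lane term and a clipped speed term, rescaled to [0, 1]. The function `sketch_reward`, the values `right_lane_reward=0.1` and `reward_speed_range=(20, 30)`, and the normalization step are assumptions based on the library defaults rather than values taken from our training code, and the on-road factor is omitted.

```python
def lmap(value, src, dst):
    """Linearly map `value` from the interval `src` to the interval `dst`."""
    return dst[0] + (value - src[0]) * (dst[1] - dst[0]) / (src[1] - src[0])


def sketch_reward(speed, crashed, high_speed_reward=0.4, collision_reward=-1.0,
                  right_lane_reward=0.1, lane_fraction=0.0,
                  reward_speed_range=(20, 30)):
    """Simplified, assumed version of the normalized highway reward."""
    scaled_speed = min(max(lmap(speed, reward_speed_range, (0, 1)), 0.0), 1.0)
    raw = (collision_reward * float(crashed)
           + right_lane_reward * lane_fraction
           + high_speed_reward * scaled_speed)
    # Rescale so that the worst case maps to 0 and the best case to 1
    return lmap(raw, (collision_reward, high_speed_reward + right_lane_reward), (0, 1))


for hs, col in [(0.4, -1.0), (0.8, -2.0), (2.0, -1.0)]:
    slow = sketch_reward(20, False, high_speed_reward=hs, collision_reward=col)
    fast = sketch_reward(30, False, high_speed_reward=hs, collision_reward=col)
    print(f"high_speed={hs}, collision={col}: slow={slow:.2f}, fast={fast:.2f}")
```

Under these assumptions, doubling both weights leaves the gap between slow and fast driving nearly unchanged after normalization (roughly 0.27 versus 0.28), which would be consistent with the similar behaviour we observed. Raising only the high speed reward to 2 widens the gap considerably (roughly 0.65).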

In [24]:
model_path = "/content/drive/MyDrive/SAC_models/sac_new_rew_100000.0_steps.zip"
model = SAC.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [25]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4


#### Configuration: env.config["high_speed_reward"] = 2 & env.config["collision_reward"] = -1
The goal was to test what drastically increasing the high speed reward would do. It turns out that the agent now makes random turns at some points or crashes very early. Overall, this behaviour is far worse than before.

In [26]:
model_path = "/content/drive/MyDrive/SAC_models/sac_speed_100000.0_steps.zip"
model = SAC.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [27]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4




#### Changes to default: env.config["high_speed_reward"] = 1 & env.config["collision_reward"] = 1
The goal here was to find a suitable balance between the rewards. However, the visualizations clearly show that there is no balance at all.
These trials demonstrate how hard it is to find a suitable reward function in a continuous action space, especially with limited resources.

In [28]:
model_path = "/content/drive/MyDrive/SAC_models/sac_bal_speed_100000.0_steps.zip"
model = SAC.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [29]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4


#### Changes to default: env.config["high_speed_reward"] = 1 & env.config["collision_reward"] = 1 & env.config["reward_speed_range"] = [10, 30]
The last trial adapted the lower bound at which the agent starts to receive the high speed reward. However, this did not motivate it to drive faster either: it seemed content to collect slightly more reward for driving at essentially the same speed as with the standard reward function. The incentive to drive a bit faster from time to time did not work out.
It also no longer holds the road as well, potentially because during training it now sometimes reached speeds that gave additional reward before crashing into the border.
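
The effect of lowering the bound can be sketched with the scaled-speed term alone. This assumes, in line with the library defaults as we understand them, that the speed is mapped linearly from `reward_speed_range` to [0, 1] and clipped, and that the range before the change was `[20, 30]`. `scaled_speed` is a hypothetical helper written for this example.

```python
def scaled_speed(speed, reward_speed_range):
    """Map a speed in m/s linearly onto [0, 1] over the given range, with clipping."""
    low, high = reward_speed_range
    return min(max((speed - low) / (high - low), 0.0), 1.0)


for speed in (15, 20, 25, 30):
    old = scaled_speed(speed, (20, 30))  # assumed default range
    new = scaled_speed(speed, (10, 30))  # range used in this trial
    print(f"{speed} m/s: old={old:.2f}, new={new:.2f}")
```

With the wider range, a speed of 20 m/s already earns half of the high speed reward, so the agent is rewarded more without having to accelerate.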

In [30]:
model_path = "/content/drive/MyDrive/SAC_models/sac_range_100000.0_steps.zip"
model = SAC.load(model_path)

Exception: Can't get attribute '_function_setstate' on <module 'cloudpickle.cloudpickle' from '/usr/local/lib/python3.10/dist-packages/cloudpickle/cloudpickle.py'>


In [31]:
# Initializing the environment
env = gym.make('highway-fast-v0', render_mode='rgb_array')
env.configure(config)
env.config["show_trajectories"] = True
env = record_videos(env)

# Running the first test
(obs, info), done, truncated = env.reset(seed=200), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

# Running the second test on a different seed
(obs, info), done, truncated = env.reset(seed=22), False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)

env.close()
show_videos()

Moviepy - Building video /content/videos/rl-video-episode-0.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-0.mp4
Moviepy - Building video /content/videos/rl-video-episode-1.mp4.
Moviepy - Writing video /content/videos/rl-video-episode-1.mp4
Moviepy - Done !
Moviepy - video ready /content/videos/rl-video-episode-1.mp4
