# Final Project: [HighwayEnv](https://github.com/Farama-Foundation/HighwayEnv/tree/master)

Resources:
- **Highway-env** [👨‍💻Repo](https://github.com/Farama-Foundation/HighwayEnv/tree/master) | [📜Documentation](http://highway-env.farama.org/quickstart/)
- **OpenAI Gym**
- **Stable-Baselines3**: [👨‍💻Repo](https://github.com/DLR-RM/stable-baselines3) | [📜Documentation](https://stable-baselines3.readthedocs.io/en/master/)

### Your task: Solve the Highway
![](https://raw.githubusercontent.com/eleurent/highway-env/gh-media/docs/media/highway.gif?raw=true)
- Work in groups of two or three
- Use *at least* two different RL algorithms
  - Try to implement at least one 'by hand'

### Evaluation
*Based on the report (showing that you understood what you did), the performance, and the code (you built something that works).*

- **Produce a notebook**
  - The notebook must run in one go; I will not lose time trying to fix your env...
  - You may send a git repo with the weights so that I can run them locally.
- **Produce a 2-5 page report**
  - Describe your choices and explain the algorithms used.
  - Benchmark and compare them depending on their hyperparameters.

*Analysis could include exploration of hyperparameters, training figures, and explanations of how your algorithms work.*
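
One way to organise the hyperparameter benchmark is to enumerate a small grid and report each run as mean ± standard deviation. The sketch below only illustrates the bookkeeping: the helper names `expand_grid` and `summarize`, the grid values, and the reward numbers are hypothetical placeholders, not recommended settings or real results.

```python
import itertools
import statistics


def expand_grid(grid):
    """Return one dict per combination of the hyperparameter values in `grid`."""
    names = sorted(grid)
    combos = itertools.product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def summarize(rewards):
    """Return (mean, population standard deviation) of a list of episode rewards."""
    return statistics.fmean(rewards), statistics.pstdev(rewards)


# Hypothetical grid: these values are placeholders, not recommendations.
grid = {"learning_rate": [1e-4, 3e-4], "gamma": [0.9, 0.99]}
configs = expand_grid(grid)

for config in configs:
    print(config)

# Dummy rewards, only to show how `summarize` would be reported.
mean, std = summarize([10.0, 12.0, 14.0])
print(f"{mean:.2f} ± {std:.2f}")
```

Each dict in `configs` could then be passed as keyword arguments to the algorithm being trained, with the summary of its evaluation rewards stored next to it for the report.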

### Roadmap
- 📆 **12 Feb 2025**: Send me your group names and composition
- 📆 **25 March**: Send a report (5-10 pages) and a notebook / script



## Utilities
⚠️ *Do not modify anything here!*

But always read everything to know what is available.

### Imports

In [1]:
!pip install "gymnasium>=1.0.0a2"
!pip install "farama-notifications>=0.0.1"
!pip install "numpy>=1.21.0"
!pip install "pygame>=2.0.2"
!pip install "stable-baselines3[extra]"
!pip install highway_env
# Load TensorBoard if you want to use it
%load_ext tensorboard


### Utils

In [2]:
### VIDEO RECORDER
# Set up fake display; otherwise rendering will fail
import os
import base64
from pathlib import Path
from IPython import display as ipythondisplay
from tqdm import tqdm

os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

import gymnasium as gym  # used by record_video below
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

def record_video(env_id, model, video_length=500, prefix="", video_folder="videos/", fps = 10):
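    """
    Record a video of `model` acting in the environment `env_id`.

    :param env_id: (str) Gymnasium environment id
    :param model: (RL model) agent exposing a `predict` method
    :param video_length: (int) number of steps to record
    :param prefix: (str) prefix of the video file name
    :param video_folder: (str) folder where the video is saved
    :param fps: (int) frames per second of the rendered video
    """
    eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])
    eval_env.metadata["render_fps"] = fps
    # Start the video at step=0 and record `video_length` steps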
    eval_env = VecVideoRecorder(
        eval_env,
        video_folder=video_folder,
        record_video_trigger=lambda step: step == 0,
        video_length=video_length,
        name_prefix=prefix,
    )
    obs = eval_env.reset()
    for _ in tqdm(range(video_length)):
        action, _ = model.predict(obs)
        obs, _, _, _ = eval_env.step(action)

    # Close the video recorder
    eval_env.close()

def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only those starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay
                    loop controls style="height: 200px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))



In [3]:
# prompt: define an evaluation function computing mean reward and elapsed episode time on a few runs of vectorized environments
import numpy as np

def evaluate(model, num_episodes=30):
    """
    Evaluate a reinforcement learning agent on "highway-fast-v0".

    Args:
        model: The trained RL model.
        num_episodes: The number of episodes to run for evaluation.

    Returns:
        A tuple containing the mean reward and the mean episode length (in environment steps).
    """
    env_id = "highway-fast-v0"
    env = make_vec_env(env_id)
    episode_rewards = []
    episode_times = []
    print(f"evaluating Model on {num_episodes} episodes ...")
    for _ in tqdm(range(num_episodes)):
        obs = env.reset()
        done = False
        total_reward = 0
        start_time = 0  # Time is measured in environment steps, not wall-clock seconds
        current_time = 0

        while not done:
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            current_time += 1  # One environment step elapsed

        episode_rewards.append(total_reward)
        episode_times.append(current_time - start_time)

    mean_reward = np.mean(episode_rewards)
    mean_time = np.mean(episode_times)
    std_reward = np.std(episode_rewards)
    std_time = np.std(episode_times)
    print(f"\n{'-'*50}\nResults :\n\t- Mean Reward: {mean_reward:.3f} ± {std_reward:.2f} \n\t- Mean elapsed Time per episode: {mean_time:.3f} ± {std_time:.2f}\n{'-'*50}")
    return mean_reward, mean_time


## The Highway Environment

In [4]:
## IMPORTS
import gymnasium as gym
from stable_baselines3 import PPO, DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv
import highway_env  # noqa: F401

## Load and explore Environment
Let's first load an untrained model and see how it behaves in the environment.

In [5]:
env_id = "highway-fast-v0"
env = make_vec_env(env_id)
# Instantiate the model
model = PPO("MlpPolicy", env, verbose=1)

# Generate a video of the untrained model
record_video(env_id, model, video_length=50, prefix="random-agent", fps = 5)
show_videos("videos", prefix="random-agent")

Using cpu device



Saving video to /Users/anas/Desktop/Renforcement_Learning/Projet/videos/random-agent-step-0-to-step-50.mp4
MoviePy - Building video /Users/anas/Desktop/Renforcement_Learning/Projet/videos/random-agent-step-0-to-step-50.mp4.
MoviePy - Writing video /Users/anas/Desktop/Renforcement_Learning/Projet/videos/random-agent-step-0-to-step-50.mp4



100%|██████████| 50/50 [00:04<00:00, 11.16it/s]

MoviePy - Done !
MoviePy - video ready /Users/anas/Desktop/Renforcement_Learning/Projet/videos/random-agent-step-0-to-step-50.mp4





In [6]:
evaluate(model)

evaluating Model on 30 episodes ...


100%|██████████| 30/30 [00:33<00:00,  1.12s/it]


--------------------------------------------------
Results :
	- Mean Reward: 17.967 ± 5.11 
	- Mean elapsed Time per episode: 24.500 ± 7.41
--------------------------------------------------





(17.967487, 24.5)

Let's now explore the environment's settings:
### Action Space
👉 Look at the action space: which actions can the agent take?

In [7]:
print("Action space:", env.action_space)

# If the action space is discrete, print the number of possible actions
if hasattr(env.action_space, 'n'):
    print("This is a discrete action space with {} possible actions.".format(env.action_space.n))
    print("Actions available:", list(range(env.action_space.n)))


Action space: Discrete(5)
This is a discrete action space with 5 possible actions.
Actions available: [0, 1, 2, 3, 4]
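
To make these five indices readable, the sketch below maps them to names. The mapping is copied by hand from highway-env's documented discrete meta-actions and assumes the environment keeps its default action type; `ACTION_NAMES` and `describe_action` are hypothetical names introduced for illustration.

```python
# Index-to-name mapping of highway-env's discrete meta-actions, copied by hand.
# Assumption: the environment uses its default action configuration.
ACTION_NAMES = {
    0: "LANE_LEFT",
    1: "IDLE",
    2: "LANE_RIGHT",
    3: "FASTER",
    4: "SLOWER",
}


def describe_action(action):
    """Return a readable name for a discrete action index."""
    return ACTION_NAMES[int(action)]


for index in sorted(ACTION_NAMES):
    print(index, describe_action(index))
```

Such a helper can be handy when inspecting what a trained policy does step by step, for example by printing `describe_action(action[0])` for the vectorized environment's single action.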


### Observation Space
👉 Look at the [documentation](http://highway-env.farama.org/observations/) for the possible observations of the agent on the highway

👉 Look at the observation space in our case

In [8]:
print("Observation space:", env.observation_space)

# If it is a Box, we can also inspect its shape and bounds
if isinstance(env.observation_space, gym.spaces.Box):
    print("The observation space is a Box of shape:", env.observation_space.shape)
    print("Lower bounds:", env.observation_space.low)
    print("Upper bounds:", env.observation_space.high)


Observation space: Box(-inf, inf, (5, 5), float32)
The observation space is a Box of shape: (5, 5)
Lower bounds: [[-inf -inf -inf -inf -inf]
 [-inf -inf -inf -inf -inf]
 [-inf -inf -inf -inf -inf]
 [-inf -inf -inf -inf -inf]
 [-inf -inf -inf -inf -inf]]
Upper bounds: [[inf inf inf inf inf]
 [inf inf inf inf inf]
 [inf inf inf inf inf]
 [inf inf inf inf inf]
 [inf inf inf inf inf]]
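
The (5, 5) shape can be read as one row per observed vehicle and one column per feature. The sketch below assumes the default Kinematics layout (row 0 is the ego vehicle, columns are `presence, x, y, vx, vy`, and all-zero rows pad missing vehicles). `rows_as_dicts` is a hypothetical helper, and the example values are made up rather than taken from the environment.

```python
# Assumed default feature order of the Kinematics observation.
FEATURES = ["presence", "x", "y", "vx", "vy"]


def rows_as_dicts(observation):
    """Convert a (vehicles, features) array-like into one dict per present vehicle."""
    vehicles = []
    for row in observation:
        vehicle = dict(zip(FEATURES, (float(value) for value in row)))
        if vehicle["presence"] > 0:  # all-zero rows are padding for absent vehicles
            vehicles.append(vehicle)
    return vehicles


# Hand-written placeholder observation (made-up values, not real environment output).
example_obs = [
    [1.0, 0.90, 0.50, 0.30, 0.0],    # row 0: the ego vehicle
    [1.0, 0.10, -0.25, -0.05, 0.0],  # another vehicle
    [0.0, 0.0, 0.0, 0.0, 0.0],       # padding
    [0.0, 0.0, 0.0, 0.0, 0.0],       # padding
    [0.0, 0.0, 0.0, 0.0, 0.0],       # padding
]

for vehicle in rows_as_dicts(example_obs):
    print(vehicle)
```

With the vectorized `env` used in this notebook, `env.reset()` returns a batch of observations, so the first environment's observation would be `obs[0]`.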


# Training an Agent on the Environment
👉 **Now it is your turn**: train your agents.

Recall:
- You must try and compare different RL algorithms.
- Part of your grade will be the evaluation of your best agent.

🔥 Tips
- Use TensorBoard to monitor your training runs.
- Install everything locally to get faster and longer training runs (not mandatory; Colab should be OK).
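
For the algorithm implemented 'by hand', a value-based method in the style of DQN is one possible choice. The sketch below shows three of its usual building blocks in plain Python: epsilon-greedy action selection, the one-step TD target, and a replay buffer. The Q-network itself is left out, Q-values are plain lists, and all names (`epsilon_greedy`, `td_target`, `ReplayBuffer`) are hypothetical, so treat it as a starting point under these assumptions rather than a complete agent.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or r if terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        """Add a transition, discarding the oldest one when the buffer is full."""
        self.memory.append(transition)

    def sample(self, batch_size, rng=random):
        """Return `batch_size` transitions drawn uniformly without replacement."""
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

In a full agent, the Q-values would come from a neural network fed with the flattened observation, and the network would be trained to regress its prediction for the chosen action towards `td_target` on batches sampled from the buffer.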

In [9]:
######### BUILDING AND TRAINING HERE #########

In [18]:
import os
import gymnasium as gym
import highway_env  # Make sure highway-env is installed
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# --- 1. Function that instantiates the custom environment ---
env_id = "highway-fast-v0"

def make_custom_env():
    # Create the environment in "rgb_array" mode for video rendering
    env = gym.make(env_id, render_mode="rgb_array")
    # Modify the configuration through the config attribute of the unwrapped environment
    config = env.unwrapped.config
    config["observation"] = {
         "type": "Kinematics",
         "vehicles_count": 5,  # number of vehicles visible to the agent
         "features": ["presence", "x", "y", "vx", "vy"],
         "features_range": {"x": [0, 100], "y": [-10, 10], "vx": [0, 30], "vy": [-5, 5]}
    }
    config["collision_reward"] = -10      # strong penalty for collisions
    config["high_speed_reward"] = 1.0     # incentive to keep a high speed (highway-env's key is "high_speed_reward")
    config["lane_change_reward"] = -0.1   # slight penalty to discourage frequent lane changes
    config["duration"] = 40               # episode duration
    return env

# --- 2. Training the agent with PPO in the custom environment ---
# Use a vectorized environment via DummyVecEnv
env = DummyVecEnv([make_custom_env])
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="./tensorboard/ppo/")
total_timesteps = 200000  # Adjust to your resources
model.learn(total_timesteps=total_timesteps)
os.makedirs("models", exist_ok=True)
model.save("models/ppo_collision_avoidance")
print("Training finished and model saved.")


Using cpu device
Logging to ./tensorboard/ppo/PPO_1
-----------------------------
| time/              |      |
|    fps             | 25   |
|    iterations      | 1    |
|    time_elapsed    | 80   |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 24         |
|    iterations           | 2          |
|    time_elapsed         | 168        |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.01525452 |
|    clip_fraction        | 0.248      |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.6       |
|    explained_variance   | -0.0063    |
|    learning_rate        | 0.0003     |
|    loss                 | 8.33       |
|    n_updates            | 10         |
|    policy_gradient_loss | -0.0282    |
|    value_loss           | 18.9       |
----------------------------------------
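
A possible alternative to editing `env.unwrapped.config` after creation is to pass the configuration when the environment is built, which highway-env supports through the `config` keyword of `gym.make`. The sketch below is a variant of the training cell under stated assumptions: `build_config` and `make_configured_env` are hypothetical helper names, the values mirror those used above, and `high_speed_reward` is assumed to be the key for the speed term.

```python
def build_config():
    """Return a highway-env configuration dict with the values used in this notebook."""
    return {
        "observation": {
            "type": "Kinematics",
            "vehicles_count": 5,
            "features": ["presence", "x", "y", "vx", "vy"],
            "features_range": {
                "x": [0, 100],
                "y": [-10, 10],
                "vx": [0, 30],
                "vy": [-5, 5],
            },
        },
        "collision_reward": -10,     # strong penalty for collisions
        "high_speed_reward": 1.0,    # assumed key for the speed incentive
        "lane_change_reward": -0.1,  # slight penalty for lane changes
        "duration": 40,              # episode duration
    }


def make_configured_env(env_id="highway-fast-v0"):
    """Create the environment with the configuration applied at construction time."""
    # Imported lazily so build_config() can be inspected without these packages.
    import gymnasium as gym
    import highway_env  # noqa: F401  (registers the highway environments)

    return gym.make(env_id, render_mode="rgb_array", config=build_config())
```

`make_configured_env` takes no required argument, so it could be passed to `DummyVecEnv([make_configured_env])` in the same way as the environment factory used for training.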


In [19]:
# --- 3. Recording and displaying the video ---
# This reuses the record_video and show_videos functions defined earlier in the notebook.
# Note: record_video builds a fresh environment from env_id, so the video uses the default configuration.
record_video(env_id, model, video_length=200, prefix="ppo_collision_avoidance", fps=10)
show_videos("videos", prefix="ppo_collision_avoidance")

 99%|█████████▉| 198/200 [00:11<00:00, 16.73it/s]

Saving video to /Users/anas/Desktop/Renforcement_Learning/Projet/videos/ppo_collision_avoidance-step-0-to-step-200.mp4
MoviePy - Building video /Users/anas/Desktop/Renforcement_Learning/Projet/videos/ppo_collision_avoidance-step-0-to-step-200.mp4.
MoviePy - Writing video /Users/anas/Desktop/Renforcement_Learning/Projet/videos/ppo_collision_avoidance-step-0-to-step-200.mp4



100%|██████████| 200/200 [00:12<00:00, 16.29it/s]


MoviePy - Done !
MoviePy - video ready /Users/anas/Desktop/Renforcement_Learning/Projet/videos/ppo_collision_avoidance-step-0-to-step-200.mp4


In [None]:
######### SOME OTHER FANCY TRAINING HERE #########

In [None]:
### SAVE YOUR FINAL MODEL
model_final = .... #YOUR MODEL
model_final.save("highway_final")

# Evaluation
⚠️ *Do not modify anything here!*

Now that your agents are trained, we evaluate them.

In [None]:
evaluate(model_final)

In [None]:
env_id = "highway-v0"
# Generate video of trained model
record_video(env_id, model_final, video_length=70, prefix="trained-agent", fps = 5)
show_videos("videos", prefix="trained-agent")

# 🎁 Bonus
If this was too easy for you, you can also try to train an agent on an even more difficult environment, for instance `racetrack` *(see the highway-env repo for other possible environments)*.

---
![](https://raw.githubusercontent.com/eleurent/highway-env/gh-media/docs/media/racetrack-env.gif?raw=true)
