# Final Project: [HighwayEnv](https://github.com/Farama-Foundation/HighwayEnv/tree/master)

Resources:
- **Highway-env** [👨‍💻Repo](https://github.com/Farama-Foundation/HighwayEnv/tree/master) | [📜Documentation](http://highway-env.farama.org/quickstart/)
- **OpenAI Gym**
- **Stable-Baselines3**: [👨‍💻Repo](https://github.com/DLR-RM/stable-baselines3) | [📜Documentation](https://stable-baselines3.readthedocs.io/en/master/)

### Your task: Solve the Highway
![](https://raw.githubusercontent.com/eleurent/highway-env/gh-media/docs/media/highway.gif?raw=true)
- Work in groups of two
- Implement at least two different RL Algorithms
- Produce a notebook and a report

*Graded on the report, the performance, and the code.*

#### Goals
- Describe your choices and explain the algorithms used.
- Benchmark and compare them depending on their hyperparameters.
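
One way to organize such a benchmark is to enumerate every combination of algorithm and hyperparameter values up front, then train and evaluate each combination in turn. The sketch below only builds the list of runs; the algorithm names, hyperparameters, values, and the `expand_grid` helper are illustrative assumptions, not a required setup.

```python
from itertools import product

# Hypothetical search space: names and values are placeholders to adapt.
search_space = {
    "algo": ["DQN", "PPO"],
    "learning_rate": [1e-4, 5e-4],
    "gamma": [0.8, 0.99],
}


def expand_grid(space):
    """Return one dict per combination of the values in `space`."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


runs = expand_grid(search_space)
for run in runs:
    # A readable run name makes results easier to tell apart later.
    run["name"] = "{algo}_lr{learning_rate}_g{gamma}".format(**run)

print(len(runs))  # 2 * 2 * 2 combinations
```

Each entry in `runs` can then be used to configure one training run, which keeps the comparison systematic and reproducible.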

### Roadmap
- 📆DATE: Send me your group names
- 📆DATE: Send a report (5-10 pages) and a notebook or script
- 📆DATE: Final presentation to the class



## Utilities
⚠️ *Do not modify anything here!*

But do read everything, so that you know what is available.

### Imports

In [None]:
# Quote version specifiers and extras so the shell does not treat ">" as a redirection
!pip install "gymnasium>=1.0.0a2"
!pip install "farama-notifications>=0.0.1"
!pip install "numpy>=1.21.0"
!pip install "pygame>=2.0.2"
!pip install "stable-baselines3[extra]"
!pip install highway_env
# Load TensorBoard if you want to use it
%load_ext tensorboard

### Utils

In [None]:
### VIDEO RECORDER
# Set up fake display; otherwise rendering will fail
import os
import base64
from pathlib import Path
from IPython import display as ipythondisplay
from tqdm import tqdm

os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

def record_video(env_id, model, video_length=500, prefix="", video_folder="videos/", fps = 10):
    """
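    Record a video of `model` acting in the environment `env_id`.

    :param env_id: (str) ID of the environment to record
    :param model: (RL model) Agent used to choose the actions
    :param video_length: (int) Number of steps to record
    :param prefix: (str) Prefix of the video file name
    :param video_folder: (str) Folder where the video is saved
    :param fps: (int) Frames per second of the rendered video
    """
    eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])
    eval_env.metadata["render_fps"] = fps
    # Start the video at step=0 and record `video_length` steps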
    eval_env = VecVideoRecorder(
        eval_env,
        video_folder=video_folder,
        record_video_trigger=lambda step: step == 0,
        video_length=video_length,
        name_prefix=prefix,
    )
    obs = eval_env.reset()
    for _ in tqdm(range(video_length)):
        action, _ = model.predict(obs)
        obs, _, _, _ = eval_env.step(action)

    # Close the video recorder
    eval_env.close()

def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay
                    loop controls style="height: 200px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [None]:

import numpy as np

def evaluate(model, num_episodes=30):
    """
    Evaluates a reinforcement learning agent on "highway-fast-v0".

    Args:
        model: The trained RL model.
        num_episodes: The number of episodes to run for evaluation.

    Returns:
        A tuple containing the mean reward and the mean episode length (in steps).
    """
    env_id = "highway-fast-v0"
    env = make_vec_env(env_id)
    episode_rewards = []
    episode_times = []
    print(f"Evaluating model on {num_episodes} episodes ...")
    for _ in tqdm(range(num_episodes)):
        obs = env.reset()
        done = False
        total_reward = 0
        start_time = 0  # Elapsed time is counted in environment steps
        current_time = 0

        while not done:
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            current_time += 1  # One more environment step has elapsed

        episode_rewards.append(total_reward)
        episode_times.append(current_time - start_time)

    mean_reward = np.mean(episode_rewards)
    mean_time = np.mean(episode_times)
    std_reward = np.std(episode_rewards)
    std_time = np.std(episode_times)
    print(f"\n{'-'*50}\nResults :\n\t- Mean Reward: {mean_reward:.3f} ± {std_reward:.2f} \n\t- Mean elapsed Time per episode: {mean_time:.3f} ± {std_time:.2f}\n{'-'*50}")
    return mean_reward, mean_time
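
The `evaluate` function above reports each metric as mean ± standard deviation over the episodes. As a minimal sketch of that summary step, the snippet below reproduces it with the standard library only. `np.std` uses the population standard deviation by default, so `statistics.pstdev` is the matching call. The `summarize` helper and the reward values are made up for illustration and are not results from any agent.

```python
import statistics


def summarize(values):
    """Return (mean, population standard deviation) of a list of numbers."""
    return statistics.fmean(values), statistics.pstdev(values)


# Hypothetical per-episode returns, invented for this example only.
episode_rewards = [20.0, 22.0, 24.0]
mean_reward, std_reward = summarize(episode_rewards)
print(f"Mean Reward: {mean_reward:.3f} ± {std_reward:.2f}")
```

A large standard deviation relative to the mean usually indicates that more evaluation episodes are needed before two agents can be compared reliably.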


## The Highway Environment

In [None]:
## IMPORTS
import gymnasium as gym
from stable_baselines3 import PPO, DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv
import highway_env  # noqa: F401

## Load and explore the environment
Let's first load an untrained model and see how it behaves in the environment.

In [None]:
env_id = "highway-fast-v0"
env = make_vec_env(env_id)
# Instantiate the model
model = PPO("MlpPolicy", env, verbose=1)

# Generate a video of the untrained model
record_video(env_id, model, video_length=50, prefix="random-agent", fps = 5)
show_videos("videos", prefix="random-agent")

In [None]:
evaluate(model)

Let's now explore the environment's settings:
### Action Space
👉 Look at the action space: what actions can the model take?

In [None]:
######### YOUR CODE HERE #########


### Observation Space
👉 Look at the [documentation](http://highway-env.farama.org/observations/) for the possible observations of agents on the highway.

👉 Look at the observation space in our case.

In [None]:
######### YOUR CODE HERE #########


# Training an Agent on the Environment
👉 **Now it is your turn**: train your agents.

Recall:
- You must try and compare different RL algorithms.
- Part of your grade will be the evaluation of your best agent.

🔥Tips
- Use TensorBoard to monitor your training runs.
- Install everything locally to get faster and longer training runs (not mandatory; Colab should be fine).

In [None]:
# If you want to use TensorBoard (highly recommended)
%tensorboard --logdir "highway"
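
TensorBoard only shows runs that were written under the directory passed to `--logdir`, so the models have to log there. The sketch below shows one possible way to do this with the `tensorboard_log` constructor argument and the `tb_log_name` argument of `learn` in Stable-Baselines3. The helper names `make_model_kwargs` and `train`, the default learning rate, and the number of timesteps are assumptions chosen for illustration.

```python
def make_model_kwargs(learning_rate=5e-4, log_root="highway"):
    """Constructor keyword arguments that turn on TensorBoard logging."""
    return {
        "policy": "MlpPolicy",
        "learning_rate": learning_rate,
        "tensorboard_log": log_root,  # must match the --logdir value
        "verbose": 1,
    }


def train(algo_name="DQN", total_timesteps=20_000, **overrides):
    """Train one agent on highway-fast-v0 and log it to TensorBoard."""
    # Imported lazily so the helper above works without the RL stack installed.
    import gymnasium as gym
    import highway_env  # noqa: F401  (makes the highway environments available)
    from stable_baselines3 import DQN, PPO

    algo = {"DQN": DQN, "PPO": PPO}[algo_name]
    env = gym.make("highway-fast-v0")
    kwargs = make_model_kwargs(**overrides)
    model = algo(env=env, **kwargs)
    # Each run gets its own name so the curves can be told apart.
    model.learn(
        total_timesteps=total_timesteps,
        tb_log_name=f"{algo_name}_lr{kwargs['learning_rate']}",
    )
    return model
```

For example, `model = train("PPO", learning_rate=3e-4)` would train a PPO agent whose curves appear under the `highway` log directory.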

In [None]:
######### YOUR CODE HERE #########


In [None]:
######### SOME OTHER FANCY TRAINING HERE #########

In [None]:
### SAVE YOUR FINAL MODEL
model_final = .... #YOUR MODEL
model_final.save("highway_final")

# Evaluation
⚠️ *Do not modify anything here!*

Now that your agents are trained, we evaluate them.

In [None]:
evaluate(model_final)

In [None]:
env_id = "highway-v0"
# Generate video of trained model
record_video(env_id, model_final, video_length=70, prefix="trained-agent", fps = 5)
show_videos("videos", prefix="trained-agent")