# Driving a Racecar with Reinforcement Learning
---
The aim of this notebook is to train a reinforcement learning agent to drive around the track in the Box2D Car Racing environment, provided by the Gymnasium project <a href="https://gymnasium.farama.org/environments/box2d/car_racing/">here</a>.

In [None]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy
import os

## Problem setup
---

In [None]:
environment_name = "CarRacing-v2"
env = gym.make(environment_name, render_mode='human')

In [None]:
# generates track in a separate window
env.reset()

In [None]:
# continuous actions: [steering, gas, brake]
env.action_space

In [None]:
# racing track, 96 x 96 RGB image (3 colour channels)
env.observation_space

In [None]:
# produces new window with racetrack environment
env.render()

In [None]:
# close opened track environment
env.close()

You can observe the car's path under random actions by running the code below. An episode ends when a fixed number of timesteps has passed, the car has visited all the track tiles, or the car drives far off the track and leaves the playfield (in which case it receives a reward of -100). Feel free to change the number of `episodes`:

In [None]:
environment_name = "CarRacing-v2"
env = gym.make(environment_name, render_mode='human')

episodes = 5
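for episode in range(1, episodes+1):
    # Gymnasium's reset() returns (observation, info)
    state, info = env.reset()
    done = False
    score = 0

    while not done:
        env.render()
        # random action
        action = env.action_space.sample()
        # step() returns (observation, reward, terminated, truncated, info)
        n_state, reward, terminated, truncated, info = env.step(action)
        # stop on either a true terminal state or the time limit
        done = terminated or truncated
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))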
env.close()

In [None]:
# close the environment if still open in a separate window
env.close()
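
The difference between the two ways an episode can end (a terminal state versus a time limit) can be sketched without opening the simulator. The `ToyEnv` class below is a hypothetical stand-in, not part of Gymnasium; it only mimics the return values of `reset()` and `step()`, and its rewards and failure probability are illustrative placeholders.

```python
import random


class ToyEnv:
    """Hypothetical stand-in that mimics Gymnasium's reset/step return values."""

    def __init__(self, max_steps=50, fail_prob=0.02, seed=0):
        self.max_steps = max_steps
        self.fail_prob = fail_prob
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        # terminated: the task itself ended (e.g. the car left the playfield)
        terminated = self.rng.random() < self.fail_prob
        # truncated: the time limit was reached
        truncated = self.t >= self.max_steps
        reward = -100.0 if terminated else -0.1  # toy values
        return self.t, reward, terminated, truncated, {}


def run_episode(env):
    """Step until the episode is either terminated or truncated."""
    obs, info = env.reset()
    score, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(action=None)
        score += reward
        steps += 1
    return score, steps, terminated, truncated


print(run_episode(ToyEnv()))
```

Checking only one of the two flags would let the loop run past the time limit, which is why both are combined in the loop condition.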

## Model training
---
We will train our racecar using the Proximal Policy Optimisation (PPO) algorithm.
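
As a rough illustration of what PPO optimises, the function below evaluates the clipped surrogate objective for a single sample. It is a minimal sketch rather than the Stable-Baselines3 implementation: the name `clipped_surrogate` is hypothetical, and the default `clip_range=0.2` is assumed to match the library's default.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximised).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the estimated advantage.
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: the gain is capped once the ratio exceeds 1 + clip_range.
print(clipped_surrogate(ratio=1.5, advantage=2.0))
# Negative advantage: pushing the ratio below 1 - clip_range earns no extra credit.
print(clipped_surrogate(ratio=0.5, advantage=-1.0))
```

The clipping removes the incentive to move the new policy far from the one that collected the data, which is what keeps each update "proximal".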

In [None]:
log_path = os.path.join('Training', 'Logs')

In [None]:
log_path

The `CnnPolicy` policy network uses convolutional layers to extract features from image observations, which is how our agent observes the problem.

In [None]:
# build the environment inside the factory so the lambda does not depend on a variable that is reassigned
env = DummyVecEnv([lambda: gym.make(environment_name)])
# convolutional (CNN) policy for image observations
model = PPO("CnnPolicy", env, verbose=1, tensorboard_log=log_path)
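
To get a feel for what the convolutional policy does with a 96 x 96 frame, the sketch below traces the spatial size through three unpadded convolutions. The layer sizes are an assumption: they follow the "Nature CNN" layout that Stable-Baselines3 is understood to use as its default feature extractor for `CnnPolicy`. The helper `conv_out` is a hypothetical name.

```python
def conv_out(size, kernel, stride):
    """Output width of an unpadded convolution along one spatial dimension."""
    return (size - kernel) // stride + 1


# (kernel, stride, output channels) per layer; assumed Nature CNN layout
layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

size = 96      # CarRacing frames are 96 x 96 pixels
channels = 3   # RGB input
for kernel, stride, out_channels in layers:
    size = conv_out(size, kernel, stride)
    channels = out_channels
    print(f"{kernel}x{kernel} conv, stride {stride}: {channels} x {size} x {size}")

# Number of values handed to the fully connected part of the network
flattened = channels * size * size
print("flattened features:", flattened)
```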

One can modify the number of timesteps for which the model is trained. The trained model can subsequently be saved and later evaluated or viewed. During training, we can view some standard metrics from the model, such as the loss and the timesteps elapsed. By including the `tensorboard_log=log_path` parameter in the above code block, we write the training metrics to our local machine so that they can be visualised in TensorBoard later on.

In [None]:
steps = 100000

# train the model
model.learn(total_timesteps=steps)
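
The timestep budget translates into rollouts and gradient updates roughly as follows. This is back-of-the-envelope arithmetic, assuming PPO's hyperparameters were left at what are understood to be the Stable-Baselines3 defaults (`n_steps=2048`, `n_epochs=10`, `batch_size=64`) and a single environment; adjust the numbers if you changed them.

```python
import math

total_timesteps = 100_000          # the value of `steps` above
n_steps, n_envs = 2048, 1          # assumed defaults, one environment
n_epochs, batch_size = 10, 64      # assumed defaults

# PPO collects whole rollouts, so training runs slightly past the requested budget.
rollout_size = n_steps * n_envs
n_rollouts = math.ceil(total_timesteps / rollout_size)
actual_timesteps = n_rollouts * rollout_size

# Each rollout is reused for n_epochs passes of minibatch updates.
updates_per_rollout = n_epochs * (rollout_size // batch_size)

print("rollouts:", n_rollouts)
print("timesteps actually collected:", actual_timesteps)
print("gradient updates:", n_rollouts * updates_per_rollout)
```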

## Saving the model

In [None]:
# note: the file name says 1,000,000, while `steps` above is 100000; rename to match your run
PPO_Path = os.path.join('Training', 'Saved Models', 'PPO_racecar_1,000,000')

In [None]:
# save model to specified location
model.save(PPO_Path)

In [None]:
# delete and reload model
# del model
# model = PPO.load(PPO_Path, env=env)

In [None]:
PPO_Path

## Model evaluation



We can view the performance metrics of our model thanks to the `tensorboard_log` from earlier. To view the board, open a command prompt and activate a virtual environment which has TensorBoard installed. Then run `tensorboard --logdir=` followed immediately by the path of the directory where the TensorBoard logs from training were saved (the `log_path` defined earlier).

We can also use the saved model zip file to watch our trained model in action. To do so, run the first three code blocks below. Once the runs have completed, close the simulation window by running the fourth code block; to watch again, re-run the second and third blocks.

In [None]:
PPO_Path = os.path.join('Training', 'Saved Models', 'PPO_racecar_1,000,000')

In [None]:
environment_name = "CarRacing-v2"
env = gym.make(environment_name, render_mode='human')

In [None]:
model = PPO.load(PPO_Path, env=env)
evaluate_policy(model, env, n_eval_episodes=10)

In [None]:
env.close()
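
The `evaluate_policy` call above returns two numbers, the mean and the standard deviation of the per-episode returns, so it can be unpacked as `mean_reward, std_reward = evaluate_policy(...)`. The sketch below reproduces that summary with the standard library. The helper name `summarise_returns` is hypothetical, the use of a population standard deviation is an assumption about the library's behaviour, and the returns are placeholder numbers chosen for easy arithmetic, not results from this model.

```python
import statistics


def summarise_returns(episode_returns):
    """Mean and population standard deviation of per-episode returns."""
    mean_reward = statistics.fmean(episode_returns)
    std_reward = statistics.pstdev(episode_returns)
    return mean_reward, std_reward


# Placeholder values for illustration only (not measured results)
placeholder_returns = [1.0, 2.0, 3.0, 4.0]
mean_reward, std_reward = summarise_returns(placeholder_returns)
print(f"mean return: {mean_reward:.2f} +/- {std_reward:.2f}")
```

A large standard deviation relative to the mean suggests that more evaluation episodes are needed before comparing two saved models.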