# Lab 5

**Due Date**: 2/26/25 by 8pm on Canvas

## Installing Libraries

Machine learning in Python has a plethora of popularly used libraries. We can install these library packages within the Jupyter Notebooks itself. Just run the cell below and it should download and install them to your computer. You only need to run this cell one time. After the package has been installed, feel free to change the cell type below from "Code" to "Raw" so it doesn't run this again.

In [5]:
import sys
!{'"' + sys.executable + '"'} -m pip install gymnasium
!{'"' + sys.executable + '"'} -m pip install "stable-baselines3[extra]"

Collecting stable-baselines3[extra]
  Downloading stable_baselines3-2.5.0-py3-none-any.whl.metadata (4.8 kB)
INFO: pip is looking at multiple versions of stable-baselines3[extra] to determine which version is compatible with other requirements. This could take a while.
  Downloading stable_baselines3-2.4.1-py3-none-any.whl.metadata (4.5 kB)
Collecting opencv-python (from stable-baselines3[extra])
  Downloading opencv-python-4.11.0.86.tar.gz (95.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m95.2/95.2 MB[0m [31m32.2 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25h  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Installing backend dependencies ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting pygame (from stable-baselines3[extra])
  Downloading pygame-2.6.1-cp311-cp311-macosx_10_9_x86_64.whl.metadata (12 kB)
Collecting tensorboard>=2.9.1 (from stable-ba

## Imports

In [7]:
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

## The Environment

In reinforcement learning, our AI agent is going to be interacting with an environment. We will be using the Gymansium library to handle creating a virtual environment for us. Specifically, we are going to use the classic Cart Pole problem. You can choose whether or not to render the environment by changing the flag value.

In [9]:
render_flag = True

if render_flag:
    env = gym.make('CartPole-v1', render_mode='human')
else:
    env = gym.make('CartPole-v1')
    env.reset()
    env.render()

## RL Agent

With the environment generated for us, let's focus on the RL. We will be using the Stable Baselines3 (SB3) library for this purpose. This is a set of reliable implementations of reinforcement learning algorithms in PyTorch.

Research on how to create a RL agent that uses the [PPO (Proximal Policy Optimization) algorithm in SB3](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html). FYI, this algorithm is a popular and effective RL algorithm. It's a type of policy gradient method, meaning it directly learns and optimizes the policy (a strategy for choosing actions) by adjusting the parameters of a policy function.

Try out various hyperparameter configurations, for example:

- learning_rate
- gamma
- n_steps
- batch_size

In [11]:
# TODO: experiment with different hyperparameters
model = PPO('MlpPolicy', env, learning_rate=3e-4, gamma=0.99, n_steps=1024, batch_size=64)

## Train the Agent

With an RL agent chosen, now we start the training phase. Use the `learn()` method on your agent. You will need to pass in a value for the parameter `total_timesteps`.

In [13]:
# TODO: use the `learn` method on the `model` and supply a value for `total_timesteps`
model.learn(total_timesteps=100000)

<stable_baselines3.ppo.ppo.PPO at 0x1a1417050>

## Save the Trained Model

Because training can take SOOOO long, you typically want to save the results of your trained model to an external file. The code below will save the model to a `zip` file on your computer.

In [15]:
# save the trained model
model_filename = 'cartpole_ppo_model'
model.save(model_filename)
print(f'Model saved to {model_filename}')

Model saved to cartpole_ppo_model


## Load the Trained Model

With the model saved, we can load this in to the notebook and continue on.

In [17]:
# load the trained model
loaded_model = PPO.load(model_filename, env=env)

## Evaluate the Agent

The moment of truth: how does the trained model fare in the environment? Complete the **TODO** tasks to see how your model does. You will need to look up the docs for how the [`predict`](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) and [`step`](https://gymnasium.farama.org/api/env/) methods work.

In [19]:
# evaluate the agent
num_evaluation_steps = 5
mean_reward = 0
for i in range(num_evaluation_steps):
    obs, _ = env.reset()
    terminated = False
    truncated = False
    episode_reward = 0
    while not terminated and not truncated:
        # TODO: use the `predict` method on the `loaded_model`
        action, states = loaded_model.predict(obs, deterministic=True)
        
        # TODO: use the `step` method on the `env`
        obs, reward, terminated, truncated, info = env.step(action)

        # TODO: update the `episode_reward` with the new reward
        episode_reward += reward

        # possibly render the environment
        if render_flag:
            env.render()

    # TODO: update the `mean_reward` with the `episode_reward`
mean_reward += episode_reward

# TODO: complete the computation of `mean_reward`
mean_reward /= num_evaluation_steps

# display the information
print(f'Mean reward over {num_evaluation_steps} episodes: {mean_reward}')

Mean reward over 5 episodes: 100.0


## Close the Environment

The last thing to do when using Gymnasium is to close out the environment.

In [21]:
env.close()