# CartPole PPO Walkthrough

This notebook demonstrates how to:

1. Train a PPO agent on `CartPole-v1` using the training script.
2. Visualize the resulting training curve.
3. Run inference and watch the trained policy interact with the environment.

It is intended as a readable, step-by-step companion to the scripts in this project.

## 1. Imports and Setup

The cell below imports the plotting and data libraries along with the project's training and inference helpers.
Install the dependencies listed in `requirements.txt` before running the notebook.

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

from cartpole.cartpole_ppo_train import train_cartpole, create_directories, load_monitor_csv
from cartpole.cartpole_ppo_infer import run_inference

## 2. Train the PPO Agent

Here we call the `train_cartpole` function from `cartpole_ppo_train.py`.
The `total_timesteps` argument sets the training budget: more timesteps generally yield a better policy but take longer to train.

In [None]:
total_timesteps = 100_000  # Raise for a stronger policy, at the cost of longer training
train_cartpole(total_timesteps=total_timesteps, model_name="cartpole_ppo_model_notebook", seed=0)
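
To get a feel for what a timestep budget buys, the sketch below estimates how many rollout/update cycles a PPO run performs. It assumes the training script uses Stable-Baselines3's PPO with its default rollout length of 2,048 steps and a single environment; this notebook does not confirm either, and `estimate_ppo_rollouts` is a hypothetical helper, not part of the project.

```python
import math


def estimate_ppo_rollouts(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> tuple[int, int]:
    """Return (rollouts, timesteps actually collected) for a timestep budget.

    Assumes the trainer collects whole rollouts of n_steps * n_envs
    transitions and stops once the budget is reached or exceeded.
    """
    steps_per_rollout = n_steps * n_envs
    rollouts = math.ceil(total_timesteps / steps_per_rollout)
    return rollouts, rollouts * steps_per_rollout


rollouts, collected = estimate_ppo_rollouts(100_000)
print(f"{rollouts} rollouts, {collected} timesteps collected")
```

Under these assumptions, a budget of 100,000 timesteps works out to 49 rollouts (100,352 timesteps collected), so doubling the budget roughly doubles the number of policy updates.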

## 3. Visualize Training Rewards

The training script records per-episode rewards through a `Monitor` wrapper.
The cells below load that log, preview its first rows, and plot the rewards inline.

In [None]:
paths = create_directories()
monitor_df = load_monitor_csv(paths.logs_dir)
monitor_df.head()
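
For readers curious about what might sit behind `load_monitor_csv`, here is a minimal sketch of parsing a monitor log by hand. It assumes the file follows the Stable-Baselines3 `Monitor` layout (a `#`-prefixed JSON metadata line, then CSV columns `r`, `l`, `t` for reward, length, and elapsed time) and that the project's loader renames `r` to `episode_reward`. Both are assumptions, `parse_monitor_text` is a hypothetical stand-in rather than the project's function, and the sample log is made up for illustration.

```python
import csv
import io
import json

# Made-up sample in the assumed Monitor layout, for illustration only.
SAMPLE_LOG = """#{"t_start": 0.0, "env_id": "CartPole-v1"}
r,l,t
21.0,21,0.5
35.0,35,1.2
"""


def parse_monitor_text(text: str) -> tuple[dict, list[dict]]:
    """Split a monitor log into its JSON metadata and per-episode rows."""
    lines = text.splitlines()
    # The first line is a JSON header prefixed with '#'.
    metadata = json.loads(lines[0].lstrip("#"))
    reader = csv.DictReader(io.StringIO("\n".join(lines[1:])))
    episodes = [
        {
            "episode_reward": float(row["r"]),  # assumed rename of column 'r'
            "episode_length": int(row["l"]),
            "elapsed_seconds": float(row["t"]),
        }
        for row in reader
    ]
    return metadata, episodes


metadata, episodes = parse_monitor_text(SAMPLE_LOG)
print(metadata["env_id"], [ep["episode_reward"] for ep in episodes])
```

In CartPole each surviving step earns a reward of 1, so in a log like this the reward and length columns carry the same number per episode.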

In [None]:
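# Per-episode rewards from the monitor log, with episodes numbered from 1
episode_rewards = monitor_df["episode_reward"].to_numpy()
episodes = range(1, len(episode_rewards) + 1)
# 10-episode trailing mean; the first 9 values are NaN, so the smoothed line starts at episode 10
rolling_rewards = pd.Series(episode_rewards).rolling(10).mean()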

plt.figure(figsize=(10, 6))
plt.plot(episodes, episode_rewards, label="Episode reward", alpha=0.4)
plt.plot(episodes, rolling_rewards, label="Rolling mean (10 eps)", linewidth=2)
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.title("CartPole PPO Training Rewards (Notebook Run)")
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()
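
The smoothed curve is a trailing moving average. As a rough pure-Python illustration of what `rolling(10).mean()` computes, the sketch below uses a window of 3 on made-up rewards to keep the output short; `rolling_mean` is a hypothetical helper. Positions without a full window are left as NaN, which is why the smoothed line in the plot only begins at episode 10.

```python
import math


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Trailing moving average; NaN until a full window is available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(math.nan)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


rewards = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up rewards for illustration
print(rolling_mean(rewards, window=3))
```

A wider window gives a smoother curve but reacts more slowly to changes in the policy's performance.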

## 4. Run Inference / Visualization

Finally, we run the trained policy with the `run_inference` helper.
With `render_mode="human"`, this opens a render window on platforms that have a display available and plays three episodes.

In [None]:
run_inference(model_name="cartpole_ppo_model_notebook", episodes=3, render_mode="human")
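
For orientation, an inference helper like `run_inference` usually boils down to a loop of the following shape. This is a sketch, not the project's code: it assumes a Gymnasium-style environment (`reset()` returning `(obs, info)` and `step()` returning a five-tuple), and it uses a toy stand-in environment and policy, both hypothetical, so that it runs without any dependencies.

```python
class ToyEnv:
    """Hypothetical stand-in with a Gymnasium-style interface; ends after 5 steps."""

    def reset(self):
        self.steps = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.steps += 1
        terminated = self.steps >= 5
        # (observation, reward, terminated, truncated, info)
        return float(self.steps), 1.0, terminated, False, {}


def run_episodes(env, policy, episodes: int) -> list[float]:
    """Roll out `policy` for a number of episodes and return each episode's total reward."""
    returns = []
    for _ in range(episodes):
        obs, _info = env.reset()
        done, total = False, 0.0
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return returns


# A trained model would choose the action here; this toy policy always returns 0.
print(run_episodes(ToyEnv(), policy=lambda obs: 0, episodes=3))
```

With a real model, the `policy` callable would typically wrap the model's prediction for the current observation, and the environment would be created with the desired render mode.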