# RL Training Notebook: Smart Traffic Management System
This notebook demonstrates how to train a baseline PPO agent in SUMO for urban traffic signal optimization.

## 1. Project Overview & Setup

- **Objective:** Train a PPO agent to optimize traffic signals using SUMO simulation.
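- **Requirements:** SUMO (with the TraCI Python bindings), Stable-Baselines3, PyTorch, NumPy, matplotlib. OpenCV and YOLOv8 are not used in this notebook; they are needed only for the extensions listed in Section 6.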
- **Setup:** Ensure SUMO and Python dependencies are installed.
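
A quick pre-flight check can catch a missing dependency before the longer cells run. The sketch below is an assumption about how to verify the setup, not part of the project: the module names in `REQUIRED_MODULES` are the usual import names for the packages listed above, and `SUMO_HOME` is the environment variable that SUMO's documentation recommends setting (a pip-installed `traci` may work without it).

```python
import importlib.util
import os

# Assumed import names for the requirements above; adjust to your installation.
REQUIRED_MODULES = ["traci", "stable_baselines3", "torch", "numpy", "matplotlib"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def sumo_home_is_set(environ=os.environ):
    """Return True if SUMO_HOME is set and points to an existing directory."""
    sumo_home = environ.get("SUMO_HOME", "")
    return bool(sumo_home) and os.path.isdir(sumo_home)


missing = find_missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
print("SUMO_HOME set:", sumo_home_is_set())
```

If either line reports a problem, install the missing package or set `SUMO_HOME` before continuing.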

In [None]:
# 2. Import Libraries
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
import torch

# SUMO TraCI imports (requires SUMO installed)
try:
    import traci
except ImportError as exc:
    # Stop here: the environment below cannot run without TraCI.
    raise ImportError('Please install SUMO and the TraCI Python bindings.') from exc

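# Import project RL environment.
# Add the repository root (the parent of this notebook's directory) to sys.path
# so that the `src` package used in the import below can be resolved.
sys.path.append('..')
from src.sim.env import CustomSUTrafficEnv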

## 3. SUMO Environment Initialization

Set up the SUMO simulation environment and define the RL interface.

In [None]:
# Example: Initialize SUMO RL environment (project-compatible)
def make_env():
    return CustomSUTrafficEnv(nogui=True)

# Wrap with DummyVecEnv for Stable-Baselines3
env = DummyVecEnv([make_env])
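
The source of `CustomSUTrafficEnv` is not shown in this notebook, so the following is only a toy illustration of the kind of interface an RL environment exposes. `ToySignalEnv`, its observation layout, its action meaning, and its reward are all hypothetical and do not describe the project's environment. The method signatures follow the Gymnasium convention (`reset()` returns `(obs, info)`, `step()` returns a 5-tuple), and the class uses only the standard library so it runs without SUMO.

```python
import random


class ToySignalEnv:
    """Hypothetical two-approach intersection; NOT the project's CustomSUTrafficEnv.

    Observation: [north-south queue, east-west queue, current green phase].
    Action: 0 = keep the current phase, 1 = switch to the other phase.
    Reward: negative total queue length after the step.
    """

    def __init__(self, max_steps=50, arrival_prob=0.3, seed=0):
        self.max_steps = max_steps
        self.arrival_prob = arrival_prob
        self.rng = random.Random(seed)
        self.queues = [0, 0]
        self.phase = 0
        self.t = 0

    def _obs(self):
        return [self.queues[0], self.queues[1], self.phase]

    def reset(self):
        self.queues = [0, 0]
        self.phase = 0
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        if action == 1:
            self.phase = 1 - self.phase
        # At most one vehicle arrives on each approach per step.
        for i in range(2):
            if self.rng.random() < self.arrival_prob:
                self.queues[i] += 1
        # The approach with the green light discharges one vehicle.
        if self.queues[self.phase] > 0:
            self.queues[self.phase] -= 1
        self.t += 1
        reward = -float(sum(self.queues))
        terminated = False  # this toy task has no terminal state
        truncated = self.t >= self.max_steps  # episode ends on the time limit
        return self._obs(), reward, terminated, truncated, {}


# Roll out one short episode with a fixed "never switch" policy.
toy_env = ToySignalEnv(max_steps=5)
toy_obs, _ = toy_env.reset()
episode_over = False
total_reward = 0.0
while not episode_over:
    toy_obs, toy_reward, terminated, truncated, _ = toy_env.step(0)
    total_reward += toy_reward
    episode_over = terminated or truncated
print("Toy episode return:", total_reward)
```

A real SUMO-backed environment would replace the queue arithmetic with TraCI calls, but the `reset`/`step` contract that the agent sees stays the same shape.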

## 4. PPO Agent Training

Train a PPO agent using Stable-Baselines3 on the SUMO environment.

In [None]:
# PPO training loop using project RL environment
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

# Save model
model.save("ppo_agent_sumo")
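
It helps to know what `total_timesteps=10000` means in practice. PPO collects experience in fixed-size rollouts, so the requested budget is rounded up to a whole number of rollouts. The sketch below works through that arithmetic, assuming the Stable-Baselines3 defaults for `PPO` (`n_steps=2048`, `batch_size=64`, `n_epochs=10`) and the single environment used above; `ppo_rollout_budget` is a hypothetical helper, not a library function.

```python
import math


def ppo_rollout_budget(total_timesteps, n_steps=2048, n_envs=1,
                       batch_size=64, n_epochs=10):
    """Estimate rollouts, collected timesteps, and minibatch gradient steps."""
    steps_per_rollout = n_steps * n_envs
    # Training continues until at least total_timesteps have been collected.
    n_rollouts = math.ceil(total_timesteps / steps_per_rollout)
    minibatches_per_epoch = math.ceil(steps_per_rollout / batch_size)
    return {
        "rollouts": n_rollouts,
        "timesteps_collected": n_rollouts * steps_per_rollout,
        "minibatch_gradient_steps": n_rollouts * n_epochs * minibatches_per_epoch,
    }


budget = ppo_rollout_budget(10_000)
print(budget)
# → {'rollouts': 5, 'timesteps_collected': 10240, 'minibatch_gradient_steps': 1600}
```

Under these assumptions the agent sees only five policy updates' worth of data, which is a small budget for a baseline; increasing `total_timesteps` is the simplest lever if the reward curve has not flattened.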

## 5. Evaluation & Visualization

Evaluate the trained agent and visualize the results. The cell below plots the reward at each step; metrics such as average travel time reduction are not computed here.

In [None]:
# Evaluate agent and plot results
# DummyVecEnv follows the Stable-Baselines3 VecEnv API: reset() returns only the
# batched observation, step() returns (obs, rewards, dones, infos), and a
# finished sub-environment is reset automatically, so no manual reset is needed.
obs = env.reset()
rewards = []
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    rewards.append(float(reward[0]))  # single environment, so take index 0

plt.plot(rewards)
plt.title('Per-Step Reward')
plt.xlabel('Step')
plt.ylabel('Reward')
plt.show()
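
The plot above shows one point per simulation step. To compare episodes instead, the per-step rewards can be summed between episode boundaries. The sketch below assumes the `done` flag from each step is recorded alongside the reward (the evaluation cell does not store it); `episode_returns` is a hypothetical helper, and the example data is made up.

```python
def episode_returns(rewards, dones):
    """Sum per-step rewards into one return per completed episode."""
    returns = []
    running = 0.0
    for reward, done in zip(rewards, dones):
        running += reward
        if done:
            returns.append(running)
            running = 0.0
    return returns  # a trailing unfinished episode is dropped


# Made-up example data: two complete episodes and one unfinished step.
example_rewards = [-1.0, -2.0, -0.5, -1.5, -3.0]
example_dones = [False, True, False, True, False]
print(episode_returns(example_rewards, example_dones))
# → [-3.0, -2.0]
```

Plotting the returned list with `plt.plot` gives one point per episode, which is usually easier to read than the raw per-step curve.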

## 6. Next Steps & References

- Extend the environment to multiple intersections and real camera feeds.
- Integrate YOLOv8 for live vehicle detection.
- References:
    - [SUMO Documentation](https://sumo.dlr.de/docs/)
    - [Stable-Baselines3](https://stable-baselines3.readthedocs.io/)
    - [PyTorch](https://pytorch.org/)

## Compatibility Update

This notebook now uses the project’s Gym-compatible SUMO RL environment (`CustomSUTrafficEnv`) for PPO training.