<h1 style="text-align:center">Traffic Light RL</h1>

---

<p style="text-align:center">
    <time datetime="2024-09-22">October 18, 2024</time> | <span>Topics: SUMO, Ray, and simulated environments</span>
    <br>
</p>

## Project Overview

In this project, we'll teach an AI to control **traffic signals** using reinforcement learning (RL) in a simulated traffic environment. By interacting with the traffic network, the AI will learn how to optimize traffic flow by controlling the signals, improving efficiency and reducing vehicle waiting time. We'll be using tools like **Ray**, **SUMO**, and **SUMO-RL** to build and train our model.

Reinforcement learning is a powerful technique for training AI agents by letting them learn from trial and error in dynamic environments. This project will introduce key concepts of reinforcement learning and show how they can be applied to real-world problems, such as optimizing traffic light schedules to reduce congestion.

For scenarios where multiple traffic lights need to be controlled, we will extend the model into a **multi-agent** system. Each traffic signal will act as an independent agent, working either collaboratively or independently to manage traffic in different parts of the network. This multi-agent approach is necessary when dealing with larger networks where multiple intersections must be optimized simultaneously.

### Objectives

- Set up the traffic simulation environment using SUMO and SUMO-RL.
- Train an AI model to control traffic lights using Proximal Policy Optimization (PPO), a popular RL algorithm.
- Extend the system to handle multiple traffic lights by creating a multi-agent setup, where each signal acts as an independent RL agent.
- Evaluate the model's performance as it interacts with the traffic environment, and measure the resulting improvements in traffic flow.

---

## Setting up SUMO Environment

### Objective
- Set up the SUMO environment to simulate a multi-intersection traffic grid for reinforcement learning (RL).

### Step-by-Step Explanation
Before we can train our RL agents, we need to set up the traffic simulation environment using SUMO and SUMO-RL. This involves defining the layout of the road network, the traffic routes, and integrating SUMO with our RL environment.

- **You don't need to do the following by hand in code. You can build everything through the GUI and export the necessary XML files.**

- **Create Traffic Network**  
  Use SUMO’s `.net.xml` file format to define the road layout. This includes the number of lanes, intersections, and traffic lights. For example, you can create a 2x2 or 4x4 grid, representing multiple intersections that the AI will control.

- **Route File**  
  The `.rou.xml` file defines the vehicle routes, which specify how cars move through the network. It’s important for simulating real-world traffic conditions.

- **Configure SUMO-RL**  
  SUMO-RL provides an interface for connecting SUMO with reinforcement learning. We’ll use it to load the traffic network and route files into the RL environment, enabling the agents to interact with the simulation.
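
Once the network and route files are exported, loading them into SUMO-RL might look like the sketch below. The file names (`grid.net.xml`, `grid.rou.xml`) and the helper names are hypothetical. The example assumes SUMO is installed with `SUMO_HOME` set, and it uses SUMO-RL's `parallel_env` entry point, which follows the PettingZoo parallel API.

```python
from pathlib import Path


def build_env_kwargs(net_file, route_file, use_gui=False, num_seconds=3600):
    """Check the exported SUMO file names and collect SUMO-RL keyword arguments."""
    expected = {
        "net_file": (net_file, ".net.xml"),
        "route_file": (route_file, ".rou.xml"),
    }
    for name, (path, suffix) in expected.items():
        if not Path(path).name.endswith(suffix):
            raise ValueError(f"{name} should end with {suffix}, got {path!r}")
    return {
        "net_file": str(net_file),
        "route_file": str(route_file),
        "use_gui": use_gui,          # True opens sumo-gui so you can watch the run
        "num_seconds": num_seconds,  # simulated seconds per episode
    }


def make_parallel_env(**env_kwargs):
    """Create the multi-agent SUMO-RL environment (one agent per traffic light)."""
    # Imported lazily: sumo_rl needs SUMO installed and SUMO_HOME set.
    import sumo_rl

    return sumo_rl.parallel_env(**env_kwargs)
```

A typical call would be `env = make_parallel_env(**build_env_kwargs("grid.net.xml", "grid.rou.xml"))`, with the paths replaced by wherever you exported your own files.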

### Outcome
At this stage, your SUMO traffic simulation environment is set up, and it is ready to be used by the RL agents for training and optimizing traffic signals.



---

## Define Multi-Agent RL Environment

### Objective
- Set up the multi-agent RL environment where each traffic light acts as an independent agent.
- Define the observation space, action space, and reward function for training.

### Step-by-Step Explanation
Since we are controlling multiple traffic lights, we need to define a multi-agent environment in SUMO-RL where each traffic signal operates independently, learning how to manage traffic flow efficiently.

- **Why multi-agent?**  
  Each traffic signal controls its own intersection, so a multi-agent setup lets each signal learn its own policy while the signals together manage traffic across the network.

- **Observation Space**  
  The observation space is already predefined in the SUMO-RL library, but it’s important to understand the information the agents receive from the environment.

- **Action Space**  
  The action space is also predefined, allowing agents to choose between different phase configurations. However, it's important to define which actions are available to each agent and how often they can make decisions.

- **Reward Function**  
  The reward function defines how the agent is rewarded or penalized based on its actions. By default, SUMO-RL's reward is the change in cumulative vehicle delay (waiting time) between consecutive decisions. You may want to define a custom reward function depending on the traffic optimization goal, such as reducing queue lengths or minimizing total delay.
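
To make the three pieces above concrete, here is a small sketch. It assumes SUMO-RL's default observation layout (a one-hot encoding of the current green phase, a minimum-green flag, then one density and one queue value per incoming lane) and that a custom reward is a callable receiving a SUMO-RL `TrafficSignal` object. The function names are hypothetical.

```python
def default_observation_size(num_green_phases, num_lanes):
    """Length of the default observation vector for one traffic signal."""
    # one-hot phase + min-green flag + density per lane + queue per lane
    return num_green_phases + 1 + 2 * num_lanes


def queue_reward(traffic_signal):
    """Custom reward: penalize the number of halted vehicles at this signal."""
    # get_total_queued() is provided by SUMO-RL's TrafficSignal class.
    return -float(traffic_signal.get_total_queued())
```

The reward can then be passed when creating the environment, for example `sumo_rl.parallel_env(..., reward_fn=queue_reward, delta_time=5)`, where `delta_time` is the number of simulated seconds between agent decisions.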

### Outcome
After configuring the multi-agent environment, each traffic signal is now an independent RL agent, ready to learn from its environment and optimize traffic flow.
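
Before training, it can help to step the environment with random actions to confirm that every agent receives observations and rewards. The loop below is a sketch that works with any environment following the PettingZoo parallel API, which is what SUMO-RL's `parallel_env` returns. It assumes a recent PettingZoo version where `reset()` returns both observations and infos, and `run_random_episode` is a hypothetical helper name.

```python
def run_random_episode(env, max_steps=1000):
    """Step a parallel multi-agent env with random actions; return total reward per agent."""
    observations, infos = env.reset()
    totals = {agent: 0.0 for agent in env.agents}
    for _ in range(max_steps):
        if not env.agents:  # the agent list empties when the episode ends
            break
        # One action per traffic light, sampled from that agent's action space.
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + reward
    return totals
```

Random actions give a rough lower bound on performance, which is a useful sanity check to compare against once the agents start learning.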



---

## Implement and Train PPO Algorithm

### Objective
- Implement and train the agents using the Proximal Policy Optimization (PPO) algorithm from Ray to optimize traffic signal control.

### Step-by-Step Explanation
PPO is a popular reinforcement learning algorithm that works well with complex environments like traffic simulations. We'll use **Ray** to implement, configure, and train the PPO algorithm in the multi-agent environment.

- **Why PPO?**  
  PPO is robust and works with both continuous and discrete action spaces. That makes it a good fit for traffic light control, where the action is discrete (choosing the next signal phase) and each phase change can have a large impact on traffic flow.

- **Configure PPO**  
  Set up PPO with custom hyperparameters like learning rate, number of timesteps, and discount factor. You'll also need to define the policy and model architecture based on your environment. Be sure to set up TensorBoard as well, since it lets us track training metrics.

- **Integrate PPO with SUMO-RL**  
  Connect the PPO model to the multi-agent environment created in SUMO-RL. The PPO algorithm will use the agents' observations to update its policy and optimize traffic control.

- **Train Model and Track Progress**  
  During training, monitor key metrics available from the observation space, such as lane density, vehicle queue lengths, and traffic signal phases. These metrics will help evaluate the agents' learning progress and the impact on traffic flow over time.

- **Save Model Checkpoints**  
  Periodically save model checkpoints during training to ensure progress is preserved, and to enable future testing or further training. This can be done using a callback function. 
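
The steps above could be wired together roughly as follows. This is a sketch, not a tuned setup: the hyperparameter values are illustrative, the names `sumo_grid`, `train`, and `checkpoints` are hypothetical, and RLlib's configuration API has changed across Ray releases, so check each call against your installed version. It also assumes every traffic light has the same observation and action space, as in a regular grid.

```python
import os


def ppo_hyperparameters():
    """Illustrative starting values, not tuned for any particular network."""
    return {"lr": 5e-5, "gamma": 0.99, "train_batch_size": 4000}


def policy_mapping_fn(agent_id, *args, **kwargs):
    """Independent learners: each traffic signal trains its own policy."""
    return agent_id


def should_checkpoint(iteration, every=10):
    """Save a checkpoint every `every` training iterations."""
    return iteration % every == 0


def train(env_kwargs, agent_ids, iterations=50, checkpoint_dir="checkpoints"):
    # Imported lazily so the helpers above work without Ray or SUMO installed.
    import sumo_rl
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
    from ray.tune.registry import register_env

    # Register the SUMO-RL environment under a name RLlib can look up.
    register_env(
        "sumo_grid",
        lambda _: ParallelPettingZooEnv(sumo_rl.parallel_env(**env_kwargs)),
    )
    config = (
        PPOConfig()
        .environment(env="sumo_grid")
        .training(**ppo_hyperparameters())
        .multi_agent(policies=set(agent_ids), policy_mapping_fn=policy_mapping_fn)
    )
    algo = config.build()
    for iteration in range(1, iterations + 1):
        algo.train()
        if should_checkpoint(iteration):
            algo.save(os.path.abspath(checkpoint_dir))
    algo.stop()
```

Mapping every agent to its own policy matches the independent-agent design described earlier. Returning a single shared policy ID from `policy_mapping_fn` instead would make all signals share parameters, which is a different design choice.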

### Outcome
With the PPO model implemented and training in progress, the traffic signals will gradually learn to manage traffic flow more efficiently, optimizing signal timing to reduce congestion.



---

## Evaluate Model Performance

### Objective
- Test the trained PPO model and evaluate its effectiveness in managing traffic flow.

### Step-by-Step Explanation
Once training is complete, it’s important to evaluate the PPO model’s performance by running it in the traffic simulation and comparing it with baseline methods or undertrained models.

- **Test the PPO Model**  
  Run the trained PPO model in the same SUMO simulation and observe how it manages traffic compared to traditional fixed-timing signals or other models.

- **Compare Metrics**  
  Measure key metrics like total vehicle waiting time, average vehicle speed, and queue lengths at intersections. Compare these results with the performance of standard traffic light systems to assess the benefits of RL.

- **Refine the Model**  
  If the results aren’t optimal, consider adjusting the reward function, adding more traffic conditions to the simulation, or retraining the agents with different hyperparameters.
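
One way to make the comparison concrete is to average the same metrics over a baseline run and a trained run, then report the relative change. The sketch below assumes each run is a list of per-step dictionaries using SUMO-RL's system-level info keys (`system_total_waiting_time`, `system_mean_speed`, `system_total_stopped`). The helper names are hypothetical. A fixed-timing baseline can be produced by creating the environment with `fixed_ts=True`, which makes SUMO-RL ignore the agents' actions.

```python
from statistics import mean

METRICS = ("system_total_waiting_time", "system_mean_speed", "system_total_stopped")


def summarize(records):
    """Average each metric over the per-step records of one evaluation run."""
    return {metric: mean(record[metric] for record in records) for metric in METRICS}


def percent_change(baseline, trained):
    """Relative change of the trained run versus the baseline, in percent."""
    # Metrics with a zero baseline are skipped to avoid dividing by zero.
    return {
        metric: 100.0 * (trained[metric] - baseline[metric]) / baseline[metric]
        for metric in baseline
        if baseline[metric] != 0
    }
```

For waiting time and stopped vehicles a negative change is an improvement, while for mean speed a positive change is. Run both evaluations on the same route file so the traffic demand is identical.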

### Outcome
By the end of this section, you’ll have a clear understanding of how well the PPO-trained agents are controlling traffic, and what improvements, if any, can be made.
