# Traffic Tune - Optimizing Traffic Signals with Reinforcement Learning

## Introduction

Welcome to the Traffic Tune POC notebook. The project focuses on optimizing traffic signal control with reinforcement learning. Traffic congestion is a major problem in urban areas: it increases travel times, fuel consumption, and pollution. Traditional traffic signal control systems often struggle to adapt to changing traffic conditions, which results in suboptimal traffic flow.

Traffic Tune is a recommendation system that uses reinforcement learning to adjust traffic signals at intersections dynamically. By learning from traffic patterns in real time, Traffic Tune aims to improve traffic flow, reduce congestion, and make the transportation network more efficient overall.

In this POC, we demonstrate how to train a reinforcement learning agent to optimize traffic signal control in a simulated environment. We use the SUMO (Simulation of Urban MObility) traffic simulator and Ray RLlib, the library imported by the code cells in this notebook, to train a Deep Q-Network (DQN) agent to learn a traffic signal control policy. The same training pipeline also accepts the PPO and DDQN experiment types.
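
DQN learns action values with the Q-learning update, using a neural network in place of a lookup table. As a rough illustration of that update (not the training code this notebook runs), the sketch below applies one tabular Q-learning step and an epsilon-greedy choice to a toy intersection. The state encoding (queue length per approach), the two signal phases, the reward, and the values of `alpha` and `gamma` are all assumptions made for the example.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # hypothetical phases: 0 = north-south green, 1 = east-west green


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else take the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q_table[state]
    return max(ACTIONS, key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One temporal-difference step toward reward + gamma * max_a' Q(next_state, a')."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# All Q-values start at 0.0 for every (state, action) pair.
q_table = defaultdict(lambda: [0.0, 0.0])

state = (4, 1)       # hypothetical observation: (north-south queue, east-west queue)
next_state = (2, 2)  # hypothetical observation after one simulation step

# Phase 0 was applied and the (made-up) reward was -4.0, e.g. minus the total queue length.
updated = q_update(q_table, state, action=0, reward=-4.0, next_state=next_state)

# With epsilon = 0 the choice is purely greedy.
greedy = choose_action(q_table, state, epsilon=0.0, rng=random.Random(0))
print(updated, greedy)
```

After this single penalized step under phase 0, the greedy choice for that state switches to phase 1, whose estimate is still at its initial value of 0. Repeating such updates over many simulated steps is what gradually shapes the policy.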


## Setup and Installations

In [None]:
import traci

import env_manager
import algo_trainer
from typing import SupportsIndex

In [None]:
def chain_training(manager: env_manager.EnvManager, generator, algo_agent, running_result: list):
    """Run one training cycle; after the first cycle, continue from the best previous config.

    `generator` is the iterator returned by `manager.env_generator(...)`; each `next()` call
    yields the next (route file, CSV file) pair.
    """
    if running_result:
        # Take the best config from the previous training cycle
        best = running_result[-1].get_best_result("env_runners/episode_reward_max", "max")

        # Re-initialize the environment with the next route file
        rou, csv = next(generator)
        manager.initialize_env(rou, csv)

        # Continue training with the best config
        algo_agent.config = best.config
        algo_agent.build_config()

    return algo_agent.train()

def training(num_intersection: int, experiment_type: str, algo_config: str, env_config: str, num_training: int):
    running_result = []
    # experiment_type has the form "<ALGO>_<SingleAgent|MultiAgent>", e.g. "DQN_SingleAgent"
    algo_type = experiment_type.split("_")
    sumo_type = "MultiAgent" if "Multi" in experiment_type else "SingleAgent"

    # Initialize the environment manager
    manager = env_manager.EnvManager(f"{sumo_type}Environment", env_config, intersection_id=f"intersection_{num_intersection}")
    generator = manager.env_generator(f"Nets/intersection_{num_intersection}/route_xml_path_intersection_{num_intersection}.txt", algo_name=algo_type[0])
    
    # Initialize the environment manager with new route file
    rou, csv = next(generator)
    manager.initialize_env(rou, csv)
    
    algo_agent = algo_trainer.ALGOTrainer(config_path=algo_config, env_manager=manager, experiment_type=experiment_type)
    algo_agent.build_config()

    for _ in range(num_training):
        chain_result = chain_training(manager=manager, generator=generator, algo_agent=algo_agent, running_result=running_result)
        if chain_result is not None:
            running_result.append(chain_result)
    
    return running_result
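
The chaining pattern above (train, take the best config, switch to the next route file, train again) can be tried out without SUMO or RLlib. The sketch below is a simplified stand-in, not the notebook's implementation: `StubTrainer`, `route_generator`, the file names, and the `lr` value are all hypothetical. One deliberate difference is that it uses `next(generator, None)`, so asking for more cycles than there are route files stops the chain instead of raising `StopIteration`.

```python
from typing import Iterator, List, Optional, Tuple


def route_generator(route_files: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (route_file, csv_path) pairs; the naming scheme here is hypothetical."""
    for route_file in route_files:
        yield route_file, route_file.replace(".rou.xml", ".csv")


class StubTrainer:
    """Hypothetical stand-in for the real trainer; it only records what it was given."""

    def __init__(self) -> None:
        self.config = {"lr": 0.001}
        self.route_file: Optional[str] = None

    def train(self) -> dict:
        return {"route_file": self.route_file, "config": dict(self.config)}


def run_chain(trainer: StubTrainer, generator: Iterator[Tuple[str, str]], num_cycles: int) -> list:
    """Run up to num_cycles training cycles, one route file per cycle."""
    results = []
    for _ in range(num_cycles):
        pair = next(generator, None)  # None once the route files run out
        if pair is None:
            break
        trainer.route_file = pair[0]
        if results:
            # Carry the previous cycle's config forward
            trainer.config = results[-1]["config"]
        results.append(trainer.train())
    return results


chain = run_chain(StubTrainer(), route_generator(["a.rou.xml", "b.rou.xml"]), num_cycles=3)
print([r["route_file"] for r in chain])
```

Here three cycles are requested but only two route files exist, so the chain returns two results.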

In [None]:
num_intersection_to_train = 5  # Choose which intersection you want to train

# Choose the experiment_type:
# PPO_SingleAgent | DQN_SingleAgent | DDQN_SingleAgent | PPO_MultiAgent | DQN_MultiAgent | DDQN_MultiAgent
experiment_type = "DQN_SingleAgent"  

num_training_cycles = 1

env_config_file_path = "env_config.json"

ppo_config_file_path = "ppo_config.json"

dqn_config_file_path = "dqn_config.json"

In [None]:
# Pass the config file that matches experiment_type (the DQN config for "DQN_SingleAgent")
results = training(
    num_intersection=num_intersection_to_train,
    experiment_type=experiment_type,
    algo_config=dqn_config_file_path,
    env_config=env_config_file_path,
    num_training=num_training_cycles,
)
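
Because the algorithm config path is passed separately from `experiment_type`, the two can drift apart (for example, a PPO experiment run with the DQN config). One possible guard, sketched under assumptions: the helper names `parse_experiment_type` and `config_path_for` are hypothetical, and mapping DDQN to the DQN config file is a guess, since the notebook only defines PPO and DQN config paths.

```python
from typing import NamedTuple

VALID_ALGOS = {"PPO", "DQN", "DDQN"}
VALID_MODES = {"SingleAgent", "MultiAgent"}


class Experiment(NamedTuple):
    algo: str
    mode: str


def parse_experiment_type(experiment_type: str) -> Experiment:
    """Split '<ALGO>_<MODE>' and reject anything outside the documented options."""
    algo, _, mode = experiment_type.partition("_")
    if algo not in VALID_ALGOS or mode not in VALID_MODES:
        raise ValueError(f"Unknown experiment type: {experiment_type!r}")
    return Experiment(algo, mode)


def config_path_for(algo: str) -> str:
    """Pick a config file for the algorithm.

    Assumption: DDQN reuses the DQN config file.
    """
    return "ppo_config.json" if algo == "PPO" else "dqn_config.json"


experiment = parse_experiment_type("DQN_SingleAgent")
print(experiment.algo, experiment.mode, config_path_for(experiment.algo))
```

Validating the string up front also turns a typo such as `"DQN_Singleagent"` into an immediate `ValueError` rather than a failure deep inside environment setup.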

In [None]:
"""
Optional: send an iMessage when the training is done (macOS only, via AppleScript).
Set `message` to the text you want to send.
Set `recipient` to the email address or phone number (with country code) linked to the recipient's iCloud account.
For now this is triggered manually, but the function can be called wherever a notification is needed.
"""

# import subprocess
# 
# def send_imessage(message, recipient):
#     apple_script = f'''
#     tell application "Messages"
#         set targetService to 1st service whose service type = iMessage
#         set targetBuddy to buddy "{recipient}" of targetService
#         send "{message}" to targetBuddy
#     end tell
#     '''
#     subprocess.run(['osascript', '-e', apple_script])
# 
# 
# # Notify when done
# message = 'if you got this message, the training is done! send whatsapp to matan'
# recipient = 'eviatar109@icloud.com'  # Replace with your iCloud email
# send_imessage(message, recipient)

In [None]:
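import os

import pandas as pd

result = results[0]   # results of the first training cycle
result1 = result[0]   # first entry of that cycle's results
custom_metrics = result1.metrics
print(custom_metrics)

# Save the results to a CSV file, creating the output directory if it does not exist
output_dir = f"Outputs/Training/intersection_{num_intersection_to_train}/experiments"
os.makedirs(output_dir, exist_ok=True)
df = pd.DataFrame(custom_metrics)
df.to_csv(f"{output_dir}/{experiment_type}_intersection_{num_intersection_to_train}.csv")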

 
# result1 = results_1[0]
# result1 = result1[0]
# print("DDQN\n",result1.metrics,"\n\n")
# 
# result2 = results_2[0]
# result2 = result2[0]
# print("PPO\n",result2.metrics)
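
The metric name used elsewhere in this notebook, `env_runners/episode_reward_max`, is a slash-separated path into a nested metrics dictionary. If a flat, one-row CSV is preferred over the nested layout, the dictionary can be flattened first. This is a minimal sketch using only the standard library; the `flatten` helper is hypothetical, and the sample values are made up for illustration and are not training results.

```python
import csv
import io


def flatten(metrics: dict, parent: str = "") -> dict:
    """Flatten nested dicts into a single level with slash-separated keys."""
    flat = {}
    for key, value in metrics.items():
        path = f"{parent}/{key}" if parent else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Made-up sample in the same nested shape; not real output.
sample = {
    "training_iteration": 3,
    "env_runners": {"episode_reward_max": -120.5, "episode_len_mean": 360.0},
}

row = flatten(sample)

# Write one header line and one data line to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Replacing `io.StringIO()` with an opened file writes the same single row to disk.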


In [None]:
import numpy as np
from ray.rllib.algorithms.algorithm import Algorithm
best_result = result.get_best_result("env_runners/episode_reward_max", "max")
checkpoint_path = best_result.checkpoint.path
algo = Algorithm.from_checkpoint(checkpoint_path)
eval_env = algo.env_creator({})


# Set up evaluation parameters
num_episodes = 4

# Evaluation loop
episode_rewards = []

for _ in range(num_episodes):
    episode_reward = 0
    done = False
    obs, _ = eval_env.reset()

    while not done:
        # Feed the latest observation to the policy on every step
        action = algo.compute_single_action(obs)
        obs, reward, terminated, truncated, info = eval_env.step(action)
        done = terminated or truncated
        episode_reward += reward

    
    episode_rewards.append(episode_reward)

# Calculate and print evaluation metrics
mean_reward = np.mean(episode_rewards)
std_reward = np.std(episode_rewards)

print(f"Evaluation over {num_episodes} episodes:")
print(f"Mean episode reward: {mean_reward:.2f} +/- {std_reward:.2f}")


# Plot the results 
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
plt.plot(episode_rewards)
plt.title("Episode Rewards")
plt.xlabel("Episode")
plt.ylabel("Reward")

plt.tight_layout()
plt.show()

# Clean up
eval_env.close()
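
The evaluation cell relies on the Gymnasium-style contract in which `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`, and on each new observation being fed back into the policy. The sketch below packages that loop as a reusable function and exercises it against a stub environment. `CountdownEnv` and the echo policy are hypothetical and exist only to make the loop checkable without SUMO; the stub pays a reward only when the action equals the most recent observation, so a loop that reused a stale observation would score lower.

```python
import statistics


class CountdownEnv:
    """Hypothetical stub that mimics the Gymnasium reset/step return shapes."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        # Reward 1.0 only if the action matches the observation returned last
        reward = 1.0 if action == self.t else 0.0
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, reward, terminated, False, {}


def evaluate(policy, env, num_episodes: int) -> list:
    """Run the policy for num_episodes and return the total reward of each episode."""
    episode_rewards = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done = False
        total = 0.0
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total += reward
        episode_rewards.append(total)
    return episode_rewards


rewards = evaluate(lambda obs: obs, CountdownEnv(), num_episodes=4)
mean_reward = statistics.mean(rewards)
std_reward = statistics.pstdev(rewards)  # population std, matching np.std's default
print(f"Mean episode reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```

With a trained agent, `policy` could be a thin wrapper around the agent's action method, and `env` the evaluation environment.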


In [None]:
print(eval_env)