# Traffic Tune - Optimizing Traffic Signals with Reinforcement Learning

## Introduction

Welcome to the Traffic Tune POC notebook. This project focuses on optimizing traffic signal control using reinforcement learning. Traffic congestion is a major problem in urban areas, leading to increased travel times, fuel consumption, and pollution. Traditional traffic signal control systems often struggle to adapt to dynamic traffic conditions, resulting in suboptimal traffic flow.

Traffic Tune is a recommendation system that uses reinforcement learning to dynamically adjust traffic signals at intersections. By learning from traffic patterns in real time, Traffic Tune aims to improve traffic flow, reduce congestion, and enhance overall transportation efficiency.

In this POC, we demonstrate how to train reinforcement learning agents to optimize traffic signal control in a simulated environment. We use the SUMO (Simulation of Urban MObility) traffic simulation tool and Ray RLlib, run through Ray Tune, to train DQN, DDQN, and PPO agents and compare the traffic signal control policies they learn.


## Setup and Installations

In [1]:
import env_manager
import algo_trainer
from typing import SupportsIndex

In [2]:
def chain_training(manager: env_manager.EnvManager, generator, algo_agent, running_result: list):
    """Run one training cycle; after the first cycle, resume from the previous best config on the next route file."""
    if running_result:
        # take the best config from the previous training 
        best = running_result[-1].get_best_result("env_runners/episode_reward_max", "max")
        
        # Initialize the environment manager with new route file
        rou, csv = next(generator)
        manager.initialize_env(rou, csv)
        
        # continue the training with the best config
        algo_agent.config = best.config
        algo_agent.build_config()
    
    result = algo_agent.train()
    
    return result

def training(num_intersection: int, experiment_type: str, algo_config: str, env_config: str, num_training: SupportsIndex):
    running_result = []
    # experiment_type has the form "<ALGO>_<SingleAgent|MultiAgent>", e.g. "DQN_SingleAgent"
    algo_type = experiment_type.split("_")
    sumo_type = "MultiAgent" if "Multi" in experiment_type else "SingleAgent"
    
    # Initialize the environment manager
    manager = env_manager.EnvManager(f"{sumo_type}Environment", env_config, json_id=f"intersection_{num_intersection}")
    generator = manager.env_generator(f"Nets/intersection_{num_intersection}/route_xml_path_intersection_{num_intersection}.txt", algo_name=algo_type[0])
    
    # Initialize the environment manager with new route file
    rou, csv = next(generator)
    manager.initialize_env(rou, csv)
    
    algo_agent = algo_trainer.ALGOTrainer(config_path=algo_config, env_manager=manager, experiment_type=experiment_type)
    algo_agent.build_config()

    for _ in range(num_training):
        chain_result = chain_training(manager=manager, generator=generator, algo_agent=algo_agent, running_result=running_result)
        if chain_result is not None:
            running_result.append(chain_result)
    
    return running_result
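The chaining pattern used by `chain_training` can be tried without SUMO or Ray. The following is only a minimal sketch: `StubResult`, `StubResultGrid`, `StubTrainer`, the toy reward, and the route file names are all hypothetical stand-ins, not the project's `EnvManager` or `ALGOTrainer`. It shows the control flow only: the first cycle trains as configured, and each later cycle picks the best config of the previous cycle and moves on to the next route file.

```python
from dataclasses import dataclass, field


@dataclass
class StubResult:
    """Hypothetical stand-in for one trial result: a config and its reward."""
    config: dict
    reward: float


@dataclass
class StubResultGrid:
    """Hypothetical stand-in for the object returned by a training run."""
    results: list

    def get_best_result(self, metric: str, mode: str) -> StubResult:
        # The stub ignores `metric` and ranks trials on their single reward value.
        pick = max if mode == "max" else min
        return pick(self.results, key=lambda r: r.reward)


@dataclass
class StubTrainer:
    """Hypothetical trainer that tries the current learning rate and half of it."""
    config: dict = field(default_factory=lambda: {"lr": 0.1})

    def train(self) -> StubResultGrid:
        lr = self.config["lr"]
        candidates = [{"lr": lr}, {"lr": lr / 2}]
        # Toy reward: smaller learning rates score higher.
        return StubResultGrid([StubResult(c, -c["lr"]) for c in candidates])


def chain(trainer: StubTrainer, route_files: list, cycles: int):
    history, used_routes = [], []
    routes = iter(route_files)
    used_routes.append(next(routes))  # the first cycle uses the first route file
    for _ in range(cycles):
        if history:
            # Later cycles: resume from the best config and switch route file.
            best = history[-1].get_best_result("reward", "max")
            trainer.config = best.config
            used_routes.append(next(routes))
        history.append(trainer.train())
    return history, used_routes


history, used_routes = chain(
    StubTrainer(), ["routes_a.xml", "routes_b.xml", "routes_c.xml"], cycles=3
)
final_lr = history[-1].get_best_result("reward", "max").config["lr"]
print(used_routes, final_lr)
```

As in the notebook's own loop, the route generator must yield at least as many files as there are training cycles, otherwise `next` raises `StopIteration`.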

In [3]:
num_intersection_to_train = 2  # Choose which intersection you want to train

# Choose the experiment_type:
# PPO_SingleAgent | PPO_MultiAgent | DQN_SingleAgent | DDQN_SingleAgent | DQN_MultiAgent | DDQN_MultiAgent
experiment_type = "DQN_SingleAgent"  

num_training_cycles = 1

env_config_file_path = "env_config.json"

ppo_config_file_path = "ppo_config.json"

dqn_config_file_path = "dqn_config.json"
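The next cell pairs each experiment type with a config file by hand, which makes it easy to pass the PPO config to a DQN run by mistake. One way to guard against that is a small lookup helper. This is a sketch only: `config_for` and its validation sets are hypothetical and not part of the project; the pairing itself (DQN and DDQN share `dqn_config.json`, PPO uses `ppo_config.json`) follows what the notebook does.

```python
VALID_ALGOS = {"PPO", "DQN", "DDQN"}
VALID_MODES = {"SingleAgent", "MultiAgent"}


def config_for(experiment_type: str,
               ppo_path: str = "ppo_config.json",
               dqn_path: str = "dqn_config.json") -> str:
    """Return the config file path matching an "<ALGO>_<MODE>" experiment type."""
    algo, _, mode = experiment_type.partition("_")
    if algo not in VALID_ALGOS or mode not in VALID_MODES:
        raise ValueError(f"Unknown experiment type: {experiment_type!r}")
    # DQN and DDQN runs share the same config file in this notebook.
    return ppo_path if algo == "PPO" else dqn_path


for name in ("DQN_SingleAgent", "DDQN_SingleAgent", "PPO_SingleAgent"):
    print(name, "->", config_for(name))
```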

In [4]:
results = training(num_intersection=num_intersection_to_train, experiment_type=experiment_type, algo_config=dqn_config_file_path, env_config=env_config_file_path, num_training=num_training_cycles)
experiment_type = "DDQN_SingleAgent"
results_1 = training(num_intersection=num_intersection_to_train, experiment_type=experiment_type, algo_config=dqn_config_file_path, env_config=env_config_file_path, num_training=num_training_cycles)
experiment_type = "PPO_SingleAgent"
results_2 = training(num_intersection=num_intersection_to_train, experiment_type=experiment_type, algo_config=ppo_config_file_path, env_config=env_config_file_path, num_training=num_training_cycles)



Current time: 2024-07-25 13:58:45
Running for: 00:01:47.35
Memory: 10.7/16.0 GiB

| Trial name | status | loc | iter | total time (s) | ts | num_healthy_workers | num_in_flight_async_sample_reqs | num_remote_worker_restarts |
|---|---|---|---|---|---|---|---|---|
| PPO_PPO_9d2a3_00000 | TERMINATED | 127.0.0.1:7621 | 5 | 98.5591 | 3600 | 1 | 0 | 0 |




(RolloutWorker pid=7622) Retrying in 1 seconds
(RolloutWorker pid=7622) Step #0.00 (0ms ?*RT. ?UPS, TraCI: 6ms, vehicles TOT 0 ACT 0 BUF 0)
(RolloutWorker pid=7630) Retrying in 1 seconds

(PPO pid=7621) Install gputil for GPU system monitoring.

Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 30ms, vehicles TOT 3632 ACT 48 BUF 1)
(RolloutWorker pid=7630) Step #0.00 (0ms ?*RT. ?UPS, TraCI: 4ms, vehicles TOT 0 ACT 0 BUF 0)
(RolloutWorker pid=7622) Retrying in 1 seconds [repeated 2x across cluster]

(PPO pid=7621) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_2/saved_agent/PPO_2024-07-25_13-56-58/PPO_PPO_9d2a3_00000_0_2024-07-25_13-56-58/checkpoint_000000)

Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 36ms, vehicles TOT 3590 ACT 44 BUF 0)
(RolloutWorker pid=7622) Retrying in 1 seconds [repeated 2x across cluster]
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 23ms, vehicles TOT 3513 ACT 30 BUF 2)
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 26ms, vehicles TOT 3547 ACT 51 BUF 2)
(RolloutWorker pid=7630) Retrying in 1 seconds [repeated 2x across cluster]
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 23ms, vehicles TOT 3680 ACT 31 BUF 0)
(RolloutWorker pid=7630) Retrying in 1 seconds
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 27ms, vehicles TOT 3649 ACT 47 BUF 0)
(RolloutWorker pid=7630) Retrying in 1 seconds
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 25ms, vehicles TOT 3529 ACT 36 BU

(PPO pid=7621) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_2/saved_agent/PPO_2024-07-25_13-56-58/PPO_PPO_9d2a3_00000_0_2024-07-25_13-56-58/checkpoint_000001)

Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 32ms, vehicles TOT 3540 ACT 46 BUF 9)
(RolloutWorker pid=7622) Retrying in 1 seconds
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 25ms, vehicles TOT 3659 ACT 50 BUF 3)

(PPO pid=7621) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_2/saved_agent/PPO_2024-07-25_13-56-58/PPO_PPO_9d2a3_00000_0_2024-07-25_13-56-58/checkpoint_000002)

Step #3600.00 (1ms ~= 1000.00*RT, ~44000.00UPS, TraCI: 31ms, vehicles TOT 3524 ACT 44 BUF ACT 2 BUF 0)
(RolloutWorker pid=7622) Retrying in 1 seconds [repeated 2x across cluster]
Step #3600.00 (1ms ~= 1000.00*RT, ~31000.00UPS, TraCI: 24ms, vehicles TOT 3525 ACT 31 BUF ACT 2 BUF 0)
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 24ms, vehicles TOT 3677 ACT 46 BUF 19)
(RolloutWorker pid=7630) Retrying in 1 seconds [repeated 2x across cluster]
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 23ms, vehicles TOT 3533 ACT 36 BUF 0)
(RolloutWorker pid=7630) Retrying in 1 seconds
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 25ms, vehicles TOT 3703 ACT 37 BUF 8)
(RolloutWorker pid=7630) Retrying in 1 seconds

(PPO pid=7621) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_2/saved_agent/PPO_2024-07-25_13-56-58/PPO_PPO_9d2a3_00000_0_2024-07-25_13-56-58/checkpoint_000003)

Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 29ms, vehicles TOT 3582 ACT 40 BUF 0)
(RolloutWorker pid=7630) Retrying in 1 seconds
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 29ms, vehicles TOT 3696 ACT 44 BUF 0)
(RolloutWorker pid=7622) Retrying in 1 seconds

(PPO pid=7621) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_2/saved_agent/PPO_2024-07-25_13-56-58/PPO_PPO_9d2a3_00000_0_2024-07-25_13-56-58/checkpoint_000004)
2024-07-25 13:58:45,879 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_2/saved_agent/PPO_2024-07-25_13-56-58' in 0.0082s.
2024-07-25 13:58:46,457 INFO tune.py:1041 -- Total run time: 107.94 seconds (107.34 seconds for the tuning loop).

Step #5.00 (0ms ?*RT. ?UPS, TraCI: 782ms, vehicles TOT 10 ACT 10 BUF 0)


In [8]:
# Each `training` call returns a list with one result grid per training cycle.
# Take the first cycle's grid, then its first trial, and print that trial's evaluation metrics.
result = results[0][0]
print("DQN\n", result.metrics.get("evaluation", {}), "\n\n")

result1 = results_1[0][0]
print("DDQN\n", result1.metrics.get("evaluation", {}), "\n\n")

result2 = results_2[0][0]
print("PPO\n", result2.metrics.get("evaluation", {}), "\n\n")
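The raw evaluation dictionaries are long, so a compact summary can make the three agents easier to compare. The sketch below is an assumption about how one might do that, not part of the project: `summarize` is a hypothetical helper, and it relies only on the `env_runners` / `hist_stats` / `episode_reward` keys visible in the printed output. The sample rewards are the four DQN episode rewards from that output.

```python
from statistics import mean


def summarize(evaluation: dict) -> dict:
    """Reduce an evaluation dict to episode count and min/mean/max reward."""
    runners = evaluation.get("env_runners", {})
    rewards = runners.get("hist_stats", {}).get("episode_reward", [])
    if not rewards:
        # No evaluation episodes recorded: report that instead of failing.
        return {"episodes": 0}
    return {
        "episodes": len(rewards),
        "reward_min": min(rewards),
        "reward_mean": mean(rewards),
        "reward_max": max(rewards),
    }


# Episode rewards copied from the DQN evaluation output of this notebook.
sample = {
    "env_runners": {
        "hist_stats": {
            "episode_reward": [
                -0.17000000000000018,
                -1.8899999999999997,
                -1.050000000000001,
                -0.46000000000000113,
            ]
        }
    }
}
summary = summarize(sample)
print(summary)
```

With the real objects, the same helper could be applied to `result.metrics.get("evaluation", {})` for each of the three runs.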



DQN
 {'env_runners': {'episode_reward_max': -0.17000000000000018, 'episode_reward_min': -1.8899999999999997, 'episode_reward_mean': -0.8925000000000006, 'episode_len_mean': 720.0, 'episode_media': {}, 'episodes_timesteps_total': 2880, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [-0.17000000000000018, -1.8899999999999997, -1.050000000000001, -0.46000000000000113], 'episode_lengths': [720, 720, 720, 720]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 1.6125261472809571, 'mean_inference_ms': 0.431369241189615, 'mean_action_processing_ms': 0.04417968598327324, 'mean_env_wait_ms': 41.01937919259143, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': 0.0025212764739990234, 'StateBufferConnector_ms': 0.0018894672393798828, 'ViewRequirementAgentConnector_ms': 0.03832578659057617}, 'num_episodes': 4, 'episode_return_max': -0.17000000000000018, 'episode