# Traffic Tune - Optimizing Traffic Signals with Reinforcement Learning

## Introduction

Welcome to the Traffic Tune proof-of-concept (POC) notebook. Our project focuses on optimizing traffic signal control using reinforcement learning (RL). Traffic congestion is a major problem in urban areas, increasing travel times, fuel consumption, and pollution. Traditional traffic signal control systems often struggle to adapt to changing traffic conditions, which results in suboptimal traffic flow.

Traffic Tune is a recommendation system that uses reinforcement learning to dynamically adjust traffic signals at intersections. By learning from traffic patterns in real time, Traffic Tune aims to improve traffic flow, reduce congestion, and increase overall transportation efficiency.
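To make the observation/action/reward loop concrete, here is a toy sketch. It is not SUMO and not the project's environment: `ToyIntersection`, its random arrivals, its fixed discharge rate, and the negative-total-queue reward are all hypothetical choices made for illustration. A hand-written "longest queue gets green" rule stands in for the policy that an RL agent would learn.

```python
import random


class ToyIntersection:
    """Hypothetical two-approach intersection used only to illustrate the RL loop.

    NOT SUMO: each step, vehicles arrive at random on each approach, and the
    approach that receives the green light discharges up to `service_rate` vehicles.
    """

    def __init__(self, arrival_probs=(0.4, 0.2), service_rate=2, seed=0):
        self.arrival_probs = arrival_probs
        self.service_rate = service_rate
        self.rng = random.Random(seed)
        self.queues = [0] * len(arrival_probs)

    def reset(self) -> list[int]:
        self.queues = [0] * len(self.arrival_probs)
        return list(self.queues)

    def step(self, action: int) -> tuple[list[int], float]:
        """Give green to approach `action`; return (observation, reward)."""
        for i, p in enumerate(self.arrival_probs):
            if self.rng.random() < p:
                self.queues[i] += 1
        self.queues[action] = max(0, self.queues[action] - self.service_rate)
        # Assumed reward: negative total queue length, so fewer waiting vehicles is better.
        reward = -float(sum(self.queues))
        return list(self.queues), reward


def longest_queue_policy(obs: list[int]) -> int:
    """Hand-written baseline: give green to the longest queue (ties go to the lowest index)."""
    return max(range(len(obs)), key=lambda i: obs[i])


def run_episode(env: ToyIntersection, policy, steps: int = 100) -> float:
    """Roll out `policy` for `steps` decisions and return the summed reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        obs, reward = env.step(policy(obs))
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = ToyIntersection(seed=42)
    print("longest-queue return:", run_episode(env, longest_queue_policy))
```

An RL agent replaces `longest_queue_policy` with a policy whose parameters are updated from the rewards it collects.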

In this POC, we demonstrate how to train a reinforcement learning agent to optimize traffic signal control in a simulated environment. We use the SUMO (Simulation of Urban MObility) traffic simulator together with Ray RLlib, run through Ray Tune, to train either a Proximal Policy Optimization (PPO) agent or a Deep Q-Network (DQN or Double DQN) agent to learn a traffic signal control policy.
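The run executed in this notebook uses the `PPO_SingleAgent` experiment type. As background, the core of a PPO update is the clipped surrogate objective, which limits how far the probability ratio between the new and the data-collecting policy can move in one update. The sketch below computes only that term for a batch; full implementations typically add value-function and entropy terms. The `clip_eps=0.2` default is just an example value; the notebook's actual settings live in `ppo_config.json`, which is not shown here.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average PPO clipped surrogate over a batch (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the current
    policy and under the policy that collected the data; advantages: advantage estimates.
    """
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum makes the objective pessimistic: large policy changes earn no extra credit.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

For example, a ratio of 1.5 with advantage 2.0 contributes `min(3.0, 1.2 * 2.0) = 2.4` rather than 3.0, so the update gains nothing from pushing the ratio past `1 + clip_eps`.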


## Setup and Training

In [1]:
import env_manager
import ppo_trainer
import dqn_trainer

In [2]:
from typing import Iterator


def chain_training(manager: env_manager.EnvManager, generator: Iterator[tuple], algo_agent, running_result: list):
    """Run one training cycle, warm-starting from the previous cycle's best config if there is one."""
    if len(running_result) != 0:
        # Take the best config from the previous training cycle
        best = running_result[-1].get_best_result("env_runners/episode_reward_max", "max")
        
        # Re-initialize the environment with the next route file
        rou, csv = next(generator)
        manager.initialize_env(rou, csv)
        
        # Continue training with the best config
        algo_agent.config = best.config
        algo_agent.build_config()
    
    result = algo_agent.train()
    
    return result

def training(num_intersection: int, experiment_type: str, algo_config: str, env_config: str, num_training_cycles: int):
    """Train an agent on one intersection for `num_training_cycles` chained cycles."""
    running_result = []
    algo_name = experiment_type.split("_")[0]  # e.g. "PPO", "DQN", "DDQN"
    sumo_type = "MultiAgent" if "Multi" in experiment_type else "SingleAgent"
    
    # Initialize the environment manager
    manager = env_manager.EnvManager(f"{sumo_type}Environment", env_config, json_id=f"intersection_{num_intersection}")
    generator = manager.env_generator(f"Nets/intersection_{num_intersection}/route_xml_path_intersection_{num_intersection}.txt", algo_name=algo_name)
    
    # Load the first route file into the environment
    rou, csv = next(generator)
    manager.initialize_env(rou, csv)
    
    # Initialize the algorithm agent from its config file
    if algo_config.startswith("ppo"):
        algo_agent = ppo_trainer.PPOTrainer(config_path=algo_config, env_manager=manager, experiment_type=experiment_type)
    elif algo_config.startswith("dqn"):
        algo_agent = dqn_trainer.DQNTrainer(config_path=algo_config, env_manager=manager, experiment_type=experiment_type)
    else:
        raise ValueError(f"Unrecognized algorithm config file {algo_config!r}: expected a name starting with 'ppo' or 'dqn'")
    algo_agent.build_config()

    for _ in range(num_training_cycles):
        chain_result = chain_training(manager=manager, generator=generator, algo_agent=algo_agent, running_result=running_result)
        if chain_result is not None:
            running_result.append(chain_result)
    
    return running_result
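The chaining idea in the cell above (train on one route file, keep the best configuration, then continue on the next route file) can be mirrored in plain Python. This is a minimal sketch: `chain_training_sketch`, the `train_fn` callback, and its `(config, metrics)` return format are hypothetical stand-ins for the project's trainers and Ray Tune's result objects. As in the cell above, only the config is carried forward here; whether the real trainers also restore network weights is not shown in this notebook.

```python
def chain_training_sketch(route_files, train_fn, initial_config, metric):
    """Train on each route file in turn, carrying over the best config found so far.

    train_fn(config, route_file) -> list of (config, metrics) pairs, one per trial.
    Returns (per-cycle results, config selected after the last cycle).
    """
    config = initial_config
    all_results = []
    for route_file in route_files:
        trials = train_fn(config, route_file)
        all_results.append(trials)
        # Pick the trial with the highest value of `metric` (like mode="max").
        best_config, _ = max(trials, key=lambda trial: trial[1][metric])
        config = best_config  # warm-start the next cycle from the best trial's config
    return all_results, config


if __name__ == "__main__":
    # Hypothetical trainer: proposes two configs and pretends the halved learning rate scores better.
    def fake_train(config, route_file):
        lr = config["lr"]
        return [({"lr": lr}, {"reward": 1.0}), ({"lr": lr / 2}, {"reward": 2.0})]

    _, final_config = chain_training_sketch(["a.rou.xml", "b.rou.xml"], fake_train, {"lr": 1.0}, "reward")
    print(final_config)
```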

In [3]:
num_intersection_to_train = 3  # Choose which intersection you want to train

# Choose the experiment_type:
# PPO_SingleAgent | PPO_MultiAgent | DQN_SingleAgent | DDQN_SingleAgent | DQN_MultiAgent | DDQN_MultiAgent
experiment_type = "PPO_SingleAgent"  

num_training_cycles = 1

env_config_file_path = "env_config.json"

ppo_config_file_path = "ppo_config.json"

dqn_config_file_path = "dqn_config.json"
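The `DQN_*` and `DDQN_*` experiment types listed above presumably differ in how the bootstrapped target is computed. Assuming the DDQN variants follow standard Double DQN, the sketch below contrasts the two targets for a single transition: plain DQN takes the maximum of the target network's Q-values, while Double DQN lets the online network choose the next action and the target network score it, which reduces overestimation. RLlib's DQN exposes this as the `double_q` training option; how `dqn_trainer` maps the experiment types onto it is not shown here. The example values below are made up.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online net picks a', the target net evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Three actions, matching the Discrete(3) action space reported in the training log.
    online, target = [1.0, 3.0, 2.0], [2.5, 1.0, 4.0]
    print(dqn_target(-1.0, target, gamma=0.9))                  # bootstraps from action 2
    print(double_dqn_target(-1.0, online, target, gamma=0.9))   # bootstraps from action 1
```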

In [4]:
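# PPO experiments use ppo_config.json; DQN and DDQN experiments share dqn_config.json
algo_config_file_path = ppo_config_file_path if experiment_type.startswith("PPO") else dqn_config_file_path
results = training(num_intersection=num_intersection_to_train, experiment_type=experiment_type, algo_config=algo_config_file_path, env_config=env_config_file_path, num_training_cycles=num_training_cycles)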



Current time: 2024-07-25 00:23:41
Running for: 00:00:29.38
Memory: 10.4/16.0 GiB

| Trial name | status | loc | iter | total time (s) | ts | num_healthy_workers | num_in_flight_async_sample_reqs | num_remote_worker_restarts |
|---|---|---|---|---|---|---|---|---|
| PPO_PPO_ee2ea_00000 | TERMINATED | 127.0.0.1:4653 | 3 | 24.123 | 2160 | 1 | 0 | 0 |




[36m(RolloutWorker pid=4654)[0m  Retrying in 1 seconds


[36m(RolloutWorker pid=4654)[0m 2024-07-25 00:23:16,101	INFO policy.py:1272 -- Policy (worker=1) running on CPU.
[36m(RolloutWorker pid=4654)[0m 2024-07-25 00:23:16,101	INFO torch_policy_v2.py:111 -- Found 0 visible cuda devices.


[36m(RolloutWorker pid=4654)[0m Step #0.00 (0ms ?*RT. ?UPS, TraCI: 5ms, vehicles TOT 0 ACT 0 BUF 0)                      


[36m(PPO pid=4653)[0m 2024-07-25 00:23:16,636	INFO env_runner_group.py:333 -- Inferred observation/action spaces from remote worker (local worker has no env): {'default_policy': (Box(0.0, 1.0, (26,), float32), Discrete(3)), '__env__': (Box(0.0, 1.0, (26,), float32), Discrete(3))}
[36m(PPO pid=4653)[0m 2024-07-25 00:23:16,640	INFO policy.py:1272 -- Policy (worker=local) running on CPU.
[36m(RolloutWorker pid=4654)[0m 2024-07-25 00:23:16,632	INFO util.py:118 -- Using connectors:
[36m(RolloutWorker pid=4654)[0m 2024-07-25 00:23:16,632	INFO util.py:119 --     AgentConnectorPipeline
[36m(RolloutWorker pid=4654)[0m         ObsPreprocessorConnector
[36m(RolloutWorker pid=4654)[0m         StateBufferConnector
[36m(RolloutWorker pid=4654)[0m         ViewRequirementAgentConnector
[36m(RolloutWorker pid=4654)[0m 2024-07-25 00:23:16,632	INFO util.py:120 --     ActionConnectorPipeline
[36m(RolloutWorker pid=4654)[0m         ConvertToNumpyConnector
[36m(RolloutWorker pid=4654)[0m

[36m(RolloutWorker pid=4654)[0m  Retrying in 1 seconds




[36m(RolloutWorker pid=4654)[0m Step #0.00 (0ms ?*RT. ?UPS, TraCI: 18ms, vehicles TOT 0 ACT 0 BUF 0)                      




Step #300.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 338 ACT 53 BUF 0)                  15 ACT 45 BUF 0)                  




Step #600.00 (1ms ~= 1000.00*RT, ~54000.00UPS, TraCI: 8ms, vehicles TOT 666 ACT 54 BUF 0) 46 ACT 48 BUF 1)                  




Step #800.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 917 ACT 65 BUF 0)                  s, vehicles TOT 790 ACT 52 BUF 0) 




[36m(RolloutWorker pid=4654)[0m Step #900.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 1030 ACT 49 BUF 1)                 
[36m(RolloutWorker pid=4654)[0m Step #1000.00 (0ms ?*RT. ?UPS, TraCI: 7ms, vehicles TOT 1142 ACT 43 BUF 0)                




Step #1200.00 (0ms ?*RT. ?UPS, TraCI: 6ms, vehicles TOT 1358 ACT 34 BUF 0)                1256 ACT 41 BUF 0)                




[36m(RolloutWorker pid=4654)[0m Step #1300.00 (1ms ~= 1000.00*RT, ~64000.00UPS, TraCI: 10ms, vehicles TOT 1477 ACT 64 BUF 




[36m(RolloutWorker pid=4654)[0m Step #1400.00 (0ms ?*RT. ?UPS, TraCI: 7ms, vehicles TOT 1593 ACT 56 BUF 0)                
[36m(RolloutWorker pid=4654)[0m Step #1500.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 1725 ACT 52 BUF 1)                




[36m(RolloutWorker pid=4654)[0m Step #1600.00 (0ms ?*RT. ?UPS, TraCI: 7ms, vehicles TOT 1840 ACT 40 BUF 2)                


[36m(PPO pid=4653)[0m 2024-07-25 00:23:16,640	INFO torch_policy_v2.py:111 -- Found 0 visible cuda devices.


Step #1800.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 2063 ACT 58 BUF 0)                1938 ACT 35 BUF 1)                
Step #2200.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 2525 ACT 48 BUF 0)                2165 ACT 48 BUF 0)                


[36m(PPO pid=4653)[0m 2024-07-25 00:23:17,024	INFO util.py:118 -- Using connectors:
[36m(PPO pid=4653)[0m 2024-07-25 00:23:17,024	INFO util.py:119 --     AgentConnectorPipeline
[36m(PPO pid=4653)[0m         ObsPreprocessorConnector
[36m(PPO pid=4653)[0m         StateBufferConnector
[36m(PPO pid=4653)[0m         ViewRequirementAgentConnector
[36m(PPO pid=4653)[0m 2024-07-25 00:23:17,024	INFO util.py:120 --     ActionConnectorPipeline
[36m(PPO pid=4653)[0m         ConvertToNumpyConnector
[36m(PPO pid=4653)[0m         NormalizeActionsConnector
[36m(PPO pid=4653)[0m         ImmutableActionsConnector


[36m(RolloutWorker pid=4654)[0m Step #2300.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 2654 ACT 41 BUF 0)                
[36m(RolloutWorker pid=4654)[0m Step #2400.00 (1ms ~= 1000.00*RT, ~46000.00UPS, TraCI: 7ms, vehicles TOT 2754 ACT 46 BUF 0




[36m(RolloutWorker pid=4654)[0m Step #2500.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 2892 ACT 57 BUF 1)                




[36m(RolloutWorker pid=4654)[0m Step #2600.00 (1ms ~= 1000.00*RT, ~36000.00UPS, TraCI: 7ms, vehicles TOT 3002 ACT 36 BUF 0
Step #3600.00 (1ms ~= 1000.00*RT, ~43000.00UPS, TraCI: 27ms, vehicles TOT 4185 ACT 43 BUF  3122 ACT 48 BUF 0)               
[36m(RolloutWorker pid=4654)[0m  Retrying in 1 seconds


[36m(RolloutWorker pid=4654)[0m 2024-07-25 00:23:25,495	INFO rollout_worker.py:721 -- Completed sample batch:
[36m(RolloutWorker pid=4654)[0m 
[36m(RolloutWorker pid=4654)[0m { 'count': 720,
[36m(RolloutWorker pid=4654)[0m   'policy_batches': { 'default_policy': { 'action_dist_inputs': np.ndarray((720, 3), dtype=float32, min=-0.004, max=0.008, mean=0.0),
[36m(RolloutWorker pid=4654)[0m                                           'action_logp': np.ndarray((720,), dtype=float32, min=-1.103, max=-1.092, mean=-1.099),
[36m(RolloutWorker pid=4654)[0m                                           'actions': np.ndarray((720,), dtype=int32, min=0.0, max=2.0, mean=1.001),
[36m(RolloutWorker pid=4654)[0m                                           'advantages': np.ndarray((720,), dtype=float32, min=-3.169, max=6.876, mean=-0.001),
[36m(RolloutWorker pid=4654)[0m                                           'agent_index': np.ndarray((720,), dtype=int64, min=0.0, max=0.0, mean=0.0),
[36m(Rol

Step #100.00 (0ms ?*RT. ?UPS, TraCI: 5ms, vehicles TOT 123 ACT 50 BUF 3)
Step #300.00 (0ms ?*RT. ?UPS, TraCI: 7ms, vehicles TOT 377 ACT 49 BUF 0)
(RolloutWorker pid=4654) Step #400.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 496 ACT 51 BUF 0)
Step #900.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 1107 ACT 50 BUF 2)
Step #1500.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 1766 ACT 49 BUF 3)
Step #1700.00 (1ms ~= 1000.00*RT, ~43000.00UPS, TraCI: 6ms, vehicles TOT 1992 ACT 43 BUF 0)
Step #2400.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 2781 ACT 44 BUF 0)
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 29ms, vehicles TOT 4214 ACT 48 BUF 0)
(RolloutWorker pid=4654)  Retrying in 1 seconds

(PPO pid=4653) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_3/saved_agent/PPO_2024-07-25_00-23-11/PPO_PPO_ee2ea_00000_0_2024-07-25_00-23-11/checkpoint_000001)


[36m(RolloutWorker pid=4654)[0m Step #0.00 (0ms ?*RT. ?UPS, TraCI: 13ms, vehicles TOT 0 ACT 0 BUF 0)                      




Step #400.00 (0ms ?*RT. ?UPS, TraCI: 10ms, vehicles TOT 510 ACT 70 BUF 0)                 25 ACT 47 BUF 0)                  




Step #900.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 1112 ACT 54 BUF 1)                 21 ACT 44 BUF 0)                  




[36m(RolloutWorker pid=4654)[0m Step #1000.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 1228 ACT 44 BUF 1)                




Step #1200.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 1454 ACT 57 BUF 0)                1336 ACT 38 BUF 0)                




Step #1400.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 1719 ACT 61 BUF 0)                6ms, vehicles TOT 1588 ACT 90 BUF 




Step #1700.00 (0ms ?*RT. ?UPS, TraCI: 7ms, vehicles TOT 2064 ACT 52 BUF 0)                1842 ACT 49 BUF 1)                




Step #1900.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 2290 ACT 56 BUF 0)                2174 ACT 49 BUF 1)                




Step #2200.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 2642 ACT 47 BUF 0)                2400 ACT 36 BUF 2)                




[36m(RolloutWorker pid=4654)[0m Step #2300.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 2775 ACT 56 BUF 1)                




Step #2700.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 3278 ACT 61 BUF 0)                2883 ACT 34 BUF 0)                
[36m(RolloutWorker pid=4654)[0m Step #2800.00 (0ms ?*RT. ?UPS, TraCI: 6ms, vehicles TOT 3389 ACT 38 BUF 0)                




Step #3100.00 (0ms ?*RT. ?UPS, TraCI: 8ms, vehicles TOT 3755 ACT 51 BUF 1)                0ms, vehicles TOT 3520 ACT 61 BUF 




Step #3500.00 (1ms ~= 1000.00*RT, ~48000.00UPS, TraCI: 7ms, vehicles TOT 4235 ACT 48 BUF 03867 ACT 41 BUF 1)                
[36m(RolloutWorker pid=4654)[0m Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 28ms, vehicles TOT 4353 ACT 44 BUF 1)               
[36m(RolloutWorker pid=4654)[0m  Retrying in 1 seconds


2024-07-25 00:23:41,214	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_3/saved_agent/PPO_2024-07-25_00-23-11' in 0.0058s.
[36m(PPO pid=4653)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_3/saved_agent/PPO_2024-07-25_00-23-11/PPO_PPO_ee2ea_00000_0_2024-07-25_00-23-11/checkpoint_000002)
2024-07-25 00:23:41,650	INFO tune.py:1041 -- Total run time: 29.85 seconds (29.38 seconds for the tuning loop).


Step #5.00 (0ms ?*RT. ?UPS, TraCI: 664ms, vehicles TOT 2 ACT 2 BUF 0)                     ACT 2 BUF 0)                      


In [None]:
print(results)
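The numbers in the training output above can be cross-checked with a little arithmetic. The status table reports 3 iterations and 2160 total timesteps, which gives 720 timesteps per iteration, the same as the `'count': 720` of the first sample batch. The logged SUMO episodes run to `Step #3600.00`; assuming (not confirmed by the notebook) that one iteration corresponds to one such episode, each agent decision would span 5 simulated seconds. Finally, a uniform policy over the `Discrete(3)` action space has log-probability ln(1/3) per action, which matches the mean `action_logp` of -1.099 in that batch, so the freshly initialized policy was still choosing actions almost uniformly.

```python
import math

# Values copied from the Tune status table in the training output.
iterations = 3
total_timesteps = 2160

steps_per_iteration = total_timesteps // iterations  # matches 'count': 720 in the sample batch

# Assumption: one training iteration == one SUMO episode ending at Step #3600.00.
episode_sim_seconds = 3600
sim_seconds_per_decision = episode_sim_seconds / steps_per_iteration

# Log-probability of any single action under a uniform policy over 3 actions.
uniform_logp = math.log(1 / 3)

print(steps_per_iteration, sim_seconds_per_decision, round(uniform_logp, 3))  # 720 5.0 -1.099
```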