# Traffic Tune - Optimizing Traffic Signals with Reinforcement Learning

## Introduction

Welcome to the Traffic Tune POC notebook. Our project focuses on optimizing traffic signal control using reinforcement learning. Traffic congestion is a major problem in urban areas, leading to increased travel times, fuel consumption, and pollution. Traditional traffic signal control systems often struggle to adapt to dynamic traffic conditions, resulting in suboptimal traffic flow.

Traffic Tune is a recommendation system that uses reinforcement learning to dynamically adjust traffic signals at intersections. By learning from traffic patterns in real time, Traffic Tune aims to improve traffic flow, reduce congestion, and enhance overall transportation efficiency.

In this POC, we demonstrate how to train a reinforcement learning agent to optimize traffic signal control in a simulated environment. We use the SUMO (Simulation of Urban MObility) traffic simulation tool together with Ray RLlib and Ray Tune, as the training logs below show, to train a Proximal Policy Optimization (PPO) or Deep Q-Network (DQN) agent to learn a traffic signal control policy.


## Setup and Installations

In [1]:
import env_manager
import ppo_trainer
import dqn_trainer

In [2]:
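from typing import Iterator


def chain_training(manager: env_manager.EnvManager, generator: Iterator, algo_agent, running_result: list):
    """Run one training cycle, continuing from the best config of the previous cycle if there is one."""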
    if len(running_result) != 0: 
        # take the best config from the previous training 
        best = running_result[-1].get_best_result("env_runners/episode_reward_max", "max")
        
        # Initialize the environment manager with new route file
        rou, csv = next(generator)
        manager.initialize_env(rou, csv)
        
        # continue the training with the best config
        algo_agent.config = best.config
        algo_agent.build_config()
    
    result = algo_agent.train()
    
    return result

def training(num_intersection: int, experiment_type: str, algo_config: str, env_config: str, num_training_cycles: int):
    running_result = []
    algo_agent = None
    sumo_type = "SingleAgent"
    algo_type = experiment_type.split("_")
    
    if "Multi" in experiment_type:
        sumo_type = "MultiAgent"
    
    # Initialize the environment manager
    manager = env_manager.EnvManager(f"{sumo_type}Environment", env_config, json_id=f"intersection_{num_intersection}")
    generator = manager.env_generator(f"Nets/intersection_{num_intersection}/route_xml_path_intersection_{num_intersection}.txt", algo_name=algo_type[0])
    
    # Initialize the environment manager with new route file
    rou, csv = next(generator)
    manager.initialize_env(rou, csv)
    
    # Initialize the algorithm agent from the config file name prefix
    if algo_config.startswith("ppo"):
        algo_agent = ppo_trainer.PPOTrainer(config_path=algo_config, env_manager=manager, experiment_type=experiment_type)
    elif algo_config.startswith("dqn"):
        algo_agent = dqn_trainer.DQNTrainer(config_path=algo_config, env_manager=manager, experiment_type=experiment_type)
    else:
        # Fail early instead of calling train() on None later
        raise ValueError(f"Unsupported algorithm config {algo_config!r}: expected a file name starting with 'ppo' or 'dqn'")
    algo_agent.build_config()

    for _ in range(num_training_cycles):
        chain_result = chain_training(manager=manager, generator=generator, algo_agent=algo_agent, running_result=running_result)
        if chain_result is not None:
            running_result.append(chain_result)
    
    return running_result
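
The chaining pattern used by `chain_training` and `training` can be illustrated without SUMO or Ray. The sketch below is a toy under stated assumptions: `StubAgent`, `StubResult`, `route_files`, and the learning-rate objective are hypothetical stand-ins, not the project's classes. It only shows the control flow, where each cycle takes the next route file from a generator and starts from the config produced by the previous cycle.

```python
from typing import Any, Dict, Iterator, List, Tuple


class StubResult:
    """Hypothetical stand-in for one training cycle's result."""

    def __init__(self, config: Dict[str, Any], reward: float) -> None:
        self.config = config
        self.reward = reward


class StubAgent:
    """Hypothetical agent: 'training' only scores and nudges its config."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def train(self, route_file: str) -> StubResult:
        # Toy objective: pretend a learning rate of 0.01 is ideal.
        reward = -abs(self.config["lr"] - 0.01)
        # Pretend the cycle moved the config halfway toward the optimum.
        tuned = dict(self.config, lr=(self.config["lr"] + 0.01) / 2)
        return StubResult(config=tuned, reward=reward)


def route_files(paths: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (route_file, csv_output) pairs, one pair per training cycle."""
    for path in paths:
        yield path, path.replace(".rou.xml", ".csv")


def chain(agent: StubAgent, generator: Iterator[Tuple[str, str]], cycles: int) -> List[StubResult]:
    """Train for several cycles, carrying the previous cycle's config forward."""
    results: List[StubResult] = []
    for _ in range(cycles):
        if results:
            # Continue from the config produced by the previous cycle.
            agent.config = results[-1].config
        rou, _csv = next(generator)
        results.append(agent.train(rou))
    return results


agent = StubAgent({"lr": 0.05})
history = chain(agent, route_files(["a.rou.xml", "b.rou.xml", "c.rou.xml"]), cycles=3)
print([round(r.reward, 3) for r in history])
```

In this toy the reward improves from one cycle to the next only because the stub moves the config toward a known optimum; real training gives no such guarantee.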

In [5]:
num_intersection_to_train = 1  # Choose which intersection you want to train

# Choose the experiment_type:
# PPO_SingleAgent | PPO_MultiAgent | DQN_SingleAgent | DDQN_SingleAgent | DQN_MultiAgent | DDQN_MultiAgent
experiment_type = "PPO_SingleAgent"  

num_training_cycles = 1

env_config_file_path = "env_config.json"

ppo_config_file_path = "ppo_config.json"

dqn_config_file_path = "dqn_config.json"

In [6]:
results = training(num_intersection=num_intersection_to_train, experiment_type=experiment_type, algo_config=ppo_config_file_path, env_config=env_config_file_path, num_training_cycles=num_training_cycles)
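
The call above passes `ppo_config_file_path` directly, which matches `experiment_type = "PPO_SingleAgent"` but has to be changed by hand for the DQN variants. A small helper could pick the config file from the experiment type instead. This is a sketch, not part of the project: `select_algo_config` is a hypothetical name, and mapping `DDQN_*` experiments to the DQN config file is an assumption based on `training` only distinguishing the `ppo` and `dqn` prefixes.

```python
def select_algo_config(experiment_type: str, ppo_config: str, dqn_config: str) -> str:
    """Return the config file path that matches the experiment type.

    Assumption: DDQN experiments reuse the DQN config file.
    """
    algo_name = experiment_type.split("_")[0].upper()
    if algo_name == "PPO":
        return ppo_config
    if algo_name in ("DQN", "DDQN"):
        return dqn_config
    raise ValueError(f"Unknown experiment type: {experiment_type!r}")


print(select_algo_config("PPO_SingleAgent", "ppo_config.json", "dqn_config.json"))
print(select_algo_config("DDQN_MultiAgent", "ppo_config.json", "dqn_config.json"))
```

The returned path could then be passed as the `algo_config` argument of `training`.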



| Field | Value |
|---|---|
| Current time | 2024-07-24 22:37:30 |
| Running for | 00:02:21.13 |
| Memory | 10.4/16.0 GiB |

| Trial name | status | loc | iter | total time (s) | ts | num_healthy_workers | num_in_flight_async_sample_reqs | num_remote_worker_restarts |
|---|---|---|---|---|---|---|---|---|
| PPO_PPO_d609e_00000 | TERMINATED | 127.0.0.1:3996 | 3 | 132.504 | 2160 | 1 | 0 | 0 |

(RolloutWorker pid=3998)  Retrying in 1 seconds
(RolloutWorker pid=3998) 2024-07-24 22:35:13,541 INFO policy.py:1272 -- Policy (worker=1) running on CPU.
(RolloutWorker pid=3998) 2024-07-24 22:35:13,541 INFO torch_policy_v2.py:111 -- Found 0 visible cuda devices.
(RolloutWorker pid=3998) Step #0.00 (0ms ?*RT. ?UPS, TraCI: 5ms, vehicles TOT 0 ACT 0 BUF 0)

(PPO pid=3996) 2024-07-24 22:35:14,018 INFO env_runner_group.py:333 -- Inferred observation/action spaces from remote worker (local worker has no env): {'default_policy': (Box(0.0, 1.0, (27,), float32), Discrete(4)), '__env__': (Box(0.0, 1.0, (27,), float32), Discrete(4))}
(PPO pid=3996) 2024-07-24 22:35:14,023 INFO policy.py:1272 -- Policy (worker=local) running on CPU.
(RolloutWorker pid=3998) 2024-07-24 22:35:14,014 INFO util.py:118 -- Using connectors:
(RolloutWorker pid=3998) 2024-07-24 22:35:14,014 INFO util.py:119 --     AgentConnectorPipeline
(RolloutWorker pid=3998)         ObsPreprocessorConnector
(RolloutWorker pid=3998)         StateBufferConnector
(RolloutWorker pid=3998)         ViewRequirementAgentConnector
(RolloutWorker pid=3998) 2024-07-24 22:35:14,014 INFO util.py:120 --     ActionConnectorPipeline
(RolloutWorker pid=3998)         ConvertToNumpyConnector

(RolloutWorker pid=3998)  Retrying in 1 seconds [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)

(RolloutWorker pid=4000) 2024-07-24 22:35:17,020 INFO policy.py:1272 -- Policy (worker=1) running on CPU.
(PPO pid=3996) 2024-07-24 22:35:17,420 INFO torch_policy_v2.py:111 -- Found 0 visible cuda devices. [repeated 3x across cluster]

(RolloutWorker pid=3998) Step #0.00 (1ms ~= 1000.00*RT, ~3000.00UPS, TraCI: 17ms, vehicles TOT 3 ACT 3 BUF 0)
(RolloutWorker pid=4000) Step #0.00 (0ms ?*RT. ?UPS, TraCI: 6ms, vehicles TOT 0 ACT 0 BUF 0)

(RolloutWorker pid=4000) 2024-07-24 22:35:17,412 INFO util.py:118 -- Using connectors: [repeated 3x across cluster]
(RolloutWorker pid=4000) 2024-07-24 22:35:17,412 INFO util.py:119 --     AgentConnectorPipeline [repeated 3x across cluster]
(RolloutWorker pid=4000)         ObsPreprocessorConnector [repeated 3x across cluster]
(RolloutWorker pid=4000)         StateBufferConnector [repeated 3x across cluster]
(RolloutWorker pid=4000)         ViewRequirementAgentConnector [repeated 3x across cluster]
(RolloutWorker pid=4000) 2024-07-24 22:35:17,412 INFO util.py:120 --     ActionConnectorPipeline [repeated 3x across cluster]
(RolloutWorker pid=4000)         ConvertToNumpyConnector [repeated 3x across cluster]
(RolloutWorker pid=4000)         NormalizeActionsConnector [repeated 3x across cluster]
(RolloutWorker pid=4000)         ImmutableAc

Step #1100.00 (1ms ~= 1000.00*RT, ~249000.00UPS, TraCI: 20ms, vehicles TOT 1724 ACT 249
Step #1800.00 (0ms ?*RT. ?UPS, TraCI: 22ms, vehicles TOT 2651 ACT 256 BUF 1475)
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 44ms, vehicles TOT 5100 ACT 251 BUF 3165)
(RolloutWorker pid=3998)  Retrying in 1 seconds

(RolloutWorker pid=3998) 2024-07-24 22:35:38,189 INFO rollout_worker.py:721 -- Completed sample batch:
(RolloutWorker pid=3998) { 'count': 720,
(RolloutWorker pid=3998)   'policy_batches': { 'default_policy': { 'action_dist_inputs': np.ndarray((720, 4), dtype=float32, min=-0.013, max=0.008, mean=-0.001),
(RolloutWorker pid=3998)                                           'action_logp': np.ndarray((720,), dtype=float32, min=-1.397, max=-1.377, mean=-1.386),
(RolloutWorker pid=3998)                                           'actions': np.ndarray((720,), dtype=int32, min=0.0, max=3.0, mean=1.536),
(RolloutWorker pid=3998)                                           'advantages': np.ndarray((720,), dtype=float32, min=-28.521, max=35.274, mean=-0.409),
(RolloutWorker pid=3998)                                           'agent_index': np.ndarray((720,), dtype=int64, min=0.0, max=0.0, mean=0.0),

Step #400.00 (1ms ~= 1000.00*RT, ~262000.00UPS, TraCI: 21ms, vehicles TOT 775 ACT 262
Step #1400.00 (1ms ~= 1000.00*RT, ~334000.00UPS, TraCI: 29ms, vehicles TOT 2163 ACT 334
Step #2400.00 (1ms ~= 1000.00*RT, ~307000.00UPS, TraCI: 26ms, vehicles TOT 3443 ACT 307
Step #3600.00 (1ms ~= 1000.00*RT, ~316000.00UPS, TraCI: 48ms, vehicles TOT 4995 ACT 316
(RolloutWorker pid=3998)  Retrying in 1 seconds

(RolloutWorker pid=4000) 2024-07-24 22:36:00,955 INFO rollout_worker.py:679 -- Generating sample batch of size 1

Step #200.00 (0ms ?*RT. ?UPS, TraCI: 19ms, vehicles TOT 419 ACT 214 BUF 47)
Step #500.00 (1ms ~= 1000.00*RT, ~248000.00UPS, TraCI: 20ms, vehicles TOT 870 ACT 248
Step #1400.00 (1ms ~= 1000.00*RT, ~246000.00UPS, TraCI: 24ms, vehicles TOT 2050 ACT 246
(RolloutWorker pid=4000)  Retrying in 1 seconds
Step #2700.00 (1ms ~= 1000.00*RT, ~290000.00UPS, TraCI: 24ms, vehicles TOT 3759 ACT 290
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 39ms, vehicles TOT 4872 ACT 255 BUF 3491)
(RolloutWorker pid=4000)  Retrying in 1 seconds

(RolloutWorker pid=4000) 2024-07-24 22:36:22,382 INFO rollout_worker.py:721 -- Completed sample batch:
(RolloutWorker pid=4000) { 'count': 720,
(RolloutWorker pid=4000)   'policy_batches': { 'default_policy': { 'advantages': np.ndarray((720,), dtype=float32, min=-21.403, max=26.065, mean=-0.407),
(RolloutWorker pid=4000)                                           'agent_index': np.ndarray((720,), dtype=int64, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=4000)                                           'eps_id': np.ndarray((720,), dtype=int64, min=5.6189537864363936e+17, max=5.6189537864363936e+17, mean=5.618953786436394e+17),
(RolloutWorker pid=4000)                                           'infos': np.ndarray((720,), dtype=object, head={'step': 0.0, 'system_total_stopped': 0, 'system_total_waiting_time': 0, 'system_mean_waiting_time': 0.0, 'system_mean_speed': 0.0, 'cluster8610962215_8610962216_

Step #600.00 (1ms ~= 1000.00*RT, ~248000.00UPS, TraCI: 22ms, vehicles TOT 1055 ACT 248
Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 39ms, vehicles TOT 4945 ACT 266 BUF 3441)
(RolloutWorker pid=4000)  Retrying in 1 seconds
Step #1100.00 (1ms ~= 1000.00*RT, ~250000.00UPS, TraCI: 22ms, vehicles TOT 1740 ACT 250
Step #3600.00 (1ms ~= 1000.00*RT, ~287000.00UPS, TraCI: 40ms, vehicles TOT 4781 ACT 287
(RolloutWorker pid=4000)  Retrying in 1 seconds

(PPO pid=3996) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_1/saved_agent/PPO_2024-07-24_22-35-08/PPO_PPO_d609e_00000_0_2024-07-24_22-35-08/checkpoint_000001)

(RolloutWorker pid=3998) Step #0.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 0 ACT 0 BUF 0)
Step #1600.00 (1ms ~= 1000.00*RT, ~290000.00UPS, TraCI: 31ms, vehicles TOT 2362 ACT 290
Step #1900.00 (2ms ~= 500.00*RT, ~160500.00UPS, TraCI: 32ms, vehicles TOT 2739 ACT 321
Step #2800.00 (2ms ~= 500.00*RT, ~152000.00UPS, TraCI: 32ms, vehicles TOT 3834 ACT 304
Step #3100.00 (1ms ~= 1000.00*RT, ~314000.00UPS, TraCI: 30ms, vehicles TOT 4253 ACT 314
Step #3600.00 (2ms ~= 500.00*RT, ~154500.00UPS, TraCI: 52ms, vehicles TOT 4944 ACT 309
(RolloutWorker pid=4000)  Retrying in 1 seconds
Step #3600.00 (1ms ~= 1000.00*RT, ~338000.00UPS, TraCI: 44ms, vehicles TOT 5048 ACT 338

2024-07-24 22:37:30,001 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_1/saved_agent/PPO_2024-07-24_22-35-08' in 0.0068s.
(PPO pid=3996) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_1/saved_agent/PPO_2024-07-24_22-35-08/PPO_PPO_d609e_00000_0_2024-07-24_22-35-08/checkpoint_000002)
2024-07-24 22:37:30,445 INFO tune.py:1041 -- Total run time: 141.60 seconds (141.12 seconds for the tuning loop).

Step #5.00 (0ms ?*RT. ?UPS, TraCI: 656ms, vehicles TOT 9 ACT 9 BUF 1)
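
The simulator progress lines in the log can be turned into numbers for a quick sanity check of a run. The sketch below is an illustration, not part of the project: `StepStats` and `parse_step_line` are hypothetical names, and reading `TOT`, `ACT`, and `BUF` as total, active, and buffered (awaiting insertion) vehicle counts is an assumption about the simulator's progress format. Lines whose vehicle counts were overwritten in the captured output do not match and are skipped.

```python
import re
from typing import NamedTuple, Optional


class StepStats(NamedTuple):
    step: float
    total: int      # assumed meaning of TOT
    active: int     # assumed meaning of ACT
    buffered: int   # assumed meaning of BUF


# Match only a complete record; [^)]* stops the search at the first ")".
STEP_RE = re.compile(
    r"Step #(\d+(?:\.\d+)?) \([^)]*?vehicles TOT (\d+) ACT (\d+) BUF (\d+)\)"
)


def parse_step_line(line: str) -> Optional[StepStats]:
    """Return the vehicle counts from one progress line, or None if incomplete."""
    match = STEP_RE.search(line)
    if match is None:
        return None
    step, total, active, buffered = match.groups()
    return StepStats(float(step), int(total), int(active), int(buffered))


complete = "Step #3600.00 (0ms ?*RT. ?UPS, TraCI: 39ms, vehicles TOT 4872 ACT 255 BUF 3491)"
overwritten = "Step #1600.00 (1ms ~= 1000.00*RT, ~290000.00UPS, TraCI: 31ms, vehicles TOT 2362 ACT 290 BU226 ACT 142 BUF 10)"
print(parse_step_line(complete))
print(parse_step_line(overwritten))
```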


In [None]:
print(results)
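
Printing the raw list is hard to read, so a per-cycle summary may be more useful. The sketch below assumes that each element of `results` exposes `get_best_result(metric, mode)`, as `chain_training` uses it, and that the returned object has a `config` attribute (used above) and a `metrics` dictionary (an assumption). `lookup_metric` and `summarize` are hypothetical helper names, and the stand-in objects at the end hold made-up values so the example runs without Ray.

```python
from types import SimpleNamespace
from typing import Any, Dict, List

METRIC = "env_runners/episode_reward_max"  # metric name used in chain_training


def lookup_metric(metrics: Dict[str, Any], path: str) -> Any:
    """Read a metric stored under a flat 'a/b' key or as nested dicts."""
    if path in metrics:
        return metrics[path]
    node: Any = metrics
    for key in path.split("/"):
        node = node[key]
    return node


def summarize(results: List[Any], metric: str = METRIC) -> List[Dict[str, Any]]:
    """Collect the best metric value and config for each training cycle."""
    summary = []
    for cycle, grid in enumerate(results, start=1):
        best = grid.get_best_result(metric, "max")
        summary.append(
            {"cycle": cycle, "best": lookup_metric(best.metrics, metric), "config": best.config}
        )
    return summary


# Stand-ins with made-up values (hypothetical, not real training output).
fake_best = SimpleNamespace(
    metrics={"env_runners": {"episode_reward_max": -12.5}},
    config={"lr": 0.001},
)
fake_grid = SimpleNamespace(get_best_result=lambda metric, mode: fake_best)
print(summarize([fake_grid]))
```

With real training output, the call would be `summarize(results)`.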