# Traffic Tune - Optimizing Traffic Signals with Reinforcement Learning

## Introduction

Welcome to the Traffic Tune POC notebook. Our project focuses on optimizing traffic signal control using reinforcement learning. Traffic congestion is a major problem in urban areas, leading to increased travel times, fuel consumption, and pollution. Traditional traffic signal control systems often struggle to adapt to dynamic traffic conditions, resulting in suboptimal traffic flow.

Traffic Tune is a recommendation system that leverages reinforcement learning to dynamically adjust traffic signals at intersections. By learning from traffic patterns in real-time, Traffic Tune aims to improve traffic flow, reduce congestion, and enhance overall transportation efficiency.

In this POC, we demonstrate how to train a reinforcement learning agent to optimize traffic signal control in a simulated environment. We use the SUMO (Simulation of Urban MObility) traffic simulation tool and the Ray RLlib library to train a Proximal Policy Optimization (PPO) agent that learns a traffic signal control policy.
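
To make the agent-environment loop concrete, here is a minimal sketch in plain Python. It is not the SUMO environment or the agent trained in this notebook: `ToyIntersection`, its arrival probabilities, and the queue-based reward are assumptions made purely for illustration. It compares a fixed signal with a simple rule that gives green to the longer queue, which is the kind of adaptive behavior a learned policy is meant to discover.

```python
import random


class ToyIntersection:
    """Hypothetical two-approach intersection; a stand-in, not the SUMO env."""

    def __init__(self, arrival_prob=(0.3, 0.6), seed=0):
        self.arrival_prob = arrival_prob
        self.rng = random.Random(seed)
        self.queues = [0, 0]

    def reset(self):
        self.queues = [0, 0]
        return tuple(self.queues)

    def step(self, action):
        # A vehicle arrives on each approach with a fixed probability.
        for i, p in enumerate(self.arrival_prob):
            if self.rng.random() < p:
                self.queues[i] += 1
        # The approach that gets the green light (the action) discharges one vehicle.
        if self.queues[action] > 0:
            self.queues[action] -= 1
        # Reward is the negative total queue length: shorter queues score higher.
        reward = -sum(self.queues)
        return tuple(self.queues), reward


def run_episode(env, policy, steps=200):
    """Run one episode and return the accumulated reward."""
    obs = env.reset()
    total = 0
    for _ in range(steps):
        obs, reward = env.step(policy(obs))
        total += reward
    return total


def fixed_policy(obs):
    # Always give green to approach 0, regardless of traffic.
    return 0


def longest_queue_policy(obs):
    # Give green to whichever approach currently has the longer queue.
    return 0 if obs[0] >= obs[1] else 1


fixed_return = run_episode(ToyIntersection(seed=0), fixed_policy)
adaptive_return = run_episode(ToyIntersection(seed=0), longest_queue_policy)
print("fixed:", fixed_return, "adaptive:", adaptive_return)
```

Both runs use the same seed, so they see identical arrivals and differ only in the signal policy.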


# Setup and Installations

In [1]:
import env_manager
import ppo_trainer

In [2]:
num_intersection_to_train = 4  # Index of the intersection to train on

experiment_type = "SingleAgent"  # Experiment type: "SingleAgent" or "MultiAgent"

## Environment Setup

In [3]:
manager = env_manager.EnvManager(f"{experiment_type}Environment", "env_config.json", json_id=f"intersection_{num_intersection_to_train}")
generator = manager.env_generator(f"Nets/intersection_{num_intersection_to_train}/route_xml_path_intersection_{num_intersection_to_train}.txt")
rou, csv = next(generator)
env_kwargs = manager.initialize_env(rou, csv)

print(f"\nEnv created for intersection_{num_intersection_to_train}",
      "\nNet path:", manager.kwargs["net_file"],
      "\nRoute path:", rou,
      "\nCsv path:", csv)


Env created for intersection_4 
Net path: Nets/intersection_4/intersection_4.net.xml 
Route path: Nets/intersection_4/routes_4/intersection_4_random_easy_1.rou.xml 
Csv path: Outputs/Training/intersection_4/experiments/intersection_4_random_easy_1_07.22-00:02:37
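
The generator above hands back one route file and one output path per call to `next`. The real `EnvManager.env_generator` is not shown in this notebook, so the following is only a sketch of that pattern, assuming the `.txt` file lists one route path per line. `route_csv_pairs` and its arguments are hypothetical names, and the second route in the list is a made-up example.

```python
from datetime import datetime
from pathlib import Path


def route_csv_pairs(route_paths, output_dir, now=None):
    """Lazily yield (route_file, csv_prefix) pairs, one per route file.

    Hypothetical stand-in for the notebook's env_generator.
    """
    for route in route_paths:
        route = route.strip()
        if not route:
            continue  # skip blank lines in the route list
        stamp = (now or datetime.now()).strftime("%m.%d-%H:%M:%S")
        # "some_name.rou.xml" -> "some_name"
        name = Path(route).name.removesuffix(".rou.xml")
        yield route, f"{output_dir}/{name}_{stamp}"


routes = [
    "Nets/intersection_4/routes_4/intersection_4_random_easy_1.rou.xml",
    "",
    "Nets/intersection_4/routes_4/example_second_route.rou.xml",  # made-up entry
]

pairs = route_csv_pairs(
    routes,
    "Outputs/Training/intersection_4/experiments",
    now=datetime(2024, 7, 22, 0, 2, 37),  # fixed timestamp for reproducibility
)
first_route, first_csv = next(pairs)
second_route, second_csv = next(pairs)
print(first_route)
print(first_csv)
```

Because the pairs are produced lazily, each later `next(generator)` call moves on to the next route file without re-reading the earlier ones.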


## Agent Setup

In [4]:
ppo_agent = ppo_trainer.PPOTrainer("ppo_config.json", manager, experiment_type=experiment_type)
ppo_agent.build_config()

2024-07-22 00:02:39,024	INFO worker.py:1771 -- Started a local Ray instance.


<ray.rllib.algorithms.ppo.ppo.PPOConfig at 0x11118f760>

## Agent Training

In [5]:
results = ppo_agent.train()



| Field | Value |
|---|---|
| Current time | 2024-07-22 00:04:53 |
| Running for | 00:02:14.15 |
| Memory | 11.4/16.0 GiB |

| Trial name | status | loc | iter | total time (s) | ts | num_healthy_workers | num_in_flight_async_sample_reqs | num_remote_worker_restarts |
|---|---|---|---|---|---|---|---|---|
| PPO_PPO_9076e_00000 | TERMINATED | 127.0.0.1:33236 | 6 | 125.588 | 4320 | 1 | 0 | 0 |


2024-07-22 00:04:53,731	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/eviat/Desktop/Final_Project/Traffic_Tune_Project/Outputs/Training/intersection_4/saved_agent/PPO_2024-07-22_00-02-39' in 0.0082s.
2024-07-22 00:05:03,483	INFO tune.py:1041 -- Total run time: 144.00 seconds (134.14 seconds for the tuning loop).


In [6]:
print(results.get_best_result())

Result(
  metrics={'evaluation': {'env_runners': {'episode_reward_max': -4.120000000000002, 'episode_reward_min': -4.120000000000002, 'episode_reward_mean': -4.120000000000002, 'episode_len_mean': 720.0, 'episode_media': {}, 'episodes_timesteps_total': 720, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [-4.120000000000002], 'episode_lengths': [720]}, 'sampler_perf': {'mean_raw_obs_processing_ms': 1.5980213402708496, 'mean_inference_ms': 0.3973026947456844, 'mean_action_processing_ms': 0.04702149549979571, 'mean_env_wait_ms': 13.809121538481767, 'mean_env_render_ms': 0.0}, 'num_faulty_episodes': 0, 'connector_metrics': {'ObsPreprocessorConnector_ms': 0.0021219253540039062, 'StateBufferConnector_ms': 0.0019311904907226562, 'ViewRequirementAgentConnector_ms': 0.043010711669921875}, 'num_episodes': 1, 'episode_return_max': -4.120000000000002, 'episode_return_min': -4.120000000000002, 'episode_return_mean':
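
The printed result is a deeply nested dictionary, which is hard to read in full. As a small sketch, a helper like the one below can pull out a single value by a slash-separated path. `get_metric` is a hypothetical helper, not part of the project or of Ray, and the dictionary is a hand-copied excerpt of the output above; in the notebook the full dictionary would presumably come from the `metrics` attribute of the result.

```python
def get_metric(metrics, path, sep="/"):
    """Walk a nested dict along a path such as 'a/b/c' and return the value."""
    node = metrics
    for key in path.split(sep):
        node = node[key]  # raises KeyError if any part of the path is missing
    return node


# Excerpt copied by hand from the printed result above.
metrics = {
    "evaluation": {
        "env_runners": {
            "episode_reward_mean": -4.120000000000002,
            "episode_len_mean": 720.0,
            "num_episodes": 1,
        }
    }
}

mean_reward = get_metric(metrics, "evaluation/env_runners/episode_reward_mean")
episode_len = get_metric(metrics, "evaluation/env_runners/episode_len_mean")
print(f"mean reward: {mean_reward:.2f} over {episode_len:.0f} steps")
```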

## Agent Prediction

In [None]:
ppo_agent.evaluate(results=results, kwargs=env_kwargs)

In [None]:
best = results.get_best_result("env_runners/episode_reward_max", "max")
print(best)
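
Conceptually, picking the best result by a metric and a mode amounts to taking the maximum or minimum over the trials. The sketch below shows that idea on plain dictionaries; it is not Ray's implementation. `pick_best` is a hypothetical helper, and `trial_b` with its value is made up for illustration (the run above produced a single trial).

```python
def pick_best(trials, metric, mode="max"):
    """Return the trial whose metric is largest (mode='max') or smallest (mode='min')."""
    if mode not in ("max", "min"):
        raise ValueError(f"mode must be 'max' or 'min', got {mode!r}")
    chooser = max if mode == "max" else min
    return chooser(trials, key=lambda trial: trial[metric])


trials = [
    # Value taken from the run above.
    {"name": "trial_a", "env_runners/episode_reward_max": -4.12},
    # Made-up second trial, only for illustration.
    {"name": "trial_b", "env_runners/episode_reward_max": -6.5},
]

best_trial = pick_best(trials, "env_runners/episode_reward_max", "max")
worst_trial = pick_best(trials, "env_runners/episode_reward_max", "min")
print(best_trial["name"], worst_trial["name"])
```

Since rewards here are negative, `"max"` selects the trial closest to zero.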

In [None]:
rou, csv = next(generator)
print(rou)
print(csv)
env_kwargs = manager.initialize_env(rou, csv)
print(env_kwargs)

In [None]:
ppo_agent.config = best.config
ppo_agent.build_config(env_kwargs)

In [None]:
result_2 = ppo_agent.train()