In Intelligent Transportation Systems (ITS), Deep Reinforcement Learning (DRL) is being rapidly adopted in many complex environments because it can leverage neural networks to learn good control strategies. In this work, a centralized Traffic Signal Control (TSC) controller with a deep RL agent (DRL-agent) is trained using a novel deep dueling on-policy learning method referred to as 2DSARSA.
In Traffic Signal Control, there have been a number of research efforts to apply ML techniques in general, and Reinforcement Learning (RL) in particular, to optimize TSC. However, there has been insufficient study comparing deep on-policy (DSARSA) and deep off-policy (DQN) methods in the context of TSC. This work takes a first step toward closing this gap by comparing the learning performance of these two fundamental approaches for TSC researchers. Our preliminary work has shown that DQN and 3DQN perform unstably in a complex environment when the state and action spaces are extremely large. To address this issue, this work makes three significant contributions on the important and challenging topic of applying RL to TSC: 1) a first comparison of two fundamental deep reinforcement learning approaches, namely on-policy learning and off-policy learning, 2) a novel way of representing the state of the environment using Traffic Flow Maps (TFMs), and 3) an intuitive yet novel reward function using the power metric that co-optimizes network throughput and end-to-end delay.
One contribution of this work relates to the design of the reinforcement learning algorithm. As mentioned above, deep RL methods that use neural networks can be broadly classified into off-policy and on-policy methods. We have proposed and designed a novel on-policy deep RL agent for a centralized controller of a network of signalized intersections that incorporates TFMs as the state description, a power-metric-based reward function, a Dueling Neural Network Architecture, and an Experience Replay Memory to improve traffic signal control. The results show that the RL agent better understands the environment, effectively learns from environmental feedback through the reward function, and converges faster than many existing algorithms. In addition, the RL agent outperforms traditional BP-based algorithms and well-known deep off-policy RL agents.
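To make the two ingredients concrete, the sketch below shows a dueling network head and a SARSA-style (on-policy) target in PyTorch. It is a minimal illustration, not the repository's exact architecture: the layer sizes, class name `DuelingNet`, and helper `sarsa_target` are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative dueling network: maps a TFM image state to Q-values over signal-phase actions.
class DuelingNet(nn.Module):
    def __init__(self, in_channels, n_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.value = nn.Sequential(nn.Linear(64 * 16, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(64 * 16, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, x):
        h = self.features(x)
        v, a = self.value(h), self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

# On-policy (SARSA) target: bootstraps from the action a' actually taken in the
# next state, rather than max_a' Q(s', a') as in off-policy Q-learning.
def sarsa_target(q_net, reward, next_state, next_action, gamma=0.99):
    with torch.no_grad():
        q_next = q_net(next_state).gather(1, next_action.unsqueeze(1)).squeeze(1)
    return reward + gamma * q_next
```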
One fundamental problem in applying RL to a network of traffic intersections is the state space explosion problem: the state space grows exponentially both with the fidelity of the state description and with the number of traffic intersections. Another important contribution of this paper is to address this problem by describing the state using TFMs. In a TFM, state variables that are real numbers (such as the waiting time of the Head-of-Line (HOL) vehicle) are mapped onto a color map, transforming the state into an image. This allows the state to be described with arbitrarily high fidelity and captures dynamic traffic flows for a network of multiple intersections. We propose TFMs that capture HOL sojourn times for traffic lanes and HOL differences for adjacent intersections.
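The following is a minimal sketch of the idea behind a TFM: normalizing real-valued per-lane quantities (here, HOL sojourn times) and placing them on a 2-D grid so the network state becomes an image-like array. The function name, grid layout, and the 120 s normalization cap are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def build_tfm(hol_times, grid_shape, lane_cells, max_time=120.0):
    """Build a Traffic Flow Map (illustrative sketch).
    hol_times:  dict mapping lane id -> HOL sojourn time in seconds.
    lane_cells: dict mapping lane id -> (row, col) cell of that lane in the grid.
    Returns a float array in [0, 1] of shape grid_shape."""
    tfm = np.zeros(grid_shape, dtype=np.float32)
    for lane, t in hol_times.items():
        r, c = lane_cells[lane]
        tfm[r, c] = min(t, max_time) / max_time  # clip and normalize to [0, 1]
    return tfm

# Example: two inbound lanes of one intersection placed on an 8x8 grid.
tfm = build_tfm({"N_in": 35.0, "E_in": 90.0}, (8, 8),
                {"N_in": (0, 3), "E_in": (3, 7)})
```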
The third critical aspect in any RL-based approach is defining an appropriate reward function. This influences not only the learning performance but also what the RL agent learns to optimize. We propose a novel reward function based on the power metric, defined as the ratio of the system throughput to the end-to-end delay. In computer networks, this is referred to as Kleinrock's optimal operating point and is the basis of recent congestion control algorithms developed for the Internet. Based on detailed simulation analysis, we show that the RL agent not only achieves good learning performance (faster convergence) but also achieves better network throughput and end-to-end delay.
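As a concrete illustration of the power metric, the sketch below computes a reward as throughput divided by average delay over one control interval. The function and variable names are illustrative, not the repository's exact implementation.

```python
def power_reward(vehicles_exited, interval_s, avg_delay_s, eps=1e-6):
    """Power metric reward (illustrative sketch): throughput / end-to-end delay."""
    throughput = vehicles_exited / interval_s   # vehicles per second
    return throughput / (avg_delay_s + eps)     # higher throughput and lower delay -> higher reward

# Example: 12 vehicles cleared in a 10 s interval with 45 s average delay.
r = power_reward(vehicles_exited=12, interval_s=10.0, avg_delay_s=45.0)
```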
The 2DSARSA agent performs remarkably well in a complex traffic network of multiple intersections. We have shown that the proposed 2DSARSA architecture has significantly better learning performance than other DRL architectures, including Deep Q-Network (DQN), 3DQN, and Deep SARSA (DSARSA).
- Set up your platform before running the code
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sha256sum Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create --name env_name
conda activate env_name
Add channel conda-forge
conda config --add channels conda-forge
Add channel pytorch
conda config --add channels pytorch
Install numpy
conda install numpy
Install pytorch with a specific version
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
Install matplotlib
conda install matplotlib
Install scipy
conda install scipy
Install python-utils
conda install python-utils
Install utils
pip install utils
Install gym
pip install gym
Install ipython
conda install ipython
Install opencv
conda install opencv
Install opencv-python
pip install opencv-python
You can clone this repository by:
git clone https://github.com/colouryen/2DSARSA.git
Switch to the code folder
cd 2DSARSA/code
Run the code
python RL_comp_multi.py
You can readily train a new model for a traffic network of 9 intersections by running the RL_comp_multi.py script from the agent's main directory.
You can save a trained model in the saved_agents folder after running the RL_comp_multi.py script from the main folder.
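For reference, a minimal sketch of saving and reloading a trained network with PyTorch is shown below; the actual save logic lives in RL_comp_multi.py, and the network and filename here are hypothetical stand-ins.

```python
import os
import torch
import torch.nn as nn

net = nn.Linear(4, 2)                                   # stand-in for the agent's network
os.makedirs("saved_agents", exist_ok=True)              # ensure the folder exists
torch.save(net.state_dict(), "saved_agents/agent.pt")   # hypothetical filename
net.load_state_dict(torch.load("saved_agents/agent.pt"))
```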
If you find this open-source release useful, please cite it in your paper:
@inproceedings{yen2020deep,
title={A Deep On-Policy Learning Agent for Traffic Signal Control of Multiple Intersections},
author={Yen, Chia-Cheng and Ghosal, Dipak and Zhang, Michael and Chuah, Chen-Nee},
booktitle={2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)},
pages={1--6},
year={2020},
organization={IEEE}
}