Self-Improving Reinforcement Learning

Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms

Paper | Code | ArXiv | Slide | Video

Citation

@INPROCEEDINGS{dagdanov2023self,
  author={Dagdanov, Resul and Durmus, Halil and Ure, Nazim Kemal},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)}, 
  title={Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms}, 
  year={2023},
  volume={},
  number={},
  pages={5631-5637},
  doi={10.1109/ICRA48891.2023.10160883}
}

Contents

Citation
Installation
Tests
- Test Training Example
Train Reinforcement Learning Agent
- Proximal Policy Optimization (PPO)
- Soft Actor-Critic (SAC)
Tune Reward Function
Evaluation
- Evaluate RL Agent
- Evaluate IDM Vehicle
Verification Algorithms
Self Improvement
- Train RL on Custom Verification Scenarios
Analyse Results
- Analyse & Visualize Validation Scenarios
Sbatch Slurm
- Slurm Training & Verification

Installation

Export Repository Path

save the path of this repository in .bashrc

gedit ~/.bashrc

paste the following line into the .bashrc file and save it

export BLACK_BOX="LocalPathOfThisRepository"

apply the saved changes

source ~/.bashrc
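
The exported variable can then be read from Python to build paths inside the repository. The following is a minimal sketch, not code from this repository: the helper name repo_path and its env argument are hypothetical, and it only assumes that BLACK_BOX holds the repository root as set above.

```python
import os
from pathlib import Path


def repo_path(*parts, env=None):
    """Build a path inside the repository from the BLACK_BOX variable.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    (Hypothetical helper, not part of the repository.)
    """
    env = os.environ if env is None else env
    root = env.get("BLACK_BOX")
    if not root:
        raise RuntimeError("BLACK_BOX is not set; export it in ~/.bashrc first")
    return Path(root).joinpath(*parts)
```

For example, repo_path("experiments", "configs") would point at the configs folder referenced throughout this README.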

Anaconda Environment Creation

create the environment with Python 3.7

conda create -n highway python=3.7.13

conda activate highway

install required packages

pip install -r requirements.txt

To use GPUs, please install CUDA by following the instructions at https://pytorch.org/get-started/locally/

The repository was trained and tested with the following versions:
- Python -> 3.7.13
- PyTorch -> 1.11.0
- Ray -> 2.0.0
- Gym -> 0.22.0
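
A quick way to compare an environment against these versions is sketched below. This is an illustrative helper, not part of the repository: the function names are hypothetical, the distribution names torch, ray and gym are the usual PyPI names, and on Python 3.7 the import falls back to the importlib-metadata backport, which is assumed to be installed.

```python
try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: assumes the importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions listed in this README, keyed by PyPI distribution name.
EXPECTED = {"torch": "1.11.0", "ray": "2.0.0", "gym": "0.22.0"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {name: found_version} for packages that differ from `expected`."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Drop local build tags such as "+cu113" before comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches
```

Calling find_mismatches(EXPECTED) inside the activated conda environment returns an empty dict when all three packages match.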

Environment Installation

prepare Ubuntu

sudo apt-get update -y

sudo apt-get install -y python-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev libsdl1.2-dev libsmpeg-dev \
    python-numpy subversion libportmidi-dev ffmpeg libswscale-dev libavformat-dev libavcodec-dev libfreetype6-dev gcc

accept the license for additional gym-like games; otherwise the highway-environment installation can fail

pip install autorom-accept-rom-license==0.4.2

install highway-environment

pip install highway-env==1.5

install custom highway environment globally

cd highway_environment

python setup.py install

NOTE: the environment installation has to be performed again after each update of the Environment class

register custom environment wrapper class

cd highway_environment/highway_environment

python create_env.py

Package Installation

install Ray + dependencies for Ray Tune

pip install -U "ray[tune]"==2.0.0

install Ray + dependencies for Ray RLlib

pip install -U "ray[rllib]"==2.0.0

Tests

Test Training Example

run default PPO training example with ray.tune

cd highway_environment/highway_environment

python test_train.py

Train RL Agent

NOTE:

parameters of a trained model are saved in the /experiments/results/trained_models folder
set the training-iteration parameter inside the /experiments/configs/train_config.yaml config to the number of iterations to train the model for
training parameters can be changed in /experiments/configs/ppo_train.yaml for PPO or /experiments/configs/sac_train.yaml for SAC

Proximal Policy Optimization

cd experiments/training

python ppo_train.py

Soft Actor-Critic

cd experiments/training

python sac_train.py

Tune Reward Function

NOTE:

the custom reward function for RL agent training is calculated in /highway_environment/highway_environment/envs/environment.py as compute_reward()
the energy weights of the function are computed by analysing real driving scenarios
a grid search algorithm is applied to find the weight multipliers of the function that maximize the reward obtained in real driving scenarios
tuning logs and results are saved in the /experiments/results/tuning_reward_function/ folder
the Eatron driving dataset is currently used for tuning
before tuning, please take a look at the /experiments/configs/reward_tuning.yaml configuration file

cd experiments/utils

python reward_tuning.py
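
The grid search over weight multipliers described in the note can be sketched as follows. This is a generic illustration under stated assumptions, not the logic of reward_tuning.py: the function name, the weight names speed and distance, and the toy scoring function (a stand-in for the reward obtained on real driving scenarios) are all hypothetical.

```python
from itertools import product


def grid_search_weights(weight_grid, score_fn):
    """Try every combination of multipliers and keep the highest-scoring one.

    weight_grid: dict mapping a weight name to the candidate multipliers.
    score_fn: callable taking a dict of weights and returning a float.
    """
    names = list(weight_grid)
    best_weights, best_score = None, float("-inf")
    for values in product(*(weight_grid[n] for n in names)):
        weights = dict(zip(names, values))
        score = score_fn(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Toy score that peaks at speed=1.0, distance=0.5 (made-up weight names).
def toy_score(w):
    return -((w["speed"] - 1.0) ** 2) - (w["distance"] - 0.5) ** 2


best, score = grid_search_weights(
    {"speed": [0.5, 1.0, 1.5], "distance": [0.0, 0.5, 1.0]}, toy_score
)
```

The cost of an exhaustive grid grows as the product of the number of candidates per weight, so the candidate lists are usually kept short.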

Evaluation

Evaluate RL Agent

NOTE:

parameters of a trained model should be moved from the ~/ray_results/ folder to the /experiments/results/trained_models/ folder
make sure the load-agent-name key inside the /experiments/configs/evaluation_config.yaml config names the model you intend to evaluate
also check the initial-space key in the same config, which defines the initial conditions of the front vehicle during evaluation

cd experiments/evaluation

python evaluate_model.py

Evaluate IDM Vehicle

NOTE:

the EGO vehicle can be set as an IDM vehicle
the control actions of the EGO vehicle are then taken by the IDM vehicle
set the controlled-vehicles key inside /experiments/configs/env_config.yaml to 0 (zero)

Verification Algorithms

Grid-Search Validation

apply grid-search verification to a trained RL model

NOTE:

check load-agent-name key inside /experiments/configs/grid_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder

cd experiments/algorithms

python grid_search.py

Monte-Carlo-Search Validation

apply monte-carlo-search verification to a trained RL model

NOTE:

check load-agent-name key inside /experiments/configs/monte_carlo_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder

cd experiments/algorithms

python monte_carlo_search.py
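
As a rough illustration of the idea, a Monte Carlo search draws random initial conditions, runs the black-box evaluation on each, and records the failures. The sketch below is a generic version under assumptions, not the code in monte_carlo_search.py: the parameter names gap and front_speed, the uniform sampling, and the toy failure rule are hypothetical.

```python
import random


def monte_carlo_search(bounds, evaluate, n_samples, seed=0):
    """Sample scenarios uniformly within bounds and collect the failures.

    bounds: dict of parameter name -> (low, high).
    evaluate: black-box callable returning True when a scenario fails
        (for example, when the episode ends in a collision).
    """
    rng = random.Random(seed)
    failures = []
    for _ in range(n_samples):
        scenario = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        if evaluate(scenario):
            failures.append(scenario)
    return failures, len(failures) / n_samples


# Toy stand-in for rolling out the trained agent in the simulator.
def toy_fails(s):
    return s["gap"] < 10.0 and s["front_speed"] < 15.0


failures, failure_rate = monte_carlo_search(
    {"gap": (5.0, 50.0), "front_speed": (10.0, 30.0)}, toy_fails, n_samples=1000
)
```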

Cross-Entropy-Search Validation

apply cross-entropy-search verification to a trained RL model

NOTE:

check load-agent-name key inside /experiments/configs/ce_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder
make sure the number-of-samples key inside the /experiments/configs/ce_search.yaml config is defined as the product of the number of iterations and the sample size per iteration; at each iteration, the best 10 percent of the batch is selected to determine the minimum and maximum limits of the next iteration

cd experiments/algorithms

python ce_search.py
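
The iteration scheme described in the note can be sketched as below. This is a simplified illustration, not the implementation in ce_search.py: the function name, the uniform sampling between the limits, the parameter name gap and the toy cost are assumptions. It treats "best" as "highest cost", keeps the top 10 percent of each batch, and uses their extremes as the next limits.

```python
import random


def ce_search(bounds, cost, n_iterations, sample_size, elite_frac=0.1, seed=0):
    """Shrink uniform sampling limits around the highest-cost samples."""
    rng = random.Random(seed)
    limits = dict(bounds)
    history = []
    n_elite = max(1, int(sample_size * elite_frac))
    for _ in range(n_iterations):
        batch = [
            {k: rng.uniform(lo, hi) for k, (lo, hi) in limits.items()}
            for _ in range(sample_size)
        ]
        batch.sort(key=cost, reverse=True)  # highest cost first
        elite = batch[:n_elite]
        history.extend(batch)
        # The extremes of the elite samples become the next iteration's limits.
        limits = {
            k: (min(s[k] for s in elite), max(s[k] for s in elite))
            for k in limits
        }
    return limits, history


# Toy cost that is largest near gap = 8 (hypothetical parameter name).
final_limits, history = ce_search(
    {"gap": (0.0, 50.0)},
    cost=lambda s: -((s["gap"] - 8.0) ** 2),
    n_iterations=5,
    sample_size=50,
)
```

Here the total number of evaluated samples is 5 * 50 = 250, which mirrors how number-of-samples relates to the iteration count and the per-iteration sample size.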

Bayesian-Optimization-Search Validation

apply bayesian-optimization-search verification to a trained RL model

install package

pip install bayesian-optimization==1.4.0

NOTE:

check load-agent-name key inside /experiments/configs/bayesian_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder

cd experiments/algorithms

python bayesian_search.py

Adaptive-Multilevel-Splitting-Search Validation

apply adaptive-multilevel-splitting-search verification to a trained RL model

NOTE:

check load-agent-name key inside /experiments/configs/ams_search.yaml config and make sure that the model is located in /experiments/results/trained_models/ folder

cd experiments/algorithms

python ams_search.py

Self-Improvement

Train RL on Custom Verification Scenarios

after applying a verification algorithm, the RL agent can be trained again on the validation results

NOTE:

use the /experiments/configs/self_improvement.yaml config
train the model with the /experiments/training/self_improvement.py script
a trained model can be loaded and re-trained from the latest checkpoint with the is-restore key inside the config
the custom scenario setter class is located at /experiments/utils/scenarios.py
a new scenario loader can be added and referenced with the validation-type key in the self_improvement.yaml config

cd experiments/training

python self_improvement.py

to include specific verification results in the sampling container, read the following note

NOTE:

change validation-type key inside /experiments/configs/self_improvement.yaml config to "complex"
take a look at scenario-mixer key parameters and specify which validation results to include
each validation result comes with a sampling probability, and these probabilities should sum to 1.0
folder names in the scenario-mixer key should be null if not specified, and the probabilities of the specified folders should sum to 1.0 (100 percent)
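
The sampling rule in this note can be illustrated with a short sketch. It is not the repository's scenario loader: the function name, the list-of-pairs layout of the mixer, and the folder names are hypothetical. It only shows null entries being skipped, the probabilities being checked against 1.0, and a folder being drawn according to its probability.

```python
import random


def build_sampler(scenario_mixer, seed=0):
    """Return a function that picks a validation folder by its probability.

    scenario_mixer: list of (folder_name, probability) pairs; entries whose
        folder_name is None (YAML null) are skipped.
    """
    active = [(name, p) for name, p in scenario_mixer if name is not None]
    total = sum(p for _, p in active)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1.0")
    names = [name for name, _ in active]
    weights = [p for _, p in active]
    rng = random.Random(seed)
    return lambda: rng.choices(names, weights=weights, k=1)[0]


# Hypothetical folder names; the null entry is ignored.
mixer = [("grid_search_run", 0.5), ("ce_search_run", 0.3), (None, 0.0), ("bayesian_run", 0.2)]
sample_folder = build_sampler(mixer)
```

Each call to sample_folder() returns one folder name, so folders with a higher probability contribute more scenarios to training.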

Analyse Results

Analyse & Visualize Validation Scenarios

after training and running a verification algorithm, visualize validation and failure scenarios

cd experiments/analyses

python3 -m notebook

Sbatch Slurm

Slurm Training & Verification

submit a batch script to slurm for training an RL model

cd experiments/training

conda activate highway

# check the resource allocations before submitting a slurm batch
sbatch slurm_train.sh

submit a batch script to slurm for applying selected verification algorithm

cd experiments/algorithms

conda activate highway

# check the selected algorithm and resource allocations before submitting a slurm batch
sbatch slurm_verification.sh

basic slurm commands

# submit a batch script to Slurm for processing
sbatch <script-name>.sh

# show information about your job(s) in the queue
squeue

# show information about current and previous jobs
sacct

# end or cancel a queued job
scancel <job-id>

# follow the last lines of the terminal logs (.err or .out)
tail -f <job-id>.out

