Safety is a major issue of reinforcement learning (RL) in complex real-world scenarios. In recent years, formal verification has increasingly been used to provide rigorous safety guarantees for RL. However, until now, major technical obstacles for combining probabilistic model checking with, in particular deep RL, remain. Our easy-to-use tool-chain COOL-MC unifies the powerful toolset of model checking, interpretable machine learning, and deep RL. At the heart is a tight integration of learning and verification that involves, amongst others, (1) the incremental building of state spaces, (2) mapping of policies obtained from state-of-the-art deep RL to formal models, and (3) the use of features from interpretable and explainable machine learning such as decision trees and attention maps. We evaluate our tool-chain on multiple commonly use.
- Getting Started with COOL-MC
- Example 1 (Frozen Lake)
- Example 2 (Taxi)
- General Pipeline
- RL Agent Training
- RL Model Checking
- RL Policy Interpretation
- RL Policy Decision Explaination
- Storm Model Checking
- Command Line Arguments
- Benchmarking
- Software Architecture
- Manual Installation
We assume that you have docker installed and that you run the following commands in the root of this repository:
- Download the docker container here.
- Load the docker container:
docker load --input coolmc.tar
- Create a project folder:
mkdir projects
- Run the docker container:
docker run --user "$(id -u):$(id -g)" -v "$(pwd)/projects":/projects -v "$(pwd)/prism_files":/prism_files -it coolmc bash
We discuss how to create the docker container yourself, and how to install the tool natively later.
To demonstrate our tool, we are going to train a near optimal RL policy for the commonly known frozen lake environment. The goal is to get familar with our tool and being able to use all our supported features.
- Start the interactive container and mount the PRISM and project folder: Run
docker run --user "$(id -u):$(id -g)" -v "$(pwd)/projects":/projects -v "$(pwd)/prism_files":/prism_files -it coolmc bash
- Execute
bash example_1.sh
.
python cool_mc.py --task training --architecture dqn --max_steps 20 --prism_file_path frozen_lake_4x4.prism --constant_definitions "slippery=0.04" --prop 'Pmax=? [F "water"]' --prop_type "min_prop" --project_name exp01_FL_4x4
This first command trains an RL agent to get a near-optimal policy to reach the frisbee without falling into the water. The task
parameter sets the COOL-MC into training mode (reinforcement learning) and the architecture
the RL algorithm (in this example: deep q-learning). The max_steps
parameter is needed to terminate the environment after a given number of steps (20). With the prism_file_path
parameter, we inform COOL-MC on which environment we want to train our agent. The constant_definitions
defines the constants of the environment. The prop
parameter is the property specification, which we want to query while training. prop_type
that we want to optimize for minimizing the property result. The project_name
parameter names the project. num_episodes
defines the number of epochs and eval_interval
the evaluation and property querying interval.
COOL-MC gives us the possibility to monitor the training progress via Tensorboard (if installed):
tensorboard --logdir projects # Execute this command on your local machine
.
After 10000 epochs, we gain an RL policy with a roughly 5% probability of falling into the water. Since the frozen lake environment is quite small, we can check the optimal probability of falling into the water by modifying task training
to task storm_model_checking
and removing unnecessary parameters. This allows us to use Storm directly:
python cool_mc.py --task storm_model_checking --prism_file_path frozen_lake_4x4.prism --constant_definitions "slippery=0.04" --prop 'Pmax=? [F "water"] --project_name exp01_FL_4x4
By changing task storm_model_checking
to task rl_model_checking
we are able to do model checking with the trained RL policy:
python cool_mc.py --task rl_model_checking --prism_file_path frozen_lake_4x4.prism --constant_definitions "slippery=0.04" --prop 'Pmax=? [F "water"] --project_name exp01_FL_4x4
By changing task rl_model_checking
to task decision_tree
we are able to interprete the trained policy via a decision tree:
python cool_mc.py --task decision_tree --prism_file_path frozen_lake_4x4.prism --project_name example1 --constant_definitions "slippery=0.04" --architecture dqn --prop 'Pmax=? [F "water"]'
We can find the decision tree plot inside the project folder.
Decision Tree Policy Interpretation of RL policy of example 1.
After training the decision tree, we can use task dt_model_checking
to model check the decision tree policy:
python cool_mc.py --task dt_model_checking --prism_file_path frozen_lake_4x4.prism --project_name example1 --constant_definitions "slippery=0.04"
By changing task dt_model_checking
to task attention_training,
we can train an explainable model for our trained RL policy. Now it is possible to see which features influence the RL policy decision for a certain state:
python cool_mc.py --task attention_training --project_name example1 --constant_definitions "slippery=0.1"
python cool_mc.py --task attention_mapping --project_name example1 --constant_definitions "slippery=0.1" attention_input "x=0,y=2"
You can find the attention mapping plot in the project folder:
If we interested in how our RL agent performs over a range of different environment initializations, we can use the following command:
python3.8 cool_mc.py --task rl_model_checking --project_name example1 --constant_definitions "slippery=[0.1;0.1;1]" --prop 'Pmin=? [F "water"]'
A plot that visualizes how different constant definitions influence the safety property of the trained RL policy.
Besides using Tensorboard, it is also possible to directly plot the rewards and property results while training and save them into the project folder:
python cool_mc.py --task plot_rewards --project_name example1
python cool_mc.py --task plot_props --project_name example1
To demonstrate our tool, we are going to train a near optimal RL policy for the commonly known taxi environment.
- Start the interactive container and mount the PRISM and project folder: Run
docker run --user "$(id -u):$(id -g)" -v "$(pwd)/projects":/projects -v "$(pwd)/prism_files":/prism_files -it coolmc bash
- Execute
bash example_2.sh
.
bash example_2.sh
executes five commands.
The first command trains a DQN policy on the taxi environment and queries every 500 epochs the probability of finishing two jobs.
The second command uses Storm to get the optimal probability of finishing two jobs.
The third command uses the trained DQN policy to get the probability of finishing two jobs.
The fourth command generates an interpretable decision tree policy of the trained DQN policy.
The fifth command uses the interpretable decision tree policy to get the probability of finishing two jobs.
The sixth command trains an attention autoencoder for the explainability of the DQN policy.
With command 7, it is possible to see which state variables influence the DQN decision for the given input (explainable reinforcement learning).
We first have to model our RL environment. COOL-MC supports PRISM as modeling language. It can be difficult to design own PRISM environments. Here are some tips how to make sure that your PRISM environment works correctly with COOL-MC:
- Make sure that you only use transition-rewards
- After the agent reaches a terminal state, the storm simulator stops the simulation. Therefore, terminal state transitions will not executed. So, do not use self-looping terminal states.
- To improve the training performance, try to make all actions at every state available. Otherwise, the agent may chooses a not available action and receives a penalty.
- Try to unit test your PRISM environment before RL training. Does it behave as you want?
After we modeled the environment, we can train RL agents on this environment. It is also possible to develop your own RL agents:
- Create a AGENT_NAME.py in the src.rl_agents package
- Create a class AGENT_NAME and inherit all methods from src.agent.Agent
- Set use_tf_environment to true if you use tf_environments instead of py_environments from tf_agents
- Override all the needed methods (depends on your agent) + the agent save- and load-method.
- In src.rl_agents.agent_builder extends the build_agent method with an additional elif branch for your agent
- Add additional command-line arguments in cool_mc.py (if needed)
Here are some tips that may improve the training progress:
- Try to use the disable_state parameter to disable state variables from PRISM which are only relevant for the PRISM environment architecture.
- Play around with the RL parameters.
- Think especially about the size of the max_steps if you have only one terminal state and if this terminal state is a negative outcome with a huge penalty. Too large max_steps values lead to always reaching this negative step in the beginning of the training. Instead, decrease the max_steps so that the RL agent may not reach this terminal state and stops training earlier. So the huge penality is not influencing the RL training too much in the beginning.
- The model checking part while RL training can take time. Therefore, the best way to train and verify your model is to first use reward_max. After the RL model may reaches an execptable reward the change the parameter prop_type to min_prop or max_prop and adjust the evaluation intervals.
We can use the trained RL policy to create a induce Discrete-Time Markov Chain (DTMC). This allows us to verify the RL policy in a much smaller DTMC as with Storm.
Another toolset that helps us to make machine learning modelssafer is called interpretable machine learning. It refers to methods and modelsthat make the behavior and predictions of machine learning systems understand-able. We make our trained RL policy interpretable by training a decision treeas surrogate model.
Humans want to understand adecision or at least they want to get an explanation for certain decisions. There-fore, COOL-MC also supports attention mapping which is a explainable machinelearning method. For that, we first train an auto encoder with attention lay-ers as a surrogate model of the trained RL policy
If possible,we also give the user the possibility to calculate the optimal policy of the PRISMenvironment via Storm.
The following list contains all the major COOL-MC command line arguments. It does not contain the arguments which are related to the RL algorithms. For a detailed description, we refer to the src.rl_agents package.
task
specifies the task:
- Reinforcement Learning (training)
- RL Model Checking (rl_model_checking)
- Storm Model Checking (storm_model_checking)
- Attention Training (attention_training)
- Attention Map Plotting (attention_mapping)
- decision tree training (decision_tree)
- Plot Rewards of the training progress (plot_rewards)
- Plot property results of the training progress (plot_props)
Specifies the folder with all PRISM files.
Specifies the path of the PRISM file inside the prism file folder.
Specifies the folder with the COOL-MC projects.
Specifies the current COOL-MC project name and saves it inside the project_dir
folder.
Constant definitions for the environments. Examples:
- x=0.2
- x=0.3,y=1
- x=4,y=3,z=3
- x=[0,3,30],y=3 with a range of x-values from 0 until 27 with step size of 3.
Specifies the maximal number of allowed steps inside the environment.
If true, it disables the property querying.
If true, the agent receives rewards instead of penalties.
Penaly of choosing wrong actions.
Specifies the agent algorithm (investigate src.rl_agents.agent_builder for supported agents).
Specifies the number of learning episodes.
Specifies the number of supervised learning episodes (necessary for interpretable/explainable machine learning).
Specifies the evaluation interval of the RL policy.
Specifies the property specification.
Optimizes the RL policy for Maximal Reward (max_reward), Minimal Reward (min_reward), Property minimization (min_prop) or maximization (max_prop).
Features which should not be used by the RL agent: FEATURE1,FEATURE2,...
Input for attention map plotting.
Disable GPU.
We tested our tool on a variety of benchmarks to compare PRISM modeling styles, COOL-MC features, and scalability. The results of these experiments are described in our paper. To reproduce our results, run the following shell scripts:
- Run
bash experiments_frozen_lake.sh
to reproduce the comparison of multiple RL algorithms in the frozen lake environment. - Run
bash experiments_taxi.sh
to reproduce the taxi environment experiments. - Run
bash experiments_smart_grid.sh
to reproduce the smart grid environment experiments. - Run
bash experiments_qcomp.sh
to verify that the QComp benchmarks with reward functions are working. - Run
bash experiments_modified_qcomp.sh
to verify that the QComp benchmarks with dummy reward functions are working.
Note: Experiments which RAN OUT OF MEMORY or TIME OUTs are commented out.
You can build the container via docker build -t coolmc .
It is also possible for UNIX users to run the bash script in the bin-folder.
Switch to the repository folder and define environment variable COOL_MC="$PWD"
sudo apt-get update && sudo apt-get -y install build-essential git cmake libboost-all-dev libcln-dev libgmp-dev libginac-dev automake libglpk-dev libhwloc-dev libz3-dev libxerces-c-dev libeigen3-dev python3 python-is-python3 python3-setuptools python3-pip graphviz && sudo apt-get install -y --no-install-recommends maven uuid-dev virtualenv
cd $COOL_MC
git clone https://github.com/moves-rwth/storm.git
cd storm
mkdir build
cd build
cmake ..
make -j 1
For more information about building Storm, click here.
For testing the installation, follow the follow steps here.
cd $COOL_MC
git clone https://github.com/moves-rwth/pycarl.git
cd pycarl
python setup.py build_ext --jobs 1 develop
If permission problems: sudo chmod 777 /usr/local/lib/python3.8/dist-packages/
and run third command again.
cd $COOL_MC
git clone https://github.com/moves-rwth/stormpy.git
cd stormpy
python setup.py build_ext --storm-dir "${COOL_MC}/build/" --jobs 1 develop
For more information about the Stormpy installation, click here.
For testing the installation, follow the steps here.
cd $COOL_MC
pip install -r requirements.txt
mkdir projects