This is the repository for the deep learning course project: Correct-by-synthesis reinforcement learning with temporal logic constraints [1].
More information regarding the course can be found here.
The main aim of this project is to implement and evaluate the examples discussed by the authors in the paper.
Table of Contents
Follow these three steps to successfully install and run the project.
Install Anaconda on your OS and then run the shell commands below (with the `conda` command on your path):
- Install dependencies
- Install conda env
Use the terminal or an Anaconda Prompt for the following steps:
- Create the environment from the environment file (`CoRL_env.yml`) under `conda/`:
  `conda env create -f CoRL_env.yml`
- The first line of the yml file sets the new environment's name. In my case it is `dl_project_env`. You can modify it before creating the environment.
- Activate the new environment (replace `myenv` with `dl_project_env`):
  `conda activate myenv`
- Verify that the new environment was installed correctly. You should see a star (`*`) in front of the currently active environment:
  `conda env list`
- You can also use `conda info --envs`
- Make a `frames` folder within the `src` folder. This folder is used to save the frames used to create a gif in the code.
- Finally, you need to create the `slugs` executable. `cd` into the `src` folder inside `slugs` and run `make`. Run `ls` to ensure that the executable exists. (A consolidated command sequence is sketched below.)
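Assuming the layout shown in the directory tree further down (the slugs sources under `lib/slugs/`), the setup steps above might look like the following; the exact paths are an assumption, so adjust them to your checkout:

```bash
# Paths are assumptions based on the directory tree below; adjust if your checkout differs.
mkdir -p src/frames        # frames folder used to build the gif
cd lib/slugs/src
make                       # build the slugs executable
ls                         # confirm the executable exists
cd ../../..                # return to the repository root
```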
`cd` into the `src/` folder and run
`python3 main.py <save_flag> <grid_size>`
For convenience, I would recommend using `save_flag = True` and a `grid_size` (N) between 4 and 7.
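For example, a run that saves frames on a 5 x 5 grid might be invoked as follows (the exact flag spelling accepted by main.py is an assumption here):

```bash
python3 main.py True 5
```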
├── conda                        # contains the environment file necessary to reproduce the python env
│   └── CoRL_env.yml
├── lib                          # contains the source code for the slugs tool
│   ├── README.md
│   └── slugs
├── README.md
└── src                          # contains all the code relevant to the project
    ├── figures_and_gifs         # directory containing all the saved gifs for Example 1 and Example 2 (refer to the project report)
    ├── frames                   # required to store frames; create one (using mkdir) before running the main.py module
    ├── learning_reward_function # source code containing the various players, the learning algorithm, and the RL env
    ├── main.py                  # main file
    ├── results                  # directory to dump the run time stats
    ├── saved_players            # binary files of the learned player
    └── two_player_game          # source code relevant to construction of the two-player game
        └── plots                # directory containing all the high-resolution plots of policy, Q plot, valid transitions, and max V change while learning
Autonomous systems are gradually becoming ubiquitous. Beyond simply demonstrating these systems, it is becoming increasingly important to provide guarantees that they behave safely and reliably. We can leverage methods developed by the formal methods community to synthesize controllers/strategies that achieve a given task with task-critical safety and performance guarantees. Traditional open-loop synthesis may work well in static environments, but it may fail to find strategies that guarantee task completion under uncertain or adversarial behavior of the environment.

Reactive synthesis is the field that deals with systems that continuously interact with the environment they operate in. These interactions involve constraints such as real-time requirements, concurrency, and parallelism that make them difficult to model correctly. We can model the interaction between the system (the robot in our case) and the environment as a two-player game and synthesize a winning strategy that satisfies a specification formulated in a fragment of temporal logic. Thus we can synthesize controllers that guarantee completion of the task using formal methods.

We can then employ reinforcement learning techniques to learn to achieve the given task optimally by learning the underlying unknown reward function. Thus we establish both correctness (with respect to the temporal logic specification) and optimality (with respect to the a priori unknown performance criterion) for the task in a stochastic environment, for a fragment of temporal logic specifications. Hence, we can guarantee both qualitative (encoded as the winning condition) and quantitative (optimal reactive controllers) performance for a system operating in an unknown environment.
This project can be decoupled into two major sub-problems:
- Compute a set of permissive (winning) strategies that are realizable for a given game
- Choose a strategy that maximizes the underlying unknown reward function using the maximin-Q learning algorithm (a minimal sketch of the update is shown below)
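For intuition, here is a minimal, self-contained sketch of a tabular maximin-Q update, in which the system maximizes over its actions while assuming the environment responds adversarially. The tabular representation, function names, and hyperparameters below are illustrative assumptions; the project's actual implementation lives under `src/learning_reward_function/` and additionally restricts the system to actions allowed by the permissive strategy.

```python
import numpy as np

def maximin_value(Q, state):
    """Worst-case state value: the system maximizes, the environment minimizes."""
    return np.max(np.min(Q[state], axis=1))

def maximin_q_update(Q, state, sys_action, env_action, reward, next_state,
                     alpha=0.1, gamma=0.95):
    """One temporal-difference update of the maximin-Q estimate.

    Q is assumed to be a tabular array of shape
    (num_states, num_system_actions, num_env_actions).
    """
    target = reward + gamma * maximin_value(Q, next_state)
    Q[state, sys_action, env_action] += alpha * (target - Q[state, sys_action, env_action])
    return Q

# Toy usage: 10 states, 4 system actions, 4 environment actions.
Q = np.zeros((10, 4, 4))
Q = maximin_q_update(Q, state=0, sys_action=2, env_action=1, reward=1.0, next_state=3)
```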
Task: The system robot (green) should always be in a cell diagonally opposite to the env robot.
Please refer to the project_report directory for the presentation and the report for this project. If you have any questions, please contact me at karan.muvvala@colorado.edu.
[1]: Min Wen, Rüdiger Ehlers, and Ufuk Topcu. “Correct-by-synthesis reinforcement learning with temporal logic constraints”. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2015, pp. 4983–4990.