
Instructing Goal-Conditioned Agents with LTL Objectives (NeurIPS 2023)

Goal-conditioned reinforcement learning (RL) is a powerful approach for learning general-purpose skills by reaching diverse goals. However, it has limitations when it comes to task-conditioned policies, where goals are specified by temporally extended instructions written in the Linear Temporal Logic (LTL) formal language. Existing approaches for finding LTL-satisfying policies rely on sampling a large set of LTL instructions during training to adapt to unseen tasks at inference time. However, these approaches do not guarantee generalization to out-of-distribution LTL objectives, which may have increased complexity. In this work, we developed a novel neurosymbolic approach to address this challenge. We showed that simple goal-conditioned RL agents can be instructed to follow arbitrary LTL specifications without additional training over the LTL task space.

ZoneEnv

We use a robot from Safety Gym called Point, with one actuator for turning and another for moving forward or backward. The agent observes LiDAR information about its surrounding zones. Given this indirect geographical information, it has to visit and/or avoid certain zones to satisfy sampled LTL task specifications. The initial positions of the zones and the robot are randomized in every episode.
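
Since the environment is built on Safety Gym (and gym is a listed dependency), a random-action rollout would look roughly like the sketch below; the environment id is a hypothetical placeholder, and the actual environment construction lives under zones/envs.

    # Minimal rollout sketch (hypothetical env id; see zones/envs for the real construction).
    import gym

    env = gym.make('PointZones-v0')          # hypothetical id
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()   # [turn, move forward/backward] for the Point robot
        obs, reward, done, info = env.step(action)
    env.close()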

Setup

  • Use conda to install pygraphviz:
    conda install -c conda-forge pygraphviz
  • Install MuJoCo and mujoco-py.
  • Install safety-gym:
    pip install -e zones/envs/safety/safety-gym/
  • Install the required pip packages:
    numpy
    torch
    stable-baselines3
    graphviz
    gym
    mujoco-py
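
After installation, a quick import check (a sketch, not part of the repository) confirms the dependencies above are importable:

    # Sanity-check that the packages listed above import correctly.
    import graphviz
    import gym
    import mujoco_py
    import numpy
    import stable_baselines3
    import torch

    print('numpy', numpy.__version__)
    print('torch', torch.__version__)
    print('gym', gym.__version__)
    print('stable-baselines3', stable_baselines3.__version__)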
    

Training (optional)

  • Train primitive action policies (UP, DOWN, LEFT, and RIGHT) for ZoneEnv:
    python train_primitives.py
  • Train a goal-conditioned policy for ZoneEnv and collect a trajectory dataset (see the sketch after this list):
    python train_agent.py
  • Train the goal-value function for ZoneEnv:
    python train_gcvf.py
  • Alternatively, train a goal-value function without training a new policy:
    python collect_traj.py
    python train_gcvf.py
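
For orientation only, the sketch below shows what PPO training with stable-baselines3 looks like in general; it is not the repository's train_agent.py, and the environment id is a hypothetical placeholder.

    # Sketch only: the actual pipeline is zones/train_agent.py.
    import gym
    from stable_baselines3 import PPO

    env = gym.make('PointZones-v0')               # hypothetical id
    model = PPO('MlpPolicy', env, verbose=1)      # goal information arrives through the observation
    model.learn(total_timesteps=1_000_000)        # illustrative budget
    model.save('zones/models/goal-conditioned/best_model_ppo_8')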
    

Models

  • Primitive action policies for navigating the Point robot are saved in:
    [project_base]/zones/models/primitives/*.zip
    
  • Trained goal-conditioned policies are saved in:
    [project_base]/zones/models/goal-conditioned/best_model_ppo_[N].zip
    
    where N denotes the number of zones present in the environment (8 by default).
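
A saved policy can be inspected or reused with stable-baselines3; the snippet below is a sketch (PPO is assumed from the file naming):

    # Load a trained goal-conditioned policy from the path above.
    from stable_baselines3 import PPO

    model = PPO.load('zones/models/goal-conditioned/best_model_ppo_8.zip')
    print(model.policy)   # inspect the policy network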

Experiments

  • Avoidance experiments, e.g., $\neg y U (j \wedge (\neg w U r))$ (where $y$ denotes yellow, $j$ jet-black, $w$ white, and $r$ red):
    python exp.py --task='avoid'
  • Loop experiments, e.g., $GF(r \wedge XF y) \wedge G(\neg w)$:
    python exp.py --task='traverse'
  • Goal-chaining experiments, e.g., $F(j \wedge F(w \wedge F(r \wedge Fy)))$:
    python exp.py --task='chain'
  • Stability experiments, e.g., $FGy$:
    python exp.py --task='stable'

  See the script [project_base]/zones/exp.py for more details, including how to set eval_repeats, device, etc.
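
All four task families can be run back to back with a small driver, for example (a sketch; run from [project_base]/zones):

    # Run every experiment family listed above, one after another.
    import subprocess

    for task in ('avoid', 'traverse', 'chain', 'stable'):
        subprocess.run(['python', 'exp.py', f'--task={task}'], check=True)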

Examples

  • The left and right figures show the trajectory for the task $\neg y U (j \wedge (\neg w U r))$.



  • The left and right figures show the trajectory for the task $F(j \wedge X(\neg y U r)) \wedge G(\neg w)$.



  • The left and right figures show the trajectories for the task $F(j \wedge F(w \wedge F(r \wedge Fy)))$.



  • The left and right figures show the trajectories for the task $GF(r \wedge XF y) \wedge G(\neg w)$.



  • The left and right figures show the trajectories for the task $FGy$.



  • The left figure shows the trajectory for the task $F(j \wedge r)$.
  • The right figure shows the trajectory for the task $F(j \wedge \neg r)$.



  • The left figure shows the trajectory for the task $GFw \wedge GFy$.
  • The right figure shows the trajectory for the task $GFw \wedge GFy \wedge G(\neg j)$.



  • The figure shows the trajectory for the task $Fj \wedge (\neg r \wedge \neg y \wedge \neg w) U j$.




Ant 16 rooms

Ant-16rooms is an environment with continuous observation and action spaces. In this walled environment with 16 rooms, each room has the same size of 8 × 8, and the rooms are separated by walls and corridors of thickness 1. There are two obstacles, denoted by black squares, in the environment. We place a MuJoCo Ant robot in this environment for navigation.

Setup

The environment for the Ant-16rooms experiments is based on the following versions of the packages:

	numpy=1.18.5
	torch=1.5.1
	gym=0.13.1
	mujoco_py=2.0.2.5

along with MuJoCo simulator version mujoco200 from the MuJoCo release website.

  1. Docker: The environment is based on the environment used in GCSL. To download the Docker image:

    docker pull dibyaghosh/gcsl:0.1

  2. Conda: For setting up the conda environment, refer to conda_environment.yml for the specific versions of all packages.

  3. Python (pip): For the Python pip packages, refer to python_requirement.txt for the specific versions of all packages. Also, install the packages in the "dependencies" folder with "pip install -e .".
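
A quick way to confirm the pinned versions above (a sketch, not part of the repository):

    # Print installed versions next to the pins used for Ant-16rooms.
    import gym
    import mujoco_py
    import numpy
    import torch

    expected = {'numpy': '1.18.5', 'torch': '1.5.1', 'gym': '0.13.1', 'mujoco_py': '2.0.2.5'}
    for module in (numpy, torch, gym, mujoco_py):
        print(f'{module.__name__}: installed {module.__version__}, expected {expected[module.__name__]}')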

Run

Add the workspace directory to PYTHONPATH:

export PYTHONPATH="${PYTHONPATH}:{path_of_GCRL-LTL_ant_folder}"
Testing with LTL specifications
python experiments/TestLTLspecs_Buchi.py ant16rooms {#ofspecification}

Specifications $\phi_1$ to $\phi_5$ correspond to inputs 9 to 13, respectively.
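
To evaluate all five specifications in one go, a small loop over the spec indices works (a sketch; PYTHONPATH must be set as above):

    # Evaluate phi_1..phi_5 (spec indices 9..13) back to back.
    import subprocess

    for spec in range(9, 14):
        subprocess.run(
            ['python', 'experiments/TestLTLspecs_Buchi.py', 'ant16rooms', str(spec)],
            check=True,
        )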

Results

Specification $\phi_1$

$F((0, 2) \vee (2, 0))$ - either reaching room (2,0) [orange position on the left] or room (0,2) [orange position on the right]

[figures: room layout and trajectory for $\phi_1$]

Specification $\phi_2$

$F(((0, 2) \vee (2, 0)) \wedge F(2, 2))$ - reaching room (2,2) [orange] by choosing either of the two orange paths

[figures: room layout and trajectory for $\phi_2$]

Specification $\phi_3$

$F(((0, 2) \vee (2, 0)) \wedge F((2, 2) \wedge F(((2, 1) \vee (3, 2)) \wedge F(3, 1))))$ - reaching room (3,1) [yellow] after visiting orange.

[figures: room layout and trajectory for $\phi_3$]

Specification $\phi_4$

$F(((0, 2) \vee (2, 0)) \wedge F((2, 2) \wedge F(((2, 1) \vee (3, 2)) \wedge F((3, 1) \wedge F(((1, 1) \vee (3, 3)) \wedge F(1, 3))))))$ - reaching room (1,3) [green] after visiting orange and yellow sequentially.

[figures: room layout and trajectory for $\phi_4$]

Specification $\phi_5$

$F(((0, 2) \vee (2, 0)) \wedge F((2, 2) \wedge F(((2, 1) \vee (3, 2)) \wedge F((3, 1) \wedge F(((1, 1) \vee (3, 3)) \wedge F((1, 3) \wedge F(((1, 1) \vee (0, 3)) \wedge F(0, 1))))))))$ - reaching room (0,1) [purple] after visiting orange, yellow and green sequentially.

[figures: room layout and trajectory for $\phi_5$]

$\omega$-Regular Specification $\phi_6$

$\varphi_1 \vee \varphi_2$, where $\varphi_1$ (the green path) is $GF((1, 0) \wedge X(F((3, 0) \wedge X(F(3, 2) \wedge XF(1, 2)))))$ and $\varphi_2$ (the orange path) is $F(0, 2) \wedge XGF((2, 2) \wedge X(F((3, 2) \wedge X(F(3, 3) \wedge XF(2, 3)))))$ - the agent opts to iteratively traverse a small loop to satisfy the $\omega$-regular specification, even though this loop is at a far-away end.

[figures: room layout and trajectory for $\phi_6$]
