
Instructing Goal-Conditioned Agents with LTL Objectives (NeurIPS 2023)

Goal-conditioned reinforcement learning (RL) is a powerful approach for learning general-purpose skills by reaching diverse goals. However, it has limitations when it comes to task-conditioned policies, where goals are specified by temporally extended instructions written in the Linear Temporal Logic (LTL) formal language. Existing approaches for finding LTL-satisfying policies rely on sampling a large set of LTL instructions during training to adapt to unseen tasks at inference time. However, these approaches do not guarantee generalization to out-of-distribution LTL objectives, which may have increased complexity. In this work, we developed a novel neurosymbolic approach to address this challenge. We showed that simple goal-conditioned RL agents can be instructed to follow arbitrary LTL specifications without additional training over the LTL task space.

ZoneEnv

We use a robot from Safety Gym called Point, with one actuator for turning and another for moving forward or backward. The agent observes LiDAR information about its surrounding zones. Given this indirect geographical information, it has to visit and/or avoid certain zones to satisfy sampled LTL task specifications. The initial positions of the zones and the robot are randomized in every episode.
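
Since the environment is built on Safety Gym (and gym is a listed dependency), a random-action rollout would look roughly like the sketch below; the environment id is a hypothetical placeholder, and the actual environment construction lives under zones/envs.

    # Minimal rollout sketch (hypothetical env id; see zones/envs for the real construction).
    import gym

    env = gym.make('PointZones-v0')          # hypothetical id
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()   # [turn, move forward/backward] for the Point robot
        obs, reward, done, info = env.step(action)
    env.close()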

Setup

  • Use conda to install pygraphviz:
    conda install -c conda-forge pygraphviz
  • Install MuJoCo and mujoco-py.
  • Install safety-gym:
    pip install -e zones/envs/safety/safety-gym/
  • Install the required pip packages:
    numpy
    torch
    stable-baselines3
    graphviz
    gym
    mujoco-py
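
After installation, a quick import check (a sketch, not part of the repository) confirms the dependencies above are importable:

    # Sanity-check that the packages listed above import correctly.
    import graphviz
    import gym
    import mujoco_py
    import numpy
    import stable_baselines3
    import torch

    print('numpy', numpy.__version__)
    print('torch', torch.__version__)
    print('gym', gym.__version__)
    print('stable-baselines3', stable_baselines3.__version__)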
    

Training (optional)

  • Train primitive action policies (UP, DOWN, LEFT, and RIGHT) for ZoneEnv:
    python train_primitives.py
  • Train a goal-conditioned policy for ZoneEnv and collect a trajectory dataset (see the sketch after this list):
    python train_agent.py
  • Train the goal-value function for ZoneEnv:
    python train_gcvf.py
  • Alternatively, train a goal-value function without training a new policy:
    python collect_traj.py
    python train_gcvf.py
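
For orientation only, the sketch below shows what PPO training with stable-baselines3 looks like in general; it is not the repository's train_agent.py, and the environment id is a hypothetical placeholder.

    # Sketch only: the actual pipeline is zones/train_agent.py.
    import gym
    from stable_baselines3 import PPO

    env = gym.make('PointZones-v0')               # hypothetical id
    model = PPO('MlpPolicy', env, verbose=1)      # goal information arrives through the observation
    model.learn(total_timesteps=1_000_000)        # illustrative budget
    model.save('zones/models/goal-conditioned/best_model_ppo_8')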
    

Models

  • Primitive action policies for navigating the Point robot are saved in:
    [project_base]/zones/models/primitives/*.zip
    
  • Trained goal-conditioned policies are saved in:
    [project_base]/zones/models/goal-conditioned/best_model_ppo_[N].zip
    
    where N denotes the number of zones present in the environment (8 by default).
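
A saved policy can be inspected or reused with stable-baselines3; the snippet below is a sketch (PPO is assumed from the file naming):

    # Load a trained goal-conditioned policy from the path above.
    from stable_baselines3 import PPO

    model = PPO.load('zones/models/goal-conditioned/best_model_ppo_8.zip')
    print(model.policy)   # inspect the policy network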

Experiments

  • Avoidance experiments, e.g., $\neg y U (j \wedge (\neg w U r))$ (where $y$ denotes yellow, $j$ jet-black, $w$ white, and $r$ red):
    python exp.py --task='avoid'
  • Loop experiments, e.g., $GF(r \wedge XF y) \wedge G(\neg w)$:
    python exp.py --task='traverse'
  • Goal-chaining experiments, e.g., $F(j \wedge F(w \wedge F(r \wedge Fy)))$:
    python exp.py --task='chain'
  • Stability experiments, e.g., $FGy$:
    python exp.py --task='stable'

  See the script [project_base]/zones/exp.py for more details, including how to set eval_repeats, device, etc.
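
All four task families can be run back to back with a small driver, for example (a sketch; run from [project_base]/zones):

    # Run every experiment family listed above, one after another.
    import subprocess

    for task in ('avoid', 'traverse', 'chain', 'stable'):
        subprocess.run(['python', 'exp.py', f'--task={task}'], check=True)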

Examples

  • The left and right figures show the trajectory for the task $\neg y U (j \wedge (\neg w U r))$.



  • The left and right figures show the trajectory for the task $F(j \wedge X(\neg y U r)) \wedge G(\neg w)$.



  • The left and right figures show the trajectories for the task $F(j \wedge F(w \wedge F(r \wedge Fy)))$.



  • The left and right figures show the trajectories for the task $GF(r \wedge XF y) \wedge G(\neg w)$.



  • The left and right figures show the trajectories for the task $FGy$.



  • The left figure shows the trajectory for the task $F(j \wedge r)$.
  • The right figure shows the trajectory for the task $F(j \wedge \neg r)$.



  • The left figure shows the trajectory for the task $GFw \wedge GFy$.
  • The right figure shows the trajectory for the task $GFw \wedge GFy \wedge G(\neg j)$.



  • The figure shows the trajectory for the task $Fj \wedge (\neg r \wedge \neg y \wedge \neg w) U j$.




Ant 16 rooms

Ant-16rooms is an environment with continuous observation and action spaces. In this walled environment with 16 rooms, each room has the same size of 8 × 8, and the rooms are separated by walls and corridors of thickness 1. There are two obstacles, denoted by black squares, in the environment. We place a MuJoCo Ant robot in this environment for navigation.

Setup

The environment for the Ant-16rooms experiments is based on the following versions of the packages:

	numpy=1.18.5
	torch=1.5.1
	gym=0.13.1
	mujoco_py=2.0.2.5

along with MuJoCo simulator version mujoco200 from the MuJoCo release website.

  1. Docker: The environment is based on the environment used in GCSL. To download the Docker image:

    docker pull dibyaghosh/gcsl:0.1

  2. Conda: For setting up the conda environment, refer to conda_environment.yml for the specific versions of all packages.

  3. Python (pip): For the Python pip packages, refer to python_requirement.txt for the specific versions of all packages. Also, install the packages in the "dependencies" folder with "pip install -e .".
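
A quick way to confirm the pinned versions above (a sketch, not part of the repository):

    # Print installed versions next to the pins used for Ant-16rooms.
    import gym
    import mujoco_py
    import numpy
    import torch

    expected = {'numpy': '1.18.5', 'torch': '1.5.1', 'gym': '0.13.1', 'mujoco_py': '2.0.2.5'}
    for module in (numpy, torch, gym, mujoco_py):
        print(f'{module.__name__}: installed {module.__version__}, expected {expected[module.__name__]}')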

Run

Add the workspace directory to PYTHONPATH:

export PYTHONPATH="${PYTHONPATH}:{path_of_GCRL-LTL_ant_folder}"
Testing with LTL specifications
python experiments/TestLTLspecs_Buchi.py ant16rooms {#ofspecification}

Specifications $\phi_1$ to $\phi_5$ correspond to inputs 9 to 13, respectively.
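
To evaluate all five specifications in one go, a small loop over the spec indices works (a sketch; PYTHONPATH must be set as above):

    # Evaluate phi_1..phi_5 (spec indices 9..13) back to back.
    import subprocess

    for spec in range(9, 14):
        subprocess.run(
            ['python', 'experiments/TestLTLspecs_Buchi.py', 'ant16rooms', str(spec)],
            check=True,
        )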

Results

Specification $\phi_1$

$F((0, 2) \vee (2, 0))$ - either reaching room (2,0) [orange position on the left] or room (0,2) [orange position on the right]

[figures: room layout and trajectory for $\phi_1$]

Specification $\phi_2$

$F(((0, 2) \vee (2, 0)) \wedge F(2, 2))$ - reaching room (2,2) [orange] by choosing either of the two orange paths

[figures: room layout and trajectory for $\phi_2$]

Specification $\phi_3$

$F(((0, 2) \vee (2, 0)) \wedge F((2, 2) \wedge F(((2, 1) \vee (3, 2)) \wedge F(3, 1))))$ - reaching room (3,1) [yellow] after visiting orange.

[figures: room layout and trajectory for $\phi_3$]

Specification $\phi_4$

$F(((0, 2) \vee (2, 0)) \wedge F((2, 2) \wedge F(((2, 1) \vee (3, 2)) \wedge F((3, 1) \wedge F(((1, 1) \vee (3, 3)) \wedge F(1, 3))))))$ - reaching room (1,3) [green] after visiting orange and yellow sequentially.

[figures: room layout and trajectory for $\phi_4$]

Specification $\phi_5$

$F(((0, 2) \vee (2, 0)) \wedge F((2, 2) \wedge F(((2, 1) \vee (3, 2)) \wedge F((3, 1) \wedge F(((1, 1) \vee (3, 3)) \wedge F((1, 3) \wedge F(((1, 1) \vee (0, 3)) \wedge F(0, 1))))))))$ - reaching room (0,1) [purple] after visiting orange, yellow and green sequentially.

[figures: room layout and trajectory for $\phi_5$]

$\omega$-Regular Specification $\phi_6$

$\varphi_1 \vee \varphi_2$, where $\varphi_1$ (the green path) is $GF((1, 0) \wedge X(F((3, 0) \wedge X(F(3, 2) \wedge XF(1, 2)))))$ and $\varphi_2$ (the orange path) is $F(0, 2) \wedge XGF((2, 2) \wedge X(F((3, 2) \wedge X(F(3, 3) \wedge XF(2, 3)))))$ - the agent opts to iteratively traverse a small loop to satisfy the $\omega$-regular specification, even though this loop is at a far-away end.

[figures: room layout and trajectory for $\phi_6$]
