ORL-TAMP

Optimistic Reinforcement Learning Task and Motion Planning (ORL-TAMP) is a framework that integrates RL policies into TAMP pipelines. The core idea is to encapsulate an RL policy in a so-called skill. A skill comprises an RL policy, a state discriminator, and a sub-goal generator. Besides steering low-level actions, these components are used to verify symbolic predicates and ground geometric values.
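For illustration only, a minimal Python sketch of how such a skill could be organized; the class, attribute, and method names below are hypothetical and do not come from this repository.

    # Hypothetical sketch of a skill: an RL policy plus a state discriminator
    # and a sub-goal generator. Names are illustrative only.
    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class Skill:
        policy: Callable[[Any], Any]             # RL policy: observation -> action
        discriminator: Callable[[Any], bool]     # checks whether the skill's predicate holds in a state
        subgoal_generator: Callable[[Any], Any]  # proposes geometric values (e.g. poses) for the planner

        def applicable(self, state: Any) -> bool:
            # Used by the planner to verify the symbolic predicate.
            return self.discriminator(state)

        def propose_subgoal(self, state: Any) -> Any:
            # Grounds geometric values consumed by the TAMP pipeline.
            return self.subgoal_generator(state)

        def act(self, observation: Any) -> Any:
            # Steers the low-level action during execution.
            return self.policy(observation)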

Video

The video below introduces the method and shows the experiments:

Watch the video

Installation

The current version has been tested on Ubuntu 20.04.

  1. Dependencies:

    We are currently working to remove the dependency on MoveIt due to its inflexibility and ROS specificity.

  2. Build PDDL FastDownward solver:

    orl_tamp$ ./downward/build.py
    
  3. Compile IK solver:

    orl_tamp$ cd utils/pybullet_tools/ikfast/franka_panda/
    franka_panda$ python setup.py
    

Run

  1. Download the RL policy models (Retrieve and EdgePush) and save them in the /orl_tamp/policies folder.

  2. Run MoveIt (following the tutorial)

  3. Run demos:

    • Retrieve: orl_tamp$ ./run_demo.sh retrieve
    • EdgePush: orl_tamp$ ./run_demo.sh edgepush
    • Rearrange: orl_tamp$ ./run_demo.sh rearrange

Train

This section gives the general steps for training your own skills.

  1. Modify the PDDL domain and stream files to add the PDDL definitions of the skills.
  2. Use Stable-Baselines3 to standardize policy training (a minimal sketch follows this list).
  3. Generate a dataset in the domain scenario.
  4. Train the state discriminator (see the sketch below).
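
As a rough sketch of step 2, the following shows a standard Stable-Baselines3 training loop. The environment, algorithm choice (SAC), timestep budget, and save path are placeholders, assuming you wrap your skill scenario as a Gymnasium environment.

    import gymnasium as gym
    from stable_baselines3 import SAC

    # "Pendulum-v1" is only a stand-in; replace it with the Gymnasium
    # environment that wraps your skill's scenario.
    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100_000)      # training budget depends on the task
    model.save("policies/my_skill_policy")    # hypothetical path; match where the demos load policies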
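
For step 4, a minimal sketch of training a state discriminator as a binary classifier over state features; the file names, features, and classifier are assumptions for illustration, not the repository's actual training code.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Hypothetical dataset files from step 3: state features and binary labels
    # (1 if the skill's predicate holds in that state, 0 otherwise).
    X = np.load("dataset/states.npy")
    y = np.load("dataset/labels.npy")

    clf = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500)
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))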