Inference-Time Policy Steering (ITPS)

A Maze2D benchmark of various sampling methods with sketch input, from the paper Inference-Time Policy Steering through Human Interactions.

Installation

Clone this repo

git clone git@github.com:yanweiw/itps.git
cd itps

Create a virtual environment with Python 3.10

conda create -y -n itps python=3.10
conda activate itps

Install ITPS

pip install -e .

Download the pre-trained weights for Action Chunking Transformer (ACT) and Diffusion Policy (DP), unzip the downloaded zip file, and put the weights in the itps/itps folder.
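If you prefer to script the unpacking step, here is a minimal Python sketch (the archive name weights.zip is an assumption; substitute whatever filename you downloaded):

import zipfile
# Extract the downloaded weights archive into the folder the scripts expect.
# "weights.zip" is a placeholder name, not the actual download name.
zipfile.ZipFile("weights.zip").extractall("itps/itps")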

Visualize pre-trained policies.

Run ACT or DP unconditionally to explore motion manifolds learned by these pre-trained policies.

python interact_maze2d.py -p [act, dp] -u
(Figure: Multimodal predictions of DP)
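For example, to explore the motion manifold learned by the Diffusion Policy:

python interact_maze2d.py -p dp -u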

Bias sampling with sketch interaction.

-ph : Post-Hoc Ranking
-op : Output Perturbation
-bi : Biased Initialization
-gd : Guided Diffusion
-ss : Stochastic Sampling

python interact_maze2d.py -p [act, dp] [-ph, -bi, -gd, -ss]
(Figure: Post-Hoc Ranking example)
Draw by clicking and dragging the mouse. Re-initialize the agent (red) position by moving the mouse close to it without clicking.
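For example, to steer DP with Post-Hoc Ranking:

python interact_maze2d.py -p dp -ph

As a rough illustration of the idea behind Post-Hoc Ranking, here is a minimal Python sketch, not this repo's implementation: sample many candidate trajectories, then rank them by distance to the user sketch. The function name, array shapes, and mean nearest-point metric are all assumptions for illustration.

import numpy as np

def rank_by_sketch(trajectories, sketch):
    # trajectories: (N, T, 2) candidate xy paths sampled from the policy
    # sketch: (S, 2) xy points drawn by the user
    # Distance from every trajectory point to every sketch point: (N, T, S)
    dists = np.linalg.norm(trajectories[:, :, None, :] - sketch[None, None, :, :], axis=-1)
    # Score each candidate by the mean distance to its nearest sketch point.
    scores = dists.min(axis=2).mean(axis=1)
    # Return candidates sorted best (closest to the sketch) first.
    return trajectories[np.argsort(scores)]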

Visualize sampling dynamics.

Run DP with BI, GD, or SS using the -v option.

python interact_maze2d.py -p [act, dp] [-bi, -gd, -ss] -v
(Figure: Stochastic Sampling example)
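For example:

python interact_maze2d.py -p dp -ss -v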

Benchmark methods.

Save sketches into a file exp00.json so they can be reused across methods.

python interact_maze2d.py -p [act, dp] -s exp00.json
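For example, to collect sketches while running DP:

python interact_maze2d.py -p dp -s exp00.json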

Visualize saved sketches by loading the saved file; press the n key to advance to the next sketch.

python interact_maze2d.py -p [act, dp] [-ph, -op, -bi, -gd, -ss] -l exp00.json
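For example, to test DP with Guided Diffusion on the saved sketches:

python interact_maze2d.py -p dp -gd -l exp00.json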

Save experiments into exp00_dp_gd.json.

python interact_maze2d.py -p dp -gd -l exp00.json -s .json

Replay experiments.

python interact_maze2d.py -l exp00_dp_gd.json

How to get the pre-trained policy?

While the ITPS framework assumes the pre-trained policy is given, I have received many requests to open-source my training data (D4RL Maze2D) and training code (my LeRobot fork; use it at your own risk, as it is not as well maintained as the inference code in this repo). So here you are:

Make sure you are on the custom_dataset branch of the training codebase and use the dataset here.

python lerobot/scripts/train.py policy=maze2d_act env=maze2d

You can set policy=maze2d_dp to train a diffusion policy. If the itps conda environment does not support training, create a lerobot environment following this. Hopefully this will work, but I cannot guarantee it, as this is not the paper's contribution and I am not maintaining it.
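For example, to train a diffusion policy instead of ACT:

python lerobot/scripts/train.py policy=maze2d_dp env=maze2d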

Acknowledgement

Part of the codebase is modified from LeRobot.
