Skip to content

Code for Data Collection & Training in Sim+Real Envs: [RSS 2024] Natural Language Can Help Bridge the Sim2Real Gap

License

Notifications You must be signed in to change notification settings

UT-Austin-RobIn/lang4sim2real

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

lang4sim2real

Introduction

This codebase contains the training code and algorithm for the Lang4Sim2Real paper:

Natural Language Can Help Bridge the Sim2Real Gap
Albert Yu, Adeline Foote, Raymond J. Mooney, and Roberto MartĂ­n-MartĂ­n
Robotics: Science and Systems (RSS), 2024
Web | PDF | 5-min video

This codebase builds on the functions/classes from the previously released repo, deltaco, which was released with the Del-TaCo paper.

Citation

@inproceedings{yu2024lang4sim2real,
      title={Natural Language Can Help Bridge the Sim2Real Gap},
      author={Yu, Albert and Foote, Adeline and Mooney, Raymond and MartĂ­n-MartĂ­n, Roberto},
      booktitle={Robotics: Science and Systems (RSS), 2024},
      year={2024}
}

Table of Contents: Steps to Reproducing Our Sim2Real Results

Step 0. Setting Up

After cloning this repo and cd-ing into it:

cd train-lang4sim2real
conda env create -f env.yml
pip install -e .
python setup.py develop
cp rlkit/launchers/config_template.py rlkit/launchers/config.py

Modify the LOCAL_LOG_DIR in train-lang4sim2real/rlkit/launchers/config.py to a path on your machine where the experiment logs will be saved.

Pip-install the local sim environments (our version of the original robosuite repo):

cd ../robosuite-lang4sim2real
pip install -r requirements.txt
pip install -e .

If you plan on collecting data from scratch, also pip-install our local version of the original robomimic repo by following:

cd ../robomimic-lang4sim2real/robomimic
pip install -e .

On a machine with access to a real Franka Emika Panda robot, create a new python environment and install the real environments (our version of deoxys):

cd ../../deoxys-lang4sim2real
./InstallPackages
make -j build_deoxys=1
pip install -U -r requirements.txt

Follow instructions for compiling NUC codebase here, as well as additional documentation here.

Clone and install the sentence transformers repo.

cd ../..
git clone git@github.com:UKPLab/sentence-transformers.git
pip install -e .

Be aware that there may be more dependences you may need to pip install to run specific parts of our code. Open an issue if you have difficulty with installation.

Extra setup steps

BLEURT

To install the pytorch implementation of BLEURT, run:

pip install git+https://github.com/lucadiliello/bleurt-pytorch.git

Afterwards, you should be able to run:

from bleurt_pytorch import BleurtConfig, BleurtForSequenceClassification, BleurtTokenizer

CLIP

If you wish to run experiments involving CLIP as the visual backbone of the policy, you will need to install open_clip and add the following line to your ~/.bashrc file:

export PYTHONPATH="$PYTHONPATH:[path_to_openclip_repo]/src"

R3M

If you wish to run experiments with R3M as the visual backbone of the policy, see the r3m repo for installation details.

Step 1. Collect Sim+Real Data

Option A: Download Our Datasets

All our datasets are on Box. However, the 2-step Pick-and-Place datasets are on OneDrive due to being larger than the Box file size limit.

Pick-and-Place

sim2real

.../1pp_sim2real.hdf5

  • 0-3: Sim prior domain, 400 trajs/task, 200 timesteps/traj. 4 different robosuite objects for the four task indices.
  • 4-5: Real target domain, 500 trajs/task, 18 timesteps/traj. carrot, forward or backward directions for the two task indices.
  • 6-7: Real prior task, target domain, 50 trajs/task. 18 timesteps/traj. paper box, forward or backward directions for the two task indices.
sim2sim

.../1pp_sim2sim.hdf5

  • 0-3: Sim prior domain, 400 trajs/task, 200 timesteps/traj.
  • 4-7: Sim target domain, 95 trajs/task, 200 timesteps/traj.

2-step Pick-and-Place

sim2real

.../2pp_sim2real.hdf5

  • 0-3: Sim prior domain, 1375 trajs/task, 320 timesteps/traj. 4 different robosuite objects for the four task indices.
  • 4-5: Real target domain, 102 trajs (task 4), 101 trajs (task 5), 45 timesteps/traj. carrot into bowl onto plate.
sim2sim

.../2pp_sim2sim.hdf5

  • 0-3: Sim prior domain, 1375 trajs/task, 320 timesteps/traj. 4 different robosuite objects for the four task indices.
  • 4-7: Sim target domain, 100 trajs/task, 320 timesteps/traj. 4 different robosuite objects for the four task indices.

Wrap Wire

sim2real

.../ww_sim2real.hdf5

  • 0: Sim prior domain, 1000 trajs, 200 timesteps/traj.
  • 1: Real target domain target task, 98 trajs, 45 timesteps/traj.
  • 2: Real target domain unused task (reverse data of task 1), 102 trajs, 45 timesteps/traj.
sim2sim

.../ww_sim2sim.hdf5

  • 0-1: Sim prior domain, 400 trajs/task, 200 timesteps/traj. 0 (counterclockwise), 1 (clockwise).
  • 2-3: Sim target domain, 100 trajs/task, 200 timesteps/traj. 2 (counterclockwise), 3 (clockwise).

Baseline Sim2Real Datasets

  • To collect Domain Rando datasets, simply add the flag --randomize wide to the collect_demonstrations_parallel.py data collection script described below.
  • To collect ADR+RNA datasets, simply add the flag --adr-rna to the collect_demonstrations_parallel.py data collection script.

Pick-and-Place

.../1pp_domain-rando_sim2real.hdf5 .../1pp_adr-rna_sim2real.hdf5 The task indices of the two baseline datasets are as described:

  • 0-3: Sim prior domain, data collected from domain randomization or ADR+RNA. 400 trajs/task, 200 timesteps/traj. 4 different robosuite objects for the four task indices.
  • 4-5: Real target domain, 500 trajs/task, 18 timesteps/traj. carrot, forward or backward directions for the two task indices.
  • 6-7: Real prior task, target domain, 50 trajs/task. 18 timesteps/traj. paper box, forward or backward directions for the two task indices.

2-step Pick-and-Place

.../2pp_domain-rando_sim2real.hdf5 .../2pp_adr-rna_sim2real.hdf5 The task indices of the two baseline datasets are as described:

  • 0-3: Sim prior domain, data collected from domain randomization or ADR+RNA. 1400 trajs/task, 320 timesteps/traj.
  • 4-5: Real target task, target domain, 102 trajs (task 4), 101 trajs (task 5), 45 timesteps/traj. carrot into bowl onto plate (forward and reverse task directions).
  • 6-7: Real prior task, target domain, 50 trajs/task, 45 timesteps/traj. wooden bridge block into bowl onto plate (forward and reverse task directions).

Wrap Wire

.../ww_domain-rando_sim2real.hdf5 .../ww_adr-rna_sim2real.hdf5 The task indices of the two baseline datasets are as described:

  • 0: Sim prior domain, data collected from domain randomization or ADR+RNA. 1024 trajs (domain rando) and 950 trajs (domain rando).
  • 1-2: Real target task, target domain. 98 trajs/task. Wrapping wire with eu plug around blender.
  • 3-4: Real prior task, target domain. 51 trajs/task. Wrapping ethernet cable with wooden bridge block around spool.

Option B: Collect Your Own Data

Sim

Pick-and-Place
Source Domain
python robosuite-lang4sim2real/robosuite/scripts/collect_demonstrations_parallel.py --robots Panda --environment Multitaskv2 --device scripted-policy --noise-std 0.05 -n 1600 -p 40 --task-idx-intervals 0-3 --directory [.../data_collection_out_dir] --camera agentview --img-dim 128 --state-mode 1 --multitask-hdf5-format --intra-thread-delay 30
Target Domain
python robosuite-lang4sim2real/robosuite/scripts/collect_demonstrations_parallel.py --robots Panda --environment Multitaskv2_ang1_fr5damp50 --device scripted-policy --noise-std 0.05 -n 400 -p 20 --task-idx-intervals 0-3 --directory [.../data_collection_out_dir] --camera agentview --img-dim 128 --state-mode 1 --multitask-hdf5-format --intra-thread-delay 3
2-step Pick-and-Place
Source Domain
python robosuite-lang4sim2real/robosuite/scripts/collect_demonstrations_parallel.py --robots Panda --environment PPObjToPotToStove --device scripted-policy --noise-std 0.05 -n 5600 -p 56 --task-idx-intervals 0-3 --directory [.../data_collection_out_dir] --camera agentview --img-dim 128 --state-mode 1 --multitask-hdf5-format --intra-thread-delay 40
Target Domain
python robosuite-lang4sim2real/robosuite/scripts/collect_demonstrations_parallel.py --robots Panda --environment PPObjToPotToStove_ang1_fr5damp50 --device scripted-policy --noise-std 0.05 -n 400 -p 20 --task-idx-intervals 0-3 --directory [.../data_collection_out_dir] --camera agentview --img-dim 128 --state-mode 1 --multitask-hdf5-format --intra-thread-delay 1
Wire Wrap
Source Domain
python robosuite-lang4sim2real/robosuite/scripts/collect_demonstrations_parallel.py --robots Panda --environment WrapUnattachedWire_v2 --device scripted-policy --noise-std 0.05 -n 800 -p 20 --task-idx-intervals 0-1 --directory [.../data_collection_out_dir] --camera agentview --img-dim 128 --state-mode 1 --multitask-hdf5-format --intra-thread-delay 5 --policy wrap-relative-location
Target Domain
python robosuite-lang4sim2real/robosuite/scripts/collect_demonstrations_parallel.py --robots Panda --environment WrapUnattachedWire_ang1_fr5damp50_v2 --device scripted-policy --noise-std 0.05 -n 200 -p 10 --task-idx-intervals 0-1 --directory [.../data_collection_out_dir] --camera agentview --img-dim 128 --state-mode 1 --multitask-hdf5-format --policy wrap-relative-location --save-video --intra-thread-delay 1

Real

Pick-and-Place
python deoxys-lang4sim2real/deoxys/scripts/data_collection.py --out-dir [.../data_collection_out_dir] --policy pick_place --env frka_pp --horiz 18 --noise 0.05 --obj-id 1 --num 1 --state-mode 1 --substeps-per-step 1
2-step Pick-and-Place
python deoxys-lang4sim2real/deoxys/scripts/data_collection.py --out-dir [.../data_collection_out_dir] --policy pick_place_n --env frka_obj_bowl_plate --horiz 45 --noise 0.05 --obj-id 6 --num 2 --state-mode 1 --substeps-per-step 1 --multistep-env
Wire Wrap
python deoxys-lang4sim2real/deoxys/scripts/data_collection.py --out-dir [.../data_collection_out_dir] --policy wrap_wire --env frka_wirewrap --horiz 45 --noise 0.05 --obj-id 2 --num 2 --state-mode 1 --substeps-per-step 1

Concatenating buffer command

python robosuite-lang4sim2real/robosuite/scripts/concat_hdf5.py -e [env_name] -p [buffer1_path] [buffer2_path] ... -d [out_dir_path] --concat-mode relabel-task-idx
  • There are two concat-modes. relabel-task-idx gives each task in each buffer a new task idx, starting from 0. For instance, if buffer1 contained tasks 0-3 and buffer2 contained tasks 1-2, then the output buffer would contain tasks 0-5 (where buffer2's tasks 1-2 get mapped to tasks 4-5 in the output buffer). merge-on-task-idx combines all the demos in each buffer with the same task-idx under that task-idx in the output buffer.

Running VLM-based Automated Stage Labeler

  1. Train gripper state predictor
python train-lang4sim2real/rlkit/lang4sim2real_utils/auto_captioner/train_gripper_state_pred.py --img-dir [.../2pp_sim2real.hdf5] --batch_size 256 --lr 0.02 --dom1-num-demos-per-task 100 --dom1-task-idxs 0-0 --num-epochs 100 --out-dir [.../out_dir]
  • We expect gripper_state_loss to end up around 0.33, ee_pos_l1_err to end up around 0.049, and gripper_classif_acc to be 0.98.
  1. Set up GroundingDINO so that you can run from groundingdino.util.inference import load_model.

  2. Run automatic stage labeler with the checkpoint from step 1 to get a buffer that has pred_stage_num alongside lang_stage_num.

python train-lang4sim2real/rlkit/lang4sim2real_utils/auto_captioner/object_detector_labeling.py ----gdino-path [.../parent_dir_of_gdino] --buffer-path [.../.hdf5] --gripper-state-pred-model [.../.pt from step 1]

Step 2. Pretrain Policy CNN

Option A: Download Our Pretrained Checkpoints

Download the Pretrained ResNet-18 checkpoints we used for our experiments.

Each checkpoint file is named with three attributes:

  • task: {1pp, 2pp, ww} for pick-and-place, 2-step pick-and-place, and wire wrap.
  • setting: {sim2real, sim2sim}
  • method: {lang-reg, lang-dist, stage-classif}

Option B: Pretrain Your Own Checkpoint

All commands shown below are language regression variant. See here for running the language distance variant, and here for running stage classification ablation.

Pick-and-Place

Sim2Real

Running this command requires downloading .../1pp_pretrain_real_prior-task.hdf5. Note that this trains with the real world prior task (pick-place paper box) instead of the target task (pick-place carrot).

python train-lang4sim2real/rlkit/lang4sim2real_utils/train/train_policy_cnn_lang4sim2real.py --dom1-img-dir [.../1pp_sim2real.hdf5] --dom1-task-idxs 0-3 --dom1-num-demos-per-task 100 --dom2-img-dir [.../1pp_pretrain_real_prior-task.hdf5] --dom2-task-idxs 0-0 --dom2-num-demos-per-task 50 --batch-size 256 --num-epochs 150 --lr 0.04 --out-dir [.../phase1_out_dir] --img-aug pad_crop --pad-size 12 --variant lang-reg --save-ckpt-freq 50 --shuffle-demos
Sim2Sim
python train-lang4sim2real/rlkit/lang4sim2real_utils/train/train_policy_cnn_lang4sim2real.py --dom1-img-dir [.../1pp_sim2sim.hdf5] --dom1-task-idxs 0-3 --dom1-num-demos-per-task 100 --dom2-img-dir [.../1pp_sim2sim.hdf5] --dom2-task-idxs 7-7 --dom2-num-demos-per-task 100 --batch-size 256 --num-epochs 150 --lr 0.04 --out-dir [.../phase1_out_dir] --img-aug pad_crop --pad-size 12 --variant lang-reg --save-ckpt-freq 50 --shuffle-demos

2-step Pick-and-Place

Sim2Real

Running this command requires downloading .../2pp_pretrain_real_target-task.hdf5.

python train-lang4sim2real/rlkit/lang4sim2real_utils/train/train_policy_cnn_lang4sim2real.py --dom1-img-dir [.../2pp_sim2real.hdf5] --dom1-task-idxs 0-3 --dom1-num-demos-per-task 100 --dom2-img-dir [.../2pp_pretrain_real_target-task.hdf5] --dom2-task-idxs 0-0 --dom2-num-demos-per-task 100 --batch-size 256 --num-epochs 150 --lr 0.04 --out-dir [.../phase1_out_dir] --img-aug pad_crop --pad-size 12 --variant lang-reg --save-ckpt-freq 50 --shuffle-demos
Sim2Sim
python train-lang4sim2real/rlkit/lang4sim2real_utils/train/train_policy_cnn_lang4sim2real.py --dom1-img-dir [.../2pp_sim2sim.hdf5] --dom1-task-idxs 0-3 --dom1-num-demos-per-task 100 --dom2-img-dir [.../2pp_sim2sim.hdf5] --dom2-task-idxs 7-7 --dom2-num-demos-per-task 100 --batch-size 256 --num-epochs 150 --lr 0.04 --out-dir [.../phase1_out_dir] --img-aug pad_crop --pad-size 12 --variant lang-reg --save-ckpt-freq 50 --shuffle-demos

Wrap Wire

Sim2Real

Running this command requires downloading .../ww_pretrain_real_target-task.hdf5.

python train-lang4sim2real/rlkit/lang4sim2real_utils/train/train_policy_cnn_lang4sim2real.py --dom1-img-dir [.../ww_sim2real.hdf5] --dom1-task-idxs 0-0 --dom1-num-demos-per-task 100 --dom2-img-dir [.../ww_pretrain_real_target-task.hdf5] --dom2-task-idxs 0-0 --dom2-num-demos-per-task 100 --batch-size 256 --num-epochs 150 --lr 0.04 --out-dir [.../phase1_out_dir] --img-aug pad_crop --pad-size 12 --variant lang-reg --save-ckpt-freq 50 --shuffle-demos
Sim2Sim
python train-lang4sim2real/rlkit/lang4sim2real_utils/train/train_policy_cnn_lang4sim2real.py --dom1-img-dir [.../ww_sim2sim.hdf5] --dom1-task-idxs 0-1 --dom1-num-demos-per-task 100 --dom2-img-dir [.../ww_sim2sim.hdf5] --dom2-task-idxs 3-3 --dom2-num-demos-per-task 100 --batch-size 256 --num-epochs 50 --lr 0.04 --out-dir [.../phase1_out_dir] --img-aug pad_crop --pad-size 12 --variant lang-reg --save-ckpt-freq 50 --shuffle-demos

Running language distance (BLEURT) variant

To pretrain with the language distance variant, you will need precomputed BLEURT score matrices in this folder, or listed by experiment:

Compute Your Own BLEURT Similarity Score Matrices

If you would like to change the language annotations at each stage of the trajectory (which are stored in the hdf5 datasets as an attribute under each task idx), you can recompute the BLEURT score matrices:

python train-lang4sim2real/rlkit/plot/plot_bleurt_dist.py --dom1-img-dir [hdf5_path1]
--dom1-task-idxs 0-1 --dom2-img-dir [hdf5_path2]
--dom2-task-idxs 1-1 --batch-size 256

Change the --dom*-task-idxs flags as appropriate.

Running Language Distance Pretraining

Then add the flags --variant lang-dist-dotprod --loss-arg-mult 40.0 --target-diff-mat-path [PATH_TO_BLEURT_TARGET_DIFF_MAT] to the CNN pretraining command, where [PATH_TO_BLEURT_TARGET_DIFF_MAT] is the path of the BLEURT score matrix downloaded/computed above, and change --variant lang-reg to --variant lang-dist-dotprod.

Running Stage Classification Ablation

Change the flag: --variant lang-reg to --variant stage-classif. Keep --lr 0.04.

Step 3. Train Policies with Multi-task, Multi-domain BC

Pick-and-Place

Sim2Real

Ours
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --policy-cnn-ckpt [.../phase1_cnn_ckpt.pt] --policy-cnn-ckpt-unfrozen-mods film+cnnlastlayer --save-checkpoint-freq 50 --gpu 0 --num-epochs 300 --seed 310
No Pretrain (real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 4-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 300 --seed 110
No Pretrain (sim+real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 300 --seed 210
CLIP
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-cnn-type clip --clip-ckpt=[.../clip_ckpt.pt]  --freeze-clip --save-checkpoint-freq 250 --gpu 0 --num-epochs 300 --seed 610
R3M
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-cnn-type r3m --freeze-policy-cnn --save-checkpoint-freq 50 --gpu 0 --num-epochs 300 --seed 710
MMD
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --mmd-coefficient 0.01 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 300 --seed 1440
Domain Randomization
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_domain-rando_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 300 --seed 1640
ADR+RNA
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_adr-rna_sim2real.hdf5] --xdomain-buffer-envs Multitaskv2 frka_pp --xdomain-env-instruct-prefixes Multitaskv2:Simulation frka_pp:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 18 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_pp --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 300 --seed 1340

Sim2Sim (target domain, prior task data)

Ours
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2sim.hdf5] --xdomain-buffer-envs Multitaskv2 Multitaskv2_ang1_fr5damp50 --xdomain-env-instruct-prefixes Multitaskv2:Simulation Multitaskv2_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env Multitaskv2_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --policy-cnn-ckpt [.../phase1_cnn_ckpt.pt] --policy-cnn-ckpt-unfrozen-mods film+cnnlastlayer --save-checkpoint-freq 1000 --gpu 0 --num-epochs 500 --seed 460
No Pretrain (real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2sim.hdf5] --xdomain-buffer-envs Multitaskv2 Multitaskv2_ang1_fr5damp50 --xdomain-env-instruct-prefixes Multitaskv2:Simulation Multitaskv2_ang1_fr5damp50:Real --train-target-task-idx-intervals 4-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env Multitaskv2_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 1000 --gpu 0 --num-epochs 500 --seed 160
No Pretrain (sim+real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2sim.hdf5] --xdomain-buffer-envs Multitaskv2 Multitaskv2_ang1_fr5damp50 --xdomain-env-instruct-prefixes Multitaskv2:Simulation Multitaskv2_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env Multitaskv2_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 7-7 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 1000 --gpu 0 --num-epochs 500 --seed 260
CLIP
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2sim.hdf5] --xdomain-buffer-envs Multitaskv2 Multitaskv2_ang1_fr5damp50 --xdomain-env-instruct-prefixes Multitaskv2:Simulation Multitaskv2_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env Multitaskv2_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 7-7 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-cnn-type clip --clip-ckpt=[.../clip_ckpt.pt]  --freeze-clip --gpu 0 --num-epochs 500 --seed 660
R3M
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../1pp_sim2sim.hdf5] --xdomain-buffer-envs Multitaskv2 Multitaskv2_ang1_fr5damp50 --xdomain-env-instruct-prefixes Multitaskv2:Simulation Multitaskv2_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env Multitaskv2_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 4-4 7-7 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-cnn-type r3m --freeze-policy-cnn --gpu 0 --num-epochs 500 --seed 760

2-step Pick-and-Place

Sim2Real

Ours
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_obj_bowl_plate --realrobot-target-obj carrot  --num-tasks 6 --num-train-target-demos-per-task 1375 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --policy-cnn-ckpt [.../phase1_cnn_ckpt.pt] --policy-cnn-ckpt-unfrozen-mods film+cnnlastlayer --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 110
No Pretrain (real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 4-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_obj_bowl_plate --realrobot-target-obj carrot  --num-tasks 6 --num-train-target-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 110
No Pretrain (sim+real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_obj_bowl_plate --realrobot-target-obj carrot  --num-tasks 6 --num-train-target-demos-per-task 1375 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 210
CLIP
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env frka_obj_bowl_plate --realrobot-target-obj=carrot --num-tasks 6 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-cnn-type clip --clip-ckpt=[.../clip_ckpt.pt]  --freeze-clip --save-checkpoint-freq 250 --gpu 0 --num-epochs 600 --seed 310
R3M
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env frka_obj_bowl_plate --realrobot-target-obj=carrot --num-tasks 6 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-cnn-type r3m --freeze-policy-cnn --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 310
MMD
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_obj_bowl_plate --realrobot-target-obj carrot  --num-tasks 6 --num-train-target-demos-per-task 1375 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --mmd-coefficient 0.01 --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 220
Domain Randomization
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_domain-rando_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_obj_bowl_plate --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 1112
ADR+RNA
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_adr-rna_sim2real.hdf5] --xdomain-buffer-envs PPObjToPotToStove frka_obj_bowl_plate --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation frka_obj_bowl_plate:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_obj_bowl_plate --realrobot-target-obj=carrot --num-tasks 8 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 500 --seed 2012

Sim2Sim (target domain, prior task data)

Ours
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2sim.hdf5] --xdomain-buffer-envs PPObjToPotToStove PPObjToPotToStove_ang1_fr5damp50 --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation PPObjToPotToStove_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 --eval-task-idx-intervals 0-0 --max-path-len 320 320 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env PPObjToPotToStove_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 --focus-train-tasks-sample-prob 0.2 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --policy-cnn-ckpt [.../phase1_cnn_ckpt.pt] --policy-cnn-ckpt-unfrozen-mods film+cnnlastlayer --save-checkpoint-freq 1000 --gpu 0 --num-epochs 600 --seed 420
No Pretrain (real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2sim.hdf5] --xdomain-buffer-envs PPObjToPotToStove PPObjToPotToStove_ang1_fr5damp50 --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation PPObjToPotToStove_ang1_fr5damp50:Real --train-target-task-idx-intervals 4-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 320 320 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env PPObjToPotToStove_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 1000 --gpu 0 --num-epochs 500 --seed 120
No Pretrain (sim+real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2sim.hdf5] --xdomain-buffer-envs PPObjToPotToStove PPObjToPotToStove_ang1_fr5damp50 --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation PPObjToPotToStove_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 320 320 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env PPObjToPotToStove_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 7-7 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 1000 --gpu 0 --num-epochs 600 --seed 220
CLIP
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2sim.hdf5] --xdomain-buffer-envs PPObjToPotToStove PPObjToPotToStove_ang1_fr5damp50 --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation PPObjToPotToStove_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 320 320 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env PPObjToPotToStove_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 7-7 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-cnn-type clip --clip-ckpt=[.../clip_ckpt.pt]  --freeze-clip --gpu 0 --num-epochs 600 --seed 620
R3M
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../2pp_sim2sim.hdf5] --xdomain-buffer-envs PPObjToPotToStove PPObjToPotToStove_ang1_fr5damp50 --xdomain-env-instruct-prefixes PPObjToPotToStove:Simulation PPObjToPotToStove_ang1_fr5damp50:Real --train-target-task-idx-intervals 0-4 7-7 --eval-task-idx-intervals 0-0 --max-path-len 320 320 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env PPObjToPotToStove_ang1_fr5damp50 --realrobot-target-obj="" --num-tasks 8 --num-train-target-demos-per-task 1400 --focus-train-task-idx-intervals 4-4 7-7 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-cnn-type r3m --freeze-policy-cnn --gpu 0 --num-epochs 600 --seed 720

Wrap Wire

Sim2Real

Ours
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-1 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_wirewrap --realrobot-target-obj="eu white plug" --num-tasks 2 --num-train-target-demos-per-task 1000 --focus-train-task-idx-intervals 1-1 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --policy-cnn-ckpt [.../phase1_cnn_ckpt.pt] --policy-cnn-ckpt-unfrozen-mods film+cnnlastlayer --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 410
No Pretrain (real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-0 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_wirewrap --realrobot-target-obj="eu white plug" --num-tasks 2 --num-train-target-demos-per-task 1000 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 110
No Pretrain (sim+real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-1 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_wirewrap --realrobot-target-obj="eu white plug" --num-tasks 2 --num-train-target-demos-per-task 1000 --focus-train-task-idx-intervals 1-1 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 210
CLIP
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-1 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env frka_wirewrap --realrobot-target-obj="eu white plug" --realrobot-obj-set 0 --num-tasks 3 --num-train-target-demos-per-task 1000 --focus-train-task-idx-intervals 1-1 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-cnn-type clip --clip-ckpt=[.../clip_ckpt.pt]  --freeze-clip --save-checkpoint-freq 250 --gpu 0 --num-epochs 600 --seed 410
R3M
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-1 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env frka_wirewrap --realrobot-target-obj="eu white plug" --realrobot-obj-set 0 --num-tasks 3 --num-train-target-demos-per-task 1000 --focus-train-task-idx-intervals 1-1 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-cnn-type r3m --freeze-policy-cnn --save-checkpoint-freq 50 --gpu 0 --num-epochs 600 --seed 410
MMD
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-1 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_wirewrap --realrobot-target-obj="eu white plug" --num-tasks 2 --num-train-target-demos-per-task 1000 --focus-train-task-idx-intervals 1-1 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --mmd-coefficient 1e-4 --gpu 0 --num-epochs 600 --seed 210
Domain Randomization
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_domain-rando_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-1 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_wirewrap --realrobot-target-obj="eu white plug" --realrobot-obj-set 0 --num-tasks 5 --num-train-target-demos-per-task 1000 --focus-train-task-idx-intervals 1-1 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 500 --seed 1140
ADR+RNA
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_adr-rna_sim2real.hdf5] --xdomain-buffer-envs WrapUnattachedWire frka_wirewrap --xdomain-env-instruct-prefixes WrapUnattachedWire:Simulation frka_wirewrap:Real --train-target-task-idx-intervals 0-1 --eval-task-idx-intervals 0-0 --max-path-len 200 45 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env frka_wirewrap --realrobot-target-obj="eu white plug" --realrobot-obj-set 0 --num-tasks 5 --num-train-target-demos-per-task 1000 --focus-train-task-idx-intervals 1-1 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 50 --gpu 0 --num-epochs 500 --seed 2040

Sim2Sim (target domain, prior task data)

Ours
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2sim.hdf5] --xdomain-buffer-envs WrapUnattachedWire_v2 WrapUnattachedWire_ang1_fr5damp50_v2 --xdomain-env-instruct-prefixes WrapUnattachedWire_v2:Simulation WrapUnattachedWire_ang1_fr5damp50_v2:Real --train-target-task-idx-intervals 0-2 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env WrapUnattachedWire_ang1_fr5damp50_v2 --realrobot-target-obj="" --num-tasks 4 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 2-2 --focus-train-tasks-sample-prob 0.333 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --policy-cnn-ckpt /home/mini_exps/lang4sim2real/phase1/2024-01-28_19-56-38/best.pt  --policy-cnn-ckpt-unfrozen-mods film+cnnlastlayer --save-checkpoint-freq 1000 --gpu 0 --num-epochs 600 --seed 321
No Pretrain (real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2sim.hdf5] --xdomain-buffer-envs WrapUnattachedWire_v2 WrapUnattachedWire_ang1_fr5damp50_v2 --xdomain-env-instruct-prefixes WrapUnattachedWire_v2:Simulation WrapUnattachedWire_ang1_fr5damp50_v2:Real --train-target-task-idx-intervals 2-3 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env WrapUnattachedWire_ang1_fr5damp50_v2 --realrobot-target-obj="" --num-tasks 4 --num-train-target-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 1000 --gpu 0 --num-epochs 500 --seed 110
No Pretrain (sim+real)
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2sim.hdf5] --xdomain-buffer-envs WrapUnattachedWire_v2 WrapUnattachedWire_ang1_fr5damp50_v2 --xdomain-env-instruct-prefixes WrapUnattachedWire_v2:Simulation WrapUnattachedWire_ang1_fr5damp50_v2:Real --train-target-task-idx-intervals 0-3 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode film --policy-num-film-inputs 1 --env WrapUnattachedWire_ang1_fr5damp50_v2 --realrobot-target-obj="" --num-tasks 4 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 2-3 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-resnet-conv-strides=2,2,1,1,1 --save-checkpoint-freq 1000 --gpu 0 --num-epochs 600 --seed 210
CLIP
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2sim.hdf5] --xdomain-buffer-envs WrapUnattachedWire_v2 WrapUnattachedWire_ang1_fr5damp50_v2 --xdomain-env-instruct-prefixes WrapUnattachedWire_v2:Simulation WrapUnattachedWire_ang1_fr5damp50_v2:Real --train-target-task-idx-intervals 0-3 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env WrapUnattachedWire_ang1_fr5damp50_v2 --realrobot-target-obj="" --num-tasks 4 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 2-3 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-cnn-type clip --clip-ckpt=[.../clip_ckpt.pt]  --freeze-clip --gpu 0 --num-epochs 500 --seed 610
R3M
python train-lang4sim2real/experiments/multitask_bc.py --train-target-buffers [.../ww_sim2sim.hdf5] --xdomain-buffer-envs WrapUnattachedWire_v2 WrapUnattachedWire_ang1_fr5damp50_v2 --xdomain-env-instruct-prefixes WrapUnattachedWire_v2:Simulation WrapUnattachedWire_ang1_fr5damp50_v2:Real --train-target-task-idx-intervals 0-3 --eval-task-idx-intervals 0-0 --max-path-len 200 200 --batch-size 57 --meta-batch-size 4 --task-emb-input-mode concat_to_img_embs --env WrapUnattachedWire_ang1_fr5damp50_v2 --realrobot-target-obj="" --num-tasks 4 --num-train-target-demos-per-task 400 --focus-train-task-idx-intervals 2-3 --focus-train-tasks-sample-prob 0.5 --num-focus-train-demos-per-task 100 --policy-cnn-type r3m --freeze-policy-cnn --gpu 0 --num-epochs 500 --seed 710

Step 4. Evaluate Policies

Sim2Sim

Evaluation Metrics can be found in the experiment output folder, in progress.csv in the eval/env_infos/final/reward Mean CSV key.

You may find python train-lang4sim2real/rlkit/plot/metric_calculator_by_split.py [list of .../exp_dir or parent of exp dirs] useful for computing all successes and averaging them based on specific hyperparameters.

Sim2Real

To evaluate CLIP/R3M policies, you will take the ckpt .pt file from the experiment output folder, and add the --cnn-type clip or --cnn-type r3m flag. During evaluation, we gave 10% extra timesteps for the policy to finish executing (so if during data collection we allocated 18 timesteps for a trajectory, during evaluation we allowed 20, etc.).

Pick-and-Place

python deoxys-lang4sim2real/deoxys/scripts/eval_collector.py --ckpt [ckpt] --obj-id [obj-id] --env frka_pp --state-mode 1 --task-embedding lang --lang-prefix Real: --max-path-len 20 --num-tasks 2 --eval-task-idxs 0-0 --num-rollouts-per-task 10 --gpu 0

2-step Pick-and-Place

python deoxys-lang4sim2real/deoxys/scripts/eval_collector.py --ckpt [ckpt] --obj-id [obj-id] --env frka_obj_bowl_plate --state-mode 1 --task-embedding lang --lang-prefix Real: --max-path-len 50 --num-tasks 2 --eval-task-idxs 0-0 --num-rollouts-per-task 10 --gpu 0

Wire Wrap

python deoxys-lang4sim2real/deoxys/scripts/eval_collector.py --ckpt [ckpt] --obj-id [obj-id] --env frka_wirewrap --state-mode 1 --task-embedding lang --lang-prefix Real: --max-path-len 50 --num-tasks 2 --eval-task-idxs 0-0 --num-rollouts-per-task 10 --gpu 0

About

Code for Data Collection & Training in Sim+Real Envs: [RSS 2024] Natural Language Can Help Bridge the Sim2Real Gap

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published