This repo contains code used for the following paper (arXiv):
Abhineet Jain*, Jack Kolb*, J.M. Abbess IV and Harish Ravichandar (2022). Evaluating the Effectiveness of Corrective Demonstrations and a Low-Cost Sensor for Dexterous Manipulation. Machine Learning in Human-Robot Collaboration: Bridging the Gap Workshop at the 17th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI 2022).
- Collecting demonstrations
- Training DAPG
- Collecting failures
- Collecting corrections
- Retraining with corrections
- Policy rollout and success rate
- Visualizing demonstrations
- In one terminal, run
sudo leapd
- In a new terminal, activate the conda environment with
conda activate mjrl-env
- Run
~/GT/CoRX/src/study_setup.sh --samples_filename <samples file>
where <samples file> is the file name, NOT the path (ex: testing_initial_states_20)
- Enter the User ID for which we are collecting data
- Once you wave your hand over the LeapMotion, MuJoCo will open up.
- In a separate terminal via ssh, run
python control_demos.py
The conductor will guide the subject to start the demonstration via this control.
- The control will stop after collecting 35 demos. At this point, the study is over. MuJoCo will still be open, so force close it.
You can find these demonstrations saved in
~/GT/CoRX/hand_dapg/dapg/study_data/<user_id>/<user_id>_35_demos.pickle
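To sanity-check a saved demo file before training, a minimal sketch like the following can count the demonstrations it contains. It assumes the pickle holds a list with one entry per demonstration; `count_demos` is a hypothetical helper for illustration and is not part of this repo.

```python
import pickle
from pathlib import Path


def count_demos(demo_path):
    """Return the number of demonstrations stored in a pickle file.

    Assumption: the file holds a list (or another sized container)
    with one entry per demonstration.
    """
    with open(Path(demo_path).expanduser(), "rb") as f:
        demos = pickle.load(f)
    return len(demos)
```

Under that assumption, calling `count_demos` on `<user_id>_35_demos.pickle` after a complete session should return 35.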
Uncomment lines 88-91 and comment out lines 82-85 in
~/GT/CoRX/mj_envs/mj_envs/hand_manipulation_suite/relocate_v0.py
The values under "MuJoCo boundaries" should be used for training and beyond.
Run ~/GT/CoRX/src/training_dapg.sh <user_id>
and enjoy.
This will take care of the following steps.
- Create <user_id>_25_demos.pickle and <user_id>_15_demos.pickle demo files out of <user_id>_35_demos.pickle by running
python ~/GT/CoRX/src/create_additional_demo_files.py --user_id <user_id>
- Run
bash study_data/setup_training_configs.bash <user_id>
to set up configuration files for different demonstration sources for a specific user.
- Go to
cd ~/GT/CoRX/hand_dapg/dapg
- Run
python examples/job_script.py --output /home/kolb/GT/CoRX/hand_dapg/dapg/study_data/<user_id>/policies/<user_id>_35_policy/ --config /home/kolb/GT/CoRX/hand_dapg/dapg/study_data/<user_id>/configs/dapg_<user_id>_35.txt
- Repeat step 4 for 25 and 15 demo files.
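The repeated training runs can be sketched as a small command builder. This is only an illustration: `training_commands` is a hypothetical helper, and it assumes the 25 and 15 demo policies and configs follow the same naming pattern shown above for 35 demos.

```python
DAPG_ROOT = "/home/kolb/GT/CoRX/hand_dapg/dapg"  # path used in the steps above


def training_commands(user_id, demo_counts=(35, 25, 15), root=DAPG_ROOT):
    """Build one job_script.py command line per demo count.

    Assumption: output and config names for every demo count follow
    the <user_id>_<n> pattern shown for the 35 demo case.
    """
    data = f"{root}/study_data/{user_id}"
    commands = []
    for n in demo_counts:
        commands.append(
            "python examples/job_script.py"
            f" --output {data}/policies/{user_id}_{n}_policy/"
            f" --config {data}/configs/dapg_{user_id}_{n}.txt"
        )
    return commands
```

Printing the returned strings gives the three commands to run from the dapg directory, one per demo file.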
Run ~/GT/CoRX/src/collect_failures.sh <user_id>
and enjoy.
This will take care of the following steps.
- Go to
cd ~/GT/CoRX/hand_dapg/dapg
- Run
MJPL python utils/visualize_policy.py --env_name relocate-v0 --policy study_data/<user_id>/policies/<user_id>_25_policy/iterations/best_policy.pickle --mode evaluation --collect_failures --record
- This will save the failures at
/home/kolb/GT/CoRX/hand_dapg/dapg/study_data/<user_id>/failures/<user_id>_25_failures.pickle
- Repeat the same for the 15 demo policy.
- Make sure that the conda environment is set to mjrl-env and that
sudo leapd
and
python2 cddm_leap.py
are running.
- Run
~/GT/CoRX/src/corrections_setup.sh <user_id> <num_demos>
- You'll have to enter the same <user_id> again.
- Once you wave your hand over the LeapMotion, MuJoCo will open up.
- In a separate terminal via ssh, run
python control_demos.py --max_demos 10
The conductor will guide the subject to start the demonstration via this control.
- The control will stop after collecting 10 demos. At this point, the study is over. MuJoCo will still be open, so force close it.
You will find these demonstrations saved in
/home/kolb/GT/CoRX/hand_dapg/dapg/study_data/<user_id>/corrections/<user_id>_<num_demos>_corrections.pickle
Follow this process for <num_demos> as 15 and 25.
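Since corrections are collected twice per user, it can help to confirm that both output files exist before retraining. The sketch below is a hypothetical check (`missing_correction_files` is not part of this repo) that relies only on the file naming given above.

```python
from pathlib import Path


def missing_correction_files(user_id, study_root, demo_counts=(15, 25)):
    """Return the expected corrections pickles that do not exist yet.

    study_root is the study_data directory; file names follow the
    <user_id>_<num_demos>_corrections.pickle pattern described above.
    """
    corrections_dir = Path(study_root).expanduser() / user_id / "corrections"
    expected = [
        corrections_dir / f"{user_id}_{n}_corrections.pickle"
        for n in demo_counts
    ]
    return [path for path in expected if not path.exists()]
```

An empty list means both the 15 and 25 demo corrections have been saved for that user.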
- Run
python ~/GT/CoRX/src/combine_demos.py --user_id <user_id> --input1 <demo file path 1> --input2 <demo file path 2> --output <output demo file path>
For example:
python ~/GT/CoRX/src/combine_demos.py --user_id 0201 --input1 /home/kolb/GT/CoRX/hand_dapg/dapg/study_data/0201/0201_training_20_demos.pickle --input2 /home/kolb/GT/CoRX/hand_dapg/dapg/study_data/0201/0201_testing_10_demos.pickle --output /home/kolb/GT/CoRX/hand_dapg/dapg/study_data/0201/0201_combined_20_10_1.pickle
This will concatenate the demos from the two input demo files and output them to the provided output demo file path.
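The concatenation can be pictured with the following minimal sketch. It is not the actual `combine_demos.py`; it assumes each demo file is a pickled list of demonstrations, and `combine_demo_files` is a hypothetical name.

```python
import pickle


def combine_demo_files(input1, input2, output):
    """Concatenate two demo pickles and write the result to output.

    Assumption: each input file holds a list of demonstrations.
    Returns the number of demonstrations written.
    """
    with open(input1, "rb") as f:
        demos1 = pickle.load(f)
    with open(input2, "rb") as f:
        demos2 = pickle.load(f)

    # Demos from the first file come first, followed by the second file.
    combined = list(demos1) + list(demos2)

    with open(output, "wb") as f:
        pickle.dump(combined, f)
    return len(combined)
```

With a 20 demo file and a 10 demo file as inputs, the output would hold 30 demos in input order.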
After visualizing your demos, you may want to redo a few. Follow these steps:
- Note the indices of the demos you want to correct, for example, indices 5, 13, and 15.
- Isolate the samples by running
python ~/GT/CoRX/src/fix_demos_util.py --task isolate_sample --sample_filepath <samples file path> --sample_index <sample index>
for each of the sample indices; the samples file path is typically in ~/GT/CoRX/hand_dapg/data/
- Collect the demo for the single sample by running
~/GT/CoRX/src/study_setup.sh <samples file name>
where the samples file name is the name of the samples file followed by _isolated_<index>, for example training_initial_states_20_isolated_5. Run this for each of the indices.
- Replace the sample by running
python ~/GT/CoRX/src/fix_demos_util.py --task replace_demo --replace_demo_input_filepath <demo file path> --replace_demo_index <demo index> --replace_demo_original_filepath <original demo file path>
for each of the indices.
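The naming and replacement logic in these steps can be sketched as follows. Both helpers are hypothetical illustrations, not the code in `fix_demos_util.py`, and the replacement assumes demos are held in a list.

```python
def isolated_samples_name(samples_name, index):
    """Name of the isolated samples file for one sample index."""
    return f"{samples_name}_isolated_{index}"


def replace_demo(demos, index, new_demo):
    """Return a copy of demos with the entry at index replaced.

    Assumption: demos is a list with one entry per demonstration.
    """
    if not 0 <= index < len(demos):
        raise IndexError(f"demo index {index} out of range")
    fixed = list(demos)  # copy, so the original list is left untouched
    fixed[index] = new_demo
    return fixed
```

For indices 5, 13, and 15, this would be applied once per index, each time with the newly collected single demo.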
Run ~/GT/CoRX/src/training_dapg_corrections.sh <user_id>
and enjoy.
This will take care of the following steps.
- Create <user_id>_25_10_demos.pickle and <user_id>_15_10_demos.pickle demo files by running
python ~/GT/CoRX/src/append_corrective_demos.py --user_id <user_id>
- Follow steps 3 and 4 from the Training DAPG section.
- Repeat step 2 for the 15 demo policy corrections.
- Go to
cd ~/GT/CoRX/hand_dapg/dapg
- Run
MJPL python utils/visualize_policy.py --env_name relocate-v0 --policy study_data/<user_id>/policies/<user_id>_25_policy/iterations/best_policy.pickle --mode evaluation
If you don't want to visualize the rollouts, add a --record flag.
- This will perform 100 policy rollouts in MuJoCo and report task success for each rollout and total success rate.
- Repeat step 2 for each trained policy. We have 5 policies: 35 demos, 25 demos, 15 demos, 25 demos + 10 corrections, 15 demos + 10 corrections.
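The total success rate reported at the end is the fraction of successful rollouts. A minimal sketch, assuming the per-rollout outcomes are available as booleans (`success_rate` is a hypothetical helper, not the evaluation code itself):

```python
def success_rate(outcomes):
    """Fraction of rollouts that succeeded.

    outcomes: iterable of truthy/falsy values, one per rollout.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no rollouts to evaluate")
    return sum(bool(o) for o in outcomes) / len(outcomes)
```

For example, 73 successes out of 100 rollouts gives a success rate of 0.73.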
- Go to
cd ~/GT/CoRX/hand_dapg/dapg
- Run
MJPL python utils/visualize_collected_demos.py --env_name relocate-v0 --demo_file study_data/<user_id>/<user_id>_<num_demos>_demos.pickle
- Go to
cd ~/GT/CoRX/src
- Run
./collect_rollout_statistics.sh
This will run the following for each user and each policy:
MJPL python utils/visualize_policy.py --env_name relocate-v0 --policy study_data/<user_id>/policies/<user_id>_<num_demos>_policy/iterations/best_policy.pickle --mode evaluation --record --episodes 500
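The per-user, per-policy loop can be sketched as below. This is an illustration rather than the contents of `collect_rollout_statistics.sh`: `rollout_commands` is a hypothetical helper, and it covers only the 35, 25, and 15 demo policies because the policy directory names for the corrected policies are not given here.

```python
def rollout_commands(user_ids, demo_counts=(35, 25, 15), episodes=500):
    """Build one evaluation command per user and per policy."""
    commands = []
    for user_id in user_ids:
        for n in demo_counts:
            policy = (
                f"study_data/{user_id}/policies/{user_id}_{n}_policy"
                "/iterations/best_policy.pickle"
            )
            commands.append(
                "MJPL python utils/visualize_policy.py --env_name relocate-v0"
                f" --policy {policy} --mode evaluation --record"
                f" --episodes {episodes}"
            )
    return commands
```

Each returned string matches the command shown above with `<user_id>` and `<num_demos>` filled in.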