
Rewards from Human Videos

Learn agent- and domain-agnostic reward functions from human videos that can be adapted to various robots and environments.

Setup for DVD reproduction

  • Download the Something-Something-V2 dataset from here.

  • Install MuJoCo 2.0 and mujoco-py; instructions are here.

  • Clone this repository

  • Create and activate the conda environment:

conda env create -f conda_env_setup.yml
conda activate dvd_t2t
  • Install the modified versions of Metaworld, tensor2tensor, dvd/sim_envs, and pytorch_mmpi as editable packages (run each install from the repository root):
cd metaworld
pip install -e .

cd tensor2tensor
pip install -e .

cd dvd/sim_envs
pip install -e .

# pytorch_mmpi (referenced above) presumably installs the same way:
cd pytorch_mmpi
pip install -e .

Reproducing DVD

For details on what each command does and what its arguments mean, refer to the original repo.

  • We can currently reproduce the training command as follows (you might need to use a different version of Pillow; see the Troubleshooting section below); a sketch of the training objective follows this list:
cd dvd
python train.py --num_tasks 6 --traj_length 0 --log_dir path/to/train/model/output --similarity --batch_size 24 --im_size 120 --seed 0 --lr 0.01 --pretrained --human_data_dir path/to/smthsmth/sm/20bn-something-something-v2 --sim_dir demos/ --human_tasks 5 41 44 46 93 94 --robot_tasks 5 41 93 --add_demos 60 --gpus 0
  • We can currently collect robot demos as follows:
python collect_data.py --xml env1 --task_num 94
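
The training command learns a binary similarity classifier over pairs of videos: pairs showing the same task are pushed toward 1, pairs from different tasks toward 0. A minimal sketch of the objective, with hypothetical encoder/discriminator names (train.py handles the actual batching, pair sampling, and the pretrained video encoder):

import torch
import torch.nn.functional as F

def similarity_loss(encoder, discriminator, vid_a, vid_b, same_task):
    # vid_a, vid_b: (B, T, C, H, W) video batches; same_task: (B,) in {0, 1}
    z = torch.cat([encoder(vid_a), encoder(vid_b)], dim=-1)
    logits = discriminator(z).squeeze(-1)
    # Push same-task pairs toward 1 and different-task pairs toward 0.
    return F.binary_cross_entropy_with_logits(logits, same_task.float())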

Testing learned reward function

Using the collect_data script above, we can generate sample trajectories in the env directory. We can then evaluate the reward for each of these trajectories against a demo video and report the average reward.

python reward_inference.py --eval_path data/file/from/collect_data/script --demo_path path/to/demo
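
At inference time the learned discriminator is used the other way around: the reward for a trajectory is its similarity to the demo video. A minimal sketch under the same assumed names as above (the real entry point is reward_inference.py):

import torch

def dvd_reward(encoder, discriminator, traj_frames, demo_frames):
    # traj_frames, demo_frames: float video tensors of shape (T, C, H, W)
    with torch.no_grad():
        z = torch.cat([encoder(traj_frames.unsqueeze(0)),
                       encoder(demo_frames.unsqueeze(0))], dim=-1)
        logits = discriminator(z)
    # Probability that the two videos show the same task, used as the reward.
    return torch.sigmoid(logits).item()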

Run inference with human demos on DVD tasks:

python cem_plan_open_loop.py --num_tasks 2 --task_id 5 --dvd --demo_path demos/task5 --checkpoint /path/to/discriminator/model

Run inference using ground-truth (hand-engineered) rewards:

python cem_plan_open_loop.py --num_tasks 2 --task_id 5 --engineered_rewards
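
Both planning commands use the cross-entropy method (CEM): sample candidate action sequences, roll them out in the simulator, score the resulting videos with the chosen reward, and refit the sampling distribution to the best candidates. A simplified sketch (rollout_reward is a stand-in for a simulator rollout plus reward evaluation):

import numpy as np

def cem_plan(rollout_reward, horizon, act_dim, iters=10, pop=200, n_elite=20):
    # Gaussian over open-loop action sequences, iteratively refit to elites.
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        samples = mean + std * np.random.randn(pop, horizon, act_dim)
        scores = np.array([rollout_reward(a) for a in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # best open-loop action sequence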

Adding state and visual dynamics models

State dynamics model using PETS:

conda activate dvd_pets
cd dvd
git checkout state_history
python cem_plan_learned_dynamics.py --task_id 5 --engineered_rewards --learn_dynamics_model

OR, on the visual_dynamics branch:

git checkout visual_dynamics
python cem_plan_state_dynamics.py --task_id 5 --engineered_rewards --learn_dynamics_model
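
PETS plans through a probabilistic ensemble of learned dynamics models rather than the simulator itself. An illustrative sketch of such an ensemble (names are ours; the actual model lives on the state_history branch):

import torch
import torch.nn as nn

class EnsembleDynamics(nn.Module):
    # PETS-style ensemble: several independent MLPs, each predicting a
    # Gaussian over the next-state delta given (state, action).
    def __init__(self, state_dim, act_dim, n_members=5, hidden=200):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim + act_dim, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, 2 * state_dim),  # mean and log-variance
            )
            for _ in range(n_members)
        )

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        stats = [m(x).chunk(2, dim=-1) for m in self.members]
        means = torch.stack([s[0] for s in stats])
        logvars = torch.stack([s[1] for s in stats])
        return means, logvars  # member disagreement reflects uncertainty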

Training a visual dynamics model using pydreamer

conda activate dvd_pydreamer
cd pydreamer
git checkout visual_dynamics

CUDA_VISIBLE_DEVICES=0,1 python train.py --configs defaults tabletop --run_name tabletop

Inference with closed-loop CEM using the visual dynamics model

cd dvd
[ADD HERE]

Preparing inpainted data

One can inpaint using the data_inpaint.py script as follows:

conda activate e2fgvi
cd dvd

python data_inpaint.py --human_data_dir /path/to/smthsmth/sm --human_tasks 5 41 94 --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

To do this with EgoHOS segmentations instead of Detectron segmentations:

conda activate dvd_e2fgvi_detectron_egohos
cd dvd

python data_inpaint_egohos.py --human_data_dir /path/to/smthsmth/sm --human_tasks 5 41 94
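
Both inpainting scripts follow the same recipe: segment the human (or hand) in every frame, then hand the masks to a video inpainter. Roughly (segment_human and inpaint_video are stand-ins for the Detectron2/EgoHOS segmenter and the E2FGVI inpainter wired up in the scripts above):

import numpy as np

def inpaint_clip(frames, segment_human, inpaint_video):
    # frames: list of (H, W, 3) uint8 arrays from one Something-Something clip.
    # segment_human(frame) -> (H, W) boolean mask, True where the human is.
    masks = [np.asarray(segment_human(f), dtype=np.uint8) * 255 for f in frames]
    # The inpainter fills masked pixels using temporal context (E2FGVI here).
    return inpaint_video(frames, masks)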

Training and inference on human-only inpainted data

  • We might want to train on human data only; in that case, set --add_demos to 0. The --inpaint flag indicates that inpainted videos are used and adjusts the log file name accordingly.
conda activate dvd_t2t
cd dvd

python train.py --num_tasks 6 --traj_length 0 --log_dir path/to/train/model/output --similarity --batch_size 24 --im_size 120 --seed 0 --lr 0.01 --pretrained --human_data_dir path/to/smthsmth/sm/20bn-something-something-v2 --human_tasks 5 41 44 46 93 94 --add_demos 0 --inpaint --gpus 0
  • To run CEM planning on human-only inpainted data:
conda activate dvd_e2fgvi_detectron
cd dvd

python cem_plan_inpaint.py --task_id 5 --dvd --demo_path demos/task5 --checkpoint /path/to/trained/reward/model
  • To do this with EgoHOS segmentations instead of Detectron segmentations:
conda activate dvd_e2fgvi_detectron_egohos
cd dvd

python cem_plan_inpaint_egohos.py --task_id 5 --dvd --demo_path demos/task5 --checkpoint /path/to/trained/reward/model

Troubleshooting

For training, make sure the following versions of protobuf and Pillow are installed:

pip install protobuf==3.9.2 pillow==6.1.0