This is the code used to generate the HandMeThat dataset and evaluate agents on it.
HandMeThat: Human-Robot Communication in Physical and Social Environments
Yanming Wan*, Jiayuan Mao*, and Joshua B. Tenenbaum
[Paper] [Supplementary Material] [Project Page] (* indicates equal contributions.)
Clone this repository:
git clone https://github.com/Simon-Wan/HandMeThat
Clone the third-party repositories (XTX, ALFWorld):
git clone https://github.com/princeton-nlp/XTX.git
git clone https://github.com/alfworld/alfworld.git
Add the packages to your PYTHONPATH environment variable:
export PYTHONPATH=.:$PYTHONPATH:<path_to_xtx>:<path_to_alfworld>
Create a conda environment for HandMeThat, and install the requirements.
conda create -n hand-me-that python=3.9
conda activate hand-me-that
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install numpy scipy pyyaml networkx tabulate
conda install h5py tqdm click transformers
conda install -c conda-forge importlib_metadata
pip install jericho lark textworld opencv-python ai2thor jacinle
python -m spacy download en_core_web_sm
The packages above include the Python dependencies required by the third-party repositories.
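As an optional sanity check, the snippet below (a minimal sketch; adjust to your setup) verifies that the main dependencies installed above import correctly:

# Optional: verify the main dependencies installed above.
# Any ImportError points to a missing package.
import torch
import numpy
import jericho
import textworld
import transformers
import jacinle
import spacy

spacy.load('en_core_web_sm')  # verifies the downloaded spaCy model
print('Environment looks good.')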
We provide two versions of the HandMeThat dataset. Version 1 (V1) is the original dataset presented in the HandMeThat paper. Version 2 (V2) is a later version with more data pieces on a smaller set of tasks. Please refer to the Dataset Generation section for more details.
Download the version 1 (V1) dataset from the Google Drive link and place the zipped file at ./datasets/v1.
Unzip the dataset so that ./datasets/v1/HandMeThat_with_expert_demonstration is a folder containing 10,000 JSON files.
Download the version 2 (V2) dataset from the Google Drive link and place the files at ./datasets/v2.
Unzip the dataset so that ./datasets/v2/HandMeThat_with_expert_demonstration is a folder containing 116,146 JSON files.
The data split information is presented in ./datasets/v2/HandMeThat_data_info.json.
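To take a quick look at the downloaded data, the snippet below (a minimal sketch; the exact structure of the info file is not documented here, so the key inspection is illustrative) loads the split information and counts the extracted data pieces:

import glob
import json

# Inspect the V2 split information (structure assumed to be a JSON mapping;
# adjust the key handling to the actual layout of the file).
with open('./datasets/v2/HandMeThat_data_info.json') as f:
    data_info = json.load(f)
if isinstance(data_info, dict):
    print(list(data_info.keys()))

# Count the extracted data pieces; 116,146 files are expected for V2.
files = glob.glob('./datasets/v2/HandMeThat_with_expert_demonstration/*.json')
print(len(files), 'data pieces found')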
Play a HandMeThat game:
from data_generation.text_interface.jericho_env import HMTJerichoEnv
import numpy as np

step_limit = 40
dataset = './datasets/v2/HandMeThat_with_expert_demonstration'
eval_env = HMTJerichoEnv(dataset, split='test', fully=False, step_limit=step_limit)

obs, info = eval_env.reset()
print(obs.replace('. ', '.\n'))
for _ in range(step_limit):
    action = input('> ')
    # uncomment the following part to get started with a random agent instead
    # _ = input('Press [Enter] to continue')
    # action = np.random.choice(info['valid'])
    # print('Action:', action)
    obs, reward, done, info = eval_env.step(action)
    print(obs.replace('. ', '.\n'), '\n\n')
    if done:
        break
print('moves: {}, score: {}'.format(info['moves'], info['score']))
Run python main.py to execute the quickstart code.
To generate the HandMeThat dataset:
python data_generation/generation.py --num 1000 --quest_type bring_me
To generate HandMeThat data for a particular goal, use the --goal argument to specify the goal index (see the example below).
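For instance (a sketch only; the goal index value and flag combination are illustrative, so check data_generation/generation.py for the exact interface):

python data_generation/generation.py --num 1000 --quest_type bring_me --goal 0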
- The object hierarchy and initial position sampling space are specified in text files.
- All current goals are listed in ./data_generation/sampling/goal_sampling.py, and new goals can be specified using the given templates.
- To specify the number of objects in each category, please refer to the code.
Compared with V1, the V2 dataset differs in the following ways:
- V2 only contains tasks based on 25 selected goal templates, which are more easily predictable by humans.
- V2 only contains "bring me" type instructions and mainly focuses on pick-and-place tasks.
- We generate more data for each specific goal.
- We add a "subgoal" to each data piece: a first-order logic (FOL) sequence corresponding to the desired actions, which can be used for goal inference (see the sketch after this list).
- We revise how the human trajectory is randomly truncated and how the human generates an utterance, to ensure that most of the generated tasks are human-solvable.
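The snippet below is a minimal sketch of how the subgoal annotation could be inspected; the key name 'subgoal' and the file layout are assumptions based on the description above, not a documented API:

import glob
import json

# Load one V2 data piece and look for the subgoal annotation.
# The key name 'subgoal' is assumed; check an actual file for the exact field names.
path = glob.glob('./datasets/v2/HandMeThat_with_expert_demonstration/*.json')[0]
with open(path) as f:
    piece = json.load(f)
if isinstance(piece, dict):
    print(sorted(piece.keys()))
    print(piece.get('subgoal'))  # FOL subgoal sequence, if stored under this key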
The current release contains the basic training settings for the Seq2Seq, DRRN, and offline-DRRN models. The models can be evaluated on the validation and test splits.
We tested each model in both the fully and partially observable settings on all four hardness levels. These experimental results are presented in the main paper and supplementary material. The hyperparameters we used are the default values in this repository.
To train the model (e.g., 'DRRN' with 'fully' observable setting):
python scripts/train_rl.py --model DRRN --observability fully
To evaluate the model (e.g., on the validation split) at a specific hardness level (e.g., level1):
python scripts/eval_rl.py --model DRRN --observability fully --level level1 --eval_split validate --memory_file memory_5 --weight_file weights_5
Use --model offlineDRRN for the offline-DRRN setting; for example:
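The command below mirrors the DRRN training command above (a sketch; the other flags are unchanged):

python scripts/train_rl.py --model offlineDRRN --observability fully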
To train the Seq2Seq model (e.g., in the 'partially' observable setting):
python scripts/train_seq.py --observability partially
To evaluate the model (e.g., on the test split) at a specific hardness level (e.g., level1):
python scripts/eval_seq.py --observability partially --level level1 --eval_split test --eval_model_name weights_50000.pt
To evaluate the random agent:
python scripts/eval.py --agent random --level level1 --eval_split test