***NEW*** TIDEE on 2023 Room Rearrangement with improved navigation! See the rearrange2023 branch.
This repo contains code and data for running TIDEE and the tidy task.
Note: We have tested this on a remote cluster with CUDA versions 10.2 and 11.1. The dependencies are for running the full TIDEE system. A reduced environment can be used if only running the tidy task and not the TIDEE networks.
(1) For training and running all TIDEE networks for the tidy task and room rearrangement, start by cloning the repository:
git clone git@github.com:Gabesarch/TIDEE.git
(1a) (optional) If you are using conda, create an environment:
conda create -n TIDEE python=3.8
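Then activate the environment:
conda activate TIDEE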
You will also want to set the CUDA paths. For example (on our tested machine with CUDA 11.1):
export CUDA_HOME="/opt/cuda/11.1.1"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:$LD_LIBRARY_PATH"
(2) Install PyTorch with the CUDA version you have. For example, run the following for CUDA 11.1:
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
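To confirm that PyTorch was installed with the expected CUDA build and can see your GPU, a quick sanity check is:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"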
(3) Install additional requirements:
pip install -r requirements.txt
(4) Install PyG with correct PyTorch and CUDA version. For example, run the following for PyTorch 1.8.1 & CUDA 11.1:
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.1+cu111.html
pip install torch-sparse==0.6.12 -f https://pytorch-geometric.com/whl/torch-1.8.1+cu111.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.1+cu111.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.1+cu111.html
pip install torch-geometric
(5) Install Detectron2 (needed for SOLQ detector) with correct PyTorch and CUDA version. E.g. for PyTorch 1.8 & CUDA 11.1:
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
(6) Build SOLQ deformable attention:
cd ./SOLQ/models/ops && sh make.sh && cd ../../..
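As a rough check that the build succeeded, the compiled extension should be importable (assuming it keeps the MultiScaleDeformableAttention module name used by Deformable-DETR-style codebases; adjust if SOLQ names it differently):
python -c "import MultiScaleDeformableAttention"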
To run the Ai2THOR simulator on a headless machine, you must either start an X-server or use Ai2THOR's new headless mode.
To start an X-server with any of the scripts, you can simply append --start_startx to the arguments. You can specify the X-server port using the --server_port argument.
Alternatively, you can use Ai2THOR's new headless rendering by appending --do_headless_rendering to the arguments.
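For example, a sketch that adds these flags to the mapping observation generation command used later in this README (the port value here is arbitrary):
python main.py --mode generate_mapping_obs --do_predict_oop --mapping_obs_dir ./data/mapping_obs --start_startx --server_port 1
or, with headless rendering instead of an X-server:
python main.py --mode generate_mapping_obs --do_predict_oop --mapping_obs_dir ./data/mapping_obs --do_headless_rendering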
The Tidy Task involves detecting and moving out of place objects to plausible places within the scene without any instructions. See task_base/messup.py for our data generation code that moves objects out of place, and task_base/example.py for an example script that runs the task with random actions. To run the tidy task, the tidy task dataset must be downloaded (see Dataset).
Our tidy task dataset contains 8000 training scenes, 200 validation scenes, and 100 testing scenes, with five objects in each scene moved out of place. To run the tidy task with the generated scenes, download the scene metadata from here and place the extracted contents inside of the data folder.
To run the full TIDEE pipeline on the tidy task, do the following:
(1) Download all model checkpoints (see Pretrained Networks) and add them to the checkpoints folder. Then, download the tidy task dataset (see Dataset) and add it to the data folder.
(2) Download the visual memex graph data from here, and place the pickle file in the data folder.
(3) Run TIDEE on the tidy task using the following command:
python main.py --mode TIDEE --do_predict_oop --eval_split test --do_visual_memex --do_vsn_search --do_visual_oop --do_add_semantic
Evaluation images can be logged by adding (for example) the following to the arguments:
--log_every 1 --save_object_images --image_dir tidy_task
And an .mp4 movie of each episode can be logged by adding (for example) the following to the arguments:
--create_movie --movie_dir tidy_task
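Putting these together, a full evaluation run that also logs images and a movie for every episode might look like:
python main.py --mode TIDEE --do_predict_oop --eval_split test --do_visual_memex --do_vsn_search --do_visual_oop --do_add_semantic --log_every 1 --save_object_images --image_dir tidy_task --create_movie --movie_dir tidy_task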
This section details how to train the Out of Place Detector.
We first train SOLQ with two prediction heads (one for category, one for out of place). See models/aithor_solq.py and models/aithor_solq_base.py for code details, and arguments.py for training argument details.
python main.py --mode solq --S 5 --data_batch_size 5 --lr_drop 7 --run_val --load_val_agent --val_load_dir ./data/val_data/aithor_tidee_oop --plot_boxes --plot_masks --randomize_scene_lighting_and_material --start_startx --do_predict_oop --load_base_solq --mess_up_from_loaded --log_freq 250 --val_freq 250 --set_name TIDEE_solq_oop
To train the visual and language detector, you can run the following (see models/aithor_bert_oop_visual.py and models/aithor_solq_base.py for details):
python main.py --mode visual_bert_oop --do_visual_and_language_oop --S 3 --data_batch_size 3 --run_val --load_val_agent --val_load_dir ./data/val_data/aithor_tidee_oop_VL --n_val 3 --load_train_agent --train_load_dir ./data/train_data/aithor_tidee_oop_VL --n_train 50 --randomize_scene_lighting_and_material --start_startx --do_predict_oop --mess_up_from_loaded --save_freq 2500 --log_freq 250 --val_freq 250 --max_iters 25000 --keep_latest 5 --start_one --score_threshold_oop 0.0 --score_threshold_cat 0.0 --set_name TIDEE_oop_vis_lang
The above will generate training and validation data from the simulator if the data does not already exist.
This section details how to train the Neural Associative Memory Graph Network.
To train the visual memex, the following steps are required:
(1) Make sure you have the SOLQ checkpoint (see Pretrained Networks) in the checkpoints folder.
(2) (skip if already done for the Visual Search Network) First, generate some observations of the mapping phase to use for the scene graph features by running the following command:
python main.py --mode generate_mapping_obs --start_startx --do_predict_oop --mapping_obs_dir ./data/mapping_obs
This will save the mapping observations to mapping_obs_dir (note: this data will be ~200GB). Alternatively, download the mapping observations from here and place the extracted contents in the data folder.
(3) Train the graph network (see models/aithor_visrgcn.py and models/aithor_visrgcn_base.py for details):
python main.py --mode visual_memex --run_val --load_val_agent --do_predict_oop --radius_max 3.0 --num_mem_houses 5 --num_train_houses 15 --load_visual_memex --do_load_oop_nodes_and_supervision --vmemex_supervision_dir /projects/katefgroup/project_cleanup/tidee_final/vmemex_supervision_dir --only_include_receptacle --objects_per_scene 15 --scenes_per_batch 10 --mapping_obs_dir ./data/mapping_obs --load_model --load_model_path ./checkpoints/vrgcn-00002000.pth --set_name tidee_vmemex07
This section details how to train the Visual Search Network.
To train the Visual Search Network, the following steps are required:
(1) Make sure you have the SOLQ checkpoint (see Pretrained Networks) in the checkpoints folder.
(2) (skip if already done for the Neural Associative Memory Graph Network) First, generate some observations of the mapping phase by running the following command:
python main.py --mode generate_mapping_obs --start_startx --do_predict_oop --mapping_obs_dir ./data/mapping_obs
This will save the mapping observations to mapping_obs_dir (note: this data will be ~200GB). Alternatively, download the mapping observations from here and place the extracted contents in the data folder.
(3) Train the graph network (see models/aithor_visualsearch.py and models/aithor_visualsearch_base.py for details):
python main.py --mode visual_search_network --run_val --objects_per_scene 3 --scenes_per_batch 6 --n_val 8 --objects_per_scene_val 2 --mapping_obs_dir ./data/mapping_obs --do_add_semantic --log_freq 250 --val_freq 250 --set_name tidee_vsn
To run the object goal navigation evaluation from the paper using the Visual Search Network, run:
python main.py --mode visual_search_network --eval_object_nav --object_navigation_policy_name vsn_search --load_model --load_model_path ./checkpoints/vsn-00013500.pth --tag tidee_object_nav_vsn --do_predict_oop --detector_threshold_object_nav 0.5 --visibilityDistance 1.0 --max_steps_object_goal_nav 200 --nms_threshold 0.5
To run the object goal navigation evaluation from the paper without the Visual Search Network, run:
python main.py --mode visual_search_network --eval_object_nav --object_navigation_policy_name random --tag tidee_object_nav_novsn --do_predict_oop --detector_threshold_object_nav 0.5 --visibilityDistance 1.0 --max_steps_object_goal_nav 200 --nms_threshold 0.5
All pretrained model checkpoints can be downloaded here.
For use with the tidy task or room rearrangement, place all checkpoints directly in the checkpoints folder.
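The commands in this README reference some checkpoints by filename, so after extraction the folder should look roughly like the following (exact filenames may differ slightly in the download):
checkpoints/
    vrgcn-00002000.pth
    vsn-00013500.pth
    (plus the remaining detector and network checkpoints from the download)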
The evaluation code for the rearrangement challenge task is taken from Visual Room Rearrangement and is included in the current repo in the rearrangement folder, modified to include estimated depth, noisy pose, noisy depth, and the TIDEE config.
To run TIDEE on the 2022 rearrangement benchmark combined set (train, val, test), run (for example) the following:
python main.py --mode rearrangement --tag TIDEE_rearrengement_2022 --OT_dist_thresh 1.0 --thresh_num_dissimilar -1 --match_relations_walk --HORIZON_DT 30 --log_every 25 --dataset 2022 --eval_split combined
To run TIDEE on the 2021 rearrangement benchmark combined set (train, val, test), run (for example) the following:
python main.py --mode rearrangement --tag TIDEE_rearrengement_2021 --OT_dist_thresh 1.0 --thresh_num_dissimilar -1 --match_relations_walk --HORIZON_DT 30 --log_every 25 --dataset 2021 --eval_split combined
All metrics will be saved in the metrics folder every log_every episodes (as specified by the --log_every argument).
To run with open and close enabled, append --do_open.
Noisy measurements:
(1) To run using estimated depth, append --estimate_depth.
(2) To run using noisy pose, append --noisy_pose.
(3) To run using noisy depth, append --noisy_depth.
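For example, a sketch of a 2022 combined-split evaluation with all three noise sources enabled (built from the flags above):
python main.py --mode rearrangement --tag TIDEE_rearrangement_2022_noisy --OT_dist_thresh 1.0 --thresh_num_dissimilar -1 --match_relations_walk --HORIZON_DT 30 --log_every 25 --dataset 2022 --eval_split combined --estimate_depth --noisy_pose --noisy_depth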
If you like this paper, please cite us:
@inproceedings{sarch2022tidee,
    title = "TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Common Sense Priors",
    author = "Sarch, Gabriel and Fang, Zhaoyuan and Harley, Adam W. and Schydlo, Paul and Tarr, Michael J. and Gupta, Saurabh and Fragkiadaki, Katerina",
    booktitle = "European Conference on Computer Vision",
    year = "2022"
}