Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

This is the PyTorch implementation of our paper:

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. (Oral)

Motivation

Bibtex

@inproceedings{ke2019tactical,
  title={Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation},
  author={Ke, Liyiming and Li, Xiujun and Bisk, Yonatan and Holtzman, Ari and Gan, Zhe and Liu, Jingjing and Gao, Jianfeng and Choi, Yejin and Srinivasa, Siddhartha},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Installation and Usage

  Our code was developed with Anaconda Python 3.6 and PyTorch 0.4.1, on a single TITAN Xp GPU.

Set up the environment

  1. Install the Matterport3DSimulator. Then download the pre-computed image features:
cd matterport3D
mkdir -p img_features/
cd img_features/
wget https://storage.googleapis.com/bringmeaspoon/img_features/ResNet-152-imagenet.zip -O ResNet-152-imagenet.zip
unzip ResNet-152-imagenet.zip
cd .. 
# After this step, `img_features/` should contain `ResNet-152-imagenet.tsv`.
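To sanity-check the download, the TSV can be decoded with a short script. A minimal sketch, assuming the layout used by the Speaker-Follower codebase (tab-separated `scanId`, `viewpointId`, image width/height, vertical FoV, and a base64-encoded float32 array of 36 views × 2048 dimensions); verify the column layout against your copy of the file before relying on it:

```python
import base64
import csv
import sys

import numpy as np

# Assumed TSV layout (as in the Speaker-Follower codebase):
# scanId \t viewpointId \t image_w \t image_h \t vfov \t features
# where `features` is a base64-encoded float32 array of 36 views x 2048 dims.
VIEWS, FEAT_DIM = 36, 2048

def decode_row(row):
    """Decode one TSV row dict into (scanId, viewpointId, features array)."""
    feats = np.frombuffer(
        base64.b64decode(row["features"]), dtype=np.float32
    ).reshape(VIEWS, FEAT_DIM)
    return row["scanId"], row["viewpointId"], feats

def load_img_features(tsv_path):
    """Load the whole TSV into a dict keyed by 'scanId_viewpointId'."""
    csv.field_size_limit(sys.maxsize)  # the base64 field exceeds the default limit
    fields = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]
    with open(tsv_path) as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=fields)
        return {f"{scan}_{vp}": feats
                for scan, vp, feats in map(decode_row, reader)}
```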
  2. Download this repo and extract the contents into matterport3D/tasks/R2R.

  3. Download the Room-to-Room dataset, the synthetic data, and the speaker model proposed by Speaker-Follower Models for Vision-and-Language Navigation (NIPS 2018).

./tasks/R2R/data/download.sh
./tasks/R2R/data/download_precomputed_augmentation.sh
./tasks/R2R/experiments/release/download_speaker_release.sh
  4. Install the Python requirements:

pip install -r tasks/R2R/requirements.txt

Training and launching agents

  1. Train a seq2seq follower agent.

    First, train the SMNA agent described in Self-Monitoring Navigation Agent; it will be stored in ./tasks/R2R/experiments/smna/:

    python tasks/R2R/train.py --use_pretraining
                              --pretrain_splits train literal_speaker_data_augmentation_paths 
                              --feedback_method sample2step
                              --experiment_name smna
    

    Then, using only the trained agent, launch the FAST framework and evaluate on the seen and unseen validation splits.

    python tasks/R2R/run_search.py --job search
                                   --load_follower tasks/R2R/experiments/smna/snapshots/[name of the latest model]
                                   --max_episode_len 40
                                   --K 20
                                   --logit 
                                   --experiment_name FAST_short
                                   --early_stop
    

    Note that the trained model's filename encodes its validation performance, so you will need to look up the exact name for your agent. One example is smna/snapshots/follower_with_pretraining_cg_pm_sample_imagenet_mean_pooled_1heads_val_seen_iter_10_val_seen-success_rate=0.12.
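    Since the filename embeds the success rate, you can select a snapshot programmatically instead of by hand. A small helper (not part of the repo; the `success_rate=` pattern is assumed from the example filename above) that picks the snapshot with the best recorded score:

```python
import glob
import re

# Hypothetical helper: snapshot filenames are assumed to end with a fragment
# like "...val_seen-success_rate=0.12", as in the example in this README.
def best_snapshot(snapshot_dir):
    """Return the path of the snapshot with the highest embedded success rate."""
    pattern = re.compile(r"success_rate=(\d+\.\d+)")
    best, best_score = None, -1.0
    for path in glob.glob(f"{snapshot_dir}/*success_rate=*"):
        m = pattern.search(path)
        if m and float(m.group(1)) > best_score:
            best, best_score = path, float(m.group(1))
    return best
```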

  2. Train a goal reranker.

    First, cache the search queue. The following command saves the cached JSON file to the root of the matterport3D folder:

    python tasks/R2R/run_search.py --job cache
                                --load_follower tasks/R2R/experiments/smna/snapshots/[name of the latest model]
                                --max_episode_len 40
                                --K 20
                                --logit
                                --experiment_name cacheFAST
    

    Second, move Training Reranker.ipynb to the root of the matterport3D folder. Run through it sequentially to produce a goal reranker under tasks/R2R/experiments/candidates_ranker_{}, where {} is replaced by the reranker's performance.

    Finally, launch the framework with the goal reranker.

    python tasks/R2R/run_search.py --job search
                                   --load_follower tasks/R2R/experiments/smna/snapshots/[name of the latest model]
                                   --max_episode_len 55
                                   --K 20
                                   --logit 
                                   --experiment_name tmp
                                   --load_reranker tasks/R2R/experiments/[name of the reranker model]
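    Conceptually, the reranker is a binary classifier over a fixed-size feature vector per search candidate, used at test time to pick the candidate most likely to end at the goal. A minimal NumPy sketch of that idea; the actual feature definition and training procedure live in Training Reranker.ipynb, and the 28 dimensions, optimizer, and hyperparameters here are illustrative only:

```python
import numpy as np

# Illustrative logistic-regression reranker: each candidate path is summarized
# by a feature vector (28-D in the notebook), a classifier predicts whether it
# ends at the goal, and the highest-scoring candidate is selected.
def train_reranker(X, y, lr=0.1, epochs=1000):
    """Gradient-descent logistic regression on features X (n, d), labels y (n,)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted goal probability
        grad = p - y                            # gradient of the log loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def rerank(candidates, w, b):
    """Return the index of the candidate with the highest goal score."""
    return int(np.argmax(candidates @ w + b))
```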
    

Reproducing results & released model

If you want to skip all the steps above, here are my trained SMNA model (smna_model) and the intermediate files (cache_XXX.json) that I used to produce the results in the paper: Google Drive.

Directory Structure

The main entry point to the framework is run_search.py. Follower agents live in follower.py, and the core of the framework lives in def _rollout_with_search() within the Seq2SeqAgent class. The speaker agent lives in speaker.py. Various PyTorch modules are in attn.py and model.py.

  • attn.py
  • env.py
  • eval.py
  • follower.py
  • model.py
  • paths.py
  • refine_search.py
  • run_search.py
  • running_mean_std.py
  • speaker.py
  • train.py
  • utils.py
  • vocab.py
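The idea behind _rollout_with_search() can be illustrated with a toy best-first search: keep a global frontier of partial paths, always expand the most promising one, and stop when a path is judged complete. Because the frontier spans the whole explored tree, expanding a node that is not a child of the last-expanded one amounts to backtracking. The graph, scorer, and stopping rule below are invented for illustration and stand in for the real navigation environment and learned progress monitor:

```python
import heapq

# Toy sketch of FAST-style search: a max-heap frontier of partial paths
# (implemented by negating scores for Python's min-heap), expanded greedily.
def fast_search(graph, start, score, is_goal, max_expansions=100):
    """Best-first search over partial paths; returns a goal path or None."""
    frontier = [(-score([start]), [start])]
    while frontier and max_expansions > 0:
        max_expansions -= 1
        _, path = heapq.heappop(frontier)   # most promising partial path
        if is_goal(path):
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in path:             # avoid revisiting nodes
                new_path = path + [nxt]
                heapq.heappush(frontier, (-score(new_path), new_path))
    return None
```

Expanding the top of the heap may jump to a frontier node far from the previously expanded one; in the navigation setting that jump corresponds to physically backtracking along visited edges.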

Resources and Tips

  1. If you have problems installing the Matterport3DSimulator, Chih-Yao Ma, the author of Self-Monitoring Navigation Agent for Vision-and-Language Navigation (ICLR 2019), has a great installation guide.

  2. The code was extracted from an actively developed repository. Some of it may be redundant but is kept so that the program runs.

  3. The --image_feature_type none flag lets you verify that your script runs without loading the actual image features.

  4. If you need more help, file an issue or drop me an email (kayke@cs_dot_washington_dot_edu); I'll get back to you as soon as I can!

Acknowledgements

This repository is built upon Speaker-Follower Models for Vision-and-Language Navigation (NIPS2018). The repository reproduced Self-Monitoring Navigation Agent for Vision-and-Language Navigation (ICLR2019).
