The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

This is the repository of ORIST (ICCV 2021).

Some code in this repo is copied or modified from open-source implementations made available by PyTorch, HuggingFace, OpenNMT, Nvidia, and UNITER. The object features are extracted using BUTD, with the expanded object bounding boxes of REVERIE.

Features of the Code

  • Distributed data-parallel training (PyTorch).
  • Code optimizations for faster training.

Requirements

  • Install Docker with GPU support (many tutorials are available online).
  • Pull the Docker image:
docker pull qykshr/ubuntu:orist 
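
Once the image is pulled, a container can be started from it. The following is a minimal sketch, not the authors' documented workflow: it assumes the NVIDIA Container Toolkit is installed (required for `--gpus all`), that the image provides `/bin/bash`, and that the repository checkout is mounted at `/workspace`, a mount point chosen here purely for illustration. The helper name `orist_docker_cmd` is hypothetical.

```shell
IMAGE="qykshr/ubuntu:orist"

# Hypothetical helper: build the command as a string so it can be reviewed first.
orist_docker_cmd() {
  # $1: host directory to mount (defaults to the current directory).
  # /workspace is an assumed mount point, not one defined by the repository.
  echo "docker run --gpus all -it --rm -v ${1:-$PWD}:/workspace -w /workspace $IMAGE /bin/bash"
}

orist_docker_cmd              # print the command for inspection
# eval "$(orist_docker_cmd)"  # uncomment to actually start the container
```

Printing the command before running it makes it easy to adjust the mount path or GPU selection for your machine.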

Quick Start

  1. Download the processed data and pretrained models:

  2. Build Matterport3D simulator

    Build OSMesa version using CMake:

    mkdir build && cd build
    cmake -DOSMESA_RENDERING=ON ..
    make
    cd ../

    For other build versions, refer to the Matterport3D simulator documentation.

  3. Run inference:

    sh eval_scripts/xxx.sh

  4. Run training:

    sh run_scripts/xxx.sh
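
In the two commands above, `xxx.sh` is a placeholder for a concrete script name. A small sketch for listing the available candidates, assuming it is run from the repository root and that the scripts use the `.sh` extension (the helper name `list_scripts` is hypothetical):

```shell
# Hypothetical helper: print every .sh file found directly in a directory.
list_scripts() {
  # $1: directory to search, e.g. eval_scripts or run_scripts
  for f in "$1"/*.sh; do
    # Skip the unexpanded glob pattern when nothing matches.
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}

list_scripts eval_scripts   # inference scripts
list_scripts run_scripts    # training scripts
```

Any path it prints can then be passed to `sh` in place of the placeholder.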

Citation

If this code or data is useful for your research, please consider citing:

@inproceedings{orist,
  author    = {Yuankai Qi and
               Zizheng Pan and
               Yicong Hong and
               Ming{-}Hsuan Yang and
               Anton van den Hengel and
               Qi Wu},
  title     = {The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation},
  booktitle = {ICCV},
  pages     = {1655--1664},
  year      = {2021}
}

@inproceedings{reverie,
  author    = {Yuankai Qi and
               Qi Wu and
               Peter Anderson and
               Xin Wang and
               William Yang Wang and
               Chunhua Shen and
               Anton van den Hengel},
  title     = {{REVERIE:} Remote Embodied Visual Referring Expression in Real Indoor
               Environments},
  booktitle = {CVPR},
  pages     = {9979--9988},
  year      = {2020}
}
