The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

This is the repository of ORIST (ICCV 2021).

Some of the code in this repo is copied or modified from open-source implementations made available by PyTorch, HuggingFace, OpenNMT, Nvidia, and UNITER. The object features are extracted using BUTD, with the expanded object bounding boxes of REVERIE.

Features of the Code

- Distributed data parallel training (PyTorch).
- Several code optimizations for faster training.
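
As a rough illustration of what distributed data parallel training implies for data loading, the sketch below shows how a dataset can be split so that each process (rank) sees a disjoint, equally sized slice per epoch. It is a plain-Python approximation of the interleaved sharding scheme commonly used with PyTorch's distributed sampler, not code taken from this repo; the function name `shard_indices` and its parameters are hypothetical.

```python
import math


def shard_indices(num_samples, world_size, rank):
    """Return the sample indices one process would handle in one epoch.

    Hypothetical helper for illustration only; not part of this repo.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Round up so every rank receives the same number of samples.
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    # Pad by wrapping around to the start of the dataset.
    indices = [i % num_samples for i in range(total)]
    # Interleave: rank r takes every world_size-th index starting at r.
    return indices[rank::world_size]


if __name__ == "__main__":
    # Show the slice each of 4 processes would receive for 10 samples.
    for rank in range(4):
        print(rank, shard_indices(10, world_size=4, rank=rank))
```

Because the padding wraps around, a few samples may be seen twice per epoch when the dataset size is not divisible by the number of processes; in exchange, every process runs the same number of steps.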

Requirements

Install Docker with GPU support (many tutorials are available online).
Pull the Docker image:

docker pull qykshr/ubuntu:orist

Quick Start

Download the processed data and pretrained models:
- Processed data:
- For evaluation only:
- For training:
Build the Matterport3D simulator.

Build the OSMesa version using CMake:
```
mkdir build && cd build
cmake -DOSMESA_RENDERING=ON ..
make
cd ../
```
For other build versions, refer to the Matterport3D simulator build instructions.
Run inference (replace xxx with the name of a script in eval_scripts):

sh eval_scripts/xxx.sh
Run training (replace xxx with the name of a script in run_scripts):

sh run_scripts/xxx.sh
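
Before launching the scripts above, it can help to confirm that the simulator build produced an importable Python module. The following is a minimal sketch that assumes the build output lives in `build` and that the module is named `MatterSim`, as in the upstream Matterport3D simulator; both are assumptions about this repo's layout, and `simulator_available` is a hypothetical helper.

```python
import importlib.util
import sys
from pathlib import Path


def simulator_available(build_dir="build", module_name="MatterSim"):
    """Return True if `module_name` can be located once `build_dir` is on sys.path.

    Hypothetical helper; the default directory and module name are assumptions.
    Note that this adds `build_dir` to sys.path as a side effect.
    """
    build_path = str(Path(build_dir).resolve())
    if build_path not in sys.path:
        sys.path.insert(0, build_path)
    # find_spec only locates the module; it does not import or execute it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if simulator_available():
        print("MatterSim found; the simulator build looks importable.")
    else:
        print("MatterSim not found; check the CMake build output in ./build.")
```

Run it from the repository root inside the Docker container, so that the relative `build` path resolves to the directory created by the CMake step.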

Citation

If this code or data is useful for your research, please consider citing:

@inproceedings{orist,
  author    = {Yuankai Qi and
               Zizheng Pan and
               Yicong Hong and
               Ming{-}Hsuan Yang and
               Anton van den Hengel and
               Qi Wu},
  title     = {The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation},
  booktitle = {ICCV},
  pages     = {1655--1664},
  year      = {2021}
}

@inproceedings{reverie,
  author    = {Yuankai Qi and
               Qi Wu and
               Peter Anderson and
               Xin Wang and
               William Yang Wang and
               Chunhua Shen and
               Anton van den Hengel},
  title     = {{REVERIE:} Remote Embodied Visual Referring Expression in Real Indoor
               Environments},
  booktitle = {CVPR},
  pages     = {9979--9988},
  year      = {2020}
}
