
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation

Xiangyang Li, Zihan Wang, Jiahao Yang, Yaowei Wang, and Shuqiang Jiang

This repository is the official implementation of KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation (CVPR 2023).

Vision-and-language navigation (VLN) is the task of enabling an embodied agent to navigate to a remote location in real scenes by following natural language instructions. Most previous approaches represent navigable candidates with entire-view features or object-centric features. However, these representations are not efficient enough for an agent to navigate to the target location. Since knowledge provides crucial information that is complementary to the visible content, we propose a knowledge enhanced reasoning model (KERM) that leverages knowledge to improve the agent's navigation ability. Specifically, we first retrieve facts for the navigation views from the constructed knowledge base. We then build a knowledge enhanced reasoning network, consisting of purification, fact-aware interaction, and instruction-guided aggregation modules, to integrate visual features, history features, instruction features, and fact features for action prediction. Extensive experiments are conducted on the REVERIE, R2R, and SOON datasets, and the results demonstrate the effectiveness of the proposed method.
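
The three reasoning modules named above can be pictured as successive attention stages. The snippet below is only a conceptual sketch of that data flow, not the released implementation: the module granularity, feature dimensions, and the use of standard multi-head attention are assumptions for illustration.

# Conceptual sketch of the KERM reasoning flow (illustrative only).
import torch
import torch.nn as nn

class KERMSketch(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.purify = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.interact = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.aggregate = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, facts, visual, history, instruction):
        # Purification: keep the fact content that is relevant to the instruction.
        facts, _ = self.purify(facts, instruction, instruction)
        # Fact-aware interaction: let facts exchange information with the
        # visual and history features of a candidate view.
        context = torch.cat([visual, history], dim=1)
        facts, _ = self.interact(facts, context, context)
        # Instruction-guided aggregation: pool facts into a single feature
        # per candidate, which is then used for action prediction.
        fused, _ = self.aggregate(instruction.mean(1, keepdim=True), facts, facts)
        return fused.squeeze(1)

# Shapes for illustration: facts (B, K, 768); visual, history, instruction (B, N, 768).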

Requirements

  1. Install Matterport3D simulators: follow the instructions here. A quick check that the simulator bindings are importable is sketched after this list.
export PYTHONPATH=Matterport3DSimulator/build:$PYTHONPATH
  2. Install requirements:
conda create --name KERM python=3.8.0
conda activate KERM
pip install -r requirements.txt
  3. Download the dataset from Dropbox, including processed annotations, features, and pretrained models from VLN-DUET. Put the data in the datasets directory.

  4. Download the pretrained LXMERT model; the files needed in the bert-base directory can be downloaded from bert-base-uncased.

mkdir -p datasets/pretrained 
wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P datasets/pretrained
  5. Download the preprocessed data and features of KERM from Baidu Netdisk, including the knowledge base (vg.json), annotations of retrieved facts (knowledge.json), cropped image features (clip_crop_image.hdf5), and annotations of the Visual Genome dataset (vg_annotations). Put kerm_data in the datasets directory.

  6. Download the trained KERM models from Baidu Netdisk. The expected layout of the datasets directory after these steps is sketched below.
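
To confirm that step 1 succeeded, the simulator bindings should be importable from Python (a minimal check, assuming the default MatterSim module name):

python3 -c "import MatterSim"

After the downloads, the datasets directory is expected to look roughly like this (an assumed layout inferred from the steps above; the exact contents of each folder may differ):

datasets/
├── pretrained/      # model_LXRT.pth and bert-base files
├── kerm_data/       # vg.json, knowledge.json, clip_crop_image.hdf5, vg_annotations
└── ...              # processed annotations and features from VLN-DUET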

Build knowledge base

The preprocessed knowledge data is already provided (see step 5 above), so you can skip this part.

cd preprocess
python3 get_knowledge_base.py  # Build knowledge base from VisualGenome dataset (vg.json).
python3 get_fact_feature.py  # Get the features of knowledge base (vg.hdf5).
python3 get_crop_image_feature.py  # Get cropped image features (clip_crop_image.hdf5).
python3 retrieve_facts.py  # Retrieve knowledge facts for all visual regions (knowledge.json). 
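
For orientation, the retrieval step amounts conceptually to a nearest-neighbor search between cropped-region features and fact features. The sketch below only illustrates that idea; the file paths, HDF5 dataset names, top-k value, and output format are assumptions, not the actual logic of retrieve_facts.py.

# Illustrative sketch of fact retrieval by feature similarity (not retrieve_facts.py).
import json
import h5py
import numpy as np

TOP_K = 5  # hypothetical number of facts kept per region

with h5py.File("vg.hdf5", "r") as f:
    fact_feats = f["features"][:]      # fact features; dataset name is assumed
with h5py.File("clip_crop_image.hdf5", "r") as f:
    region_feats = f["features"][:]    # cropped-region features; dataset name is assumed

# Cosine similarity between every region and every fact.
fact_feats = fact_feats / np.linalg.norm(fact_feats, axis=1, keepdims=True)
region_feats = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
sim = region_feats @ fact_feats.T

# Keep the indices of the TOP_K most similar facts for each region.
top_facts = np.argsort(-sim, axis=1)[:, :TOP_K]
with open("knowledge_sketch.json", "w") as f:
    json.dump(top_facts.tolist(), f)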

Pretraining

Combine behavior cloning and auxiliary proxy tasks in pretraining:

cd pretrain_src
bash run_reverie.sh # (run_soon.sh, run_r2r.sh, run_r4r.sh)

Fine-tuning & Evaluation

Use the pseudo-interactive demonstrator to fine-tune the model:

cd knowledge_nav_src
bash scripts/run_reverie.sh # (run_soon.sh, run_r2r.sh)

Citation

@InProceedings{Li2023KERM,
  author    = {Xiangyang Li and Zihan Wang and Jiahao Yang and Yaowei Wang and Shuqiang Jiang},
  title     = {{KERM: K}nowledge Enhanced Reasoning for Vision-and-Language Navigation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  pages     = {2583-2592},
  year      = {2023},
}

Acknowledgments

Our code is based on VLN-DUET, Xmodal-Ctx, and CLIP (ViT-B/16). Thanks for their great work!
