pip install -r requirements.txt
Install the Matterport3D simulator:
git submodule update --init --recursive
sudo apt-get install libjsoncpp-dev libepoxy-dev libglm-dev libosmesa6 libosmesa6-dev libglew-dev libopencv-dev
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
# Replace the above line with the following if it doesn't work:
# cmake -DOSMESA_RENDERING=ON ..
make -j8
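After `make` completes, it can help to confirm that the simulator's Python bindings are importable. A minimal sketch, assuming the build directory has been added to `PYTHONPATH` (the `MatterSim` module is the binding produced by the Matterport3D simulator build):

```python
# Sanity check: try importing the MatterSim bindings produced by the build.
# Assumes the build directory is on PYTHONPATH, e.g.:
#   export PYTHONPATH=$PYTHONPATH:$(pwd)/build
try:
    import MatterSim  # Python bindings built by the Matterport3D simulator
    status = "ok"
except ImportError:
    status = "missing"
print(f"MatterSim bindings: {status}")
```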
bash ./tasks/R2R/data/download.sh
bash run/agent_clip_vit16.bash 0 # 0 is the GPU id
bash run/speaker_clip_vit16.bash 0
bash run/foam_envdrop_clip_vit16.bash 0
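Each script above takes the GPU id as its first argument. A hypothetical sketch (not the actual script contents) of how such a script might consume that argument, using the standard `CUDA_VISIBLE_DEVICES` mechanism:

```shell
# Hypothetical sketch: consume the GPU id passed on the command line.
GPU_ID=${1:-0}                      # default to GPU 0 if no argument is given
export CUDA_VISIBLE_DEVICES=$GPU_ID # restrict training to the chosen GPU
echo "Training on GPU $CUDA_VISIBLE_DEVICES"
```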
@inproceedings{dou2022foam,
title={FOAM: A Follower-aware Speaker Model for Vision-and-Language Navigation},
author={Dou, Zi-Yi and Peng, Nanyun},
booktitle={Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
year={2022},
}
The code is based on EnvDrop and CLIP-ViL-VLN. We thank Hao Tan for his help with preprocessing.