RMM: A Recursive Mental Model for Dialog Navigation

This repository contains code for the paper RMM: A Recursive Mental Model for Dialog Navigation.

If you use this code, please cite:

@inproceedings{roman2020rmm,
  title={RMM: A Recursive Mental Model for Dialog Navigation},
  author={Homero Roman Roman and Yonatan Bisk and Jesse Thomason and Asli Celikyilmaz and Jianfeng Gao},
  booktitle={Findings of the 2020 Conference on Empirical Methods in Natural Language Processing},
  year={2020}
}

Installation / Build Instructions

This repository is built from the Matterport3DSimulator codebase, and the original installation instructions from that project still apply. In this document we outline the instructions necessary to work with the CVDN task.

We recommend using the mattersim Dockerfile to install the simulator. The simulator can also be built without Docker, but satisfying the project dependencies may be more difficult.


Prerequisites:

  • Ubuntu 16.04
  • Nvidia GPU with driver >= 384
  • Docker installed with GPU support
  • Note: the CUDA / cuDNN toolkits do not need to be installed (they are provided by the Docker image)

Building using Docker

Build the docker image:

docker build -t cvdn .

Run the docker container, mounting both the git repo and the dataset:

docker run -it --volume `pwd`:/root/mount/Matterport3DSimulator -w /root/mount/Matterport3DSimulator cvdn

CVDN Dataset Download

Download the train, val_seen, val_unseen, and test splits of the whole CVDN dataset by executing the following script:
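The dataset is distributed as one JSON file per split. As a purely illustrative sketch (the helper name and the tasks/CVDN/data directory are assumptions, not taken from this repository's download script), the expected layout can be expressed as:

```python
# Hypothetical helper -- the data directory is an assumption, not
# something specified by this repository.
CVDN_SPLITS = ["train", "val_seen", "val_unseen", "test"]

def split_paths(data_dir="tasks/CVDN/data"):
    """Return the expected JSON file path for each CVDN split."""
    return {split: f"{data_dir}/{split}.json" for split in CVDN_SPLITS}
```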


Matterport3D Dataset Download

To use the simulator you must first download the Matterport3D Dataset, which is available after requesting access from the dataset providers. The provided download script allows for downloading of selected data types.

The experiments rely on ResNet-152 ImageNet features, which must be pre-processed beforehand.

Pre-processed features can be obtained as follows:

mkdir -p img_features/
cd img_features/
wget -O
cd ..
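The downloaded file is a TSV in the standard Matterport3DSimulator feature format (one row per panorama viewpoint, with the 36 view features base64-encoded). A minimal reader sketch, assuming that format (the function name is ours, not this repository's):

```python
import base64
import csv

import numpy as np

# Standard Matterport3DSimulator feature-TSV columns (assumed format).
TSV_FIELDS = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]

# Feature fields are large, so lift csv's default 128 KB field size limit.
csv.field_size_limit(2**31 - 1)

def read_img_features(tsv_file, views=36, feat_dim=2048):
    """Parse feature rows into {"scanId_viewpointId": (views, feat_dim) float32 array}."""
    feats = {}
    reader = csv.DictReader(tsv_file, delimiter="\t", fieldnames=TSV_FIELDS)
    for row in reader:
        key = f"{row['scanId']}_{row['viewpointId']}"
        raw = base64.b64decode(row["features"])
        feats[key] = np.frombuffer(raw, dtype=np.float32).reshape(views, feat_dim)
    return feats
```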

Train and Evaluate


Pretraining is done using the classic speaker-follower setup.

Agent pretraining:

python src/ --train_datasets=CVDN --eval_datasets=CVDN

Speaker pretraining:

python src/ --entity=speaker --train_datasets=CVDN --eval_datasets=CVDN

Pre-trained models are already included in results/baseline/CVDN_train_eval_CVDN/G1/v1/steps_4.

Training and evaluating RMM

To train RMM with single-branch evaluation, run the following command:

python src/ --mode=gameplay --rl_mode=agent_speaker --train_datasets=CVDN --eval_datasets=CVDN

And to train RMM with multiple-branch evaluation using the action probabilities, run the following command:

python src/ --mode=gameplay --eval_branching=3 --action_probs_branching --train_datasets=CVDN --eval_datasets=CVDN
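As an illustrative sketch of what multiple-branch evaluation with action probabilities does (this is not the repository's implementation, and all names here are ours): at each step, every active branch is expanded with its candidate actions, and only the k most probable branches overall are kept, e.g. k = 3 for --eval_branching=3:

```python
import math

def expand_branches(branches, candidates_fn, k=3):
    """Expand each (log_prob, state) branch and keep the k most probable.

    candidates_fn(state) yields (action_prob, next_state) pairs.
    """
    expanded = []
    for log_prob, state in branches:
        for action_prob, next_state in candidates_fn(state):
            expanded.append((log_prob + math.log(action_prob), next_state))
    # Rank branches by accumulated log-probability and prune to the best k.
    expanded.sort(key=lambda branch: branch[0], reverse=True)
    return expanded[:k]
```

Starting from a single branch and applying expand_branches repeatedly yields a beam-search-style rollout over candidate continuations.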

Results are by default saved in


val_unseen_gps.csv will contain the goal progress for every evaluation entry at each time step at which a question is asked, as well as a final goal progress for that entry.
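Goal progress here follows the CVDN metric: how much closer, in meters of graph distance, the agent has moved toward the goal since the start of the episode. A one-line sketch (the function name is ours):

```python
def goal_progress(start_dist_to_goal, end_dist_to_goal):
    """CVDN goal progress: reduction in distance to the goal, in meters."""
    return start_dist_to_goal - end_dist_to_goal
```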

Optional functionality

Including the flag --target_only instructs the agent not to ask questions and to use only the target as textual guidance. Similarly, including the flag --current_q_a_only makes the agent use only the latest question-answer pair, discarding the rest of its dialogue history.
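The effect of these two flags on the agent's textual input can be sketched roughly as follows (an illustrative helper, not the repository's code):

```python
def build_text_guidance(target, qa_history, target_only=False, current_q_a_only=False):
    """Assemble the text the agent conditions on from the target and dialogue history."""
    if target_only:
        # --target_only: no questions asked; the target is the only guidance.
        return target
    # --current_q_a_only: keep just the most recent question-answer pair.
    history = qa_history[-1:] if current_q_a_only else qa_history
    qa_text = [f"Q: {q} A: {a}" for q, a in history]
    return " ".join([target] + qa_text)
```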


Acknowledgements

This repository is built upon the Matterport3DSimulator codebase.

The CVDN dataset was collected by Thomason et al. as outlined in the paper Vision-and-Dialog Navigation.

