Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries, ECCV 2018



PyTorch code for Dynamic Multimodal Instance Segmentation guided by natural language queries, ECCV 2018.

Project Page

A dark horse between three lighter horses


To run this code, you need Python 3.6.*, PyTorch, Visdom, cupy, Cython, NumPy and Matplotlib. We recommend installing the Anaconda Python distribution and using conda to install the dependencies, as follows:

conda install matplotlib numpy cython
conda install pytorch torchvision cuda90 -c pytorch
conda install aria2 -c bioconda
pip install -U visdom opencv-python cupy-cuda90 pynvrtc tqdm

You will also need the ReferIt loader library. To install it, you can use pip as follows:

pip install git+

Finally, you will need to install the Simple Recurrent Unit (SRU):

pip install -U git+ --no-deps

Conda packages will be provided in future releases.
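After installing, it can help to confirm that the dependencies are visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the list of import names is our own mapping from the packages above (for example, opencv-python is imported as cv2 and PyTorch as torch), and the helper name find_missing is hypothetical. The ReferIt loader and SRU are left out because their import names are not stated here.

```python
import importlib.util

# Import names for the dependencies listed above. These are the names used
# with `import`, which differ from some pip/conda package names
# (e.g. opencv-python -> cv2, pytorch -> torch).
REQUIRED_MODULES = [
    "torch", "torchvision", "visdom", "cupy", "Cython",
    "numpy", "matplotlib", "cv2", "pynvrtc", "tqdm",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat lookup failures the same as a module that is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

This only checks that each module can be located; it does not verify versions or that CUDA is usable.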

Dataset download

Additionally, you must download the ReferIt, UNC, UNC+ and GRef datasets. We provide a bash script that takes care of the required downloads:

bash download_data --path $PATH_TO_STORE_THE_DATASETS


| Dataset Name | Original Name | Splits |
|--------------|---------------|--------|
| referit | RefCLEF | train, val, trainval, test |
| unc | RefCOCO | train, val, testA, testB |
| unc+ | RefCOCO+ | train, val, testA, testB |
| gref | RefCOCOg | train, val |
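Since the dataset name and split are passed as free-form strings on the command line, a small lookup can catch typos before a long run starts. The sketch below simply encodes the table above; the names DATASET_SPLITS and check_split are hypothetical and are not part of dmn_pytorch.

```python
# Dataset names and their available splits, copied from the table above.
DATASET_SPLITS = {
    "referit": ("train", "val", "trainval", "test"),
    "unc": ("train", "val", "testA", "testB"),
    "unc+": ("train", "val", "testA", "testB"),
    "gref": ("train", "val"),
}


def check_split(dataset, split):
    """Return (dataset, split) if the pair is listed, else raise ValueError."""
    if dataset not in DATASET_SPLITS:
        raise ValueError(
            f"Unknown dataset {dataset!r}; expected one of {sorted(DATASET_SPLITS)}"
        )
    if split not in DATASET_SPLITS[dataset]:
        raise ValueError(
            f"Split {split!r} is not listed for {dataset!r}; "
            f"expected one of {DATASET_SPLITS[dataset]}"
        )
    return dataset, split
```

For example, check_split("unc", "testA") passes, while check_split("gref", "testA") raises a ValueError because gref only lists train and val.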


To train the model, you need to provide the path to the directory that contains the aforementioned datasets, as well as the other parameters required for training. To train the model with the low-resolution setup described in the original paper, please execute:

python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --accum-iters 1

To train the model at high resolution, add the --high-res and --upsamp-amplification 32 flags to the previous command. Note: the snapshot file must correspond to the low-resolution weights.

To inspect all the available parameters and their description, please execute python -m dmn_pytorch.train --help. Please refer to the datasets table displayed above to get more information about the dataset names and their respective available splits.
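If you launch training from a script, the command above can be assembled as an argument list, which avoids shell quoting problems with paths that contain spaces. This is a minimal sketch: build_train_command is a hypothetical helper, the flags are exactly those shown above, and the paths in the example call are placeholders.

```python
import shlex


def build_train_command(data, dataset, val_split, save_folder, snapshot,
                        high_res=False):
    """Assemble the dmn_pytorch.train command shown above as a list."""
    cmd = [
        "python", "-u", "-m", "dmn_pytorch.train",
        "--data", data,
        "--dataset", dataset,
        "--val", val_split,
        "--backend", "dpn92",
        "--num-filters", "10",
        "--lang-layers", "3",
        "--mix-we",
        "--save-folder", save_folder,
        "--snapshot", snapshot,
        "--accum-iters", "1",
    ]
    if high_res:
        # High-resolution phase: the snapshot must hold low-resolution weights.
        cmd += ["--high-res", "--upsamp-amplification", "32"]
    return cmd


# Placeholder paths, for illustration only.
cmd = build_train_command(
    data="/data/referit_data",
    dataset="unc",
    val_split="testA",
    save_folder="/data/dmn weights",
    snapshot="/data/dmn weights/low_res.pth",
    high_res=True,
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed directly to subprocess.run(cmd, check=True).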


To evaluate the model, pass the --eval-first and --epochs 0 flags to dmn_pytorch.train, as follows:

python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --epochs 0 --eval-first

Results Visualization

Additionally, you can visualize the results of the DMN model with a set of pretrained weights in Visdom. To do so, run the dmn_pytorch.visdom_display script as follows:

python -m dmn_pytorch.visdom_display --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --split $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --num-images $NUMBER_OF_EXAMPLES_TO_DISPLAY --snapshot $PATH_TO_THE_SNAPSHOT_FILE --no-eval --visdom http://$HOST:$PORT --env $NAME_OF_THE_VISDOM_ENV


The pretrained weights provided below were trained in two phases. In the low-resolution phase, the DMN was trained on UNC for 24 epochs with a constant learning rate, and the resulting weights were then fine-tuned on each of the remaining datasets for 10 epochs. In the high-resolution phase, training continued on all the datasets for a total of 4 epochs, starting from the weights of the previous phase.

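| Dataset | Examples | High-Resolution Pretrained Weights | Split | Performance (mIoU) |
|---------|----------|------------------------------------|-------|--------------------|
| Referit | Referit Examples | Link | val | 0.5328 |
| | | | test | 0.5281 |
| UNC | UNC Examples | Link | val | 0.4978 |
| | | | testA | 0.5484 |
| | | | testB | 0.4520 |
| UNC+ | UNC+ Examples | Link | val | 0.3888 |
| | | | testA | 0.4425 |
| | | | testB | 0.3249 |
| GRef | GRef Examples | Link | val | 0.3764 |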
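For readers unfamiliar with the metric, mIoU is commonly computed as the per-example intersection over union between the predicted and ground-truth binary masks, averaged over the split. The sketch below illustrates that common definition with NumPy. It is an assumption about the metric, not the evaluation code in dmn_pytorch, which may threshold scores or aggregate differently; mask_iou and mean_iou are hypothetical names.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets):
    """Average the per-example IoU over paired predictions and targets."""
    return float(np.mean([mask_iou(p, t) for p, t in zip(preds, targets)]))


# Two toy 2x2 examples: the first overlaps on 1 of 2 pixels (IoU 0.5),
# the second matches exactly (IoU 1.0).
preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
score = mean_iou(preds, targets)
```

With these toy masks the per-example values are 0.5 and 1.0, so the mean is 0.75.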

External Installation

The DMN can be imported and used as a regular Python package in your scripts. To install it, run pip from the root of the repository:

pip install -U .

Then you can import it as follows:

from dmn_pytorch import DMN

Contribution Guidelines

We follow PEP8 and PEP257 style guidelines. Feel free to send a PR or create an issue if you have any problem/question.


  title={Dynamic Multimodal Instance Segmentation guided by natural language queries},
  author={{Margffoy-Tuay}, E. and {P{\'e}rez}, J.~C. and {Botero}, E. and
	{Arbel{\'a}ez}, P.},
  journal={European Conference on Computer Vision (ECCV)},