Developed by Freda Shi and Jiayuan Mao.

This repo includes the implementation of our paper "Learning Visually-Grounded Semantics from Contrastive Adversarial Samples" at COLING 2018.

Requirements



  • Python3
  • PyTorch 0.3.0
  • NLTK
  • spacy
  • word2number
  • TensorBoard
  • NumPy
  • Jacinle (Jiayuan's personal toolbox, required by two evaluation experiments)


We use VSE++ (Faghri et al., 2017) as our base model. To reproduce the baseline numbers of VSE++, please follow the instructions provided by its authors. We found that their results are easy to reproduce!


We use the same datasets as VSE++ (Faghri et al., 2017). Use the following commands to download the data of VSE to the root folder


and unzip the tars with

tar -xvf vocab.tar
tar -xvf data.tar

You may also need GloVe, as we apply GloVe.840B.300d as the initialization of the word embeddings. We also provide a custom subset of GloVe embeddings at VSE_C/data/glove.pkl.
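As a rough illustration of how a vocabulary-specific subset of GloVe vectors could be built, the sketch below parses GloVe's plain-text format (one token followed by its vector components per line) and keeps only the words in a given vocabulary. The function name `load_glove_subset` and its parameters are hypothetical, and the layout of the provided glove.pkl is not documented here, so this is not necessarily how that file was produced.

```python
import io


def load_glove_subset(lines, vocab, dim=300):
    """Parse GloVe-style text lines, keeping only words found in `vocab`.

    Hypothetical helper for illustration; not part of this repository.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) <= dim:
            continue  # skip malformed or empty lines
        # Take the last `dim` fields as the vector, so that a token which
        # itself contains spaces is still parsed correctly.
        word = " ".join(parts[:-dim])
        if word in vocab:
            vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


# Tiny in-memory example with 3-dimensional vectors instead of 300.
sample = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n. . . 0.7 0.8 0.9\n")
subset = load_glove_subset(sample, vocab={"cat", ". . ."}, dim=3)
```

With the real embeddings, `lines` would be an open file handle on the GloVe text file and `dim` would be 300.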

Reproduce our Experiments

Generate Contrastive Adversarial Samples

The following commands generate specific types of contrastive adversarial samples of sentences. Note that the script will create folders in the initial data path, e.g., ../data/coco_precomp/noun_ex/.

cd adversarial_attack
python3 $ --data_path $DATA_PATH --data_name $DATA_NAME

$TYPE can be one of noun, numeral or relation. Here is an example command:

python3 --data_path ../data --data_name coco_precomp
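To give a feel for what a numeral-typed contrastive adversarial sample looks like, here is a deliberately simplified sketch: it swaps a number word in a caption for a different one, so the sentence stays fluent but no longer matches the image. The word list and the function `numeral_contrastive_samples` are assumptions made for illustration; the scripts in adversarial_attack rely on NLTK, spacy and word2number and are likely more careful than this.

```python
# Toy word list (assumption). "one" is left out so that plural nouns
# following the number stay grammatical after substitution.
NUMBER_WORDS = ["two", "three", "four", "five"]


def numeral_contrastive_samples(caption):
    """Return captions in which one number word is replaced by another."""
    tokens = caption.lower().split()
    samples = []
    for i, token in enumerate(tokens):
        if token not in NUMBER_WORDS:
            continue
        for other in NUMBER_WORDS:
            if other != token:
                samples.append(" ".join(tokens[:i] + [other] + tokens[i + 1:]))
    return samples


samples = numeral_contrastive_samples("Two dogs play on the grass")
for s in samples:
    print(s)
```

A caption without any number word yields an empty list, so such captions would simply contribute no numeral-typed samples in this sketch.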

Train VSE-C

Similar to VSE++, VSE-C supports training with contrastive adversarial samples in the text domain. After generating noun-typed contrastive adversarial samples, we can train an example noun-typed VSE-C with the following command:

cd VSE_C
python3 --data_path ../data/ --data_name coco_precomp \
    --logger_name runs/coco_noun --learning_rate 0.001 --text_encoder_type gru \
    --max_violation --worker 10 --img_dim 2048 --use_external_captions

The model will be saved into the logger folder, e.g., runs/coco_noun. Please refer to VSE_C/ for a more detailed description of the hyper-parameters. Note that you also need to create

We have tested the model on GPU (CUDA 8.0). If you have any problems training VSE-C in a different environment, please feel free to open an issue.
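For readers unfamiliar with the --max_violation flag, the following pure-Python sketch shows the hinge-based ranking loss popularized by VSE++: with max violation, only the hardest negative per query contributes; without it, all negatives are summed. The optional `extra_caption_scores` argument is an assumption about how contrastive adversarial captions might enter as additional negatives for each image; it is not taken from the VSE_C code, and the function name is hypothetical.

```python
def ranking_loss(scores, margin=0.2, max_violation=True, extra_caption_scores=None):
    """Hinge ranking loss over an image-by-caption similarity matrix.

    scores[i][j] is the similarity of image i and caption j; matching
    pairs sit on the diagonal. extra_caption_scores[i] (assumed) holds
    similarities between image i and its adversarial captions.
    """
    n = len(scores)

    def reduce(costs):
        if not costs:
            return 0.0
        return max(costs) if max_violation else sum(costs)

    loss = 0.0
    for i in range(n):
        pos = scores[i][i]
        # Image i as the query: every other caption is a negative.
        cap_costs = [max(0.0, margin + scores[i][j] - pos) for j in range(n) if j != i]
        if extra_caption_scores is not None:
            cap_costs += [max(0.0, margin + s - pos) for s in extra_caption_scores[i]]
        # Caption i as the query: every other image is a negative.
        img_costs = [max(0.0, margin + scores[j][i] - pos) for j in range(n) if j != i]
        loss += reduce(cap_costs) + reduce(img_costs)
    return loss


scores = [[0.9, 0.8],
          [0.1, 0.7]]
base = ranking_loss(scores)
with_extra = ranking_loss(scores, extra_caption_scores=[[0.95], [0.0]])
```

In this toy batch the adversarial caption scoring 0.95 against the first image becomes its hardest negative, so the loss increases relative to the batch-only case.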

Evaluate VSE-C

In-Domain Evaluation

Here is an example of the in-domain evaluation. Please run the code in Python3 or IPython3.

from VSE_C.vocab import Vocabulary
# Bind the name `evaluation` directly; a plain `import VSE_C.evaluation` would only bind `VSE_C`.
from VSE_C import evaluation
evaluation.eval_with_single_extended('runs/coco_noun', 'data/', 'coco_precomp', 'test')
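Retrieval models in the VSE++ family are commonly scored with recall@K, i.e., the fraction of queries whose matching item appears among the top K retrieved results. The sketch below computes it from a square similarity matrix, assuming one matching caption per image on the diagonal. It is only an illustration of the metric; the function `recall_at_k` is hypothetical and may differ from what `eval_with_single_extended` actually reports.

```python
def recall_at_k(scores, k):
    """Fraction of rows whose diagonal entry ranks within the top k."""
    hits = 0
    for i, row in enumerate(scores):
        # Indices of candidates sorted by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(scores)


scores = [[0.9, 0.1, 0.2],
          [0.8, 0.3, 0.5],
          [0.1, 0.2, 0.7]]
r1 = recall_at_k(scores, 1)
r3 = recall_at_k(scores, 3)
```

Here the second query ranks its matching item last, so it only counts as a hit once K covers all three candidates.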

Object Alignment

We provide our training script (testing is included in the evaluation procedure) at evaluation/object_alignment. Please refer to the script for detailed usage.

Saliency Visualization (Jacinle required)

evaluation/saliency_visualization/ provides the script for saliency visualization. Please refer to the script for detailed usage.

Sentence Completion (Fill-in-the-Blanks, Jacinle required)

First, we need to generate datasets for sentence completion by

cd evaluations/completion
python3 --input $INPUT_PATH --output $OUTPUT_PATH

Then, run

python3 -m evaluations.completion.completion_train $ARGS
python3 -m evaluations.completion.completion_test $ARGS

for training and testing sentence completion models. Please refer to the evaluation scripts for further descriptions of the arguments.
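As a minimal sketch of what a fill-in-the-blank example could look like, the snippet below masks selected words in a caption and records the removed word as the answer. The blank token, the target word set and the function `make_blank_examples` are all assumptions for illustration; the actual dataset format produced by the generation script may differ.

```python
def make_blank_examples(caption, targets, blank="<blank>"):
    """Return (masked_caption, answer) pairs, one per target word found."""
    tokens = caption.split()
    examples = []
    for i, token in enumerate(tokens):
        if token.lower() in targets:
            masked = tokens[:i] + [blank] + tokens[i + 1:]
            examples.append((" ".join(masked), token))
    return examples


examples = make_blank_examples("a dog chases a ball", targets={"dog", "ball"})
for masked, answer in examples:
    print(masked, "->", answer)
```

A completion model would then be trained to predict the answer from the masked caption together with the image.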


If you find VSE-C useful, please consider citing:

    @inproceedings{shi2018learning,
      title={Learning Visually-Grounded Semantics from Contrastive Adversarial Samples},
      author={Shi, Haoyue and Mao, Jiayuan and Xiao, Tete and Jiang, Yuning and Sun, Jian},
      booktitle={Proceedings of the 27th International Conference on Computational Linguistics},
      year={2018}
    }



