# SNLI-VE

Experiments with multi-modal entailment using an early fusion model and an attention model over words and image objects. https://github.com/CpuKnows/SNLI-VE

The SNLI-VE corpus was compiled by Xie et al. (2018).

## Data

### Setup

For full setup instructions, see `INSTALL.md`.

## SNLI-VE Models

### fastText hypothesis-only baseline

Run `scripts/create_fasttext_datasets.py` to generate input files for fastText (a sketch of the conversion follows below), and `scripts/create_snli_hard.py` to create the hard dataset splits.
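fastText's supervised mode expects one example per line, with the label marked by a `__label__` prefix. The following is a minimal sketch of the hypothesis-only conversion, assuming the SNLI-VE jsonl files keep the SNLI-style fields `gold_label` and `sentence2`; the actual script may handle more options:

```python
import json

# Hypothetical paths; the real script's inputs and outputs may differ.
IN_PATH = "data/snli_ve_train.jsonl"
OUT_PATH = "fasttext_train.txt"

with open(IN_PATH) as fin, open(OUT_PATH, "w") as fout:
    for line in fin:
        example = json.loads(line)
        label = example["gold_label"]      # assumed SNLI-style field name
        hypothesis = example["sentence2"]  # assumed field holding the hypothesis
        if label == "-":                   # skip examples without a gold label
            continue
        # fastText format: __label__<label> <text>
        fout.write(f"__label__{label} {hypothesis.strip()}\n")
```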

Train the fastText model and make predictions:

```
fasttext supervised -input fasttext_train.txt -output fasttext_hyp_only -wordNgrams 2
fasttext predict fasttext_hyp_only.bin fasttext_<split>.txt 1 > prediction_<split>.txt
```
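`fasttext predict` writes one `__label__...` line per input example, so accuracy can be checked by zipping the predictions with the gold labels at the front of each input line. A quick sketch (file names follow the commands above; the `split` value is illustrative):

```python
# Compare fastText predictions against the gold labels in the input file.
split = "dev"  # illustrative split name

with open(f"fasttext_{split}.txt") as gold_f, open(f"prediction_{split}.txt") as pred_f:
    pairs = [(gold.split()[0], pred.strip()) for gold, pred in zip(gold_f, pred_f)]

correct = sum(g == p for g, p in pairs)
print(f"Accuracy: {correct / len(pairs):.4f}")
```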

### Detectron bounding boxes for ROI Attention models

Run inference for bounding boxes:

```
DETECTRON=/path/to/detectron
SNLIVE=/path/to/SNLI-VE
python $DETECTRON/tools/infer_snlive.py \
    --cfg $DETECTRON/configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_2x.yaml \
    --output-dir $SNLIVE/data/detectron \
    --output-ext json \
    --image-ext jpg \
    --wts $DETECTRON/weights/e2e_mask_rcnn_R-50-FPN_2x_model.pkl \
    $SNLIVE/data/flickr30k-images
```

The custom detection script can be found in `scripts/infer_snlive.py`.
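Downstream code then reads the per-image JSON files from `data/detectron`. The layout below is an assumption for illustration only: a list of `[x1, y1, x2, y2, score]` detections per image, matching Detectron's box format; the custom script may serialize something richer.

```python
import json

SCORE_THRESH = 0.7  # illustrative confidence cutoff

# Hypothetical file name; one JSON file per image is assumed.
with open("data/detectron/36979.json") as f:
    detections = json.load(f)

# Keep only confident detections; index 4 is the score under the assumed layout.
boxes = [det for det in detections if det[4] >= SCORE_THRESH]
print(f"Kept {len(boxes)} of {len(detections)} boxes")
```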

### SNLI-VE training and inference

Create smaller data subsets for training runs with `scripts/subset_snli_ve_data.py`.
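A subset can be as simple as a fixed-seed random sample of lines from a jsonl file; the script presumably offers more control, but the core idea is roughly:

```python
import random

random.seed(13)   # fixed seed so subsets are reproducible
FRACTION = 0.1    # hypothetical subset size

with open("data/snli_ve_train.jsonl") as fin, \
        open("data/snli_ve_train_small.jsonl", "w") as fout:
    for line in fin:
        if random.random() < FRACTION:
            fout.write(line)
```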

Training:

```
allennlp train experiments/<EXPERIMENT_NAME>.json \
    --serialization-dir models/<EXPERIMENT_NAME> \
    --include-package snli_ve
```

Evaluation for fusion models:

```
allennlp predict \
    --output-file data/predictions/<OUTPUT>.json \
    --silent \
    --cuda-device -1 \
    --predictor snlive_fusion_predictor \
    --include-package snli_ve \
    models/<EXPERIMENT_NAME>/model.tar.gz \
    data/snli_ve_<SPLIT>.jsonl
```

Evaluation for ROI Attention models:

```
allennlp predict \
    --output-file data/predictions/<OUTPUT>.json \
    --silent \
    --cuda-device -1 \
    --predictor snlive_roi_predictor \
    --include-package snli_ve \
    models/<EXPERIMENT_NAME>/model.tar.gz \
    data/snli_ve_<SPLIT>.jsonl
```
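The per-class numbers reported below can be computed from these prediction files with a short script like the following sketch; the output field names (`label`, `gold_label`) are assumptions and depend on what each predictor emits:

```python
import json
from collections import Counter

correct, total = Counter(), Counter()

with open("data/predictions/early_fusion_dev.json") as f:  # hypothetical output file
    for line in f:
        output = json.loads(line)
        gold = output["gold_label"]  # assumed field name
        pred = output["label"]       # assumed field name
        total[gold] += 1
        correct[gold] += int(gold == pred)

for label in sorted(total):
    print(f"{label}: {100 * correct[label] / total[label]:.2f}")
print(f"overall: {100 * sum(correct.values()) / sum(total.values()):.2f}")
```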

## Results

### Total dataset

| Model | Val Overall | Val Entailed | Val Neutral | Val Contradict | Test Overall | Test Entailed | Test Neutral | Test Contradict |
|---|---|---|---|---|---|---|---|---|
| Hypothesis only | 64.50 | - | - | - | 64.20 | - | - | - |
| Early fusion | 62.86 | 68.97 | 64.61 | 54.96 | 63.09 | 69.31 | 65.38 | 54.56 |
| Early fusion with ELMo | 67.05 | 70.15 | 62.23 | 68.78 | 67.07 | 69.36 | 62.63 | 69.23 |
| ROI Attention | 63.34 | 70.46 | 64.85 | 54.69 | 63.47 | 69.98 | 65.64 | 54.76 |

### Hard dataset

| Model | Val Overall | Val Entailed | Val Neutral | Val Contradict | Test Overall | Test Entailed | Test Neutral | Test Contradict |
|---|---|---|---|---|---|---|---|---|
| Hypothesis only | - | - | - | - | - | - | - | - |
| Early fusion | 21.97 | 26.36 | 27.45 | 12.24 | 21.89 | 25.50 | 27.75 | 12.47 |
| Early fusion with ELMo | 32.19 | 33.42 | 27.19 | 36.48 | 32.09 | 31.16 | 27.40 | 37.86 |
| ROI Attention | 19.49 | 25.83 | 23.65 | 9.49 | 19.70 | 24.99 | 23.79 | 10.62 |

## Citations

Ning Xie, Farley Lai, Derek Doran, and Asim Kadav. "Visual Entailment Task for Visually-Grounded Language Learning." arXiv preprint arXiv:1811.10582 (2018).
