PyTorch code for the EMNLP 2021 paper "Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization". See the arXiv paper here.
This code has been tested with torch==1.11.0.dev20211014 (nightly) and torchvision==0.12.0.dev20211014 (nightly).
Download the PororoSV dataset and associated files from here and save it as ./data.
Download GloVe embeddings (glove.840B.300d) from here. The default location of the embeddings is ./data/ (see ./dcsgan/miscc/config.py).
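The GloVe file is plain text: each line is a token followed by its 300 floats, separated by spaces. A minimal loader sketch (not part of this repo; the rsplit guards against the handful of tokens in glove.840B.300d that themselves contain spaces), demonstrated on a tiny 3-dimensional inline sample:

```python
import io
import numpy as np

def load_glove(fileobj, dim=300):
    """Parse GloVe's plain-text format into a {token: vector} dict."""
    vectors = {}
    for line in fileobj:
        # Split from the right so a token containing spaces stays intact.
        parts = line.rstrip().rsplit(" ", dim)
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Tiny sample in the same format (the real file uses dim=300):
sample = "the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n"
emb = load_glove(io.StringIO(sample), dim=3)
```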
To install the Berkeley Neural Parser with spaCy:
pip install benepar
To extract parses for PororoSV:
python parse.py --dataset pororo --data_dir <path-to-data-directory>
We use the Dense Captioning Model implementation available here. Download the pretrained model as outlined in their repository. To extract dense captions for PororoSV:
python describe_pororosv.py --config_json <path-to-config> --lut_path <path-to-VG-regions-dict-lite.pkl> --model_checkpoint <path-to-model-checkpoint> --img_path <path-to-data-directory> --box_per_img 10 --batch_size 1
To train VLC-StoryGAN for PororoSV:
python train_gan.py --cfg ./cfg/pororo_s1_vlc.yml --data_dir <path-to-data-directory> --dataset pororo
Unless otherwise specified, the default output root directory for all model checkpoints is ./out/.
Please see here for the evaluation models used for character classification-based scores, BLEU-2/3 and R-Precision.
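For reference, BLEU-2/3 are geometric means of modified n-gram precisions with a brevity penalty. A minimal stdlib sketch of sentence-level BLEU (an illustration only, not the repo's evaluation script; use the linked models for reported numbers):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU-{max_n} with uniform weights and brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())          # clipped n-gram matches
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```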
To evaluate Fréchet Inception Distance (FID):
python eval_vfid.py --img_ref_dir <path-to-image-directory-original-images> --img_gen_dir <path-to-image-directory-generated-images> --mode <mode>
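FID fits a Gaussian to the Inception features of the real and generated images and measures the Fréchet distance between the two Gaussians. A minimal NumPy/SciPy sketch of the distance itself (feature extraction omitted; an illustration, not the repo's evaluation script):

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Frechet distance between Gaussians fit to two (N, D) feature arrays."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2)       # matrix square root of cov product
    if np.iscomplexobj(covmean):
        covmean = covmean.real            # drop numerical imaginary residue
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2 * covmean))
```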
More details coming soon.
@inproceedings{maharana2021integrating,
title={Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization},
author={Maharana, Adyasha and Bansal, Mohit},
booktitle={EMNLP},
year={2021}
}