
Transitional Adaptation of Pretrained Models for Visual Storytelling (TAPM)

Introduction

PyTorch code for the CVPR 2021 paper "Transitional Adaptation of Pretrained Models for Visual Storytelling".

We propose an explicit visual adaptation step to harmonize the visual encoder with the pretrained language models. Our simple adaptation objective aims to bridge the gap between the nature of the information stored in the visual encoder and that expected by the language decoder.

[Figure: model architecture]

Requirements

Python 3.7
PyTorch 1.5

The other dependencies are specified in the requirements.txt file.

Installation

git clone $THIS_REPO
cd $THIS_REPO
pip install -r requirements_primary.txt
pip install -r requirements.txt

Then download the stanfordnlp English model (en_ewt) used for text processing.
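
For example, in a Python shell:

# One-time download of the stanfordnlp English (EWT) model.
import stanfordnlp
stanfordnlp.download('en_ewt')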

Data Preparation

Store the datasets in $THIS_REPO/data, e.g. data/LSMDC and data/VIST.

For detailed instructions on how to extract the relevant features, please refer to our guide on Dataset Preparation.

LSMDC 2019

Please follow the instructions on Download to download the dataset.

Text

From the downloaded files, extract and move the task1 folder under the $THIS_REPO/data/LSMDC directory.

Features

The above link contains two feature sets: I3D and ResNet152. Extract and move both under the $THIS_REPO/data/LSMDC/features directory.

ResNeXt Features

We also provide alternative features extracted with ResNeXt. Note that reproducing our results requires these features instead of the official ones. Download

VIST

Please follow the instructions on Download to download the dataset.

Text

Download the Stories of Images-in-Sequence (SIS) set, then extract and move the folder under the $THIS_REPO/data/VIST directory, e.g. data/VIST/sis.

Features

The above link contains the raw image files.

Images

Use ResNet152 pretrained on ImageNet to extract features for each image. Store the features with numpy.save following the structure below; a hedged extraction sketch follows the layout.

resnet/
  train/
    {image_id}.npy
  test/
  val/
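
As a rough guide, here is a minimal extraction sketch. The raw-image directory layout (images/{split}/{image_id}.jpg) and the output location under data/VIST are assumptions, not the paper's exact pipeline; the pooled 2048-d ResNet152 output is used as the feature.

# Minimal sketch of ResNet152 feature extraction (paths are assumptions).
import os
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet152(pretrained=True)
model = torch.nn.Sequential(*list(model.children())[:-1])  # drop the final fc layer
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

for split in ('train', 'val', 'test'):
    out_dir = os.path.join('data/VIST/resnet', split)
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(os.path.join('images', split)):  # hypothetical raw-image dir
        image_id = os.path.splitext(name)[0]
        img = Image.open(os.path.join('images', split, name)).convert('RGB')
        with torch.no_grad():
            feat = model(preprocess(img).unsqueeze(0)).squeeze()  # shape (2048,)
        np.save(os.path.join(out_dir, image_id + '.npy'), feat.numpy())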

Box

Use a Faster R-CNN model to extract object classification logits. Store the features with numpy.save following the structure below; a hedged extraction sketch follows the layout.

rcnn/
  train/
    {image_id}.npy
  test/
  val/
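
One possible way to obtain per-box classification logits is a forward hook on the box predictor of torchvision's Faster R-CNN; the detector weights, image, and output path below are placeholders, and the exact extractor used for the paper may differ.

# Sketch: capture per-box class logits from torchvision's Faster R-CNN via a hook.
import numpy as np
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

captured = {}
def save_logits(module, inputs, output):
    captured['logits'] = output[0]  # (num_boxes, num_classes) classification logits
model.roi_heads.box_predictor.register_forward_hook(save_logits)

image = torch.rand(3, 480, 640)  # stand-in for a preprocessed image tensor in [0, 1]
with torch.no_grad():
    model([image])
np.save('data/VIST/rcnn/train/example_image_id.npy', captured['logits'].numpy())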

ViLBERT

Use the ViLBERT model to extract the last hidden state vector for each image. Store the features with pickle.dump following the structure below.

vilbert/
  train/
    {album_id}/
      {image_id}.pickle
  test/
  val/
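
Extracting the hidden states depends on the specific ViLBERT implementation, so only the expected save layout is sketched here; the ids and the vector size are placeholders.

# Illustrative save layout only; the 1024-d vector is a stand-in for the ViLBERT
# last hidden state.
import os
import pickle
import numpy as np

album_id, image_id = 'example_album', 'example_image'  # hypothetical ids
hidden_state = np.zeros(1024, dtype=np.float32)        # placeholder feature vector

out_dir = os.path.join('data/VIST/vilbert/train', album_id)
os.makedirs(out_dir, exist_ok=True)
with open(os.path.join(out_dir, image_id + '.pickle'), 'wb') as f:
    pickle.dump(hidden_state, f)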

Train

LSMDC 2019

cd code
python cli.py train with model=no_gt_sos fix_gpt_epoch=5 feature_names="['video', 'images']"

VIST

cd code
python cli.py train with model=no_gt_sos fix_gpt_epoch=3 feature_names="['images', 'box']" use_vist=True

With additional ViLBERT features:

cd code
python cli.py train with model=no_gt_sos fix_gpt_epoch=3 feature_names="['images', 'box', 'vilbert']" use_vist=True

Run Scripts

python cli.py scripts with script=[SCRIPT_NAME] (additional args)

Please take a look at the config.py file for more options.
