[CVPR 2019] PyTorch code for Audio-Visual Scene-Aware Dialog


Audio-Visual Scene-Aware Dialog

Code for the paper: AVSD (Audio-Visual Scene-Aware Dialog), by Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa, Devi Parikh, Dhruv Batra, Anoop Cherian, Tim K. Marks, and Chiori Hori.

Website: video-dialog.com

This code builds on batra-mlp-lab/visdial-challenge-starter-pytorch.

Setup

# create and activate environment
conda env create -n avsd -f=env.yml
conda activate avsd

Data

  • Download the 'split'.json data files from video-dialog.com
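
Before running the preprocessing scripts, it can help to confirm that a downloaded file parses as JSON and to look at its top-level structure. The sketch below is not part of this repository: `summarize_json` is a hypothetical helper, and it makes no assumption about the schema of the 'split'.json files beyond their being valid JSON.

```python
import json
import sys
from pathlib import Path


def summarize_json(path, max_keys=10):
    """Return a one-line description of the top-level structure of a JSON file.

    Hypothetical helper for a quick sanity check; it does not assume any
    particular schema for the downloaded data.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    if isinstance(data, dict):
        keys = list(data)[:max_keys]
        return f"dict with {len(data)} keys, first keys: {keys}"
    if isinstance(data, list):
        first = type(data[0]).__name__ if data else "n/a"
        return f"list with {len(data)} items, first item type: {first}"
    return f"scalar of type {type(data).__name__}"


if __name__ == "__main__":
    # Usage: python summarize_json.py path/to/file.json [more files ...]
    for arg in sys.argv[1:]:
        print(arg, "->", summarize_json(arg))
```

The script name `summarize_json.py` in the usage comment is only a placeholder; pass it the path of whichever downloaded file you want to inspect.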

Workflow

  • Build the dialogs JSON file with options using makejson_with_options.py (output: 'split'_options.json)

  • Adapt the JSON format using convert_json_to_visdial_style.py (output: 'split'_options_2.json, which can afterwards be renamed to 'split'_options.json)

  • Build tokenized captions, dialogs and image paths with prepro.py (output: dialogs.h5 and params.json)

  • Build the image features (if working with images) using prepro_img_vgg16.lua or prepro_img_resnet.lua from batra-mlp-lab/visdial-challenge-starter-pytorch (output: data_img.h5)

  • Build the I3D video features using https://github.com/piergiaj/pytorch-i3d.git (output: data_video.h5)

  • Build the AENet audio features using https://github.com/znaoya/aenet.git (output: data_audio.h5)

  • Training: python train.py

  • Evaluation: python evaluate.py --use_gt
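
Two of the steps above are plain file bookkeeping: renaming 'split'_options_2.json to 'split'_options.json, and making sure the preprocessing outputs exist before training. The sketch below illustrates both. It is not part of this repository: the helper names are hypothetical, the single data directory is an assumption, and treating every listed output as expected is an assumption too, since train.py may need only a subset depending on the modalities used.

```python
from pathlib import Path

# Output file names taken from the workflow list above.
# Assumption: all of them live in one directory and all are wanted.
EXPECTED_OUTPUTS = ("dialogs.h5", "params.json", "data_video.h5", "data_audio.h5")
# Only relevant when working with image features.
OPTIONAL_OUTPUTS = ("data_img.h5",)


def rename_converted_json(data_dir, overwrite=False):
    """Rename every *_options_2.json in data_dir to *_options.json.

    Returns the list of new paths. With overwrite=False, an existing
    target raises FileExistsError instead of being replaced.
    """
    renamed = []
    for src in sorted(Path(data_dir).glob("*_options_2.json")):
        # "train_options_2.json" -> "train_options.json"
        dst = src.with_name(src.name[: -len("_2.json")] + ".json")
        if dst.exists() and not overwrite:
            raise FileExistsError(f"{dst} already exists; pass overwrite=True to replace it")
        src.replace(dst)
        renamed.append(dst)
    return renamed


def missing_outputs(data_dir, include_optional=False):
    """Return the expected output file names that are not present in data_dir."""
    names = EXPECTED_OUTPUTS + (OPTIONAL_OUTPUTS if include_optional else ())
    return [name for name in names if not (Path(data_dir) / name).is_file()]
```

Because makejson_with_options.py already writes 'split'_options.json, the rename target will normally exist; call `rename_converted_json(data_dir, overwrite=True)` once you are sure the converted file should replace it. An empty list from `missing_outputs(data_dir)` means every expected file was found.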

If you find this code useful in your research, please consider citing:

@article{DBLP:journals/corr/abs-1901-09107,
  author    = {Huda Alamri and
               Vincent Cartillier and
               Abhishek Das and
               Jue Wang and
               Stefan Lee and
               Peter Anderson and
               Irfan Essa and
               Devi Parikh and
               Dhruv Batra and
               Anoop Cherian and
               Tim K. Marks and
               Chiori Hori},
  title     = {Audio-Visual Scene-Aware Dialog},
  journal   = {CoRR},
  volume    = {abs/1901.09107},
  year      = {2019},
  url       = {http://arxiv.org/abs/1901.09107},
  archivePrefix = {arXiv},
  eprint    = {1901.09107},
  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1901-09107},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

License

BSD
