Audio-Visual Scene-Aware Dialog

Code for the paper: AVSD, by Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa, Devi Parikh, Dhruv Batra, Anoop Cherian, Tim K. Marks, Chiori Hori

Website: video-dialog.com

This code builds on batra-mlp-lab/visdial-challenge-starter-pytorch.

Setup

# create and activate environment
conda env create -n avsd -f=env.yml
conda activate avsd

Data

  • Download the 'split'.json data from video-dialog.com
  • Extracted video, audio, and dialog features can be downloaded from here

Workflow

  • Build the dialogs JSON file with options using makejson_with_options.py (output: 'split'_options.json)

  • Adapt the JSON format using convert_json_to_visdial_style.py (output: 'split'_options_2.json, which can then be renamed to 'split'_options.json)

  • Build tokenized captions, dialogs and image paths with prepro.py (output: dialogs.h5 and params.json)

  • Build the image features (if working with images) using prepro_img_vgg16.lua or prepro_img_resnet.lua from batra-mlp-lab/visdial-challenge-starter-pytorch (output: data_img.h5)

  • Build the I3D video features (output: data_video.h5) using https://github.com/piergiaj/pytorch-i3d.git

  • Build the AENET audio features (output: data_audio.h5) using https://github.com/znaoya/aenet.git

  • Training: python train.py

  • Evaluation: python evaluate.py --use_gt
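Because each preprocessing step above names its output file, it can help to check which outputs already exist before starting training. The following is a minimal sketch of such a check, not part of this repository: the helper names (`expected_outputs`, `missing_outputs`, `promote_converted_json`), the assumption that all outputs sit in one flat directory, and the split name "train" are all hypothetical. Only the script and file names come from the workflow list.

```python
from pathlib import Path


def expected_outputs(split, use_images=False):
    """Workflow steps paired with the output files the README lists for them."""
    steps = [
        ("makejson_with_options.py", [f"{split}_options.json"]),
        ("prepro.py", ["dialogs.h5", "params.json"]),
        ("I3D video features", ["data_video.h5"]),
        ("AENET audio features", ["data_audio.h5"]),
    ]
    if use_images:
        # Image features are optional ("if working with images").
        steps.insert(
            2, ("prepro_img_vgg16.lua / prepro_img_resnet.lua", ["data_img.h5"])
        )
    return steps


def missing_outputs(data_dir, split, use_images=False):
    """Return {step: [missing files]} for steps whose outputs are absent.

    Assumes (hypothetically) that every output lives directly in data_dir.
    """
    data_dir = Path(data_dir)
    missing = {}
    for step, files in expected_outputs(split, use_images):
        absent = [name for name in files if not (data_dir / name).is_file()]
        if absent:
            missing[step] = absent
    return missing


def promote_converted_json(data_dir, split):
    """Rename '<split>_options_2.json' to '<split>_options.json'.

    Mirrors the optional rename after convert_json_to_visdial_style.py.
    Overwrites any existing '<split>_options.json'. Returns True if renamed.
    """
    src = Path(data_dir) / f"{split}_options_2.json"
    dst = Path(data_dir) / f"{split}_options.json"
    if src.is_file():
        src.replace(dst)
        return True
    return False


if __name__ == "__main__":
    # "train" and the current directory are placeholder choices.
    for step, files in missing_outputs(".", "train").items():
        print(f"{step}: missing {', '.join(files)}")
```

Running the script from the directory that holds the preprocessed data prints one line per step whose outputs are still missing. No output means every listed file was found.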

If you find this code useful in your research, please consider citing:

@article{DBLP:journals/corr/abs-1901-09107,
  author    = {Huda Alamri and
               Vincent Cartillier and
               Abhishek Das and
               Jue Wang and
               Stefan Lee and
               Peter Anderson and
               Irfan Essa and
               Devi Parikh and
               Dhruv Batra and
               Anoop Cherian and
               Tim K. Marks and
               Chiori Hori},
  title     = {Audio-Visual Scene-Aware Dialog},
  journal   = {CoRR},
  volume    = {abs/1901.09107},
  year      = {2019},
  url       = {http://arxiv.org/abs/1901.09107},
  archivePrefix = {arXiv},
  eprint    = {1901.09107},
  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1901-09107},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

License

BSD

About

[CVPR 2019] PyTorch code for Audio-Visual Scene-Aware Dialog
