[CVPR 2019] Pytorch code for Audio Visual Scene-Aware Dialog

Audio-Visual Scene-Aware Dialog

Code for the paper: AVSD, by Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa, Devi Parikh, Dhruv Batra, Anoop Cherian, Tim K. Marks, and Chiori Hori.


This code builds on batra-mlp-lab/visdial-challenge-starter-pytorch.


# create and activate environment
conda env create -n avsd -f=env.yml
conda activate avsd


  • Download the 'split'.json data at:


  • Build the dialogs JSON file with options using (output: 'split'_options.json)

  • Adapt the JSON format using (output: 'split'_options_2.json, which can afterwards be renamed to 'split'_options.json)

  • Build tokenized captions, dialogs and image paths with (output: dialogs.h5 and params.json)

  • Build the image features (if working with images) using prepro_img_vgg16.lua or prepro_img_resnet.lua from batra-mlp-lab/visdial-challenge-starter-pytorch (output: data_img.h5)

  • Build the I3D video features (output: data_video.h5)

  • Build the AENET audio features (output: data_audio.h5)

  • Training: python

  • Evaluation: python --use_gt
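Before training, it can help to confirm that the preprocessing steps above produced the files they list. The following is a minimal sketch, not part of the repository: the helper names `expected_outputs` and `missing_outputs`, the `data` directory, and the split name `train` are assumptions for illustration; only the output file names come from the steps above.

```python
from pathlib import Path


def expected_outputs(split, use_images=False):
    """List the output file names named in the preprocessing steps.

    `split` stands in for the 'split' placeholder used above; the actual
    split names are an assumption of the caller.
    """
    names = [
        f"{split}_options.json",  # dialogs JSON with options
        "dialogs.h5",             # tokenized captions and dialogs
        "params.json",            # written alongside dialogs.h5
        "data_video.h5",          # I3D video features
        "data_audio.h5",          # AENET audio features
    ]
    if use_images:
        names.append("data_img.h5")  # only needed when working with images
    return names


def missing_outputs(data_dir, split, use_images=False):
    """Return the expected files that are not present in data_dir."""
    root = Path(data_dir)
    return [
        name
        for name in expected_outputs(split, use_images)
        if not (root / name).is_file()
    ]


if __name__ == "__main__":
    # Hypothetical directory and split name, for illustration only.
    missing = missing_outputs("data", "train")
    if missing:
        print("Missing preprocessing outputs:", ", ".join(missing))
    else:
        print("All expected preprocessing outputs are present.")
```

Passing `use_images=True` adds data_img.h5 to the check, matching the optional image-feature step.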

If you find this code useful in your research, please consider citing:

  author    = {Huda Alamri and
               Vincent Cartillier and
               Abhishek Das and
               Jue Wang and
               Stefan Lee and
               Peter Anderson and
               Irfan Essa and
               Devi Parikh and
               Dhruv Batra and
               Anoop Cherian and
               Tim K. Marks and
               Chiori Hori},
  title     = {Audio-Visual Scene-Aware Dialog},
  journal   = {CoRR},
  volume    = {abs/1901.09107},
  year      = {2019},
  url       = {},
  archivePrefix = {arXiv},
  eprint    = {1901.09107},
  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},
  biburl    = {},
  bibsource = {dblp computer science bibliography,}


