Official Tensorflow Implementation of the paper "Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning" in CVPR 2018, with code, model and prediction results.


Tensorflow Implementation of the Paper Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning by Jingwen Wang et al. in CVPR 2018.

![Method overview](method.png)


If you find this work useful, please cite:

@inproceedings{wang2018bidirectional,
  title={Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning},
  author={Wang, Jingwen and Jiang, Wenhao and Ma, Lin and Liu, Wei and Xu, Yong},
  booktitle={CVPR},
  year={2018}
}
Data Preparation

Please download the annotation data and C3D features from the ActivityNet Captions website. The ActivityNet C3D features with a stride of 64 frames (used in my paper) can be found here.

Please follow the script in dataset/ActivityNet_Captions/preprocess/anchors/ to obtain the clustered anchors and their pos/neg weights (used to handle the class-imbalance problem). I have already put the generated files in dataset/ActivityNet_Captions/preprocess/anchors/.
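The anchor step above can be sketched as follows. This is an illustrative NumPy sketch, not the repository's exact code: it clusters ground-truth segment durations with 1-D k-means to get anchor lengths, then derives per-anchor positive weights inversely proportional to anchor frequency to offset class imbalance. Function names and the weighting scheme are assumptions.

```python
import numpy as np

def cluster_anchors(durations, num_anchors=4, iters=50):
    """1-D k-means over ground-truth segment durations (illustrative sketch)."""
    durations = np.asarray(durations, dtype=np.float64)
    # initialize centers evenly across the observed duration range
    centers = np.linspace(durations.min(), durations.max(), num_anchors)
    for _ in range(iters):
        # assign each duration to its nearest center
        assign = np.abs(durations[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(num_anchors):
            members = durations[assign == k]
            if members.size:
                centers[k] = members.mean()  # update center to cluster mean
    return centers, assign

def pos_neg_weights(assign, num_anchors):
    """Weight positives inversely to how often each anchor matches (assumed scheme)."""
    counts = np.bincount(assign, minlength=num_anchors).astype(np.float64)
    total = counts.sum()
    pos_w = np.where(counts > 0, total / (num_anchors * np.maximum(counts, 1.0)), 0.0)
    neg_w = np.ones(num_anchors)  # negatives kept at unit weight here
    return pos_w, neg_w
```

The actual repository uses 120 anchors (see the updates list below); the sketch uses a small number only to keep the example readable.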

Please follow the script in dataset/ActivityNet_Captions/preprocess/ to build the word dictionary and the encoded train/val/test sentence data.
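The dictionary/encoding step is conceptually similar to the following sketch. The special-token names, count threshold, and max length are illustrative assumptions, not the repository's exact values:

```python
from collections import Counter

SPECIALS = ['<pad>', '<sos>', '<eos>', '<unk>']  # assumed token set

def build_vocab(captions, min_count=2):
    """Keep words that appear at least min_count times in the training captions."""
    counts = Counter(w for c in captions for w in c.lower().split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i for i, w in enumerate(SPECIALS + words)}

def encode(caption, word2id, max_len=30):
    """Map a caption to a fixed-length id sequence with <sos>/<eos>/<pad>."""
    ids = [word2id.get(w, word2id['<unk>']) for w in caption.lower().split()]
    ids = [word2id['<sos>']] + ids[:max_len - 2] + [word2id['<eos>']]
    ids += [word2id['<pad>']] * (max_len - len(ids))
    return ids
```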

Hyper Parameters

The configuration used in my experiments is given in the options file, including model setup, training options, and testing options. You may want to set max_proposal_num=1000 unless saving validation time is your first priority.
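To make the trade-off concrete, a minimal sketch of the kind of options object involved; only max_proposal_num, train_proposal, and train_caption are named in this README, the rest is illustrative:

```python
class Options:
    """Hypothetical options container mirroring the settings this README mentions."""
    def __init__(self):
        self.max_proposal_num = 1000  # larger -> better coverage, slower validation
        self.train_proposal = True    # enable proposal-module training
        self.train_caption = True     # enable captioning-module training

    def fast_validation(self):
        """Trade proposal coverage for validation speed (illustrative helper)."""
        self.max_proposal_num = 100
        return self
```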


Train the dense-captioning model using the provided training script.

First, pre-train the proposal module (you may need to slightly modify the code to support a batch size of 32; using a batch size of 1 could lead to unsatisfactory performance). A pretrained proposal model is also provided. Then train the whole dense-captioning model by setting train_proposal=True and train_caption=True. To understand the proposal module, I refer you to the original SST paper and to my TensorFlow implementation of SST.
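The proposal pre-training stage optimizes a weighted per-anchor binary cross-entropy of roughly the following form (NumPy sketch for illustration; the updates list below notes that the actual code uses a TensorFlow built-in, in the spirit of tf.nn.weighted_cross_entropy_with_logits, and the exact weighting is an assumption here):

```python
import numpy as np

def weighted_proposal_loss(logits, labels, pos_w, neg_w):
    """Sigmoid cross-entropy with per-anchor pos/neg weights.

    logits, labels: (time_steps, num_anchors) arrays
    pos_w, neg_w:   (num_anchors,) class-imbalance weights
    """
    p = 1.0 / (1.0 + np.exp(-logits))  # per-anchor proposal confidence
    eps = 1e-8                         # numerical safety for log
    loss = -(pos_w * labels * np.log(p + eps)
             + neg_w * (1.0 - labels) * np.log(1.0 - p + eps))
    return loss.mean()
```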


Follow the script to make proposal predictions and to evaluate them. Use max_proposal_num=1000 to generate the .json test file, then use the script "python2 -s [json_file] -ppv 100" to evaluate performance (the joint ranking requires dropping the less confident items).
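The "-ppv 100" post-processing amounts to keeping only the top-k most confident proposals per video, roughly as in this sketch. The dictionary layout follows the ActivityNet Captions submission format; the 'proposal_score' field name is an assumption, not stated in this README:

```python
def keep_top_k(pred, k=100):
    """Keep the k highest-scoring proposals per video (after json.load-ing the file)."""
    out = {'version': pred.get('version'),
           'external_data': pred.get('external_data'),
           'results': {}}
    for vid, props in pred['results'].items():
        # rank proposals by confidence, most confident first
        ranked = sorted(props, key=lambda p: p.get('proposal_score', 0.0), reverse=True)
        out['results'][vid] = ranked[:k]
    return out
```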


Please note that the official evaluation metric has been updated (Line 194). The paper reports the old metric; results from different methods remain comparable, since all CVPR 2018 papers report the old metric.

Pre-trained Model & Results

[Deprecated] The predicted results for val/test set can be found here.

The pre-trained model and validation/test predictions can be found here. On the validation set the model obtains a METEOR score of 9.77 under the old evaluation metric and 5.42 under the updated metric. On the test set the model obtains a METEOR score of 4.49, as returned by the ActivityNet evaluation server.




Other versions may also work.


Updates

  1. I corrected some naming errors and simplified the proposal loss using a TensorFlow built-in function.
  2. I uploaded C3D features with stride of 64 frames (used in my paper). You can find it here.
  3. I uploaded val/test results of both without joint ranking and with joint ranking.
  4. I uploaded video_fps.json and updated the related scripts.
  5. Due to large file constraint, you may need to download data/paraphrase-en.gz here and put it in densevid_eval-master/coco-caption/pycocoevalcap/meteor/data/.
  6. I corrected a multi-RNN mistake caused by the get_rnn_cell() function.
  7. I updated the evaluation code. The old metric is used in my paper; the updated metric has been used since the ActivityNet Captions 2018 Challenge.
  8. I removed anchors that were too small or too large, resulting in 120 anchors.
  9. I modified the training scripts to correct the loss weighting.
  10. I corrected a mistake in the evaluator.
  11. I uploaded the pretrained model. Please also download the updated code.