Skip to content

Implementation for "Joint Event Detection and Description in Continuous Video Streams"

License

Notifications You must be signed in to change notification settings

VisionLearningGroup/JEDDi-Net

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Joint Event Detection and Description in Continuous Video Streams

Code released by Huijuan Xu (Boston University).

Introduction

We present the Joint Event Detection and Description Network (JEDDi-Net) that solves the dense captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and transcribes the event proposals into captions with the consideration of visual and language context.

License

JEDDi-Net is released under the MIT License (refer to the LICENSE file for details).

Citing JEDDi-Net

If you find JEDDi-Net useful in your research, please consider citing:

@article{xu2019joint,
title={Joint Event Detection and Description in Continuous Video Streams},
author={Xu, Huijuan and Li, Boyang and Ramanishka, Vasili and Sigal, Leonid and Saenko, Kate},
journal={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2019}
}

Contents

  1. Installation
  2. Preparation
  3. Training
  4. Testing

Installation:

  1. Clone the JEDDi-Net repository.

    git clone --recursive git@github.com:VisionLearningGroup/JEDDi-Net.git
  2. Build Caffe3d with pycaffe (see: Caffe installation instructions).

    Note: Caffe must be built with Python support!

cd ./caffe3d

# If have all of the requirements installed and your Makefile.config in place, then simply do:
make -j8 && make pycaffe
  1. Build JEDDi-Net lib folder.

    cd ./lib    
    make

Preparation:

  1. Download the ground truth annatations and videos in ActivityNet Captions dataset.

  2. Extract frames from downloaded videos in 25 fps.

  3. Generate the pickle data for training and testing JEDDi-Net model.

    cd ./preprocess
    # generate training data
    python generate_train_roidb_sorted.py
    # generate validation data
    python generate_val_roidb.py  

Training:

  1. Download the separately-trained segment proposal network(SPN) and captioning models ./pretrain/ .

  2. In JEDDi-Net root folder, run:

    bash ./experiments/denseCap_jeddiNet_end2end/script_train.sh

Testing:

  1. Download one sample JEDDi-Net model to ./snapshot/ .

    One JEDDi-Net model on ActivityNet Captions dataset is provided in: caffemodel .

    The provided JEDDi-Net model has the METEOR score ~8.58% on the validation set.

  2. In JEDDi-Net root folder, generate the prediction log file on the validation set.

    bash ./experiments/denseCap_jeddiNet_end2end/test/script_test.sh 
  3. Generate the results.json file from the prediction log file.

    cd ./experiments/denseCap_jeddiNet_end2end/test/
    bash bash.sh
  4. Follow the evaluation code to get the evaluation results.

About

Implementation for "Joint Event Detection and Description in Continuous Video Streams"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages