Joint Event Detection and Description in Continuous Video Streams

Code released by Huijuan Xu (Boston University).

Introduction

We present the Joint Event Detection and Description Network (JEDDi-Net), which solves the dense captioning task end to end. The model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and transcribes each event proposal into a caption, taking both visual and language context into account.
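To make the idea of handling variable-length events more concrete, the following is a minimal illustrative sketch of how a temporal span of any length can be max-pooled into a fixed number of feature vectors. It is not the repository's implementation; the function name `pool_segment`, the `num_bins` parameter, and the plain-list feature layout are all assumptions made for this example.

```python
def pool_segment(features, start, end, num_bins=4):
    """Max-pool features[start:end] into exactly num_bins feature vectors.

    `features` is a list of per-time-step feature vectors (lists of floats).
    Hypothetical helper for illustration only.
    """
    length = end - start
    if length <= 0:
        raise ValueError("segment must contain at least one time step")
    pooled = []
    for b in range(num_bins):
        # Bin b covers [floor(b*L/n), ceil((b+1)*L/n)) relative to `start`,
        # so every bin is non-empty even when the segment is shorter than num_bins.
        lo = start + (b * length) // num_bins
        hi = start + -(-((b + 1) * length) // num_bins)
        window = features[lo:hi]
        # Element-wise max over the time steps that fall into this bin.
        pooled.append([max(column) for column in zip(*window)])
    return pooled
```

With this scheme, a two-step event and an eight-step event both yield `num_bins` vectors, which is what allows a later fixed-size layer to consume proposals of different durations.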

License

JEDDi-Net is released under the MIT License (refer to the LICENSE file for details).

Citing JEDDi-Net

If you find JEDDi-Net useful in your research, please consider citing:

@article{xu2019joint,
title={Joint Event Detection and Description in Continuous Video Streams},
author={Xu, Huijuan and Li, Boyang and Ramanishka, Vasili and Sigal, Leonid and Saenko, Kate},
journal={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2019}
}

Installation:

Clone the JEDDi-Net repository.

git clone --recursive git@github.com:VisionLearningGroup/JEDDi-Net.git

Build Caffe3d with pycaffe (see: Caffe installation instructions).

Note: Caffe must be built with Python support!

cd ./caffe3d

# If you have all of the requirements installed and your Makefile.config is in place, simply run:
make -j8 && make pycaffe
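Because the rest of the pipeline depends on pycaffe, it can help to confirm that the Python bindings are importable before moving on. The sketch below assumes the standard Caffe layout, in which the bindings are built into `<caffe root>/python/caffe`; the helper name `pycaffe_available` is hypothetical.

```python
import importlib
import os
import sys


def pycaffe_available(caffe_root="./caffe3d"):
    """Return True if `import caffe` succeeds from <caffe_root>/python."""
    python_dir = os.path.join(caffe_root, "python")
    # Assumption: standard Caffe layout, with pycaffe under <root>/python/caffe.
    if not os.path.isdir(os.path.join(python_dir, "caffe")):
        return False
    sys.path.insert(0, python_dir)
    try:
        importlib.import_module("caffe")
        return True
    except ImportError:
        return False
    finally:
        # Leave sys.path as we found it.
        sys.path.remove(python_dir)
```

Calling `pycaffe_available()` from the repository root should return `True` after a successful `make pycaffe`; a `False` result usually means the build was skipped or a Python dependency is missing.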

Build JEDDi-Net lib folder.
```
cd ./lib
make
```

Preparation:

Download the ground truth annotations and videos of the ActivityNet Captions dataset.
Extract frames from the downloaded videos at 25 fps.
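The README does not name a tool for frame extraction. One possible approach, sketched here under the assumption that `ffmpeg` is installed, uses its `fps` video filter to resample each video to 25 frames per second. The function names and the `%06d.jpg` naming pattern are hypothetical; the preprocess scripts may expect a different directory layout or file naming.

```python
import os
import subprocess


def build_ffmpeg_cmd(video_path, out_dir, fps=25):
    """Build an ffmpeg command that dumps frames at a fixed frame rate."""
    # Assumption: frames are written as zero-padded JPEGs; adjust the pattern
    # to whatever the preprocess scripts actually expect.
    pattern = os.path.join(out_dir, "%06d.jpg")
    return ["ffmpeg", "-i", video_path,
            "-vf", "fps={}".format(fps),  # resample to the target frame rate
            "-q:v", "2",                  # high JPEG quality
            pattern]


def extract_frames(video_path, out_dir, fps=25):
    """Create out_dir if needed and run ffmpeg on a single video."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    subprocess.check_call(build_ffmpeg_cmd(video_path, out_dir, fps))
```

For example, `extract_frames("videos/v_example.mp4", "frames/v_example")` would write the frames of one (hypothetically named) video into its own folder.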

Generate the pickle data for training and testing the JEDDi-Net model.

cd ./preprocess
# generate training data
python generate_train_roidb_sorted.py
# generate validation data
python generate_val_roidb.py
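After the scripts finish, a quick sanity check of the generated pickle files can catch empty or truncated output early. The sketch below only reports the top-level type and size of a pickle; it makes no assumption about the internal structure, and the helper name `summarize_pickle` is hypothetical. The output file names are not listed here, so pass whichever path the scripts produced.

```python
import pickle
import sys


def summarize_pickle(path):
    """Return (type name, length or None) for the object stored in a pickle file."""
    with open(path, "rb") as f:
        if sys.version_info[0] >= 3:
            # latin1 lets Python 3 read pickles that were written by Python 2.
            data = pickle.load(f, encoding="latin1")
        else:
            data = pickle.load(f)
    size = len(data) if hasattr(data, "__len__") else None
    return type(data).__name__, size
```

A result with a size of zero would suggest that the annotation or frame paths inside the preprocess scripts need to be adjusted.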

Training:

Download the separately trained segment proposal network (SPN) and captioning models to ./pretrain/.

In JEDDi-Net root folder, run:

bash ./experiments/denseCap_jeddiNet_end2end/script_train.sh

Testing:

Download one sample JEDDi-Net model to ./snapshot/.

One JEDDi-Net model for the ActivityNet Captions dataset is provided here: caffemodel.

The provided JEDDi-Net model achieves a METEOR score of ~8.58% on the validation set.

In the JEDDi-Net root folder, generate the prediction log file on the validation set:
```
bash ./experiments/denseCap_jeddiNet_end2end/test/script_test.sh
```

Generate the results.json file from the prediction log file.

cd ./experiments/denseCap_jeddiNet_end2end/test/
bash bash.sh
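Before running the evaluation, it may be worth checking that results.json is well formed. The sketch below assumes the file follows the ActivityNet Captions submission layout, that is, a top-level `"results"` mapping from video id to a list of events, each with a `"sentence"` and a `[start, end]` `"timestamp"`. That layout is an assumption here, and the helper name `count_valid_events` is hypothetical.

```python
import json


def count_valid_events(path):
    """Count predicted events in a results file; raise ValueError on malformed entries."""
    with open(path) as f:
        data = json.load(f)
    # Assumption: predictions live under "results" as {video_id: [event, ...]},
    # each event holding a "sentence" and a [start, end] "timestamp" in seconds.
    if not isinstance(data, dict) or not isinstance(data.get("results"), dict):
        raise ValueError("missing top-level 'results' mapping")
    total = 0
    for video_id, events in data["results"].items():
        for event in events:
            start, end = event["timestamp"]
            if start > end or not event["sentence"].strip():
                raise ValueError("bad event for video {}: {}".format(video_id, event))
            total += 1
    return total
```

Running `count_valid_events("results.json")` in the test folder returns the total number of captioned events, or raises if an entry has an empty sentence or an inverted timestamp.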

Follow the evaluation code to compute the evaluation results from results.json.
