ConvRS

Convolutional Reconstruction-to-Sequence for Video Captioning

Task

For video captioning, the commonly used approach is an LSTM decoder with an attention mechanism. Although the LSTM has a memory cell to store history information, it is still limited to a few time steps, because long-term information is gradually diluted at each time step. To alleviate this problem, we propose a convolutional reconstruction-to-sequence (ConvRS) model for video captioning. In the following, some example results generated by the ConvRS model are shown.

Setting Up and Data Preparation

We used TensorFlow 1.0.0, Python 2.7, and CUDA 8.0 for this project. Before using this code, download the MSVD and MSRVTT2016 datasets from http://www.cs.utexas.edu/users/ml/clamp/videoDescription/ and http://ms-multimedia-challenge.com/2016/dataset, respectively. For the MSVD dataset, we select 40 equally-spaced frames from each video and feed them into ResNet-152 to extract a 2048-dimensional representation per frame. For the MSRVTT dataset, we select 20 equally-spaced frames from each video and feed them into ResNet-152 in the same way.
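The equally-spaced frame sampling can be sketched as below. Note that `sample_frame_indices` is a hypothetical helper written for illustration, not code from this repository; the actual preprocessing scripts may differ.

```python
import numpy as np

def sample_frame_indices(num_frames, num_samples):
    """Pick `num_samples` equally-spaced frame indices from a video
    containing `num_frames` frames (illustrative sketch, not repo code)."""
    return np.linspace(0, num_frames - 1, num=num_samples).astype(int)

# MSVD setting: 40 frames per video; MSRVTT setting: 20 frames per video.
idx = sample_frame_indices(num_frames=250, num_samples=40)
```

Each selected frame would then be passed through ResNet-152 to obtain a 2048-dimensional feature vector, giving a 40 x 2048 feature matrix per MSVD video (20 x 2048 for MSRVTT).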

Training

CUDA_VISIBLE_DEVICES=0 python train.py

Test

CUDA_VISIBLE_DEVICES=0 python test.py
