ConvRS

Convolutional Reconstruction-to-Sequence for Video Captioning

Task

For video captioning, the commonly used approach is an LSTM decoder with an attention mechanism. Although the LSTM has a memory cell to store history information, it is still limited to a few time steps, because long-term information is gradually diluted at each time step. To alleviate this problem, we propose a convolutional reconstruction-to-sequence (ConvRS) model for video captioning. In the following, some example results generated by the ConvRS model are shown.

Setting Up and Data Preparation

We used TensorFlow 1.0.0, Python 2.7, and CUDA 8.0 for this project. Before using this code, download the MSVD and MSRVTT2016 datasets from http://www.cs.utexas.edu/users/ml/clamp/videoDescription/ and http://ms-multimedia-challenge.com/2016/dataset, respectively. For the MSVD dataset, we select 40 equally-spaced frames from each video and feed them into ResNet-152 to extract a 2048-dimensional representation per frame. For the MSRVTT dataset, we select 20 equally-spaced frames from each video and feed them into ResNet-152 in the same way.
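The equally-spaced frame sampling can be sketched as below. Note that `sample_frame_indices` is a hypothetical helper written for illustration, not code from this repository; the actual preprocessing scripts may differ.

```python
import numpy as np

def sample_frame_indices(num_frames, num_samples):
    """Pick `num_samples` equally-spaced frame indices from a video
    containing `num_frames` frames (illustrative sketch, not repo code)."""
    return np.linspace(0, num_frames - 1, num=num_samples).astype(int)

# MSVD setting: 40 frames per video; MSRVTT setting: 20 frames per video.
idx = sample_frame_indices(num_frames=250, num_samples=40)
```

Each selected frame would then be passed through ResNet-152 to obtain a 2048-dimensional feature vector, giving a 40 x 2048 feature matrix per MSVD video (20 x 2048 for MSRVTT).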

Training

CUDA_VISIBLE_DEVICES=0 python train.py

Test

CUDA_VISIBLE_DEVICES=0 python test.py
