Generating Video Description using Sequence-to-sequence Model with Temporal Attention

# Sequence-to-sequence Model with Temporal Attention

`seq2seq_temporal_attention` is a tool for automatic video captioning. It is an implementation of *Generating Video Description using Sequence-to-sequence Model with Temporal Attention* (PDF).
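As a rough illustration of the idea (not the repository's actual code), temporal attention lets the caption decoder compute, at each word-generation step, a weighted average of per-frame video features. A minimal NumPy sketch, assuming dot-product alignment:

```python
import numpy as np

def temporal_attention(frame_features, decoder_state):
    """Weight per-frame features by their relevance to the decoder state.

    frame_features: (T, d) array, one feature vector per video frame.
    decoder_state:  (d,) current hidden state of the caption decoder.
    """
    # Dot-product alignment score between the state and each frame.
    scores = frame_features @ decoder_state          # shape (T,)
    # Softmax turns scores into attention weights summing to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted average of the frame features.
    context = weights @ frame_features               # shape (d,)
    return context, weights

# Toy example: 4 frames with 3-dimensional features.
feats = np.random.randn(4, 3)
state = np.random.randn(3)
context, weights = temporal_attention(feats, state)
```

The context vector is then fed into the decoder together with the previously generated word, so different frames can dominate at different steps of the caption.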

## Requirements (Linux/Mac)

For Windows requirements, see `docs/requirements-windows.md`.

## Examples

### Captioning a video

To try out the tool, run `example.sh`. It generates a caption for an excerpt of the video titled *playing wool ball with my cat : )*. Our models were trained on the Microsoft Video Description (MSVD) dataset.

*(cat video thumbnail)*

```sh
git clone git@github.com:aistairc/seq2seq_temporal_attention.git --recursive
./download.sh
./example.sh --gpu GPU_ID  # generates the caption "a cat is playing with a toy"
```

Note: in most cases, setting `GPU_ID` to 0 will work. To run without a GPU, set it to -1.

## Training

Here is an example command for training:

```sh
cd code
python chainer_seq2seq_att.py \
    --mode train \
    --gpu GPU_ID \
    --batchsize 40 \
    --dropout 0.3 \
    --align ('dot'|'bilinear'|'concat'|'none') \
    --feature feature_file_name \
    output_folder
```
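The `--align` option chooses how alignment scores between the decoder state and each frame feature are computed. A minimal sketch of the three scoring variants (the weight matrices here are illustrative placeholders, not the repository's parameters):

```python
import numpy as np

d = 4
s = np.random.randn(d)   # decoder hidden state
h = np.random.randn(d)   # feature of one video frame

# 'dot': plain inner product, no extra parameters.
score_dot = s @ h

# 'bilinear': a learned matrix W mediates between the two vectors.
W = np.random.randn(d, d)
score_bilinear = s @ W @ h

# 'concat': a small MLP scores the concatenated pair.
Wc = np.random.randn(d, 2 * d)
v = np.random.randn(d)
score_concat = v @ np.tanh(Wc @ np.concatenate([s, h]))
```

Each variant maps a (state, frame) pair to a single scalar score; softmax over the per-frame scores then yields the attention weights.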

## Test

There are two test modes, `test` and `test-batch`. The latter runs much faster, but it does not use beam search. Be careful to specify the same alignment model that your pre-trained model was trained with; otherwise decoding will not work correctly.

```sh
cd code
python chainer_seq2seq_att.py \
    --mode ('test'|'test-batch') \
    --gpu GPU_ID \
    --model path_to_model_file \
    --align ('dot'|'bilinear'|'concat'|'none') \
    --feature feature_file_name \
    output_folder
```
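To see why beam search matters, here is a toy sketch (not the repository's decoder) of keeping the `beam_width` best partial captions instead of greedily taking the single most likely word at each step:

```python
import math

def beam_search(step_logprobs, beam_width=3, length=3):
    """Tiny beam search over a fixed per-step log-probability table.

    step_logprobs: list of dicts mapping token -> log-probability.
    Greedy decoding would take the argmax at every step; beam search
    keeps the beam_width highest-scoring partial sequences instead.
    """
    beams = [([], 0.0)]  # (tokens so far, total log-probability)
    for lp in step_logprobs[:length]:
        candidates = []
        for tokens, score in beams:
            for tok, logp in lp.items():
                candidates.append((tokens + [tok], score + logp))
        # Keep only the best beam_width expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Hypothetical per-step word distributions for a short caption.
steps = [
    {"a": math.log(0.6), "the": math.log(0.4)},
    {"cat": math.log(0.7), "dog": math.log(0.3)},
    {"plays": math.log(0.5), "sleeps": math.log(0.5)},
]
best_tokens, best_score = beam_search(steps, beam_width=2)
```

In the real decoder the per-step distributions come from the model rather than a fixed table, which is why `test` (with beam search) is slower but can find higher-probability captions than `test-batch`.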