HaydenFaulkner/Attributes_SVO_Video_Captioning

Attributes and SVOs for Video Captioning

This implementation is based on "Syntax-Aware Action Targeting for Video Captioning" (SAAT), which in turn is based on "Consensus-based Sequence Training for Video Captioning" (CST).

Dependencies

  • Python 3.6
  • PyTorch 1.1
  • CUDA 10.0

This repo includes an edited version of the Python 3 coco-caption evaluation protocols, modified to load the CIDEr corpus.
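For context, the evaluation protocols compute metrics such as CIDEr, which scores a candidate caption by TF-IDF-weighted n-gram overlap with the reference captions. Below is a simplified, self-contained sketch of that idea; it is not the repo's actual coco-caption code, and the function names are illustrative:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a tokenized sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cider_like(candidate, references, corpus, max_n=4):
    """Simplified CIDEr-style score: average cosine similarity of
    TF-IDF n-gram vectors between candidate and each reference,
    over n = 1..max_n, scaled by 10. `corpus` is a list of tokenized
    sentences used to compute document frequencies (IDF)."""
    num_docs = len(corpus)
    score = 0.0
    for n in range(1, max_n + 1):
        # document frequency of each n-gram across the corpus
        df = Counter()
        for doc in corpus:
            df.update(set(ngrams(doc, n)))

        def tfidf(tokens):
            tf = Counter(ngrams(tokens, n))
            total = sum(tf.values()) or 1
            return {g: (c / total) * math.log(num_docs / (1 + df[g]))
                    for g, c in tf.items()}

        def cosine(a, b):
            dot = sum(a[g] * b.get(g, 0.0) for g in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        cand_vec = tfidf(candidate)
        sims = [cosine(cand_vec, tfidf(ref)) for ref in references]
        score += sum(sims) / len(sims)
    return 10.0 * score / max_n
```

The real coco-caption implementation additionally clips candidate n-gram counts and applies a length-difference Gaussian penalty (CIDEr-D), but the TF-IDF cosine core above is the essence of the metric.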

Data

The datasets, their features, and the pre-trained models resulting from the experiments can be downloaded from my Google Drive.

Experiments

View my experiments and results

Train

To train on MSVD:

python train.py --dataset msvd \
                --captioner_type lstm \
                --model_id lstm_1 \
                --batch_size 8 \
                --test_batch_size 8 \
                --max_epochs 100

To train on MSR-VTT:

python train.py --dataset msrvtt \
                --captioner_type lstm \
                --model_id lstm_1 \
                --batch_size 8 \
                --test_batch_size 4 \
                --max_epochs 200

Test / Evaluate

Testing occurs automatically at the end of training; to run it separately, use evaluate.py.

To evaluate on MSVD:

python evaluate.py --dataset msvd \
                   --captioner_type lstm \
                   --model_id lstm_1 \
                   --test_batch_size 8

To evaluate on MSR-VTT:

python evaluate.py --dataset msrvtt \
                   --captioner_type lstm \
                   --model_id lstm_1 \
                   --test_batch_size 4

Acknowledgements

  • PyTorch implementation of SAAT
  • PyTorch implementation of CST
  • PyTorch implementation of SCST

About

LSTM RNN and Transformer networks for video captioning on MSVD and MSR-VTT using attributes and SVOs.
