
Video captioning models in PyTorch (work in progress)

This repository contains PyTorch implementations of state-of-the-art video captioning models from 2015-2019, evaluated on the MSVD and MSR-VTT datasets. Details are given in the table below.

| Model | Datasets | Paper | Year | Status |
|---|---|---|---|---|
| Mean Pooling | MSVD, MSR-VTT | Translating videos to natural language using deep recurrent neural networks [1] | 2015 | Implemented |
| S2VT | MSVD, MSR-VTT | Sequence to Sequence - Video to Text [2] | 2015 | Implemented |
| SA-LSTM | MSVD, MSR-VTT | Describing videos by exploiting temporal structure [3] | 2015 | Implemented |
| RecNet | MSVD, MSR-VTT | Reconstruction Network for Video Captioning [4] | 2018 | Implemented |
| MARN | MSVD, MSR-VTT | Memory-Attended Recurrent Network for Video Captioning [5] | 2019 | Implemented |

*More recent models will be added in the future.
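
As a rough illustration of the simplest entry in the table, the sketch below shows the mean-pooling idea from [1]: frame-level CNN features are averaged over time into a single video vector that conditions the caption decoder. This is an illustrative sketch only, not the code in this repository; the class name, feature dimension (1536 for Inception-v4), and hidden size are assumptions.

```python
import torch
import torch.nn as nn

class MeanPoolEncoder(nn.Module):
    """Illustrative mean-pooling video encoder (not this repo's exact module)."""
    def __init__(self, feat_dim=1536, hidden_dim=512):
        super().__init__()
        # Project the averaged frame feature into the decoder's hidden size.
        self.proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) CNN features per frame
        video_vec = frame_feats.mean(dim=1)       # temporal mean pooling
        return torch.tanh(self.proj(video_vec))   # (batch, hidden_dim)

# Example: 28 Inception-v4 frame features (1536-d) pooled to a 512-d video vector.
feats = torch.randn(2, 28, 1536)
print(MeanPoolEncoder()(feats).shape)  # torch.Size([2, 512])
```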

Environment

  • Ubuntu 18.04
  • CUDA 11.0
  • Nvidia GeForce RTX 2080Ti

Requirements

  • Java 8
  • Python 3.8.5
    • PyTorch 1.7.0
    • Other Python libraries specified in requirements.txt

How to Use

Step 1. Set up a Python virtual environment

$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt

Step 2. Prepare data, paths, and hyperparameter settings

  1. Extract features with the network you want to use and place them at <PROJECT ROOT>/<DATASET>/features/<DATASET>_APPEARANCE_<NETWORK>_<FRAME_LENGTH>.hdf5. To extract the features yourself, follow the repository here. Alternatively, download the already extracted features from the table below and place them in <PROJECT ROOT>/<DATASET>/features/.

    | Dataset | Feature Type | Inception-v4 | InceptionResNetV2 | ResNet-101 | ResNeXt-101 |
    |---|---|---|---|---|---|
    | MSVD | Appearance | link | link | link | - |
    | MSR-VTT | Appearance | link | link | link | - |
    | MSVD | Motion | - | - | - | link |

You can change hyperparameters by modifying config.py.
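
Before training, it can help to sanity-check that a feature file is readable. The sketch below assumes the layout produced by the feature-extraction repository referenced above (one HDF5 dataset per video id, shaped (num_frames, feature_dim)); the file name is an example only, so substitute your own dataset and network names.

```python
# Minimal sketch for sanity-checking an extracted feature file.
# Assumption: one HDF5 dataset per video id, shaped (num_frames, feature_dim).
# The path below is an example; substitute your own <DATASET>/<NETWORK> names.
import h5py

feature_path = "MSVD/features/MSVD_APPEARANCE_InceptionV4_28.hdf5"  # hypothetical name

with h5py.File(feature_path, "r") as f:
    video_ids = list(f.keys())
    print(len(video_ids), "videos found")
    sample = f[video_ids[0]][()]           # numpy array of frame features
    print(video_ids[0], sample.shape)      # expected: (num_frames, feature_dim)
```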

Step 3. Prepare evaluation code

Clone the evaluation code from the official coco-caption repo.

(.env) $ git clone https://github.com/tylin/coco-caption.git
(.env) $ mv coco-caption/pycocoevalcap .
(.env) $ rm -rf coco-caption

Alternatively, copy the pycocoevalcap folder and its contents into the project root.
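
For reference, the scorers inside pycocoevalcap can also be called directly, as shown below. This is a generic usage sketch, not the evaluation wrapper used in this repository; the video ids and captions are made up. Note that the METEOR scorer shells out to Java, which is why Java 8 appears under Requirements.

```python
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# Both dicts map a video id to a list of tokenized caption strings.
gts = {  # reference captions
    "vid1": ["a man is playing a guitar", "a person plays the guitar"],
    "vid2": ["a woman is cooking in the kitchen"],
}
res = {  # model outputs (one caption per video)
    "vid1": ["a man is playing a guitar"],
    "vid2": ["a woman is cooking"],
}

for name, scorer in [("BLEU", Bleu(4)), ("METEOR", Meteor()),
                     ("ROUGE_L", Rouge()), ("CIDEr", Cider())]:
    score, _ = scorer.compute_score(gts, res)
    print(name, score)
```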

Step 4. Training

Follow the demo given in video_captioning.ipynb.
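
For orientation before opening the notebook, the sketch below shows the generic shape of a teacher-forced captioning training step (mean-pooled video features feeding an LSTM decoder, cross-entropy on next-word prediction). Everything here is a toy stand-in: the ToyCaptioner class, vocabulary size, and synthetic batch are assumptions, not the models or data pipeline used in this repository.

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end; the real models and data
# pipeline live in this repository / the notebook and look different.
VOCAB, FEAT_DIM, HID, PAD_IDX = 100, 1536, 512, 0

class ToyCaptioner(nn.Module):
    """Mean-pooled video vector -> LSTM decoder over caption tokens."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(FEAT_DIM, HID)
        self.emb = nn.Embedding(VOCAB, HID, padding_idx=PAD_IDX)
        self.dec = nn.LSTM(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, frame_feats, caption_in):
        # Encode the video as the decoder's initial hidden state, then
        # teacher-force the ground-truth caption prefix through the LSTM.
        h0 = torch.tanh(self.enc(frame_feats.mean(dim=1))).unsqueeze(0)
        dec_out, _ = self.dec(self.emb(caption_in), (h0, torch.zeros_like(h0)))
        return self.out(dec_out)                    # (B, T, VOCAB) logits

model = ToyCaptioner()
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One synthetic batch: 2 videos x 28 frames, captions of length 10.
frame_feats = torch.randn(2, 28, FEAT_DIM)
captions = torch.randint(1, VOCAB, (2, 10))

logits = model(frame_feats, captions[:, :-1])       # predict the next token
loss = criterion(logits.reshape(-1, VOCAB), captions[:, 1:].reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```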

Step 5. Inference

Follow the demo given in video_captioning.ipynb.
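
Inference for these models is typically greedy (or beam) decoding from a start-of-sentence token until an end-of-sentence token. The sketch below continues the toy example from Step 4 (it reuses ToyCaptioner, FEAT_DIM, and VOCAB from that block); the BOS/EOS ids and maximum length are made-up values, and the notebook's actual decode loop may differ.

```python
# Greedy decoding with the ToyCaptioner from the Step 4 sketch.
# Hypothetical special-token ids; the real vocabulary lives in this repo.
BOS_IDX, EOS_IDX, MAX_LEN = 1, 2, 20

model.eval()
with torch.no_grad():
    feats = torch.randn(1, 28, FEAT_DIM)         # features for one video
    tokens = [BOS_IDX]
    for _ in range(MAX_LEN):
        inp = torch.tensor(tokens).unsqueeze(0)  # (1, t) prefix so far
        logits = model(feats, inp)               # (1, t, VOCAB)
        next_tok = int(logits[0, -1].argmax())   # pick the most likely next word
        if next_tok == EOS_IDX:
            break
        tokens.append(next_tok)

print(tokens[1:])  # predicted word ids (map back to words via the vocabulary)
```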

Quantitative Results

*MSVD

| Model | Feature extractor | BLEU-4 | METEOR | ROUGE_L | CIDEr | Pretrained weights |
|---|---|---|---|---|---|---|
| Mean Pooling | Inception-v4 | 42.2 | 31.6 | 68.2 | 69.7 | link |
| SA-LSTM | InceptionResNetV2 | 45.5 | 32.5 | 69.0 | 78.0 | link |
| S2VT | Inception-v4 | - | - | - | - | - |
| RecNet (global) | Inception-v4 | - | - | - | - | - |
| RecNet (local) | Inception-v4 | - | - | - | - | - |
| MARN | Inception-v4 | - | - | - | - | - |

*MSR-VTT

| Model | Feature extractor | BLEU-4 | METEOR | ROUGE_L | CIDEr | Pretrained weights |
|---|---|---|---|---|---|---|
| Mean Pooling | Inception-v4 | 34.9 | 25.5 | 58.12 | 35.76 | link |
| SA-LSTM | Inception-v4 | - | - | - | - | - |
| S2VT | Inception-v4 | - | - | - | - | - |
| RecNet (global) | Inception-v4 | - | - | - | - | - |
| RecNet (local) | Inception-v4 | - | - | - | - | - |
| MARN | Inception-v4 | - | - | - | - | - |

References

[1] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating Videos to Natural Language Using Deep Recurrent Neural Networks. In NAACL-HLT, 2015.

[2] S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko. Sequence to Sequence - Video to Text. In ICCV, 2015.

[3] L. Yao et al. Describing Videos by Exploiting Temporal Structure. In ICCV, 2015.

[4] B. Wang et al. Reconstruction Network for Video Captioning. In CVPR, 2018.

[5] W. Pei, J. Zhang, X. Wang, L. Ke, X. Shen, and Y.-W. Tai. Memory-Attended Recurrent Network for Video Captioning. In CVPR, 2019.

Acknowledgement

Some of the coding ideas and the extracted video features come from hobincar/pytorch-video-feature-extractor. Many thanks!
