Skip to content
/ DPC Public

Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.

License

Notifications You must be signed in to change notification settings

TengdaHan/DPC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Video Representation Learning by Dense Predictive Coding

This repository contains the implementation of Dense Predictive Coding (DPC).

Links: [Arxiv] [Video] [Project page]

arch

DPC Results

Original result from our paper:

Pretrain Dataset Resolution Backbone Finetune Acc@1 (UCF101) Finetune Acc@1 (HMDB51)
UCF101 128x128 2d3d-R18 60.6 -
Kinetics400 128x128 2d3d-R18 68.2 34.5
Kinetics400 224x224 2d3d-R34 75.7 35.7

Also re-implemented by other researchers:

Pretrain Dataset Resolution Backbone Finetune Acc@1 (UCF101) Finetune Acc@1 (HMDB51)
UCF101 128x128 2d3d-R18 61.35 @kayush95 45.31 @kayush95

News

Installation

The implementation should work with python >= 3.6, pytorch >= 0.4, torchvision >= 0.2.2.

The repo also requires cv2 (conda install -c menpo opencv), tensorboardX >= 1.7 (pip install tensorboardX), joblib, tqdm, ipdb.

Prepare data

Follow the instructions here.

Self-supervised training (DPC)

Change directory cd DPC/dpc/

  • example: train DPC-RNN using 2 GPUs, with 3D-ResNet18 backbone, on Kinetics400 dataset with 128x128 resolution, for 300 epochs

    python main.py --gpu 0,1 --net resnet18 --dataset k400 --batch_size 128 --img_dim 128 --epochs 300
    
  • example: train DPC-RNN using 4 GPUs, with 3D-ResNet34 backbone, on Kinetics400 dataset with 224x224 resolution, for 150 epochs

    python main.py --gpu 0,1,2,3 --net resnet34 --dataset k400 --batch_size 44 --img_dim 224 --epochs 150
    

Evaluation: supervised action classification

Change directory cd DPC/eval/

  • example: finetune pretrained DPC weights (replace {model.pth.tar} with pretrained DPC model)

    python test.py --gpu 0,1 --net resnet18 --dataset ucf101 --batch_size 128 --img_dim 128 --pretrain {model.pth.tar} --train_what ft --epochs 300
    
  • example (continued): test the finetuned model (replace {finetune_model.pth.tar} with finetuned classifier model)

    python test.py --gpu 0,1 --net resnet18 --dataset ucf101 --batch_size 128 --img_dim 128 --test {finetune_model.pth.tar}
    

DPC-pretrained weights

It took us more than 1 week to train the 3D-ResNet18 DPC model on Kinetics-400 with 128x128 resolution, and it tooks about 6 weeks to train the 3D-ResNet34 DPC model on Kinetics-400 with 224x224 resolution (with 4 Nvidia P40 GPUs).

Download link:

Citation

If you find the repo useful for your research, please consider citing our paper:

@InProceedings{Han19dpc,
  author       = "Tengda Han and Weidi Xie and Andrew Zisserman",
  title        = "Video Representation Learning by Dense Predictive Coding",
  booktitle    = "Workshop on Large Scale Holistic Video Understanding, ICCV",
  year         = "2019",
}

For any questions, welcome to create an issue or contact Tengda Han (htd@robots.ox.ac.uk).

About

Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages