SlowOnly

Introduction

[ALGORITHM]

@inproceedings{feichtenhofer2019slowfast,
  title={Slowfast networks for video recognition},
  author={Feichtenhofer, Christoph and Fan, Haoqi and Malik, Jitendra and He, Kaiming},
  booktitle={Proceedings of the IEEE international conference on computer vision},
  pages={6202--6211},
  year={2019}
}

Model Zoo

Kinetics-400

config	resolution	gpus	backbone	pretrain	top1 acc	top5 acc	inference_time(video/s)	gpu_mem(M)	ckpt	log	json
slowonly_r50_4x16x1_256e_kinetics400_rgb	short-side 256	8x4	ResNet50	None	72.76	90.51	x	3168	ckpt	log	json
slowonly_r50_video_4x16x1_256e_kinetics400_rgb	short-side 320	8x2	ResNet50	None	72.90	90.82	x	8472	ckpt	log	json
slowonly_r50_8x8x1_256e_kinetics400_rgb	short-side 256	8x4	ResNet50	None	74.42	91.49	x	5820	ckpt	log	json
slowonly_r50_4x16x1_256e_kinetics400_rgb	short-side 320	8x2	ResNet50	None	73.02	90.77	4.0 (40x3 frames)	3168	ckpt	log	json
slowonly_r50_8x8x1_256e_kinetics400_rgb	short-side 320	8x3	ResNet50	None	74.93	91.92	2.3 (80x3 frames)	5820	ckpt	log	json
slowonly_imagenet_pretrained_r50_4x16x1_150e_kinetics400_rgb	short-side 320	8x2	ResNet50	ImageNet	73.39	91.12	x	3168	ckpt	log	json
slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb	short-side 320	8x4	ResNet50	ImageNet	75.55	92.04	x	5820	ckpt	log	json
slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb	short-side 320	8x2	ResNet50	ImageNet	74.54	91.73	x	4435	ckpt	log	json
slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb	short-side 320	8x4	ResNet50	ImageNet	76.07	92.42	x	8895	ckpt	log	json
slowonly_r50_4x16x1_256e_kinetics400_flow	short-side 320	8x2	ResNet50	ImageNet	61.79	83.62	x	8450	ckpt	log	json
slowonly_r50_8x8x1_196e_kinetics400_flow	short-side 320	8x4	ResNet50	ImageNet	65.76	86.25	x	8455	ckpt	log	json

Kinetics-400 Data Benchmark

In the data benchmark, we compare three data preprocessing methods: (1) resize the video to 340x256, (2) resize the short edge of the video to 320px, (3) resize the short edge of the video to 256px.

config	resolution	gpus	backbone	Input	pretrain	top1 acc	top5 acc	testing protocol	ckpt	log	json
slowonly_r50_randomresizedcrop_340x256_4x16x1_256e_kinetics400_rgb	340x256	8x2	ResNet50	4x16	None	71.61	90.05	10 clips x 3 crops	ckpt	log	json
slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb	short-side 320	8x2	ResNet50	4x16	None	73.02	90.77	10 clips x 3 crops	ckpt	log	json
slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb	short-side 256	8x4	ResNet50	4x16	None	72.76	90.51	10 clips x 3 crops	ckpt	log	json
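
The difference between the three preprocessing methods is easiest to see in the frame sizes they produce. The following is a minimal sketch, not the preprocessing code used for the benchmark: the function names are hypothetical, and the rounding is an assumption (real resizing tools may round to even dimensions instead).

```python
def fixed_resize(width, height, target=(340, 256)):
    """Method (1): every video becomes 340x256, so the aspect ratio may change."""
    return target


def short_side_resize(width, height, short_side):
    """Methods (2) and (3): scale so the shorter edge equals short_side,
    keeping the aspect ratio (rounded to the nearest pixel)."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)


# A hypothetical 1280x720 source video under each method.
source = (1280, 720)
print(fixed_resize(*source))            # (340, 256)
print(short_side_resize(*source, 320))  # (569, 320)
print(short_side_resize(*source, 256))  # (455, 256)
```

Only the short-side variants preserve the original aspect ratio; the fixed 340x256 resize stretches or squeezes any video that is not already close to 4:3.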

Kinetics-400 OmniSource Experiments

config	resolution	backbone	pretrain	w. OmniSource	top1 acc	top5 acc	ckpt	log	json
slowonly_r50_4x16x1_256e_kinetics400_rgb	short-side 320	ResNet50	None	❌	73.0	90.8	ckpt	log	json
x	x	ResNet50	None	✔️	76.8	92.5	ckpt	x	x
slowonly_r101_8x8x1_196e_kinetics400_rgb	x	ResNet101	None	❌	76.5	92.7	ckpt	x	x
x	x	ResNet101	None	✔️	80.4	94.4	ckpt	x	x

Kinetics-600

config	resolution	gpus	backbone	pretrain	top1 acc	top5 acc	ckpt	log	json
slowonly_r50_video_8x8x1_256e_kinetics600_rgb	short-side 256	8x4	ResNet50	None	77.5	93.7	ckpt	log	json

Kinetics-700

config	resolution	gpus	backbone	pretrain	top1 acc	top5 acc	ckpt	log	json
slowonly_r50_video_8x8x1_256e_kinetics700_rgb	short-side 256	8x4	ResNet50	None	65.0	86.1	ckpt	log	json

GYM99

config	resolution	gpus	backbone	pretrain	top1 acc	mean class acc	ckpt	log	json
slowonly_imagenet_pretrained_r50_4x16x1_120e_gym99_rgb	short-side 256	8x2	ResNet50	ImageNet	79.3	70.2	ckpt	log	json
slowonly_kinetics_pretrained_r50_4x16x1_120e_gym99_flow	short-side 256	8x2	ResNet50	Kinetics	80.3	71.0	ckpt	log	json
1:1 Fusion					83.7	74.8
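
The fusion row combines the RGB and flow models with equal weights. The following is a minimal sketch of such a late fusion, assuming that fusion means a weighted average of the per-class scores of the two streams; the function names and the example scores are hypothetical and do not come from the released checkpoints.

```python
def fuse_scores(rgb_scores, flow_scores, rgb_weight=1.0, flow_weight=1.0):
    """Weighted average of two per-class score lists for a single video.
    Equal weights (the defaults) correspond to a 1:1 fusion."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must score the same set of classes")
    total = rgb_weight + flow_weight
    return [(rgb_weight * r + flow_weight * f) / total
            for r, f in zip(rgb_scores, flow_scores)]


def predict(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical class probabilities for one video from each stream.
rgb = [0.50, 0.30, 0.20]
flow = [0.10, 0.70, 0.20]
fused = fuse_scores(rgb, flow)
print(predict(rgb), predict(flow), predict(fused))  # 0 1 1
```

In this toy case the two streams disagree and the fused scores follow the more confident one, which is the effect that lets a fusion outperform either stream alone.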

Notes:

The gpus column indicates the number of GPUs we used to get the checkpoint. Note that the configs we provide assume 8 GPUs by default. According to the Linear Scaling Rule, you may need to set the learning rate proportional to the batch size if you use a different number of GPUs or videos per GPU, e.g., lr=0.01 for 4 GPUs x 2 videos/GPU and lr=0.08 for 16 GPUs x 4 videos/GPU.
The inference_time is measured with this benchmark script, which uses the frame sampling strategy of the test setting and counts only the model inference time, excluding the IO time and pre-processing time. For each setting, we use 1 GPU and set the batch size (videos per GPU) to 1 to calculate the inference time.
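
The Linear Scaling Rule in the first note can be written as a one-line computation. The following is a minimal sketch that uses the reference point from that note (lr=0.01 at 4 GPUs x 2 videos per GPU); the helper name `scale_lr` is hypothetical and is not part of the training tools.

```python
def scale_lr(base_lr, base_batch_size, num_gpus, videos_per_gpu):
    """Scale the learning rate linearly with the total batch size."""
    total_batch_size = num_gpus * videos_per_gpu
    return base_lr * total_batch_size / base_batch_size


# Reference point from the note: lr=0.01 for 4 GPUs x 2 videos per GPU.
BASE_LR = 0.01
BASE_BATCH_SIZE = 4 * 2

# 16 GPUs x 4 videos per GPU is 8 times the reference batch size,
# so the learning rate is 8 times larger (about 0.08, as in the note).
print(scale_lr(BASE_LR, BASE_BATCH_SIZE, num_gpus=16, videos_per_gpu=4))

# A hypothetical single-GPU run with 4 videos per GPU halves the batch size.
print(scale_lr(BASE_LR, BASE_BATCH_SIZE, num_gpus=1, videos_per_gpu=4))
```

The computed value would then replace the lr in the optimizer settings of the chosen config.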

For more details on data preparation, you can refer to Kinetics400 in Data Preparation.

Train

You can use the following command to train a model.

python tools/train.py ${CONFIG_FILE} [optional arguments]

Example: train the SlowOnly model on the Kinetics-400 dataset deterministically, with periodic validation.

python tools/train.py configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py \
    --work-dir work_dirs/slowonly_r50_4x16x1_256e_kinetics400_rgb \
    --validate --seed 0 --deterministic

For more details, you can refer to the Training setting part in getting_started.

Test

You can use the following command to test a model.

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]

Example: test the SlowOnly model on the Kinetics-400 dataset and dump the result to a json file.

python tools/test.py configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py \
    checkpoints/SOME_CHECKPOINT.pth --eval top_k_accuracy mean_class_accuracy \
    --out result.json --average-clips=prob

For more details, you can refer to the Test a dataset part in getting_started.
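
The test example passes `--average-clips=prob`, and the benchmark tables report results for 10 clips x 3 crops, so each video is scored from several views. The following is a minimal sketch of how such views could be combined and scored, assuming that `prob` means "apply softmax to each view, then average the probabilities"; the function names and logits are hypothetical, and this is not the code that `tools/test.py` runs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_clips_prob(view_logits):
    """Softmax each view (clip or crop), then average per class."""
    probs = [softmax(view) for view in view_logits]
    num_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]


def top_k_hit(scores, label, k):
    """True if the ground-truth label is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


# Hypothetical logits for two views of one video with three classes.
views = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
video_scores = average_clips_prob(views)
print(top_k_hit(video_scores, label=0, k=1))  # False
print(top_k_hit(video_scores, label=0, k=2))  # True
```

Top-k accuracy over a dataset is then the fraction of videos for which `top_k_hit` is true.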
