
Multi-Fiber Networks for Video Recognition

This repository contains the code and trained models of:

Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng. "Multi-Fiber Networks for Video Recognition" (PDF).


We use MXNet @92053bd for image classification and PyTorch 0.4.0a0@a83c240 for video classification.


The mean RGB value [124, 117, 104] is subtracted from the inputs, which are then multiplied by 0.0167.
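As an illustration of the normalization described above, here is a minimal NumPy sketch. The function name `normalize_frames` and the channels-last RGB layout are assumptions made for this example; they are not taken from the repository's data loader.

```python
import numpy as np

# Values stated in the text above.
MEAN_RGB = np.array([124.0, 117.0, 104.0], dtype=np.float32)
SCALE = 0.0167

def normalize_frames(frames):
    """Subtract the per-channel mean, then scale.

    `frames` is assumed to be an array of shape (..., 3) with the
    channels last and in RGB order (hypothetical layout for this sketch).
    """
    frames = np.asarray(frames, dtype=np.float32)
    return (frames - MEAN_RGB) * SCALE

# Example: a single 1x2 "image" with two pixels.
pixels = np.array([[[124, 117, 104], [224, 217, 204]]], dtype=np.uint8)
normalized = normalize_frames(pixels)
```

A pixel equal to the mean maps to zero, and a pixel 100 above the mean in every channel maps to roughly 1.67.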


Train motion from scratch:


Fine-tune with pre-trained model:




Evaluate the trained model:

cd test
# the default setting is to test trained model on ucf-101 (split1)


Image Recognition (ImageNet-1k)

Single Model, Single Crop Validation Accuracy:

| Model | Params | FLOPs | Top-1 | Top-5 | MXNet Model |
|---|---|---|---|---|---|
| ResNet-18 (reproduced) | 11.7 M | 1.8 G | 71.4 % | 90.2 % | GoogleDrive |
| ResNet-18 (MF embedded) | 9.6 M | 1.6 G | 74.3 % | 92.1 % | GoogleDrive |
| MF-Net (N=16) | 5.8 M | 861 M | 74.6 % | 92.0 % | GoogleDrive |
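For readers unfamiliar with the Top-1 and Top-5 columns, the following is a generic sketch of how top-k accuracy is usually computed from classifier scores. It is not the repository's evaluation code; the function name `topk_accuracy` and the toy scores are made up for illustration.

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: array of shape (num_samples, num_classes)
    labels: array of shape (num_samples,) with integer class ids
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k largest scores per sample (order within the k is irrelevant).
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example with 3 samples and 3 classes.
toy_scores = [[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = topk_accuracy(toy_scores, toy_labels, k=1)
top2 = topk_accuracy(toy_scores, toy_labels, k=2)
```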

Video Recognition (UCF-101, HMDB51, Kinetics)

| Model | Params | Target Dataset | Top-1 |
|---|---|---|---|
| MF-Net (3D) | 8.0 M | Kinetics | 72.8 % |
| MF-Net (3D) | 8.0 M | UCF-101 | 96.0 %* |
| MF-Net (3D) | 8.0 M | HMDB51 | 74.6 %* |

* Accuracy averaged over split1, split2, and split3.

Trained Models

| Model | Target Dataset | PyTorch Model |
|---|---|---|
| MF-Net (2D) | ImageNet-1k | GoogleDrive |
| MF-Net (3D) | Kinetics | GoogleDrive |
| MF-Net (3D) | UCF-101 (split1) | GoogleDrive |
| MF-Net (3D) | HMDB51 (split1) | GoogleDrive |

Other Resources

ImageNet-1k Training/Validation List:

ImageNet-1k category name mapping table:

Kinetics Dataset:

UCF-101 Dataset:

HMDB51 Dataset:


Do I need to convert the raw videos to specific format?

  • No. Our 'dataiter' reads directly from raw videos and can tolerate corrupted videos.

How can I make the training faster?

  • Decoding frames from compressed videos consumes a lot of CPU resources and is the speed bottleneck. You can try converting the downloaded videos to another format or reducing their quality. For example:
# convert to short_edge_length = 360
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(360*iw)/min(iw\,ih)):-1" -b:v 640k -an ${DST_VID}
# or, convert to short_edge_length = 256
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(256*iw)/min(iw\,ih)):-1" -b:v 512k -an ${DST_VID}
# or, convert to short_edge_length = 160
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(160*iw)/min(iw\,ih)):-1" -b:v 240k -an ${DST_VID}
  • Use a machine with a faster CPU.
  • The group convolution may not be well optimized.
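To make the ffmpeg scale expression used above easier to reason about, here is a small Python sketch that computes the output width the expression yields and assembles the same command as an argument list. The helper names `scaled_width` and `build_ffmpeg_cmd` and the file names are hypothetical; only the ffmpeg flags are taken from the commands above.

```python
def scaled_width(iw, ih, short_edge):
    """Width given by min(iw, short_edge*iw/min(iw, ih)).

    The short edge is reduced to `short_edge`, and a video whose
    short edge is already smaller is not upscaled.
    """
    return min(iw, short_edge * iw / min(iw, ih))

def build_ffmpeg_cmd(src, dst, short_edge=256, bitrate="512k"):
    """Build the argument list for one of the conversions shown above."""
    # The commas inside the filter expression are escaped with a backslash,
    # exactly as in the shell commands above.
    vf = "scale=min(iw\\,({0}*iw)/min(iw\\,ih)):-1".format(short_edge)
    return ["ffmpeg", "-y", "-i", src, "-c:v", "mpeg4",
            "-filter:v", vf, "-b:v", bitrate, "-an", dst]

# A 1280x720 video scaled to a short edge of 360 becomes 640 wide.
width = scaled_width(1280, 720, 360)
cmd = build_ffmpeg_cmd("input.mp4", "output.mp4", short_edge=360, bitrate="640k")
# To run it (requires ffmpeg on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues when file names contain spaces.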


If you use our code/model in your work or find it helpful, please cite the paper:

  title={Multi-Fiber networks for Video Recognition},
  author={Chen, Yunpeng and Kalantidis, Yannis and Li, Jianshu and Yan, Shuicheng and Feng, Jiashi},
  booktitle={European Conference on Computer Vision (ECCV)},