
Multi-Fiber Networks for Video Recognition

This repository contains the code and trained models of:

Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng. "Multi-Fiber Networks for Video Recognition" (PDF).

Implementation

We use MXNet @92053bd for image classification and PyTorch 0.4.0a0@a83c240 for video classification.

Normalization

The mean RGB value [ 124, 117, 104 ] is subtracted from the inputs, which are then multiplied by 0.0167.
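As a minimal sketch of this normalization, the snippet below applies the mean subtraction and scaling to plain Python lists. The names `MEAN_RGB`, `SCALE`, `normalize_pixel`, and `normalize_frame` are hypothetical and are not taken from this repository; the sketch also assumes pixel values in the 0-255 range and RGB channel order.

```python
# Hypothetical helper names; the constants come from the Normalization section.
MEAN_RGB = (124.0, 117.0, 104.0)  # per-channel mean, assumed to be in R, G, B order
SCALE = 0.0167                    # roughly 1/60


def normalize_pixel(rgb):
    """Subtract the per-channel mean, then multiply by the scale factor."""
    return tuple((c - m) * SCALE for c, m in zip(rgb, MEAN_RGB))


def normalize_frame(frame):
    """Normalize a frame given as rows of (R, G, B) tuples."""
    return [[normalize_pixel(px) for px in row] for row in frame]


if __name__ == "__main__":
    frame = [[(124, 117, 104), (255, 255, 255)]]
    print(normalize_frame(frame))
```

A pixel equal to the mean maps to zero in every channel. In a real pipeline the same arithmetic would normally be applied to a whole tensor at once rather than pixel by pixel.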

Usage

Train the model from scratch on Kinetics:

python train_kinetics.py

Fine-tune from a pre-trained model:

python train_ucf101.py

or

python train_hmdb51.py

Evaluate the trained model:

cd test
# by default, this tests the trained model on UCF-101 (split1)
python evaluate_video.py

Results

Image Recognition (ImageNet-1k)

Single Model, Single Crop Validation Accuracy:

| Model | Params | FLOPs | Top-1 | Top-5 | MXNet Model |
|---|---|---|---|---|---|
| ResNet-18 (reproduced) | 11.7 M | 1.8 G | 71.4 % | 90.2 % | GoogleDrive |
| ResNet-18 (MF embedded) | 9.6 M | 1.6 G | 74.3 % | 92.1 % | GoogleDrive |
| MF-Net (N=16) | 5.8 M | 861 M | 74.6 % | 92.0 % | GoogleDrive |

Video Recognition (UCF-101, HMDB51, Kinetics)

| Model | Params | Target Dataset | Top-1 |
|---|---|---|---|
| MF-Net (3D) | 8.0 M | Kinetics | 72.8 % |
| MF-Net (3D) | 8.0 M | UCF-101 | 96.0 %* |
| MF-Net (3D) | 8.0 M | HMDB51 | 74.6 %* |

* Accuracy averaged over split1, split2, and split3.

Trained Models

| Model | Target Dataset | PyTorch Model |
|---|---|---|
| MF-Net (2D) | ImageNet-1k | GoogleDrive |
| MF-Net (3D) | Kinetics | GoogleDrive |
| MF-Net (3D) | UCF-101 (split1) | GoogleDrive |
| MF-Net (3D) | HMDB51 (split1) | GoogleDrive |

Other Resources

ImageNet-1k Training/Validation List:

ImageNet-1k category name mapping table:

Kinetics Dataset:

UCF-101 Dataset:

HMDB51 Dataset:

FAQ

Do I need to convert the raw videos to a specific format?

  • No. Our `dataiter` supports reading from raw videos and can tolerate corrupted videos.

How can I make the training faster?

  • Decoding frames from compressed videos consumes a lot of CPU resources and is the bottleneck for training speed. You can try converting the downloaded videos to another format or reducing the video quality. For example:
# convert to short_edge_length = 360
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(360*iw)/min(iw\,ih)):-1" -b:v 640k -an ${DST_VID}
# or, convert to short_edge_length = 256
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(256*iw)/min(iw\,ih)):-1" -b:v 512k -an ${DST_VID}
# or, convert to short_edge_length = 160
ffmpeg -y -i ${SRC_VID} -c:v mpeg4 -filter:v "scale=min(iw\,(160*iw)/min(iw\,ih)):-1" -b:v 240k -an ${DST_VID}
  • Find another computer with a better CPU.
  • The group convolution may not be well optimized.
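
The ffmpeg commands above can be applied to a whole dataset with a small script. The following is a rough sketch, not part of this repository: `scaled_size`, `ffmpeg_command`, `convert_tree`, and the `BITRATES` table are hypothetical names, and the `*.mp4` glob and output layout are assumptions. `scaled_size` approximates the `scale` expression to show that the `min(iw, ...)` term shrinks the short edge to the target but never upscales a video that is already smaller; ffmpeg's own rounding may differ by a pixel.

```python
import subprocess
from pathlib import Path

# Bitrates paired with each short-edge length, as in the commands above.
BITRATES = {360: "640k", 256: "512k", 160: "240k"}


def scaled_size(iw, ih, short_edge):
    """Approximate the output size of scale=min(iw,(S*iw)/min(iw,ih)):-1."""
    w = min(iw, (short_edge * iw) // min(iw, ih))
    h = round(ih * w / iw)  # "-1" keeps the aspect ratio
    return w, h


def ffmpeg_command(src, dst, short_edge):
    """Build the argument list for one conversion (no shell quoting needed)."""
    vf = f"scale=min(iw\\,({short_edge}*iw)/min(iw\\,ih)):-1"
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", "mpeg4",
            "-filter:v", vf, "-b:v", BITRATES[short_edge], "-an", str(dst)]


def convert_tree(src_root, dst_root, short_edge=256):
    """Convert every .mp4 under src_root, mirroring the directory layout."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    for src in sorted(src_root.rglob("*.mp4")):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(ffmpeg_command(src, dst, short_edge), check=True)


if __name__ == "__main__":
    print(scaled_size(1280, 720, 360))  # landscape video, short edge reduced
    print(scaled_size(320, 240, 360))   # already small, left unchanged
    print(" ".join(ffmpeg_command("in.mp4", "out.mp4", 256)))
```

Calling `convert_tree("raw_videos", "videos_256", short_edge=256)` would then run one ffmpeg process per file; the directory names here are placeholders.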

Citation

If you use our code or models in your work, or find them helpful, please cite the paper:

@inproceedings{chen2018multifiber,
  title={Multi-Fiber Networks for Video Recognition},
  author={Chen, Yunpeng and Kalantidis, Yannis and Li, Jianshu and Yan, Shuicheng and Feng, Jiashi},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2018}
}