Omni-sourced Webly-supervised Learning for Video Recognition

Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin

In ECCV, 2020. Paper

pipeline

Model Zoo

Kinetics-400 Model Release

We have released 4 models trained with the OmniSource framework, covering both 2D and 3D architectures. The following table compares the performance of models trained with and without OmniSource.

| Model | Modality | Pretrained | Backbone | Input | Resolution | Top-1 (Baseline / OmniSource (Delta)) | Top-5 (Baseline / OmniSource (Delta)) | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TSN | RGB | ImageNet | ResNet50 | 3seg | 340x256 | 70.6 / 73.6 (+ 3.0) | 89.4 / 91.0 (+ 1.6) | Baseline / OmniSource |
| TSN | RGB | IG-1B | ResNet50 | 3seg | short-side 320 | 73.1 / 75.7 (+ 2.6) | 90.4 / 91.9 (+ 1.5) | Baseline / OmniSource |
| SlowOnly | RGB | Scratch | ResNet50 | 4x16 | short-side 320 | 72.9 / 76.8 (+ 3.9) | 90.9 / 92.5 (+ 1.6) | Baseline / OmniSource |
| SlowOnly | RGB | Scratch | ResNet101 | 8x8 | short-side 320 | 76.5 / 80.4 (+ 3.9) | 92.7 / 94.4 (+ 1.7) | Baseline / OmniSource |
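
Each delta in the table is the OmniSource accuracy minus the baseline accuracy. As a quick sanity check, the sketch below recomputes the Top-1 deltas from the numbers above. The dictionary keys and the helper name `top1_gain` are illustrative assumptions and do not correspond to any released config or API.

```python
# Top-1 accuracy pairs (baseline, OmniSource) copied from the table above.
# The keys are illustrative labels, not official model or config names.
KINETICS400_TOP1 = {
    "TSN-ImageNet-R50-3seg": (70.6, 73.6),
    "TSN-IG1B-R50-3seg": (73.1, 75.7),
    "SlowOnly-R50-4x16": (72.9, 76.8),
    "SlowOnly-R101-8x8": (76.5, 80.4),
}


def top1_gain(baseline: float, omni: float) -> float:
    """Return the absolute Top-1 improvement, rounded to one decimal place."""
    return round(omni - baseline, 1)


gains = {name: top1_gain(b, o) for name, (b, o) in KINETICS400_TOP1.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain}")
```

Rounding to one decimal place avoids floating-point noise and matches the precision reported in the table.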

Benchmark on Mini-Kinetics

We release a subset of the web dataset used in the OmniSource paper, namely the web data for the 200 classes of Mini-Kinetics. The statistics of these datasets are detailed in preparing_omnisource. To obtain the data, you need to fill in a data request form; once we receive your request, the download link will be sent to you. For more details on the released OmniSource web dataset, please refer to preparing_omnisource.

We benchmark the OmniSource framework on the released subset; the results are listed in the following tables (we report Top-1 and Top-5 accuracy on the Mini-Kinetics validation set). The benchmark can be used as a baseline for video recognition with web data.

TSN-8seg-ResNet50

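| Setting | Top-1 | Top-5 | ckpt | json | log |
| --- | --- | --- | --- | --- | --- |
| Baseline | 77.4 | 93.6 | ckpt | json | log |
| +GG-img | 78.0 | 93.6 | ckpt | json | log |
| +[GG-IG]-img | 78.6 | 93.6 | ckpt | json | log |
| +IG-vid | 80.6 | 95.0 | ckpt | json | log |
| +KRaw | 78.6 | 93.2 | ckpt | json | log |
| OmniSource | 81.3 | 94.8 | ckpt | json | log |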

SlowOnly-8x8-ResNet50

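| Setting | Top-1 | Top-5 | ckpt | json | log |
| --- | --- | --- | --- | --- | --- |
| Baseline | 78.6 | 93.9 | ckpt | json | log |
| +GG-img | 80.8 | 95.0 | ckpt | json | log |
| +[GG-IG]-img | 81.3 | 95.2 | ckpt | json | log |
| +IG-vid | 82.4 | 95.6 | ckpt | json | log |
| +KRaw | 80.3 | 94.5 | ckpt | json | log |
| OmniSource | 82.9 | 95.8 | ckpt | json | log |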
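
To read the two Mini-Kinetics tables side by side, it can help to express each setting as a Top-1 gain over its own baseline. The sketch below does this with the numbers reported above and picks the single web source with the largest gain. The names `MINI_KINETICS_TOP1`, `gains_over_baseline`, and `best_single_source` are hypothetical and exist only for this illustration.

```python
# Top-1 accuracy on Mini-Kinetics validation, copied from the two tables above.
MINI_KINETICS_TOP1 = {
    "TSN-8seg-ResNet50": {
        "Baseline": 77.4, "+GG-img": 78.0, "+[GG-IG]-img": 78.6,
        "+IG-vid": 80.6, "+KRaw": 78.6, "OmniSource": 81.3,
    },
    "SlowOnly-8x8-ResNet50": {
        "Baseline": 78.6, "+GG-img": 80.8, "+[GG-IG]-img": 81.3,
        "+IG-vid": 82.4, "+KRaw": 80.3, "OmniSource": 82.9,
    },
}


def gains_over_baseline(results: dict) -> dict:
    """Map each non-baseline setting to its Top-1 gain over the baseline."""
    base = results["Baseline"]
    return {
        setting: round(acc - base, 1)
        for setting, acc in results.items()
        if setting != "Baseline"
    }


def best_single_source(results: dict) -> str:
    """Return the single-source setting (excluding OmniSource) with the largest gain."""
    gains = gains_over_baseline(results)
    gains.pop("OmniSource")  # OmniSource combines all sources, so skip it here
    return max(gains, key=gains.get)


for model, results in MINI_KINETICS_TOP1.items():
    print(model, gains_over_baseline(results), "best:", best_single_source(results))
```

On these numbers, the web video source (+IG-vid) gives the largest single-source gain for both backbones, and the full OmniSource setting gives the largest gain overall.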

For comparison, we also list the benchmark from the original paper, which was run on Kinetics-400 (each cell shows Top-1 / Top-5 accuracy):

| Model | Baseline | +GG-img | +[GG-IG]-img | +IG-vid | +KRaw | OmniSource |
| --- | --- | --- | --- | --- | --- | --- |
| TSN-3seg-ResNet50 | 70.6 / 89.4 | 71.5 / 89.5 | 72.0 / 90.0 | 72.0 / 90.3 | 71.7 / 89.6 | 73.6 / 91.0 |
| SlowOnly-4x16-ResNet50 | 73.8 / 90.9 | 74.5 / 91.4 | 75.2 / 91.6 | 75.2 / 91.7 | 74.5 / 91.1 | 76.6 / 92.5 |

Citing OmniSource

If you find OmniSource useful for your research, please consider citing the paper using the following BibTeX entry.

@article{duan2020omni,
  title={Omni-sourced Webly-supervised Learning for Video Recognition},
  author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua},
  journal={arXiv preprint arXiv:2003.13042},
  year={2020}
}