MODEL ZOO


TAdaConvV2

Kinetics 710 pretrained

| arch. | pt. | #frames | ckp. |
| --- | --- | --- | --- |
| TAdaFormer-B/16 | CLIP | 16 | ckp |
| TAdaFormer-L/14 | CLIP | 16 | ckp |
| TAdaFormer-L/14 | CLIP | 32 | ckp |
| TAdaFormer-L/14 | CLIP | 64 | ckp |

Kinetics 400

| arch. | pt. | #frames | GFLOPS | top1 | ckp. |
| --- | --- | --- | --- | --- | --- |
| TAdaConvNeXtV2-T | IN1K | 16 | 47x3x4 | 79.6 | ckp |
| TAdaConvNeXtV2-T | IN1K | 32 | 94x3x4 | 80.8 | ckp |
| TAdaConvNeXtV2-S | IN1K | 16 | 91x3x4 | 80.8 | ckp |
| TAdaConvNeXtV2-S | IN1K | 32 | 183x3x4 | 81.9 | ckp |
| TAdaConvNeXtV2-S | IN21K | 32 | 183x3x4 | 82.9 | ckp |
| TAdaConvNeXtV2-B | IN1K | 16 | 162x3x4 | 81.4 | ckp |
| TAdaConvNeXtV2-B | IN1K | 32 | 324x3x4 | 82.3 | ckp |
| TAdaConvNeXtV2-B | IN21K | 32 | 324x3x4 | 83.7 | ckp |

| arch. | pt. | #frames | GFLOPS | top1 | ckp. |
| --- | --- | --- | --- | --- | --- |
| TAdaFormer-B/16 | CLIP | 16 | 153x3x4 | 84.5 | ckp |
| TAdaFormer-L/14 | CLIP | 16 | 703x3x4 | 87.6 | ckp |
| TAdaFormer-B/16 | CLIP+K710 | 16 | 153x3x4 | 86.6 | ckp |
| TAdaFormer-L/14 | CLIP+K710 | 16 | 703x3x4 | 88.9 | ckp |
| TAdaFormer-L/14 | CLIP+K710 | 32 | 1406x3x4 | 89.5 | ckp |
| TAdaFormer-L/14 | CLIP+K710 | 64 | 2812x3x4 | 89.9 | ckp |
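
The GFLOPS column uses a compact `AxBxC` notation such as `47x3x4`. A minimal sketch for turning such an entry into a total inference cost follows. It assumes the common video-recognition convention that the first factor is the cost of a single view and the remaining factors are view counts (for example spatial crops and temporal clips); that reading is an assumption and is not stated in this file. The helper name `parse_gflops` is hypothetical and not part of the codebase.

```python
from math import prod


def parse_gflops(spec):
    """Split a GFLOPS entry such as '47x3x4' into its parts.

    Assumption: the first factor is the per-view cost in GFLOPs and the
    remaining factors are view counts (e.g. crops and clips).
    Returns (per_view_gflops, number_of_views, total_gflops).
    """
    per_view, *view_counts = spec.lower().split("x")
    per_view = float(per_view)
    # prod() of an empty sequence is 1, so a bare '47' means a single view.
    n_views = prod(int(v) for v in view_counts)
    return per_view, n_views, per_view * n_views


per_view, n_views, total = parse_gflops("47x3x4")
print(f"{per_view} GFLOPs per view x {n_views} views = {total} GFLOPs")
```

Under that assumption, comparing the totals rather than the per-view figures gives a fairer picture when two rows use different numbers of views, as the Kinetics (`x3x4`) and Something-Something (`x3x2`) tables do.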

Something-Something

The checkpoints in this part are provided for SSV2.

| arch. | pt. | #frames | GFLOPS | SSV1 | SSV2 | ckp. |
| --- | --- | --- | --- | --- | --- | --- |
| TAdaConvNeXtV2-T | IN1K+K400 | 16 | 47x3x2 | 54.1 | 67.2 | ckp |
| TAdaConvNeXtV2-T | IN1K+K400 | 32 | 94x3x2 | 56.4 | 69.8 | ckp |
| TAdaConvNeXtV2-S | IN1K+K400 | 16 | 91x3x2 | 55.6 | 68.4 | ckp |
| TAdaConvNeXtV2-S | IN1K+K400 | 32 | 183x3x2 | 58.5 | 70.0 | ckp |
| TAdaConvNeXtV2-S | IN21K+K400 | 32 | 183x3x2 | 59.7 | 70.6 | ckp |
| TAdaConvNeXtV2-B | IN21K+K400 | 32 | 324x3x2 | 60.7 | 71.1 | ckp |

| arch. | pt. | #frames | GFLOPS | SSV1 | SSV2 | ckp. |
| --- | --- | --- | --- | --- | --- | --- |
| TAdaFormer-B/16 | CLIP | 16 | 187x3x2 | 59.2 | 70.4 | ckp |
| TAdaFormer-B/16 | CLIP | 32 | 374x3x2 | 61.2 | 71.3 | ckp |
| TAdaFormer-L/14 | CLIP | 16 | 858x3x2 | 62.0 | 72.4 | ckp |
| TAdaFormer-L/14 | CLIP | 32 | 1716x3x2 | 63.7 | 73.6 | ckp |

TAdaConv

Kinetics-400

| architecture | depth | init | clips x crops | #frames x sampling rate | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TAda2D | R50 | IN-1K | 10 x 3 | 8 x 8 | 76.7 | 92.6 | [google drive][baidu(code:p06d)] | tada2d_8x8.yaml |
| TAda2D | R50 | IN-1K | 10 x 3 | 16 x 5 | 77.4 | 93.1 | [google drive][baidu(code:6k8h)] | tada2d_16x5.yaml |
| ViViT Fact. Enc. | B16x2 | IN-21K | 4 x 3 | 32 x 2 | 79.4 | 94.0 | [google drive][baidu(code:1t51)] | vivit_fac_enc_b16x2.yaml |

Something-Something

| architecture | depth | init | clips x crops | #frames | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TAda2D | R50 | IN-1K | 2 x 3 | 8 | 64.2 | 88.0 | [google drive][baidu(code:dlil)] | tada2d_8f.yaml |
| TAda2D | R50 | IN-1K | 2 x 3 | 16 | 65.6 | 89.1 | [google drive][baidu(code:f857)] | tada2d_16f.yaml |

Epic-Kitchens Action Recognition

| architecture | init | resolution | clips x crops | #frames x sampling rate | action acc@1 | verb acc@1 | noun acc@1 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViViT Fact. Enc.-B16x2 | K700 | 320 | 4 x 3 | 32 x 2 | 46.3 | 67.4 | 58.9 | [google drive][baidu(code:rinh)] | vivit_fac_enc.yaml |
| ir-CSN-R152 | K700 | 224 | 10 x 3 | 32 x 2 | 44.5 | 68.4 | 55.9 | [google drive][baidu(code:s0uj)] | csn.yaml |

Epic-Kitchens Temporal Action Localization

| feature | classification | type | IoU@0.1 | IoU@0.2 | IoU@0.3 | IoU@0.4 | IoU@0.5 | Avg | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViViT | ViViT | Verb | 22.90 | 21.93 | 20.74 | 19.08 | 16.00 | 20.13 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
| ViViT | ViViT | Noun | 28.95 | 27.38 | 25.52 | 22.67 | 18.95 | 24.69 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
| ViViT | ViViT | Action | 20.82 | 19.93 | 18.67 | 17.02 | 15.06 | 18.30 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
| TAda2D | TAda2D | Verb | 19.70 | 18.49 | 17.41 | 15.50 | 12.78 | 16.78 | [google drive][baidu(code:d01j)] | - |
| TAda2D | TAda2D | Noun | 20.54 | 19.32 | 17.94 | 15.77 | 13.39 | 17.39 | [google drive][baidu(code:d01j)] | - |
| TAda2D | TAda2D | Action | 15.15 | 14.32 | 13.59 | 12.18 | 10.65 | 13.18 | [google drive][baidu(code:d01j)] | - |

MoSI

Note: for the following models, decord 0.4.1 is used rather than the codebase default of 0.6.0.
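
Because the decord version matters here, it may be worth checking the installed version before evaluating the MoSI checkpoints. The following is a minimal sketch using only the Python standard library. The helper names `installed_decord_version` and `decord_version_matches` are hypothetical and not part of the codebase, and an exact string match is assumed to be the right check.

```python
from importlib import metadata

# Version stated in the note above for the MoSI models.
EXPECTED_DECORD = "0.4.1"


def installed_decord_version():
    """Return the installed decord version string, or None if it is absent."""
    try:
        return metadata.version("decord")
    except metadata.PackageNotFoundError:
        return None


def decord_version_matches(installed, expected=EXPECTED_DECORD):
    """True only when a version is installed and equals the expected one."""
    return installed is not None and installed.strip() == expected


found = installed_decord_version()
if not decord_version_matches(found):
    print(f"decord {EXPECTED_DECORD} expected for MoSI models, found: {found}")
```

If the check fails, pinning the package in a separate environment (for example `pip install decord==0.4.1`) avoids disturbing the 0.6.0 install used by the rest of the codebase.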

Pre-trained

| dataset | backbone | checkpoint | config |
| --- | --- | --- | --- |
| HMDB51 | R-2D3D-18 | [google drive][baidu(code:ahqg)] | pt-hmdb/r2d3ds.yaml |
| HMDB51 | R(2+1)D-10 | [google drive][baidu(code:1ktb)] | pt-hmdb/r2p1d.yaml |
| UCF101 | R-2D3D-18 | [google drive][baidu(code:61uw)] | pt-ucf/r2d3ds.yaml |
| UCF101 | R(2+1)D-10 | [google drive][baidu(code:drq2)] | pt-ucf/r2p1d.yaml |

Finetuned

| dataset | backbone | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- |
| HMDB51 | R-2D3D-18 | 46.93 | 74.71 | [google drive][baidu(code:2puu)] | ft-hmdb/r2d3ds.yaml |
| HMDB51 | R(2+1)D-10 | 51.83 | 78.63 | [google drive][baidu(code:hgnc)] | ft-hmdb/r2p1d.yaml |
| UCF101 | R-2D3D-18 | 71.75 | 89.14 | [google drive][baidu(code:ndt6)] | ft-ucf/r2d3ds.yaml |
| UCF101 | R(2+1)D-10 | 82.79 | 95.78 | [google drive][baidu(code:ecsf)] | ft-ucf/r2p1d.yaml |