Model Zoo

Note

  • For all pretraining and finetuning, we adopt sparse/uniform sampling.
  • #Frame $=$ #input_frame $\times$ #crop $\times$ #clip
  • #input_frame is the number of frames fed to the model per inference.
  • #crop is the number of spatial crops (e.g., 3 for left/right/center).
  • #clip is the number of temporal clips (e.g., 4 means sampling four clips with different start indices).
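
As an illustration of sparse/uniform sampling with several temporal clips, here is a minimal sketch. It assumes a segment-based scheme: the video is split into equal segments, one frame is taken per segment, and each clip uses a different start offset. The function name `sparse_sample_indices` and its parameters are hypothetical and are not taken from this repository's code.

```python
def sparse_sample_indices(total_frames, num_frames, num_clips):
    """Return `num_clips` lists, each holding `num_frames` frame indices.

    Assumption: the video is divided into `num_frames` equal segments and
    one frame is picked per segment; each clip shifts the pick position
    inside the segment so the clips start at different indices.
    """
    seg_len = total_frames / num_frames
    clips = []
    for clip_idx in range(num_clips):
        # Offset inside each segment, spread evenly across the clips.
        offset = seg_len * (clip_idx + 0.5) / num_clips
        indices = [
            min(int(seg * seg_len + offset), total_frames - 1)
            for seg in range(num_frames)
        ]
        clips.append(indices)
    return clips


# Example: a 64-frame video, 16 input frames, 4 temporal clips.
clips = sparse_sample_indices(total_frames=64, num_frames=16, num_clips=4)
for clip in clips:
    print(clip[:4], "...", clip[-1])
```

Each clip covers the whole video at a fixed stride, and only the start index changes between clips.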

| Model | Pretrain-I | Pretrain-V | Finetuned | #Frame | Weight |
| --- | --- | --- | --- | --- | --- |
| ViViM-T | deit, IN1K | - | K400 | 16x3x4 | 🤗 HF link |
| ViViM-S | deit, IN1K | - | K400 | 16x3x4 | 🤗 HF link |
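
To make the #Frame notation concrete, the short sketch below unpacks a spec such as `16x3x4` into #input_frame, #crop, and #clip and multiplies them. The helper name `total_frames_per_video` is hypothetical and used only for illustration.

```python
def total_frames_per_video(frame_spec: str) -> int:
    """Parse a '#input_frame x #crop x #clip' spec and return the product."""
    input_frames, crops, clips = (int(part) for part in frame_spec.lower().split("x"))
    return input_frames * crops * clips


# 16 input frames, 3 spatial crops, 4 temporal clips.
print(total_frames_per_video("16x3x4"))  # 192
```

Under this reading, each video is evaluated with 3 × 4 = 12 forward passes of 16 frames each.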

Models and Usage

Kinetics-400

| Method | #Frame | Top-1 Acc (%) | Top-5 Acc (%) | Shell |
| --- | --- | --- | --- | --- |
| ViViM-T (Ours) | 16 | 77.51 | 93.27 | script.sh |
| ViViM-S (Ours) | 16 | 80.47 | 94.75 | script.sh |