We currently release the code and models for:
- Kinetics-400
- Kinetics-600
- Something-Something V1
- Something-Something V2
**05/21/2022** Lightweight models are released, which surpass X3D and MoViNet.

**01/13/2022** Pretrained models on Kinetics-400, Kinetics-600, and Something-Something V1&V2 are released.
The following models and logs can be downloaded from Google Drive: total_models, total_logs.
We also release the models on Baidu Cloud: total_models (gphp), total_logs (q5bw).
- All the `config.yaml` files in our `exp` folder are NOT the training configs actually used, since some hyperparameters are changed in `run.sh` or `test.sh`.
- All the models are pretrained on ImageNet-1K without Token Labeling and Layer Scale. You can find those pretrained models in image_classification. The reason can be found in issue #12.
- #Frame = #input_frame x #crop x #clip
  - #input_frame means the number of frames input to the model per inference
  - #crop means the number of spatial crops (e.g., 3 for left/right/center)
  - #clip means the number of temporal clips (e.g., 4 means sampling four clips with different start indices)
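To make the arithmetic concrete, here is a tiny sketch (the helper name is ours, for illustration only):

```python
def total_frames(input_frames: int, crops: int, clips: int) -> int:
    """#Frame = #input_frame x #crop x #clip."""
    return input_frames * crops * clips

# 16x1x4 (Kinetics-style multi-clip testing): 64 frames per video
assert total_frames(16, 1, 4) == 64
# 16x3x1 (Something-Something-style multi-crop testing): 48 frames per video
assert total_frames(16, 3, 1) == 48
```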
| Model | #Frame | Resolution | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-XXS | 4x1x1 | 128 | 1.0G | 63.2 | | | run.sh/config |
| UniFormer-XXS | 4x1x1 | 160 | 1.6G | 65.8 | | | run.sh/config |
| UniFormer-XXS | 8x1x1 | 128 | 2.0G | 68.3 | | | run.sh/config |
| UniFormer-XXS | 8x1x1 | 160 | 3.3G | 71.4 | | | run.sh/config |
| UniFormer-XXS | 16x1x1 | 128 | 4.2G | 73.3 | | | run.sh/config |
| UniFormer-XXS | 16x1x1 | 160 | 6.9G | 75.1 | | | run.sh/config |
| UniFormer-XXS | 32x1x1 | 160 | 15.4G | 77.9 | | | run.sh/config |
| UniFormer-XS | 32x1x1 | 192 | 34.2G | 78.6 | | | run.sh/config |
We adopt the sparse sampling method for lightweight models. To avoid NaN loss, we use the following techniques (a minimal sketch of the Layer Scale idea follows this list):
- Disable mixed precision training.
- Use weaker data augmentation.
- Add Layer Scale.
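For reference, Layer Scale multiplies each residual branch by a small learnable per-channel factor so that early training updates stay small. A minimal PyTorch sketch of the idea (our illustration, not this repo's exact implementation):

```python
import torch
import torch.nn as nn

class LayerScale(nn.Module):
    """Multiply a residual branch by a learnable per-channel factor."""
    def __init__(self, dim: int, init_value: float = 1e-5):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * x

# Typical use inside a transformer block:
#   x = x + layer_scale(attn(norm(x)))
# Starting gamma near zero keeps the residual branch small early in
# training, which helps avoid divergence (NaN loss).
```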
| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 8x1x4 | 8 | 70G | 78.4 | | | run.sh/config |
| UniFormer-S | 16x1x4 | 4 | 167G | 80.8 | | | run.sh/config |
| UniFormer-S | 16x1x4 | 8 | 167G | 80.8 | | | run.sh/config |
| UniFormer-S | 32x1x4 | 4 | 438G | 82.0 | | - | run.sh/config |
| UniFormer-B | 8x1x4 | 8 | 161G | 79.8 | | | run.sh/config |
| UniFormer-B | 16x1x4 | 4 | 387G | 82.0 | | | run.sh/config |
| UniFormer-B | 16x1x4 | 8 | 387G | 81.7 | | | run.sh/config |
| UniFormer-B | 32x1x4 | 4 | 1036G | 82.9 | | | run.sh/config |
| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 16x1x4 | 4 | 167G | 82.8 | | | run.sh/config |
| UniFormer-S | 16x1x4 | 8 | 167G | 82.7 | | | run.sh/config |
| UniFormer-B | 16x1x4 | 4 | 387G | 84.0 | | | run.sh/config |
| UniFormer-B | 16x1x4 | 8 | 387G | 83.4 | | | run.sh/config |
| UniFormer-B | 32x1x4 | 4 | 1036G | 84.5* | | | run.sh/config |
\* Since Kinetics-600 is too large to train on a single node (>1 month with 8 A100 GPUs), we provide a model trained on multiple nodes (around 2 weeks with 32 V100 GPUs), but the result is lower due to the lack of hyperparameter tuning.
For multi-node training, please install submitit or follow the training scripts in our UniFormerV2.
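If you choose submitit, the launch pattern looks roughly like the sketch below (a minimal sketch; the partition name and the training entry point are placeholders, not values from this repo):

```python
import submitit

def train():
    # placeholder: call your actual training entry point here,
    # e.g., the SlowFast/UniFormer training loop
    ...

executor = submitit.AutoExecutor(folder="submitit_logs")
executor.update_parameters(
    nodes=4,                           # e.g., 32 GPUs as 4 nodes x 8 GPUs
    gpus_per_node=8,
    tasks_per_node=8,                  # one process per GPU
    timeout_min=24 * 60,
    slurm_partition="your_partition",  # placeholder
)
job = executor.submit(train)
print(job.job_id)
```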
| Model | Pretrain | #Frame | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | K400 | 16x3x1 | 125G | 57.2 | | | run.sh/config |
| UniFormer-S | K600 | 16x3x1 | 125G | 57.6 | | | run.sh/config |
| UniFormer-S | K400 | 32x3x1 | 329G | 58.8 | | | run.sh/config |
| UniFormer-S | K600 | 32x3x1 | 329G | 59.9 | | | run.sh/config |
| UniFormer-B | K400 | 16x3x1 | 290G | 59.1 | | | run.sh/config |
| UniFormer-B | K600 | 16x3x1 | 290G | 58.8 | | | run.sh/config |
| UniFormer-B | K400 | 32x3x1 | 777G | 60.9 | | | run.sh/config |
| UniFormer-B | K600 | 32x3x1 | 777G | 61.0 | | | run.sh/config |
| Model | Pretrain | #Frame | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | K400 | 16x3x1 | 125G | 67.7 | | | run.sh/config |
| UniFormer-S | K600 | 16x3x1 | 125G | 69.4 | | | run.sh/config |
| UniFormer-S | K400 | 32x3x1 | 329G | 69.0 | | | run.sh/config |
| UniFormer-S | K600 | 32x3x1 | 329G | 70.4 | | | run.sh/config |
| UniFormer-B | K400 | 16x3x1 | 290G | 70.4 | | | run.sh/config |
| UniFormer-B | K600 | 16x3x1 | 290G | 70.2 | | | run.sh/config |
| UniFormer-B | K400 | 32x3x1 | 777G | 71.1 | | | run.sh/config |
| UniFormer-B | K600 | 32x3x1 | 777G | 71.2 | | | run.sh/config |
| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 16x3x5 | 4 | 625G | 98.3 | | | run.sh/config |
| Model | #Frame | Sampling Stride | FLOPs | Top1 | Model | Log | Shell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | 16x3x5 | 4 | 625G | 77.5 | | | run.sh/config |
Please follow the installation instructions in INSTALL.md. You may follow the instructions in DATASET.md to prepare the datasets.
1. Download the pretrained models in our repository.

2. Simply run the training scripts in `exp` as follows:

   ```shell
   bash ./exp/uniformer_s8x8_k400/run.sh
   ```
[Note]:

- Due to some bugs in the SlowFast repository, the program will terminate during the final testing.

- During training, we follow the SlowFast repository and randomly crop videos for validation. For accurate testing, please follow our testing scripts.

- For more config details, you can read the comments in `slowfast/config/defaults.py`.

- To avoid running out of memory, you can use `torch.utils.checkpoint` (in `config.yaml` or `run.sh`; a general sketch follows this list):

  ```shell
  MODEL.USE_CHECKPOINT True         # whether to use checkpoint
  MODEL.CHECKPOINT_NUM [0, 0, 4, 0] # index for using checkpoint in each stage
  ```
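For background, `torch.utils.checkpoint` trades compute for memory: activations inside the wrapped module are not stored during the forward pass and are recomputed during backward. A minimal generic sketch, independent of this repo's config system:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
x = torch.randn(8, 64, requires_grad=True)

# Activations inside `block` are recomputed on backward instead of stored,
# cutting peak memory at the cost of extra compute.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```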
We provide a testing example as follows:

```shell
bash ./exp/uniformer_s8x8_k400/test.sh
```
Specifically, we need to create a new config for testing and run the multi-crop/multi-clip test:
1. Copy the training config file `config.yaml` and create a new testing config `test.yaml`.

2. Change the data hyperparameters (in `test.yaml` or `test.sh`):

   ```yaml
   DATA:
     TRAIN_JITTER_SCALES: [224, 224]
     TEST_CROP_SIZE: 224
   ```
3. Set the number of crops and clips (in `test.yaml` or `test.sh`; see the score-averaging sketch after this list):

   Multi-clip testing for Kinetics:

   ```shell
   TEST.NUM_ENSEMBLE_VIEWS 4
   TEST.NUM_SPATIAL_CROPS 1
   ```

   Multi-crop testing for Something-Something:

   ```shell
   TEST.NUM_ENSEMBLE_VIEWS 1
   TEST.NUM_SPATIAL_CROPS 3
   ```
4. You can also set the checkpoint path via:

   ```shell
   TEST.CHECKPOINT_FILE_PATH your_model_path
   ```
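For context, multi-crop/multi-clip testing typically averages the softmax scores of all sampled views to produce the final video-level prediction. A minimal sketch (the function name is ours, hypothetical):

```python
import torch

def ensemble_views(view_logits: list[torch.Tensor]) -> torch.Tensor:
    """Average softmax scores over all spatial crops x temporal clips."""
    scores = torch.stack([logits.softmax(dim=-1) for logits in view_logits])
    return scores.mean(dim=0)

# e.g., Kinetics testing with 4 clips x 1 crop -> 4 views over 400 classes
views = [torch.randn(400) for _ in range(4)]
video_score = ensemble_views(views)
```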
If you find this repository useful, please use the following BibTeX entry for citation:

```bibtex
@misc{li2022uniformer,
      title={UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning},
      author={Kunchang Li and Yali Wang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
      year={2022},
      eprint={2201.04676},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
This repository is built on top of the SlowFast repository.