Merge pull request #10 from congee524/add_videomae_modelzoo
[Doc] add modelzoo for videomae
Showing 3 changed files with 72 additions and 22 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,26 @@ | ||
# VideoMAE | ||
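The code is inherited from the official [VideoMAE](https://github.com/MCG-NJU/VideoMAE) repository with few modifications: it mainly adds multi-frame, high-resolution support, improves data augmentation, and adapts the code to the cluster environment | ||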
The code is modified from [VideoMAE](https://github.com/MCG-NJU/VideoMAE), and the following features have been added: | ||
|
||
- support adjusting the input resolution and the number of frames when fine-tuning (the original official codebase only supports adjusting the number of frames) | ||
- support applying repeated augmentation when pre-training | ||
|
||
## Installation | ||
- python 3.6 or higher | ||
- pytorch 1.8 or higher (pytorch 1.12 or higher is recommended; it effectively reduces GPU memory usage) | ||
- pytorch 1.8 or higher | ||
- timm==0.4.8/0.4.12 | ||
- deepspeed==0.5.8 (`DS_BUILD_OPS=1 pip install deepspeed`) | ||
- deepspeed==0.5.8 | ||
- TensorboardX | ||
- decord | ||
- einops | ||
- opencv-python | ||
- petrel sdk (for reading data on ceph; not needed when reading directly from local disk) | ||
|
||
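pytorch 1.12 or higher is recommended because it effectively reduces GPU memory usage. A timm version that is too new risks API incompatibility. deepspeed must be compiled at install time; because of the server environment, some ops cannot be built and can be skipped (for example, `DS_BUILD_OPS=1 DS_BUILD_AIO=0 pip install deepspeed`) | ||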
|
||
## Data | ||
The data lists are stored in `/mnt/petrelfs/share_data/huangbingkun/data`. The prefix `s3://video_pub` can be changed to the publicly accessible `/mnt/petrelfs/videointern` to read the data directly from disk | ||
|
||
## PreTrain | ||
The training scripts are in the `scripts/pretrain` folder and are all slurm versions. For parameter details, see [VideoMAE-PRETRAIN](https://github.com/MCG-NJU/VideoMAE/blob/main/PRETRAIN.md). Example: | ||
|
||
``` | ||
bash scripts/pretrain/slurm_train_vit_h_hybrid_pt.sh ${JOB_NAME} | ||
``` | ||
- (optional) petrel sdk (for reading the data on ceph) | ||
|
||
## Finetune | ||
The training scripts are in the `scripts/finetune` folder and are all slurm versions. For parameter details, see [VideoMAE-FINETUNE](https://github.com/MCG-NJU/VideoMAE/blob/main/FINETUNE.md). Example: | ||
## ModelZoo | ||
|
||
``` | ||
bash scripts/finetune/slurm_train_vit_h_k400_ft.sh ${JOB_NAME} | ||
``` | ||
| Backbone | Pretrain Data | Finetune Data | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 | | ||
| :------: | :-----: | :-----:| :---: | :-------: | :----------------------: | :--------------------: | :---: | :---: | | ||
| ViT-B | UnlabeledHybrid | Kinetics-400 | 800 | 16 x 5 x 3 | [vit_b_hybrid_pt_800e.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e.pth) | [vit_b_hybrid_pt_800e_k400_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_k400_ft.pth) | 81.52 | 94.88 | | ||
| ViT-B | UnlabeledHybrid | Something-Something V2 | 800 | 16 x 2 x 3 | same as above | [vit_b_hybrid_pt_800e_ssv2_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_ssv2_ft.pth) | 71.22 | 93.31 | | ||
|
||
To only evaluate a model, append `--eval` at the end | ||
## Others | ||
Please refer to [VideoMAE](https://github.com/MCG-NJU/VideoMAE) for Data, Pretrain and Finetune sections. |
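
As a minimal sketch of how a ModelZoo checkpoint could be wired into the fine-tuning script's `MODEL_PATH`, the snippet below builds the download URL from the table above. The base URL and file name are taken from the ModelZoo table; `CKPT_DIR` is a hypothetical local directory, and the actual download is left commented out because it needs network access.

```shell
#!/usr/bin/env bash
# Base URL and file name are copied from the ModelZoo table above.
BASE_URL='https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae'
CKPT_NAME='vit_b_hybrid_pt_800e.pth'

# Hypothetical local directory for downloaded weights (an assumption, not from the repo).
CKPT_DIR='YOUR_PATH/checkpoints'

CKPT_URL="${BASE_URL}/${CKPT_NAME}"
MODEL_PATH="${CKPT_DIR}/${CKPT_NAME}"

echo "download: ${CKPT_URL}"
echo "MODEL_PATH=${MODEL_PATH}"

# Uncomment to actually fetch the file (requires network access and curl):
# mkdir -p "${CKPT_DIR}" && curl -L -o "${MODEL_PATH}" "${CKPT_URL}"
```

The resulting `MODEL_PATH` value is what a fine-tuning script would pass to `--finetune`.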
32 changes: 32 additions & 0 deletions
Pretrain/VideoMAE/scripts/finetune/dist_train_vit_b_k400_ft.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Set the path to save checkpoints | ||
OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/eval_lr_1e-3_epoch_100' | ||
# path to Kinetics set (train.csv/val.csv/test.csv) | ||
DATA_PATH='YOUR_PATH/list_kinetics-400' | ||
# path to pretrain model | ||
MODEL_PATH='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/checkpoint-799.pth' | ||
|
||
# batch_size can be adjusted according to number of GPUs | ||
# this script is for 64 GPUs (8 nodes x 8 GPUs) | ||
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \ | ||
--master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \ | ||
run_class_finetuning.py \ | ||
--model vit_base_patch16_224 \ | ||
--data_path ${DATA_PATH} \ | ||
--finetune ${MODEL_PATH} \ | ||
--log_dir ${OUTPUT_DIR} \ | ||
--output_dir ${OUTPUT_DIR} \ | ||
--batch_size 16 \ | ||
--input_size 224 \ | ||
--short_side_size 224 \ | ||
--save_ckpt_freq 10 \ | ||
--num_frames 16 \ | ||
--sampling_rate 4 \ | ||
--num_workers 8 \ | ||
--opt adamw \ | ||
--lr 1e-3 \ | ||
--opt_betas 0.9 0.999 \ | ||
--weight_decay 0.05 \ | ||
--test_num_segment 5 \ | ||
--test_num_crop 3 \ | ||
--epochs 100 \ | ||
--dist_eval --enable_deepspeed |
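
The script's comment says `batch_size` can be adjusted to the number of GPUs. A rough sketch of that arithmetic follows, assuming `--batch_size` is a per-GPU value (as in the upstream VideoMAE codebase; treat this as an assumption for this fork). `per_gpu_batch` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Values copied from the fine-tuning script above.
BATCH_SIZE=16        # --batch_size (assumed to be per GPU)
NPROC_PER_NODE=8     # --nproc_per_node
NNODES=8             # --nnodes

# One process per GPU, so the world size equals the total GPU count.
WORLD_SIZE=$((NPROC_PER_NODE * NNODES))
TOTAL_BATCH_SIZE=$((BATCH_SIZE * WORLD_SIZE))

echo "GPUs: ${WORLD_SIZE}"
echo "total batch size: ${TOTAL_BATCH_SIZE}"

# Hypothetical helper: per-GPU batch size that keeps the same total on a different GPU count.
per_gpu_batch() {
  local total=$1 gpus=$2
  echo $((total / gpus))
}

# Example: the same total batch size on 32 GPUs.
per_gpu_batch "${TOTAL_BATCH_SIZE}" 32
```

Whether the larger per-GPU batch fits in memory depends on the hardware; if it does not, the total batch size (and possibly the learning rate) would have to change instead.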
26 changes: 26 additions & 0 deletions
Pretrain/VideoMAE/scripts/pretrain/dist_train_vit_b_k400_pt.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Set the path to save checkpoints | ||
OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800' | ||
# Set the path to Kinetics train set. | ||
DATA_PATH='YOUR_PATH/list_kinetics-400/train.csv' | ||
|
||
# batch_size can be adjusted according to number of GPUs | ||
# this script is for 64 GPUs (8 nodes x 8 GPUs) | ||
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \ | ||
--master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \ | ||
run_mae_pretraining.py \ | ||
--data_path ${DATA_PATH} \ | ||
--mask_type t_consist \ | ||
--mask_ratio 0.9 \ | ||
--model pretrain_mae_base_patch16_224 \ | ||
--decoder_depth 4 \ | ||
--batch_size 64 \ | ||
--num_frames 16 \ | ||
--sampling_rate 4 \ | ||
--num_workers 16 \ | ||
--opt adamw \ | ||
--opt_betas 0.9 0.95 \ | ||
--warmup_epochs 40 \ | ||
--save_ckpt_freq 200 \ | ||
--epochs 801 \ | ||
--log_dir ${OUTPUT_DIR} \ | ||
--output_dir ${OUTPUT_DIR} |
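
Both scripts read the node rank from `$1` and the master address from `$2`, so each of the 8 nodes has to start the script with its own rank. The dry-run sketch below only prints the command for each node. `MASTER_ADDR` is a hypothetical address, and the relative script path assumes the working directory is `Pretrain/VideoMAE`.

```shell
#!/usr/bin/env bash
NNODES=8                                   # matches --nnodes in the script
MASTER_ADDR='10.0.0.1'                     # hypothetical address of the rank-0 node
SCRIPT='scripts/pretrain/dist_train_vit_b_k400_pt.sh'

# Print the command each node would run; nothing is launched here.
print_launch_commands() {
  local rank
  for ((rank = 0; rank < NNODES; rank++)); do
    echo "bash ${SCRIPT} ${rank} ${MASTER_ADDR}"
  done
}

print_launch_commands
```

How the commands reach the nodes (ssh, a job scheduler, or manual login) is left open, since the source does not specify it.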