Merge pull request #10 from congee524/add_videomae_modelzoo
[Doc] add modelzoo for videomae
yinanhe committed Feb 2, 2023
2 parents 834d9a3 + 9992aa2 commit 8453b66
Showing 3 changed files with 72 additions and 22 deletions.
36 changes: 14 additions & 22 deletions Pretrain/VideoMAE/README.md
@@ -1,34 +1,26 @@
# VideoMAE
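The code is inherited from the official [VideoMAE](https://github.com/MCG-NJU/VideoMAE) repository with few modifications: it mainly adds multi-frame, high-resolution support, improves data augmentation, and adapts the code to the cluster environment.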
The code is modified from [VideoMAE](https://github.com/MCG-NJU/VideoMAE), and the following features have been added:

- support adjusting the input resolution and the number of frames when fine-tuning (the original official codebase only supports adjusting the number of frames)
- support applying repeated augmentation when pre-training

## Installation
- python 3.6 or higher
- pytorch 1.8 or higher (pytorch 1.12 or above is recommended; it effectively reduces GPU memory usage)
- pytorch 1.8 or higher
- timm==0.4.8/0.4.12
- deepspeed==0.5.8 (`DS_BUILD_OPS=1 pip install deepspeed`)
- deepspeed==0.5.8
- TensorboardX
- decord
- einops
- opencv-python
- petrel sdk (for reading data on ceph; not needed when reading directly from local disk)

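pytorch 1.12 or above is recommended, as it effectively reduces GPU memory usage. A timm version that is too high risks API incompatibility. deepspeed must be compiled during installation; because of server environment issues, some ops cannot be built and can be skipped (for example `DS_BUILD_OPS=1 DS_BUILD_AIO=0 pip install deepspeed`).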

## Data
The data lists are stored in `/mnt/petrelfs/share_data/huangbingkun/data`. You can change the prefix `s3://video_pub` to the publicly accessible `/mnt/petrelfs/videointern` to read the data directly from disk.
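
The prefix swap described here can be applied to a whole list file with `sed`. The sketch below is a minimal example, assuming each data list is a plain-text file whose video paths start with `s3://video_pub`. The function name `rewrite_prefix` and the sample entry are hypothetical and not part of the repository.

```shell
# Hypothetical helper: rewrite the ceph prefix to the local mount point.
# Reads a data list on stdin and writes the rewritten list to stdout.
rewrite_prefix() {
    sed 's#s3://video_pub#/mnt/petrelfs/videointern#g'
}

# Example on a made-up list entry (the real list format may differ).
printf 's3://video_pub/k400/train/abc.mp4 0\n' | rewrite_prefix
```

Writing the result to a new file (for example `rewrite_prefix < train.csv > train_local.csv`) leaves the original list untouched.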

## PreTrain
The training scripts are in the `scripts/pretrain` folder; all of them are slurm versions. For parameter details, see [VideoMAE-PRETRAIN](https://github.com/MCG-NJU/VideoMAE/blob/main/PRETRAIN.md). Example:

```
bash scripts/pretrain/slurm_train_vit_h_hybrid_pt.sh ${JOB_NAME}
```
- (optional) petrel sdk (for reading the data on ceph)

## Finetune
The training scripts are in the `scripts/finetune` folder; all of them are slurm versions. For parameter details, see [VideoMAE-FINETUNE](https://github.com/MCG-NJU/VideoMAE/blob/main/FINETUNE.md). Example:
## ModelZoo

```
bash scripts/finetune/slurm_train_vit_h_k400_ft.sh ${JOB_NAME}
```
| Backbone | Pretrain Data | Finetune Data | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
| :------: | :-----: | :-----:| :---: | :-------: | :----------------------: | :--------------------: | :---: | :---: |
| ViT-B | UnlabeledHybrid | Kinetics-400 | 800 | 16 x 5 x 3 | [vit_b_hybrid_pt_800e.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e.pth) | [vit_b_hybrid_pt_800e_k400_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_k400_ft.pth) | 81.52 | 94.88 |
| ViT-B | UnlabeledHybrid | Something-Something V2 | 800 | 16 x 2 x 3 | same as above | [vit_b_hybrid_pt_800e_ssv2_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_ssv2_ft.pth) | 71.22 | 93.31 |
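
The `#Frame` column appears to follow a frames x segments x crops convention, matching the `--num_frames`, `--test_num_segment` and `--test_num_crop` flags in the fine-tuning script added by this commit. Under that assumption, the following sketch works out how many views and frames are evaluated per video for the Kinetics-400 row; the variable names are illustrative.

```shell
# Assumed reading of "16 x 5 x 3":
# frames per clip x temporal segments x spatial crops.
num_frames=16
num_segments=5
num_crops=3

views=$((num_segments * num_crops))      # clips evaluated per video
total_frames=$((num_frames * views))     # frames seen per video at test time

echo "views per video: ${views}"
echo "frames per video: ${total_frames}"
```

Under the same reading, the Something-Something V2 row (16 x 2 x 3) corresponds to 6 views and 96 frames per video.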

To only evaluate a model, append `--eval` at the end of the command.
## Others
Please refer to [VideoMAE](https://github.com/MCG-NJU/VideoMAE) for the Data, Pretrain and Finetune sections.
32 changes: 32 additions & 0 deletions Pretrain/VideoMAE/scripts/finetune/dist_train_vit_b_k400_ft.sh
@@ -0,0 +1,32 @@
# Set the path to save checkpoints
OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/eval_lr_1e-3_epoch_100'
# path to Kinetics set (train.csv/val.csv/test.csv)
DATA_PATH='YOUR_PATH/list_kinetics-400'
# path to pretrain model
MODEL_PATH='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/checkpoint-799.pth'

# batch_size can be adjusted according to number of GPUs
# this script is for 64 GPUs (8 nodes x 8 GPUs)
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
--master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \
run_class_finetuning.py \
--model vit_base_patch16_224 \
--data_path ${DATA_PATH} \
--finetune ${MODEL_PATH} \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR} \
--batch_size 16 \
--input_size 224 \
--short_side_size 224 \
--save_ckpt_freq 10 \
--num_frames 16 \
--sampling_rate 4 \
--num_workers 8 \
--opt adamw \
--lr 1e-3 \
--opt_betas 0.9 0.999 \
--weight_decay 0.05 \
--test_num_segment 5 \
--test_num_crop 3 \
--epochs 100 \
--dist_eval --enable_deepspeed
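
The script takes the node rank as `$1` and the master node's address as `$2`, so it has to be started once on every node. A minimal sketch of the eight invocations follows, assuming a hypothetical master address of `10.0.0.1` and that the script is run from the `Pretrain/VideoMAE` directory.

```shell
# Hypothetical master address; replace with the address of node 0.
MASTER_ADDR=10.0.0.1
NNODES=8   # matches --nnodes=8 in the script

# Print the command to run on each node (rank 0 is the master).
print_launch_commands() {
    rank=0
    while [ "$rank" -lt "$NNODES" ]; do
        echo "bash scripts/finetune/dist_train_vit_b_k400_ft.sh ${rank} ${MASTER_ADDR}"
        rank=$((rank + 1))
    done
}

print_launch_commands
```

Each printed line is meant to be executed on the node with the corresponding rank; all nodes must be able to reach the master address on port 12320, the `--master_port` set in the script.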
26 changes: 26 additions & 0 deletions Pretrain/VideoMAE/scripts/pretrain/dist_train_vit_b_k400_pt.sh
@@ -0,0 +1,26 @@
# Set the path to save checkpoints
OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800'
# Set the path to Kinetics train set.
DATA_PATH='YOUR_PATH/list_kinetics-400/train.csv'

# batch_size can be adjusted according to number of GPUs
# this script is for 64 GPUs (8 nodes x 8 GPUs)
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
--master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \
run_mae_pretraining.py \
--data_path ${DATA_PATH} \
--mask_type t_consist \
--mask_ratio 0.9 \
--model pretrain_mae_base_patch16_224 \
--decoder_depth 4 \
--batch_size 64 \
--num_frames 16 \
--sampling_rate 4 \
--num_workers 16 \
--opt adamw \
--opt_betas 0.9 0.95 \
--warmup_epochs 40 \
--save_ckpt_freq 200 \
--epochs 801 \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR}
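
The comment in the script notes that `batch_size` can be adjusted to the number of GPUs. Assuming `--batch_size` is the per-GPU batch size (this is not stated in the commit itself), the global batch size for this 64-GPU configuration can be worked out as follows.

```shell
batch_size=64        # --batch_size (assumed to be per GPU)
nproc_per_node=8     # --nproc_per_node
nnodes=8             # --nnodes

world_size=$((nproc_per_node * nnodes))
global_batch=$((batch_size * world_size))

echo "GPUs: ${world_size}, global batch size: ${global_batch}"
```

When running on fewer GPUs, one option is to raise `--batch_size` proportionally so that the global batch size stays the same, GPU memory permitting.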
