OpenGVLab · yinanhe · Feb 2, 2023
diff --git a/Pretrain/VideoMAE/README.md b/Pretrain/VideoMAE/README.md
@@ -1,34 +1,26 @@
 # VideoMAE
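-The code is inherited from the official [VideoMAE](https://github.com/MCG-NJU/VideoMAE) repository with few modifications: it mainly adds multi-frame, high-resolution support, improves data augmentation, and adapts the code to the cluster environment.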
+The code is modified from [VideoMAE](https://github.com/MCG-NJU/VideoMAE), and the following features have been added:
+
+- support adjusting the input resolution and the number of frames when fine-tuning (the original official codebase only supports adjusting the number of frames)
+- support applying repeated augmentation when pre-training
 
 ## Installation
 - python 3.6 or higher
-- pytorch 1.8 or higher (pytorch 1.12 or above is recommended; it effectively reduces GPU memory usage)
+- pytorch 1.8 or higher
 - timm==0.4.8/0.4.12
-- deepspeed==0.5.8 (`DS_BUILD_OPS=1 pip install deepspeed`)
+- deepspeed==0.5.8
 - TensorboardX
 - decord
 - einops
 - opencv-python
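-- petrel sdk (for reading data on ceph; not needed when reading directly from local disk)
-
-pytorch 1.12 or above is recommended because it effectively reduces GPU memory usage. A timm version that is too new risks API incompatibility. deepspeed must be compiled during installation; because of server environment issues some ops cannot be built and can be skipped (for example `DS_BUILD_OPS=1 DS_BUILD_AIO=0 pip install deepspeed`).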
-
-## Data
-The data lists are stored in `/mnt/petrelfs/share_data/huangbingkun/data`. The prefix `s3://video_pub` can be changed to the publicly accessible `/mnt/petrelfs/videointern` to read the data directly from disk.
-
-## PreTrain
-The training scripts are in the `scripts/pretrain` folder and are all slurm versions. For parameter details, see [VideoMAE-PRETRAIN](https://github.com/MCG-NJU/VideoMAE/blob/main/PRETRAIN.md). Example run:
-
-```
-bash scripts/pretrain/slurm_train_vit_h_hybrid_pt.sh ${JOB_NAME}
-```
+- (optional) petrel sdk (for reading the data on ceph)
 
-## Finetune
-The training scripts are in the `scripts/finetune` folder and are all slurm versions. For parameter details, see [VideoMAE-FINETUNE](https://github.com/MCG-NJU/VideoMAE/blob/main/FINETUNE.md). Example run:
+## ModelZoo
 
-```
-bash scripts/finetune/slurm_train_vit_h_k400_ft.sh ${JOB_NAME}
-```
+| Backbone | Pretrain Data | Finetune Data | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
+| :------: | :-----: | :-----:| :---: | :-------: | :----------------------: | :--------------------: | :---: | :---: |
+| ViT-B | UnlabeledHybrid | Kinetics-400 | 800 | 16 x 5 x 3 | [vit_b_hybrid_pt_800e.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e.pth) | [vit_b_hybrid_pt_800e_k400_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_k400_ft.pth) | 81.52 | 94.88 |
+| ViT-B | UnlabeledHybrid | Something-Something V2 | 800 | 16 x 2 x 3 | same as above | [vit_b_hybrid_pt_800e_ssv2_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_ssv2_ft.pth) | 71.22 | 93.31 |
 
-To only evaluate results, append `--eval` at the end.
+## Others
+Please refer to [VideoMAE](https://github.com/MCG-NJU/VideoMAE) for the Data, Pretrain, and Finetune sections.
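
The `#Frame` column in the ModelZoo table can plausibly be read as frames per clip x temporal segments x spatial crops, which would match `--num_frames 16`, `--test_num_segment 5`, and `--test_num_crop 3` in the fine-tuning script. That reading is an assumption drawn from the script flags, not something the README states. Under that assumption, a minimal sketch of the per-video evaluation cost:

```shell
# Assumption: a "#Frame" entry means "frames x temporal segments x spatial crops".
frame_spec='16 x 5 x 3'

# Turn "16 x 5 x 3" into three positional parameters: 16 5 3.
set -- $(printf '%s' "$frame_spec" | tr -d ' ' | tr 'x' ' ')
frames=$1
segments=$2
crops=$3

views=$((segments * crops))        # clips evaluated per video
total_frames=$((frames * views))   # frames processed per video

echo "frames/clip=${frames} views/video=${views} frames/video=${total_frames}"
```

For the Something-Something V2 row (`16 x 2 x 3`), the same arithmetic gives 6 views per video.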
diff --git a/Pretrain/VideoMAE/scripts/finetune/dist_train_vit_b_k400_ft.sh b/Pretrain/VideoMAE/scripts/finetune/dist_train_vit_b_k400_ft.sh
@@ -0,0 +1,32 @@
+# Set the path to save checkpoints
+OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/eval_lr_1e-3_epoch_100'
+# path to Kinetics set (train.csv/val.csv/test.csv)
+DATA_PATH='YOUR_PATH/list_kinetics-400'
+# path to pretrain model
+MODEL_PATH='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/checkpoint-799.pth'
+
+# batch_size can be adjusted according to number of GPUs
+# this script is for 64 GPUs (8 nodes x 8 GPUs)
+OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
+        --master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \
+        run_class_finetuning.py \
+        --model vit_base_patch16_224 \
+        --data_path ${DATA_PATH} \
+        --finetune ${MODEL_PATH} \
+        --log_dir ${OUTPUT_DIR} \
+        --output_dir ${OUTPUT_DIR} \
+        --batch_size 16 \
+        --input_size 224 \
+        --short_side_size 224 \
+        --save_ckpt_freq 10 \
+        --num_frames 16 \
+        --sampling_rate 4 \
+        --num_workers 8 \
+        --opt adamw \
+        --lr 1e-3 \
+        --opt_betas 0.9 0.999 \
+        --weight_decay 0.05 \
+        --test_num_segment 5 \
+        --test_num_crop 3 \
+        --epochs 100 \
+        --dist_eval --enable_deepspeed
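
The fine-tuning script notes that `batch_size` can be adjusted to the number of GPUs. One way to do that, sketched below, is to hold the global batch size fixed and derive the per-GPU value. The 16-GPU cluster is a hypothetical example, and keeping the global batch constant is an assumption about what you want to preserve, not a rule stated in this change.

```shell
# Reference values taken from the fine-tuning script.
ref_gpus=64            # 8 nodes x 8 GPUs
ref_batch_per_gpu=16   # --batch_size
global_batch=$((ref_gpus * ref_batch_per_gpu))

# Hypothetical smaller cluster: 2 nodes x 8 GPUs.
my_gpus=16
my_batch_per_gpu=$((global_batch / my_gpus))

echo "global batch: ${global_batch}"
echo "use --batch_size ${my_batch_per_gpu} on ${my_gpus} GPUs"
```

If the derived per-GPU batch does not fit in memory, the global batch shrinks and the learning rate may need retuning; check how the training script scales `--lr` before relying on the value in the script.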
diff --git a/Pretrain/VideoMAE/scripts/pretrain/dist_train_vit_b_k400_pt.sh b/Pretrain/VideoMAE/scripts/pretrain/dist_train_vit_b_k400_pt.sh
@@ -0,0 +1,26 @@
+# Set the path to save checkpoints
+OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800'
+# Set the path to Kinetics train set. 
+DATA_PATH='YOUR_PATH/list_kinetics-400/train.csv'
+
+# batch_size can be adjusted according to number of GPUs
+# this script is for 64 GPUs (8 nodes x 8 GPUs)
+OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
+        --master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \
+        run_mae_pretraining.py \
+        --data_path ${DATA_PATH} \
+        --mask_type t_consist \
+        --mask_ratio 0.9 \
+        --model pretrain_mae_base_patch16_224 \
+        --decoder_depth 4 \
+        --batch_size 64 \
+        --num_frames 16 \
+        --sampling_rate 4 \
+        --num_workers 16 \
+        --opt adamw \
+        --opt_betas 0.9 0.95 \
+        --warmup_epochs 40 \
+        --save_ckpt_freq 200 \
+        --epochs 801 \
+        --log_dir ${OUTPUT_DIR} \
+        --output_dir ${OUTPUT_DIR}
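
Both launch scripts read the node rank from `$1` and the master address from `$2`, so each of the 8 nodes needs its own invocation. The sketch below only prints the command for each node; the master address is a placeholder, and the script path assumes the working directory is `Pretrain/VideoMAE`.

```shell
# Placeholder values: replace with your own master node address.
MASTER_ADDR='10.0.0.1'
SCRIPT='scripts/pretrain/dist_train_vit_b_k400_pt.sh'
NNODES=8   # must match --nnodes inside the script

node_rank=0
cmds=''
while [ "$node_rank" -lt "$NNODES" ]; do
    # $1 = node rank, $2 = master address, as the script expects
    cmds="${cmds}bash ${SCRIPT} ${node_rank} ${MASTER_ADDR}
"
    node_rank=$((node_rank + 1))
done

# Print one command per node; run each on the corresponding machine.
printf '%s' "$cmds"
```

The node with rank 0 should be the machine whose address is passed as the master address.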