[Doc] add modelzoo for videomae #10

Merged
merged 2 commits into from
Feb 2, 2023
36 changes: 14 additions & 22 deletions Pretrain/VideoMAE/README.md
@@ -1,34 +1,26 @@
# VideoMAE
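The code is inherited from the official [VideoMAE](https://github.com/MCG-NJU/VideoMAE) repository with few modifications: it mainly adds multi-frame, high-resolution support, improves data augmentation, and is adapted to the cluster environment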
The code is modified from [VideoMAE](https://github.com/MCG-NJU/VideoMAE), and the following features have been added:

- support adjusting the input resolution and the number of frames when fine-tuning (the original official codebase only supports adjusting the number of frames)
- support applying repeated augmentation when pre-training

## Installation
- python 3.6 or higher
- pytorch 1.8 or higher (pytorch 1.12 or higher is recommended, as it effectively reduces GPU memory usage)
- pytorch 1.8 or higher
- timm==0.4.8/0.4.12
- deepspeed==0.5.8 (`DS_BUILD_OPS=1 pip install deepspeed`)
- deepspeed==0.5.8
- TensorboardX
- decord
- einops
- opencv-python
- petrel sdk (used to read data on ceph; not needed when reading directly from local disk)

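pytorch 1.12 or higher is recommended, as it effectively reduces GPU memory usage. A timm version that is too new risks API incompatibilities. deepspeed has to be compiled during installation; because of the server environment, some ops cannot be built and can be skipped (for example `DS_BUILD_OPS=1 DS_BUILD_AIO=0 pip install deepspeed`)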

## Data
The data lists are stored in `/mnt/petrelfs/share_data/huangbingkun/data`. The prefix `s3://video_pub` can be changed to the publicly accessible `/mnt/petrelfs/videointern` to read the data directly from disk
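
The prefix change described above could be scripted roughly as follows. This is a minimal sketch: `rewrite_prefix` is a hypothetical helper, and the layout of the list files (one sample per line, with the path appearing once per line) is an assumption. Only the two prefixes come from the text above.

```shell
# Hypothetical helper: copy a data list, replacing the ceph prefix with the
# local-disk prefix. Only the first occurrence on each line is rewritten.
rewrite_prefix() {
    local src="$1" dst="$2"
    # '#' is used as the sed delimiter because both prefixes contain '/'
    sed 's#s3://video_pub#/mnt/petrelfs/videointern#' "$src" > "$dst"
}
```

For example, `rewrite_prefix train.csv train_local.csv` would write a rewritten copy and leave the original list untouched (both file names are placeholders).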

## PreTrain
The training scripts are in the `scripts/pretrain` folder, and all of them are Slurm versions. For parameter details, see [VideoMAE-PRETRAIN](https://github.com/MCG-NJU/VideoMAE/blob/main/PRETRAIN.md). Example:

```
bash scripts/pretrain/slurm_train_vit_h_hybrid_pt.sh ${JOB_NAME}
```
- (optional) petrel sdk (for reading the data on ceph)

## Finetune
The training scripts are in the `scripts/finetune` folder, and all of them are Slurm versions. For parameter details, see [VideoMAE-FINETUNE](https://github.com/MCG-NJU/VideoMAE/blob/main/FINETUNE.md). Example:
## ModelZoo

```
bash scripts/finetune/slurm_train_vit_h_k400_ft.sh ${JOB_NAME}
```
| Backbone | Pretrain Data | Finetune Data | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
| :------: | :-----: | :-----:| :---: | :-------: | :----------------------: | :--------------------: | :---: | :---: |
| ViT-B | UnlabeledHybrid | Kinetics-400 | 800 | 16 x 5 x 3 | [vit_b_hybrid_pt_800e.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e.pth) | [vit_b_hybrid_pt_800e_k400_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_k400_ft.pth) | 81.52 | 94.88 |
| ViT-B | UnlabeledHybrid | Something-Something V2 | 800 | 16 x 2 x 3 | same as above | [vit_b_hybrid_pt_800e_ssv2_ft.pth](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae/vit_b_hybrid_pt_800e_ssv2_ft.pth) | 71.22 | 93.31 |
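
All checkpoints in the table share one base URL, so fetching one could look like the sketch below. `ckpt_url` is a hypothetical helper, and `YOUR_PATH` is the same placeholder the launch scripts use. The download itself is left commented out because it needs network access.

```shell
# Base URL copied from the links in the table above.
BASE_URL='https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/pretrain/videomae'

# Hypothetical helper: print the full URL for a checkpoint file name.
ckpt_url() {
    printf '%s/%s\n' "$BASE_URL" "$1"
}

# Example:
# wget "$(ckpt_url vit_b_hybrid_pt_800e.pth)" -O YOUR_PATH/vit_b_hybrid_pt_800e.pth
# MODEL_PATH='YOUR_PATH/vit_b_hybrid_pt_800e.pth'
```

The downloaded pre-trained checkpoint would then play the role of `MODEL_PATH` in a fine-tuning script.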

To evaluate only, append `--eval` at the end
## Others
Please refer to [VideoMAE](https://github.com/MCG-NJU/VideoMAE) for Data, Pretrain and Finetune sections.
32 changes: 32 additions & 0 deletions Pretrain/VideoMAE/scripts/finetune/dist_train_vit_b_k400_ft.sh
@@ -0,0 +1,32 @@
# Set the path to save checkpoints
OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/eval_lr_1e-3_epoch_100'
# path to Kinetics set (train.csv/val.csv/test.csv)
DATA_PATH='YOUR_PATH/list_kinetics-400'
# path to pretrain model
MODEL_PATH='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800/checkpoint-799.pth'

# batch_size can be adjusted according to the number of GPUs
# this script is for 64 GPUs (8 nodes x 8 GPUs)
# usage: pass the node rank as $1 and the master node address as $2
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
--master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \
run_class_finetuning.py \
--model vit_base_patch16_224 \
--data_path ${DATA_PATH} \
--finetune ${MODEL_PATH} \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR} \
--batch_size 16 \
--input_size 224 \
--short_side_size 224 \
--save_ckpt_freq 10 \
--num_frames 16 \
--sampling_rate 4 \
--num_workers 8 \
--opt adamw \
--lr 1e-3 \
--opt_betas 0.9 0.999 \
--weight_decay 0.05 \
--test_num_segment 5 \
--test_num_crop 3 \
--epochs 100 \
--dist_eval --enable_deepspeed
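
The script comment notes that `batch_size` can be adjusted to the number of GPUs. Assuming `--batch_size` is the per-GPU value (an assumption, not stated in the script), the setting above corresponds to a total batch of 16 x 64 = 1024. A hypothetical helper for keeping that total fixed on a different cluster size might look like this:

```shell
# Hypothetical helper: per-GPU batch size that keeps the total batch constant.
# Assumes total batch = per-GPU batch x nodes x GPUs per node.
per_gpu_batch() {
    local total="$1" nodes="$2" gpus_per_node="$3"
    local world=$(( nodes * gpus_per_node ))
    if [ $(( total % world )) -ne 0 ]; then
        echo "total batch $total is not divisible by $world GPUs" >&2
        return 1
    fi
    echo $(( total / world ))
}
```

With these assumptions, `per_gpu_batch 1024 8 8` prints `16`, matching the script, and `per_gpu_batch 1024 4 8` prints `32` for a 32-GPU run.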
26 changes: 26 additions & 0 deletions Pretrain/VideoMAE/scripts/pretrain/dist_train_vit_b_k400_pt.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# Set the path to save checkpoints
OUTPUT_DIR='YOUR_PATH/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800'
# Set the path to Kinetics train set.
DATA_PATH='YOUR_PATH/list_kinetics-400/train.csv'

# batch_size can be adjusted according to the number of GPUs
# this script is for 64 GPUs (8 nodes x 8 GPUs)
# usage: pass the node rank as $1 and the master node address as $2
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
--master_port 12320 --nnodes=8 --node_rank=$1 --master_addr=$2 \
run_mae_pretraining.py \
--data_path ${DATA_PATH} \
--mask_type t_consist \
--mask_ratio 0.9 \
--model pretrain_mae_base_patch16_224 \
--decoder_depth 4 \
--batch_size 64 \
--num_frames 16 \
--sampling_rate 4 \
--num_workers 16 \
--opt adamw \
--opt_betas 0.9 0.95 \
--warmup_epochs 40 \
--save_ckpt_freq 200 \
--epochs 801 \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR}
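
Both launch scripts read the node rank from `$1` and the master address from `$2`, and each must be started once per node. A small guard such as the one below could catch a missing or out-of-range argument before the launcher starts. `check_launch_args` is a hypothetical helper, and the node count of 8 simply mirrors `--nnodes=8` in the scripts.

```shell
# Hypothetical guard for the two positional arguments used by the scripts:
#   $1 = node rank (0 .. NNODES-1), $2 = master node address.
check_launch_args() {
    local nnodes=8          # mirrors --nnodes=8 in the scripts
    local rank="$1" addr="$2"
    case "$rank" in
        ''|*[!0-9]*)
            echo "node_rank must be a non-negative integer" >&2
            return 1 ;;
    esac
    if [ "$rank" -ge "$nnodes" ]; then
        echo "node_rank must be smaller than $nnodes" >&2
        return 1
    fi
    if [ -z "$addr" ]; then
        echo "master_addr must not be empty" >&2
        return 1
    fi
    return 0
}
```

A run on the first node might then be `check_launch_args 0 MASTER_ADDR && bash scripts/pretrain/dist_train_vit_b_k400_pt.sh 0 MASTER_ADDR`, where `MASTER_ADDR` is a placeholder for the address of the rank-0 node.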