
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

This repository is built upon BEiT. Thanks very much!

Currently we only implement the pretraining process described in the paper, and we cannot guarantee that the performance reported in the paper can be reproduced!

Difference

The shuffle and unshuffle operations do not seem to be directly available in PyTorch, so we implement the process in another way:

  • For shuffle, we reuse BEiT's method of randomly generating a mask map (14x14), where mask=0 means the token is kept and mask=1 means the token is dropped (it does not participate in the encoder computation). All visible tokens (mask=0) are then fed into the encoder network.
  • For unshuffle, we gather the position embeddings of all masked tokens according to the mask map, add the shared mask token to them, concatenate them with the visible tokens coming from the encoder, and feed the result into the decoder network for reconstruction (see the sketch below).
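A minimal sketch of this mask-map approach, with stand-in Identity modules in place of the real encoder and decoder; all names below are hypothetical and not the repository's actual code:

import torch
import torch.nn as nn

B, N, D = 2, 196, 64                              # batch, tokens (14x14 grid), dim
x = torch.randn(B, N, D)                          # patch embeddings
pos_embed = torch.randn(1, N, D)                  # position embeddings
mask_token = nn.Parameter(torch.zeros(1, 1, D))   # shared learnable mask token
encoder, decoder = nn.Identity(), nn.Identity()   # stand-ins for the real networks

# Random mask-map: mask=1 drops the token (75% masked), mask=0 keeps it
n_keep = int(N * 0.25)
ranks = torch.rand(B, N).argsort(dim=1).argsort(dim=1)
mask = ranks >= n_keep                            # (B, N) boolean, True = dropped

# "Shuffle": only the visible tokens (mask=0) enter the encoder
x_vis = x[~mask].reshape(B, n_keep, D)
enc_out = encoder(x_vis)                          # (B, n_keep, D)

# "Unshuffle": position embeddings of the masked locations, plus the shared
# mask token, are concatenated with the visible tokens and fed to the decoder
pos = pos_embed.expand(B, N, D)
pos_vis = pos[~mask].reshape(B, n_keep, D)
pos_masked = pos[mask].reshape(B, N - n_keep, D)
dec_in = torch.cat([enc_out + pos_vis, mask_token + pos_masked], dim=1)
pred = decoder(dec_in)                            # reconstruction predictions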

TODO

  • implement the finetune process
  • reuse the model in modeling_pretrain.py
  • calculate the normalized pixel target (see the sketch after this list)
  • add the cls token in the encoder
  • ...
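For the normalized-pixels TODO item, the MAE paper normalizes each patch's raw pixels by that patch's own mean and standard deviation before computing the reconstruction loss. A hedged sketch of such a target (a hypothetical helper, not yet in this repository):

import torch

def normalized_pixel_target(patches: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # patches: (B, N, patch_dim) raw pixel values of each patch
    mean = patches.mean(dim=-1, keepdim=True)
    var = patches.var(dim=-1, keepdim=True)
    return (patches - mean) / (var + eps) ** 0.5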

Setup

pip install -r requirements.txt

Run

# Set the path to save checkpoints
OUTPUT_DIR='output/'
# path to imagenet-1k train set
DATA_PATH='../ImageNet_ILSVRC2012/train'


OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining.py \
        --data_path ${DATA_PATH} \
        --mask_ratio 0.75 \
        --model pretrain_mae_base_patch16_224 \
        --batch_size 128 \
        --opt_betas 0.9 0.95 \
        --warmup_epochs 40 \
        --epochs 1600 \
        --output_dir ${OUTPUT_DIR}
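Note that with --nproc_per_node=8 and --batch_size 128 per GPU, the effective batch size is 8 x 128 = 1024; if you train on fewer GPUs, you may need to adjust one of these values accordingly.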

Result

model      pretrain epochs   finetune epochs   accuracy
vit-base   400               100               83.1%

(the full finetuning code will be released tomorrow or the day after tomorrow)
