Awesome-Masked-Autoencoder

MAE

A curated list of awesome masked autoencoder papers in self-supervised learning

We are currently working on a survey of masked autoencoders.

We will wrap it up as soon as possible (stay tuned!)
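
For readers new to the topic, below is a minimal PyTorch sketch of the core recipe shared by most papers listed here: randomly mask a large fraction of patch tokens, encode only the visible ones, and reconstruct the masked patches. All module names, layer sizes, and the masking ratio are illustrative assumptions, not code from any specific paper.

```python
# A toy masked autoencoder: mask most patches, reconstruct them from the rest.
# Hypothetical shapes and MLP stand-ins replace the ViT encoder/decoder used in the papers.
import torch
import torch.nn as nn


class TinyMAE(nn.Module):
    def __init__(self, patch_dim=768, hidden_dim=512, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(patch_dim, hidden_dim), nn.GELU(),
                                     nn.Linear(hidden_dim, hidden_dim))
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                                     nn.Linear(hidden_dim, patch_dim))
        # Learnable placeholder fed to the decoder at masked positions.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, hidden_dim))

    def forward(self, patches):
        # patches: (batch, num_patches, patch_dim), i.e. flattened image patches.
        B, N, D = patches.shape
        num_keep = int(N * (1 - self.mask_ratio))

        # Pick a random subset of visible patches independently for each sample.
        noise = torch.rand(B, N, device=patches.device)
        ids_keep = noise.argsort(dim=1)[:, :num_keep]
        visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

        # Encode only the visible patches; fill masked slots with the mask token, then decode.
        latent = self.encoder(visible)
        tokens = self.mask_token.expand(B, N, -1).clone()
        tokens.scatter_(1, ids_keep.unsqueeze(-1).expand(-1, -1, latent.size(-1)), latent)
        recon = self.decoder(tokens)

        # Mean-squared reconstruction error, averaged over masked patches only.
        mask = torch.ones(B, N, device=patches.device)
        mask.scatter_(1, ids_keep, 0.0)
        per_patch = ((recon - patches) ** 2).mean(dim=-1)
        return (per_patch * mask).sum() / mask.sum()


if __name__ == "__main__":
    model = TinyMAE()
    dummy = torch.randn(2, 196, 768)  # e.g. a 14x14 grid of 16x16x3 patches
    print(model(dummy).item())
```

In the actual papers the encoder and decoder are Vision Transformers with positional embeddings; the high mask ratio (around 75% for images) is what keeps the pretext task non-trivial while making pre-training efficient, since the encoder only sees the visible patches.
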

Masked Image Modeling

Masked Autoencoders Are Scalable Vision Learners
SimMIM: A Simple Framework for Masked Image Modeling

BEiT: BERT pre-training of image transformers
mc-BEiT: Multi-choice discretization for image BERT pre-training
PeCo: Perceptual codebook for BERT pre-training of vision transformers
Context autoencoder for self-supervised representation learning

Green hierarchical vision transformer for masked image modeling
Uniform masking: Enabling MAE pre-training for pyramid-based vision transformers with locality
HiViT: Hierarchical vision transformer meets masked image modeling
Efficient self-supervised vision pretraining with local masked reconstruction
Object-wise Masked Autoencoders for Fast Pre-training
MixMIM: Mixed and masked image modeling for efficient visual representation learning
ConvMAE: Masked Convolution Meets Masked Autoencoders
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN
Corrupted Image Modeling for Self-Supervised Visual Pre-Training
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
On Data Scaling in Masked Image Modeling
Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains
MultiMAE: Multi-modal Multi-task Masked Autoencoders
Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers
Masked frequency modeling for self-supervised visual pre-training
How to understand masked autoencoders?
Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks
MST: Masked Self-Supervised Transformer for Visual Representation
RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Siamese Image Modeling for Self-Supervised Vision Representation Learning
iBOT: Image BERT Pre-Training with Online Tokenizer
Masked Autoencoders are Robust Data Augmentors
Self Pre-training with Masked Autoencoders for Medical Image Analysis
Masked Image Modeling Advances 3D Medical Image Analysis
Student Collaboration Improves Self-Supervised Learning: Dual-Loss Adaptive Masked Autoencoder for Brain Cell Image Analysis
Global Contrast Masked Autoencoders Are Powerful Pathological Representation Learners
Self-distillation Augmented Masked Autoencoders for Histopathological Image Classification
Masked Autoencoders Pre-training in Multiple Instance Learning for Whole Slide Image Classification
Masked Siamese ConvNets

Beyond Images

Videos

BEVT: BERT Pretraining of Video Transformers
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
OmniMAE: Single Model Masked Pretraining on Images and Videos
Masked autoencoders as spatiotemporal learners
MaskViT: Masked Visual Pre-Training for Video Prediction

Vision and Language

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Data Efficient Masked Language Modeling for Vision and Language
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
VL-BEiT: Generative Vision-Language Pretraining

Point Clouds

Masked Autoencoders for Point Cloud Self-supervised Learning
Masked Discrimination for Self-Supervised Learning on Point Clouds
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds

Graph

MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs
Graph Masked Autoencoders with Transformers
GraphMAE: Self-Supervised Masked Graph Autoencoders
MaskGAE: Masked Graph Modeling Meets Graph Autoencoders
Masked Molecule Modeling: A New Paradigm of Molecular Representation Learning for Chemistry Understanding

Audio

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
Masked Autoencoders that Listen
Group masked autoencoder based density estimator for audio anomaly detection

Reinforcement Learning

Masked Visual Pre-training for Motor Control
Masked World Models for Visual Control

Others

Time Series Generation with Masked Autoencoder
MET: Masked Encoding for Tabular Data
