ROPIM: Pre-training with Random Orthogonal Projection Image Modeling

[ICLR 2024 Spotlight] The official repository of Self-Supervised Learning methods "ROPIM", Pre-training with Random Orthogonal Projection Image Modeling

ROPIM is a self-supervised learning technique based on count sketching, which reduces local semantic information under the bounded noise variance. While Masked Image Modelling (MIM) introduces Binary noise, ROPIM proposes a continous masking strategy. Continuous masking allows for larger number of masking patterns compared to binary masking.

Citation

If you find our work useful for your research, please consider giving a star ⭐ and citation 🍺:

@inproceedings{
haghighat2024ROPIM,
title={Pre-training with Random Orthogonal Projection Image Modeling},
author={Maryam Haghighat and Peyman Moghadam and Shaheer Mohamed and Piotr Koniusz},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=z4Hcegjzph}
}

Usage

Setup conda environment and install required packages:

# Create environment
conda create -n ropim python=3.8 -y
conda activate ropim

# Install requirements
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

# Clone ROPIM repo
git clone https://github.com/csiro-robotics/ROPIM
cd ROPIM

# Install other requirements
pip install -r requirements.txt

Pre-training with ROPIM

For pre-training models with ROPIM, run:

python3 -m torch.distributed.launch --nnodes <num-of-nodes> --nproc_per_node <num-of-gpus-per-node> --node_rank <node-rank> --master_addr <hostname> \
main_ropim.py --world_size <total-num-of-gpus> \ 
--cfg <config-file> --data-path <imagenet-path>/train [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag> --spatial_sketching_threshold <threshol-for-sketching_ratio>]

Fine-tuning pre-trained models

For fine-tuning models pre-trained by ROPIM, run:

python3 -m torch.distributed.launch --nnodes <num-of-nodes> --nproc_per_node <num-of-gpus-per-node> --node_rank <node-rank> --master_addr <hostname> \
main_finetune.py --world_size <total-num-of-gpus> \ 
--cfg <config-file> --data-path <imagenet-path> --pretrained <pretrained-ckpt> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

Acknowledgement

This research was funded by the Machine Learning and Artificial Intelligence Future Science Platform (MLAI FSP) and Science Digital at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia.

This code is built using the timm library, the BEiT repository and the SimMIM repository.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
configs		configs
data		data
figures		figures
models		models
LICENSE		LICENSE
README.md		README.md
config.py		config.py
logger.py		logger.py
lr_scheduler.py		lr_scheduler.py
main_finetune.py		main_finetune.py
main_ropim.py		main_ropim.py
optimizer.py		optimizer.py
requirements.txt		requirements.txt
utils.py		utils.py

License

csiro-robotics/ROPIM

Folders and files

Latest commit

History

Repository files navigation

ROPIM: Pre-training with Random Orthogonal Projection Image Modeling

Citation

Usage

Pre-training with ROPIM

Fine-tuning pre-trained models

Acknowledgement

About

Topics

Resources

License

Stars

Watchers

Forks

Languages