This repository contains the implementation of the paper "Label-Focused Inductive Bias over Latent Object Features in Visual Classification", published at ICLR 2024.
We use PyTorch multi-processing Distributed Data Parallel (DDP) training to train the Label-focused Latent-object Biasing (LLB) method.
LLB takes visual features from a Vision Transformer (ViT) and proceeds through the following steps (see the sketch after this list):
- First, it learns intermediate latent object features in an unsupervised manner,
- then it decouples their visual dependencies by assigning new, independent embedding parameters,
- next it captures structured features optimized for the original classification task,
- and finally it integrates the structured features with the original visual features for the final prediction.
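A minimal PyTorch sketch of these steps is shown below. The class name, parameter names, and the `alpha`-weighted fusion are illustrative assumptions for intuition only, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class LLBHeadSketch(nn.Module):
    """Illustrative sketch of the LLB steps above; not the repository's actual module."""

    def __init__(self, feat_dim=768, num_objects=2048, num_classes=1000, alpha=0.8):
        super().__init__()
        # Step 1: score each visual token against a set of latent object slots
        # (learned without any object-level supervision).
        self.object_score = nn.Linear(feat_dim, num_objects)
        # Step 2: decouple the slots from the visual features by giving each
        # latent object its own independent embedding parameters.
        self.object_embed = nn.Embedding(num_objects, feat_dim)
        # Steps 3-4: classify from the structured features and fuse with the
        # prediction made from the original visual features.
        self.structured_cls = nn.Linear(feat_dim, num_classes)
        self.visual_cls = nn.Linear(feat_dim, num_classes)
        self.alpha = alpha

    def forward(self, tokens):                            # tokens: (B, N, feat_dim) from a ViT layer
        scores = self.object_score(tokens).softmax(dim=-1)       # (B, N, num_objects)
        pooled_scores = scores.mean(dim=1)                        # (B, num_objects)
        structured = pooled_scores @ self.object_embed.weight    # (B, feat_dim) structured features
        visual = tokens.mean(dim=1)                               # (B, feat_dim) pooled visual features
        # Final prediction integrates the structured and visual branches.
        return self.alpha * self.visual_cls(visual) + (1 - self.alpha) * self.structured_cls(structured)


# Example: 2 images, 197 ViT-Base tokens of width 768.
logits = LLBHeadSketch()(torch.randn(2, 197, 768))
print(logits.shape)                                       # torch.Size([2, 1000])
```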
LLB is evaluated on the following datasets:
- ImageNet
- Places365
- iNaturalist2018
We use the following pre-trained backbones:
- ViT (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale) [paper] [git]
- MAE (Masked Autoencoders Are Scalable Vision Learners) [paper] [git]
- SWAG (Revisiting Weakly Supervised Pre-Training of Visual Perception Models) [paper] [git]
Before training, prepare the following:
- Pre-trained weights of the different backbones (check the links in the Backbone section)
- Save the path to the pre-trained weights in `/models/__init__.py`
- Also, save the path to the data in `/utils/general.py` (see the sketch after this list)
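For reference, the edits might look like the sketch below. The variable names (`PRETRAINED_WEIGHT_PATHS`, `DATA_ROOTS`), keys, and file paths are hypothetical placeholders, so adapt them to whatever those files actually define.

```python
# In /models/__init__.py -- hypothetical example of pointing each backbone method
# to its downloaded checkpoint (variable name, keys, and paths are placeholders).
PRETRAINED_WEIGHT_PATHS = {
    "timm_augreg_in21k_ft_in1k": "/path/to/vit_base_augreg_in21k_ft_in1k.pth",
    "mae_in1k_ft_in1k": "/path/to/mae_vit_base_in1k.pth",
    "swag_ig_ft_in1k": "/path/to/swag_vit_base_ig_ft_in1k.pth",
}

# In /utils/general.py -- hypothetical example of pointing each dataset to its root directory.
DATA_ROOTS = {
    "imagenet": "/path/to/ImageNet",
    "places365": "/path/to/Places365",
    "inaturalist2018": "/path/to/iNaturalist2018",
}
```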
Requirements:
- torch==1.12.0
- torchvision==0.13.0
- timm==0.9.2
- numpy==1.21.5
- 8 × A100 GPUs
To train LLB with a ViT-Base backbone using multi-processing distributed training, run the following command.
```bash
torchrun --nproc_per_node=4 train.py \
    --amp \
    --seed 123 \
    --save \
    --method timm_augreg_in21k_ft_in1k \
    --encoder ViT \
    --vit_size Base \
    --transfer \
    --freeze \
    --alpha 0.8 \
    --num_nvit_layers 6 \
    --target_layer 11 \
    --object_size 2048 \
    --dataset ImageNet \
    --batch_size 128 \
    --epochs 70 \
    --lr_scheduler CosineAnnealingLR \
    --opt Adam
```
- We run the model on multiple GPUs with multi-processing distributed training using PyTorch native Distributed Data Parallel (DDP).
- Set `--dataset` to `imagenet` to train LLB on the ImageNet-1K dataset. For the other datasets, use `places365` or `inaturalist2018`.
- To use an ImageNet-21K pre-trained ViT backbone, use `timm_augreg_in21k_ft_in1k` for `--method`.
- For SWAG, use `swag_ig_ft_plc365`, `swag_ig_ft_in1k`, or `swag_ig_ft_inat18` for `--method`.
- For MAE, use `mae_in1k_ft_in1k`, `mae_in1k_ft_plc365`, or `mae_in1k_ft_inat18` for `--method`.
- Use `--transfer` and `--freeze` to load and freeze the pre-trained backbone weights.
- Set the number of LLB layers with `--num_nvit_layers`.
- Set the visual feature layer of the backbone with `--target_layer` (see the sketch after this list).
- Set the number of latent objects in LLB with `--object_size`.
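For intuition on `--target_layer`, the snippet below shows one common way to tap the token features of a specific ViT block in timm via a forward hook. This is only an illustration of the idea, not necessarily how the repository extracts its visual features.

```python
import torch
import timm

# Illustration only: grab the token features of ViT-Base block 11, the kind of
# intermediate visual feature that --target_layer refers to.
model = timm.create_model("vit_base_patch16_224.augreg_in21k_ft_in1k", pretrained=False)
features = {}

def save_output(module, inputs, output):
    features["target"] = output                      # (B, num_tokens, embed_dim)

model.blocks[11].register_forward_hook(save_output)
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
print(features["target"].shape)                      # torch.Size([1, 197, 768])
```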