This repository contains the implementation of the paper "Label-Focused Inductive Bias over Latent Object Features in Visual Classification", published at ICLR 2024.
We use PyTorch multi-processing Distributed Data Parallel (DDP) training to train the Label-focused Latent-object Biasing (LLB) method.
LLB takes visual features from a Vision Transformer (ViT) and proceeds through the following steps (see the sketch after this list):
- First, it learns intermediate latent object features in an unsupervised manner,
- then it decouples their visual dependencies by assigning new, independent embedding parameters,
- next it captures structured features optimized for the original classification task,
- and finally it integrates the structured features with the original visual features for the final prediction.
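A minimal PyTorch sketch of these steps is shown below. The class name, parameter names, and the `alpha`-weighted fusion are illustrative assumptions for intuition only, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class LLBHeadSketch(nn.Module):
    """Illustrative sketch of the LLB steps above; not the repository's actual module."""

    def __init__(self, feat_dim=768, num_objects=2048, num_classes=1000, alpha=0.8):
        super().__init__()
        # Step 1: score each visual token against a set of latent object slots
        # (learned without any object-level supervision).
        self.object_score = nn.Linear(feat_dim, num_objects)
        # Step 2: decouple the slots from the visual features by giving each
        # latent object its own independent embedding parameters.
        self.object_embed = nn.Embedding(num_objects, feat_dim)
        # Steps 3-4: classify from the structured features and fuse with the
        # prediction made from the original visual features.
        self.structured_cls = nn.Linear(feat_dim, num_classes)
        self.visual_cls = nn.Linear(feat_dim, num_classes)
        self.alpha = alpha

    def forward(self, tokens):                            # tokens: (B, N, feat_dim) from a ViT layer
        scores = self.object_score(tokens).softmax(dim=-1)       # (B, N, num_objects)
        pooled_scores = scores.mean(dim=1)                        # (B, num_objects)
        structured = pooled_scores @ self.object_embed.weight    # (B, feat_dim) structured features
        visual = tokens.mean(dim=1)                               # (B, feat_dim) pooled visual features
        # Final prediction integrates the structured and visual branches.
        return self.alpha * self.visual_cls(visual) + (1 - self.alpha) * self.structured_cls(structured)


# Example: 2 images, 197 ViT-Base tokens of width 768.
logits = LLBHeadSketch()(torch.randn(2, 197, 768))
print(logits.shape)                                       # torch.Size([2, 1000])
```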
LLB is evaluated on the following datasets:
- ImageNet
- Places365
- iNaturalist2018
We use the following pre-trained backbones:
- ViT (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale) [paper] [git]
- MAE (Masked Autoencoders Are Scalable Vision Learners) [paper] [git]
- SWAG (Revisiting Weakly Supervised Pre-Training of Visual Perception Models) [paper] [git]
Before training, prepare the following:
- Pre-trained weights of the different backbones (check the links in the Backbone section)
- Save the path to the pre-trained weights in `/models/__init__.py`
- Also, save the path to the data in `/utils/general.py` (see the sketch after this list)
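For reference, the edits might look like the sketch below. The variable names (`PRETRAINED_WEIGHT_PATHS`, `DATA_ROOTS`), keys, and file paths are hypothetical placeholders, so adapt them to whatever those files actually define.

```python
# In /models/__init__.py -- hypothetical example of pointing each backbone method
# to its downloaded checkpoint (variable name, keys, and paths are placeholders).
PRETRAINED_WEIGHT_PATHS = {
    "timm_augreg_in21k_ft_in1k": "/path/to/vit_base_augreg_in21k_ft_in1k.pth",
    "mae_in1k_ft_in1k": "/path/to/mae_vit_base_in1k.pth",
    "swag_ig_ft_in1k": "/path/to/swag_vit_base_ig_ft_in1k.pth",
}

# In /utils/general.py -- hypothetical example of pointing each dataset to its root directory.
DATA_ROOTS = {
    "imagenet": "/path/to/ImageNet",
    "places365": "/path/to/Places365",
    "inaturalist2018": "/path/to/iNaturalist2018",
}
```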
Requirements:
- torch==1.12.0
- torchvision==0.13.0
- timm==0.9.2
- numpy==1.21.5
- 8 × A100 GPUs
To train LLB with a ViT-Base backbone using multi-processing distributed training, run the following command.
```bash
torchrun --nproc_per_node=4 train.py \
    --amp \
    --seed 123 \
    --save \
    --method timm_augreg_in21k_ft_in1k \
    --encoder ViT \
    --vit_size Base \
    --transfer \
    --freeze \
    --alpha 0.8 \
    --num_nvit_layers 6 \
    --target_layer 11 \
    --object_size 2048 \
    --dataset ImageNet \
    --batch_size 128 \
    --epochs 70 \
    --lr_scheduler CosineAnnealingLR \
    --opt Adam
```
- We run the model on multiple GPUs with multi-processing distributed training using PyTorch native Distributed Data Parallel (DDP).
- Set `--dataset` to `imagenet` to train LLB on the ImageNet-1K dataset. For the other datasets, use `places365` or `inaturalist2018`.
- To use an ImageNet-21K pre-trained ViT backbone, use `timm_augreg_in21k_ft_in1k` for `--method`.
- For SWAG, use `swag_ig_ft_plc365`, `swag_ig_ft_in1k`, or `swag_ig_ft_inat18` for `--method`.
- For MAE, use `mae_in1k_ft_in1k`, `mae_in1k_ft_plc365`, or `mae_in1k_ft_inat18` for `--method`.
- Use `--transfer` and `--freeze` to load and freeze the pre-trained backbone weights.
- Set the number of LLB layers with `--num_nvit_layers`.
- Set the visual feature layer of the backbone with `--target_layer` (see the sketch after this list).
- Set the number of latent objects in LLB with `--object_size`.
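For intuition on `--target_layer`, the snippet below shows one common way to tap the token features of a specific ViT block in timm via a forward hook. This is only an illustration of the idea, not necessarily how the repository extracts its visual features.

```python
import torch
import timm

# Illustration only: grab the token features of ViT-Base block 11, the kind of
# intermediate visual feature that --target_layer refers to.
model = timm.create_model("vit_base_patch16_224.augreg_in21k_ft_in1k", pretrained=False)
features = {}

def save_output(module, inputs, output):
    features["target"] = output                      # (B, num_tokens, embed_dim)

model.blocks[11].register_forward_hook(save_output)
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
print(features["target"].shape)                      # torch.Size([1, 197, 768])
```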