Learning from Semantic Dictionaries (LEASE)

Codebase for LEASE (Learning from Semantic Dictionaries), a generative pre-training method for Vision Transformers based on joint codebook learning (forthcoming in CVPR 2026)! Built on the MAGE codebase, this repository includes all the code required to pretrain and evaluate both Sorcen (an Echo Contrast-based model) and LEASE. Feel free to fork!

Models provided

| Model | Description | Checkpoint |
| --- | --- | --- |
| MAGE | Token reconstruction based unified architecture. Refer to the original repository! | 🔗 |
| Sorcen | Unified architecture which creates its own positive contrastive pairs during training | 🔗 |
| LEASE | Joint codebook training for efficient representation learning and image synthesis | 🔗 |

All backbones are ViT-Base transformer encoders.

Results

| Model | IN-1K LP | FID (uncond.) | IS |
| --- | --- | --- | --- |
| Sorcen | 75.1% | 9.61 | 90.96 |
| MAGE | 74.7% | 11.1 | 81.17 |
| MAGE† | 75.0% | 10.88 | 81.59 |
| LEASE | 76.7% | 9.62 | 91.78 |

† MAGE results reproduced from its original checkpoint.

Setup

Dependencies are listed in `requirements.txt`. You also need two files at the repository root:

| File | Description |
| --- | --- |
| `vqgan_jax_strongaug.ckpt` | VQGAN tokenizer weights (from MAGE) |
| `km_16k.npy` | 16k-entry semantic codebook (from DiGIT) |

Data

Both methods train on precomputed image tokens. Training expects a pre-tokenized `.pt` file under `token_datasets/`. The file must contain:

| Key | Shape | Description |
| --- | --- | --- |
| `tokens_vqgan` | `(N, 256)` | VQGAN (or another generative tokenizer) patch token indices |
| `labels` | `(N,)` | Class labels |
| `tokens_dino` | `(N, 256)` | Semantic dictionary tokens (required for LEASE, optional for Sorcen) |

You can download the precomputed IN-1K training set here :)
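Before launching a long run, it can help to check that a token file matches the layout described above. The following is a minimal sketch, not part of the repository: the helper name `check_token_file` is hypothetical, and it only inspects keys and shapes, so it works with any mapping whose values expose a `.shape` attribute (for example, the dictionary returned by `torch.load`). The stand-in objects in the demo replace real tensors so the snippet runs without PyTorch.

```python
from types import SimpleNamespace

TOKENS_PER_IMAGE = 256  # second dimension of the token arrays, per the table above


def check_token_file(data, require_dino=True):
    """Validate keys and shapes of a loaded token dictionary.

    Returns the number of images N, or raises ValueError on a mismatch.
    Set require_dino=False for Sorcen, where `tokens_dino` is optional.
    """
    required = ["tokens_vqgan", "labels"]
    if require_dino:
        required.append("tokens_dino")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    labels_shape = tuple(data["labels"].shape)
    if len(labels_shape) != 1:
        raise ValueError(f"labels must have shape (N,), got {labels_shape}")
    n = labels_shape[0]

    for key in ("tokens_vqgan", "tokens_dino"):
        if key in data:
            shape = tuple(data[key].shape)
            if shape != (n, TOKENS_PER_IMAGE):
                raise ValueError(
                    f"{key} must have shape ({n}, {TOKENS_PER_IMAGE}), got {shape}"
                )
    return n


# Demo with stand-in objects that only carry a shape (no tensors needed).
example = {
    "tokens_vqgan": SimpleNamespace(shape=(4, 256)),
    "labels": SimpleNamespace(shape=(4,)),
    "tokens_dino": SimpleNamespace(shape=(4, 256)),
}
num_images = check_token_file(example)
```

With a real dataset, the same function could be called on the dictionary loaded from the `.pt` file, passing `require_dino=False` when training Sorcen without semantic tokens.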

Training

LEASE — full run (ImageNet-1K, 1600 epochs)

bash launch_scripts/launch_pretrain_lease.sh

Key hyperparameters:

| Argument | Value |
| --- | --- |
| `--model` | `lease_vit_base_patch16_single` |
| `--method` | `lease` |
| `--epochs` | `1600` |
| `--warmup_epochs` | `40` |
| `--blr` | `1.5e-4` |
| `--weight_decay` | `0.05` |
| `--batch_size` | `64` (per GPU, 64 GPUs → effective 4096) |
| `--mask_ratio_min/max` | `0.5 / 1.0` |
| `--mask_ratio_mu/std` | `0.55 / 0.25` |
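If you train on a different number of GPUs, the effective batch size changes with it. The sketch below reproduces the arithmetic from the table and shows how a base learning rate is commonly turned into an absolute one. The linear scaling rule `lr = blr * effective_batch / 256` is an assumption borrowed from MAE-style training scripts and is not stated in this README; check the launch script before relying on it. The function names and the `accum_iter` parameter are hypothetical.

```python
def effective_batch_size(per_gpu, num_gpus, accum_iter=1):
    """Total samples per optimizer step across all GPUs."""
    return per_gpu * num_gpus * accum_iter


def absolute_lr(blr, eff_batch, base=256):
    """Linear scaling rule (assumed here): lr = blr * eff_batch / base."""
    return blr * eff_batch / base


# Settings from the tables: LEASE uses 64 per GPU on 64 GPUs,
# Sorcen uses 128 per GPU on 32 GPUs. Both give 4096.
lease_eff = effective_batch_size(per_gpu=64, num_gpus=64)
sorcen_eff = effective_batch_size(per_gpu=128, num_gpus=32)

# Under the assumed rule, blr = 1.5e-4 at batch 4096 gives roughly 2.4e-3.
lr = absolute_lr(1.5e-4, lease_eff)
```

For example, to keep the effective batch size at 4096 on 8 GPUs with 64 samples each, `accum_iter` would need to be 8, assuming the training script supports gradient accumulation.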

Sorcen — full run (ImageNet-1K, 1600 epochs)

bash launch_scripts/launch_pretrain_sorcen.sh

Key hyperparameters:

| Argument | Value |
| --- | --- |
| `--model` | `sorcen_vit_base_patch16_single` |
| `--method` | `sorcen` |
| `--epochs` | `1600` |
| `--warmup_epochs` | `40` |
| `--blr` | `1.5e-4` |
| `--weight_decay` | `0.05` |
| `--batch_size` | `128` (per GPU, 32 GPUs → effective 4096) |
| `--mask_ratio_min/max` | `0.5 / 1.0` |
| `--mask_ratio_mu/std` | `0.55 / 0.25` |
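Both runs share the same four mask-ratio arguments. A plausible reading, assumed here because this repository builds on MAGE, is that the mask ratio for each step is drawn from a Gaussian with mean 0.55 and standard deviation 0.25, truncated to the interval [0.5, 1.0]. The sketch below illustrates that sampling with simple rejection sampling from the standard library; it is not the repository's implementation, and `sample_mask_ratio` and `num_masked_tokens` are hypothetical names. Rounding the masked-token count up is also an assumption.

```python
import math
import random


def sample_mask_ratio(rng, mu=0.55, std=0.25, lo=0.5, hi=1.0):
    """Draw from N(mu, std^2) restricted to [lo, hi] by rejection sampling."""
    while True:
        ratio = rng.gauss(mu, std)
        if lo <= ratio <= hi:
            return ratio


def num_masked_tokens(ratio, seq_len=256):
    """Number of the 256 image tokens to mask (rounded up, an assumption)."""
    return math.ceil(seq_len * ratio)


rng = random.Random(0)  # fixed seed so the demo is repeatable
ratios = [sample_mask_ratio(rng) for _ in range(1000)]
masked_counts = [num_masked_tokens(r) for r in ratios]
```

With these settings every sampled step masks between 128 and 256 of the 256 tokens, so the model always sees at most half of the image tokens.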

Evaluation

Linear probing

bash launch_scripts/launch_linprobe_lease.sh   
bash launch_scripts/launch_linprobe_sorcen.sh

Image generation (unconditional)

bash launch_scripts/launch_gen_uncond_lease.sh   
bash launch_scripts/launch_gen_uncond_sorcen.sh

Both scripts copy the checkpoint, verify the copy with an MD5 checksum, record provenance, and skip generation if output already exists. Generated images are saved to `generation_results/<experiment_name>/`.
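The copy-and-verify step can be illustrated in a few lines of Python. This is a sketch of the general technique, not the logic of the launch scripts themselves, which are shell scripts; the function names `md5_of_file` and `copy_and_verify` are hypothetical.

```python
import hashlib
import shutil


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and confirm both files have the same MD5 digest.

    Returns the digest on success and raises IOError if the copy differs.
    """
    expected = md5_of_file(src)
    shutil.copyfile(src, dst)
    actual = md5_of_file(dst)
    if actual != expected:
        raise IOError(f"checksum mismatch: {expected} != {actual}")
    return expected
```

The returned digest can be written next to the generated images as a simple provenance record, so a results folder can later be traced back to the exact checkpoint that produced it.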

Citation

To appear in CVPR'26 proceedings...
