
SRe2L

Official PyTorch implementation of the NeurIPS 2023 spotlight paper:

"Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective"
Zeyuan Yin, Eric Xing, Zhiqiang Shen
MBZUAI, CMU

[Project Page] [Paper]

Abstract

We present a new dataset condensation framework termed Squeeze, Recover and Relabel (SRe²L) that decouples the bilevel optimization of model and synthetic data during training, to handle varying scales of datasets, model architectures and image resolutions for effective dataset condensation. The proposed method is flexible across diverse dataset scales and offers several advantages: arbitrary resolutions of synthesized images, low training cost and memory consumption with high-resolution training, and the ability to scale up to arbitrary evaluation network architectures. Extensive experiments are conducted on the Tiny-ImageNet and full ImageNet-1K datasets. Under 50 IPC, our approach achieves the highest validation accuracy of 42.5% on Tiny-ImageNet and 60.8% on ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5 and 32.9 percentage points, respectively. During data synthesis, our approach is also approximately 52× (ConvNet-4) and 16× (ResNet-18) faster than MTT, with 11.6× and 6.4× lower memory consumption.

Distillation Animation

Kindly wait a few seconds for the animation visualizations to load.

Distilled ImageNet

Squeeze

For ImageNet-1K, we use the official PyTorch pre-trained models from the Torchvision Model Zoo.
For Tiny-ImageNet-200, we adapt the official Torchvision code to train the model from scratch. The training code and checkpoints are in tiny-imagenet.

Recover

More details in recover/README.md.

cd recover
sh recover.sh

Relabel

More details in relabel/README.md.

cd relabel
sh relabel.sh
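
Relabeling gives each distilled image a soft label, that is, a probability distribution over classes produced by a pre-trained model instead of a one-hot class index. The framework-free sketch below only illustrates that idea. The function name, the temperature parameter, and the toy logits are assumptions for illustration and do not mirror what relabel.sh actually runs.

```python
import math

def soft_label(logits, temperature=1.0):
    """Turn teacher logits into a probability vector (softmax with temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher outputs for one distilled image over three classes.
teacher_logits = [2.0, 1.0, 0.1]
print(soft_label(teacher_logits))                   # sharper distribution
print(soft_label(teacher_logits, temperature=4.0))  # flatter distribution
```

A higher temperature spreads probability mass more evenly across classes, which is one common way to control how much inter-class information a soft label carries.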

Validate distilled dataset

We provide two kinds of validation code: FKD and Naive KD. FKD is the main validation code aligned with our paper for relabeled distilled images. Naive KD is an alternative validation code to quickly validate the performance of the distilled data without the relabel process. More details in validate/README.md.

cd validate
sh train_FKD.sh
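
Both validation variants train a model on the distilled images against soft targets rather than hard class indices. As a rough illustration of that training signal, the sketch below computes a cross-entropy between a soft target and a student's prediction in plain Python. The function names and toy values are assumptions, and the loss used by train_FKD.sh may differ in its details.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def soft_cross_entropy(student_logits, soft_target):
    """Cross-entropy between a soft target distribution and the student's prediction."""
    log_p = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(soft_target, log_p))

# Hypothetical soft label for one image over three classes.
soft_target = [0.7, 0.2, 0.1]
print(soft_cross_entropy([2.0, 0.5, -1.0], soft_target))  # student agrees: lower loss
print(soft_cross_entropy([-1.0, 0.5, 2.0], soft_target))  # student disagrees: higher loss
```

With a one-hot target this reduces to the usual cross-entropy, so the same function covers both hard and soft labels.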

Download

You can download distilled data and soft labels from .

| Dataset | Resolution | Iterations | IPC | Files |
| --- | --- | --- | --- | --- |
| ImageNet-1K | 224x224 | 4K | 50 | images; mixup labels / cutmix labels |
| ImageNet-1K | 224x224 | 2K | 50 | images |
| ImageNet-1K | 224x224 | 4K | 200 | images |
| Tiny-ImageNet-200 | 64x64 | 1K | 50 | images |
| Tiny-ImageNet-200 | 64x64 | 4K | 100 | images |
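
To put the IPC (images per class) column in perspective, the sketch below computes how many images each release contains and what fraction of the original training set that represents. The class counts and original training-set sizes (1,281,167 images for ImageNet-1K, 100,000 for Tiny-ImageNet-200) are the standard figures for these datasets and are not stated in this README. The function name is illustrative.

```python
# Standard dataset statistics (assumed here; not taken from this README).
DATASETS = {
    "ImageNet-1K": {"classes": 1000, "train_images": 1_281_167},
    "Tiny-ImageNet-200": {"classes": 200, "train_images": 100_000},
}

def distilled_size(dataset: str, ipc: int) -> tuple[int, float]:
    """Return (number of distilled images, fraction of the original training set)."""
    info = DATASETS[dataset]
    n = info["classes"] * ipc
    return n, n / info["train_images"]

# The (dataset, IPC) pairs listed in the download table.
for name, ipc in [("ImageNet-1K", 50), ("ImageNet-1K", 200),
                  ("Tiny-ImageNet-200", 50), ("Tiny-ImageNet-200", 100)]:
    n, frac = distilled_size(name, ipc)
    print(f"{name:18s} IPC={ipc:<4d} images={n:>7,d} ({frac:.1%} of train set)")
```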

Results

Our Top-1 accuracy (%) under different IPC settings on Tiny-ImageNet and ImageNet-1K datasets:

Citation

If you find our code useful for your research, please cite our paper.

@inproceedings{yin2023squeeze,
  title={Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective},
  author={Yin, Zeyuan and Xing, Eric and Shen, Zhiqiang},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
