# [ICLR 2024] Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

To achieve lossless dataset distillation, an intuitive idea is to increase the size of the synthetic dataset. However, previous dataset distillation methods tend to perform worse than random selection as the IPC (images per class, i.e., the fraction of data kept) increases.

To address this issue, we find that the difficulty of the generated patterns should be aligned with the size of the synthetic dataset: avoid generating patterns that are too easy or too difficult for the given IPC.

By doing so, our method remains effective in high-IPC cases and achieves lossless dataset distillation for the very first time.

[Figure]

**What do easy patterns and hard patterns look like?**

[Figures: visualizations of easy vs. hard patterns.]
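For intuition, here is a minimal, hedged sketch of the core idea: pick which segment of the expert trajectory to match based on IPC (early epochs encode easy patterns, late epochs hard ones), then minimize an MTT-style normalized parameter-matching loss. Everything below — the function names, window thresholds, and toy trajectory — is illustrative, not the repo's actual code; see `distill/DATM.py` for the real implementation.

```python
# Illustrative sketch of difficulty-aligned trajectory matching.
# All names and thresholds are assumptions, NOT the repo's actual code.
import random
import torch

def pick_expert_window(ipc, num_epochs):
    # Assumption: early expert epochs encode "easy" patterns and late epochs
    # "hard" ones, so a larger synthetic set should match later segments.
    if ipc <= 10:
        return 0, num_epochs // 4                # easy segments only
    elif ipc <= 50:
        return num_epochs // 4, num_epochs // 2  # intermediate segments
    return num_epochs // 2, num_epochs - 3       # include hard segments

def matching_loss(student, expert_start, expert_target):
    # MTT-style loss: distance from the student's parameters to the expert's
    # future parameters, normalized by how far the expert itself moved.
    num = sum(((s - t) ** 2).sum() for s, t in zip(student, expert_target))
    den = sum(((s0 - t) ** 2).sum() for s0, t in zip(expert_start, expert_target))
    return num / (den + 1e-12)

# Toy usage with a fake expert trajectory (one parameter tensor per epoch).
torch.manual_seed(0)
trajectory = [[torch.randn(64)] for _ in range(100)]
lo, hi = pick_expert_window(ipc=50, num_epochs=len(trajectory))
start = random.randint(lo, hi)
student = [p.clone() for p in trajectory[start]]
# ... in the full method, `student` would now be trained for a few steps on
# the synthetic set; here we skip that and just evaluate the loss ...
print(matching_loss(student, trajectory[start], trajectory[start + 2]).item())
```

In the full method the soft labels (and a synthetic learning rate) are also optimized alongside the images, which is why the evaluation script below consumes learned labels and a learned learning rate.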

## Getting Started

1. Create and activate the environment:

```
conda env create -f environment.yaml
conda activate distillation
```

2. Generate the expert trajectories:

```
cd buffer
python buffer_FTD.py --dataset=CIFAR10 --model=ConvNet --train_epochs=100 --num_experts=100 --zca --buffer_path=../buffer_storage/ --data_path=../dataset/ --rho_max=0.01 --rho_min=0.01 --alpha=0.3 --lr_teacher=0.01 --mom=0. --batch_train=256
```

3. Perform the distillation (see the config note after this list):

```
cd distill
python DATM.py --cfg ../configs/xxxx.yaml
```
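The `--cfg` file selects the dataset, IPC, and matching hyperparameters. As a purely hypothetical illustration of what such a YAML might contain (the field names below are assumptions, not the actual schema — consult the files shipped in `configs/`):

```yaml
# Hypothetical config sketch; every field name here is an illustrative
# assumption, not the repo's actual schema.
dataset: CIFAR10
model: ConvNet
ipc: 50                      # images per class in the synthetic set
zca: true                    # match the ZCA preprocessing used for the experts
buffer_path: ../buffer_storage/
data_path: ../dataset/
```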

## Evaluation

We provide a simple script for evaluating the distilled datasets.

```
cd distill
python evaluation.py --lr_dir=path_to_lr --data_dir=path_to_images --label_dir=path_to_labels --zca
```
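Conceptually, evaluation trains a fresh network on the distilled images with the learned soft labels and the learned synthetic learning rate, then reports accuracy on the real test set. A minimal sketch under those assumptions — the model, loss, and tensor shapes are simplifications, not the script's actual internals:

```python
# Hedged sketch of distilled-dataset evaluation; simplified, NOT evaluation.py.
import torch
import torch.nn as nn
import torch.nn.functional as F

def evaluate_distilled(images, soft_labels, lr, test_loader,
                       epochs=300, device="cpu"):
    """images: (N, C, H, W) synthetic images; soft_labels: (N, num_classes)
    logits; lr: the learned synthetic learning rate (shapes are assumptions)."""
    model = nn.Sequential(  # stand-in for the ConvNet used in the paper
        nn.Flatten(), nn.Linear(images[0].numel(), soft_labels.shape[1])
    ).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    x = images.to(device)
    y = F.softmax(soft_labels.to(device), dim=1)
    for _ in range(epochs):
        opt.zero_grad()
        # Cross-entropy against the learned soft-label distribution.
        loss = -(y * F.log_softmax(model(x), dim=1)).sum(dim=1).mean()
        loss.backward()
        opt.step()
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in test_loader:
            correct += (model(xb.to(device)).argmax(1) == yb.to(device)).sum().item()
            total += len(yb)
    return correct / total
```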

## Acknowledgement

Our code is built upon MTT and FTD.

## Citation

If you find our code useful for your research, please cite our paper.

```bibtex
@inproceedings{guo2024lossless,
  title={Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching},
  author={Ziyao Guo and Kai Wang and George Cazenavette and Hui Li and Kaipeng Zhang and Yang You},
  year={2024},
  booktitle={The Twelfth International Conference on Learning Representations}
}
```
