This repository contains the official PyTorch implementation of the CVPR 2025 paper "CacheQuant: Comprehensively Accelerated Diffusion Models". CacheQuant introduces a training-free paradigm that accelerates diffusion models at both the temporal and structural levels. DPS selects the optimal cache schedule to minimize the errors introduced by caching and quantization, and DEC further mitigates the coupled and accumulated errors by exploiting the strong correlation between feature maps.
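As a rough intuition for the feature-correction idea described above, the sketch below fits a per-channel linear map from cached features to full-precision features on calibration data and applies it at inference time. The function names, tensor shapes, and the least-squares form are illustrative assumptions, not the repository's implementation of DEC.

```python
import torch

def fit_linear_correction(cached_feats, full_feats, eps=1e-6):
    """Fit per-channel scale/bias so that scale * cached + bias ~ full.

    cached_feats, full_feats: (N, C, H, W) calibration tensors.
    Illustrative sketch only; the paper's DEC may differ in detail.
    """
    # Flatten everything except the channel dimension.
    x = cached_feats.permute(1, 0, 2, 3).flatten(1)  # (C, N*H*W)
    y = full_feats.permute(1, 0, 2, 3).flatten(1)    # (C, N*H*W)
    x_mean, y_mean = x.mean(1), y.mean(1)
    cov = ((x - x_mean[:, None]) * (y - y_mean[:, None])).mean(1)
    var = (x - x_mean[:, None]).pow(2).mean(1)
    scale = cov / (var + eps)          # least-squares slope per channel
    bias = y_mean - scale * x_mean     # least-squares intercept per channel
    return scale, bias

def apply_correction(feat, scale, bias):
    # Broadcast the per-channel correction over a (N, C, H, W) feature map.
    return feat * scale[None, :, None, None] + bias[None, :, None, None]
```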
Create and activate a conda environment named cachequant with the following commands:

```bash
cd CacheQuant
conda env create -f environment.yaml
conda activate cachequant
```

Pre-trained models for DDPM are downloaded automatically by the code. For the LDM and Stable Diffusion experiments, download the relevant pre-trained models to mainldm/models/ldm following the instructions in the latent-diffusion and stable-diffusion repositories.
Please download all original datasets used for evaluation from each dataset’s official website. We provide prompts for Stable Diffusion in mainldm/prompt.
- Obtain DPS and Calibration
```bash
python ./mainldm/sample_cachequant_imagenet_cali.py
```
- Get Cached Features
```bash
python ./mainldm/sample_cachequant_imagenet_predadd.py
```
- Calculate DEC for Cache
```bash
python ./err_add/imagenet/cache_draw.py --error cache
```
- Get Quantized Parameters
```bash
python ./mainldm/sample_cachequant_imagenet_params.py
```
- Calculate DEC for Quantization
```bash
python ./err_add/imagenet/cache_draw.py --error quant
```
- Acceleration and Sample
```bash
python ./mainldm/sample_cachequant_imagenet_quant.py <--recon>
```
The --recon flag enables reconstruction.
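For convenience, the steps above can also be chained in a small driver script. This is only a thin wrapper around the commands listed above; the --recon flag on the last step is optional, as noted.

```python
import subprocess

# Run the ImageNet LDM-4 pipeline end to end, in the order listed above.
steps = [
    ["python", "./mainldm/sample_cachequant_imagenet_cali.py"],
    ["python", "./mainldm/sample_cachequant_imagenet_predadd.py"],
    ["python", "./err_add/imagenet/cache_draw.py", "--error", "cache"],
    ["python", "./mainldm/sample_cachequant_imagenet_params.py"],
    ["python", "./err_add/imagenet/cache_draw.py", "--error", "quant"],
    ["python", "./mainldm/sample_cachequant_imagenet_quant.py", "--recon"],
]
for cmd in steps:
    # Stop immediately if any stage fails.
    subprocess.run(cmd, check=True)
```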
This work is built upon EDA-DM as the baseline. The repo provides code for all experiments; we use LDM-4 on ImageNet as an example to illustrate the usage, and the other experiments are run similarly. Our experiments are aligned with DeepCache: non-uniform caching is used for stable-diffusion, and for the other models only when the interval is greater than 10. We use guided-diffusion and clip-score to evaluate results. The accelerated diffusion models are deployed using CUTLASS and torch_quantizer.
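To illustrate the difference between uniform and non-uniform caching mentioned above, the sketch below builds both kinds of refresh schedules. The function names and the power-law spacing are arbitrary stand-ins for illustration, not the schedules used in this repository or in DeepCache.

```python
import numpy as np

def uniform_schedule(num_steps, interval):
    """Refresh the cache every `interval` denoising steps."""
    return list(range(0, num_steps, interval))

def nonuniform_schedule(num_steps, interval, power=1.5):
    """Spend roughly the same number of refreshes, but space them unevenly.

    Illustrative only: a non-uniform schedule concentrates refreshes where
    features change faster; the power-law spacing here is arbitrary.
    """
    num_refresh = max(1, num_steps // interval)
    t = np.linspace(0.0, 1.0, num_refresh) ** power
    return sorted({int(round(x * (num_steps - 1))) for x in t})

print(uniform_schedule(50, 10))     # [0, 10, 20, 30, 40]
print(nonuniform_schedule(50, 10))  # denser near step 0 in this sketch
```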
If you find this work useful in your research, please consider citing our paper:
@inproceedings{liu2025cachequant,
  title={Cachequant: Comprehensively accelerated diffusion models},
  author={Liu, Xuewen and Li, Zhikai and Gu, Qingyi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={23269--23280},
  year={2025}
}

@article{liu2024enhanced,
  title={Enhanced distribution alignment for post-training quantization of diffusion models},
  author={Liu, Xuewen and Li, Zhikai and Xiao, Junrui and Gu, Qingyi},
  journal={arXiv e-prints},
  pages={arXiv--2401},
  year={2024}
}


