[ICML 2026] Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs
This is the official code for "Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs".
TL;DR: SRR improves quantization error reconstruction (QER) and quantized parameter-efficient fine-tuning (QPEFT) by splitting the rank budget between preserving dominant structure and reconstructing quantization error, guided by a principled, cheap selection criterion.
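Schematically, for a weight matrix $W$ and a total rank budget $r$, part of the budget goes to a low-rank term that preserves the dominant structure of $W$ before quantization, and the rest to a low-rank term that reconstructs the residual quantization error. A minimal sketch in our own notation (not necessarily the paper's):

$$
W \;\approx\; L_1 + Q\!\big(W - L_1\big) + L_2,
\qquad \operatorname{rank}(L_1) + \operatorname{rank}(L_2) = r,
$$

where $Q(\cdot)$ is the quantizer, $L_1$ captures the dominant structure of $W$, and $L_2$ is fit to the remaining quantization error $(W - L_1) - Q(W - L_1)$.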
```bash
unzip icml_srr-main.zip
cd icml_srr-main
conda env create -f environment.yml
conda activate srr
pip install -r requirements.txt
```

lm-eval-harness is installed automatically via `pip install -r requirements.txt` (`lm_eval==0.4.7`).
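To sanity-check the installation (optional; these commands are a generic suggestion, not part of the repo's documented setup):

```bash
# Verify the pinned lm-eval-harness version and that PyTorch sees a GPU.
pip show lm_eval                                            # should report Version: 0.4.7
python -c "import torch; print(torch.cuda.is_available())"  # expect True
```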
To run the PTQ experiments:

```bash
conda activate srr
./experiments/ptq/run_srr_3bit_32rank.sh
./experiments/ptq/run_srr_3bit_64rank.sh
```

By default, this runs PTQ with SRR using `qera-exact` scaling on the LLaMA-2 7B model.
- Select GPU: Edit `export CUDA_VISIBLE_DEVICES=0` in the `.sh` scripts to choose the GPU ID.
- Enable zero-shot evaluation: Remove `--disable-lm-eval` from the default settings.
- Randomized SVD: Add `--apply-rand-svd` to use `torch.svd_lowrank` instead of full SVD during SRR initialization. This speeds up the SVD computation for large weight matrices with minimal accuracy loss. Only applicable when `lr_initializer` is set to `srr` in the config. A sketch of these edits is shown after this list.
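For example, an edited run script might look like the following (a hypothetical excerpt; the actual layout of `run_srr_3bit_32rank.sh` may differ):

```bash
# experiments/ptq/run_srr_3bit_32rank.sh (hypothetical excerpt)
export CUDA_VISIBLE_DEVICES=1   # run on GPU 1 instead of the default GPU 0

# In the script's launch command:
#   - remove --disable-lm-eval to enable zero-shot evaluation via lm-eval-harness;
#   - add --apply-rand-svd to use torch.svd_lowrank during SRR initialization
#     (only applicable when lr_initializer is set to srr in the config).
```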
Results are saved automatically to the ./checkpoints directory.
To run the QPEFT experiments, first activate the environment:

```bash
conda activate srr
```

Navigate to the specific task directory you want to run. For example, to run a GLUE task:

```bash
cd experiments/qpeft/glue
```

Then execute:

```bash
./srr.sh
```

By default, this runs QPEFT with SRR using 4-bit MXINT quantization.
- GLUE: Adjust `task_list` in `srr.sh` and `learning_rate_list` in `adapt_and_glue_train.sh`.
- GSM8K and SlimPajama: Select `model` and `quant_bits` in `srr.sh`, then modify `learning_rate_list` in the corresponding training script (`adapt_and_gsm8k_train.sh` or `adapt_and_clm_train.sh`). See the sketch after this list for an example.
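As an illustration, the variables above might be edited like this (hypothetical values and syntax; check the actual scripts for the exact format):

```bash
# experiments/qpeft/glue/srr.sh (hypothetical excerpt)
task_list=("rte" "mrpc")            # GLUE tasks to run

# experiments/qpeft/glue/adapt_and_glue_train.sh (hypothetical excerpt)
learning_rate_list=(1e-4 5e-5)      # learning rates to sweep

# For GSM8K / SlimPajama, srr.sh exposes (hypothetical values):
# model="meta-llama/Llama-2-7b-hf"  # base model
# quant_bits=4                      # quantization bit-width (4-bit MXINT default)
```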
Experiment results are saved in the ./checkpoints directory.
This codebase is built on top of QERA.
If you use this code, please cite:

```bibtex
@inproceedings{cho2026preserve,
  title={Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs},
  author={Cho, Yoonjun and Jeon, Dongjae and Kim, Soeun and Jeon, Moongyu and No, Albert},
  booktitle={International Conference on Machine Learning},
  year={2026}
}
```