
[ICML 2026] Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs

This is the official code for "Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs".


[Figure: SRR main figure]

TL;DR: SRR improves quantization error reconstruction (QER) and quantized parameter-efficient fine-tuning (QPEFT) by splitting the rank budget between preserving dominant weight structure and reconstructing quantization error, guided by a principled, inexpensive selection criterion.
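The preserve-then-quantize idea can be illustrated with a small NumPy sketch. This is our own toy construction, not the repository's implementation: spend `r_p` ranks preserving the dominant singular directions of a weight matrix, quantize the remainder, then spend `r_e` ranks reconstructing the resulting quantization error.

```python
import numpy as np

def preserve_then_quantize(W, r_p, r_e, n_levels=8):
    """Illustrative sketch (not the paper's exact algorithm): split a
    rank budget r = r_p + r_e between preserving the top-r_p singular
    directions of W and reconstructing the quantization error of the
    remainder with rank r_e."""
    # 1) Preserve: keep the dominant rank-r_p structure in full precision.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_preserved = U[:, :r_p] @ np.diag(S[:r_p]) @ Vt[:r_p]
    residual = W - W_preserved

    # 2) Quantize the residual (toy uniform quantizer; 8 levels ~ 3-bit).
    scale = np.abs(residual).max() / (n_levels // 2 - 0.5) + 1e-12
    W_q = np.round(residual / scale).clip(-(n_levels // 2), n_levels // 2 - 1) * scale

    # 3) Reconstruct: approximate the quantization error with rank r_e.
    err = residual - W_q
    Ue, Se, Vte = np.linalg.svd(err, full_matrices=False)
    err_lr = Ue[:, :r_e] @ np.diag(Se[:r_e]) @ Vte[:r_e]

    # Effective weight: preserved structure + quantized residual + error fix.
    return W_preserved + W_q + err_lr

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_hat = preserve_then_quantize(W, r_p=16, r_e=16)
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))  # relative reconstruction error
```

The point of the split is that neither extreme is optimal: spending the whole budget on error reconstruction ignores that some directions of W are too important to quantize at all, while spending it all on preservation leaves the quantization error uncorrected.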

Env Setup

unzip icml_srr-main.zip
cd icml_srr-main
conda env create -f environment.yml
conda activate srr
pip install -r requirements.txt

lm-eval-harness is installed automatically via pip install -r requirements.txt (lm_eval==0.4.7).


PTQ with SRR

1. Activate Conda Environment

conda activate srr

2. Run the PTQ Script

./experiments/ptq/run_srr_3bit_32rank.sh
./experiments/ptq/run_srr_3bit_64rank.sh

By default, these scripts run PTQ with SRR (rank 32 and rank 64, respectively) using qera-exact scaling on the LLaMA-2 7B model.

3. Optional Configurations

  • Select GPU: Edit export CUDA_VISIBLE_DEVICES=0 in the .sh scripts to choose the GPU ID.
  • Enable Zero-Shot Evaluation: Remove --disable-lm-eval from the default settings.
  • Randomized SVD: Add --apply-rand-svd to use torch.svd_lowrank instead of full SVD during SRR initialization. This speeds up the SVD computation for large weight matrices with minimal accuracy loss. Only applicable when lr_initializer is set to srr in the config.
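To illustrate why --apply-rand-svd helps, below is a minimal randomized SVD in NumPy in the style of torch.svd_lowrank. The function is our own sketch (name and parameters are illustrative, not the repo's code): it only touches the matrix through a few products with a thin random sketch, so it is much cheaper than a full SVD for large weight matrices when only the top ranks are needed.

```python
import numpy as np

def rand_svd(W, rank, n_oversample=8, n_iter=2):
    """Randomized top-`rank` SVD (Halko et al. style), mirroring the
    role of torch.svd_lowrank behind --apply-rand-svd."""
    rng = np.random.default_rng(0)
    # Random range finder with a few power iterations for accuracy.
    Q = rng.standard_normal((W.shape[1], rank + n_oversample))
    Y = W @ Q
    for _ in range(n_iter):
        Y = W @ (W.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # SVD of the small projected matrix recovers the top factors.
    B = Q.T @ W
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], S[:rank], Vt[:rank]

# Demo on a low-rank matrix, where the sketch recovers the spectrum exactly.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 100))
U, S, Vt = rand_svd(A, rank=10)
```

Since weight matrices targeted by SRR only need their leading singular directions, the small accuracy loss from sketching is usually negligible, which matches the flag's description above.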

4. Check Results

Results are saved automatically to the ./checkpoints directory.


QPEFT with SRR

1. Activate Conda Environment

conda activate srr

2. Run the QPEFT Script

Navigate to the specific task directory you want to run. For example, to run a GLUE task:

cd experiments/qpeft/glue

Then execute:

./srr.sh

By default, this runs QPEFT with SRR using 4-bit MXINT quantization.
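For intuition, MXINT-style quantization groups values into small blocks that share a single power-of-two scale, with each value stored as a low-bit integer. The sketch below is a toy simplification of that scheme, not the repo's exact MXINT format; `mxint_quantize` and its parameters are hypothetical.

```python
import numpy as np

def mxint_quantize(x, bits=4, block=32):
    """Toy MXINT-style block quantizer (illustrative only): each block
    of `block` values shares one power-of-two scale, and each value is
    stored as a signed `bits`-bit integer."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block
    xp = np.pad(x, (0, pad)).reshape(-1, block)
    q_max = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    amax = np.abs(xp).max(axis=1, keepdims=True) + 1e-12
    # Shared per-block power-of-two exponent so amax fits the int range.
    scale = 2.0 ** np.ceil(np.log2(amax / q_max))
    q = np.clip(np.round(xp / scale), -q_max - 1, q_max)
    # Dequantize and drop the padding.
    return (q * scale).reshape(-1)[: x.size]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
deq = mxint_quantize(x, bits=4, block=32)
```

Because the scale is shared per block rather than per tensor, outliers in one block do not inflate the quantization step everywhere else, which is why block formats like MXINT are popular for LLM weights.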

3. Optional Configurations

  • GLUE: Adjust task_list in srr.sh and learning_rate_list in adapt_and_glue_train.sh.
  • GSM8K and SlimPajama: Select model and quant_bits in srr.sh, then modify learning_rate_list in the corresponding training script (adapt_and_gsm8k_train.sh or adapt_and_clm_train.sh).

4. Check Results

Experiment results are saved in the ./checkpoints directory.


Acknowledgement

This codebase is built on top of QERA.


BibTeX

@inproceedings{cho2026preserve,
  title={Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs},
  author={Cho, Yoonjun and Jeon, Dongjae and Kim, Soeun and Jeon, Moongyu and No, Albert},
  booktitle={International Conference on Machine Learning},
  year={2026}
}
