This repository extends SVD-LLM with Randomized SVD (RSVD) implementation for significantly faster compression of large language models.
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
The Ohio State University, Michigan State University
International Conference on Learning Representations (ICLR) 2025
SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression
Xin Wang, Samiul Alam, Zhongwei Wan, Hui Shen, Mi Zhang
The Ohio State University
Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL) 2025
RSVD provides significant speedup over standard SVD decomposition for large weight matrices while maintaining comparable model quality and perplexity.
- Faster Compression: RSVD uses randomized algorithms to approximate SVD, providing 1.3x-1.4x speedup for large matrices
- Memory Efficient: Computes only the top-k singular values/vectors needed for compression
- Comparable Quality: Maintains similar perplexity and model performance compared to standard SVD
- Configurable: Control the trade-off between speed and accuracy with the --rsvd_oversamples and --rsvd_n_iter parameters (see the sketch after this list)
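For intuition, the sketch below shows the core randomized SVD recipe: project onto a random low-dimensional subspace, optionally run power iterations, then take an exact SVD of the small projected matrix. This is a minimal NumPy illustration, not the implementation in utils/rsvd.py; the function name and defaults are only meant to mirror the flags described later.

```python
import numpy as np

def rsvd_sketch(W, rank, oversamples=10, n_iter=2):
    """Approximate the top-`rank` SVD of W via random projection (illustrative only)."""
    m, n = W.shape
    k = min(rank + oversamples, min(m, n))
    # Gaussian test matrix: W @ omega samples the dominant column space of W.
    omega = np.random.randn(n, k)
    Y = W @ omega
    # Power iterations sharpen the subspace estimate when singular values decay slowly.
    # (Practical implementations re-orthogonalize between steps for numerical stability.)
    for _ in range(n_iter):
        Y = W @ (W.T @ Y)
    # Orthonormal basis of the sampled range, then a cheap SVD of the small k x n matrix.
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ W
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], S[:rank], Vt[:rank, :]

# Example: rank-64 approximation of a 4096 x 4096 matrix.
W = np.random.randn(4096, 4096).astype(np.float32)
U, S, Vt = rsvd_sketch(W, rank=64)
print(U.shape, S.shape, Vt.shape)   # (4096, 64) (64,) (64, 4096)
```

The cost is dominated by a handful of matrix multiplies against a thin k-column matrix plus an SVD of a k x n matrix, which is why it scales much better than a full decomposition when only the top singular values are needed.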
Comprehensive benchmarks comparing RSVD vs SVD across compression ratios (0.1-0.5) are available in benchmark_results_comparison/. Visualizations can be generated using scripts in the graphs/ folder:
- perplexity_graph.py - Model perplexity comparison
- efficiency_graph.py - Throughput comparison
- compression_speedup_graph.py - Compression time comparison
- parameters_retained_graph.py - Parameter retention analysis
Important: Keep the transformers package at exactly version 4.35.2, because the compressed model uses a modified model structure defined in the component/ folder.
- Create a conda environment with Python 3.9:
conda create -n compress python=3.9
conda activate compress
- Clone the repository:
git clone https://github.com/cangokmen/RSVD-LLM.git
cd RSVD-LLM
- Install dependencies:
pip install -r requirements.txt

To run a quick end-to-end example:

bash compress_llama.sh

This compresses LLaMA-7B with a 20% compression ratio using standard SVD and evaluates perplexity and efficiency.

To benchmark RSVD against standard SVD:
# Set compression ratio (0.1 to 0.5)
export RATIO=0.2
# Run compression benchmark
bash compress_svd_vs_rsvd.sh
# Run evaluation benchmark
bash evaluate_svd_vs_rsvd.sh

This benchmarks both RSVD and SVD compression methods, measuring compression time, perplexity, and throughput.
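For a rough sense of what the compression-time benchmark measures (this is not the actual benchmark script), you can time a full SVD against a truncated randomized SVD on a single large matrix. sklearn's randomized_svd is used here purely as a stand-in for the implementation in utils/rsvd.py:

```python
import time
import numpy as np
from sklearn.utils.extmath import randomized_svd

W = np.random.randn(4096, 4096).astype(np.float32)
rank = 1024  # number of singular triplets to keep; in practice this depends on RATIO

t0 = time.perf_counter()
np.linalg.svd(W, full_matrices=False)   # full SVD computes every singular value
t_svd = time.perf_counter() - t0

t0 = time.perf_counter()
randomized_svd(W, n_components=rank, n_oversamples=10, n_iter=2, random_state=0)
t_rsvd = time.perf_counter() - t0

print(f"full SVD {t_svd:.2f}s | randomized SVD {t_rsvd:.2f}s | speedup {t_svd / t_rsvd:.1f}x")
```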
RSVD parameters:
- --rsvd_oversamples: Number of additional samples for improved accuracy (default: 10)
- --rsvd_n_iter: Number of power iterations for better approximation (default: 2)
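To see what these two knobs trade off, the toy comparison below (again using sklearn's randomized_svd as a stand-in for the repository's code) measures reconstruction error for different numbers of power iterations. More iterations and more oversamples reduce the approximation error at the cost of extra matrix multiplies, which is why lowering --rsvd_n_iter speeds up compression:

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

W = np.random.randn(2048, 2048).astype(np.float32)
rank = 256

for n_iter in (0, 1, 2, 4):
    U, S, Vt = randomized_svd(W, n_components=rank, n_oversamples=10,
                              n_iter=n_iter, random_state=0)
    # Relative Frobenius error of the rank-256 reconstruction.
    err = np.linalg.norm(W - (U * S) @ Vt) / np.linalg.norm(W)
    print(f"n_iter={n_iter}: relative error {err:.4f}")
```

Real weight matrices typically have faster-decaying spectra than this random example, so a small number of power iterations is usually enough to get close to the exact truncated SVD.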
For compression ratios ≤ 0.3, use truncation-aware data whitening:
python SVDLLM.py \
--step 1 \
--ratio 0.2 \
--model jeffwan/llama-7b-hf \
--whitening_nsamples 256 \
--dataset wikitext2 \
--seed 0 \
--model_seq_len 2048 \
--save_path ./compressed_models \
--rsvd_oversamples 10 \
--rsvd_n_iter 2

Faster compression: Reduce --rsvd_n_iter to 1 for ~25% faster compression with minimal quality loss.
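For context, SVD-based compression stores each weight matrix as two low-rank factors, so a single Linear layer becomes a pair of smaller ones. The sketch below shows the general idea with a plain truncated SVD; it omits the data-whitening step and does not mirror the repository's module names or structure.

```python
import torch
import torch.nn as nn

def factorize_linear(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a Linear layer with two smaller ones via truncated SVD (illustrative)."""
    W = linear.weight.data                       # (out_features, in_features)
    U, S, Vt = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vt_r = U[:, :rank], S[:rank], Vt[:rank, :]
    # Split sqrt(S) between the two factors so neither layer dominates numerically.
    A = Vt_r * S_r.sqrt().unsqueeze(1)           # (rank, in_features)
    B = U_r * S_r.sqrt().unsqueeze(0)            # (out_features, rank)
    down = nn.Linear(linear.in_features, rank, bias=False)
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    down.weight.data.copy_(A)
    up.weight.data.copy_(B)
    if linear.bias is not None:
        up.bias.data.copy_(linear.bias.data)
    return nn.Sequential(down, up)

layer = nn.Linear(4096, 4096)
compressed = factorize_linear(layer, rank=1024)
```

Splitting sqrt(S) across both factors is one common convention; the key point is that for a d x d weight the two layers together hold 2 * rank * d parameters instead of d^2, which is where the compression ratio comes from.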
We fine-tune the compressed weight matrices with LoRA, updating U first and then V.
python LoRA.py \
--prune_model COMPRESSED_MODEL_PATH \
--data_path yahma/alpaca-cleaned \
--output_dir LORA_OUTPUT_PATH \
--lora_r 8 \
--num_epochs 2 \
--learning_rate 1e-4 \
--batch_size 64
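For reference, the command above corresponds roughly to a standard PEFT LoRA setup. The snippet below is a hedged sketch of that general recipe, not the exact configuration in LoRA.py: the alpha/dropout values and target_modules are assumptions, and a small stock model stands in for the compressed checkpoint.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Stand-in model for illustration; in practice this would be the compressed checkpoint.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = LoraConfig(
    r=8,                                   # matches --lora_r 8 above
    lora_alpha=16,                         # assumption: not read from LoRA.py
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "v_proj"],   # assumption: typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```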
SVD-LLM can also be integrated with quantization methods to achieve better compression. Here is an example of combining SVD-LLM (20% compression ratio) with 4-bit GPTQ to compress LLaMA-7B:
bash svdllm_gptq.sh
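Conceptually, the combined pipeline quantizes the low-rank factors produced by SVD compression so the savings compound. The snippet below illustrates that with a naive round-to-nearest 4-bit quantizer applied to random stand-in factors; it is not GPTQ and not the code in gptq/.

```python
import torch

def quantize_4bit_rtn(W: torch.Tensor):
    """Naive per-row round-to-nearest 4-bit quantization (not GPTQ), for illustration."""
    scale = W.abs().amax(dim=1, keepdim=True) / 7.0      # map each row into the int4 range
    q = torch.clamp(torch.round(W / scale), -8, 7)
    return q, scale

# After SVD compression each weight is stored as two low-rank factors (random stand-ins here);
# quantization is then applied to those factors to compound the savings.
B = torch.randn(4096, 1024)   # up-projection factor
A = torch.randn(1024, 4096)   # down-projection factor
for name, F in (("B", B), ("A", A)):
    q, scale = quantize_4bit_rtn(F)
    err = (F - q * scale).norm() / F.norm()
    print(f"{name}: relative quantization error {err:.4f}")
```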
Perplexity Evaluation:
python SVDLLM.py \
--step 4 \
--model_path ./compressed_models/your_model.pt

To evaluate on c4, download the dataset and place the JSON files in utils/.
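As a reminder of what is being reported, perplexity is the exponential of the average token-level cross-entropy. The snippet below is a minimal illustration of that calculation on a single sentence with a small stock model; it is not the evaluation code in evaluater.py.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stock model as a stand-in; the repository evaluates the compressed checkpoint instead.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").eval()
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

text = "Large language models can be compressed with low-rank factorization."
ids = tokenizer(text, return_tensors="pt").input_ids
with torch.no_grad():
    # With labels provided, the model returns the mean cross-entropy over predicted tokens.
    loss = model(ids, labels=ids).loss
print(f"perplexity: {math.exp(loss.item()):.2f}")
```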
Efficiency Evaluation:
python SVDLLM.py \
--step 5 \
--model_path ./compressed_models/your_model.pt

Generate comparison graphs from benchmark results:
cd graphs
python perplexity_graph.py
python efficiency_graph.py
python compression_speedup_graph.py
python parameters_retained_graph.py

RSVD-LLM/
├── SVDLLM.py # Main compression script
├── compress_svd_vs_rsvd.sh # Benchmark RSVD vs SVD compression
├── evaluate_svd_vs_rsvd.sh # Benchmark evaluation script
├── evaluater.py # Evaluation utilities
├── benchmark_results_comparison/ # Benchmark results (ratios 0.1-0.5)
├── graphs/ # Visualization scripts
│ ├── perplexity_graph.py
│ ├── efficiency_graph.py
│ ├── compression_speedup_graph.py
│ └── parameters_retained_graph.py
├── component/ # Modified model components
├── utils/ # Utilities and RSVD implementation
│ └── rsvd.py # Randomized SVD implementation
└── gptq/ # GPTQ integration
If you use this work, please cite the original SVD-LLM papers:
@inproceedings{wang2025svdllm,
title={{SVD}-{LLM}: Truncation-aware Singular Value Decomposition for Large Language Model Compression},
author={Xin Wang and Yu Zheng and Zhongwei Wan and Mi Zhang},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025},
url={https://openreview.net/forum?id=LNYIUouhdt}
}

This repository is based on SVD-LLM by the AIoT-MLSys Lab at The Ohio State University.