CyperPan/delta-k-quantization

# Delta-K: Group-Size-Robust KV Cache Quantization via Closed-Loop Differential Encoding

Delta-K replaces per-channel Key quantization with closed-loop differential encoding, exploiting the DC-dominated spectrum of Key activations to enable per-token 2-bit quantization without grouped channel statistics.

## Key Results

### Core Finding: Group-Size Robustness

Delta-K's PPL is essentially invariant to group size, while KIVI's degrades sharply as G grows; the two methods cross over at G=64.

**Table 2:** Group-size sweep (Qwen2.5-7B, WikiText-2, 2-bit K+V, online eval, steady-state PPL)

| G | Delta-K PPL | Delta-K ΔPPL | Delta-K K bpv | KIVI PPL | KIVI ΔPPL | KIVI K bpv | Winner |
|---:|---:|---:|---:|---:|---:|---:|---|
| 32 | 6.730 | +0.677 | 2.56 | 6.548 | +0.494 | 3.00 | KIVI |
| 64 | 6.727 | +0.674 | 2.28 | 6.728 | +0.675 | 2.50 | Tie |
| 128 | 6.700 | +0.647 | 2.14 | 6.870 | +0.817 | 2.25 | Delta-K |
| 256 | 6.598 | +0.545 | 2.07 | 7.068 | +1.015 | 2.13 | Delta-K |

For comparison, the literature reports KIVI 2-bit ΔPPL = +0.89 on Qwen2.5-7B (AQUA-KV, ICML 2025, Table 1).

### DPCM Eliminates Error Accumulation

**Table 3:** Ablation study (Qwen2.5-14B, K-only, G=32, attention-output CosSim)

| Step | Configuration | CosSim | Δ |
|---|---|---:|---|
| A | Open-loop, 3-level symmetric | 0.419 | baseline |
| B | + Closed-loop (DPCM) | 0.884 | +0.464 (82%) |
| C | + 4-level symmetric | 0.976 | +0.092 |
| D | + Non-uniform codebook | 0.985 | +0.013 |
| - | KIVI per-channel 2-bit | 0.764 | - |

### PPL vs. Compression Rate (G=32)

See `plots/fig2_ppl_vs_bpv.png`.

### Downstream Tasks

**Table 4:** LongBench (Qwen2.5-7B-Instruct, G=128, 2-bit K+V, TREC excluded)

| Method | TriviaQA (F1) | SAMSum (ROUGE-L) | Avg. Δ |
|---|---:|---:|---:|
| FP16 | 79.87 | 34.68 | - |
| Delta-K G=128 | 80.30 (+0.43) | 34.74 (+0.06) | +0.25 |
| KIVI g=128 | 75.84 (-4.03) | 33.10 (-1.58) | -2.81 |

**Table 5:** GSM8K, a known limitation (Qwen2.5-7B, 2-bit K+V, 300 samples)

| Method | Accuracy | ΔAcc | KV bpv |
|---|---:|---:|---:|
| FP16 | 82.0% | - | 16.0 |
| KIVI g=32 | 77.0% | -5.0% | 2.63 |
| KIVI g=128 | 74.0% | -8.0% | 2.25 |
| Delta-K G=32 | 55.7% | -26.3% | 2.40 |
| Delta-K G=128 | 57.3% | -24.7% | 2.40 |
| Delta-K G=256 | 58.3% | -23.7% | 2.40 |

DPCM's temporally correlated errors harm chain-of-thought reasoning. Delta-K is best suited for long-context understanding rather than precise multi-step tasks.

## Experiment Journey

See `plots/fig4_experiment_journey.png`.

## Method

### Delta-K DPCM Encoding

For each group of G tokens in the K cache:

1. **Anchor**: the first token is stored in FP16 (lossless).
2. **DPCM residuals**: `r[t] = K[t] - K_hat[t-1]`, computed closed-loop, i.e. against the reconstruction rather than the original previous token.
3. **Quantization**: residuals are quantized with a learned 4-level non-uniform codebook.
4. **Reconstruction**: `K_hat[t] = K_hat[t-1] + Q(r[t])`.

Reconstruction error at each step equals single-step quantization noise — it does not accumulate.
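The steps above can be sketched in a few lines of NumPy. This is an illustrative mock-up, not the repository's implementation: the 4-level codebook values and the per-token absmax scale are placeholder choices, not the learned codebook from the paper.

```python
import numpy as np

# Placeholder 4-level non-uniform codebook (illustrative values,
# NOT the learned codebook from the paper).
CODEBOOK = np.array([-1.0, -0.25, 0.25, 1.0])

def quantize_residual(r, scale):
    """Snap each residual element to the nearest scaled codebook level."""
    levels = CODEBOOK * scale                    # shape (4,)
    idx = np.abs(r[:, None] - levels).argmin(axis=1)
    return levels[idx]

def dpcm_encode_decode(K):
    """Closed-loop DPCM over one group of tokens.

    K: array of shape (G, d) -- Key activations for one group.
    Returns the dequantized reconstruction K_hat.
    """
    K_hat = np.empty_like(K, dtype=np.float64)
    K_hat[0] = K[0]                              # anchor token kept lossless
    for t in range(1, len(K)):
        # Residual against the *reconstruction*, not the original (closed loop).
        r = K[t] - K_hat[t - 1]
        scale = np.abs(r).max() + 1e-12          # per-token scale (illustrative choice)
        K_hat[t] = K_hat[t - 1] + quantize_residual(r, scale)
    return K_hat
```

Because each residual is taken against the reconstructed `K_hat[t-1]` rather than the true `K[t-1]`, the quantizer sees its own past error and cancels it at the next step; an open-loop variant (`r[t] = K[t] - K[t-1]`) would let errors compound across the group.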

### Why Delta-K Is Robust to Group Size

- **KIVI (per-channel grouped)**: larger groups → more outliers per group → the shared scale is forced larger → precision loss on the remaining values.
- **Delta-K DPCM**: group size only affects anchor overhead; the DPCM residuals themselves are independent of G.
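A toy experiment makes the first point concrete: with sparse outliers (as in Key activations), the shared absmax scale of grouped quantization grows with group size, coarsening the 2-bit step for every value in the group. This is a schematic illustration, not KIVI's actual quantizer.

```python
import numpy as np

def mean_group_scale(x, group_size):
    """Mean absmax scale when one quantization scale is shared per group."""
    groups = x.reshape(-1, group_size)
    return np.abs(groups).max(axis=1).mean()

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
x[::128] *= 10.0          # sparse outliers, mimicking outlier Key channels

for G in (32, 64, 128, 256):
    print(f"G={G:3d}  mean scale={mean_group_scale(x, G):.2f}")
```

At 2 bits the quantization step is proportional to this scale, so every typical value in a large group pays for the group's worst outlier, whereas Delta-K's residual quantization does not pool statistics over more elements as G grows.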

### Storage Efficiency

```
bpv_DK = 2 + 14/G + 16(G-1)/(G·d)
```

At G=128, d=128: bpv = 2.14 (vs KIVI's 2.25 with asymmetric metadata).

Repository Structure

├── README.md
├── scripts/                         # All experiment code
│   ├── analyze.py                   # KV cache frequency analysis
│   ├── kv_cache_freq.py             # KV cache spectral analysis
│   ├── kv_sensitivity.py            # K vs V sensitivity comparison
│   ├── neuron_pruning_exp.py        # FFN neuron pruning (negative result)
│   ├── delta_k_validation.py        # Delta-K v1 open-loop (1.5B)
│   ├── delta_k_v2_revised.py        # Delta-K v2 DPCM smoke test (14B)
│   ├── delta_k_v3_ppl.py            # PPL evaluation K-only (14B)
│   ├── delta_k_v4_7b.py             # K+V prefix PPL (7B)
│   ├── delta_k_v4_2_online.py       # Online full-sequence PPL (7B)
│   ├── delta_k_v4_2_g128.py         # KIVI g128/g256 supplement
│   ├── delta_k_v4_2_gsweep.py       # Group size sweep
│   ├── Delta_k_longbench.py         # LongBench eval (includes Triton kernel)
│   ├── delta_k_longbench_final.py   # LongBench final version
│   ├── delta_k_gsm8k.py             # GSM8K evaluation
│   ├── delta_k_gsm8k_300.py         # GSM8K 300-sample version
│   └── benchmark_latency.py         # Quantization latency benchmark
├── experiments/                     # Results by experiment
│   ├── 01_depth_axis_freq/
│   ├── 02_neuron_pruning/
│   ├── 03_kv_cache_freq/
│   ├── 04_kv_sensitivity/
│   ├── 05_delta_k_v1_1.5B/
│   ├── 06_delta_k_v2_14B/
│   ├── 07_delta_k_v3_ppl_14B/
│   ├── 08_delta_k_v4_kv_7B/
│   ├── 09_delta_k_v4_2_online/
│   ├── 10_gsweep/
│   └── 11_gsm8k/
└── plots/                           # Figures and tables
    ├── fig1_group_size_sweep.png
    ├── fig2_ppl_vs_bpv.png
    ├── fig3_error_accumulation.png
    ├── fig4_experiment_journey.png
    └── tables/
        ├── table1_ppl_main.png
        ├── table2_longbench.png
        ├── table3_gsm8k.png
        └── table4_ablation.png

## Hardware

All experiments were run on the NEU Explorer HPC cluster:

- Qwen2.5-1.5B: V100 PCIe 32GB
- Qwen2.5-7B/14B: A100 80GB
- Framework: PyTorch 2.9.1, Transformers 4.57.6

## Citation

```bibtex
@article{pan2025deltak,
  title={Delta-K: Group-Size-Robust KV Cache Quantization via Closed-Loop Differential Encoding},
  author={Pan, Zhiyuan},
  year={2025}
}
```
