# Fast KV Compaction via Attention Matching

Code for **Fast KV Compaction via Attention Matching**. Attention Matching (AM) compacts a KV cache in latent space by constructing a smaller set of keys and values that reproduce the original attention behavior.
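For intuition, here is a minimal sketch of that idea, not this repository's implementation: optimize a small set of key/value pairs so that their attention output matches the full cache's output on a set of probe queries. The probe-query source, initialization, optimizer, and all shapes below are illustrative assumptions.

```python
# Illustrative sketch of attention matching (NOT the repo's algorithm):
# learn m compacted key/value pairs whose attention output matches the
# original cache's output on a set of probe queries.
import torch

def compact_kv(K, V, queries, m, steps=500, lr=1e-2):
    """K, V: (n, d) original cache; queries: (q, d) probes.

    Returns an (m, d) compacted key matrix and value matrix.
    """
    d = K.shape[1]
    scale = d ** -0.5
    # Target behavior: attention output of the full cache on each probe query.
    with torch.no_grad():
        target = torch.softmax(queries @ K.T * scale, dim=-1) @ V
    # Initialize the compact cache from a random subset of the original entries.
    idx = torch.randperm(K.shape[0])[:m]
    K_c = K[idx].clone().requires_grad_(True)
    V_c = V[idx].clone().requires_grad_(True)
    opt = torch.optim.Adam([K_c, V_c], lr=lr)
    for _ in range(steps):
        out = torch.softmax(queries @ K_c.T * scale, dim=-1) @ V_c
        loss = (out - target).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return K_c.detach(), V_c.detach()
```

The repository's actual methods live under `compaction/` (see the table below).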
## Repository layout

| Path | Purpose |
|---|---|
| `compaction/` | Core methods (`compaction_methods/`, `algorithms/`), chunking strategies, and query-generation helpers. |
| `evaluation/` | `run_qa_evaluation.py`, `run_reasoning_evaluation.py`, configs, and scoring utilities. |
| `scripts/` | Shell entry points for the main experiments, plus plotting and aggregation helpers. |
| `head_budget_optimization/` | Tools for computing non-uniform head budgets for models. |
| `models/` | Model-specific generation and caching utilities (Qwen3, Llama, Gemma3). |
| `examples/` | Demo scripts. |
| `data/` | Datasets and cached reference-generation artifacts. |
## Quickstart

```bash
python -m examples.qa_demo --model Qwen/Qwen3-4B --target-size 0.1
```

This prefills a short article, compacts its KV cache to 10% of its original size with Attention Matching, and compares QA accuracy before and after. See `examples/qa_demo.py` for details.
## Evaluation

Run the evaluator with one or more compaction methods and datasets:

```bash
python -m evaluation.run_qa_evaluation \
    --algorithm-config default \
    --methods original AM-HighestAttnKeys \
    --dataset-name quality \
    --n-articles 1 \
    --compute-stats 1
```

For non-uniform budgets, point `--precomputed-budget-path` to one of the JSON files under `head_budget_optimization/head_budgets/`.
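To run several such configurations in one go, a small driver script can shell out to the evaluator. A minimal sketch, using only the flags shown above; any dataset names beyond `quality` are placeholders that must match what `evaluation/` actually supports.

```python
# Sketch of a sweep over compaction methods and datasets via the CLI above.
# Only flags demonstrated in this README are used; additional dataset names
# are assumptions and must match what the evaluator actually supports.
import subprocess

METHODS = ["original", "AM-HighestAttnKeys"]
DATASETS = ["quality"]  # extend with other supported dataset names

for dataset in DATASETS:
    cmd = [
        "python", "-m", "evaluation.run_qa_evaluation",
        "--algorithm-config", "default",
        "--methods", *METHODS,
        "--dataset-name", dataset,
        "--n-articles", "1",
        "--compute-stats", "1",
    ]
    subprocess.run(cmd, check=True)
```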
## Contributing

We may continue developing this into a more polished, installable package and are open to contributions. If you're interested in collaborating, feel free to open an issue or PR. See TODO for current plans.
## Citation

If you found this work useful, please cite:

```bibtex
@misc{zweiger2026fastkvcompactionattention,
      title={Fast {KV} Compaction via {Attention Matching}},
      author={Adam Zweiger and Xinghong Fu and Han Guo and Yoon Kim},
      year={2026},
      eprint={2602.16284},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.16284},
}
```
