DCR-attention: Top-K Sparse Attention for Long-Context Decode on Llama-3.2-1B (v3 Release)
DCR-attention v3.0.0 — Initial public release
Top-K sparse attention for long-context decode on Llama-3.2-1B. Multi-seed
validated, pre-registered DESCRIPTIVE causal verdict, honest sub-parity
systems characterization.
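For readers new to the technique, the following is a minimal pure-Python sketch of top-K sparse attention for a single decode step. It illustrates the general idea only and is not the M4 Triton kernel shipped in dcr_attention/. The function name `topk_sparse_attention` and the toy inputs are hypothetical, and `k` is passed in directly because these notes do not state how it is derived from c.

```python
import math

def topk_sparse_attention(q, keys, values, k):
    """Attend from one query to only the k highest-scoring keys (illustrative)."""
    scale = 1.0 / math.sqrt(len(q))
    # Scaled dot-product score of the query against every cached key.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    # Keep the indices of the k largest scores; all other keys are dropped.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the selected scores only.
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    z = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, vj in enumerate(values[i]):
            out[j] += (w / z) * vj
    return out

# Toy inputs (hypothetical, not taken from the release data).
q = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
sparse_out = topk_sparse_attention(q, keys, values, k=1)
dense_out = topk_sparse_attention(q, keys, values, k=len(keys))
```

With `k` equal to the number of keys the function reduces to ordinary dense softmax attention, which makes it a convenient reference when checking a sparse implementation.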
Headline result
Multi-seed hero deployment point: context length N = 32,000, c = 0.15,
ΔPPL = +0.428% ± 0.096 pp (5 seeds; STRICT classification). Speedup over
SDPA on an RTX 4060 Ti at the hero point: 0.895× (sub-parity). The
resulting label is HeroQualityOnly: quality is multi-seed validated,
speedup is partial.
What's included
- paper/ — v3 paper (PDF + LaTeX source, 31 pages, compiled clean)
- dcr_attention/ — M4 Triton kernel + reference + Llama integration
- tests/ — acceptance tests, integration tests, 6 causal-pilot iteration scripts
- scripts/ — analysis scripts (Wilcoxon, Mann-Whitney, dose-response), figure generators
- data/raw/ — curated per-seed measurement JSONs cited by paper
Key claims
- Empirical scaling law ΔPPL(N, c) = A(c) · N^(-(1-α_eff(c))), coverage-dependent
- Pre-registered matched-magnitude causal test: DESCRIPTIVE (Wilcoxon p=0.76,
Mann-Whitney U bias check p=0.27, generalizable within tested protocol)
- Random-spectrum baseline locates α<1 as trained property (α_random=1.0000
vs α_trained=0.39 on 5 untrained-K seeds)
- HeroQualityOnly mechanistically explained: Pass-3 dominance at large N×B
- ABKV architectural response demonstrated as feasible on synthetic data;
end-to-end speedup is explicit future work
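The scaling law in the first claim is linear in log-log space: log ΔPPL = log A(c) - (1 - α_eff(c)) · log N, so at a fixed c the exponent can be read off the slope of a straight-line fit. The sketch below shows that step on synthetic, noise-free points. The helper `fit_power_law` and the parameter values are illustrative assumptions, not the analysis scripts or measurements from this release, and the fit requires ΔPPL > 0 at every point.

```python
import math

def fit_power_law(ns, dppl):
    """Least-squares line through (log N, log dPPL); returns (A, alpha_eff)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(d) for d in dppl]
    count = len(xs)
    mx = sum(xs) / count
    my = sum(ys) / count
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # slope = -(1 - alpha_eff), so alpha_eff = 1 + slope; A = exp(intercept).
    return math.exp(intercept), 1.0 + slope

# Synthetic points generated from chosen parameters (hypothetical values).
A_true, alpha_true = 50.0, 0.4
ns = [4000, 8000, 16000, 32000]
dppl = [A_true * n ** (-(1.0 - alpha_true)) for n in ns]
A_fit, alpha_fit = fit_power_law(ns, dppl)
```

Repeating the fit once per coverage value c would give the coverage-dependent A(c) and α_eff(c). With real measurements the fitted slope carries seed-to-seed uncertainty that this noise-free sketch does not model.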
Release context
The v1.0 and v2.0 Zenodo DOIs were deleted as a sober reset prior to v3. The
single-seed v2.0 numerical headline was reproduced exactly on seed 0 in
the multi-seed re-validation, but it sits at the high end of the seed
distribution: the 5-seed mean is ~30× lower, consistent with a
tier-boundary measurement that benefits from a multi-seed protocol.
License
Apache 2.0
Citation
See CITATION info on Zenodo (DOI badge will appear in README after this
release).