Releases: Seqev/dcr-attention_v3
v3.1-data — Scope memo + canonical measurement artifacts
Pre-release data drop for DCR-Attention v3.1, containing the scope memo and
the canonical measurement artifacts. The full manuscript will follow in a
separate release.
Hero result (N=32K, B=4, c=0.15):
- M6 + M5-mixed: 187.29 ms — 1.061× over SDPA, 1.220× over M4 baseline
- Parity crossing: M4 was 0.870× (sub-parity); v3.1 now above SDPA
- Clean theoretical ceiling: 1.243×
- Canonical protocol: 50 warmup, 30 timed, 3 randomized sessions (hero variance 0.098%)
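
The canonical protocol above (50 warmup, 30 timed, 3 randomized sessions) can be sketched as a small harness. This is a minimal illustration, not the repository's benchmark script. The names `run_session`, `benchmark`, and `relative_spread_pct` are hypothetical. It assumes "randomized" means the variant order is shuffled per session, and that the quoted variance is a relative spread across session medians; the notes do not confirm either.

```python
import random
import statistics
import time


def run_session(fn, warmup=50, timed=30):
    """Call `fn` `warmup` times untimed, then return `timed` latencies in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(timed):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return samples


def benchmark(variants, sessions=3, warmup=50, timed=30, seed=0):
    """Time every variant once per session, shuffling the order each session.

    `variants` maps a label to a zero-argument callable.
    Returns {label: [median latency in ms, one entry per session]}.
    """
    rng = random.Random(seed)
    medians = {name: [] for name in variants}
    for _ in range(sessions):
        order = list(variants)
        rng.shuffle(order)  # assumption: randomization = per-session ordering
        for name in order:
            samples = run_session(variants[name], warmup, timed)
            medians[name].append(statistics.median(samples))
    return medians


def relative_spread_pct(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)
```

For GPU kernels each callable would also need to wait for the device before returning (for example by calling `torch.cuda.synchronize()`), otherwise the timer only measures the kernel launch.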
Contents:
- docs/paper_rewrite_scope_memo.md — paper scope, incl. retraction ledger (§5)
- results/ — canonical measurements + falsification artifacts
- 8 characterized negative results (see README)
Note: this drop documents the process, not a finished paper. Two intermediate
findings (Pass-2 inflation, Pass-3 deflation) were retracted before publication
after canonical re-measurement; both are kept on record in §5 of the scope memo
as a discipline ledger.
Env: Llama-3.2-1B · RTX 4060 Ti · torch 2.5.1+cu121 · triton 3.1.0 · seed 0
DCR-attention: Top-K Sparse Attention for Long-Context Decode on Llama-3.2-1B (v3 Release)
DCR-attention v3.0.0 — Initial public release
Top-K sparse attention for long-context decode on Llama-3.2-1B. Multi-seed
validated, pre-registered DESCRIPTIVE causal verdict, honest sub-parity
systems characterization.
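
For readers new to the technique, the reference semantics of top-K sparse attention for a single decode step can be sketched as follows. This is a plain-Python illustration, not the M4 Triton kernel shipped in `dcr_attention/`. The function name `topk_sparse_decode` is hypothetical. It assumes `c` is the fraction of keys retained (so k = ceil(c · N)), which these notes do not define, and it selects keys by exact score, which a real kernel may approximate.

```python
import heapq
import math


def topk_sparse_decode(q, keys, values, c):
    """Attention output for one query, using only the top-k scoring keys.

    q: list of d floats; keys, values: lists of N vectors.
    c: fraction of keys to keep (assumed meaning), so k = ceil(c * N).
    """
    n, d = len(keys), len(q)
    k = max(1, math.ceil(c * n))
    scale = 1.0 / math.sqrt(d)
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    # Indices of the k largest scores; every other key is dropped.
    kept = heapq.nlargest(k, range(n), key=scores.__getitem__)
    # Numerically stable softmax restricted to the kept keys.
    top = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - top) for i in kept]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, kept):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out
```

With `c = 1.0` this reduces to ordinary dense softmax attention, which makes it a convenient correctness reference for a sparse kernel.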
Headline result
Multi-seed hero deployment point: N = 32,000 context, c = 0.15,
ΔPPL = +0.428% ± 0.096 pp (5 seeds; STRICT classification). Latency on
RTX 4060 Ti at hero point: 0.895× SDPA — HeroQualityOnly (quality
multi-seed validated, speedup partial).
What's included
- paper/ — v3 paper (PDF + LaTeX source, 31 pages, compiled clean)
- dcr_attention/ — M4 Triton kernel + reference + Llama integration
- tests/ — acceptance tests, integration tests, 6 causal-pilot iteration scripts
- scripts/ — analysis scripts (Wilcoxon, Mann-Whitney, dose-response), figure generators
- data/raw/ — curated per-seed measurement JSONs cited by paper
Key claims
- Empirical scaling law ΔPPL(N, c) = A(c) · N^(-(1-α_eff(c))), coverage-dependent
- Pre-registered matched-magnitude causal test: DESCRIPTIVE (Wilcoxon p=0.76,
Mann-Whitney U bias check p=0.27, generalizable within tested protocol)
- Random-spectrum baseline locates α<1 as trained property (α_random=1.0000
vs α_trained=0.39 on 5 untrained-K seeds)
- HeroQualityOnly mechanistically explained: Pass-3 dominance at large N×B
- ABKV architectural response demonstrated as synthetic-data feasible;
end-to-end speedup is explicit future work
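
The scaling law in the first claim is linear in log-log space: log ΔPPL = log A(c) − (1 − α_eff(c)) · log N. At a fixed coverage c, A and α_eff can therefore be estimated with an ordinary least-squares line fit. The sketch below is illustrative only and is not the fitting code from `scripts/`. The function name `fit_scaling_law`, the context-length grid, and the value of `A_true` are placeholders; the data points are generated from the law itself rather than taken from the release.

```python
import math


def fit_scaling_law(ns, dppl):
    """Least-squares fit of log(dppl) = log(A) - (1 - alpha) * log(N).

    Returns (A, alpha). Needs positive dppl values and at least 2 distinct N.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(y) for y in dppl]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # slope = -(1 - alpha)
    intercept = y_bar - slope * x_bar  # intercept = log(A)
    return math.exp(intercept), 1.0 + slope


# Synthetic check: generate points from the law and recover the inputs.
# A_true and the N grid are arbitrary placeholders; alpha_true reuses the
# trained-model value 0.39 quoted in the claims above.
A_true, alpha_true = 50.0, 0.39
ns = [4_000, 8_000, 16_000, 32_000]
dppl = [A_true * n ** (-(1.0 - alpha_true)) for n in ns]
A_fit, alpha_fit = fit_scaling_law(ns, dppl)
print(f"A = {A_fit:.3f}, alpha_eff = {alpha_fit:.3f}")
```

Under this parameterization α = 1 corresponds to a slope of zero (ΔPPL independent of N), which is the random-spectrum baseline case, while α < 1 means ΔPPL shrinks as the context grows.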
Release context
The v1.0 and v2.0 Zenodo DOIs were deleted as a deliberate reset before v3.
The single-seed v2.0 headline number was reproduced exactly on seed 0 in the
multi-seed re-validation, but it sits at the high end of the seed
distribution: the 5-seed mean is ~30× lower. This is consistent with a
tier-boundary measurement, the kind of result that benefits from a
multi-seed protocol.
License
Apache 2.0
Citation
See CITATION info on Zenodo (DOI badge will appear in README after this
release).