Pre-release data drop for DCR-Attention v3.1. Scope memo + canonical
measurement artifacts. Full manuscript is a separate forthcoming release.

Hero result (N=32K, B=4, c=0.15):

M6 + M5-mixed: 187.29 ms — 1.061× over SDPA, 1.220× over M4 baseline
Parity crossing: M4 was 0.870× (sub-parity); v3.1 now above SDPA
Clean theoretical ceiling: 1.243×
Canonical protocol: 50 warmup, 30 timed, 3 randomized sessions (hero variance 0.098%)
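
A minimal sketch of the canonical protocol above (50 warmup calls, 30 timed calls, 3 randomized sessions). It assumes that "randomized" means the kernel order is shuffled within each session and that each session reports a median; neither detail is stated in this release. The names `time_ms`, `run_sessions`, and `speedup` are hypothetical and are not part of the dcr_attention package. For GPU kernels the callable would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before returning, so that the timestamps bracket the actual kernel execution.

```python
import random
import statistics
import time
from typing import Callable, Dict, List


def time_ms(fn: Callable[[], None], warmup: int = 50, timed: int = 30) -> float:
    """Median wall-clock latency of fn in milliseconds, after warmup calls."""
    for _ in range(warmup):
        fn()  # warmup calls are executed but not recorded
    samples = []
    for _ in range(timed):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)  # assumption: median per session


def run_sessions(
    kernels: Dict[str, Callable[[], None]],
    sessions: int = 3,
    warmup: int = 50,
    timed: int = 30,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Time every kernel once per session, shuffling the order each session."""
    rng = random.Random(seed)
    results: Dict[str, List[float]] = {name: [] for name in kernels}
    for _ in range(sessions):
        order = list(kernels)
        rng.shuffle(order)  # assumption: this is what "randomized" refers to
        for name in order:
            results[name].append(time_ms(kernels[name], warmup, timed))
    return results


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio > 1.0 means the candidate is faster than the baseline."""
    return baseline_ms / candidate_ms
```

Under this ratio convention the figures above are mutually consistent: with M4 at 0.870× SDPA and v3.1 at 1.061× SDPA, the gain over M4 is 1.061 / 0.870 ≈ 1.220.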

Contents:

docs/paper_rewrite_scope_memo.md — paper scope, incl. retraction ledger (§5)
results/ — canonical measurements + falsification artifacts
8 characterized negative results (see README)

Note: this drop documents process, not a finished paper. Two intermediate
findings (Pass-2 inflation, Pass-3 deflation) were retracted pre-publication
via canonical re-measurement and are kept on record in §5 of the scope memo
as a discipline ledger.

Env: Llama-3.2-1B · RTX 4060 Ti · torch 2.5.1+cu121 · triton 3.1.0 · seed 0

DCR-attention v3.0.0 — Initial public release

Top-K sparse attention for long-context decode on Llama-3.2-1B. The release
includes multi-seed quality validation, a pre-registered causal test with a
DESCRIPTIVE verdict, and a systems characterization that reports sub-parity
latency as measured.

Headline result

Multi-seed hero deployment point: N = 32,000 context, c = 0.15,
ΔPPL = +0.428% ± 0.096 pp (5 seeds; STRICT classification). Latency on
RTX 4060 Ti at the hero point is 0.895× SDPA, i.e. below parity, so the
point is classified HeroQualityOnly (quality multi-seed validated, speedup
partial).
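
A minimal sketch of how a multi-seed ΔPPL summary of this shape could be computed. It assumes ΔPPL is the relative perplexity change of the sparse model against the dense baseline, and that the ± term is the sample standard deviation across seeds; the release does not state either definition. The function names and the input values in the usage lines are placeholders, not measurements from this release.

```python
import statistics
from typing import Sequence, Tuple


def delta_ppl_percent(ppl_dense: float, ppl_sparse: float) -> float:
    """Relative perplexity change in percent (positive means sparse is worse)."""
    return 100.0 * (ppl_sparse - ppl_dense) / ppl_dense


def summarize(deltas: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (in percentage points) across seeds."""
    return statistics.mean(deltas), statistics.stdev(deltas)


# Placeholder per-seed (dense, sparse) perplexities, for illustration only.
pairs = [(10.00, 10.04), (10.10, 10.15), (9.95, 9.99), (10.02, 10.06), (10.05, 10.10)]
mean_pct, std_pp = summarize([delta_ppl_percent(d, s) for d, s in pairs])
print(f"dPPL = {mean_pct:+.3f}% +/- {std_pp:.3f} pp ({len(pairs)} seeds)")
```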

What's included

paper/ — v3 paper (PDF + LaTeX source, 31 pages, compiled clean)
dcr_attention/ — M4 Triton kernel + reference + Llama integration
tests/ — acceptance tests, integration tests, 6 causal-pilot iteration scripts
scripts/ — analysis scripts (Wilcoxon, Mann-Whitney, dose-response), figure generators
data/raw/ — curated per-seed measurement JSONs cited by paper

Key claims

Empirical scaling law ΔPPL(N, c) = A(c) · N^(-(1-α_eff(c))), coverage-dependent
Pre-registered matched-magnitude causal test: DESCRIPTIVE (Wilcoxon p=0.76,
Mann-Whitney U bias check p=0.27, generalizable within tested protocol)
Random-spectrum baseline locates α<1 as a trained property (α_random=1.0000
vs α_trained=0.39 on 5 untrained-K seeds)
HeroQualityOnly mechanistically explained: Pass-3 dominance at large N×B
ABKV architectural response demonstrated as feasible on synthetic data;
end-to-end speedup is explicit future work
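
A minimal sketch of the scaling law in the first claim, together with one way to estimate A(c) and α_eff(c) at a fixed coverage c. Taking logs gives log ΔPPL = log A − (1 − α_eff) · log N, so an ordinary least-squares line in log-log space has slope −(1 − α_eff). The function names are hypothetical, the fitting procedure is an assumption (the paper's own method is not described in this release), and the data in the usage lines is synthetic.

```python
import math
from typing import Sequence, Tuple


def delta_ppl_model(n: float, a: float, alpha_eff: float) -> float:
    """Evaluate dPPL(N, c) = A(c) * N^(-(1 - alpha_eff(c))) at one coverage c."""
    return a * n ** (-(1.0 - alpha_eff))


def fit_scaling_law(ns: Sequence[float], deltas: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit in log-log space; returns (A, alpha_eff).

    Requires all N and all dPPL values to be strictly positive.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(d) for d in deltas]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope = -(1 - alpha_eff)
    intercept = mean_y - slope * mean_x    # intercept = log A
    return math.exp(intercept), 1.0 + slope


# Synthetic check: generate points from the model, then recover the parameters.
ns = [1000, 2000, 4000, 8000, 16000, 32000]
deltas = [delta_ppl_model(n, 2.0, 0.39) for n in ns]
a_hat, alpha_hat = fit_scaling_law(ns, deltas)
```

Under this form, α_eff = 1 makes the exponent zero, so ΔPPL does not depend on N; any α_eff < 1 makes ΔPPL decay as N grows.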

Release context

v1.0 and v2.0 Zenodo DOIs were deleted as a sober reset prior to v3. The
single-seed v2.0 numerical headline was reproduced exactly on seed 0 in
the multi-seed re-validation, but it sits at the high end of the seed
distribution: the 5-seed mean places it ~30× lower, consistent with a
tier-boundary measurement that benefits from a multi-seed protocol.

License

Apache 2.0

Citation

See CITATION info on Zenodo (DOI badge will appear in README after this
release).

Releases: Seqev/dcr-attention_v3

v3.1-data — Scope memo + canonical measurement artifacts

DCR-attention: Top-K Sparse Attention for Long-Context Decode on Llama-3.2-1B (v3 Release)
