docs: codec invariants + experiment catalogue (session-end déjà-vu) by AdaWorldAPI · Pull Request #186 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-15T12:47:25Z

Summary

Session-end catalogue of every compression approach tried in PRs #176–#185 plus the lesson each produced. Written for future-session déjà-vu — every failed experiment is kept with a "mutation hook" explaining how it could evolve into something that works.

Single file: docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md (231 lines).

Why

The failures this session were informative:

#177 (perf(tts_rvq_e2e): hierarchical CLAM 256×256 for vocab tensors + docs + F32x16 rms_norm) → #178 (fix: passthrough BF16 for vocab tensors + Lance upgrade roadmap + WAV validity test): Hierarchical CLAM refuted (cos 0.0046)
#183 (feat(examples): universal_hhtld_encode — model-generic encoder with SlotL dispatch): Base17 centroid reconstruction refuted (cos 0.04 on Qwen3)
#184 (feat(bgz-tensor): HhtlF32Tensor codec + Path A encoder + Path B argmax-parity probe), Path A: f32 palette + SlotL — 10× better but still short (ρ̄ 0.2–0.5)
#184 (same PR), Path B: cascade attention probe — 3.71% argmax agreement, revealing that Path B depends on Path A's palette substrate

Each lesson is buried in a different PR comment. Without this doc, the next session would re-run the experiments. With it, the signposts are explicit.

Structure

6 core invariants — structural truths a future codec must respect (two regimes; near-orthogonal rows; direction vs amplitude; wire-format caps; u8-span-u16 only with right decoder; ticket-for-curve)
7 approaches tried — each with "mutation hooks" (what it could evolve into)
3 right abstractions — SpiralEncoding, per-role stride, HHTL cascade
4 open probes — unproven claims with specific next experiments
Déjà-vu table — "if you're tempted to X, read PR #Y first" (7 instincts)
5-question structural checklist before shipping any new codec
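
Invariant I4 (wire-format widths are hard caps, asserted at encode time) can be illustrated with a minimal sketch. The function `encode_indices_u8`, the `EncodeError` type, and the choice of a `u8` index field are hypothetical and not taken from the repository; the only point is that the encoder refuses an oversized palette instead of truncating silently.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch (not from the repository).
#[derive(Debug, PartialEq)]
enum EncodeError {
    PaletteTooLarge { len: usize, max: usize },
    IndexOutOfPalette { index: usize, len: usize },
}

/// Encode palette indices into a u8 wire field, refusing to truncate.
fn encode_indices_u8(indices: &[usize], palette_len: usize) -> Result<Vec<u8>, EncodeError> {
    // A u8 field can address at most 256 palette entries.
    let max = usize::from(u8::MAX) + 1;
    if palette_len > max {
        return Err(EncodeError::PaletteTooLarge { len: palette_len, max });
    }
    indices
        .iter()
        .map(|&i| {
            if i >= palette_len {
                return Err(EncodeError::IndexOutOfPalette { index: i, len: palette_len });
            }
            // i < palette_len <= 256 here, so the checked conversion succeeds.
            u8::try_from(i)
                .map_err(|_| EncodeError::IndexOutOfPalette { index: i, len: palette_len })
        })
        .collect()
}

fn main() {
    // Fits the u8 field.
    println!("{:?}", encode_indices_u8(&[0, 17, 255], 256));
    // Palette is wider than the wire field: rejected at encode time.
    println!("{:?}", encode_indices_u8(&[0, 300], 512));
}
```

Returning an error rather than panicking is a design choice of this sketch; an `assert!` at the same point would satisfy the invariant as well.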

Probe deferred

The obvious next experiment — P1: SpiralEncoding::rehydrate_interpolated on real Qwen3-TTS-0.6B weight rows — is fully specified in the doc but not run in this PR. Session pausing due to token budget per user request. Starting cold next session from this doc should lose zero context.

Session stack

PR	Status
`#176` – `#184`	merged
`#185`	open (HhtlF32Tensor palette bounds — codex P1 fix)
`#186`	this PR (invariants doc)

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Session-end artefact for future déjà-vu. Catalogues every compression approach tried in PRs #176-#185 and the lesson each one produced. No approach is thrown away — each failed experiment carries information about where the real boundary is.

## Structure

### Core invariants (6)

I1. Two regimes, opposite needs (argmax vs index)
I2. Near-orthogonality of weight rows in high dim
I3. Direction vs amplitude cannot be merged into one scalar
I4. Wire-format type widths are hard caps — assert at encode time
I5. 'u8 can span u16/u64 effective' requires the right decoder
I6. The ticket-for-curve model (SpiralAddress + shared curve)

### Approaches tried (7)

A1. HhtlDTensor — Base17 + Slot D + Slot V (correct for cascade, wrong for f32 GEMM)
A2. Progressive residual RVQ with k-ladder (works argmax, fails index)
A3. Hierarchical CLAM 256x256 (REFUTED — cos 0.0046 on vocab)
A4. Passthrough BF16 n_rows > 8192 (SHIPS for correctness, net loss for ratio)
A5. SlotL 8 x i8 on SVD basis (correct algorithm, misapplied to Base17 centroid)
A6. HhtlF32Tensor f32 palette + SlotL (right direction, 10x better, still short)
A7. cascade_attention_probe Base17 palette (3.71% argmax agreement — palette doesn't preserve inner products)

### Abstractions that ARE the right primitive (3)

R1. highheelbgz::rehydrate::SpiralEncoding (exists, untested on real Qwen3)
R2. Per-role stride in NeuronPrint (q/k=3, v=5, gate=8, up=2, down=4)
R3. HHTL cascade inference (hhtl_cache RouteAction)

### Open probes (4)

P1. SpiralEncoding on real Qwen3 weights — claim rho >= 0.95 unproven
P2. Shared anchors + i8 position per row — depends on P1
P3. Palette preserves inner-product neighbourhoods — A7 refuted for Base17
P4. Log-radial CLAM with magnitude split — hypothesised > linear CLAM

### Déjà-vu table

Lists 7 'if you're tempted to...' instincts with the PR that already refuted them. Exists so future sessions hit the lesson before writing the code.

### Structural checklist (5 questions)

Before shipping any new codec:

1. What regime does this tensor belong to? (I1)
2. Does the codec encode direction AND amplitude separately? (I3)
3. Is the palette substrate inner-product-preserving? (I2, A7)
4. Does the decoder evaluate the curve, or tile anchors? (I5)
5. Are wire-format widths asserted at encode time? (I4)

## Why this doc matters

Every failed approach in this session taught something the next session would otherwise re-learn the hard way. HCLAM (#177->#178) already has its lesson buried in a passthrough commit. The Base17 reconstruction failure (#183) is buried in a PR comment. The #184 Path A/B duality (they aren't independent) is only visible if you read the probe results.

This doc surfaces all of it as a single index, structured for mutation: each approach has 'mutation hooks' naming how it could evolve into something that works, rather than being discarded.

## Next step blocked by token budget

The SpiralEncoding-on-real-Qwen3 probe (P1) is the obvious next experiment and would have landed in this PR. Deferred to a fresh session with budget. The doc leaves the probe fully specified so re-entering cold loses no context.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
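
Checklist question 2 (direction and amplitude encoded separately, per I3) can be made concrete with a small sketch. It is only one plausible reading of the invariant: the row is split into its L2 norm (amplitude) and a unit vector (direction), so that each part can be quantised by its own codec. The function names are hypothetical and do not come from the repository.

```rust
/// Split a row into amplitude (its L2 norm) and direction (a unit vector).
/// A zero row yields amplitude 0.0 and an all-zero direction.
fn split_direction_amplitude(row: &[f32]) -> (f32, Vec<f32>) {
    let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return (0.0, vec![0.0; row.len()]);
    }
    (norm, row.iter().map(|x| x / norm).collect())
}

/// Recombine the two parts. A real codec would quantise each part
/// separately before this step; that is omitted here.
fn merge(amplitude: f32, direction: &[f32]) -> Vec<f32> {
    direction.iter().map(|d| d * amplitude).collect()
}

fn main() {
    let row = [3.0_f32, 4.0];
    let (amplitude, direction) = split_direction_amplitude(&row);
    println!("amplitude = {}, direction = {:?}", amplitude, direction);
    println!("rebuilt = {:?}", merge(amplitude, &direction));
}
```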

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99be7bf921


chatgpt-codex-connector · 2026-04-15T12:49:16Z

+
+Claim: `SpiralEncoding::rehydrate_interpolated` hits ρ ≥ 0.95 on real Qwen3-TTS-0.6B weight rows at reasonable K (say K=4–16).
+
+Probe: `spiral_reconstruction_probe.rs` (this PR).


Mark the P1 probe as missing or add the referenced file

This line says the P1 experiment probe is spiral_reconstruction_probe.rs in this PR, but this commit only adds documentation and the repository has no file by that name, so the documented next step is not runnable as written. Because this document is intended as the handoff for the next session, the missing artifact can cause immediate confusion and duplicate effort; either add the probe file or change this entry to NOT YET WRITTEN (consistent with P2–P4).


AdaWorldAPI · 2026-04-17T06:14:11Z

P1 probe run — first empirical measurement of `SpiralEncoding` on real Qwen3-TTS weights

spiral_reconstruction_probe.rs (just added) runs highheelbgz::rehydrate::SpiralEncoding against 256 stride-sampled rows of talker.model.layers.0.self_attn.k_proj.weight at K ∈ {4, 8, 16}, spiral stride=3 (matching NeuronPrint's k_proj role).
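
The comment does not say how the 256 rows were stride-sampled. A minimal sketch of one common way to do it, assuming evenly spaced row indices, follows. The function `stride_sample` and the 1024-row tensor height are placeholders, not values from the probe, and this row-sampling stride is unrelated to the spiral stride=3 used by the codec itself.

```rust
/// Pick up to `count` row indices spread across `0..n_rows` at a fixed stride.
fn stride_sample(n_rows: usize, count: usize) -> Vec<usize> {
    if n_rows == 0 || count == 0 {
        return Vec::new();
    }
    // Never let the stride drop to zero when the tensor has fewer rows than `count`.
    let stride = std::cmp::max(1, n_rows / count);
    (0..n_rows).step_by(stride).take(count).collect()
}

fn main() {
    // 1024 rows is a placeholder, not the real k_proj shape.
    let picked = stride_sample(1024, 256);
    println!(
        "{} rows, first {:?}, last {:?}",
        picked.len(),
        &picked[..3],
        picked.last()
    );
}
```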

Clarification discovered during probe design

SpiralEncoding is a signature codec (17 Base17 dims × K anchor samples per row), not a dense-row reconstructor. So the probe measures neighborhood preservation instead of per-element ρ:

G1: self-cosine = 1.0 identity check
G2: top-1 NN match vs raw-cosine argmax
G3: pairwise rank agreement on random pairs
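
The G2 and G3 measurements can be sketched as follows. This is an illustration under stated assumptions, not the code in `spiral_reconstruction_probe.rs`: signatures are compared by plain cosine, G3 is read as "for a query q and two candidates i and j, do both spaces agree on which candidate is closer", and the "signature" in the demo simply drops the last coordinate so that some neighborhood information is lost. All function names are hypothetical.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Index of the nearest neighbour of row `q` by cosine, excluding `q` itself.
fn top1(rows: &[Vec<f32>], q: usize) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for j in 0..rows.len() {
        if j == q {
            continue;
        }
        let c = cosine(&rows[q], &rows[j]);
        if best.map_or(true, |(_, b)| c > b) {
            best = Some((j, c));
        }
    }
    best.map(|(j, _)| j)
}

/// G2: fraction of rows whose signature-space NN equals the raw-space NN.
fn top1_match_rate(raw: &[Vec<f32>], sig: &[Vec<f32>]) -> f32 {
    let n = raw.len().min(sig.len());
    if n < 2 {
        return 0.0;
    }
    let hits = (0..n)
        .filter(|&q| top1(&raw[..n], q) == top1(&sig[..n], q))
        .count();
    hits as f32 / n as f32
}

/// G3 (one interpretation): fraction of (q, i, j) triples where both spaces
/// agree on whether i is closer to q than j.
fn pairwise_rank_agreement(
    raw: &[Vec<f32>],
    sig: &[Vec<f32>],
    triples: &[(usize, usize, usize)],
) -> f32 {
    if triples.is_empty() {
        return 0.0;
    }
    let agree = triples
        .iter()
        .filter(|&&(q, i, j)| {
            let r = cosine(&raw[q], &raw[i]) > cosine(&raw[q], &raw[j]);
            let s = cosine(&sig[q], &sig[i]) > cosine(&sig[q], &sig[j]);
            r == s
        })
        .count();
    agree as f32 / triples.len() as f32
}

/// Toy data: the "signature" keeps only the first two coordinates of each row.
fn demo_rows() -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
    let raw = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![1.0, 0.0, 2.0],
        vec![0.0, 1.0, 2.0],
    ];
    let sig = raw.iter().map(|r| r[..2].to_vec()).collect();
    (raw, sig)
}

fn main() {
    let (raw, sig) = demo_rows();
    let triples = [(0, 1, 2), (2, 0, 3), (1, 3, 0), (3, 1, 2)];
    println!("G1 self-cos  = {}", cosine(&sig[0], &sig[0]));
    println!("G2 top-1     = {}", top1_match_rate(&raw, &sig));
    println!("G3 rank-agree = {}", pairwise_rank_agreement(&raw, &sig, &triples));
}
```

The triples are passed in explicitly so the sketch stays deterministic; a probe would draw them at random.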

Results

K	Top-1 NN	Top-5 NN	Pairwise rank-agree	Bytes/row
4	18.4%	39.8%	0.663	142
8	31.6%	59.8%	0.747	278
16	44.9%	78.9%	0.803	550

Self-cos = 1.000000 at all K (identity holds).
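
The Bytes/row column is consistent with a simple size formula: 17 Base17 dims × K anchor samples × 2 bytes, plus 6 bytes of per-row overhead. Only the 17 × K shape comes from the comment above; the 2-byte sample width and the 6-byte overhead are inferred from fitting the three reported sizes and may not match the actual layout.

```rust
/// Base17 dimensions per anchor sample (stated in the probe comment).
const BASE17_DIMS: usize = 17;
/// Assumed width of one stored sample in bytes (inferred, not confirmed).
const BYTES_PER_SAMPLE: usize = 2;
/// Assumed fixed per-row overhead in bytes (fitted to the table, not confirmed).
const ROW_OVERHEAD_BYTES: usize = 6;

/// Signature size per row under the assumptions above.
fn bytes_per_row(k: usize) -> usize {
    BASE17_DIMS * k * BYTES_PER_SAMPLE + ROW_OVERHEAD_BYTES
}

fn main() {
    for k in [4usize, 8, 16].iter() {
        println!("K={:2} -> {} bytes/row", k, bytes_per_row(*k));
    }
}
```

Under this fit the cost grows by 34 bytes per extra anchor, which is why doubling K roughly doubles the row size.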

Status: P1 PARTIAL

~12× better top-1 than the Base17 palette at K=16 (44.9% vs 3.71% for the #184 cascade_attention_probe, at similar cost per row)
Monotonic with K — confirms signature is capturing real structure
Misses the 90% top-1 / 0.85 rank-agree gate even at K=16 (550 B/row, too expensive to keep growing K)
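
The gate check and the ~12× figure can be reproduced from the reported numbers. In this sketch the struct and function names are hypothetical, and the gate is assumed to require both thresholds at once (top-1 ≥ 0.90 and rank-agree ≥ 0.85); the comment does not state whether the two are combined with "and" or "or".

```rust
/// One row of the results table, with percentages stored as fractions.
struct ProbeResult {
    k: usize,
    top1: f32,
    top5: f32,
    rank_agree: f32,
}

const TOP1_GATE: f32 = 0.90;
const RANK_AGREE_GATE: f32 = 0.85;
/// Top-1 agreement of the Base17 palette probe reported for #184.
const BASE17_TOP1: f32 = 0.0371;

/// The three measurements reported in the results table.
fn reported_results() -> Vec<ProbeResult> {
    vec![
        ProbeResult { k: 4, top1: 0.184, top5: 0.398, rank_agree: 0.663 },
        ProbeResult { k: 8, top1: 0.316, top5: 0.598, rank_agree: 0.747 },
        ProbeResult { k: 16, top1: 0.449, top5: 0.789, rank_agree: 0.803 },
    ]
}

/// Assumption: a result passes only if it clears both thresholds.
fn passes_gate(r: &ProbeResult) -> bool {
    r.top1 >= TOP1_GATE && r.rank_agree >= RANK_AGREE_GATE
}

/// How many times better a top-1 rate is than the Base17 palette baseline.
fn improvement_over_base17(top1: f32) -> f32 {
    top1 / BASE17_TOP1
}

fn main() {
    for r in reported_results() {
        println!(
            "K={:2} top1={:.3} top5={:.3} rank={:.3} pass={} vs_base17={:.1}x",
            r.k,
            r.top1,
            r.top5,
            r.rank_agree,
            passes_gate(&r),
            improvement_over_base17(r.top1)
        );
    }
}
```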

What this tells us

SpiralEncoding is directionally right: it preserves far more neighborhood structure than the Base17 palette, but its accuracy is limited by K at the tested byte budget. The 78.9% top-5 result at K=16 is the most actionable signal: most queries find their true NN within the top 5 candidates.

Forward menu (updated in invariants doc)

Hybrid: SpiralEncoding at small K + compact BF16 / 8 × i8 residual correction on top
Per-role stride sweep: tested stride=3 for k_proj; other roles use 2/4/5/8 per NeuronPrint design
Wire SpiralEncoding into cascade_attention_probe directly — cascade routing (Skip/Attend/Compose/Escalate with top-5 escalation) may converge where raw argmax-parity does not. The 79% top-5 suggests this is the right fallback.
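
The top-5 fallback in the last item can be sketched as a shortlist-then-rerank step: cheap signatures pick a handful of candidates, and exact rows decide among them. This is not the `hhtl_cache` RouteAction logic (Skip/Attend/Compose/Escalate), whose semantics are not described here; it only illustrates why a high top-5 rate is useful. All names and the toy data are hypothetical, and the toy "signature" just drops the last coordinate.

```rust
use std::cmp::Ordering;

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Indices of the `k` rows most similar to `query`, best first.
fn top_k(rows: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, r)| (i, cosine(r, query)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

/// Shortlist `k` candidates with signatures, then pick the winner using exact rows.
fn shortlist_then_rerank(
    raw: &[Vec<f32>],
    sig: &[Vec<f32>],
    raw_query: &[f32],
    sig_query: &[f32],
    k: usize,
) -> Option<usize> {
    top_k(sig, sig_query, k).into_iter().max_by(|&a, &b| {
        cosine(&raw[a], raw_query)
            .partial_cmp(&cosine(&raw[b], raw_query))
            .unwrap_or(Ordering::Equal)
    })
}

/// Toy data: (raw rows, signatures, raw query, signature query).
fn demo() -> (Vec<Vec<f32>>, Vec<Vec<f32>>, Vec<f32>, Vec<f32>) {
    let raw = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.9, 0.1, 2.0],
        vec![0.0, 1.0, 2.0],
    ];
    let sig: Vec<Vec<f32>> = raw.iter().map(|r| r[..2].to_vec()).collect();
    let raw_query = vec![1.0, 0.0, 2.0];
    let sig_query = raw_query[..2].to_vec();
    (raw, sig, raw_query, sig_query)
}

fn main() {
    let (raw, sig, rq, sq) = demo();
    println!("exact NN        = {:?}", top_k(&raw, &rq, 1));
    println!("signature top-1 = {:?}", top_k(&sig, &sq, 1));
    println!("rerank of top-2 = {:?}", shortlist_then_rerank(&raw, &sig, &rq, &sq, 2));
}
```

In the toy data the signature's top-1 pick is wrong, but the true neighbour is inside the shortlist, so the exact rerank recovers it.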

Session stack

PR	Status
#176 – #184	merged
#185	open — `HhtlF32Tensor` palette bounds (codex P1)
#186	this PR — invariants doc + P1 SpiralEncoding probe with measurements

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Generated by Claude Code

Per procedure-bookkeeping.md Pass 2: classify each "none" row from Pass 1 as superseded / live / archived.

Result: 25 open → 13 superseded, 6 live, 6 archived.

Superseded (shipped under overlapping PRs): FINAL_MAP (#65), session_A_v3 (Phase 1 #29), session_B_v3 (Phase 2), session_6d (#78), session_bgz17_similarity (#40), session_unified_26_epiphanies (#60), session_ontology_layer_audit (#155), research_quantized_graph_algebra (#186-198), session_MASTER_map_v3, session_{integration,master,model}_plan (elegant-herding-rocket)

Live (aligned to active phases): P18_INTERNAL_LLM (Phase 8 D2), SCOPED_PROMPTS (refresh candidate), arxiv (governance), session_C_v3 (Phase 3 Lane A), session_D_v3 (Phase 4), session_epiphany_integration (Phase 8), session_unified_vector_search (Phase 3 cross-repo)

Archived (moved to prompts/archive/ in prior commit): 6 audio/codec/fisher-z files

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

chatgpt-codex-connector Bot reviewed Apr 15, 2026


AdaWorldAPI merged commit e12825d into main Apr 16, 2026

AdaWorldAPI merged 1 commit into main from claude/codec-invariants-and-spiral-probe
