Skip to content

docs: codec invariants + experiment catalogue (session-end déjà-vu)#186

Merged
AdaWorldAPI merged 1 commit intomainfrom
claude/codec-invariants-and-spiral-probe
Apr 16, 2026
Merged

docs: codec invariants + experiment catalogue (session-end déjà-vu)#186
AdaWorldAPI merged 1 commit intomainfrom
claude/codec-invariants-and-spiral-probe

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Summary

Session-end catalogue of every compression approach tried in PRs #176#185 plus the lesson each produced. Written for future-session déjà-vu — every failed experiment is kept with a "mutation hook" explaining how it could evolve into something that works.

Single file: docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md (231 lines).

Why

The failures this session were informative:

Each lesson is buried in a different PR comment. Without this doc, the next session would re-run the experiments. With it, the signposts are explicit.

Structure

  • 6 core invariants — structural truths a future codec must respect (two regimes; near-orthogonal rows; direction vs amplitude; wire-format caps; u8-span-u16 only with right decoder; ticket-for-curve)
  • 7 approaches tried — each with "mutation hooks" (what it could evolve into)
  • 3 right abstractionsSpiralEncoding, per-role stride, HHTL cascade
  • 4 open probes — unproven claims with specific next experiments
  • Déjà-vu table — "if you're tempted to X, read PR #Y first" (7 instincts)
  • 5-question structural checklist before shipping any new codec

Probe deferred

The obvious next experiment — P1: SpiralEncoding::rehydrate_interpolated on real Qwen3-TTS-0.6B weight rows — is fully specified in the doc but not run in this PR. Session pausing due to token budget per user request. Starting cold next session from this doc should lose zero context.

Session stack

PR Status
#176#184 merged
#185 open (HhtlF32Tensor palette bounds — codex P1 fix)
#186 this PR (invariants doc)

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Session-end artefact for future déjà-vu. Catalogues every compression
approach tried in PRs #176-#185 and the lesson each one produced. No
approach is thrown away — each failed experiment carries information
about where the real boundary is.

## Structure

### Core invariants (6)
  I1. Two regimes, opposite needs (argmax vs index)
  I2. Near-orthogonality of weight rows in high dim
  I3. Direction vs amplitude cannot be merged into one scalar
  I4. Wire-format type widths are hard caps — assert at encode time
  I5. 'u8 can span u16/u64 effective' requires the right decoder
  I6. The ticket-for-curve model (SpiralAddress + shared curve)

### Approaches tried (7)
  A1. HhtlDTensor — Base17 + Slot D + Slot V (correct for cascade, wrong for f32 GEMM)
  A2. Progressive residual RVQ with k-ladder (works argmax, fails index)
  A3. Hierarchical CLAM 256x256 (REFUTED — cos 0.0046 on vocab)
  A4. Passthrough BF16 n_rows > 8192 (SHIPS for correctness, net loss for ratio)
  A5. SlotL 8 x i8 on SVD basis (correct algorithm, misapplied to Base17 centroid)
  A6. HhtlF32Tensor f32 palette + SlotL (right direction, 10x better, still short)
  A7. cascade_attention_probe Base17 palette (3.71% argmax agreement — palette doesn't preserve inner products)

### Abstractions that ARE the right primitive (3)
  R1. highheelbgz::rehydrate::SpiralEncoding (exists, untested on real Qwen3)
  R2. Per-role stride in NeuronPrint (q/k=3, v=5, gate=8, up=2, down=4)
  R3. HHTL cascade inference (hhtl_cache RouteAction)

### Open probes (4)
  P1. SpiralEncoding on real Qwen3 weights — claim rho >= 0.95 unproven
  P2. Shared anchors + i8 position per row — depends on P1
  P3. Palette preserves inner-product neighbourhoods — A7 refuted for Base17
  P4. Log-radial CLAM with magnitude split — hypothesised > linear CLAM

### Déjà-vu table

Lists 7 'if you're tempted to...' instincts with the PR that already
refuted them. Exists so future sessions hit the lesson before writing
the code.

### Structural checklist (5 questions)

Before shipping any new codec:
  1. What regime does this tensor belong to? (I1)
  2. Does the codec encode direction AND amplitude separately? (I3)
  3. Is the palette substrate inner-product-preserving? (I2, A7)
  4. Does the decoder evaluate the curve, or tile anchors? (I5)
  5. Are wire-format widths asserted at encode time? (I4)

## Why this doc matters

Every failed approach in this session taught something the next session
would otherwise re-learn the hard way. HCLAM (#177->#178) already has
its lesson buried in a passthrough commit. The Base17 reconstruction
failure (#183) is buried in a PR comment. The #184 Path A/B duality
(they aren't independent) is only visible if you read the probe results.

This doc surfaces all of it as a single index, structured for mutation:
each approach has 'mutation hooks' naming how it could evolve into
something that works, rather than being discarded.

## Next step blocked by token budget

The SpiralEncoding-on-real-Qwen3 probe (P1) is the obvious next
experiment and would have landed in this PR. Deferred to a fresh
session with budget. The doc leaves the probe fully specified so
re-entering cold loses no context.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99be7bf921

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


Claim: `SpiralEncoding::rehydrate_interpolated` hits ρ ≥ 0.95 on real Qwen3-TTS-0.6B weight rows at reasonable K (say K=4–16).

Probe: `spiral_reconstruction_probe.rs` (this PR).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mark the P1 probe as missing or add the referenced file

This line says the P1 experiment probe is spiral_reconstruction_probe.rs in this PR, but this commit only adds documentation and the repository has no file by that name, so the documented next step is not runnable as written. Because this document is intended as the handoff for the next session, the missing artifact can cause immediate confusion and duplicate effort; either add the probe file or change this entry to NOT YET WRITTEN (consistent with P2–P4).

Useful? React with 👍 / 👎.

@AdaWorldAPI AdaWorldAPI merged commit e12825d into main Apr 16, 2026
Copy link
Copy Markdown
Owner Author

P1 probe run — first empirical measurement of SpiralEncoding on real Qwen3-TTS weights

spiral_reconstruction_probe.rs (just added) runs highheelbgz::rehydrate::SpiralEncoding against 256 stride-sampled rows of talker.model.layers.0.self_attn.k_proj.weight at K ∈ {4, 8, 16}, spiral stride=3 (matching NeuronPrint's k_proj role).

Clarification discovered during probe design

SpiralEncoding is a signature codec (17 Base17 dims × K anchor samples per row), not a dense-row reconstructor. So the probe measures neighborhood preservation instead of per-element ρ:

  • G1: self-cosine = 1.0 identity check
  • G2: top-1 NN match vs raw-cosine argmax
  • G3: pairwise rank agreement on random pairs

Results

K Top-1 NN Top-5 NN Pairwise rank-agree Bytes/row
4 18.4% 39.8% 0.663 142
8 31.6% 59.8% 0.747 278
16 44.9% 78.9% 0.803 550

Self-cos = 1.000000 at all K (identity holds).

Status: P1 PARTIAL

What this tells us

SpiralEncoding is directionally right — preserves far more neighborhood structure than the Base17 palette — but K-bound at the tested budget. The 79% top-5 result is the most actionable signal: most queries find their true NN within the top 5 candidates.

Forward menu (updated in invariants doc)

  1. Hybrid: SpiralEncoding at small K + compact BF16 / 8 × i8 residual correction on top
  2. Per-role stride sweep: tested stride=3 for k_proj; other roles use 2/4/5/8 per NeuronPrint design
  3. Wire SpiralEncoding into cascade_attention_probe directly — cascade routing (Skip/Attend/Compose/Escalate with top-5 escalation) may converge where raw argmax-parity does not. The 79% top-5 suggests this is the right fallback.

Session stack

PR Status
#176#184 merged
#185 open — HhtlF32Tensor palette bounds (codex P1)
#186 this PR — invariants doc + P1 SpiralEncoding probe with measurements

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj


Generated by Claude Code

AdaWorldAPI pushed a commit that referenced this pull request Apr 19, 2026
Per procedure-bookkeeping.md Pass 2: classify each "none" row from
Pass 1 as superseded / live / archived.

Result: 25 open → 13 superseded, 6 live, 6 archived.

Superseded (shipped under overlapping PRs):
  FINAL_MAP (#65), session_A_v3 (Phase 1 #29), session_B_v3 (Phase 2),
  session_6d (#78), session_bgz17_similarity (#40),
  session_unified_26_epiphanies (#60), session_ontology_layer_audit (#155),
  research_quantized_graph_algebra (#186-198), session_MASTER_map_v3,
  session_{integration,master,model}_plan (elegant-herding-rocket)

Live (aligned to active phases):
  P18_INTERNAL_LLM (Phase 8 D2), SCOPED_PROMPTS (refresh candidate),
  arxiv (governance), session_C_v3 (Phase 3 Lane A), session_D_v3
  (Phase 4), session_epiphany_integration (Phase 8),
  session_unified_vector_search (Phase 3 cross-repo)

Archived (moved to prompts/archive/ in prior commit):
  6 audio/codec/fisher-z files

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants