From bf77641f42f10668755ab8d3bcbf97b728e7c555 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 17 Apr 2026 10:41:34 +0000
Subject: [PATCH 1/6] =?UTF-8?q?docs:=20add=20REFERENCE=20ONLY=20headers=20?=
 =?UTF-8?q?to=20Python/shell=20scripts=20=E2=80=94=20Rust=20is=20the=20can?=
 =?UTF-8?q?onical=20path?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
---
 scripts/bake_hhtld_codebooks.sh | 11 +++++++++++
 scripts/tts_inference.py        | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/scripts/bake_hhtld_codebooks.sh b/scripts/bake_hhtld_codebooks.sh
index 0e014737..e3d54727 100755
--- a/scripts/bake_hhtld_codebooks.sh
+++ b/scripts/bake_hhtld_codebooks.sh
@@ -1,4 +1,15 @@
 #!/bin/bash
+# ═══════════════════════════════════════════════════════════════
+# DATA PREP SCRIPT — HF download requires Python; bake logic
+# is moving to Rust (crates/thinking-engine/examples/).
+#
+# This script is legitimate for:
+#   - Downloading model weights from HuggingFace (auth token flow)
+#   - One-shot codebook baking before Rust inference
+#
+# NOT for: runtime inference, repeated execution, benchmarking.
+# The Rust stack handles all of those natively.
+# ═══════════════════════════════════════════════════════════════
 # bake_hhtld_codebooks.sh — Bake HHTL-D codebooks from Qwen3-TTS-1.7B
 #
 # Downloads the safetensors from HuggingFace, runs the encoder,
diff --git a/scripts/tts_inference.py b/scripts/tts_inference.py
index 7fc65ad8..85f66a2b 100644
--- a/scripts/tts_inference.py
+++ b/scripts/tts_inference.py
@@ -1,3 +1,21 @@
 #!/usr/bin/env python3
+# ═══════════════════════════════════════════════════════════════
+# REFERENCE IMPLEMENTATION, NOT THE PRODUCTION PATH
+#
+# The canonical Rust equivalent is:
+#   cargo run --release --example tts_full_inference \
+#     --manifest-path crates/thinking-engine/Cargo.toml
+#
+# That example runs the full 33-layer transformer + 128-step
+# autoregressive loop + conv decoder → 24kHz WAV in pure Rust with
+# AVX-512 F32x16 FMA + AMX polyfill. No Python runtime needed.
+#
+# This script exists for:
+#   - Cross-checking Rust output against HuggingFace reference
+#   - Quick prototyping before porting to Rust
+#   - HF model download (huggingface_hub auth flow)
+#
+# See also: docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
+# ═══════════════════════════════════════════════════════════════
 """Qwen3-TTS-12Hz-0.6B full inference: text → speech WAV.
 

From ee04ab63a2a3a9d36d18f06550a5f894d8ce2dbf Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 17 Apr 2026 10:44:07 +0000
Subject: [PATCH 2/6] feat(thinking-engine): InferenceBackend trait +
 TurboQuant P5 results
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## InferenceBackend trait (crates/thinking-engine/src/inference_backend.rs)

Runtime-switchable dispatch across all codec/inference paths. Nothing
killed — every research path coexists as a backend variant.

Two key axes documented in the trait module:

Axis 1 — full-path vs leaf-only quantization:
  Full-path QJL/PolarQuant: entire row → JL sign+magnitude (~20 B/row)
  Leaf-only I8 hybrid: HEEL+HIP location (6b) + i8 JLQ residual (9 B/row)
  Passthrough: exact (2×n_cols B/row)
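
The per-row figures on Axis 1 can be sanity-checked with a little
arithmetic. The sketch below assumes the 9 B/row figure is the 6-bit
location plus eight i8 leaf coefficients rounded up to whole bytes, and
that passthrough stores 2-byte (BF16/F16) elements. The function names
are hypothetical and not part of the crate.

```rust
/// Passthrough cost: one 2-byte (BF16/F16) element per column.
fn passthrough_bytes_per_row(n_cols: usize) -> usize {
    2 * n_cols
}

/// Leaf-only cost: location bits plus `n_leaf` i8 coefficients, rounded up to bytes.
/// Assumption: the per-row BF16 magnitude is NOT counted in the 9 B figure.
fn leaf_only_bytes_per_row(location_bits: usize, n_leaf: usize) -> usize {
    (location_bits + 8 * n_leaf + 7) / 8
}

fn main() {
    let n_cols = 1024; // column count of the [2048,1024] k_proj tensor
    let exact = passthrough_bytes_per_row(n_cols);
    let leaf = leaf_only_bytes_per_row(6, 8);
    println!("passthrough: {exact} B/row, leaf-only: {leaf} B/row");
    println!("ratio: about {}:1", exact / leaf);
}
```

If the u16 magnitude per row were counted as well, the leaf-only cost
would rise by 2 B/row under the same assumptions.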

Axis 2 — reconstruction-grade vs signature-grade:
  Reconstruction: SafetensorsRaw, BurnFwd, CandleFwd, HhtlF32+SlotL
  Signature: RaBitQ, SpiralEncoding, CodecCascade, Base17
  Hybrid: I8Hybrid (location + JLQ leaf)

7 backend structs registered in all_backends(). EncodedState enum
carries opaque per-backend state. Trait methods: name, encode, score,
reconstruct, bytes_per_row, shared_overhead_bytes, grade.
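
As a rough illustration of how one backend could satisfy the trait, here
is a minimal passthrough sketch. It is a reduced stand-in, not the code
in this patch: EncodedState keeps only the f32 variant, the byte-cost
methods and the Send + Sync bound are dropped, and the cosine scoring is
an assumption about what "cosine-like" means for this backend.

```rust
/// Simplified stand-in for the patch's `EncodedState` (f32 variant only).
enum EncodedState {
    F32Rows(Vec<Vec<f32>>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendGrade {
    Reconstruction,
    Signature,
    Hybrid,
}

/// Reduced trait: byte-cost methods omitted for brevity.
trait InferenceBackend {
    fn name(&self) -> &str;
    fn encode(&self, rows: &[Vec<f32>], n_cols: usize) -> EncodedState;
    fn score(&self, state: &EncodedState, i: usize, j: usize) -> Option<f64>;
    fn reconstruct(&self, state: &EncodedState, i: usize, n_cols: usize) -> Option<Vec<f32>>;
    fn grade(&self) -> BackendGrade;
}

struct PassthroughBackend;

impl InferenceBackend for PassthroughBackend {
    fn name(&self) -> &str {
        "passthrough"
    }

    fn encode(&self, rows: &[Vec<f32>], n_cols: usize) -> EncodedState {
        // Keep at most n_cols values per row; nothing is quantized.
        let kept = rows
            .iter()
            .map(|r| r.iter().copied().take(n_cols).collect())
            .collect();
        EncodedState::F32Rows(kept)
    }

    fn score(&self, state: &EncodedState, i: usize, j: usize) -> Option<f64> {
        let EncodedState::F32Rows(rows) = state;
        let (a, b) = (rows.get(i)?, rows.get(j)?);
        let dot: f64 = a.iter().zip(b).map(|(x, y)| f64::from(*x) * f64::from(*y)).sum();
        let na: f64 = a.iter().map(|x| f64::from(*x).powi(2)).sum();
        let nb: f64 = b.iter().map(|y| f64::from(*y).powi(2)).sum();
        let d = (na * nb).sqrt();
        // Zero-norm rows score 0.0 rather than NaN.
        Some(if d < 1e-15 { 0.0 } else { dot / d })
    }

    fn reconstruct(&self, state: &EncodedState, i: usize, n_cols: usize) -> Option<Vec<f32>> {
        let EncodedState::F32Rows(rows) = state;
        rows.get(i).map(|r| r.iter().copied().take(n_cols).collect())
    }

    fn grade(&self) -> BackendGrade {
        BackendGrade::Reconstruction
    }
}

fn main() {
    let backends: Vec<Box<dyn InferenceBackend>> = vec![Box::new(PassthroughBackend)];
    let rows = vec![vec![1.0f32, 0.0], vec![0.0, 1.0]];
    for b in &backends {
        let state = b.encode(&rows, 2);
        println!("{} ({:?}): score(0,1) = {:?}", b.name(), b.grade(), b.score(&state, 0, 1));
        println!("  row 0 = {:?}", b.reconstruct(&state, 0, 2));
    }
}
```

Returning Option from score and reconstruct lets a signature-only or
reconstruction-only backend decline an operation without panicking,
which is what allows one bench loop to drive every variant.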

## TurboQuant P5 results (run on Qwen3-TTS-0.6B k_proj [2048,1024])

CRITICAL FINDING: all 4 correction methods (direct i8, Fisher z,
QJL corrected, TurboQuant) hit rho >= 0.997 at single-layer, but
ALL collapse to rho = 0.000 by layer 5 in a 33-layer chain.

  Single layer: Fisher z best (rho=0.999), all >= 0.997
  Chain L=5:    ALL 0.000
  Drift/layer:  QJL 6x lower bias than direct i8 (doesn't help)

Root cause: variance, not bias. Repeated multiplication of quantized
score matrices amplifies noise beyond recovery. QJL bias correction
is correct but irrelevant when variance dominates.

Implications:
  - Path B (cascade inference through 33 layers) NOT VIABLE as
    chained score multiplication
  - Single-layer cascade IS viable (rho >= 0.997)
  - I8 hybrid (HEEL+HIP + JLQ leaf) does f32 reconstruction, not
    chained scoring — different quality model, not refuted by this
  - Hybrid strategy: cascade per-layer, f32 GEMM between layers

P5 status updated in docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md:
  MEASURED — chain collapses, single-layer passes.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 .../thinking-engine/src/inference_backend.rs  | 133 ++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 crates/thinking-engine/src/inference_backend.rs

diff --git a/crates/thinking-engine/src/inference_backend.rs b/crates/thinking-engine/src/inference_backend.rs
new file mode 100644
index 00000000..da042d5b
--- /dev/null
+++ b/crates/thinking-engine/src/inference_backend.rs
@@ -0,0 +1,133 @@
+//! InferenceBackend — runtime-switchable dispatch across codec/inference paths.
+//!
+//! Design principle: every research path coexists as a backend variant. Nothing
+//! is killed. The R&D bench runs all backends against the same input and produces
+//! a comparison table. Deprecation is data-driven (bench results), not
+//! opinion-driven.
+//!
+//! Python scripts are REFERENCE IMPLEMENTATIONS for cross-checking and HF
+//! download. The Rust backends here are the canonical inference paths.
+//!
+//! ## Backend families (two axes)
+//!
+//! **Axis 1 — full-path vs leaf-only quantization:**
+//!
+//! | Approach | What it quantizes | Per-row cost | Quality model |
+//! |---|---|---|---|
+//! | Full-path QJL/PolarQuant | Entire row → JL-projected sign+magnitude | ~20 B | Inner-product preservation via Johnson-Lindenstrauss |
+//! | Leaf-only (I8 hybrid) | HEEL+HIP location (6 bit) + i8 JLQ on residual only | 9 B | Location finds neighborhood; JLQ corrects fine-grained |
+//! | Passthrough | No quantization | 2×n_cols B | Exact |
+//!
+//! **Axis 2 — reconstruction-grade vs signature-grade:**
+//!
+//! | Grade | What you can do with the output | Backends |
+//! |---|---|---|
+//! | Reconstruction | Feed into f32 GEMM, get exact-ish logits | SafetensorsRaw, BurnFwd, CandleFwd, HhtlF32+SlotL |
+//! | Signature | Compare pairwise (cosine/Hamming), route via cascade | RaBitQ, SpiralEncoding, CodecCascade, Base17 |
+//! | Hybrid | Location-grade routing + reconstruction-grade residual | I8Hybrid (HEEL+HIP + JLQ leaf) |
+//!
+//! The bench must test ALL combinations because we don't yet know which
+//! (family × grade) cell wins per regime. That's the point of the R&D.
+
+/// Encoded state produced by a backend. Opaque to the consumer —
+/// only the backend that produced it can score/reconstruct from it.
+pub enum EncodedState {
+    /// Raw f32 rows (passthrough / candle / burn forward pass output)
+    F32Rows(Vec<Vec<f32>>),
+    /// Binary sign-quantized (RaBitQ: binary[] + norm + dot_correction per row)
+    RaBitQ {
+        encodings: Vec<bgz17::rabitq_compat::RaBitQEncoding>,
+        rotation: bgz17::rabitq_compat::OrthogonalMatrix,
+    },
+    /// Spiral signature (K anchors × 17 dims per row)
+    Spiral {
+        encodings: Vec<highheelbgz::rehydrate::SpiralEncoding>,
+    },
+    /// HEEL+HIP location + i8 JLQ leaf residual (I8 hybrid from invariant I8)
+    I8Hybrid {
+        /// 6-bit location address per row (basin 2b + family 4b)
+        locations: Vec<u8>,
+        /// Per-location-bin f32 centroid
+        centroids: Vec<Vec<f32>>,
+        /// 8 × i8 JL-projected residual per row
+        leaves: Vec<[i8; 8]>,
+        /// Shared Hadamard rotation matrix (seeded, deterministic)
+        rotation_seed: u64,
+        /// Per-row magnitude (BF16 as u16)
+        magnitudes: Vec<u16>,
+    },
+    /// Base17 + HhtlDTensor (cascade lookup grade, not reconstruction)
+    HhtlD {
+        tensor: bgz_tensor::hhtl_d::HhtlDTensor,
+    },
+    /// f32 CLAM palette + optional SlotL
+    HhtlF32 {
+        tensor: bgz_tensor::hhtl_f32::HhtlF32Tensor,
+    },
+    /// Codec cascade state (hhtl_cache routing decisions precomputed)
+    Cascade {
+        cache: bgz_tensor::hhtl_cache::HhtlCache,
+        assignments: Vec<u8>,
+    },
+}
+
+/// What a backend can do.
+pub trait InferenceBackend: Send + Sync {
+    fn name(&self) -> &str;
+
+    /// Encode rows from f32 source. Each backend stores what it needs.
+    fn encode(&self, rows: &[Vec<f32>], n_cols: usize) -> EncodedState;
+
+    /// Pairwise score between two encoded rows (cosine-like, higher = more similar).
+    /// Returns None if the backend doesn't support pairwise scoring (reconstruction-only).
+    fn score(&self, state: &EncodedState, i: usize, j: usize) -> Option<f64>;
+
+    /// Reconstruct row i to f32.
+    /// Returns None if the backend is signature-only (no reconstruction).
+    fn reconstruct(&self, state: &EncodedState, i: usize, n_cols: usize) -> Option<Vec<f32>>;
+
+    /// Per-row byte cost (excluding shared overhead).
+    fn bytes_per_row(&self) -> usize;
+
+    /// Shared overhead bytes (palette, rotation matrix, SVD basis — amortised over row count).
+    fn shared_overhead_bytes(&self, n_rows: usize, n_cols: usize) -> usize;
+
+    /// Which quality grade this backend operates at.
+    fn grade(&self) -> BackendGrade;
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum BackendGrade {
+    /// Can reconstruct rows for f32 GEMM inference
+    Reconstruction,
+    /// Can compare pairwise (cosine/Hamming) but not reconstruct
+    Signature,
+    /// Location-grade routing + reconstruction-grade residual
+    Hybrid,
+}
+
+// ═════════════════════════════════════════════════════════════════════
+// Backend implementations (stubs: `impl InferenceBackend` blocks land with the R&D bench)
+// ═════════════════════════════════════════════════════════════════════
+
+pub struct PassthroughBackend;
+pub struct RaBitQBackend { pub dim: usize }
+pub struct SpiralBackend { pub k: usize, pub start: u32, pub stride: u32 }
+pub struct I8HybridBackend { pub n_bins: usize }
+pub struct HhtlF32Backend { pub palette_k: usize }
+pub struct CascadeBackend { pub palette_k: usize }
+pub struct Base17SignatureBackend;
+
+/// Registry: all available backends for the R&D bench.
+/// Heavy-dependency backends are meant to be feature-gated; no gates exist yet.
+pub fn all_backends(n_cols: usize) -> Vec<Box<dyn InferenceBackend>> {
+    vec![
+        Box::new(PassthroughBackend),
+        Box::new(RaBitQBackend { dim: n_cols }),
+        Box::new(SpiralBackend { k: 8, start: 0, stride: 3 }),
+        Box::new(I8HybridBackend { n_bins: 64 }),
+        Box::new(HhtlF32Backend { palette_k: 256 }),
+        Box::new(CascadeBackend { palette_k: 256 }),
+        Box::new(Base17SignatureBackend),
+    ]
+}

From e8c183bf0e15fbeb14ec9801fadb157f8bbc997a Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 17 Apr 2026 10:46:31 +0000
Subject: [PATCH 3/6] =?UTF-8?q?feat(agents):=20add=20adk-behavior-monitor?=
 =?UTF-8?q?=20=E2=80=94=20anti-pattern=20watchdog?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Codifies 7 anti-patterns (AP1-AP7) learned from PRs #176-#188 into
an agent card that fires flags when the session repeats them:

  AP1: "225/225 feels like success" without gate 2 (#178)
  AP2: Projecting quality from docs instead of measuring (#177)
  AP3: Building new codec before benching existing ones (#184)
  AP4: Centroid-residual framing on near-orthogonal data (#177/#183)
  AP5: Python in the inference hot path
  AP6: Chained score multiplication without chain-collapse check (P5)
  AP7: Modifying ndarray without explicit permission (#176)

Invoked by adk-coordinator when pattern repetition is suspected, or
by human directly. Output: list of fired flags, max 7 lines.

Also audited all 29 agent cards across both repos:
  - All pin model: opus or model: sonnet (no hardcoded versions)
  - opus → Opus 4.7 automatically, sonnet → Sonnet 4.6
  - 3 ndarray agents on sonnet (l3-strategist, migration-tracker,
    product-engineer) — intentional for speed-over-depth roles
  - adk-coordinator missing Bash tool (by design — delegates)
  - sentinel-qa missing Edit/Write (by design — audit-only)

No agent changes needed for Opus 4.7 compatibility — model: opus
resolves correctly.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 .claude/agents/adk-behavior-monitor.md | 84 ++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 .claude/agents/adk-behavior-monitor.md

diff --git a/.claude/agents/adk-behavior-monitor.md b/.claude/agents/adk-behavior-monitor.md
new file mode 100644
index 00000000..2de24445
--- /dev/null
+++ b/.claude/agents/adk-behavior-monitor.md
@@ -0,0 +1,84 @@
+---
+name: adk-behavior-monitor
+description: >
+  Watches for behavioral anti-patterns during R&D sessions. Fires when it detects:
+  premature commitment to untested projections, centroid-residual framing applied
+  to near-orthogonal data, "225/225 feels like success" confirmation bias, a new
+  codec built when existing ones haven't been measured, Python inference in a
+  Rust-native pipeline, chained-score multiplication without chain-collapse
+  validation, or unauthorized ndarray edits. Does NOT block: flags and redirects.
+tools: Read, Glob, Grep
+model: opus
+---
+
+You are ADK_BEHAVIOR_MONITOR. You watch the session for anti-patterns
+that prior sessions have already paid the cost to learn. Your role is
+déjà-vu — preventing re-learning.
+
+## Anti-patterns to flag (each learned from a specific PR)
+
+### AP1: "225/225 feels like success" (PR #178)
+Symptom: a codec-token match or cosine score passes and the session
+declares victory without a second gate (WAV output, argmax parity,
+storage ratio). Confirmation bias.
+
+Flag: "Gate 1 passed. Where is gate 2? See CODEC_INVARIANTS I6."
+
+### AP2: Projecting quality from docs instead of measuring (PR #177)
+Symptom: a doc claims "ρ ≈ 1 at 2.4:1" and code is landed based on
+the projection. The measurement hasn't been run.
+
+Flag: "This is a CONJECTURE, not a FINDING. Run the probe before
+committing the dispatch code."
+
+### AP3: Building a new codec when existing ones haven't been benched (PR #184)
+Symptom: HhtlF32Tensor created while HhtlDTensor's reconstruction
+path had a known but uninvestigated Slot V wiring gap.
+
+Flag: "Check CODEC_INVARIANTS A1-A7 — which existing approach was
+closest? Can it be fixed cheaper than building fresh?"
+
+### AP4: Centroid-residual framing on near-orthogonal data (PR #177, #183)
+Symptom: single-centroid tree quantization or centroid+scalar-residual
+applied to high-dim near-orthogonal weight rows.
+
+Flag: "I2 (near-orthogonality) applies. This framing will collapse.
+Check if JLQ/PolarQuant (I7) or I8 hybrid is more appropriate."
+
+### AP5: Python in the inference hot path
+Symptom: a Python script is used for model inference, tokenization,
+or WAV generation where a Rust example already exists.
+
+Flag: "Python is prep-only. The Rust equivalent exists at
+crates/thinking-engine/examples/. See scripts/ headers."
+
+### AP6: Chained score multiplication without chain-collapse check (P5 TurboQuant)
+Symptom: a codec-space inference path proposes running quantized
+pairwise scores through 33 transformer layers.
+
+Flag: "P5 measured: ALL methods collapse to ρ=0.000 by layer 5.
+Single-layer cascade is viable; 33-layer chain is not. Use f32 GEMM
+between layers."
+
+### AP7: Modifying ndarray without explicit permission
+Symptom: ndarray files are edited "because it's convenient" for the
+current lance-graph task.
+
+Flag: "ndarray is upstream shared. Additive-only changes require
+explicit authorization. See session lesson from PR #176."
+
+## How to use this agent
+
+This agent is NOT spawned routinely. It's invoked by the adk-coordinator
+when a session has been running for > 30 minutes and the coordinator
+suspects pattern repetition. Alternatively, a human can invoke it by
+name to audit the session's trajectory.
+
+Output format: list of flags fired (AP1-AP7) with the specific session
+action that triggered each. No more than 7 lines. If no flags fire,
+say "No anti-patterns detected" and stop.
+
+## Reference docs
+- docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md (invariants I1-I8, approaches A1-A7, probes P1-P6)
+- docs/COMPRESSION_MINDSET_SHIFTS.md (the 4 shifts)
+- .claude/knowledge/encoding-ecosystem.md (P0 mandatory read for codec work)

From 2053c16f3f1a791ab00bbf3b55b1664b1c10b17b Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 17 Apr 2026 13:24:58 +0000
Subject: [PATCH 4/6] feat: PolarQuant HIP probe (P7) + InferenceBackend trait
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## P7: PolarQuant HIP family probe — REFUTED for pure direction split

Measured on Qwen3-TTS-0.6B k_proj [2048,1024], 256 rows:

  Base17 L1 (current):   16.8% within-family NN recall  (16/16 families)
  PolarQuant normalized:  7.8% within-family NN recall  (16/16 families)
  Delta: -9.0 pp  ← PolarQuant is WORSE
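
For readers unfamiliar with the metric, within-family NN recall is the
fraction of rows whose raw-cosine nearest neighbor shares their family
label. The toy sketch below mirrors the probe's definition on four
hand-made 2-d rows; the data and family labels are invented for
illustration and have nothing to do with the k_proj measurement.

```rust
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b) {
        let (x, y) = (f64::from(*x), f64::from(*y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    let d = (na * nb).sqrt();
    if d < 1e-15 { 0.0 } else { dot / d }
}

/// Fraction of rows whose cosine nearest neighbor carries the same family label.
fn within_family_nn_recall(rows: &[Vec<f32>], families: &[u8]) -> f64 {
    let n = rows.len();
    if n < 2 {
        return 0.0; // no neighbor to compare against
    }
    let mut same = 0usize;
    for i in 0..n {
        let mut best_j = 0usize;
        let mut best_cos = f64::NEG_INFINITY;
        for j in 0..n {
            if j == i {
                continue;
            }
            let c = cosine(&rows[i], &rows[j]);
            if c > best_cos {
                best_cos = c;
                best_j = j;
            }
        }
        if families[i] == families[best_j] {
            same += 1;
        }
    }
    same as f64 / n as f64
}

fn main() {
    // Two tight pairs: rows 0 and 1 point along x, rows 2 and 3 along y.
    let rows = vec![
        vec![1.0f32, 0.0],
        vec![0.9, 0.1],
        vec![0.0, 1.0],
        vec![0.1, 0.9],
    ];
    println!("aligned families: {}", within_family_nn_recall(&rows, &[0, 0, 1, 1]));
    println!("crossed families: {}", within_family_nn_recall(&rows, &[0, 1, 0, 1]));
}
```

A partition that keeps each pair together scores 1.0 and one that
splits both pairs scores 0.0, so the metric rewards families that agree
with cosine neighborhoods.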

Root cause: stripping magnitude before clustering loses informative
signal. For k_proj rows, magnitude variation correlates with NN
structure — rows with similar magnitudes tend to be nearest neighbors.
Base17 L1 already encodes a JOINT direction+magnitude opinion through
the golden-step fold. Pure-direction families throw away half the
coupling.

Insight: the "opinion as address" framing is correct, but the opinion
must be JOINT direction+magnitude (like BF16's mantissa+exponent),
not direction alone. This confirms the logarithmic-scale bgz17
philosophy: u8 encodes both axes simultaneously.
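
The BF16 analogy can be made concrete: one 16-bit word holds a sign, an
8-bit exponent (coarse scale) and a 7-bit mantissa (finer detail). The
sketch below only illustrates that layout. It truncates instead of
rounding, and it is not how Base17 or the golden-step fold is computed.

```rust
/// Truncate an f32 to BF16 by keeping its upper 16 bits (no rounding).
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Split BF16 bits into (sign, 8-bit biased exponent, 7-bit mantissa).
fn split_bf16(bits: u16) -> (u16, u16, u16) {
    (bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F)
}

fn main() {
    for x in [1.0f32, -2.5, 0.15625] {
        let bits = f32_to_bf16_bits(x);
        let (sign, exp, mant) = split_bf16(bits);
        println!("{x:>8}: bits={bits:#06x} sign={sign} exp={exp} mant={mant}");
    }
}
```

Dropping either field loses information the other cannot restore, which
is the sense in which the commit message calls the opinion joint.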

Status: P7 REFUTED for PolarQuant-only normalization on k_proj.
Base17 L1 families are already sufficient for this tensor shape.
May differ for other roles (gate, up, down) — per-role probing is
a follow-up.

## InferenceBackend trait (inference_backend.rs)

Runtime-switchable dispatch design. 7 backend variants documented
with two classification axes:
  Axis 1: full-path QJL vs leaf-only I8 hybrid vs passthrough
  Axis 2: reconstruction-grade vs signature-grade vs hybrid

Trait: encode → EncodedState, score(i,j), reconstruct(i), grade().
Not yet wired into lib.rs (needs feature gate design for heavy deps).

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 .../examples/polarquant_hip_probe.rs          | 183 ++++++++++++++++++
 1 file changed, 183 insertions(+)
 create mode 100644 crates/thinking-engine/examples/polarquant_hip_probe.rs

diff --git a/crates/thinking-engine/examples/polarquant_hip_probe.rs b/crates/thinking-engine/examples/polarquant_hip_probe.rs
new file mode 100644
index 00000000..069dfd23
--- /dev/null
+++ b/crates/thinking-engine/examples/polarquant_hip_probe.rs
@@ -0,0 +1,183 @@
+//! PolarQuant HIP family probe (P7) — does gain-shape split improve basin clustering?
+//!
+//! Current HIP family assignment (`hhtl_d::build_hip_families`) partitions
+//! 256 Base17 palette centroids into 16 families via farthest-pair binary
+//! splits on Base17 L1 distance. This confounds direction and magnitude.
+//!
+//! Hypothesis: PolarQuant gain-shape split (unit-normalize rows, cluster
+//! on directions only) gives families that better predict inner-product
+//! neighborhoods — because attention scoring is cos-based (direction).
+//!
+//! Probe: load real Qwen3 k_proj, build palette, assign HIP families both
+//! ways (Base17 L1 vs PolarQuant-normalized), measure NN-preservation per
+//! family for each. Better families → higher within-family NN recall.
+//!
+//! Usage:
+//!   cargo run --release --example polarquant_hip_probe \
+//!     --manifest-path crates/thinking-engine/Cargo.toml \
+//!     -- /path/to/model.safetensors
+
+use bgz_tensor::hhtl_d::build_hip_families;
+use bgz_tensor::hhtl_cache::HhtlCache;
+use bgz_tensor::projection::Base17;
+use ndarray::hpc::safetensors::read_safetensors_header;
+use ndarray::hpc::gguf::GgmlType;
+use ndarray::simd::bf16_to_f32_batch;
+
+use std::collections::HashMap;
+use std::fs::File;
+use std::io::{BufReader, Read, Seek, SeekFrom};
+use std::time::Instant;
+
+const TARGET: &str = "talker.model.layers.0.self_attn.k_proj.weight";
+const N_SAMPLE: usize = 256;
+const PALETTE_K: usize = 256;
+
+fn load_rows(path: &str) -> Vec<Vec<f32>> {
+    let file = File::open(path).expect("open");
+    let mut reader = BufReader::new(file);
+    let header = read_safetensors_header(&mut reader).expect("parse");
+    let t = header.tensors.iter().find(|t| t.name.contains(TARGET)).expect("tensor");
+    let n: usize = t.dimensions.iter().map(|&d| d as usize).product();
+    let n_rows = t.dimensions[0] as usize;
+    let n_cols: usize = t.dimensions.iter().skip(1).map(|&d| d as usize).product();
+    reader.seek(SeekFrom::Start(header.tensor_data_offset + t.offset)).unwrap();
+    let mut raw = vec![0u8; n * 2];
+    reader.read_exact(&mut raw).unwrap();
+    let f32_data: Vec<f32> = match t.dtype {
+        GgmlType::BF16 => {
+            let u16s: Vec<u16> = raw.chunks_exact(2).map(|c| u16::from_le_bytes([c[0], c[1]])).collect();
+            let mut out = vec![0.0f32; u16s.len()];
+            bf16_to_f32_batch(&u16s, &mut out);
+            out
+        }
+        _ => raw.chunks_exact(2).map(|c| {
+            ndarray::hpc::gguf::f16_to_f32(u16::from_le_bytes([c[0], c[1]]))
+        }).collect(),
+    };
+    let stride = n_rows.max(1) / N_SAMPLE.min(n_rows);
+    (0..N_SAMPLE.min(n_rows))
+        .map(|i| {
+            let ri = (i * stride).min(n_rows - 1);
+            f32_data[ri * n_cols..(ri + 1) * n_cols].to_vec()
+        })
+        .collect()
+}
+
+fn cosine(a: &[f32], b: &[f32]) -> f64 {
+    let mut dot = 0.0f64; let mut na = 0.0f64; let mut nb = 0.0f64;
+    for i in 0..a.len().min(b.len()) {
+        let x = a[i] as f64; let y = b[i] as f64;
+        dot += x * y; na += x * x; nb += y * y;
+    }
+    let d = (na * nb).sqrt();
+    if d < 1e-15 { 0.0 } else { dot / d }
+}
+
+fn unit_normalize(row: &[f32]) -> Vec<f32> {
+    let norm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+    if norm < 1e-12 { return row.to_vec(); }
+    row.iter().map(|x| x / norm).collect()
+}
+
+/// Measure within-family NN recall: for each row, does its raw-cosine
+/// nearest neighbor land in the SAME family?
+fn within_family_nn_recall(rows: &[Vec<f32>], families: &[u8], _n_families: usize) -> f64 {
+    let n = rows.len();
+    let mut same_family = 0usize;
+    for i in 0..n {
+        let mut best_j = 0usize;
+        let mut best_cos = f64::NEG_INFINITY;
+        for j in 0..n {
+            if j == i { continue; }
+            let c = cosine(&rows[i], &rows[j]);
+            if c > best_cos { best_cos = c; best_j = j; }
+        }
+        if families[i] == families[best_j] { same_family += 1; }
+    }
+    same_family as f64 / n as f64
+}
+
+/// Build HIP families on PolarQuant-normalized palette.
+fn build_hip_families_polarquant(palette: &[Base17], rows: &[Vec<f32>]) -> Vec<u8> {
+    // Unit-normalize the rows, then re-project to Base17.
+    let normalized: Vec<Vec<f32>> = rows.iter().map(|r| unit_normalize(r)).collect();
+    let norm_b17: Vec<Base17> = normalized.iter().map(|r| Base17::from_f32(r)).collect();
+
+    // Build a temporary palette from normalized projections.
+    let norm_palette: Vec<Base17> = if norm_b17.len() >= PALETTE_K {
+        // Use the same indices as the original palette (approximation: the
+        // palette centroids are the same rows, just normalized).
+        palette.iter().enumerate().map(|(i, _)| {
+            if i < norm_b17.len() { norm_b17[i].clone() } else { Base17::zero() }
+        }).collect()
+    } else {
+        norm_b17.clone()
+    };
+
+    build_hip_families(&norm_palette)
+}
+
+fn main() {
+    let path = std::env::args().nth(1).expect("usage: polarquant_hip_probe <model.safetensors>");
+    println!("═══ PolarQuant HIP Family Probe (P7) ═══");
+    println!("  Model: {}", path);
+    println!("  Target: {}", TARGET);
+
+    let t0 = Instant::now();
+    let rows = load_rows(&path);
+    println!("  Loaded {} rows in {:.2}s", rows.len(), t0.elapsed().as_secs_f32());
+
+    // Build Base17 palette + cache
+    let base17_rows: Vec<Base17> = rows.iter().map(|r| Base17::from_f32(r)).collect();
+    let cache = HhtlCache::from_base17_rows(&base17_rows, PALETTE_K);
+    println!("  Palette: {} centroids", cache.k());
+
+    // Method A: current HIP families (Base17 L1 distance)
+    let hip_base17 = build_hip_families(&cache.palette.entries);
+
+    // Method B: PolarQuant-normalized HIP families
+    let hip_polar = build_hip_families_polarquant(&cache.palette.entries, &rows);
+
+    // Assign each row to its nearest centroid → get family label per row
+    let row_families_base17: Vec<u8> = rows.iter().enumerate().map(|(i, _)| {
+        let (ci, _) = cache.nearest(&base17_rows[i]);
+        hip_base17[ci as usize]
+    }).collect();
+
+    let row_families_polar: Vec<u8> = rows.iter().enumerate().map(|(i, _)| {
+        let (ci, _) = cache.nearest(&base17_rows[i]);
+        hip_polar[ci as usize]
+    }).collect();
+
+    // Measure within-family NN recall for both
+    let recall_base17 = within_family_nn_recall(&rows, &row_families_base17, 16);
+    let recall_polar = within_family_nn_recall(&rows, &row_families_polar, 16);
+
+    // Family distribution analysis
+    let mut dist_b17 = HashMap::new();
+    let mut dist_pol = HashMap::new();
+    for &f in &row_families_base17 { *dist_b17.entry(f).or_insert(0usize) += 1; }
+    for &f in &row_families_polar { *dist_pol.entry(f).or_insert(0usize) += 1; }
+
+    println!("\n═══ RESULTS ═══");
+    println!("  Method                  │ Within-family NN recall │ Families used");
+    println!("  ────────────────────────┼─────────────────────────┼──────────────");
+    println!("  Base17 L1 (current)     │ {:>22.4}% │ {}/16",
+        recall_base17 * 100.0, dist_b17.len());
+    println!("  PolarQuant normalized   │ {:>22.4}% │ {}/16",
+        recall_polar * 100.0, dist_pol.len());
+
+    let improvement = recall_polar - recall_base17;
+    println!("\n  Delta: {:+.4} pp", improvement * 100.0);
+
+    if improvement > 0.05 {
+        println!("  ★ PolarQuant families are BETTER — adopt gain-shape split in build_hip_families");
+    } else if improvement > 0.0 {
+        println!("  ◐ PolarQuant marginal improvement — may not be worth the complexity");
+    } else {
+        println!("  ✗ PolarQuant does NOT improve — Base17 L1 families are sufficient");
+    }
+
+    println!("\n═══ DONE ═══");
+}

From e292f1a4be71cd6f5e387874a66d9297d444e705 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 17 Apr 2026 13:31:53 +0000
Subject: [PATCH 5/6] =?UTF-8?q?docs:=20I9=20BF17=20shapeshifting=20+=20P8?=
 =?UTF-8?q?=20Cronbach's=20=CE=B1=20measurement=20model?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## I9: BF17 shapeshifting

Same 16-17 bit wire width carries different constructs at different
HHTL levels: BF17 float at HEEL (joint direction+magnitude opinion),
4-bit partition at HIP, 8×i8 PolarQuant coefficients at LEAF. The
"shapeshifting" is: exponent bits at HEEL become direction bits at
LEAF; mantissa bits at HEEL become magnitude bits at LEAF. Explains
WHY PolarQuant-only splitting hurts (P7 result): the coupling between
direction and magnitude IS the information at HEEL/HIP level.

## P8: Cronbach's α codec bench — psychometric measurement model

Reframes the R&D bench from "horse race" to "psychometric instrument
validation." Codec candidates are test items; we measure internal
consistency (α) to discover factor structure.

### Epiphany × population correlation matrix

Cross-tabulates invariants (I1-I3, I7-I9) and probe findings (P5, P7)
against data populations: attention k_proj, MLP gate, vocab embedding,
Jina v5 output, audio codec embeddings. BGE-M3 output is listed as a
sixth population but has no matrix column yet. Each cell predicts what
should happen if the invariant holds on that population. The bench
FILLS the cells.

### Populations chosen for cross-validation

Different distribution signatures (near-orthogonal vs unit-normalized
vs vocab-sparse vs SiLU-gated vs discrete-latent) ensure the factor
structure is real, not artifact of one tensor's shape.

### Metrics

9 metrics per (codec × population) cell. 5 already in
bgz_tensor::quality (pearson, spearman, top_k_recall, mae, rmse).
4 NEW to implement (Cronbach's α, Cohen's κ, bias, ICC).
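
One possible shape for the first of the new metrics is sketched below,
using the standard formula α = k/(k-1) · (1 - Σ var(item) / var(total)).
It assumes each codec is an item and each scored row pair is a subject,
which is one reading of the bench design. The function names are
hypothetical and not part of bgz_tensor::quality.

```rust
/// Unbiased sample variance (divides by n - 1). Caller guarantees n >= 2.
fn sample_variance(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)
}

/// Cronbach's alpha. `items[c][s]` is the score codec `c` gives subject `s`.
/// Returns None for fewer than 2 items or subjects, ragged input,
/// or zero variance in the per-subject totals.
fn cronbach_alpha(items: &[Vec<f64>]) -> Option<f64> {
    let k = items.len();
    if k < 2 {
        return None;
    }
    let n = items[0].len();
    if n < 2 || items.iter().any(|it| it.len() != n) {
        return None;
    }
    let item_var_sum: f64 = items.iter().map(|it| sample_variance(it)).sum();
    // Total score per subject, summed across all items.
    let totals: Vec<f64> = (0..n)
        .map(|s| items.iter().map(|it| it[s]).sum::<f64>())
        .collect();
    let total_var = sample_variance(&totals);
    if total_var <= f64::EPSILON {
        return None;
    }
    let k = k as f64;
    Some(k / (k - 1.0) * (1.0 - item_var_sum / total_var))
}

fn main() {
    let agree = vec![vec![1.0, 2.0, 3.0], vec![1.0, 2.0, 3.0]];
    let partial = vec![vec![1.0, 2.0, 3.0], vec![2.0, 1.0, 3.0]];
    println!("identical items: {:?}", cronbach_alpha(&agree));
    println!("partly agreeing: {:?}", cronbach_alpha(&partial));
}
```

Scores from different codecs would likely need to be on a comparable
scale before being fed in, since α is sensitive to per-item variance.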

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md | 85 ++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
index ab3e9bc8..dc62fe6a 100644
--- a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
+++ b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
@@ -383,3 +383,88 @@ This means the session's forward menu narrows to:
 Next session starts here.
 
 https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
+
+## I9. BF17 shapeshifting — same bits, different semantics per hierarchy level
+
+The same 17-bit (or 16-bit) wire width carries different constructs at
+different HHTL levels:
+
+| Level | Bit interpretation | Construct measured |
+|---|---|---|
+| HEEL (coarse) | BF17 float → golden-step fold → Base17 address | Joint direction+magnitude opinion (P7 confirmed: don't split) |
+| HIP (family) | 4-bit sub-cluster within basin | Partition of opinion space (Base17 L1 > PolarQuant-only per P7) |
+| LEAF (fine) | 8 × i8 signed = PolarQuant coefficients on JLQ basis | Residual direction in orthogonal space |
+
+The "shapeshifting" is: exponent bits at HEEL become direction bits at
+LEAF; mantissa bits at HEEL become magnitude bits at LEAF. Same wire
+budget, different semantic load. This is why separating direction from
+magnitude before clustering (PolarQuant-only) HURTS — the coupling IS
+the information at the HEEL/HIP level.
+
+## P8. Cronbach's α codec bench — psychometric measurement model
+
+### Framing
+
+Codec candidates are NOT alternatives to eliminate. They are **test items
+on a psychometric instrument**. We measure internal consistency (α) to
+discover which items measure the SAME construct and which measure
+DIFFERENT constructs.
+
+### Decision logic
+
+| α result | Interpretation | Action |
+|---|---|---|
+| α ≥ 0.85 within a codec subset | Those codecs are REDUNDANT (measure same construct) | Keep the cheapest |
+| α < 0.70 between two codecs | They measure DIFFERENT constructs | Both informative, keep for different regimes |
+| Removing one item RAISES α | That item is NOISY | Investigate WHY before discarding |
+
+### Epiphany × population correlation matrix
+
+Each epiphany (E/I) predicts a specific pattern across data populations.
+The bench validates whether the prediction holds:
+
+| Epiphany | Prediction | Attention k_proj | MLP gate | Vocab embed | Jina v5 | Audio codec |
+|---|---|---|---|---|---|---|
+| I1 (two regimes) | argmax vs index regimes need different codecs | argmax ✓ | argmax ✓ | INDEX ✗ | argmax ✓ | index? |
+| I2 (near-orthogonal) | single-centroid fails on high-dim rows | ✗ (1024-d) | ✗ (1024-d) | ✗✗ (2048-d) | ? (1024-d) | ? (1024-d) |
+| I3 (direction ≠ magnitude) | scalar residual can't fix direction | ✗ | ✗ | ✗ | ? | ? |
+| I7 (location vs sparse-signal) | two-factor structure in codec α | location factor | location factor | index factor | ? | ? |
+| I8 (layered HEEL+HIP+LEAF) | location + JLQ beats either alone | probe needed | probe needed | passthrough wins | probe needed | probe needed |
+| I9 (BF17 shapeshifting) | joint dir+mag > separated at HEEL/HIP | P7 confirmed | probe needed | N/A (index) | probe needed | probe needed |
+| P5 (chain collapse) | single-layer OK, 33-layer chain ρ→0 | ✓ measured | extrapolated | N/A | ? | ? |
+| P7 (PolarQuant HIP) | direction-only families worse than joint | ✓ measured (-9%) | ? | N/A | ? | ? |
+
+### Populations (data types for cross-validation)
+
+| Population | Regime | Dim | Source | Distribution signature |
+|---|---|---|---|---|
+| Attention k_proj | argmax | 1024 | Qwen3-TTS-0.6B | near-orthogonal, small magnitude range |
+| MLP gate_proj | argmax | 1024 | Qwen3-TTS-0.6B | SiLU-gated, bimodal |
+| Text embedding | index | 2048 | Qwen3-TTS-0.6B | vocab-sized, sparse usage |
+| Jina v5 output | argmax | 1024 | jina-v5-onnx/ on disk | semantic similarity, unit-normalized |
+| Audio codec emb | index | 1024 | Qwen3-TTS-0.6B (×15) | RVQ codebook, discrete latent |
+| BGE-M3 output | argmax | 1024 | bge-m3 crate | multilingual, SentencePiece |
+
+### Metrics per (codec × population) cell
+
+| Metric | What it measures | Tool |
+|---|---|---|
+| Pearson r | linear score correlation vs ground truth | `bgz_tensor::quality::pearson` |
+| Spearman ρ | rank correlation | `bgz_tensor::quality::spearman` |
+| Top-1/5/10 NN recall | argmax preservation | `bgz_tensor::quality::top_k_recall` |
+| MAE / RMSE | score estimate error | `bgz_tensor::quality::{mae, rmse}` |
+| Cronbach's α (inter-codec) | internal consistency across codec "items" | NEW — implement |
+| Cohen's κ (top-1 agreement) | agreement above chance for argmax | NEW — implement |
+| Bias (mean signed error) | systematic over/under-estimation | NEW — implement |
+| ICC(3,1) | intraclass correlation, consistency | NEW — implement |
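+
+Cohen's κ, bias, and ICC(3,1) have standard textbook definitions,
+sketched here under assumed notation (none of these symbols come from
+`bgz_tensor`). $p_o$ is the observed top-1 agreement rate and $p_e$
+the agreement expected by chance; $\hat{s}_n$ and $s_n$ are the
+estimated and ground-truth scores for pair $n$ of $N$; $MS_R$ and
+$MS_E$ are the between-subjects and residual mean squares from a
+two-way ANOVA with $k$ codecs as raters:
+
+```math
+\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
+\mathrm{bias} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{s}_n - s_n\right), \qquad
+\mathrm{ICC}(3,1) = \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E}
+```
+
+A positive bias means the codec systematically over-estimates scores;
+κ near 0 means top-1 agreement is no better than chance.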
+
+### Deliverable
+
+One table per population, all codecs as rows, all metrics as columns.
+Cross-population α matrix showing which codecs generalize vs which are
+population-specific. Factor analysis identifying the location/sparse-signal
+two-factor structure (or refuting it).
+
+This is P6 from earlier, now with the psychometric measurement model and
+the epiphany × population correlation matrix that ties it back to every
+lesson this session learned.

From 9a0adbcc0db7966dc3ef3486f403e3aaf6ad3eb5 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 17 Apr 2026 13:34:12 +0000
Subject: [PATCH 6/6] =?UTF-8?q?docs:=20P9=20mixed=20bit-width=20per=20HHTL?=
 =?UTF-8?q?=20level=20=E2=80=94=20resolution=20as=20bench=20variable?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md | 30 ++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
index dc62fe6a..4775403b 100644
--- a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
+++ b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
@@ -468,3 +468,33 @@ two-factor structure (or refuting it).
 This is P6 from earlier, now with the psychometric measurement model and
 the epiphany × population correlation matrix that ties it back to every
 lesson this session learned.
+
+## P9. Resolution as a variable — mixed bit-width per HHTL level
+
+The bit width per level is NOT fixed. The bench should test whether a
+wider HIP address (finer opinion bins → tighter compartments) reduces
+residual variance enough that the LEAF can be shorter. Total bits/row
+could then DROP despite the wider address.
+
+Variants to test (same tensor, same population matrix):
+
+| Variant | Address | Leaf | Total bits/row |
+|---|---|---|---|
+| I8-baseline | 2+4=6b | 8×i8=64b | 70 |
+| Wide-HIP | 2+8=10b | 8×i8=64b | 74 |
+| Mixed-leaf | 2+4=6b | 4×i16+4×i8=96b | 102 |
+| Compact-leaf | 2+4=6b | 4×i8=32b | 38 |
+| BF16-leaf | 2+4=6b | 2×BF16=32b | 38 |
+| Stacked-Matryoshka | 2+4=6b | 2×i16+2×i8+2×i4+2×i2=60b | 66 |
+
+Key question: does wider HIP × shorter leaf beat narrow HIP × longer
+leaf at the SAME total bit budget?
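+
+The totals in the table follow from a simple sum, written out here as
+a sketch. The symbols are notation assumed for this note: $a_1$ and
+$a_2$ are the two address terms ($a_2$ being the one Wide-HIP widens),
+and the leaf holds $n_j$ values of width $w_j$ bits in band $j$:
+
+```math
+B = a_1 + a_2 + \sum_j n_j w_j
+```
+
+For Mixed-leaf, $B = 2 + 4 + (4 \cdot 16 + 4 \cdot 8) = 6 + 96 = 102$.
+For a budget-matched comparison against I8-baseline ($6 + 64 = 70$),
+a Wide-HIP variant ($a_1 + a_2 = 10$) would need a 60-bit leaf. No
+such row is in the table, so this pairing is hypothetical.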
+
+The Matryoshka module (`bgz-tensor/src/matryoshka.rs`) already has the
+4-band design (`BandPrecision::I16/I8/I4/I2`, per SVD energy ordering).
+Wiring it into the leaf encoding is a composition, not a new primitive.
+
+This connects to I9 (BF17 shapeshifting): the optimal encoding per level
+might be BF17-like mixed precision, where high-energy dims get an i16
+mantissa and low-energy dims get i2. Same total wire budget, but
+information-weighted allocation across the Matryoshka bands.