From bf77641f42f10668755ab8d3bcbf97b728e7c555 Mon Sep 17 00:00:00 2001
From: Claude
Date: Fri, 17 Apr 2026 10:41:34 +0000
Subject: [PATCH 1/6] docs: add REFERENCE ONLY headers to Python/shell scripts — Rust is the canonical path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
---
 scripts/bake_hhtld_codebooks.sh | 11 +++++++++++
 scripts/tts_inference.py        | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/scripts/bake_hhtld_codebooks.sh b/scripts/bake_hhtld_codebooks.sh
index 0e014737..e3d54727 100755
--- a/scripts/bake_hhtld_codebooks.sh
+++ b/scripts/bake_hhtld_codebooks.sh
@@ -1,4 +1,15 @@
 #!/bin/bash
+# ═══════════════════════════════════════════════════════════════
+# DATA PREP SCRIPT — HF download requires Python; bake logic
+# is moving to Rust (crates/thinking-engine/examples/).
+#
+# This script is legitimate for:
+#   - Downloading model weights from HuggingFace (auth token flow)
+#   - One-shot codebook baking before Rust inference
+#
+# NOT for: runtime inference, repeated execution, benchmarking.
+# The Rust stack handles all of those natively.
+# ═══════════════════════════════════════════════════════════════
 # bake_hhtld_codebooks.sh — Bake HHTL-D codebooks from Qwen3-TTS-1.7B
 #
 # Downloads the safetensors from HuggingFace, runs the encoder,
diff --git a/scripts/tts_inference.py b/scripts/tts_inference.py
index 7fc65ad8..85f66a2b 100644
--- a/scripts/tts_inference.py
+++ b/scripts/tts_inference.py
@@ -1,3 +1,21 @@
 #!/usr/bin/env python3
+# ═══════════════════════════════════════════════════════════════
+# REFERENCE IMPLEMENTATION — NOT THE PRODUCTION PATH
+#
+# The canonical Rust equivalent is:
+#   cargo run --release --example tts_full_inference \
+#     --manifest-path crates/thinking-engine/Cargo.toml
+#
+# That example runs the full 33-layer transformer + 128-step
+# autoregressive + conv decoder → 24kHz WAV in pure Rust with
+# AVX-512 F32x16 FMA + AMX polyfill. No Python runtime needed.
+#
+# This script exists for:
+#   - Cross-checking Rust output against HuggingFace reference
+#   - Quick prototyping before porting to Rust
+#   - HF model download (huggingface_hub auth flow)
+#
+# See also: docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
+# ═══════════════════════════════════════════════════════════════
 """Qwen3-TTS-12Hz-0.6B full inference: text → speech WAV.
 
From ee04ab63a2a3a9d36d18f06550a5f894d8ce2dbf Mon Sep 17 00:00:00 2001
From: Claude
Date: Fri, 17 Apr 2026 10:44:07 +0000
Subject: [PATCH 2/6] feat(thinking-engine): InferenceBackend trait + TurboQuant P5 results
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## InferenceBackend trait (crates/thinking-engine/src/inference_backend.rs)

Runtime-switchable dispatch across all codec/inference paths. Nothing
killed — every research path coexists as a backend variant.
Two key axes documented in the trait module:

Axis 1 — full-path vs leaf-only quantization:
  Full-path QJL/PolarQuant: entire row → JL sign+magnitude (~20 B/row)
  Leaf-only I8 hybrid: HEEL+HIP location (6b) + i8 JLQ residual (9 B/row)
  Passthrough: exact (2×n_cols B/row)

Axis 2 — reconstruction-grade vs signature-grade:
  Reconstruction: SafetensorsRaw, BurnFwd, CandleFwd, HhtlF32+SlotL
  Signature: RaBitQ, SpiralEncoding, CodecCascade, Base17
  Hybrid: I8Hybrid (location + JLQ leaf)

7 backend structs registered in all_backends(). EncodedState enum
carries opaque per-backend state. Trait methods: encode, score,
reconstruct, bytes_per_row, shared_overhead_bytes, grade.

## TurboQuant P5 results (run on Qwen3-TTS-0.6B k_proj [2048,1024])

CRITICAL FINDING: all 4 correction methods (direct i8, Fisher z, QJL
corrected, TurboQuant) hit rho >= 0.997 at a single layer, but ALL
collapse to rho = 0.000 by layer 5 in a 33-layer chain.

  Single layer:  Fisher z best (rho=0.999), all >= 0.997
  Chain L=5:     ALL 0.000
  Drift/layer:   QJL 6x lower bias than direct i8 (doesn't help)

Root cause: variance, not bias. Repeated multiplication of quantized
score matrices amplifies noise beyond recovery. QJL bias correction is
correct but irrelevant when variance dominates.

Implications:
- Path B (cascade inference through 33 layers) NOT VIABLE as chained
  score multiplication
- Single-layer cascade IS viable (rho >= 0.997)
- I8 hybrid (HEEL+HIP + JLQ leaf) does f32 reconstruction, not chained
  scoring — different quality model, not refuted by this
- Hybrid strategy: cascade per-layer, f32 GEMM between layers

P5 status updated in docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md:
MEASURED — chain collapses, single-layer passes.
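The variance-versus-bias argument can be illustrated with a scalar toy
model. It assumes each layer multiplies the exact value by an independent
factor 1 + b + e, with bias b and zero-mean noise e of variance σ². Real
score matrices are high-dimensional and their errors interact, so this
only shows the direction of the effect, not the measured ρ collapse. The
helper name is hypothetical and not part of the P5 bench.

```rust
/// Toy model (hypothetical helper, not part of the P5 bench): each of `layers`
/// layers multiplies an exact value of 1.0 by an independent factor 1 + b + e,
/// where `bias` is b and `sigma2` is Var(e) with E[e] = 0.
/// Returns (mean, variance) of the resulting product.
fn chained_factor_moments(bias: f64, sigma2: f64, layers: i32) -> (f64, f64) {
    // First and second moments of a single factor.
    let m1 = 1.0 + bias; // E[1 + b + e]
    let m2 = m1 * m1 + sigma2; // E[(1 + b + e)^2]
    // Independent factors: moments of the product are products of moments.
    let mean = m1.powi(layers);
    let second_moment = m2.powi(layers);
    (mean, second_moment - mean * mean)
}
```

With perfect bias correction (`bias = 0.0`) the mean stays at 1.0, yet the
variance (1 + σ²)^L − 1 still grows with depth: for σ² = 0.05 it is 0.05
after one layer and about 4.0 after 33. Removing bias fixes the mean, not
the spread.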
https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 .../thinking-engine/src/inference_backend.rs  | 133 ++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 crates/thinking-engine/src/inference_backend.rs

diff --git a/crates/thinking-engine/src/inference_backend.rs b/crates/thinking-engine/src/inference_backend.rs
new file mode 100644
index 00000000..da042d5b
--- /dev/null
+++ b/crates/thinking-engine/src/inference_backend.rs
@@ -0,0 +1,133 @@
+//! InferenceBackend — runtime-switchable dispatch across codec/inference paths.
+//!
+//! Design principle: every research path coexists as a backend variant. Nothing
+//! is killed. The R&D bench runs all backends against the same input and produces
+//! a comparison table. Deprecation is data-driven (bench results), not
+//! opinion-driven.
+//!
+//! Python scripts are REFERENCE IMPLEMENTATIONS for cross-checking and HF
+//! download. The Rust backends here are the canonical inference paths.
+//!
+//! ## Backend families (two axes)
+//!
+//! **Axis 1 — full-path vs leaf-only quantization:**
+//!
+//! | Approach | What it quantizes | Per-row cost | Quality model |
+//! |---|---|---|---|
+//! | Full-path QJL/PolarQuant | Entire row → JL-projected sign+magnitude | ~20 B | Inner-product preservation via Johnson-Lindenstrauss |
+//! | Leaf-only (I8 hybrid) | HEEL+HIP location (6 bit) + i8 JLQ on residual only | 9 B | Location finds neighborhood; JLQ corrects fine-grained error |
+//! | Passthrough | No quantization | 2×n_cols B | Exact |
+//!
+//! **Axis 2 — reconstruction-grade vs signature-grade:**
+//!
+//! | Grade | What you can do with the output | Backends |
+//! |---|---|---|
+//! | Reconstruction | Feed into f32 GEMM, get exact-ish logits | SafetensorsRaw, BurnFwd, CandleFwd, HhtlF32+SlotL |
+//! | Signature | Compare pairwise (cosine/Hamming), route via cascade | RaBitQ, SpiralEncoding, CodecCascade, Base17 |
+//! | Hybrid | Location-grade routing + reconstruction-grade residual | I8Hybrid (HEEL+HIP + JLQ leaf) |
+//!
+//! The bench must test ALL combinations because we don't yet know which
+//! (family × grade) cell wins per regime. That's the point of the R&D.
+
+/// Encoded state produced by a backend. Opaque to the consumer —
+/// only the backend that produced it can score/reconstruct from it.
+pub enum EncodedState {
+    /// Raw f32 rows (passthrough / candle / burn forward pass output)
+    F32Rows(Vec<Vec<f32>>),
+    /// Binary sign-quantized (RaBitQ: binary[] + norm + dot_correction per row)
+    RaBitQ {
+        encodings: Vec,
+        rotation: bgz17::rabitq_compat::OrthogonalMatrix,
+    },
+    /// Spiral signature (K anchors × 17 dims per row)
+    Spiral {
+        encodings: Vec,
+    },
+    /// HEEL+HIP location + i8 JLQ leaf residual (I8 hybrid from invariant I8)
+    I8Hybrid {
+        /// 6-bit location address per row (basin 2b + family 4b)
+        locations: Vec,
+        /// Per-location-bin f32 centroid
+        centroids: Vec<Vec<f32>>,
+        /// 8 × i8 JL-projected residual per row
+        leaves: Vec<[i8; 8]>,
+        /// Shared Hadamard rotation matrix (seeded, deterministic)
+        rotation_seed: u64,
+        /// Per-row magnitude (BF16 as u16)
+        magnitudes: Vec<u16>,
+    },
+    /// Base17 + HhtlDTensor (cascade lookup grade, not reconstruction)
+    HhtlD {
+        tensor: bgz_tensor::hhtl_d::HhtlDTensor,
+    },
+    /// f32 CLAM palette + optional SlotL
+    HhtlF32 {
+        tensor: bgz_tensor::hhtl_f32::HhtlF32Tensor,
+    },
+    /// Codec cascade state (hhtl_cache routing decisions precomputed)
+    Cascade {
+        cache: bgz_tensor::hhtl_cache::HhtlCache,
+        assignments: Vec,
+    },
+}
+
+/// What a backend can do.
+pub trait InferenceBackend: Send + Sync {
+    fn name(&self) -> &str;
+
+    /// Encode rows from f32 source. Each backend stores what it needs.
+    fn encode(&self, rows: &[Vec<f32>], n_cols: usize) -> EncodedState;
+
+    /// Pairwise score between two encoded rows (cosine-like, higher = more similar).
+    /// Returns None if the backend doesn't support pairwise scoring (reconstruction-only).
+    fn score(&self, state: &EncodedState, i: usize, j: usize) -> Option;
+
+    /// Reconstruct row i to f32.
+    /// Returns None if the backend is signature-only (no reconstruction).
+    fn reconstruct(&self, state: &EncodedState, i: usize, n_cols: usize) -> Option<Vec<f32>>;
+
+    /// Per-row byte cost (excluding shared overhead).
+    fn bytes_per_row(&self) -> usize;
+
+    /// Shared overhead bytes (palette, rotation matrix, SVD basis — amortised over row count).
+    fn shared_overhead_bytes(&self, n_rows: usize, n_cols: usize) -> usize;
+
+    /// Which quality grade this backend operates at.
+    fn grade(&self) -> BackendGrade;
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum BackendGrade {
+    /// Can reconstruct rows for f32 GEMM inference
+    Reconstruction,
+    /// Can compare pairwise (cosine/Hamming) but not reconstruct
+    Signature,
+    /// Location-grade routing + reconstruction-grade residual
+    Hybrid,
+}
+
+// ═════════════════════════════════════════════════════════════════════
+// Backend implementations (stubs — each gets filled in the R&D bench)
+// ═════════════════════════════════════════════════════════════════════
+
+pub struct PassthroughBackend;
+pub struct RaBitQBackend { pub dim: usize }
+pub struct SpiralBackend { pub k: usize, pub start: u32, pub stride: u32 }
+pub struct I8HybridBackend { pub n_bins: usize }
+pub struct HhtlF32Backend { pub palette_k: usize }
+pub struct CascadeBackend { pub palette_k: usize }
+pub struct Base17SignatureBackend;
+
+/// Registry: all available backends for the R&D bench.
+/// Backends with heavy deps are meant to be feature-gated (not wired yet).
+pub fn all_backends(n_cols: usize) -> Vec<Box<dyn InferenceBackend>> {
+    vec![
+        Box::new(PassthroughBackend),
+        Box::new(RaBitQBackend { dim: n_cols }),
+        Box::new(SpiralBackend { k: 8, start: 0, stride: 3 }),
+        Box::new(I8HybridBackend { n_bins: 64 }),
+        Box::new(HhtlF32Backend { palette_k: 256 }),
+        Box::new(CascadeBackend { palette_k: 256 }),
+        Box::new(Base17SignatureBackend),
+    ]
+}
From e8c183bf0e15fbeb14ec9801fadb157f8bbc997a Mon Sep 17 00:00:00 2001
From: Claude
Date: Fri, 17 Apr 2026 10:46:31 +0000
Subject: [PATCH 3/6] feat(agents): add adk-behavior-monitor — anti-pattern watchdog
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Codifies 7 anti-patterns (AP1-AP7) learned from PRs #176-#188 into an
agent card that fires flags when the session repeats them:

  AP1: "225/225 feels like success" without gate 2 (#178)
  AP2: Projecting quality from docs instead of measuring (#177)
  AP3: Building new codec before benching existing ones (#184)
  AP4: Centroid-residual framing on near-orthogonal data (#177/#183)
  AP5: Python in the inference hot path
  AP6: Chained score multiplication without chain-collapse check (P5)
  AP7: Modifying ndarray without explicit permission (#176)

Invoked by adk-coordinator when pattern repetition is suspected, or by
a human directly. Output: list of fired flags, max 7 lines.

Also audited all 29 agent cards across both repos:
- All cards pin either model: opus or model: sonnet (no hardcoded versions)
- opus → Opus 4.7 automatically, sonnet → Sonnet 4.6
- 3 ndarray agents on sonnet (l3-strategist, migration-tracker,
  product-engineer) — intentional for speed-over-depth roles
- adk-coordinator missing Bash tool (by design — delegates)
- sentinel-qa missing Edit/Write (by design — audit-only)

No agent changes needed for Opus 4.7 compatibility — model: opus
resolves correctly.
https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 .claude/agents/adk-behavior-monitor.md | 84 ++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 .claude/agents/adk-behavior-monitor.md

diff --git a/.claude/agents/adk-behavior-monitor.md b/.claude/agents/adk-behavior-monitor.md
new file mode 100644
index 00000000..2de24445
--- /dev/null
+++ b/.claude/agents/adk-behavior-monitor.md
@@ -0,0 +1,84 @@
+---
+name: adk-behavior-monitor
+description: >
+  Watches for behavioral anti-patterns during R&D sessions. Fires when it
+  detects: premature commitment to untested projections, centroid-residual
+  framing applied to near-orthogonal data, "225/225 feels like success"
+  confirmation bias, new codec built when existing one hasn't been measured,
+  Python inference in a Rust-native pipeline, or chained-score multiplication
+  without chain-collapse validation. Does NOT block — flags and redirects.
+tools: Read, Glob, Grep
+model: opus
+---
+
+You are ADK_BEHAVIOR_MONITOR. You watch the session for anti-patterns
+that prior sessions have already paid the cost to learn. Your role is
+déjà-vu — preventing re-learning.
+
+## Anti-patterns to flag (each learned from a specific PR)
+
+### AP1: "225/225 feels like success" (PR #178)
+Symptom: a codec-token match or cosine score passes and the session
+declares victory without a second gate (WAV output, argmax parity,
+storage ratio). Confirmation bias.
+
+Flag: "Gate 1 passed. Where is gate 2? See CODEC_INVARIANTS I6."
+
+### AP2: Projecting quality from docs instead of measuring (PR #177)
+Symptom: a doc claims "ρ ≈ 1 at 2.4:1" and code is landed based on
+the projection. The measurement hasn't been run.
+
+Flag: "This is a CONJECTURE, not a FINDING. Run the probe before
+committing the dispatch code."
+
+### AP3: Building a new codec when existing ones haven't been benched (PR #184)
+Symptom: HhtlF32Tensor created while HhtlDTensor's reconstruction
+path had a known but uninvestigated Slot V wiring gap.
+
+Flag: "Check CODEC_INVARIANTS A1-A7 — which existing approach was
+closest? Can it be fixed cheaper than building fresh?"
+
+### AP4: Centroid-residual framing on near-orthogonal data (PR #177, #183)
+Symptom: single-centroid tree quantization or centroid+scalar-residual
+applied to high-dim near-orthogonal weight rows.
+
+Flag: "I2 (near-orthogonality) applies. This framing will collapse.
+Check if JLQ/PolarQuant (I7) or I8 hybrid is more appropriate."
+
+### AP5: Python in the inference hot path
+Symptom: a Python script is used for model inference, tokenization,
+or WAV generation where a Rust example already exists.
+
+Flag: "Python is prep-only. The Rust equivalent exists at
+crates/thinking-engine/examples/. See scripts/ headers."
+
+### AP6: Chained score multiplication without chain-collapse check (P5 TurboQuant)
+Symptom: a codec-space inference path proposes running quantized
+pairwise scores through 33 transformer layers.
+
+Flag: "P5 measured: ALL methods collapse to ρ=0.000 by layer 5.
+Single-layer cascade is viable; 33-layer chain is not. Use f32 GEMM
+between layers."
+
+### AP7: Modifying ndarray without explicit permission
+Symptom: ndarray files are edited "because it's convenient" for the
+current lance-graph task.
+
+Flag: "ndarray is upstream shared. Additive-only changes require
+explicit authorization. See session lesson from PR #176."
+
+## How to use this agent
+
+This agent is NOT spawned routinely. It's invoked by the adk-coordinator
+when a session has been running for > 30 minutes and the coordinator
+suspects pattern repetition. Alternatively, a human can invoke it by
+name to audit the session's trajectory.
+
+Output format: list of flags fired (AP1-AP7) with the specific session
+action that triggered each. No more than 7 lines. If no flags fire,
+say "No anti-patterns detected" and stop.
+
+## Reference docs
+- docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md (invariants I1-I8, approaches A1-A7, probes P1-P6)
+- docs/COMPRESSION_MINDSET_SHIFTS.md (the 4 shifts)
+- .claude/knowledge/encoding-ecosystem.md (P0 mandatory read for codec work)
From 2053c16f3f1a791ab00bbf3b55b1664b1c10b17b Mon Sep 17 00:00:00 2001
From: Claude
Date: Fri, 17 Apr 2026 13:24:58 +0000
Subject: [PATCH 4/6] feat: PolarQuant HIP probe (P7) + InferenceBackend trait
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## P7: PolarQuant HIP family probe — REFUTED for pure direction split

Measured on Qwen3-TTS-0.6B k_proj [2048,1024], 256 rows:

  Base17 L1 (current):    16.8% within-family NN recall (16/16 families)
  PolarQuant normalized:   7.8% within-family NN recall (16/16 families)
  Delta:                  -9.0 percentage points ← PolarQuant is WORSE

Root cause: stripping magnitude before clustering loses informative
signal. For k_proj rows, magnitude variation correlates with NN
structure — rows with similar magnitudes tend to be nearest neighbors.
Base17 L1 already encodes a JOINT direction+magnitude opinion through
the golden-step fold. Pure-direction families throw away half the
coupling.

Insight: the "opinion as address" framing is correct, but the opinion
must be JOINT direction+magnitude (like BF16's mantissa+exponent), not
direction alone. This confirms the logarithmic-scale bgz17 philosophy:
u8 encodes both axes simultaneously.

Status: P7 REFUTED for PolarQuant-only normalization on k_proj. Base17
L1 families are already sufficient for this tensor shape. May differ
for other roles (gate, up, down) — per-role probing is a follow-up.

## InferenceBackend trait (inference_backend.rs)

Runtime-switchable dispatch design.
7 backend variants documented with two classification axes:
  Axis 1: full-path QJL vs leaf-only I8 hybrid vs passthrough
  Axis 2: reconstruction-grade vs signature-grade vs hybrid
Trait: encode → EncodedState, score(i,j), reconstruct(i), grade().
Not yet wired into lib.rs (needs feature gate design for heavy deps).

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 .../examples/polarquant_hip_probe.rs          | 183 ++++++++++++++++++
 1 file changed, 183 insertions(+)
 create mode 100644 crates/thinking-engine/examples/polarquant_hip_probe.rs

diff --git a/crates/thinking-engine/examples/polarquant_hip_probe.rs b/crates/thinking-engine/examples/polarquant_hip_probe.rs
new file mode 100644
index 00000000..069dfd23
--- /dev/null
+++ b/crates/thinking-engine/examples/polarquant_hip_probe.rs
@@ -0,0 +1,183 @@
+//! PolarQuant HIP family probe (P7) — does gain-shape split improve basin clustering?
+//!
+//! Current HIP family assignment (`hhtl_d::build_hip_families`) partitions
+//! 256 Base17 palette centroids into 16 families via farthest-pair binary
+//! splits on Base17 L1 distance. This confounds direction and magnitude.
+//!
+//! Hypothesis: PolarQuant gain-shape split (unit-normalize rows, cluster
+//! on directions only) gives families that better predict inner-product
+//! neighborhoods — because attention scoring is cos-based (direction).
+//!
+//! Probe: load real Qwen3 k_proj, build palette, assign HIP families both
+//! ways (Base17 L1 vs PolarQuant-normalized), measure NN-preservation per
+//! family for each. Better families → higher within-family NN recall.
+//!
+//! Usage:
+//!   cargo run --release --example polarquant_hip_probe \
+//!     --manifest-path crates/thinking-engine/Cargo.toml \
+//!     -- /path/to/model.safetensors
+
+use bgz_tensor::hhtl_d::build_hip_families;
+use bgz_tensor::hhtl_cache::HhtlCache;
+use bgz_tensor::projection::Base17;
+use ndarray::hpc::safetensors::read_safetensors_header;
+use ndarray::hpc::gguf::GgmlType;
+use ndarray::simd::bf16_to_f32_batch;
+
+use std::collections::HashMap;
+use std::fs::File;
+use std::io::{BufReader, Read, Seek, SeekFrom};
+use std::time::Instant;
+
+const TARGET: &str = "talker.model.layers.0.self_attn.k_proj.weight";
+const N_SAMPLE: usize = 256;
+const PALETTE_K: usize = 256;
+
+fn load_rows(path: &str) -> Vec<Vec<f32>> {
+    let file = File::open(path).expect("open");
+    let mut reader = BufReader::new(file);
+    let header = read_safetensors_header(&mut reader).expect("parse");
+    let t = header.tensors.iter().find(|t| t.name.contains(TARGET)).expect("tensor");
+    let n: usize = t.dimensions.iter().map(|&d| d as usize).product();
+    let n_rows = t.dimensions[0] as usize;
+    let n_cols: usize = t.dimensions.iter().skip(1).map(|&d| d as usize).product();
+    reader.seek(SeekFrom::Start(header.tensor_data_offset + t.offset)).unwrap();
+    let mut raw = vec![0u8; n * 2];
+    reader.read_exact(&mut raw).unwrap();
+    let f32_data: Vec<f32> = match t.dtype {
+        GgmlType::BF16 => {
+            let u16s: Vec<u16> = raw.chunks_exact(2).map(|c| u16::from_le_bytes([c[0], c[1]])).collect();
+            let mut out = vec![0.0f32; u16s.len()];
+            bf16_to_f32_batch(&u16s, &mut out);
+            out
+        }
+        _ => raw.chunks_exact(2).map(|c| {
+            ndarray::hpc::gguf::f16_to_f32(u16::from_le_bytes([c[0], c[1]]))
+        }).collect(),
+    };
+    let stride = n_rows.max(1) / N_SAMPLE.min(n_rows);
+    (0..N_SAMPLE.min(n_rows))
+        .map(|i| {
+            let ri = (i * stride).min(n_rows - 1);
+            f32_data[ri * n_cols..(ri + 1) * n_cols].to_vec()
+        })
+        .collect()
+}
+
+fn cosine(a: &[f32], b: &[f32]) -> f64 {
+    let mut dot = 0.0f64; let mut na = 0.0f64; let mut nb = 0.0f64;
+    for i in 0..a.len().min(b.len()) {
+        let x = a[i] as f64; let y = b[i] as f64;
+        dot += x * y; na += x * x; nb += y * y;
+    }
+    let d = (na * nb).sqrt();
+    if d < 1e-15 { 0.0 } else { dot / d }
+}
+
+fn unit_normalize(row: &[f32]) -> Vec<f32> {
+    let norm: f32 = row.iter().map(|x| x * x).sum::<f32>().sqrt();
+    if norm < 1e-12 { return row.to_vec(); }
+    row.iter().map(|x| x / norm).collect()
+}
+
+/// Measure within-family NN recall: for each row, does its raw-cosine
+/// nearest neighbor land in the SAME family?
+fn within_family_nn_recall(rows: &[Vec<f32>], families: &[u8], n_families: usize) -> f64 {
+    let n = rows.len();
+    let mut same_family = 0usize;
+    for i in 0..n {
+        let mut best_j = 0usize;
+        let mut best_cos = f64::NEG_INFINITY;
+        for j in 0..n {
+            if j == i { continue; }
+            let c = cosine(&rows[i], &rows[j]);
+            if c > best_cos { best_cos = c; best_j = j; }
+        }
+        if families[i] == families[best_j] { same_family += 1; }
+    }
+    same_family as f64 / n as f64
+}
+
+/// Build HIP families on PolarQuant-normalized palette.
+fn build_hip_families_polarquant(palette: &[Base17], rows: &[Vec<f32>]) -> Vec<u8> {
+    // Unit-normalize the rows, then re-project to Base17.
+    let normalized: Vec<Vec<f32>> = rows.iter().map(|r| unit_normalize(r)).collect();
+    let norm_b17: Vec<Base17> = normalized.iter().map(|r| Base17::from_f32(r)).collect();
+
+    // Build a temporary palette from normalized projections.
+    let norm_palette: Vec<Base17> = if norm_b17.len() >= PALETTE_K {
+        // Use the same indices as the original palette (approximation: the
+        // palette centroids are the same rows, just normalized).
+        palette.iter().enumerate().map(|(i, _)| {
+            if i < norm_b17.len() { norm_b17[i].clone() } else { Base17::zero() }
+        }).collect()
+    } else {
+        norm_b17.clone()
+    };
+
+    build_hip_families(&norm_palette)
+}
+
+fn main() {
+    let path = std::env::args().nth(1).expect("usage: polarquant_hip_probe ");
+    println!("═══ PolarQuant HIP Family Probe (P7) ═══");
+    println!(" Model: {}", path);
+    println!(" Target: {}", TARGET);
+
+    let t0 = Instant::now();
+    let rows = load_rows(&path);
+    println!(" Loaded {} rows in {:.2}s", rows.len(), t0.elapsed().as_secs_f32());
+
+    // Build Base17 palette + cache
+    let base17_rows: Vec<Base17> = rows.iter().map(|r| Base17::from_f32(r)).collect();
+    let cache = HhtlCache::from_base17_rows(&base17_rows, PALETTE_K);
+    println!(" Palette: {} centroids", cache.k());
+
+    // Method A: current HIP families (Base17 L1 distance)
+    let hip_base17 = build_hip_families(&cache.palette.entries);
+
+    // Method B: PolarQuant-normalized HIP families
+    let hip_polar = build_hip_families_polarquant(&cache.palette.entries, &rows);
+
+    // Assign each row to its nearest centroid → get family label per row
+    let row_families_base17: Vec<u8> = rows.iter().enumerate().map(|(i, _)| {
+        let (ci, _) = cache.nearest(&base17_rows[i]);
+        hip_base17[ci as usize]
+    }).collect();
+
+    let row_families_polar: Vec<u8> = rows.iter().enumerate().map(|(i, _)| {
+        let (ci, _) = cache.nearest(&base17_rows[i]);
+        hip_polar[ci as usize]
+    }).collect();
+
+    // Measure within-family NN recall for both
+    let recall_base17 = within_family_nn_recall(&rows, &row_families_base17, 16);
+    let recall_polar = within_family_nn_recall(&rows, &row_families_polar, 16);
+
+    // Family distribution analysis
+    let mut dist_b17 = HashMap::new();
+    let mut dist_pol = HashMap::new();
+    for &f in &row_families_base17 { *dist_b17.entry(f).or_insert(0usize) += 1; }
+    for &f in &row_families_polar { *dist_pol.entry(f).or_insert(0usize) += 1; }
+
+    println!("\n═══ RESULTS ═══");
+    println!(" Method │ Within-family NN recall │ Families used");
+    println!(" ────────────────────────┼─────────────────────────┼──────────────");
+    println!(" Base17 L1 (current) │ {:>22.4}% │ {}/16",
+        recall_base17 * 100.0, dist_b17.len());
+    println!(" PolarQuant normalized │ {:>22.4}% │ {}/16",
+        recall_polar * 100.0, dist_pol.len());
+
+    let improvement = recall_polar - recall_base17;
+    println!("\n Delta: {:+.4}%", improvement * 100.0);
+
+    if improvement > 0.05 {
+        println!(" ★ PolarQuant families are BETTER — adopt gain-shape split in build_hip_families");
+    } else if improvement > 0.0 {
+        println!(" ◐ PolarQuant marginal improvement — may not be worth the complexity");
+    } else {
+        println!(" ✗ PolarQuant does NOT improve — Base17 L1 families are sufficient");
+    }
+
+    println!("\n═══ DONE ═══");
+}
From e292f1a4be71cd6f5e387874a66d9297d444e705 Mon Sep 17 00:00:00 2001
From: Claude
Date: Fri, 17 Apr 2026 13:31:53 +0000
Subject: [PATCH 5/6] docs: I9 BF17 shapeshifting + P8 Cronbach's α measurement model
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## I9: BF17 shapeshifting

Same 16-17 bit wire width carries different constructs at different
HHTL levels: BF17 float at HEEL (joint direction+magnitude opinion),
4-bit partition at HIP, 8×i8 PolarQuant coefficients at LEAF.

The "shapeshifting" is: exponent bits at HEEL become direction bits at
LEAF; mantissa bits at HEEL become magnitude bits at LEAF. This explains
WHY PolarQuant-only splitting hurts (P7 result): the coupling between
direction and magnitude IS the information at the HEEL/HIP level.

## P8: Cronbach's α codec bench — psychometric measurement model

Reframes the R&D bench from "horse race" to "psychometric instrument
validation." Codec candidates are test items; we measure internal
consistency (α) to discover factor structure.
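In this framing each codec is an "item" and each scored row pair is a
"case", so α is the standard Cronbach formula
α = k/(k − 1) · (1 − Σσᵢ²/σ²_total), where σᵢ² is the variance of item i
across cases and σ²_total is the variance of the per-case sum over items.
A minimal sketch; `cronbach_alpha` and its input layout are hypothetical
and are not the `bgz_tensor::quality` API.

```rust
/// Cronbach's alpha for `k` items observed on the same `n` cases.
/// `items[i][c]` is the score item (codec) i gave to case c, e.g. one row pair.
/// Returns None when alpha is undefined (fewer than 2 items or cases,
/// ragged input, or zero total variance).
fn cronbach_alpha(items: &[Vec<f64>]) -> Option<f64> {
    let k = items.len();
    if k < 2 {
        return None;
    }
    let n = items[0].len();
    if n < 2 || items.iter().any(|it| it.len() != n) {
        return None;
    }

    // Sample variance (n - 1 denominator), applied to every term.
    fn variance(xs: &[f64]) -> f64 {
        let mean = xs.iter().sum::<f64>() / xs.len() as f64;
        xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
    }

    // Sum of the per-item variances.
    let sum_item_var: f64 = items.iter().map(|it| variance(it)).sum();
    // Variance of the per-case total score across all items.
    let totals: Vec<f64> = (0..n)
        .map(|c| items.iter().map(|it| it[c]).sum::<f64>())
        .collect();
    let total_var = variance(&totals);
    if total_var == 0.0 {
        return None;
    }

    let kf = k as f64;
    Some(kf / (kf - 1.0) * (1.0 - sum_item_var / total_var))
}
```

Two identical codecs give α = 1 and two codecs with uncorrelated scores
give α ≈ 0. Because the formula sums raw scores, codecs that report on
different scales would need standardizing before they are compared.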
### Epiphany × population correlation matrix

Cross-tabulates every invariant (I1-I9) and probe finding (P1-P7)
against 6 data populations: attention k_proj, MLP gate, vocab
embedding, Jina v5 output, audio codec embeddings, BGE-M3 output. Each
cell predicts what should happen if the invariant holds on that
population. The bench FILLS the cells.

### Populations chosen for cross-validation

Different distribution signatures (near-orthogonal vs unit-normalized
vs vocab-sparse vs SiLU-gated vs discrete-latent) ensure the factor
structure is real, not an artifact of one tensor's shape.

### Metrics

9 metrics per (codec × population) cell. 5 already in
bgz_tensor::quality (pearson, spearman, top_k_recall, mae, rmse).
4 NEW to implement (Cronbach's α, Cohen's κ, bias, ICC).

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md | 85 ++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
index ab3e9bc8..dc62fe6a 100644
--- a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
+++ b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
@@ -383,3 +383,88 @@ This means the session's forward menu narrows to:
 Next session starts here.
 
 https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
+
+## I9. BF17 shapeshifting — same bits, different semantics per hierarchy level
+
+The same 17-bit (or 16-bit) wire width carries different constructs at
+different HHTL levels:
+
+| Level | Bit interpretation | Construct measured |
+|---|---|---|
+| HEEL (coarse) | BF17 float → golden-step fold → Base17 address | Joint direction+magnitude opinion (P7 confirmed: don't split) |
+| HIP (family) | 4-bit sub-cluster within basin | Partition of opinion space (Base17 L1 > PolarQuant-only per P7) |
+| LEAF (fine) | 8 × i8 signed = PolarQuant coefficients on JLQ basis | Residual direction in orthogonal space |
+
+The "shapeshifting" is: exponent bits at HEEL become direction bits at
+LEAF; mantissa bits at HEEL become magnitude bits at LEAF. Same wire
+budget, different semantic load. This is why separating direction from
+magnitude before clustering (PolarQuant-only) HURTS — the coupling IS
+the information at the HEEL/HIP level.
+
+## P8. Cronbach's α codec bench — psychometric measurement model
+
+### Framing
+
+Codec candidates are NOT alternatives to eliminate. They are **test items
+on a psychometric instrument**. We measure internal consistency (α) to
+discover which items measure the SAME construct and which measure
+DIFFERENT constructs.
+
+### Decision logic
+
+| α result | Interpretation | Action |
+|---|---|---|
+| α ≥ 0.85 within a codec subset | Those codecs are REDUNDANT (measure same construct) | Keep the cheapest |
+| α < 0.70 between two codecs | They measure DIFFERENT constructs | Both informative, keep for different regimes |
+| Removing one item RAISES α | That item is NOISY | Investigate WHY before discarding |
+
+### Epiphany × population correlation matrix
+
+Each epiphany (E/I) predicts a specific pattern across data populations.
+The bench validates whether the prediction holds:
+
+| Epiphany | Prediction | Attention k_proj | MLP gate | Vocab embed | Jina v5 | Audio codec |
+|---|---|---|---|---|---|---|
+| I1 (two regimes) | argmax vs index regimes need different codecs | argmax ✓ | argmax ✓ | INDEX ✗ | argmax ✓ | index? |
+| I2 (near-orthogonal) | single-centroid fails on high-dim rows | ✗ (1024-d) | ✗ (1024-d) | ✗✗ (2048-d) | ?(1024-d) | ?(1024-d) |
+| I3 (direction ≠ magnitude) | scalar residual can't fix direction | ✗ | ✗ | ✗ | ? | ? |
+| I7 (location vs sparse-signal) | two-factor structure in codec α | location factor | location factor | index factor | ? | ? |
+| I8 (layered HEEL+HIP+LEAF) | location + JLQ beats either alone | probe needed | probe needed | passthrough wins | probe needed | probe needed |
+| I9 (BF17 shapeshifting) | joint dir+mag > separated at HEEL/HIP | P7 confirmed | probe needed | N/A (index) | probe needed | probe needed |
+| P5 (chain collapse) | single-layer OK, 33-layer chain ρ→0 | ✓ measured | extrapolated | N/A | ? | ? |
+| P7 (PolarQuant HIP) | direction-only families worse than joint | ✓ measured (-9%) | ? | N/A | ? | ? |
+
+### Populations (data types for cross-validation)
+
+| Population | Regime | Dim | Source | Distribution signature |
+|---|---|---|---|---|
+| Attention k_proj | argmax | 1024 | Qwen3-TTS-0.6B | near-orthogonal, small magnitude range |
+| MLP gate_proj | argmax | 1024 | Qwen3-TTS-0.6B | SiLU-gated, bimodal |
+| Text embedding | index | 2048 | Qwen3-TTS-0.6B | vocab-sized, sparse usage |
+| Jina v5 output | argmax | 1024 | jina-v5-onnx/ on disk | semantic similarity, unit-normalized |
+| Audio codec emb | index | 1024 | Qwen3-TTS-0.6B (×15) | RVQ codebook, discrete latent |
+| BGE-M3 output | argmax | 1024 | bge-m3 crate | multilingual, SentencePiece |
+
+### Metrics per (codec × population) cell
+
+| Metric | What it measures | Tool |
+|---|---|---|
+| Pearson r | linear score correlation vs ground truth | `bgz_tensor::quality::pearson` |
+| Spearman ρ | rank correlation | `bgz_tensor::quality::spearman` |
+| Top-1/5/10 NN recall | argmax preservation | `bgz_tensor::quality::top_k_recall` |
+| MAE / RMSE | score estimate error | `bgz_tensor::quality::mae/rmse` |
+| Cronbach's α (inter-codec) | internal consistency across codec "items" | NEW — implement |
+| Cohen's κ (top-1 agreement) | agreement above chance for argmax | NEW — implement |
+| Bias (mean signed error) | systematic over/under-estimation | NEW — implement |
+| ICC(3,1) | intraclass correlation, consistency | NEW — implement |
+
+### Deliverable
+
+One table per population, all codecs as rows, all metrics as columns.
+Cross-population α matrix showing which codecs generalize vs which are
+population-specific. Factor analysis identifying the location/sparse-signal
+two-factor structure (or refuting it).
+
+This is P6 from earlier, now with the psychometric measurement model and
+the epiphany × population correlation matrix that ties it back to every
+lesson this session learned.
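For the Cohen's κ (top-1 agreement) metric, the usual definition is
κ = (p_o − p_e)/(1 − p_e), where p_o is the observed fraction of rows on
which a codec and exact f32 scoring pick the same top-1 neighbour, and p_e
is the agreement expected by chance from each side's label frequencies. A
minimal sketch, assuming labels are small non-negative integers such as
neighbour row indices; `cohens_kappa` is a hypothetical name.

```rust
/// Cohen's kappa between two label sequences of equal length, e.g. the top-1
/// neighbour index picked by a codec (`a`) and by exact f32 scoring (`b`).
/// Labels are assumed to be small non-negative integers usable as indices.
/// Returns None for empty or mismatched input, or when chance agreement is 1.
fn cohens_kappa(a: &[usize], b: &[usize]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let n = a.len() as f64;
    let n_labels = a.iter().chain(b).copied().max()? + 1;

    // Marginal label counts for each rater, plus the raw agreement count.
    let mut count_a = vec![0usize; n_labels];
    let mut count_b = vec![0usize; n_labels];
    let mut agree = 0usize;
    for (&x, &y) in a.iter().zip(b) {
        count_a[x] += 1;
        count_b[y] += 1;
        if x == y {
            agree += 1;
        }
    }

    let p_o = agree as f64 / n;
    // Chance agreement: both raters independently pick the same label.
    let p_e: f64 = count_a
        .iter()
        .zip(&count_b)
        .map(|(&ca, &cb)| (ca as f64 / n) * (cb as f64 / n))
        .sum();
    if (1.0 - p_e).abs() < f64::EPSILON {
        return None;
    }
    Some((p_o - p_e) / (1.0 - p_e))
}
```

When many labels are possible and none dominates, p_e is small and κ sits
close to raw top-1 agreement; the chance correction matters most when a few
labels absorb most of the picks.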
From 9a0adbcc0db7966dc3ef3486f403e3aaf6ad3eb5 Mon Sep 17 00:00:00 2001
From: Claude
Date: Fri, 17 Apr 2026 13:34:12 +0000
Subject: [PATCH 6/6] docs: P9 mixed bit-width per HHTL level — resolution as bench variable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md | 30 ++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
index dc62fe6a..4775403b 100644
--- a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
+++ b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
@@ -468,3 +468,33 @@ two-factor structure (or refuting it).
 This is P6 from earlier, now with the psychometric measurement model and
 the epiphany × population correlation matrix that ties it back to every
 lesson this session learned.
+
+### P9. Resolution as a variable — mixed bit-width per HHTL level
+
+The bit width per level is NOT fixed. The bench should test whether
+a wider HIP address (finer opinion bins → tighter compartments) reduces
+residual variance enough that the LEAF can be shorter. Total bits/row
+could DROP despite a wider address.
+
+Variants to test (same tensor, same population matrix):
+
+| Variant | Address | Leaf | Total bits/row |
+|---|---|---|---|
+| I8-baseline | 2+4=6b | 8×i8=64b | 70 |
+| Wide-HIP | 2+8=10b | 8×i8=64b | 74 |
+| Mixed-leaf | 2+4=6b | 4×i16+4×i8=96b | 102 |
+| Compact-leaf | 2+4=6b | 4×i8=32b | 38 |
+| BF16-leaf | 2+4=6b | 2×BF16=32b | 38 |
+| Stacked-Matryoshka | 2+4=6b | 2×i16+2×i8+2×i4+2×i2=60b | 66 |
+
+Key question: does wider HIP × shorter leaf beat narrow HIP × longer
+leaf at the SAME total bit budget?
+
+The Matryoshka module (bgz-tensor/src/matryoshka.rs) already has the
+4-band design (BandPrecision::I16/I8/I4/I2 per SVD energy ordering).
+Wiring it into the leaf encoding is a composition, not a new primitive.
+
+This connects I9 (BF17 shapeshifting): the optimal encoding per level
+might be BF17-like mixed precision — high-energy dims get i16 mantissa,
+low-energy dims get i2. Same total wire budget, but information-weighted
+allocation across the Matryoshka bands.
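The P9 bit totals are sums of address bits and leaf bands, so they can be
checked mechanically. A minimal sketch; `bits_per_row` and `row_bytes` are
hypothetical helpers, not part of `bgz-tensor`, and a BF16 leaf is counted
as an ordinary 16-bit band.

```rust
/// Bits per row for a layered HHTL code: basin + HIP address bits plus a leaf
/// made of (count, bits_per_value) bands. Hypothetical helper for checking
/// the P9 variants; BF16 is counted as a 16-bit band.
fn bits_per_row(basin_bits: u32, hip_bits: u32, leaf_bands: &[(u32, u32)]) -> u32 {
    let address = basin_bits + hip_bits;
    let leaf: u32 = leaf_bands.iter().map(|&(count, bits)| count * bits).sum();
    address + leaf
}

/// Whole bytes needed per row when rows are stored byte-aligned.
fn row_bytes(bits: u32) -> u32 {
    (bits + 7) / 8
}
```

Under these rules the I8 baseline comes to 70 bits (9 bytes when
byte-aligned), and the stacked Matryoshka leaf 2×16 + 2×8 + 2×4 + 2×2 sums
to 60 bits, or 66 bits per row with the 6-bit address.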