feat(inference): CBAR-PIECE-5 PR-2 — GGUF metadata loader populates QwenModelMetadata#1333

Merged
joelteply merged 2 commits into canary from feat/cbar-piece5-pr2-gguf-metadata-populator
May 16, 2026

@joelteply

Summary

Stacks on #1331 (CBAR-PIECE-5 PR-1, residency gate types). PR-1 defined the typed gate; this PR-2 reads a real GGUF file and produces the QwenModelMetadata the gate consumes. PR-3 wires both probe + this loader into the turn dispatcher with enforcement.

Same pure-functions cadence as PR-1: file I/O lives in a thin wrapper, and all parsing logic lives in helpers that are unit-testable without GGUF fixtures.

What ships in inference_capability/gguf_loader.rs

  • pub fn read_qwen_model_metadata(path: &Path) -> Result<QwenModelMetadata, String> — thin file-opener using backends::gguf_file::Content already in the crate. No new dependencies.
  • pub(crate) fn file_type_to_bytes_per_param(ft: u32) -> Result<f64, String> — maps GGUF general.file_type enum to bytes-per-weight. Covers the full shipped quantization set (Q4_0/Q4_1/Q4_K_S/Q4_K_M/Q5_0/Q5_1/Q5_K_S/Q5_K_M/Q6_K/Q8_0, IQ-series sub-2-bit, F16/F32/BF16). Unknown ft returns Err with the value named.
  • pub(crate) fn layer_kinds_for_architecture(arch: &str) -> Vec<String> — lookup table for Vulkan-llama.cpp gaps: qwen3moe → [moe_gate, sliding_window_attn], qwen3 → [sliding_window_attn], everything else → []. Pinned by a dedicated test so renames must land in both the table + residency.rs's matching test simultaneously.
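The file-type mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in gguf_loader.rs: it covers only a subset of file types, the bits-per-weight figures for the K-quants are approximations assumed for the example, and the error wording is hypothetical.

```rust
/// Hypothetical sketch: map a GGUF `general.file_type` value to approximate
/// bytes per weight. Only a subset of the file types is shown.
pub fn file_type_to_bytes_per_param(ft: u32) -> Result<f64, String> {
    // Bits per weight; the K-quant figures are approximations assumed here.
    let bits_per_weight = match ft {
        0 => 32.0,      // F32
        1 | 32 => 16.0, // F16 and BF16
        2 => 4.5,       // Q4_0: 18 bytes per block of 32 weights
        7 => 8.5,       // Q8_0: 34 bytes per block of 32 weights
        15 => 4.85,     // Q4_K_M (approximate, mixed quantization)
        17 => 5.69,     // Q5_K_M (approximate, mixed quantization)
        18 => 6.5625,   // Q6_K: 210 bytes per block of 256 weights
        4 | 5 | 6 => {
            // Removed file types are rejected rather than guessed.
            return Err(format!("general.file_type {ft} is a removed quantization"));
        }
        other => return Err(format!("unknown general.file_type {other}")),
    };
    Ok(bits_per_weight / 8.0)
}
```

Returning `Err` with the offending value in the message keeps the no-silent-default posture: a caller cannot end up with a guessed quantization feeding the VRAM estimate.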

Failure-mode discipline

| Field | Required? | Action when missing |
| --- | --- | --- |
| general.architecture | required | Err with named path |
| {arch}.block_count | required | Err with named path |
| general.file_type | required | Err with named path |
| general.parameter_count | optional | derive from file_size / bytes_per_param (loud fallback, documented in field comment) |
| general.name | optional | fall back to file stem (display only) |

Same posture as backends::read_gguf_metadata (Joel's 2026-04-23 fix removed all silent-llama-fallback paths there).
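The required versus optional split in the table might look like the sketch below. The `MetaValue` enum and both helper names are hypothetical stand-ins for illustration; the real loader reads from backends::gguf_file::Content, whose API is not shown in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for parsed GGUF key/value metadata.
#[derive(Debug, Clone, PartialEq)]
pub enum MetaValue {
    Str(String),
    U32(u32),
    U64(u64),
}

/// Required string key: a missing or mistyped key is an error naming the key.
pub fn require_str<'a>(
    meta: &'a HashMap<String, MetaValue>,
    key: &str,
) -> Result<&'a str, String> {
    match meta.get(key) {
        Some(MetaValue::Str(s)) => Ok(s.as_str()),
        Some(other) => Err(format!("GGUF key `{key}` has unexpected type: {other:?}")),
        None => Err(format!("GGUF key `{key}` is missing")),
    }
}

/// Optional parameter count: use the metadata value when present, otherwise
/// derive an approximate count from the file size. The returned bool reports
/// whether the fallback was used, so the caller can log it loudly.
pub fn parameter_count(
    meta: &HashMap<String, MetaValue>,
    file_size_bytes: u64,
    bytes_per_param: f64,
) -> (u64, bool) {
    match meta.get("general.parameter_count") {
        Some(MetaValue::U64(n)) => (*n, false),
        _ => ((file_size_bytes as f64 / bytes_per_param) as u64, true),
    }
}
```

The derived count is only an estimate, because a GGUF file also contains metadata and may mix tensor types.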

Test plan

15 passing on cargo test --lib --features metal,accelerate inference_capability::gguf_loader::

  • file_type_to_bytes_per_param (8):
    • workhorse quants (0, 1, 2, 7, 8, 14, 15, 17, 18, 32) all have entries
    • Q4_K_M in 0.55-0.65 bytes/param band
    • FP16 = 2.0 exactly
    • F32 = 4.0 exactly
    • unknown ft (9999) returns Err with value named
    • removed ft (4, 5, 6) return Err
    • quantization ordering monotone (Q4_K_M < Q5_K_M < Q6_K < Q8_0 < F16 < F32)
    • IQ-series sub-2-bit all < 0.4 bytes/param
  • layer_kinds_for_architecture (5):
    • qwen3moe lists both moe_gate + sliding_window_attn
    • qwen3 lists sliding_window_attn only
    • qwen2 + qwen2vl empty (Vulkan handles them)
    • unknown arch (mistral, phi3, empty, future-model) returns empty
    • table-pinning test forces rename to land in two places
  • read_qwen_model_metadata I/O (2):
    • nonexistent path returns Err with descriptive message
    • non-GGUF file (Cargo.toml) returns Err
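For reference, the lookup that the layer_kinds_for_architecture tests pin down could be as small as the sketch below. The architecture names and layer kinds come from this PR's description; the function body itself is an assumed shape, not the shipped code.

```rust
/// Sketch of the architecture lookup: layer kinds that need a backend
/// support check. Architectures without known gaps map to an empty list.
pub fn layer_kinds_for_architecture(arch: &str) -> Vec<String> {
    match arch {
        "qwen3moe" => vec!["moe_gate".to_string(), "sliding_window_attn".to_string()],
        "qwen3" => vec!["sliding_window_attn".to_string()],
        // Everything else, including unknown or empty names, has no entries.
        _ => Vec::new(),
    }
}
```

A pinning test then asserts the exact strings, so renaming a layer kind in one place without the other fails the build.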

VDD evidence

N/A — pure-data loader, no inference dispatch. Evidence lands with PR-3 (enforcement integration).

Stack

After #1331 merges, this PR rebases cleanly onto canary.

Test added 2 commits May 16, 2026 16:20
…nctions slice)

CBAR-SUBSTRATE missing-piece #5 (docs/architecture/CBAR-SUBSTRATE-ARCHITECTURE.md
§336): Qwen GPU residency gate. Stacks on PR #1315 (GRID-INFERENCE-ROUTING PR-1)
inference_capability module — different file, same module surface, same pure-
functions cadence as rate_proposals + generate_recipe + #1315 PR-1s.

#1315's probe answers "does this node have an advertisable GPU at all?" This
gate answers the next question one level deeper: "will the SELECTED MODEL
actually fit with all layers on that GPU, evidenced not guessed?"

Per CBAR-SUBSTRATE spec, before any local-generation turn runs:

- Selected Qwen model named explicitly
- Backend (Metal / CUDA / Vulkan) named + matches platform
- GPU layer count reported
- Unsupported layers enumerated (Vulkan-llama.cpp gaps, etc.)
- VRAM residency estimate covers all layers
- "CPU graph splits or unsupported Qwen layers are blockers unless the
  turn is explicitly degraded with a visible reason."

What ships (pure-functions slice — no GGUF I/O, no dispatch wiring; PR-2
wires the GGUF reader to populate QwenModelMetadata, PR-3 wires the gate
into the actual turn dispatcher with a block-the-turn enforcement point):

- BackendChoice (Metal / Cuda / Vulkan) — lowercase ts-rs export
- QwenModelMetadata — model_name, architecture, layer_count,
  parameter_count_billions, bytes_per_parameter_quantized,
  layer_kinds_needing_check. Pure data populated by future PR-2 GGUF reader
- ResidencyEvidence — typed evidence emitted on Pass; covers every
  CBAR-SUBSTRATE-required field
- ResidencyGateResult — Pass(evidence) | Block { reasons } tagged-union
- BlockReason — NoGpuBackendOnNode | UnsupportedLayer | PartialGpuSplit |
  WrongBackendForPlatform (typed, surfaces specific cause)
- Pure functions: select_backend, check_residency_gate

Failure-mode discipline (non-negotiable per vhsm-d1f4 audit pass 1):

- No silent CPU split: PartialGpuSplit fires when free VRAM < estimate
- No silent fallback: NoGpuBackendOnNode fires when no GPU at all
- No silent unsupported layer: UnsupportedLayer fires per-kind for
  Vulkan + qwen3moe (vendored llama.cpp Vulkan gap today)
- No hardcoded enums: BackendChoice is a tagged enum; QwenModelMetadata's
  layer_kinds_needing_check is Vec<String> (new layer kinds plug in)
- No assumed defaults: every field comes from inputs
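
The failure-mode rules above can be illustrated with a simplified gate. This is a sketch under stated assumptions: the function signature, the field names inside the variants, and the weights-only VRAM estimate (parameters times bytes per parameter) are hypothetical, and the real types in residency.rs carry more evidence fields.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice {
    Metal,
    Cuda,
    Vulkan,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BlockReason {
    NoGpuBackendOnNode,
    UnsupportedLayer { kind: String },
    PartialGpuSplit { needed_bytes: u64, free_bytes: u64 },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResidencyGateResult {
    Pass { backend: BackendChoice, estimated_bytes: u64 },
    Block { reasons: Vec<BlockReason> },
}

/// Simplified gate: every input is passed in, nothing is defaulted.
pub fn check_residency_gate(
    backend: Option<BackendChoice>,
    free_vram_bytes: u64,
    parameter_count_billions: f64,
    bytes_per_parameter: f64,
    layer_kinds_needing_check: &[String],
) -> ResidencyGateResult {
    // No GPU at all is an exclusive reason: nothing else is worth checking.
    let Some(backend) = backend else {
        return ResidencyGateResult::Block {
            reasons: vec![BlockReason::NoGpuBackendOnNode],
        };
    };
    // Assumed estimate: weights only, ignoring KV cache and activations.
    let estimated_bytes = (parameter_count_billions * 1e9 * bytes_per_parameter) as u64;
    let mut reasons = Vec::new();
    if backend == BackendChoice::Vulkan {
        // One reason per layer kind, so the specific gap is visible.
        for kind in layer_kinds_needing_check {
            reasons.push(BlockReason::UnsupportedLayer { kind: kind.clone() });
        }
    }
    // Strictly less than: an exact fit passes, one byte under blocks.
    if free_vram_bytes < estimated_bytes {
        reasons.push(BlockReason::PartialGpuSplit {
            needed_bytes: estimated_bytes,
            free_bytes: free_vram_bytes,
        });
    }
    if reasons.is_empty() {
        ResidencyGateResult::Pass { backend, estimated_bytes }
    } else {
        ResidencyGateResult::Block { reasons }
    }
}
```

Accumulating reasons into a Vec, instead of returning on the first failure, is what lets a multi-reason block report every cause at once.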

Backend selection precedence (matches probe.rs llamacpp advertisement rule):
Mac → Metal, NVIDIA → CUDA, AMD/Intel → Vulkan, CPU-only → None.
Metal wins over Cuda on a Mac (native path); CUDA wins over Vulkan on
NVIDIA hardware (llama.cpp CUDA kernels more complete than Vulkan today).
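
As a rough sketch of that precedence, assuming a hypothetical `NodeHardware` summary (the real probe types from #1315 are not shown in this PR):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice {
    Metal,
    Cuda,
    Vulkan,
}

/// Hypothetical hardware summary; the field names are illustrative only.
#[derive(Debug, Clone, Copy, Default)]
pub struct NodeHardware {
    pub is_mac_with_gpu: bool,
    pub has_nvidia_gpu: bool,
    pub has_amd_or_intel_gpu: bool,
}

/// Precedence: Metal on a Mac, then CUDA on NVIDIA, then Vulkan on
/// AMD/Intel. CPU-only hardware yields None rather than a fallback.
pub fn select_backend(hw: &NodeHardware) -> Option<BackendChoice> {
    if hw.is_mac_with_gpu {
        Some(BackendChoice::Metal)
    } else if hw.has_nvidia_gpu {
        Some(BackendChoice::Cuda)
    } else if hw.has_amd_or_intel_gpu {
        Some(BackendChoice::Vulkan)
    } else {
        None
    }
}
```

Returning `Option` keeps the CPU-only case explicit, so the gate can turn it into NoGpuBackendOnNode instead of silently choosing a backend.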

Tests: 41 passing on cargo test --lib --features metal,accelerate
inference_capability::residency::

- select_backend (4): picks Metal/CUDA/Vulkan correctly per HW class; None
  on CPU-only
- check_residency_gate happy paths (4): M5 Pro / MacBook Air M2 / Blackwell
  / AMD-Vulkan all run their expected Qwen variants with full evidence
- check_residency_gate block paths (4): CPU-only blocks with
  NoGpuBackendOnNode + exclusive reason; M2 blocks 30B for VRAM; AMD Vulkan
  blocks Qwen3 MoE with UnsupportedLayer; Vulkan + Qwen2 passes (Vulkan
  handles qwen2 today, not qwen3moe)
- VRAM estimate (3): Q4 7B in 3-5GB band, Q4 30B in 14-18GB band,
  estimate scales with quantization
- Evidence + serde (5): every required field present on Pass; BackendChoice
  lowercase; BlockReason + ResidencyGateResult tagged-union round-trips;
  QwenModelMetadata + ResidencyEvidence camelCase
- Edge cases (8): inclusive-vram-boundary pass; one-byte-under blocks;
  tiny model on CPU still blocks; probe-passes-residency-blocks
  composition; multi-reason block accumulates; reasons() empty slice on
  Pass; FP16 7B blocks on 8GB Mac; WrongBackend variant round-trips
- Layer-kind detail (3): backend_choice_as_str; vulkan emits one
  UnsupportedLayer per kind; empty layer_kinds never emits
- ts-rs exports (5): BackendChoice, BlockReason, QwenModelMetadata,
  ResidencyEvidence, ResidencyGateResult

Cargo check clean on --features metal,accelerate.

This is PR-1 of CBAR-PIECE-5. PR-2 wires GGUF metadata reader (extends
backends::read_gguf_metadata with block_count + parameter count) to
populate QwenModelMetadata from a path. PR-3 wires the gate result into
the turn dispatcher with enforcement (block the turn instead of letting
it silently run).

VDD evidence N/A — pure data + derivation, no inference dispatch.
Evidence lands with PR-3.

Stack:
- #1315 GRID-INFERENCE-ROUTING PR-1 (this PR's base; OPEN, MERGEABLE,
  zero file conflict)
- This PR: inference_capability/residency.rs (PIECE-5 PR-1)
- Future PR-2: GGUF reader + metadata populator
- Future PR-3: dispatcher integration + enforcement
…wenModelMetadata

Stacks on #1331 (CBAR-PIECE-5 PR-1, residency gate types). PR-1 defined
the QwenModelMetadata struct + gate; this PR-2 reads a real GGUF file
and produces the metadata the gate consumes. PR-3 will wire both probe
+ this loader into the turn dispatcher with enforcement.

Same pure-functions cadence as PR-1 — file I/O lives in a thin
wrapper, all parsing logic lives in helpers that are unit-testable
without GGUF fixtures.

What ships in inference_capability/gguf_loader.rs:

- pub fn read_qwen_model_metadata(path: &Path) -> Result<QwenModelMetadata>
  Thin file-opener; uses backends::gguf_file::Content already in the
  crate. No new dependencies.

- pub(crate) fn file_type_to_bytes_per_param(ft: u32) -> Result<f64>
  Maps the GGUF general.file_type enum to bytes-per-weight. Covers the
  full shipped quantization set (Q4_0/Q4_1/Q4_K_S/Q4_K_M/Q5_0/Q5_1/
  Q5_K_S/Q5_K_M/Q6_K/Q8_0, IQ-series sub-2-bit, F16/F32/BF16). Unknown
  ft returns Err with the value named — same no-silent-default posture
  as backends::read_gguf_metadata.

- pub(crate) fn layer_kinds_for_architecture(arch: &str) -> Vec<String>
  Lookup table for architectures with known Vulkan-llama.cpp gaps:
  qwen3moe → [moe_gate, sliding_window_attn], qwen3 → [sliding_window_attn],
  everything else → []. Pinned by a dedicated test so renames must land
  in both the table + residency.rs's matching test simultaneously.

Failure-mode discipline:

- general.architecture: REQUIRED (refuse to guess — silent fallback was
  the 2026-04-23 bug Joel called out)
- {arch}.block_count: REQUIRED (no fake layer-count evidence)
- general.file_type: REQUIRED (no guessed quantization → wrong VRAM)
- general.parameter_count: OPTIONAL with loud fallback (derive from
  file_size / bytes_per_param — approximate, documented)
- general.name: OPTIONAL with file-stem fallback (display only, doesn't
  affect gate correctness)

Tests: 15 passing on cargo test --lib --features metal,accelerate
inference_capability::gguf_loader::

- file_type_to_bytes_per_param (8): workhorse quants present, Q4_K_M
  in 0.55-0.65 band, FP16=2.0, F32=4.0, unknown=Err, removed
  ft={4,5,6}=Err, ordering monotone, IQ-series sub-0.4 bytes
- layer_kinds_for_architecture (5): qwen3moe = [moe_gate,
  sliding_window_attn], qwen3 = [sliding_window_attn], qwen2 +
  qwen2vl empty, unknown arch empty, table pinning
- read_qwen_model_metadata I/O (2): nonexistent path Err, non-GGUF
  file (Cargo.toml) Err

VDD evidence N/A — pure-data loader, no inference dispatch. Evidence
will land with PR-3 (enforcement integration).

Stack:
- #1315 GRID-INFERENCE-ROUTING PR-1 (merged to canary)
- #1331 CBAR-PIECE-5 PR-1 (residency gate types — base of this PR)
- This PR: GGUF metadata loader (PIECE-5 PR-2)
- Future PR-3: dispatcher integration + enforcement
@joelteply joelteply changed the base branch from feat/cbar-piece5-qwen-residency-gate to canary May 16, 2026 21:30
An error occurred while trying to automatically change base from feat/cbar-piece5-qwen-residency-gate to canary May 16, 2026 21:30
@joelteply joelteply merged commit 656148a into canary May 16, 2026
3 checks passed
@joelteply joelteply deleted the feat/cbar-piece5-pr2-gguf-metadata-populator branch May 16, 2026 21:30
joelteply pushed a commit that referenced this pull request May 16, 2026
…ning; navigate to MODULE-CATALOG queue

Second refresh of ALPHA-GAP Immediate Next Actions to reflect work
landed since #1316 merged. Six items closed; navigation into
MODULE-CATALOG queue made explicit.

Closed: #6 contract widening (#1341), #8 GRID-INFERENCE-ROUTING PR-1
(#1315), CBAR-PIECE-5 end-to-end (#1331/#1333/#1335/#1338),
PIECE-8 inference-grpc hardcoded-clamps (#1340), doc family
architecture surface (#1324/#1327/#1332/#1336/#1337 open;
#1316/#1317/#1320/#1329 merged).

Item #9 reorganized to point at MODULE-CATALOG's 'Next Modules To
Build' queue (audit-recorder → threat-detector → working-set-manager
→ demand-aligned-recall → substrate-governor).

Adds closeout summary section listing what's done, what's open
(5 architecture-doc PRs ready for review + 2 airc PRs), and what's
queued (5 modules with dependency state + LoC + acceptance criteria
in MODULE-CATALOG).

Doc-driven development cycle is working: doc spec → implementing
agent picks up → ships PR → next spec referenced.
joelteply added a commit that referenced this pull request May 16, 2026
…ning; navigate to MODULE-CATALOG queue (#1342)

Co-authored-by: Test <test@test.com>
joelteply added a commit that referenced this pull request May 16, 2026
…e traits (+sentinel cleanup) (#1353)

* feat(genome): working-set-manager PR-2 — WorkingSetManager + TierStore traits

PR-2 of working-set-manager (MODULE-CATALOG §VII + GENOME-FOUNDRY-
SENTINEL Parts 2/3/4). Trait surface on top of PR-1's typed data
layer (#1346). No implementations — those are PR-3 + the per-role
TierStore PRs.

Mirrors the slice shape: PR-1 = data, PR-2 = traits, PR-3 = impl +
wiring. Same pattern as CBAR-PIECE-2 (data #1321 → traits #1323 →
dispatch #1339+#1343) and PIECE-5 (data #1331 → loader #1333 →
probe #1335 → enforcement #1338).

What lands

- `genome::store::TierStore` — the trait every per-role tier
  implementation satisfies. Six methods: role / read / write /
  evict / capacity / observe_access. `Send + Sync + async_trait`
  for tokio concurrency. Used by working-set-manager (PR-3) as
  `Box<dyn TierStore>` per configured role.

- `genome::manager::WorkingSetManager` — the top-level paging
  interface. Four methods this PR: page_in / page_out / working_set
  / audit_access. The fifth method `check_permission(actor, region,
  op)` from GENOME-FOUNDRY-SENTINEL Part 4 lands in PR-3 alongside
  the GenomeRegion + Op type definitions.

- `genome::blob::ArtifactBlob` — bytes-side type for
  `TierStore::write`. Content-addressed via ArtifactId. NOT
  ts-rs-exported — large blobs don't belong on the TS wire.

- `genome::blob::Provenance` — PR-2 minimal stub (artifact_id +
  created_at_ms). Full GENOME-FOUNDRY-SENTINEL Part 1 shape grows
  this type later without breaking the trait surface.

Design refinements vs the raw spec

- `working_set` returns `Option<&WorkingSet>` instead of
  `&WorkingSet`. Unregistered persona → `None` instead of fabricating
  an empty struct that masks wrong-persona-id bugs.
- `page_in` returns `Result<PageHandle, PageFault>` per spec.
  Documented that PageFault is a typed observability signal, not a
  failure error — caller treats it as success-with-trace-event.
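
The `Option<&WorkingSet>` refinement can be illustrated with a simplified, synchronous trait. All names below other than `working_set` and `WorkingSet` are hypothetical, and the real trait is async with more methods.

```rust
use std::collections::HashMap;

/// Hypothetical persona identifier for this sketch.
pub type PersonaId = u32;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct WorkingSet {
    pub resident_pages: Vec<u64>,
}

/// Simplified stand-in for the manager trait: an unregistered persona
/// yields None instead of a fabricated empty working set.
pub trait WorkingSetView {
    fn working_set(&self, persona: PersonaId) -> Option<&WorkingSet>;
}

#[derive(Default)]
pub struct InMemoryView {
    sets: HashMap<PersonaId, WorkingSet>,
}

impl InMemoryView {
    /// Registers a persona with an empty working set if it is not known yet.
    pub fn register_persona(&mut self, persona: PersonaId) {
        self.sets.entry(persona).or_default();
    }
}

impl WorkingSetView for InMemoryView {
    fn working_set(&self, persona: PersonaId) -> Option<&WorkingSet> {
        self.sets.get(&persona)
    }
}
```

With this shape a registered persona with no pages (`Some` of an empty set) stays distinguishable from a wrong persona id (`None`), and the trait remains usable as `Box<dyn WorkingSetView>`.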

Tests

13 new tests on genome::manager + genome::store + genome::blob:
trait object-safety, dispatch through Arc/Box, audit_access denial
shape, ArtifactBlob size invariant, Provenance wire shape. 48
genome:: tests total (PR-1's 35 + PR-2's 13). No regressions across
the other 2487 lib tests.

Stack

#1339 / #1343 — CBAR-PIECE-2 PR-3 artifact dispatch (mine)
#1344 — audit-recorder (codex's, subscribes to AccessDenied)
#1346 — working-set-manager PR-1: data types (mine)
THIS PR — working-set-manager PR-2: traits (mine)
NEXT  — working-set-manager PR-3: per-persona impl + PageFault /
        EvictionRecord publishing via artifact dispatch path

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sentinel): remove dead self_clone — was masking under -D warnings test build

Drift from canary HEAD: src/workers/continuum-core/src/modules/sentinel/mod.rs:1039
defined `let self_clone = Arc::new(self.sentinels.clone());` and never
referenced it. The actual clone used downstream is `let sentinels =
Arc::clone(&self.sentinels);` at line 1066 (now 1065 after this fix).

Why it bit me: the test build for genome PR-2 (#1346 stack)
`cargo test --lib --features metal,accelerate` is the gate the
prepush hook runs, and that build has -D warnings effectively-on for
unused_variables — so the warning became "error: could not compile."
This blocks every Rust-touching push until fixed.

Per Joel's boy-scout-rule + "Bugs from new users / new machines / new
OS are GIFTS — fix the source, never hack": dead-code fix in place,
sweeping as I go.

This is NOT genome-PR-2 scope but is REQUIRED for the precommit gate
to let genome-PR-2 through. Bundling here keeps the gate working;
splitting it into a separate PR would block PR-2's push behind a fix
that has nothing to do with PR-2's logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(genome): scope uuid::Uuid import to test module in blob.rs

Earlier fix in this branch removed `use uuid::Uuid;` from file scope
because clippy on `cargo check --lib` flagged it unused. But the
TEST module uses `Uuid::nil()` — `cargo test --lib` failed with E0433
"use of undeclared type Uuid" once the test build saw the references.

Fix: move the import inside `#[cfg(test)] mod tests` so it lives where
it's used. Clippy on the non-test build sees no Uuid usage in
production code (correct — Provenance::minimal doesn't need it),
and the test build sees the import where the test fixtures need it.
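
The same pattern in miniature, using a standard-library type in place of `uuid::Uuid` so the sketch has no dependencies (the function and test names are hypothetical):

```rust
/// Production code: needs no test-only imports.
pub fn dedupe_sorted(mut ids: Vec<u32>) -> Vec<u32> {
    ids.sort_unstable();
    ids.dedup();
    ids
}

#[cfg(test)]
mod tests {
    use super::dedupe_sorted;
    // The import lives inside the test module because only tests use it,
    // so the non-test build never sees an unused import.
    use std::collections::HashSet;

    #[test]
    fn output_has_no_duplicates() {
        let out = dedupe_sorted(vec![3, 1, 3, 2, 1]);
        let unique: HashSet<u32> = out.iter().copied().collect();
        assert_eq!(unique.len(), out.len());
    }
}
```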

48/48 genome:: tests pass after the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 16, 2026
…rocess impl (#1355)

PR-3 of working-set-manager. Hangs the per-persona behaviors on the
PR-1 data layer (#1346) + PR-2 trait surface (#1353). Pure local
implementation — no MessageBus integration baked in (the trait's
`page_in` Result already carries `PageFault` as the typed
observability signal; callers wire to the artifact dispatch path
#1339+#1343 themselves).

Mirrors the slice shape: PR-1 = data, PR-2 = traits, PR-3 = impl.
Same pattern as CBAR-PIECE-2 (data #1321 → traits #1323 →
dispatch #1339+#1343) and PIECE-5 (data #1331 → loader #1333 →
probe #1335 → enforcement #1338).

What lands

- `LocalWorkingSetManager` struct holding:
  - `Vec<Arc<dyn TierStore>>` — tier chain, ordered Fast → Frozen
  - `RwLock<HashMap<PersonaId, WorkingSet>>` — per-persona state
  - `RwLock<HashMap<PageRef, PersonaId>>` — page-ownership map
    for the MMU-style `audit_access` enforcement

- Four trait method impls:
  - `page_in` — fast-path resident hit, otherwise walks tier chain
    top-down, returns PageFault with typed from_role/to_role (None
    from_role = true cold miss; Some = tier promotion)
  - `page_out` — removes from working set, observes target tier,
    skips pinned pages silently, returns `TierError::RoleNotConfigured`
    if the target tier isn't in the configured Vec
  - `working_set` — returns None per refined contract (lock-guard
    escape impossible through the trait signature; tests use the
    `working_set_snapshot` helper instead)
  - `audit_access` — checks page_owners map; returns typed
    `AccessDenied` with full context (actor + owner + reason) on
    cross-persona read

- Two convenience methods:
  - `register_persona(persona, capacity)` — must be called before
    any page_in for the persona
  - `register_page_owner(page, owner)` — populates the MMU table

- Diagnostic helper:
  - `working_set_snapshot(persona)` — clones for telemetry + tests
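
The `page_in` behavior described above can be sketched in a simplified, single-threaded form. This is an illustration under assumptions: `SketchManager` and `Tier` are hypothetical, the real implementation uses `Arc<dyn TierStore>` behind locks, and always promoting to the Fast tier is assumed here for brevity.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TierRole {
    Fast,
    Bench,
    Cold,
}

pub type PageRef = u64;

/// Typed observability signal: where the page came from and where it went.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PageFault {
    pub page: PageRef,
    pub from_role: Option<TierRole>,
    pub to_role: TierRole,
}

/// Hypothetical in-memory tier holding a set of page references.
pub struct Tier {
    pub role: TierRole,
    pub pages: HashSet<PageRef>,
}

pub struct SketchManager {
    tiers: Vec<Tier>, // ordered fastest first
    resident: HashSet<PageRef>,
}

impl SketchManager {
    pub fn new(tiers: Vec<Tier>) -> Self {
        Self { tiers, resident: HashSet::new() }
    }

    pub fn page_in(&mut self, page: PageRef) -> Result<PageRef, PageFault> {
        // Fast path: already resident, no tier walk.
        if self.resident.contains(&page) {
            return Ok(page);
        }
        // Walk the chain top-down; the first tier holding the page wins.
        let from_role = self
            .tiers
            .iter()
            .find(|tier| tier.pages.contains(&page))
            .map(|tier| tier.role);
        // Only a page found in some tier becomes resident; a true cold
        // miss (from_role == None) leaves the working set unchanged.
        if from_role.is_some() {
            self.resident.insert(page);
        }
        Err(PageFault { page, from_role, to_role: TierRole::Fast })
    }
}
```

Carrying `from_role` as an `Option` lets a consumer tell a tier promotion apart from a page that was never stored anywhere.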

Deliberately deferred (PR-4 or later)

- MessageBus integration for PageFault/EvictionRecord publishing.
  The trait's Result<PageHandle, PageFault> contract gives caller-
  side observability today; bus publishing can stay caller-side
  too (and the artifact dispatch I shipped in #1339+#1343 is the
  publishing path when callers wire it).
- Eviction policy invocation when target tier is at limit. PR-3
  returns NoEvictionCandidate; PR-4 wires the callback so the
  manager observes + re-publishes the EvictionRecord.
- `check_permission(actor, region, op)` — needs GenomeRegion + Op
  type definitions; lands with PR-4.

Refinements to the PR-2 trait contract

- `working_set` returns `None` because borrowing through the RwLock
  would expose the lock guard type and break the trait signature.
  Documented in the impl + the trait docstring. Tests + telemetry
  use `working_set_snapshot` (clone, not on hot path).
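
A minimal sketch of the snapshot approach, assuming a hypothetical `SketchManager` keyed by a plain integer persona id: the read guard is dropped inside the method, and only an owned clone escapes.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct WorkingSet {
    pub resident_pages: Vec<u64>,
}

/// Hypothetical manager holding per-persona state behind a lock.
#[derive(Default)]
pub struct SketchManager {
    sets: RwLock<HashMap<u32, WorkingSet>>,
}

impl SketchManager {
    pub fn register_persona(&self, persona: u32) {
        // unwrap: this sketch treats a poisoned lock as fatal.
        self.sets.write().unwrap().entry(persona).or_default();
    }

    /// Records a page for a registered persona; returns false otherwise.
    pub fn record_page(&self, persona: u32, page: u64) -> bool {
        let mut sets = self.sets.write().unwrap();
        let recorded = match sets.get_mut(&persona) {
            Some(working_set) => {
                working_set.resident_pages.push(page);
                true
            }
            None => false,
        };
        recorded
    }

    /// Returns an owned clone. A `&WorkingSet` cannot be returned here,
    /// because it would borrow from the read guard that ends with the call.
    pub fn working_set_snapshot(&self, persona: u32) -> Option<WorkingSet> {
        self.sets.read().unwrap().get(&persona).cloned()
    }
}
```

The clone costs an allocation per call, which is acceptable for telemetry and tests but is the reason the helper stays off the hot path.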

Tests

8 new tests on genome::local_manager:
- page_in_resident_returns_cached_without_tier_walk — hot-path
  correctness (whole point of a working set)
- page_in_walks_tier_chain_and_records_promotion — Fast → Bench →
  Cold walk order, PageFault.from_role + to_role correctness
- page_in_true_cold_miss_has_none_from_role — typed signal
  sentinel uses to distinguish "page never existed"
- audit_access_denies_cross_persona_read — typed AccessDenied
  with full context, same contract PR-2's trait test pins
- page_out_observes_target_tier_and_handles_unconfigured — typed
  RoleNotConfigured for "this hardware doesn't have that role"
- page_out_skips_pinned_pages_silently — composition pin contract
- working_set_snapshot_reflects_page_in_state — diagnostic helper
- tier_count_reflects_configured_tiers — O(1) governor diagnostic

56 genome:: tests total (PR-1's 35 + PR-2's 13 + PR-3's 8). No
regressions across other 2566 lib tests.

Stack

#1339 / #1343 — CBAR-PIECE-2 PR-3 artifact dispatch (mine)
#1344 — audit-recorder (codex's, subscribes to AccessDenied)
#1346 — working-set-manager PR-1: data types (mine)
#1353 — working-set-manager PR-2: traits (mine)
THIS PR — working-set-manager PR-3: per-process impl (mine)
NEXT  — PR-4: bus integration + eviction-callback wiring +
        check_permission + GenomeRegion/Op types

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>