feat(inference): CBAR-PIECE-5 PR-2: GGUF metadata loader populates QwenModelMetadata #1333
Merged
added 2 commits May 16, 2026 16:20
…nctions slice)

CBAR-SUBSTRATE missing-piece #5 (docs/architecture/CBAR-SUBSTRATE-ARCHITECTURE.md §336): Qwen GPU residency gate.

Stacks on PR #1315 (GRID-INFERENCE-ROUTING PR-1) inference_capability module: different file, same module surface, same pure-functions cadence as rate_proposals + generate_recipe + #1315 PR-1s.

#1315's probe answers "does this node have an advertisable GPU at all?" This gate answers the next question one level deeper: "will the SELECTED MODEL actually fit with all layers on that GPU, evidenced not guessed?"

Per CBAR-SUBSTRATE spec, before any local-generation turn runs:
- Selected Qwen model named explicitly
- Backend (Metal / CUDA / Vulkan) named + matches platform
- GPU layer count reported
- Unsupported layers enumerated (Vulkan-llama.cpp gaps, etc.)
- VRAM residency estimate covers all layers
- "CPU graph splits or unsupported Qwen layers are blockers unless the turn is explicitly degraded with a visible reason."

What ships (pure-functions slice: no GGUF I/O, no dispatch wiring; PR-2 wires the GGUF reader to populate QwenModelMetadata, PR-3 wires the gate into the actual turn dispatcher with a block-the-turn enforcement point):
- BackendChoice (Metal / Cuda / Vulkan): lowercase ts-rs export
- QwenModelMetadata: model_name, architecture, layer_count, parameter_count_billions, bytes_per_parameter_quantized, layer_kinds_needing_check. Pure data populated by future PR-2 GGUF reader
- ResidencyEvidence: typed evidence emitted on Pass; covers every CBAR-SUBSTRATE-required field
- ResidencyGateResult: Pass(evidence) | Block { reasons } tagged-union
- BlockReason: NoGpuBackendOnNode | UnsupportedLayer | PartialGpuSplit | WrongBackendForPlatform (typed, surfaces specific cause)
- Pure functions: select_backend, check_residency_gate

Failure-mode discipline (non-negotiable per vhsm-d1f4 audit pass 1):
- No silent CPU split: PartialGpuSplit fires when free VRAM < estimate
- No silent fallback: NoGpuBackendOnNode fires when no GPU at all
- No silent unsupported layer: UnsupportedLayer fires per-kind for Vulkan + qwen3moe (vendored llama.cpp Vulkan gap today)
- No hardcoded enums: BackendChoice is a tagged enum; QwenModelMetadata's layer_kinds_needing_check is Vec<String> (new layer kinds plug in)
- No assumed defaults: every field comes from inputs

Backend selection precedence (matches probe.rs llamacpp advertisement rule): Mac → Metal, NVIDIA → CUDA, AMD/Intel → Vulkan, CPU-only → None. Metal wins over Cuda on a Mac (native path); CUDA wins over Vulkan on NVIDIA hardware (llama.cpp CUDA kernels more complete than Vulkan today).
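The precedence rule above can be sketched as a pure function. This is a minimal illustration, not the code in residency.rs: `BackendChoice` and `select_backend` are names from the commit message, but the `HardwareClass` enum and its variants are hypothetical stand-ins for whatever the probe actually reports.

```rust
/// Backend names taken from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice {
    Metal,
    Cuda,
    Vulkan,
}

/// Hypothetical hardware summary (assumption: the real probe type is not shown here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HardwareClass {
    AppleSilicon,
    Nvidia,
    AmdOrIntelGpu,
    CpuOnly,
}

/// Mac -> Metal, NVIDIA -> CUDA, AMD/Intel -> Vulkan, CPU-only -> None.
/// Returning `None` instead of a CPU backend keeps the "no silent fallback" rule visible
/// in the type: the caller has to handle the missing backend explicitly.
pub fn select_backend(hw: HardwareClass) -> Option<BackendChoice> {
    match hw {
        HardwareClass::AppleSilicon => Some(BackendChoice::Metal),
        HardwareClass::Nvidia => Some(BackendChoice::Cuda),
        HardwareClass::AmdOrIntelGpu => Some(BackendChoice::Vulkan),
        HardwareClass::CpuOnly => None,
    }
}
```

Because the function is total over the enum, adding a hardware class forces a compile error at the `match` until its backend is decided, which fits the "no assumed defaults" posture described above.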
Tests: 41 passing on cargo test --lib --features metal,accelerate inference_capability::residency::
- select_backend (4): picks Metal/CUDA/Vulkan correctly per HW class; None on CPU-only
- check_residency_gate happy paths (4): M5 Pro / MacBook Air M2 / Blackwell / AMD-Vulkan all run their expected Qwen variants with full evidence
- check_residency_gate block paths (4): CPU-only blocks with NoGpuBackendOnNode + exclusive reason; M2 blocks 30B for VRAM; AMD Vulkan blocks Qwen3 MoE with UnsupportedLayer; vulkan-+-Qwen2 PASSES (vulkan handles qwen2 today, not qwen3moe)
- VRAM estimate (3): Q4 7B in 3-5GB band, Q4 30B in 14-18GB band, estimate scales with quantization
- Evidence + serde (5): every required field present on Pass; BackendChoice lowercase; BlockReason + ResidencyGateResult tagged-union round-trips; QwenModelMetadata + ResidencyEvidence camelCase
- Edge cases (8): inclusive-vram-boundary pass; one-byte-under blocks; tiny model on CPU still blocks; probe-passes-residency-blocks composition; multi-reason block accumulates; reasons() empty slice on Pass; FP16 7B blocks on 8GB Mac; WrongBackend variant round-trips
- Layer-kind detail (3): backend_choice_as_str; vulkan emits one UnsupportedLayer per kind; empty layer_kinds never emits
- ts-rs exports (5): BackendChoice, BlockReason, QwenModelMetadata, ResidencyEvidence, ResidencyGateResult

Cargo check clean on --features metal,accelerate.

This is PR-1 of CBAR-PIECE-5. PR-2 wires GGUF metadata reader (extends backends::read_gguf_metadata with block_count + parameter count) to populate QwenModelMetadata from a path. PR-3 wires the gate result into the turn dispatcher with enforcement (block the turn instead of letting it silently run).

VDD evidence N/A: pure data + derivation, no inference dispatch. Evidence lands with PR-3.
Stack:
- #1315 GRID-INFERENCE-ROUTING PR-1 (this PR's base; OPEN, MERGEABLE, zero file conflict)
- This PR: inference_capability/residency.rs (PIECE-5 PR-1)
- Future PR-2: GGUF reader + metadata populator
- Future PR-3: dispatcher integration + enforcement
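A rough sketch of how the gate described in this commit could accumulate block reasons. The type and field names follow the commit message, but the variant payloads, the `ResidencyEvidence` fields, and the weights-only VRAM estimate are assumptions made for illustration; `WrongBackendForPlatform` is omitted to keep the sketch short.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendChoice { Metal, Cuda, Vulkan }

#[derive(Debug, Clone, PartialEq)]
pub struct QwenModelMetadata {
    pub model_name: String,
    pub architecture: String,
    pub layer_count: u32,
    pub parameter_count_billions: f64,
    pub bytes_per_parameter_quantized: f64,
    pub layer_kinds_needing_check: Vec<String>,
}

// Payload shapes below are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum BlockReason {
    NoGpuBackendOnNode,
    UnsupportedLayer { kind: String },
    PartialGpuSplit { required_bytes: u64, free_bytes: u64 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ResidencyEvidence {
    pub model_name: String,
    pub backend: BackendChoice,
    pub gpu_layers: u32,
    pub estimated_vram_bytes: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ResidencyGateResult {
    Pass(ResidencyEvidence),
    Block { reasons: Vec<BlockReason> },
}

/// Assumption: weights only (parameters x bytes per parameter); no KV cache or overhead.
pub fn estimate_vram_bytes(m: &QwenModelMetadata) -> u64 {
    (m.parameter_count_billions * 1e9 * m.bytes_per_parameter_quantized).ceil() as u64
}

pub fn check_residency_gate(
    m: &QwenModelMetadata,
    backend: Option<BackendChoice>,
    free_vram_bytes: u64,
) -> ResidencyGateResult {
    // No GPU backend at all: block with that single reason.
    let Some(backend) = backend else {
        return ResidencyGateResult::Block { reasons: vec![BlockReason::NoGpuBackendOnNode] };
    };
    let mut reasons = Vec::new();
    // One UnsupportedLayer per flagged layer kind, on Vulkan only.
    if backend == BackendChoice::Vulkan {
        for kind in &m.layer_kinds_needing_check {
            reasons.push(BlockReason::UnsupportedLayer { kind: kind.clone() });
        }
    }
    // Inclusive boundary: free == required passes, one byte under blocks.
    let required_bytes = estimate_vram_bytes(m);
    if free_vram_bytes < required_bytes {
        reasons.push(BlockReason::PartialGpuSplit { required_bytes, free_bytes: free_vram_bytes });
    }
    if reasons.is_empty() {
        ResidencyGateResult::Pass(ResidencyEvidence {
            model_name: m.model_name.clone(),
            backend,
            gpu_layers: m.layer_count,
            estimated_vram_bytes: required_bytes,
        })
    } else {
        ResidencyGateResult::Block { reasons }
    }
}
```

Collecting reasons into a `Vec` rather than returning on the first failure is what lets a multi-reason block (for example an unsupported layer plus insufficient VRAM) surface every cause at once.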
…wenModelMetadata

Stacks on #1331 (CBAR-PIECE-5 PR-1, residency gate types). PR-1 defined the QwenModelMetadata struct + gate; this PR-2 reads a real GGUF file and produces the metadata the gate consumes. PR-3 will wire both probe + this loader into the turn dispatcher with enforcement.

Same pure-functions cadence as PR-1: file I/O lives in a thin wrapper, all parsing logic lives in helpers that are unit-testable without GGUF fixtures.

What ships in inference_capability/gguf_loader.rs:
- pub fn read_qwen_model_metadata(path: &Path) -> Result<QwenModelMetadata>
  Thin file-opener; uses backends::gguf_file::Content already in the crate. No new dependencies.
- pub(crate) fn file_type_to_bytes_per_param(ft: u32) -> Result<f64>
  Maps the GGUF general.file_type enum to bytes-per-weight. Covers the full shipped quantization set (Q4_0/Q4_1/Q4_K_S/Q4_K_M/Q5_0/Q5_1/Q5_K_S/Q5_K_M/Q6_K/Q8_0, IQ-series sub-2-bit, F16/F32/BF16). Unknown ft returns Err with the value named, the same no-silent-default posture as backends::read_gguf_metadata.
- pub(crate) fn layer_kinds_for_architecture(arch: &str) -> Vec<String>
  Lookup table for architectures with known Vulkan-llama.cpp gaps: qwen3moe → [moe_gate, sliding_window_attn], qwen3 → [sliding_window_attn], everything else → []. Pinned by a dedicated test so renames must land in both the table + residency.rs's matching test simultaneously.
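The two helpers can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped table: only a handful of file types are listed, the `ft` numbering is my reading of llama.cpp's file-type enum, the K-quant figure is an approximate average bits-per-weight, and the `String` error type follows the PR description.

```rust
/// Maps a GGUF `general.file_type` value to approximate bytes per weight.
/// Assumption: numbering follows llama.cpp's file-type enum; only a subset is shown.
pub fn file_type_to_bytes_per_param(ft: u32) -> Result<f64, String> {
    let bits_per_weight = match ft {
        0 => 32.0,    // F32
        1 => 16.0,    // F16
        2 => 4.5,     // Q4_0: 4-bit weights plus one f16 scale per 32-weight block
        7 => 8.5,     // Q8_0: 8-bit weights plus one f16 scale per 32-weight block
        15 => 4.85,   // Q4_K_M: approximate average over mixed tensor types
        18 => 6.5625, // Q6_K
        32 => 16.0,   // BF16
        // Removed formats are rejected explicitly rather than guessed.
        4 | 5 | 6 => return Err(format!("general.file_type {ft} is a removed format")),
        // Anything else is an error that names the offending value.
        other => return Err(format!("unknown general.file_type {other}")),
    };
    Ok(bits_per_weight / 8.0)
}

/// Layer kinds that need a backend support check, per the table in the commit message.
pub fn layer_kinds_for_architecture(arch: &str) -> Vec<String> {
    let kinds: &[&str] = match arch {
        "qwen3moe" => &["moe_gate", "sliding_window_attn"],
        "qwen3" => &["sliding_window_attn"],
        _ => &[],
    };
    kinds.iter().map(|k| k.to_string()).collect()
}
```

Returning `Err` for an unknown `ft` instead of a default such as 0.5 matters because the bytes-per-weight figure feeds the VRAM estimate directly; a guessed quantization would produce a wrong residency verdict.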
Failure-mode discipline:
- general.architecture: REQUIRED (refuse to guess; silent fallback was the 2026-04-23 bug Joel called out)
- {arch}.block_count: REQUIRED (no fake layer-count evidence)
- general.file_type: REQUIRED (no guessed quantization → wrong VRAM)
- general.parameter_count: OPTIONAL with loud fallback (derive from file_size / bytes_per_param; approximate, documented)
- general.name: OPTIONAL with file-stem fallback (display only, doesn't affect gate correctness)

Tests: 15 passing on cargo test --lib --features metal,accelerate inference_capability::gguf_loader::
- file_type_to_bytes_per_param (7): workhorse quants present, Q4_K_M in 0.55-0.65 band, FP16=2.0, F32=4.0, unknown=Err, removed ft={4,5,6}=Err, ordering monotone, IQ-series sub-0.4 bytes
- layer_kinds_for_architecture (5): qwen3moe = [moe_gate, sliding_window_attn], qwen3 = [sliding_window_attn], qwen2 + qwen2vl empty, unknown arch empty, table pinning
- read_qwen_model_metadata I/O (2): nonexistent path Err, non-GGUF file (Cargo.toml) Err

VDD evidence N/A: pure-data loader, no inference dispatch. Evidence will land with PR-3 (enforcement integration).

Stack:
- #1315 GRID-INFERENCE-ROUTING PR-1 (merged to canary)
- #1331 CBAR-PIECE-5 PR-1 (residency gate types; base of this PR)
- This PR: GGUF metadata loader (PIECE-5 PR-2)
- Future PR-3: dispatcher integration + enforcement
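The required/optional split above can be illustrated with a small parser over an in-memory key/value map. Everything here except the GGUF key names is hypothetical: `MetaValue`, `ParsedMetadata`, and `parse_metadata` stand in for the crate's real GGUF content type and loader, which this page does not show.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical stand-in for a decoded GGUF metadata value.
#[derive(Debug, Clone, PartialEq)]
pub enum MetaValue { U32(u32), U64(u64), Str(String) }

#[derive(Debug, Clone, PartialEq)]
pub struct ParsedMetadata {
    pub name: String,
    pub architecture: String,
    pub layer_count: u32,
    pub file_type: u32,
    pub parameter_count: u64,
    pub parameter_count_is_estimate: bool,
}

fn required_str<'a>(kv: &'a HashMap<String, MetaValue>, key: &str) -> Result<&'a str, String> {
    match kv.get(key) {
        Some(MetaValue::Str(s)) => Ok(s.as_str()),
        Some(other) => Err(format!("{key}: expected string, got {other:?}")),
        None => Err(format!("{key}: required GGUF key missing")),
    }
}

fn required_u32(kv: &HashMap<String, MetaValue>, key: &str) -> Result<u32, String> {
    match kv.get(key) {
        Some(MetaValue::U32(n)) => Ok(*n),
        Some(other) => Err(format!("{key}: expected u32, got {other:?}")),
        None => Err(format!("{key}: required GGUF key missing")),
    }
}

pub fn parse_metadata(
    kv: &HashMap<String, MetaValue>,
    path: &Path,
    file_size: u64,
    bytes_per_param: f64,
) -> Result<ParsedMetadata, String> {
    // REQUIRED keys: a missing key is an error, never a guessed default.
    let architecture = required_str(kv, "general.architecture")?.to_string();
    let layer_count = required_u32(kv, &format!("{architecture}.block_count"))?;
    let file_type = required_u32(kv, "general.file_type")?;
    // OPTIONAL with loud fallback: estimate from file size and flag the estimate.
    let (parameter_count, parameter_count_is_estimate) = match kv.get("general.parameter_count") {
        Some(MetaValue::U64(n)) => (*n, false),
        _ => {
            eprintln!("general.parameter_count missing; estimating from file size");
            ((file_size as f64 / bytes_per_param) as u64, true)
        }
    };
    // OPTIONAL, display only: fall back to the file stem.
    let name = match kv.get("general.name") {
        Some(MetaValue::Str(s)) => s.clone(),
        _ => path.file_stem().map(|s| s.to_string_lossy().into_owned()).unwrap_or_default(),
    };
    Ok(ParsedMetadata { name, architecture, layer_count, file_type, parameter_count, parameter_count_is_estimate })
}
```

Keeping the parsing over a plain map separate from file I/O is one way to get the "unit-testable without GGUF fixtures" property the commit message describes.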
An error occurred while trying to automatically change base from feat/cbar-piece5-qwen-residency-gate to canary
May 16, 2026 21:30
joelteply pushed a commit that referenced this pull request on May 16, 2026
…ning; navigate to MODULE-CATALOG queue Second refresh of ALPHA-GAP Immediate Next Actions to reflect work landed since #1316 merged. Six items closed; navigation into MODULE-CATALOG queue made explicit. Closed: #6 contract widening (#1341), #8 GRID-INFERENCE-ROUTING PR-1 (#1315), CBAR-PIECE-5 end-to-end (#1331/#1333/#1335/#1338), PIECE-8 inference-grpc hardcoded-clamps (#1340), doc family architecture surface (#1324/#1327/#1332/#1336/#1337 open; #1316/#1317/#1320/#1329 merged). Item #9 reorganized to point at MODULE-CATALOG's 'Next Modules To Build' queue (audit-recorder → threat-detector → working-set-manager → demand-aligned-recall → substrate-governor). Adds closeout summary section listing what's done, what's open (5 architecture-doc PRs ready for review + 2 airc PRs), and what's queued (5 modules with dependency state + LoC + acceptance criteria in MODULE-CATALOG). Doc-driven development cycle is working: doc spec → implementing agent picks up → ships PR → next spec referenced.
joelteply added a commit that referenced this pull request on May 16, 2026
…ning; navigate to MODULE-CATALOG queue (#1342) Second refresh of ALPHA-GAP Immediate Next Actions to reflect work landed since #1316 merged. Six items closed; navigation into MODULE-CATALOG queue made explicit. Closed: #6 contract widening (#1341), #8 GRID-INFERENCE-ROUTING PR-1 (#1315), CBAR-PIECE-5 end-to-end (#1331/#1333/#1335/#1338), PIECE-8 inference-grpc hardcoded-clamps (#1340), doc family architecture surface (#1324/#1327/#1332/#1336/#1337 open; #1316/#1317/#1320/#1329 merged). Item #9 reorganized to point at MODULE-CATALOG's 'Next Modules To Build' queue (audit-recorder → threat-detector → working-set-manager → demand-aligned-recall → substrate-governor). Adds closeout summary section listing what's done, what's open (5 architecture-doc PRs ready for review + 2 airc PRs), and what's queued (5 modules with dependency state + LoC + acceptance criteria in MODULE-CATALOG). Doc-driven development cycle is working: doc spec → implementing agent picks up → ships PR → next spec referenced. Co-authored-by: Test <test@test.com>
5 tasks
joelteply added a commit that referenced this pull request on May 16, 2026
…e traits (+sentinel cleanup) (#1353)

* feat(genome): working-set-manager PR-2: WorkingSetManager + TierStore traits

PR-2 of working-set-manager (MODULE-CATALOG §VII + GENOME-FOUNDRY-SENTINEL Parts 2/3/4). Trait surface on top of PR-1's typed data layer (#1346). No implementations; those are PR-3 + the per-role TierStore PRs.

Mirrors the slice shape: PR-1 = data, PR-2 = traits, PR-3 = impl + wiring. Same pattern as CBAR-PIECE-2 (data #1321 → traits #1323 → dispatch #1339+#1343) and PIECE-5 (data #1331 → loader #1333 → probe #1335 → enforcement #1338).

What lands
- `genome::store::TierStore`: the trait every per-role tier implementation satisfies. Five methods: role / read / write / evict / capacity / observe_access. `Send + Sync + async_trait` for tokio concurrency. Used by working-set-manager (PR-3) as `Box<dyn TierStore>` per configured role.
- `genome::manager::WorkingSetManager`: the top-level paging interface. Four methods this PR: page_in / page_out / working_set / audit_access. The fifth method `check_permission(actor, region, op)` from GENOME-FOUNDRY-SENTINEL Part 4 lands in PR-3 alongside the GenomeRegion + Op type definitions.
- `genome::blob::ArtifactBlob`: bytes-side type for `TierStore::write`. Content-addressed via ArtifactId. NOT ts-rs-exported; large blobs don't belong on the TS wire.
- `genome::blob::Provenance`: PR-2 minimal stub (artifact_id + created_at_ms). Full GENOME-FOUNDRY-SENTINEL Part 1 shape grows this type later without breaking the trait surface.

Design refinements vs the raw spec
- `working_set` returns `Option<&WorkingSet>` instead of `&WorkingSet`. Unregistered persona → `None` instead of fabricating an empty struct that masks wrong-persona-id bugs.
- `page_in` returns `Result<PageHandle, PageFault>` per spec. Documented that PageFault is a typed observability signal, not a failure error; the caller treats it as success-with-trace-event.
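The `Option<&WorkingSet>` refinement and the object-safety requirement can be illustrated with a cut-down, synchronous trait. This is only a sketch: the real traits are async and have more methods, and `PersonaId`, the `WorkingSet` fields, `MapBacked`, and `resident_count` are all hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical identifier type for this sketch.
pub type PersonaId = u32;

/// Hypothetical shape; the PR's WorkingSet is defined in its PR-1 data layer.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkingSet {
    pub capacity: usize,
    pub resident: Vec<u64>,
}

/// Cut-down trait: one method, synchronous, object-safe.
pub trait WorkingSetManager: Send + Sync {
    /// `None` for an unregistered persona, rather than a fabricated empty set.
    fn working_set(&self, persona: PersonaId) -> Option<&WorkingSet>;
}

/// Simple map-backed implementation used only to exercise the trait object.
pub struct MapBacked {
    pub sets: HashMap<PersonaId, WorkingSet>,
}

impl WorkingSetManager for MapBacked {
    fn working_set(&self, persona: PersonaId) -> Option<&WorkingSet> {
        self.sets.get(&persona)
    }
}

/// Dispatches through `Arc<dyn WorkingSetManager>`, which only compiles if the trait
/// is object-safe.
pub fn resident_count(mgr: &Arc<dyn WorkingSetManager>, persona: PersonaId) -> Option<usize> {
    mgr.working_set(persona).map(|ws| ws.resident.len())
}
```

With the `Option` return, a caller that passes the wrong persona id sees `None` immediately instead of silently operating on an empty working set.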
Tests
13 new tests on genome::manager + genome::store + genome::blob: trait object-safety, dispatch through Arc/Box, audit_access denial shape, ArtifactBlob size invariant, Provenance wire shape. 48 genome:: tests total (PR-1's 35 + PR-2's 13). No regressions across the other 2487 lib tests.

Stack
#1339 / #1343: CBAR-PIECE-2 PR-3 artifact dispatch (mine)
#1344: audit-recorder (codex's, subscribes to AccessDenied)
#1346: working-set-manager PR-1: data types (mine)
THIS PR: working-set-manager PR-2: traits (mine)
NEXT: working-set-manager PR-3: per-persona impl + PageFault / EvictionRecord publishing via artifact dispatch path

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sentinel): remove dead self_clone (was masking under -D warnings test build)

Drift from canary HEAD: src/workers/continuum-core/src/modules/sentinel/mod.rs:1039 defined `let self_clone = Arc::new(self.sentinels.clone());` and never referenced it. The actual clone used downstream is `let sentinels = Arc::clone(&self.sentinels);` at line 1066 (now 1065 after this fix).

Why it bit me: the test build for genome PR-2 (#1346 stack) `cargo test --lib --features metal,accelerate` is the gate the prepush hook runs, and that build has -D warnings effectively-on for unused_variables, so the warning became "error: could not compile." This blocks every Rust-touching push until fixed.

Per Joel's boy-scout-rule + "Bugs from new users / new machines / new OS are GIFTS — fix the source, never hack": dead-code fix in place, sweeping as I go.

This is NOT genome-PR-2 scope but is REQUIRED for the precommit gate to let genome-PR-2 through. Bundling here keeps the gate working; splitting it into a separate PR would block PR-2's push behind a fix that has nothing to do with PR-2's logic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(genome): scope uuid::Uuid import to test module in blob.rs

Earlier fix in this branch removed `use uuid::Uuid;` from file scope because clippy on `cargo check --lib` flagged it unused. But the TEST module uses `Uuid::nil()`, so `cargo test --lib` failed with E0433 "use of undeclared type Uuid" once the test build saw the references.

Fix: move the import inside `#[cfg(test)] mod tests` so it lives where it's used. Clippy on the non-test build sees no Uuid usage in production code (correct: Provenance::minimal doesn't need it), and the test build sees the import where the test fixtures need it.

48/48 genome:: tests pass after the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
joelteply added a commit that referenced this pull request on May 16, 2026
…rocess impl (#1355)

PR-3 of working-set-manager. Hangs the per-persona behaviors on the PR-1 data layer (#1346) + PR-2 trait surface (#1353). Pure local implementation with no MessageBus integration baked in (the trait's `page_in` Result already carries `PageFault` as the typed observability signal; callers wire to the artifact dispatch path #1339+#1343 themselves).

Mirrors the slice shape: PR-1 = data, PR-2 = traits, PR-3 = impl. Same pattern as CBAR-PIECE-2 (data #1321 → traits #1323 → dispatch #1339+#1343) and PIECE-5 (data #1331 → loader #1333 → probe #1335 → enforcement #1338).

What lands
- `LocalWorkingSetManager` struct holding:
  - `Vec<Arc<dyn TierStore>>`: tier chain, ordered Fast → Frozen
  - `RwLock<HashMap<PersonaId, WorkingSet>>`: per-persona state
  - `RwLock<HashMap<PageRef, PersonaId>>`: page-ownership map for the MMU-style `audit_access` enforcement
- Four trait method impls:
  - `page_in`: fast-path resident hit, otherwise walks tier chain top-down, returns PageFault with typed from_role/to_role (None from_role = true cold miss; Some = tier promotion)
  - `page_out`: removes from working set, observes target tier, skips pinned pages silently, returns `TierError::RoleNotConfigured` if the target tier isn't in the configured Vec
  - `working_set`: returns None per refined contract (lock-guard escape impossible through the trait signature; tests use the `working_set_snapshot` helper instead)
  - `audit_access`: checks page_owners map; returns typed `AccessDenied` with full context (actor + owner + reason) on cross-persona read
- Two convenience methods:
  - `register_persona(persona, capacity)`: must be called before any page_in for the persona
  - `register_page_owner(page, owner)`: populates the MMU table
- Diagnostic helper:
  - `working_set_snapshot(persona)`: clones for telemetry + tests

Deliberately deferred (PR-4 or later)
- MessageBus integration for PageFault/EvictionRecord publishing. The trait's Result<PageHandle, PageFault> contract gives caller-side observability today; bus publishing can stay caller-side too (and the artifact dispatch I shipped in #1339+#1343 is the publishing path when callers wire it).
- Eviction policy invocation when target tier is at limit. PR-3 returns NoEvictionCandidate; PR-4 wires the callback so the manager observes + re-publishes the EvictionRecord.
- `check_permission(actor, region, op)`: needs GenomeRegion + Op type definitions; lands with PR-4.

Refinements to the PR-2 trait contract
- `working_set` returns `None` because borrowing through the RwLock would expose the lock guard type and break the trait signature. Documented in the impl + the trait docstring. Tests + telemetry use `working_set_snapshot` (clone, not on hot path).

Tests
8 new tests on genome::local_manager:
- page_in_resident_returns_cached_without_tier_walk: hot-path correctness (whole point of a working set)
- page_in_walks_tier_chain_and_records_promotion: Fast → Bench → Cold walk order, PageFault.from_role + to_role correctness
- page_in_true_cold_miss_has_none_from_role: typed signal sentinel uses to distinguish "page never existed"
- audit_access_denies_cross_persona_read: typed AccessDenied with full context, same contract PR-2's trait test pins
- page_out_observes_target_tier_and_handles_unconfigured: typed RoleNotConfigured for "this hardware doesn't have that role"
- page_out_skips_pinned_pages_silently: composition pin contract
- working_set_snapshot_reflects_page_in_state: diagnostic helper
- tier_count_reflects_configured_tiers: O(1) governor diagnostic

56 genome:: tests total (PR-1's 35 + PR-2's 13 + PR-3's 8). No regressions across other 2566 lib tests.
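The `page_in` walk and the snapshot helper described in this commit might look roughly like the following synchronous sketch. Several things are assumptions made for brevity: tiers are plain page sets rather than `TierStore` objects, `PageRef` and `PersonaId` are integer aliases, `to_role` is always `Fast`, and the persona is auto-registered on first use (the PR instead requires `register_persona` first).

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

pub type PageRef = u64; // hypothetical alias
pub type PersonaId = u32; // hypothetical alias

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TierRole { Fast, Bench, Cold, Frozen }

#[derive(Debug, PartialEq)]
pub struct PageHandle(pub PageRef);

/// Typed observability signal: `from_role == None` means a true cold miss.
#[derive(Debug, PartialEq)]
pub struct PageFault {
    pub page: PageRef,
    pub from_role: Option<TierRole>,
    pub to_role: TierRole,
}

pub struct Tier {
    pub role: TierRole,
    pub pages: HashSet<PageRef>,
}

pub struct LocalManager {
    tiers: Vec<Tier>, // ordered fastest first
    resident: RwLock<HashMap<PersonaId, HashSet<PageRef>>>,
}

impl LocalManager {
    pub fn new(tiers: Vec<Tier>) -> Self {
        Self { tiers, resident: RwLock::new(HashMap::new()) }
    }

    pub fn page_in(&self, persona: PersonaId, page: PageRef) -> Result<PageHandle, PageFault> {
        let mut resident = self.resident.write().unwrap();
        // Assumption: auto-register the persona for brevity.
        let set = resident.entry(persona).or_default();
        // Fast path: already resident, no tier walk.
        if set.contains(&page) {
            return Ok(PageHandle(page));
        }
        // Walk the tier chain top-down; the first tier holding the page is the source.
        let from_role = self.tiers.iter().find(|t| t.pages.contains(&page)).map(|t| t.role);
        if from_role.is_some() {
            set.insert(page); // promotion makes the page resident
        }
        Err(PageFault { page, from_role, to_role: TierRole::Fast })
    }

    /// Clones under the read lock, so no lock guard escapes to the caller.
    pub fn working_set_snapshot(&self, persona: PersonaId) -> Option<HashSet<PageRef>> {
        self.resident.read().unwrap().get(&persona).cloned()
    }
}
```

The snapshot clone is the reason a `&WorkingSet` cannot be handed out directly here: a reference borrowed through the `RwLock` is tied to the guard's lifetime, so it cannot be returned from a method that drops the guard.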
Stack
#1339 / #1343: CBAR-PIECE-2 PR-3 artifact dispatch (mine)
#1344: audit-recorder (codex's, subscribes to AccessDenied)
#1346: working-set-manager PR-1: data types (mine)
#1353: working-set-manager PR-2: traits (mine)
THIS PR: working-set-manager PR-3: per-process impl (mine)
NEXT: PR-4: bus integration + eviction-callback wiring + check_permission + GenomeRegion/Op types

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Stacks on #1331 (CBAR-PIECE-5 PR-1, residency gate types). PR-1 defined the typed gate; this PR-2 reads a real GGUF file and produces the `QwenModelMetadata` the gate consumes. PR-3 wires both probe + this loader into the turn dispatcher with enforcement.

Same pure-functions cadence as PR-1: file I/O lives in a thin wrapper, all parsing logic lives in helpers unit-testable without GGUF fixtures.
What ships in `inference_capability/gguf_loader.rs`
- `pub fn read_qwen_model_metadata(path: &Path) -> Result<QwenModelMetadata, String>`: thin file-opener using `backends::gguf_file::Content` already in the crate. No new dependencies.
- `pub(crate) fn file_type_to_bytes_per_param(ft: u32) -> Result<f64, String>`: maps GGUF `general.file_type` enum to bytes-per-weight. Covers the full shipped quantization set (Q4_0/Q4_1/Q4_K_S/Q4_K_M/Q5_0/Q5_1/Q5_K_S/Q5_K_M/Q6_K/Q8_0, IQ-series sub-2-bit, F16/F32/BF16). Unknown `ft` returns `Err` with the value named.
- `pub(crate) fn layer_kinds_for_architecture(arch: &str) -> Vec<String>`: lookup table for Vulkan-llama.cpp gaps: `qwen3moe → [moe_gate, sliding_window_attn]`, `qwen3 → [sliding_window_attn]`, everything else → `[]`. Pinned by a dedicated test so renames must land in both the table + `residency.rs`'s matching test simultaneously.

Failure-mode discipline
- `general.architecture`: REQUIRED (refuse to guess)
- `{arch}.block_count`: REQUIRED (no fake layer-count evidence)
- `general.file_type`: REQUIRED (no guessed quantization, which would give a wrong VRAM estimate)
- `general.parameter_count`: OPTIONAL with loud fallback (derive from file_size / bytes_per_param; approximate, documented)
- `general.name`: OPTIONAL with file-stem fallback (display only, doesn't affect gate correctness)

Same posture as `backends::read_gguf_metadata` (Joel's 2026-04-23 fix removed all silent-llama-fallback paths there).

Test plan
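15 passing on `cargo test --lib --features metal,accelerate inference_capability::gguf_loader::`

`file_type_to_bytes_per_param` (7):
- workhorse quants (`0, 1, 2, 7, 8, 14, 15, 17, 18, 32`) all have entries
- Q4_K_M in the 0.55-0.65 band; FP16 = 2.0; F32 = 4.0
- unknown ft returns Err
- removed ft (`4, 5, 6`) return Err
- ordering monotone
- IQ-series sub-0.4 bytes

`layer_kinds_for_architecture` (5):
- `qwen3moe` lists both moe_gate + sliding_window_attn
- `qwen3` lists sliding_window_attn only
- `qwen2` + `qwen2vl` empty (Vulkan handles them)
- unknown arch (`mistral`, `phi3`, empty, `future-model`) returns empty
- table pinning

`read_qwen_model_metadata` I/O (2):
- nonexistent path returns Err
- non-GGUF file (Cargo.toml) returns Err

VDD evidence
N/A: pure-data loader, no inference dispatch. Evidence lands with PR-3 (enforcement integration).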
Stack
After #1331 merges, this PR rebases cleanly onto canary.