refactor(inference-grpc,PIECE-8): delete hardcoded worker-count ceilings + magic constants #1340
Merged
joelteply merged 1 commit on May 16, 2026
…ngs + magic constants

CBAR-PIECE-8 (vhsm-d1f4 audit pass 1, surfaced again in #1316 ALPHA-GAP): get_num_workers() in inference-grpc/main.rs had three anti-patterns that violate the dynamic / broker-owned-concurrency rule:

(a) clamp(1, 8) ceiling on the env-var path
(b) clamp(1, 4) ceiling on the autodetect path + magic 2GB-per-worker constant that's wrong for every model that isn't a 7B Q4_K_M
(c) silent fallback to "2 workers" when sys-info fails

All three deleted. New resolve_num_workers():

1. INFERENCE_WORKERS env var is the channel a supervising continuum-core sets at process spawn (broker-derived). Value passes through verbatim, with no clamping. Supervisor knows the live hardware + memory pressure; this binary doesn't second-guess.
2. INFERENCE_WORKERS unset → num_cpus::get_physical().max(1). Hardware-derived, never zero, one info log so operator sees the fallback. Documents that continuum-core supervisor SHOULD set INFERENCE_WORKERS based on its PressureBroker lease (the broker integration is the next PR in this chain).
3. INFERENCE_WORKERS=0 or invalid → Err with bad value named, main() propagates the error to abort startup. No silent default. Surfaces the config bug at the source.

Deleted:
- ~/.continuum/config.env file reading (static-config violates dynamic rule; env var is the cross-process channel now)
- sys-info crate dep (was only used for the deleted auto-detect path)
- magic 2GB-per-worker constant
- clamp(1, 4) / clamp(1, 8) ceilings
- 'Default: 2 workers' silent fallback

Added: num_cpus crate dep (replaces sys-info; was already in continuum-core's deps via the workspace).
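The three resolution rules above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `WorkerConfigError` and `resolve_from` are hypothetical names, and `std::thread::available_parallelism()` (logical CPUs, std-only) stands in for `num_cpus::get_physical()` so the sketch has no external dependencies.

```rust
use std::env::{self, VarError};
use std::fmt;

/// Hypothetical error type: carries the rejected value so the operator sees it.
#[derive(Debug, PartialEq)]
pub struct WorkerConfigError {
    pub raw: String,
}

impl fmt::Display for WorkerConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "INFERENCE_WORKERS={:?} is not a positive integer", self.raw)
    }
}

impl std::error::Error for WorkerConfigError {}

/// Hypothetical pure core. `raw` is the env var value (None = unset);
/// `fallback_cores` is whatever the hardware probe reported.
pub fn resolve_from(raw: Option<&str>, fallback_cores: usize) -> Result<usize, WorkerConfigError> {
    match raw {
        // Rule 2: unset means hardware-derived, never zero.
        None => Ok(fallback_cores.max(1)),
        // Rules 1 and 3: a positive integer passes through unclamped;
        // "0", "", "-1" and non-numeric input are all rejected by name.
        Some(s) => match s.parse::<usize>() {
            Ok(n) if n > 0 => Ok(n),
            _ => Err(WorkerConfigError { raw: s.to_string() }),
        },
    }
}

/// Env-reading wrapper around the pure core.
pub fn resolve_num_workers() -> Result<usize, WorkerConfigError> {
    let raw = match env::var("INFERENCE_WORKERS") {
        Ok(s) => Some(s),
        Err(VarError::NotPresent) => None,
        // A non-UTF-8 value is still "set", so it is an error, not a fallback.
        Err(VarError::NotUnicode(os)) => {
            return Err(WorkerConfigError { raw: os.to_string_lossy().into_owned() })
        }
    };
    // Assumption: logical CPU count as a std-only stand-in for physical cores.
    let cores = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    if raw.is_none() {
        eprintln!("INFERENCE_WORKERS unset; falling back to {} workers", cores.max(1));
    }
    resolve_from(raw.as_deref(), cores)
}
```

Keeping the parsing in a function that takes the raw value as an argument means most cases can be checked without touching the process environment; a `main()` returning `Result<(), Box<dyn std::error::Error>>` could then propagate the error with `?` to abort startup.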
Tests: 14 passing on cargo test --no-default-features -- --test-threads=1 (env-mutating tests must run serial):
- env var passes through verbatim (8)
- env var=64 not capped (was clamp(1,8) → 8 before; pins no-ceiling)
- env var=0 → Err
- env var=not-a-number → Err with value named
- env var unset → num_cpus::get_physical() fallback
- env var empty → Err (empty != unset; refuse rather than fallback)
- env var=1 (lower boundary) → passes
- env var=-1 (negative) → Err (defensive against shell underflow)

What this enables (CBAR-SUBSTRATE alignment): one less hardcoded ceiling between the supervisor's PressureBroker and the actual inference pool size. Once a future PR wires continuum-core to spawn inference-grpc with INFERENCE_WORKERS=<broker-lease>, the concurrency budget is dynamic + supervisor-controlled end-to-end. The deletion landed here unblocks that wiring without further refactoring.

Closes one of the three deletion targets listed in #1316 ALPHA-GAP's 'Concrete deletion target' callout.
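The reason these tests have to run serially is that the environment is process-global. A helper in the spirit of the `with_env` mentioned in this PR might look like the sketch below; the signature and body are assumptions for illustration, since the actual helper is not shown on this page.

```rust
use std::env;

/// Hypothetical test helper: runs `body` with `key` set to `value`
/// (or removed when `value` is None), then restores the previous state.
/// It mutates process-global state, so callers must not run concurrently
/// (hence `--test-threads=1`). State is not restored if `body` panics.
#[allow(unused_unsafe)] // set_var/remove_var are `unsafe` only from the 2024 edition on
pub fn with_env<T>(key: &str, value: Option<&str>, body: impl FnOnce() -> T) -> T {
    // Remember what was there so the test leaves no trace behind.
    let previous = env::var_os(key);
    unsafe {
        match value {
            Some(v) => env::set_var(key, v),
            None => env::remove_var(key),
        }
    }
    let out = body();
    unsafe {
        match previous {
            Some(old) => env::set_var(key, old),
            None => env::remove_var(key),
        }
    }
    out
}
```

A test would then wrap the call under test, for example `with_env("INFERENCE_WORKERS", Some("64"), || ...)`, and assert on the returned value.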
joelteply added a commit that referenced this pull request May 16, 2026

…ning; navigate to MODULE-CATALOG queue (#1342)

Second refresh of ALPHA-GAP Immediate Next Actions to reflect work landed since #1316 merged. Six items closed; navigation into MODULE-CATALOG queue made explicit.

Closed: #6 contract widening (#1341), #8 GRID-INFERENCE-ROUTING PR-1 (#1315), CBAR-PIECE-5 end-to-end (#1331/#1333/#1335/#1338), PIECE-8 inference-grpc hardcoded-clamps (#1340), doc family architecture surface (#1324/#1327/#1332/#1336/#1337 open; #1316/#1317/#1320/#1329 merged).

Item #9 reorganized to point at MODULE-CATALOG's 'Next Modules To Build' queue (audit-recorder → threat-detector → working-set-manager → demand-aligned-recall → substrate-governor).

Adds closeout summary section listing what's done, what's open (5 architecture-doc PRs ready for review + 2 airc PRs), and what's queued (5 modules with dependency state + LoC + acceptance criteria in MODULE-CATALOG).

Doc-driven development cycle is working: doc spec → implementing agent picks up → ships PR → next spec referenced.

Co-authored-by: Test <test@test.com>
joelteply added a commit that referenced this pull request May 16, 2026

…ierarchy + paging (#1346)

PR-1 of working-set-manager (MODULE-CATALOG §VII + GENOME-FOUNDRY-SENTINEL Parts 2/3/4). Pure data + serde + ts-rs exports. No traits, no I/O, no async, no wiring; those land in PR-2/PR-3. Mirrors the slice shape that worked for CBAR-PIECE-2 PR-1 (#1321) + PIECE-5 PR-1 (#1331): ship the data shape first, hang behaviors on it incrementally.

What lands
- TierRole (Fast/Warm/Bench/Cold/Frozen) + is_present_on_uma helper
- EvictionPolicy + canonical_for(role) pinning the per-role policy table from GENOME-FOUNDRY-SENTINEL Part 2
- TierCapacity + available_bytes (saturating) + utilization (zero-safe)
- EvictionRecord (trace bus event shape; PR-3 wires through #1339 + #1343 artifact dispatch)
- TierError + Display + Error
- PageKind / PageOffset (Whole / Expert / Range)
- PageRef { kind, artifact, offset }, Hash+Eq for HashMap-key use
- PageHandle (what page_in returns)
- ResidentPage + WorkingSetCapacity + WorkingSet
- PageFault + AccessDenied (typed events; audit-recorder #1344 subscribes to AccessDenied as one of its inputs)
- PersonaId(Uuid) + ArtifactId(Uuid) typed newtypes: the type system catches swapped arguments at audit_access(persona, page) sites. Wire is transparent (UUID string).

What is deliberately deferred
- WorkingSetManager trait + page_in/page_out/audit_access (PR-2)
- TierStore trait + per-role impls (separate PR set)
- MMU permission table enforcement (PR-2 or PR-3)
- PageFault/EvictionRecord publishing via artifact dispatch (PR-3)
- Hardware-anchor Vec<TierConfig> from governor (substrate-governor lane, codex's #1345)

Tests
35 tests on genome:: pin every invariant the type system + serde encoding guarantee. 35/35 pass. No regressions across other 2467 lib tests.

Clippy baseline bump 146→148: drift from canary HEAD; the +2 warnings are NOT from genome code (zero clippy hits in genome/). They land via codex's recent #1340/#1341/#1344/#1345 merges that didn't bump the file. Bumping here so the ratchet stays meaningful for the NEXT PR to gate against.

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
CBAR-PIECE-8 (vhsm-d1f4 audit pass 1, surfaced again in #1316 ALPHA-GAP's 'Concrete deletion target' callout):
get_num_workers() in inference-grpc/main.rs had three anti-patterns that violate the dynamic / broker-owned-concurrency rule. All three deleted.
Anti-patterns deleted
- clamp(1, 8) ceiling on the env-var path
- clamp(1, 4) ceiling + magic 2GB-per-worker constant on autodetect
- 'Default: 2 workers' fallback when sys-info fails
Also deleted: ~/.continuum/config.env file reading (static-config-file violates dynamic rule); sys-info crate dep (only consumed by the deleted auto-detect path).
New resolve_num_workers()
1. INFERENCE_WORKERS env var: the channel a supervising continuum-core sets at process spawn (broker-derived). Value passes through verbatim. No clamping. Supervisor knows the live hardware + memory pressure; this binary doesn't second-guess.
2. Env var unset → num_cpus::get_physical().max(1). Hardware-derived, never zero, one info log so operator sees the fallback. Documents that continuum-core supervisor SHOULD set INFERENCE_WORKERS based on its PressureBroker lease; the broker integration is the next PR in this chain.
3. INFERENCE_WORKERS=0 or invalid → Err with bad value named. main() propagates to abort startup. No silent default. Surfaces the config bug at the source instead of launching with a dead pool.
Test plan
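14 passing on cargo test --no-default-features -- --test-threads=1:
- env var=8 → passes through verbatim
- env var=64 → not capped (was clamp(1,8) → 8 before; pins no-ceiling)
- env var=0 → Err with 0 in message
- env var=not-a-number → Err with value named (operator sees what was set)
- env var unset → num_cpus::get_physical() fallback (matches host)
- env var empty (INFERENCE_WORKERS=) → Err (empty ≠ unset; user meant to set something)
- env var=1 (lower boundary) → passes
- env var=-1 (negative) → Err (defensive against shell underflow)
Note: env-mutating tests must run serial (--test-threads=1). This is pinned in the with_env helper docstring, and the test module name makes it discoverable.
What this enables (CBAR-SUBSTRATE alignment)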
One less hardcoded ceiling between the supervisor's PressureBroker and the actual inference pool size. Once a future PR wires continuum-core to spawn inference-grpc with INFERENCE_WORKERS=<broker-lease>, the concurrency budget is dynamic + supervisor-controlled end-to-end. The deletion landed here unblocks that wiring without further refactoring of inference-grpc.
Closes one of the three deletion targets listed in #1316 ALPHA-GAP's 'Concrete deletion target' callout.
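For orientation, the supervisor side of that future wiring could be as small as the sketch below. It is an assumption-laden illustration, not continuum-core code: `inference_command` and its `broker_lease` parameter are hypothetical, and the PressureBroker lease is represented as a plain `usize` because its real API is not shown here.

```rust
use std::process::Command;

/// Hypothetical supervisor-side helper: builds (but does not run) the command
/// that would launch the inference binary with the worker budget passed
/// through its environment rather than a config file.
pub fn inference_command(binary: &str, broker_lease: usize) -> Command {
    let mut cmd = Command::new(binary);
    // The child reads this verbatim; a lease of 0 would make it abort at startup.
    cmd.env("INFERENCE_WORKERS", broker_lease.to_string());
    cmd
}
```

The caller would then invoke `.spawn()` on the returned `Command`; because the child applies no ceiling of its own, whatever the broker grants is what the pool gets.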
Coordination
Two prior attempts to edit this file (per local stash history) tripped multi-tab races + got flagged by vhsm-d1f4 as keeping env-var static-config reflex. This PR keeps the env var ONLY as the supervisor-set channel; deletes the file-reading + magic-constant scaffolding. Should land cleanly because the scope is narrow + non-conflicting with concurrent docs PRs.