refactor(inference-grpc,PIECE-8): delete hardcoded worker-count ceilings + magic constants#1340

Merged
joelteply merged 1 commit into canary from feat/piece8-delete-inference-grpc-get-num-workers on May 16, 2026

Summary

CBAR-PIECE-8 (vhsm-d1f4 audit pass 1, surfaced again in #1316 ALPHA-GAP's 'Concrete deletion target' callout): get_num_workers() in inference-grpc/main.rs had three anti-patterns that violate the dynamic / broker-owned-concurrency rule. All three deleted.

Anti-patterns deleted

| # | Was | Why deleted |
|---|-----|-------------|
| 1 | clamp(1, 8) ceiling on the env-var path | A Blackwell with 128GB RAM capped to 8 workers, same as a MacBook Air. Supervisor's value must pass through verbatim. |
| 2 | clamp(1, 4) ceiling + magic 2GB-per-worker constant on autodetect | The 2GB number is wrong for every model that isn't a 7B Q4_K_M. Hardcoded ceiling silently capped throughput on big hardware. |
| 3 | Silent "Default: 2 workers" fallback when sys-info fails | The exact "guess and degrade" anti-pattern vhsm-d1f4 audit pass 1 + 6 explicitly called out. |

Also deleted: ~/.continuum/config.env file reading (static-config-file violates dynamic rule); sys-info crate dep (only consumed by the deleted auto-detect path).
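
To make the effect of the two ceilings concrete, here is a minimal sketch of the clamping arithmetic. It is not the deleted code: the function names and the RAM-divided-by-2GB formula are assumptions for illustration, and only the constants (8, 4, 2GB) come from the table above.

```rust
/// Hypothetical stand-in for the old env-var path: any requested value
/// above 8 is silently reduced to 8.
fn old_env_path(requested: usize) -> usize {
    requested.clamp(1, 8)
}

/// Hypothetical stand-in for the old autodetect path: one worker per 2 GB
/// of RAM, then capped at 4. The real formula is not shown in this PR.
fn old_autodetect_path(total_ram_gb: u64) -> u64 {
    (total_ram_gb / 2).clamp(1, 4)
}
```

Under these assumptions a supervisor asking for 64 workers would get 8, and a 128GB machine would autodetect to the same 4 workers as an 8GB one.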

New resolve_num_workers()

  1. INFERENCE_WORKERS env var — the channel a supervising continuum-core sets at process spawn (broker-derived). Value passes through verbatim. No clamping. Supervisor knows the live hardware + memory pressure; this binary doesn't second-guess.

  2. Env var unset → num_cpus::get_physical().max(1). Hardware-derived, never zero, one info log so operator sees the fallback. Documents that continuum-core supervisor SHOULD set INFERENCE_WORKERS based on its PressureBroker lease; the broker integration is the next PR in this chain.

  3. INFERENCE_WORKERS=0 or invalid → Err with bad value named. main() propagates to abort startup. No silent default. Surfaces the config bug at the source instead of launching with a dead pool.
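
The three rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: `resolve_from` is a hypothetical helper split out so the rules can be checked without touching the environment, the error type is a plain `String`, and `std::thread::available_parallelism` (which reports logical parallelism) stands in for `num_cpus::get_physical()` so the sketch needs no external crate.

```rust
use std::thread::available_parallelism;

/// Pure core of the rules. `raw` is the value of INFERENCE_WORKERS,
/// or None when the variable is unset. (Hypothetical helper.)
fn resolve_from(raw: Option<&str>, fallback: usize) -> Result<usize, String> {
    match raw {
        // Rule 2: unset falls back to a hardware-derived count, never zero.
        None => Ok(fallback.max(1)),
        Some(value) => match value.parse::<usize>() {
            // Rule 3: zero is rejected, with the bad value named.
            Ok(0) => Err(format!("INFERENCE_WORKERS must be >= 1, got {value:?}")),
            // Rule 1: any positive value passes through verbatim, no clamp.
            Ok(n) => Ok(n),
            // Rule 3: empty, negative, or non-numeric input is rejected.
            Err(e) => Err(format!("INFERENCE_WORKERS={value:?} is invalid: {e}")),
        },
    }
}

/// Reads the environment and applies the rules above.
fn resolve_num_workers() -> Result<usize, String> {
    // Stand-in for num_cpus::get_physical(); an assumption of this sketch.
    let fallback = available_parallelism().map(|n| n.get()).unwrap_or(1);
    match std::env::var("INFERENCE_WORKERS") {
        Ok(v) => resolve_from(Some(&v), fallback),
        Err(std::env::VarError::NotPresent) => {
            eprintln!("INFERENCE_WORKERS unset; falling back to {fallback} workers");
            resolve_from(None, fallback)
        }
        Err(std::env::VarError::NotUnicode(v)) => {
            Err(format!("INFERENCE_WORKERS is not valid UTF-8: {v:?}"))
        }
    }
}
```

Because `resolve_num_workers` returns a `Result`, a `main` that returns `Result` can propagate the error with `?` and abort startup instead of launching with a guessed pool size.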

Test plan

14 passing on cargo test --no-default-features -- --test-threads=1:

  • env var passes through verbatim (8)
  • env var=64 not capped (pins no-ceiling guarantee; would have been clamped to 8 before)
  • env var=0 → Err with 0 in message
  • env var=not-a-number → Err with value named (operator sees what was set)
  • env var unset → num_cpus::get_physical() fallback (matches host)
  • env var empty (INFERENCE_WORKERS=) → Err (empty ≠ unset; user meant to set something)
  • env var=1 (lower boundary) → passes
  • env var=-1 (negative, shell underflow case) → Err

Note: env-mutating tests must run serially (--test-threads=1). This is documented in the with_env helper docstring, and the test module name makes the requirement discoverable.
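
For readers unfamiliar with the pattern, a helper like with_env could look roughly like the sketch below. The real helper's signature is not shown in this PR, so everything here is an assumption. The `ENV_LOCK` mutex is an extra guard added for illustration; the PR itself relies on `--test-threads=1`. The `unsafe` blocks are there because `set_var` and `remove_var` are unsafe in the Rust 2024 edition; on earlier editions they are merely redundant.

```rust
use std::sync::Mutex;

/// Process-wide lock: environment variables are global state, so
/// env-mutating tests must not overlap. (Hypothetical extra guard.)
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Runs `body` with `key` set to `value` (or removed when `value` is None),
/// then restores the previous state. If `body` panics, the variable is not
/// restored; a drop guard would be needed for that.
#[allow(unused_unsafe)]
fn with_env<T>(key: &str, value: Option<&str>, body: impl FnOnce() -> T) -> T {
    let _guard = ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let previous = std::env::var_os(key);
    unsafe {
        match value {
            Some(v) => std::env::set_var(key, v),
            None => std::env::remove_var(key),
        }
    }
    let out = body();
    unsafe {
        match previous {
            Some(v) => std::env::set_var(key, v),
            None => std::env::remove_var(key),
        }
    }
    out
}
```

A test would then wrap each case, for example `with_env("INFERENCE_WORKERS", Some("64"), || ...)`, and assert on the closure's return value.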

What this enables (CBAR-SUBSTRATE alignment)

One less hardcoded ceiling between the supervisor's PressureBroker and the actual inference pool size. Once a future PR wires continuum-core to spawn inference-grpc with INFERENCE_WORKERS=<broker-lease>, the concurrency budget is dynamic + supervisor-controlled end-to-end. The deletion that landed here unblocks that wiring without further refactoring of inference-grpc.
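
As a rough picture of the supervisor side, the sketch below builds a child command with the lease passed through the environment. It is not part of this PR: the function name `inference_command`, the binary path argument, and the idea that the lease is a plain `usize` are all assumptions, and the PressureBroker API itself is not shown here.

```rust
use std::process::Command;

/// Builds (but does not spawn) the inference-grpc child command with the
/// broker-leased worker count passed via INFERENCE_WORKERS.
/// Hypothetical helper; the real wiring is left to a future PR.
fn inference_command(binary: &str, leased_workers: usize) -> Command {
    let mut cmd = Command::new(binary);
    // The child reads this value verbatim; no clamping on either side.
    cmd.env("INFERENCE_WORKERS", leased_workers.to_string());
    cmd
}
```

The supervisor would call `.spawn()` on the returned command; setting the variable per child rather than on the supervisor's own environment keeps the value scoped to that one process.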

Closes one of the three deletion targets listed in #1316 ALPHA-GAP's 'Concrete deletion target' callout.

Coordination

Two prior attempts to edit this file (per local stash history) tripped multi-tab races + got flagged by vhsm-d1f4 as keeping env-var static-config reflex. This PR keeps the env var ONLY as the supervisor-set channel; deletes the file-reading + magic-constant scaffolding. Should land cleanly because the scope is narrow + non-conflicting with concurrent docs PRs.

…ngs + magic constants

CBAR-PIECE-8 (vhsm-d1f4 audit pass 1, surfaced again in #1316 ALPHA-GAP):
get_num_workers() in inference-grpc/main.rs had three anti-patterns
that violate the dynamic / broker-owned-concurrency rule:

(a) clamp(1, 8) ceiling on the env-var path
(b) clamp(1, 4) ceiling on the autodetect path + magic 2GB-per-worker
    constant that's wrong for every model that isn't a 7B Q4_K_M
(c) silent fallback to "2 workers" when sys-info fails

All three deleted. New resolve_num_workers():

1. INFERENCE_WORKERS env var is the channel a supervising continuum-core
   sets at process spawn (broker-derived). Value passes through
   verbatim — no clamping. Supervisor knows the live hardware + memory
   pressure; this binary doesn't second-guess.

2. INFERENCE_WORKERS unset → num_cpus::get_physical().max(1). Hardware-
   derived, never zero, one info log so operator sees the fallback.
   Documents that continuum-core supervisor SHOULD set INFERENCE_WORKERS
   based on its PressureBroker lease (the broker integration is the
   next PR in this chain).

3. INFERENCE_WORKERS=0 or invalid → Err with bad value named, main()
   propagates the error to abort startup. No silent default. Surfaces
   the config bug at the source.

Deleted:
- ~/.continuum/config.env file reading (static-config violates
  dynamic rule; env var is the cross-process channel now)
- sys-info crate dep (was only used for the deleted auto-detect path)
- magic 2GB-per-worker constant
- clamp(1, 4) / clamp(1, 8) ceilings
- 'Default: 2 workers' silent fallback

Added: num_cpus crate dep (replaces sys-info; was already in
continuum-core's deps via the workspace).

Tests: 14 passing on cargo test --no-default-features
  -- --test-threads=1 (env-mutating tests must run serial):

- env var passes through verbatim (8)
- env var=64 not capped (was clamp(1,8) → 8 before; pins no-ceiling)
- env var=0 → Err
- env var=not-a-number → Err with value named
- env var unset → num_cpus::get_physical() fallback
- env var empty → Err (empty != unset; refuse rather than fallback)
- env var=1 (lower boundary) → passes
- env var=-1 (negative) → Err (defensive against shell underflow)

What this enables (CBAR-SUBSTRATE alignment): one less hardcoded
ceiling between the supervisor's PressureBroker and the actual
inference pool size. Once a future PR wires continuum-core to spawn
inference-grpc with INFERENCE_WORKERS=<broker-lease>, the concurrency
budget is dynamic + supervisor-controlled end-to-end. The deletion
landed here unblocks that wiring without further refactoring.

Closes one of the three deletion targets listed in #1316 ALPHA-GAP's
'Concrete deletion target' callout.
@joelteply joelteply merged commit ec4b361 into canary May 16, 2026
3 checks passed
@joelteply joelteply deleted the feat/piece8-delete-inference-grpc-get-num-workers branch May 16, 2026 22:04
joelteply added a commit that referenced this pull request May 16, 2026
…ning; navigate to MODULE-CATALOG queue (#1342)

Second refresh of ALPHA-GAP Immediate Next Actions to reflect work
landed since #1316 merged. Six items closed; navigation into
MODULE-CATALOG queue made explicit.

Closed: #6 contract widening (#1341), #8 GRID-INFERENCE-ROUTING PR-1
(#1315), CBAR-PIECE-5 end-to-end (#1331/#1333/#1335/#1338),
PIECE-8 inference-grpc hardcoded-clamps (#1340), doc family
architecture surface (#1324/#1327/#1332/#1336/#1337 open;
#1316/#1317/#1320/#1329 merged).

Item #9 reorganized to point at MODULE-CATALOG's 'Next Modules To
Build' queue (audit-recorder → threat-detector → working-set-manager
→ demand-aligned-recall → substrate-governor).

Adds closeout summary section listing what's done, what's open
(5 architecture-doc PRs ready for review + 2 airc PRs), and what's
queued (5 modules with dependency state + LoC + acceptance criteria
in MODULE-CATALOG).

Doc-driven development cycle is working: doc spec → implementing
agent picks up → ships PR → next spec referenced.

Co-authored-by: Test <test@test.com>
joelteply added a commit that referenced this pull request May 16, 2026
…ierarchy + paging (#1346)

PR-1 of working-set-manager (MODULE-CATALOG §VII + GENOME-FOUNDRY-
SENTINEL Parts 2/3/4). Pure data + serde + ts-rs exports. No traits,
no I/O, no async, no wiring — those land in PR-2/PR-3.

Mirrors the slice shape that worked for CBAR-PIECE-2 PR-1 (#1321) +
PIECE-5 PR-1 (#1331): ship the data shape first, hang behaviors on
it incrementally.

What lands

- TierRole (Fast/Warm/Bench/Cold/Frozen) + is_present_on_uma helper
- EvictionPolicy + canonical_for(role) pinning the per-role policy
  table from GENOME-FOUNDRY-SENTINEL Part 2
- TierCapacity + available_bytes (saturating) + utilization (zero-safe)
- EvictionRecord (trace bus event shape — PR-3 wires through #1339+
  #1343 artifact dispatch)
- TierError + Display + Error
- PageKind / PageOffset (Whole / Expert / Range)
- PageRef { kind, artifact, offset } — Hash+Eq for HashMap-key use
- PageHandle (what page_in returns)
- ResidentPage + WorkingSetCapacity + WorkingSet
- PageFault + AccessDenied (typed events; audit-recorder #1344
  subscribes to AccessDenied as one of its inputs)
- PersonaId(Uuid) + ArtifactId(Uuid) typed newtypes — the type
  system catches swapped arguments at audit_access(persona, page)
  sites. Wire is transparent (UUID string).
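
A minimal sketch of two items from the list above: the saturating and zero-safe helpers on TierCapacity, and the typed-ID newtypes. Field names, the `u128` payload (standing in for `Uuid` so the sketch needs no external crate), and the `describe_access` function are assumptions for illustration; the commit only names the types and helpers.

```rust
/// Hypothetical field names; the commit does not list them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TierCapacity {
    total_bytes: u64,
    used_bytes: u64,
}

impl TierCapacity {
    /// Saturating: never underflows if usage is reported above the total.
    fn available_bytes(&self) -> u64 {
        self.total_bytes.saturating_sub(self.used_bytes)
    }

    /// Zero-safe: an empty tier reports 0.0 instead of dividing by zero.
    fn utilization(&self) -> f64 {
        if self.total_bytes == 0 {
            0.0
        } else {
            self.used_bytes as f64 / self.total_bytes as f64
        }
    }
}

/// Distinct newtypes over the same payload (u128 stands in for Uuid).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct PersonaId(u128);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ArtifactId(u128);

/// Hypothetical call site: passing the arguments in the wrong order is a
/// compile error, because PersonaId and ArtifactId are different types.
fn describe_access(persona: PersonaId, artifact: ArtifactId) -> String {
    format!("persona {} -> artifact {}", persona.0, artifact.0)
}
```

With bare UUIDs on both parameters, a swapped call would compile and only fail at runtime, if at all.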

What is deliberately deferred

- WorkingSetManager trait + page_in/page_out/audit_access (PR-2)
- TierStore trait + per-role impls (separate PR set)
- MMU permission table enforcement (PR-2 or PR-3)
- PageFault/EvictionRecord publishing via artifact dispatch (PR-3)
- Hardware-anchor Vec<TierConfig> from governor (substrate-governor
  lane — codex's #1345)

Tests

35 tests on genome:: pin every invariant the type system + serde
encoding guarantee. 35/35 pass. No regressions across other 2467
lib tests.

Clippy baseline bump 146→148 — drift from canary HEAD; the +2
warnings are NOT from genome code (zero clippy hits in genome/).
They land via codex's recent #1340/#1341/#1344/#1345 merges that
didn't bump the file. Bumping here so the ratchet stays meaningful
for the NEXT PR to gate against.

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>