feat: Placement Group based subprocess manager - inference (3/5)#1833
praateekmahajan merged 7 commits into main
Conversation
Greptile Summary

This PR lands the Ray-placement-group-based subprocess manager as a self-contained infra library, split across subprocess_mgr.py and placement.py.

Confidence Score: 5/5. Safe to merge; all remaining findings are P2 style/defensive-coding suggestions that do not affect correctness of the happy path or the orphan-cleanup guarantees. Prior review concerns (private Ray API, missing tp_size guard, constant mutation) are resolved. The two new findings are both P2: the TOCTOU race in _stop_subprocess causes unnecessary SIGKILL escalation on clean subprocess exits but leaves no orphans, and the check_total_gpu_capacity semantic mismatch is a docstring/naming concern rather than a runtime failure. Tests cover the critical lifecycle paths end-to-end.

Files to focus on: nemo_curator/core/serve/subprocess_mgr.py for the _stop_subprocess signal-sending race; nemo_curator/backends/utils.py for the available-vs-total GPU capacity semantics.
Sequence Diagram

sequenceDiagram
participant Driver
participant placement.py
participant subprocess_mgr.py
participant RayActor as _SubprocessActor
participant Subprocess
Driver->>placement.py: plan_replica_bundle_shape(tp_size)
placement.py-->>Driver: ReplicaBundleSpec
Driver->>placement.py: build_replica_pg(spec, name)
placement.py->>Ray: placement_group lifetime=detached
placement.py->>Ray: pg.ready() timeout=180s
placement.py-->>Driver: PlacementGroup
Driver->>subprocess_mgr.py: ManagedSubprocess.spawn(label, pg, bundle_index)
subprocess_mgr.py->>RayActor: actor_cls.options(num_gpus=N, pg_strategy).remote()
subprocess_mgr.py->>RayActor: actor.initialize.remote(command, env, log_file)
RayActor->>RayActor: inject CUDA_VISIBLE_DEVICES from accelerator IDs
RayActor->>Subprocess: Popen(command, start_new_session=True)
subprocess_mgr.py->>RayActor: actor.run.remote() returns run_ref
subprocess_mgr.py-->>Driver: ManagedSubprocess(label, actor, run_ref)
Driver->>subprocess_mgr.py: proc.stop()
subprocess_mgr.py->>subprocess_mgr.py: graceful_stop_actors
subprocess_mgr.py->>RayActor: actor.stop.remote() SIGTERM group
RayActor->>Subprocess: os.killpg(pgid, SIGTERM)
alt stop drains in time
RayActor-->>subprocess_mgr.py: rc
else timeout
subprocess_mgr.py->>RayActor: actor.force_sigkill_subprocess.remote()
RayActor->>Subprocess: os.killpg(pgid, SIGKILL)
end
subprocess_mgr.py->>Ray: ray.kill(actor, no_restart=True)
oyilmaz-nvidia left a comment:
@praateekmahajan Can you give a bit more context why we need these new classes (subprocess manager for instance) and functions?
@oyilmaz-nvidia Fair question on the subprocess manager: the answer is that we use this in PRs 4/5. This PR is completely backend-agnostic (Dynamo / Serve); if we later add native SGLang (or vLLM) support, all of this will be leveraged there as well.
…rs infra library
Land the Ray-placement-group-backed subprocess manager and its
SubprocessError type as a self-contained, zero-consumer infra library.
Nothing in nemo_curator imports these modules yet -- DynamoBackend is
still the NotImplementedError placeholder from PR 2 -- so the diff is
reviewable in isolation before PR 4 wires it into the real backend.
Concepts landed by the module:
- Replica bundle-shape planner: STRICT_PACK single-bundle when the
replica fits on one node; STRICT_SPREAD N-bundle across nodes
otherwise. vLLM requires an equal per-node local_world_size so
asymmetric splits (e.g. 1+3 for TP=4) are rejected up front (see
the sketch after this list).
- CURATOR_IGNORE_RAY_HEAD_NODE=1 is translated to a per-bundle
{"ray.io/node-type": "worker"} label selector (Ray-native; matches
the Ray Serve deployment-scheduler pattern) rather than a Python-side
inventory filter. OSS Ray nodes must be launched with the label.
- Detached, named placement groups under a stable
"nemo_curator_dynamo" namespace so a reconnecting driver can find
and reap them across the server.start -> pipeline.run -> server.stop
ray.init cycles.
- ManagedSubprocess actor overrides __ray_terminate__ and registers
an atexit hook; the subprocess process-group is reaped even when
Ray hard-kills the actor. graceful_stop_actors parallelises
teardown and falls back to ray.kill on timeout.
- Post-pg.ready() bundle helpers (get_bundle_node_ip,
get_free_port_in_bundle) resolve master-addr and bind ports on the
same node the consuming actor will land on, avoiding
pre-scheduling IP lookups.
- remove_named_pgs_with_prefix as an orphan sweep keyed on the
stable PG name prefix.
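A minimal sketch of the bundle-shape decision from the first bullet, assuming a pre-discovered homogeneous gpus_per_node and simplified ReplicaBundleSpec fields (the real planner also consults _get_gpu_topology and the head-node filter):

```python
from dataclasses import dataclass


@dataclass
class ReplicaBundleSpec:
    bundles: list[dict]  # one Ray resource dict per bundle
    strategy: str        # "STRICT_PACK" or "STRICT_SPREAD"


def plan_replica_bundle_shape(tp_size: int, gpus_per_node: int) -> ReplicaBundleSpec:
    if tp_size < 1:
        raise ValueError(f"tp_size must be >= 1, got {tp_size}")
    if tp_size <= gpus_per_node:
        # Whole replica fits on one node: co-locate everything.
        return ReplicaBundleSpec([{"CPU": 1, "GPU": tp_size}], "STRICT_PACK")
    if tp_size % gpus_per_node != 0:
        # vLLM needs an identical local_world_size on every node, so
        # asymmetric splits like 1+3 for TP=4 are rejected up front.
        raise ValueError(
            f"TP={tp_size} does not split evenly across {gpus_per_node}-GPU nodes"
        )
    n_nodes = tp_size // gpus_per_node
    return ReplicaBundleSpec([{"CPU": 1, "GPU": gpus_per_node}] * n_nodes, "STRICT_SPREAD")
```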
Tests are intentionally consolidated: pure-logic cases are
parametrised, and the GPU slice collapses PG creation, bundle
IP/port discovery, actor spawn, CUDA env propagation, subprocess
env override semantics, inherited-PATH behaviour, and graceful
stop into one cohesive end-to-end test that shares a single PG.
This avoids N x (Ray actor + PG + subprocess) startup cost per
run while still covering the public-API invariants.
Signed-off-by: Praateek <praateekm@nvidia.com>
Signed-off-by: Praateek <praateekm@gmail.com>
… dedup head-node env check
Scope-narrow pass on PR 3 following the subprocess_mgr / Dynamo-infra
separation discussion:
subprocess_mgr.py is now strictly generic Ray-placement-group +
subprocess infrastructure. Dynamo-specific pieces moved out:
- NEMO_CURATOR_DYNAMO_NAMESPACE -> dynamo/constants.py
- build_infra_pg (etcd+NATS+FE) -> dynamo/infra.py
- build_worker_actor_name -> dynamo/infra.py
- engine_kwargs_to_cli_flags -> dynamo/infra.py
(also renamed from the underscore-private form because it is a
public Dynamo-layer helper now)
Shared Ray-env helper lives once, not twice: core/utils.py grows
ignore_ray_head_node(). backends/base.py now uses it instead of its
own inline parse of CURATOR_IGNORE_RAY_HEAD_NODE, and
subprocess_mgr.plan_replica_bundle_shape calls the same helper.
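For reference, a sketch of what the shared helper plausibly looks like; the accepted truthy spellings are an assumption:

```python
import os


def ignore_ray_head_node() -> bool:
    """True when CURATOR_IGNORE_RAY_HEAD_NODE is set to a truthy value."""
    return os.environ.get("CURATOR_IGNORE_RAY_HEAD_NODE", "").strip().lower() in {"1", "true", "yes"}
```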
check_total_gpu_capacity kept (Ray's PG scheduler can hang
indefinitely on pg.ready() when GPUs are oversubscribed -- a coarse
pre-check gives a cleaner error) and moved next to
get_available_cpu_gpu_resources in backends/utils.py. The new
implementation reuses that function so ignore_head_node is honoured
consistently.
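A hedged sketch of the relocated pre-check; the (cpus, gpus) return shape and the ignore_head_node kwarg of get_available_cpu_gpu_resources are assumptions:

```python
from nemo_curator.backends.utils import get_available_cpu_gpu_resources
from nemo_curator.core.utils import ignore_ray_head_node


def check_total_gpu_capacity(required_gpus: int) -> None:
    _cpus, gpus = get_available_cpu_gpu_resources(ignore_head_node=ignore_ray_head_node())
    if required_gpus > gpus:
        # Fail fast with a readable error instead of hanging in pg.ready().
        raise ValueError(f"need {required_gpus} GPUs, cluster exposes {gpus}")
```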
ManagedSubprocess grows real instance methods so callers stop
hand-threading ray.get / actor.X.remote / ray_mod into free
functions:
- proc.is_alive(), .pid(), .read_log_tail(), .wait(timeout)
- proc.stop(timeout_s=...) and classmethod stop_many(procs, ...)
graceful_stop_actors stays as the raw-actor primitive for cases
without a ManagedSubprocess handle (e.g. the reconnecting-driver
orphan sweep), but drops its ray_mod parameter and imports ray
directly -- tests / consumers no longer pass ray through.
graceful_stop_actor (single-actor free function) deleted; callers
use proc.stop() or the list primitive.
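A sketch of how those handle methods wrap the raw actor calls, assuming the actor exposes matching remote methods (exact signatures may differ):

```python
import ray


class ManagedSubprocess:
    """Sketch of the handle methods; constructor fields follow the diagram above."""

    def __init__(self, label, actor, run_ref):
        self.label, self.actor, self.run_ref = label, actor, run_ref

    def is_alive(self) -> bool:
        # Replaces hand-written ray.get(proc.actor.is_alive.remote()) at call sites.
        return ray.get(self.actor.is_alive.remote())

    def pid(self) -> int:
        return ray.get(self.actor.pid.remote())

    def read_log_tail(self, n: int = 50) -> str:
        return ray.get(self.actor.read_log_tail.remote(n))

    def wait(self, timeout: float | None = None) -> int:
        # The actor's run() resolves to the subprocess return code.
        return ray.get(self.run_ref, timeout=timeout)
```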
A TODO comment flags that _run_in_bundle / get_free_port_in_bundle /
get_bundle_node_ip (plus spawn_actor's (pg, bundle_index) tuple)
could collapse into a Bundle(pg, index) wrapper once PR 4's
DynamoBackend wiring demonstrates how often the same pair gets
threaded through. Not doing it speculatively.
Tests realigned accordingly:
- test_subprocess_mgr.py: exercises proc.wait / proc.read_log_tail
/ proc.is_alive / proc.stop instead of the boilerplate form.
- Moved test_engine_kwargs_to_cli_flags + test_build_worker_actor_name
to tests/core/serve/dynamo/test_infra.py (next to the code under test).
- Moved test_check_total_gpu_capacity to tests/backends/test_utils.py
(next to check_total_gpu_capacity's new home), uses monkey-patched
get_available_cpu_gpu_resources so it stays a pure unit test.
- Added test_ignore_ray_head_node_env_parsing in tests/core/test_utils.py.
Verified on 2 GPUs:
- 35/35 non-GPU tests pass (subprocess_mgr + dynamo/infra + core utils + backends/test_utils::TestCheckTotalGpuCapacity)
- 3/3 GPU tests pass (TestReplicaLifecycle end-to-end,
test_actor_death_surfaces_via_run_ref, test_orphan_pg_cleanup_by_prefix)
Signed-off-by: Praateek <praateekm@nvidia.com>
Signed-off-by: Praateek <praateekm@gmail.com>
…s to constants, spawn as classmethod
Three related cleanups on top of the prior PR 3 commit:
1. subprocess_mgr.py lost its placement-group half to a new
placement.py. subprocess_mgr is now strictly ManagedSubprocess +
the _SubprocessActor factory + graceful_stop_actors primitive;
placement.py owns ReplicaBundleSpec, the planner
(plan_replica_bundle_shape, _get_gpu_topology),
build_pg / build_replica_pg, the bundle-scoped discovery helpers
(get_bundle_node_ip, get_free_port_in_bundle, _run_in_bundle),
and remove_named_pgs_with_prefix. Concrete win: dynamo/infra.py
now imports build_pg from placement, which reads correctly;
previously it pulled it out of subprocess_mgr, where the module
name was misleading. The Bundle(pg, idx) wrapper TODO lives with
the bundle helpers in placement.py.
2. The module-level tunables that used to sit at the top of
subprocess_mgr moved to nemo_curator/core/serve/constants.py and
dropped the underscore prefix -- they were being imported across
files (dynamo/infra.py pulled PG_READY_TIMEOUT_S and
WORKER_NODE_LABEL), so "private" was misleading. Same file now
holds SIGTERM_WAIT_S, SIGKILL_WAIT_S,
PLACEMENT_GROUP_READY_TIMEOUT_S (renamed from PG_... for
readability), WORKER_NODE_LABEL, and NOSET_CUDA_RUNTIME_ENV,
alongside the existing DEFAULT_SERVE_PORT /
DEFAULT_SERVE_HEALTH_TIMEOUT_S.
3. spawn_actor free function is gone; spawning is now
ManagedSubprocess.spawn(label, pg, bundle_index, ...) classmethod,
so the factory lives on the type it returns. The previously
underscore-private _build_pg is now just build_pg (public; it's
imported across modules).
Tests tracked the split:
- tests/core/serve/test_placement.py (new) -- planner matrix,
head-node exclusion, ReplicaBundleSpec, orphan PG cleanup.
- tests/core/serve/test_subprocess_mgr.py narrowed to the
lifecycle coverage: TestReplicaLifecycle (end-to-end spawn +
env propagation + wait + read_log_tail + stop) and
test_actor_death_surfaces_via_run_ref.
Verified on 2 GPUs:
- 21/21 non-GPU tests pass (placement planner + dynamo/infra + test_infra + test_config + test_server, etc.)
- 3/3 GPU tests pass (TestReplicaLifecycle.test_end_to_end,
test_actor_death_surfaces_via_run_ref, test_orphan_pg_cleanup_by_prefix)
Signed-off-by: Praateek <praateekm@nvidia.com>
Signed-off-by: Praateek <praateekm@gmail.com>
…ors.py

``nemo_curator/core/serve/errors.py`` held a single 23-line class
(``SubprocessError``) with no consumers yet. Given SubprocessError is
tightly coupled to the subprocess lifecycle and only subprocess_mgr is
going to raise it, it belongs next to the code that owns it. A dedicated
errors module is premature until multiple sibling modules need to share
exception types.

Signed-off-by: Praateek <praateekm@nvidia.com>
Signed-off-by: Praateek <praateekm@gmail.com>
…calation in graceful_stop_actors
Review-driven cleanups on subprocess_mgr.py and placement.py:
Dead code removed (absent in v2 consumer too):
- ManagedSubprocess.log_file dataclass field (never read)
- _SubprocessActor.log_file() method (never called)
- _SubprocessActor.get_node_ip() method (placement.get_bundle_node_ip
does its own remote via _run_in_bundle instead)
Two-RTT -> one-RTT spawn:
- _SubprocessActor.initialize() now injects CUDA_VISIBLE_DEVICES
from ray.get_accelerator_ids() inside the actor. ManagedSubprocess.spawn
no longer needs a separate actor.get_assigned_gpus.remote() round-trip
before the actor.initialize.remote() call.
- initialize() now returns the pid directly (it used to return the echo
dict {"pid", "log_file"} whose log_file echo was only used to populate
the now-deleted ManagedSubprocess.log_file field).
File-handle safety:
- initialize() wraps Popen in try/except that closes self._log_fh on
launch failure, preventing the handle from leaking if the binary
isn't on $PATH or exec fails.
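The two changes above combine into an initialize() shaped roughly like the sketch below, assuming Ray's get_runtime_context().get_accelerator_ids() API; the real method lives in _SubprocessActor:

```python
import os
import subprocess

import ray


@ray.remote(num_gpus=1)
class _SubprocessActorSketch:
    """Sketch of the actor-side initialize(): env injection + fh guard."""

    def initialize(self, command: list[str], env: dict[str, str], log_file: str) -> int:
        # Pin the child to exactly the GPUs Ray assigned this actor, in one RTT.
        gpu_ids = ray.get_runtime_context().get_accelerator_ids().get("GPU", [])
        child_env = {**os.environ, **env}
        if gpu_ids:
            child_env["CUDA_VISIBLE_DEVICES"] = ",".join(gpu_ids)
        self._log_fh = open(log_file, "ab")
        try:
            self._proc = subprocess.Popen(
                command, env=child_env, stdout=self._log_fh,
                stderr=subprocess.STDOUT, start_new_session=True,
            )
        except Exception:
            self._log_fh.close()  # don't leak the handle when exec fails
            raise
        return self._proc.pid     # pid returned directly, no echo dict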
SIGKILL escalation in graceful_stop_actors:
- Added _SubprocessActor.force_sigkill_subprocess() -- non-blocking
os.killpg(..., SIGKILL) on the subprocess group. Escalation path
when stop() is hung.
- Bumped actor max_concurrency from 2 to 4 so force_sigkill_subprocess
doesn't queue behind a stuck stop() + run().
- graceful_stop_actors now: (1) actor.stop() with bounded wait,
(2) if stop did not drain, actor.force_sigkill_subprocess() with
short timeout, (3) ray.kill. Previously step 2 was missing -- ray.kill
bypasses __ray_terminate__ and atexit, so subprocesses orphaned
whenever the actor went hard-kill.
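A sketch of the resulting three-step teardown, assuming these actor method names and illustrative timeout values; the real graceful_stop_actors lives in subprocess_mgr.py:

```python
import ray


def graceful_stop_actors(actors, stop_timeout_s: float = 10.0, kill_timeout_s: float = 2.0) -> None:
    # (1) SIGTERM each subprocess group, bounded wait for clean drain.
    ref_to_actor = {a.stop.remote(): a for a in actors}
    _done, pending = ray.wait(
        list(ref_to_actor), num_returns=len(ref_to_actor), timeout=stop_timeout_s
    )
    if pending:
        # (2) stop() didn't drain on these actors: non-blocking SIGKILL escalation
        # on the subprocess group, scheduled on the actor's own node.
        kill_refs = [ref_to_actor[r].force_sigkill_subprocess.remote() for r in pending]
        ray.wait(kill_refs, num_returns=len(kill_refs), timeout=kill_timeout_s)
    for a in actors:
        # (3) Reap the actor itself; safe now that the subprocess group is dead.
        ray.kill(a, no_restart=True)
```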
Lazy-import cleanup in placement.py:
- Hoisted `import ray` to module top (ray is a hard dep of the package;
the defensive lazy imports were noise). Enables:
- @ray.remote decorators for _remote_get_free_port and _remote_get_node_ip
now live at module scope. Previously the wrapper functions redefined
their RemoteFunction on every call.
Narration comment trimmed:
- Dropped the speculative TODO(dynamo-refactor) block suggesting a
Bundle(pg, idx) wrapper. That was PR-narrating noise that would rot
once PR 4 lands. If the repetition turns out real in PR 4, we can
add the wrapper then -- no header comment needed to tell us to.
Dual validation kept:
- (command is None) == (python_args is None) check remains in
ManagedSubprocess.spawn for a clean caller-side ValueError. The
actor-side check in initialize is now dead in practice but
stays as defense-in-depth; it's three lines.
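The guard in miniature (the function name here is illustrative; in the real code the check is inline in spawn and initialize):

```python
def _validate_spawn_args(command: list[str] | None, python_args: list[str] | None) -> None:
    # Exactly one of command / python_args must be provided;
    # both-set and both-None trip the same equality check.
    if (command is None) == (python_args is None):
        raise ValueError("provide exactly one of `command` or `python_args`")
```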
Verified on 2 GPUs:
- 21/21 non-GPU tests pass
- 3/3 GPU tests pass (TestReplicaLifecycle end-to-end, actor-death
run_ref surfacing, orphan PG cleanup by prefix)
Signed-off-by: Praateek <praateekm@nvidia.com>
Signed-off-by: Praateek <praateekm@gmail.com>
… fix
CI fix:
- tests/gpu_test_groups.json: register the two new GPU test files
(tests/core/serve/test_placement.py, tests/core/serve/test_subprocess_mgr.py)
under the "sdg" group, next to ray_serve/test_integration.py.
Remove the private-Ray-API teardown hack:
- Deleted _SubprocessActor.__ray_terminate__ entirely. Nothing in
our teardown flow calls __ray_terminate__ (the driver goes through
graceful_stop_actors: actor.stop -> force_sigkill_subprocess ->
ray.kill), so the override was inert and the access to
_ray._private.worker.global_worker was pure private-API risk.
atexit is the actor's only self-registered reap path now --
standard Python, no Ray internals.
- Also dropped the dead _SubprocessActor.get_assigned_gpus() method
(called from spawn() before the simplify-pass two-RTT collapse,
now replaced by initialize() injecting CUDA_VISIBLE_DEVICES itself).
Review P2 (greptile): NOSET_CUDA_RUNTIME_ENV shallow-copy:
- subprocess_mgr.py spawn(): the else-branch of the runtime_env merge
was assigning `merged_runtime_env = NOSET_CUDA_RUNTIME_ENV` by
reference. Mutation in-place would poison the module-level constant.
Now shallow-copies via `{**NOSET_CUDA_RUNTIME_ENV}`, matching the
if-branch.
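The aliasing bug in miniature; the constant's contents here are an assumption, the copy-vs-reference pattern is the point:

```python
NOSET_CUDA_RUNTIME_ENV = {"env_vars": {"RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": "1"}}


def merge_runtime_env(user_runtime_env: dict | None) -> dict:
    if user_runtime_env:
        return {**NOSET_CUDA_RUNTIME_ENV, **user_runtime_env}
    # Bare `return NOSET_CUDA_RUNTIME_ENV` would hand callers a reference
    # whose in-place mutation poisons the module-level constant.
    return {**NOSET_CUDA_RUNTIME_ENV}
```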
Review P2 (greptile): tp_size validation:
- placement.plan_replica_bundle_shape now rejects tp_size < 1 with a
ValueError. Previously tp_size=0 silently returned a {"CPU": 1,
"GPU": 0} bundle. New parametrised test covers 0 and -1.
Review P2 (greptile): head-node detection consistency:
- placement._get_gpu_topology previously used
ray.get_runtime_context().get_node_id() alone to identify the head,
which is only correct when the driver runs on the head node.
Now uses backends.utils.get_head_node_id() (resource-marker based
-- looks for "node:__internal_head__") with the runtime-context as
a fallback. Matches how the rest of the codebase identifies the
head and is robust to driver-off-head deployments.
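A sketch of the resource-marker detection, assuming the standard "node:__internal_head__" resource Ray sets on the head node, with the runtime-context fallback described above:

```python
import ray


def get_head_node_id() -> str:
    for node in ray.nodes():
        if node.get("Alive") and "node:__internal_head__" in node.get("Resources", {}):
            return node["NodeID"]
    # Fallback: only correct when the driver itself runs on the head node.
    return ray.get_runtime_context().get_node_id()
```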
Review P3 (claude[bot]): test coverage for engine_kwargs_to_cli_flags
dict/list branches:
- Added parametrised cases for list (served_model_name=["a", "b"])
and dict (generation_config={"temperature": 0.7}) values so the
json.dumps path isn't an untested branch.
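A sketch of the conversion those cases cover; the exact flag formatting (dash-casing, JSON for containers) is an assumption inferred from the parametrised tests:

```python
import json


def engine_kwargs_to_cli_flags(engine_kwargs: dict) -> list[str]:
    flags = []
    for key, value in engine_kwargs.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(value, (dict, list)):
            flags += [flag, json.dumps(value)]  # the json.dumps branch
        else:
            flags += [flag, str(value)]         # scalar branch
    return flags


# engine_kwargs_to_cli_flags({"served_model_name": ["a", "b"]})
#   -> ["--served-model-name", '["a", "b"]']
# engine_kwargs_to_cli_flags({"generation_config": {"temperature": 0.7}})
#   -> ["--generation-config", '{"temperature": 0.7}']
```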
Verified on 2 GPUs: 25/25 non-GPU pass, 3/3 GPU pass
(TestReplicaLifecycle end-to-end -- which exercises the atexit-only
teardown path via proc.stop() -- still green).
Signed-off-by: Praateek <praateekm@nvidia.com>
Signed-off-by: Praateek <praateekm@gmail.com>
Signed-off-by: Praateek <praateekm@gmail.com>
Description
Nothing in nemo_curator/ imports these modules yet; PR 4 (DynamoBackend) will.

- serve/subprocess_mgr.py: ManagedSubprocess + _SubprocessActor factory + SubprocessError + graceful_stop_actors
- serve/placement.py: ReplicaBundleSpec, plan_replica_bundle_shape, build_pg / build_replica_pg, bundle-scoped helpers (get_bundle_node_ip, get_free_port_in_bundle), and remove_named_pgs_with_prefix for the orphan sweep
- serve/dynamo/infra.py (build_infra_pg, build_worker_actor_name, engine_kwargs_to_cli_flags) so subprocess_mgr / placement stay backend-generic
- CURATOR_IGNORE_RAY_HEAD_NODE parsing extracted into a shared nemo_curator.core.utils.ignore_ray_head_node(); nemo_curator/backends/base.py now uses the same helper instead of parsing the env var inline
- check_total_gpu_capacity moved next to get_available_cpu_gpu_resources in nemo_curator/backends/utils.py: a coarse pre-check so insufficient-GPU cases fail fast instead of hanging on pg.ready()

Design callouts
lifetime=\"detached\"and a stablename=so a reconnecting driver (server.start → pipeline.run → server.stop flow) can find and reap them acrossray.shutdown()/ray.init()cycles.remove_named_pgs_with_prefixis the orphan sweep.STRICT_PACKbundle; multi-node TP replicas useSTRICT_SPREADwith an equal per-node split. Asymmetric splits (1+3 for TP=4) are rejected up front because vLLM's distributed executor requires identicallocal_world_sizeper node.CURATOR_IGNORE_RAY_HEAD_NODE=1is translated to a per-bundle{\"ray.io/node-type\": \"worker\"}label selector (Ray-native; matches Ray Serve's deployment-scheduler pattern). Auto-satisfied on Anyscale; OSS Ray users must start worker nodes withray start --labels ray.io/node-type=worker.start_new_session=Trueso the child becomes a process-group leader._SubprocessActoroverrides__ray_terminate__and registers anatexithook so the subprocess tree is SIGTERM→SIGKILL'd on the process group whenever the actor shuts down gracefully.graceful_stop_actorsruns in three steps per actor: (1)actor.stop()with bounded wait, (2) if stop didn't drain,actor.force_sigkill_subprocess()— a non-blockingos.killpg(..., SIGKILL)scheduled on the actor's node — then (3)ray.kill. Without step 2,ray.killbypasses__ray_terminate__/atexitand orphans the subprocess tree.ManagedSubprocess.spawnis a classmethod on the data class it returns — ergonomic factory that avoids the split between a free-function spawner and the handle type. The handle exposes.stop(),.stop_many(),.is_alive(),.pid(),.read_log_tail(),.wait()so callers don't hand-writeray.get(actor.X.remote()).What's intentionally NOT in this PR
- DynamoBackend.start() / .stop(): still the NotImplementedError placeholder from feat!: Typed Serve Config + Dyamo config stub - inference (2/5) #1820
- Imports of subprocess_mgr / placement from production code (tests are the only consumer)
- The Bundle(pg, bundle_index) wrapper idea: deferred until PR 4's DynamoBackend wiring shows how often the pair repeats

Internal usage (preview of how PR 4 will consume this)
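The original snippet under this heading did not survive extraction; the sketch below is a hedged reconstruction from the sequence diagram and the public API listed above. Keyword-argument names beyond (label, pg, bundle_index) and the placeholder command are assumptions.

```python
from nemo_curator.core.serve.placement import (
    build_replica_pg,
    get_bundle_node_ip,
    get_free_port_in_bundle,
    plan_replica_bundle_shape,
)
from nemo_curator.core.serve.subprocess_mgr import ManagedSubprocess

spec = plan_replica_bundle_shape(tp_size=4)
pg = build_replica_pg(spec, name="nemo_curator_dynamo_replica_0")

# Resolve master-addr / port on the node the worker will actually land on.
master_addr = get_bundle_node_ip(pg, 0)
master_port = get_free_port_in_bundle(pg, 0)

proc = ManagedSubprocess.spawn(
    label="vllm-worker-0",
    pg=pg,
    bundle_index=0,
    command=["vllm", "serve", "..."],  # placeholder command
    env={"MASTER_ADDR": master_addr, "MASTER_PORT": str(master_port)},
)
try:
    proc.wait(timeout=None)  # or poll proc.is_alive() / proc.read_log_tail()
finally:
    proc.stop(timeout_s=30)
```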
Verification
- ruff check nemo_curator/core/serve/ tests/core/serve/
- pytest tests/core/serve/test_placement.py tests/core/serve/test_subprocess_mgr.py tests/core/serve/dynamo/test_infra.py tests/core/test_utils.py tests/backends/test_utils.py::TestCheckTotalGpuCapacity -m "not gpu": 21 passed (~28s; setup-bound on the Ray-cluster autouse fixture)
- CUDA_VISIBLE_DEVICES=0,1 pytest tests/core/serve/test_placement.py tests/core/serve/test_subprocess_mgr.py -m gpu: 3 passed (TestReplicaLifecycle end-to-end, actor-death run_ref surfacing, orphan PG cleanup by prefix)

Checklist