refactor: extract Ray Serve backend boundary - inference (1/5) by praateekmahajan · Pull Request #1813 · NVIDIA-NeMo/Curator

praateekmahajan · 2026-04-15T22:18:18Z

Description

PR 1/5 of the inference-server restack.
Turn nemo_curator.core.serve into a package and move Ray Serve deployment, shutdown, and quiet runtime-env handling into nemo_curator/core/serve/ray_serve/backend.py.
Keep InferenceServer focused on lifecycle and backend dispatch while preserving the existing InferenceModelConfig / InferenceServer public API for this PR.
Rename the public activity helper from is_ray_serve_active() to is_inference_server_active() and move Ray Serve GPU integration coverage under tests/core/serve/ray_serve/test_integration.py.

Usage

from nemo_curator.core.serve import InferenceModelConfig, InferenceServer

config = InferenceModelConfig(
    model_identifier="google/gemma-3-27b-it",
    deployment_config={"autoscaling_config": {"min_replicas": 1, "max_replicas": 1}},
    engine_kwargs={"tensor_parallel_size": 1},
)

with InferenceServer(models=[config]) as server:
    print(server.endpoint)

Checklist

I am familiar with the Contributing Guide.
New or Existing tests cover these changes.
The documentation is up to date with these changes.

Signed-off-by: Praateek <praateekm@gmail.com>

greptile-apps · 2026-04-15T22:21:28Z

Greptile Summary

This PR refactors nemo_curator/core/serve from a single module into a package, extracting Ray Serve deployment, shutdown, and quiet runtime-env logic into RayServeBackend behind a new InferenceBackend Protocol. State management, atexit cleanup, and the error path on failed start are all handled correctly, and the public API (InferenceModelConfig, InferenceServer, is_inference_server_active) is preserved.
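
The backend boundary described in the summary can be pictured with a small sketch. It assumes the Protocol exposes only start() and stop(), which is what the sequence diagram in this review suggests; RecordingBackend and SketchServer are hypothetical names used for illustration and are not the classes in this PR.

```python
from typing import Protocol


class InferenceBackend(Protocol):
    """Structural interface: any object with start() and stop() qualifies."""

    def start(self) -> None: ...

    def stop(self) -> None: ...


class RecordingBackend:
    """Hypothetical stand-in for RayServeBackend; records calls instead of deploying."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def start(self) -> None:
        self.calls.append("start")

    def stop(self) -> None:
        self.calls.append("stop")


class SketchServer:
    """Illustrative lifecycle owner (not the real InferenceServer)."""

    def __init__(self, backend: InferenceBackend) -> None:
        self._backend = backend
        self._started = False

    def start(self) -> None:
        # The server owns lifecycle state; the backend owns deployment details.
        self._backend.start()
        self._started = True

    def stop(self) -> None:
        if not self._started:
            return
        self._backend.stop()
        self._started = False

    def __enter__(self) -> "SketchServer":
        self.start()
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.stop()


backend = RecordingBackend()
with SketchServer(backend) as server:
    pass  # a real caller would use the server's endpoint here
```

Because the Protocol is structural, RecordingBackend does not need to inherit from InferenceBackend, so a test can pass any object that has the two methods.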

Confidence Score: 5/5

Safe to merge; the only finding is a P2 test cleanup — a stale patch.object(serve, 'shutdown') that is dead code after the refactor.

All production code changes are correct — Protocol contract satisfied, atexit lifecycle is sound, error paths clean up state properly. The single finding is a non-blocking test quality issue (misleading patch in an existing test), and the actual delegation behavior it was meant to exercise is covered by the new test_start_stop_delegates_to_backend test.

tests/core/test_serve.py — test_stop_calls_shutdown has a stale patch.object(serve, 'shutdown') that should be removed or replaced with a stub backend.
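
One way to replace the stale patch with a stub backend is sketched below. It assumes the server builds its backend through a _create_backend() hook, as the sequence diagram indicates; SketchServer is a hypothetical stand-in, so the real test would patch InferenceServer instead.

```python
from unittest.mock import MagicMock, patch


class SketchServer:
    """Minimal stand-in for InferenceServer; only the delegation path is modeled."""

    def __init__(self) -> None:
        self._backend = None

    def _create_backend(self):
        # In this sketch the real backend is never built; tests patch this hook.
        raise RuntimeError("real backend not available in this sketch")

    def start(self) -> None:
        self._backend = self._create_backend()
        self._backend.start()

    def stop(self) -> None:
        if self._backend is not None:
            self._backend.stop()
            self._backend = None


def test_stop_delegates_to_backend() -> None:
    stub = MagicMock()
    # Patch the factory hook rather than serve.shutdown, which the server no longer calls.
    with patch.object(SketchServer, "_create_backend", return_value=stub):
        server = SketchServer()
        server.start()
        server.stop()
    stub.start.assert_called_once_with()
    stub.stop.assert_called_once_with()


test_stop_delegates_to_backend()
```

Asserting on the stub ties the test to the delegation contract, so it fails if stop() stops reaching the backend.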

Important Files Changed

Filename	Overview
nemo_curator/core/serve/__init__.py	New package init that re-exports InferenceModelConfig, InferenceServer, and is_inference_server_active, preserving the public API surface.
nemo_curator/core/serve/ray_serve/backend.py	New RayServeBackend class correctly encapsulates Ray Serve deploy/shutdown, client-cache reset, and quiet runtime env helpers moved from InferenceServer.
nemo_curator/core/serve/server.py	Refactored InferenceServer correctly introduces InferenceBackend Protocol, delegates start/stop to backend, and handles error paths and atexit cleanup properly.
tests/core/test_serve.py	Mostly well updated; test_stop_calls_shutdown has a stale patch.object(serve, 'shutdown') that is dead code after the refactor — shutdown is no longer called directly but via the backend.
tests/pipelines/test_pipelines.py	Cleanly switched to patching is_inference_server_active with a mock, removing fragile _active_servers mutation.
tests/core/serve/ray_serve/test_integration.py	Integration tests migrated from test_serve.py; fixture and test logic are equivalent to the old file with correct use of is_inference_server_active.
nemo_curator/pipeline/pipeline.py	Straightforward rename from is_ray_serve_active to is_inference_server_active; no behavioral change.

Sequence Diagram

sequenceDiagram
    participant U as User
    participant IS as InferenceServer
    participant B as RayServeBackend
    participant R as Ray / Serve

    U->>IS: start()
    IS->>IS: check _active_servers (singleton guard)
    IS->>IS: atexit.register(self.stop)
    IS->>IS: _create_backend()
    IS->>B: start()
    B->>R: ray.init(ignore_reinit_error=True)
    B->>B: _deploy()
    B->>R: serve.start(http_options={port})
    B->>R: serve.run(app, blocking=False)
    B->>IS: _wait_for_healthy()
    IS-->>B: healthy
    R-->>B: driver disconnects (context exit)
    B-->>IS: start() returns
    IS->>IS: _active_servers.add(name), _started=True
    IS-->>U: Inference server is ready at endpoint

    U->>IS: stop()
    IS->>B: stop()
    B->>R: ray.init(ignore_reinit_error=True)
    B->>R: serve.shutdown()
    R-->>B: done
    B-->>IS: stop() returns
    IS->>IS: _active_servers.discard(name), _started=False
    IS->>IS: atexit.unregister(self.stop)
    IS-->>U: done

Reviews (5): Last reviewed commit: "Merge branch 'main' into praateek/infese..."

praateekmahajan · 2026-04-15T22:30:13Z

/claude review

Signed-off-by: Praateek <praateekm@gmail.com>

abhinavg4

Simple refactor

praateekmahajan added 3 commits April 15, 2026 14:46

Refactor Ray Serve behind a backend

8914d69

Signed-off-by: Praateek <praateekm@gmail.com>

Preserve serve rename and simplify delegation test

f602d5e

Signed-off-by: Praateek <praateekm@gmail.com>

Move Ray Serve integration coverage under serve tests

018ffc4

Signed-off-by: Praateek <praateekm@gmail.com>

praateekmahajan requested a review from a team as a code owner April 15, 2026 22:18

praateekmahajan requested review from abhinavg4 and removed request for a team April 15, 2026 22:18

praateekmahajan changed the title ~~Refactor Ray Serve behind a backend boundary~~ Inference Server - 1/N - Refactor Ray Serve behind a backend boundary Apr 15, 2026

greptile-apps Bot reviewed Apr 15, 2026

Comment thread nemo_curator/core/serve/server.py Outdated

Comment thread nemo_curator/core/serve/server.py Outdated

praateekmahajan changed the title ~~Inference Server - 1/N - Refactor Ray Serve behind a backend boundary~~ refactor: Inference Server - 1 /5 - extract Ray Serve backend boundary Apr 15, 2026

praateekmahajan changed the title ~~refactor: Inference Server - 1 /5 - extract Ray Serve backend boundary~~ refactor: Inference Server - 1/5 - extract Ray Serve backend boundary Apr 15, 2026