refactor: extract Ray Serve backend boundary - inference (1/5) #1813
Conversation
Signed-off-by: Praateek <praateekm@gmail.com>
Greptile Summary

This PR refactors `nemo_curator.core.serve` into a package, extracting the Ray Serve backend boundary for inference.

Confidence Score: 5/5 — safe to merge. All production code changes are correct: the Protocol contract is satisfied, the atexit lifecycle is sound, and error paths clean up state properly. The only finding is a non-blocking P2 test-quality issue: in `tests/core/test_serve.py`, `test_stop_calls_shutdown` has a stale `patch.object(serve, 'shutdown')` that is dead code after the refactor and should be removed or replaced with a stub backend. The delegation behavior it was meant to exercise is already covered by the new `test_start_stop_delegates_to_backend` test.

Important Files Changed
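The suggested test cleanup — asserting delegation through an injected stub instead of patching `serve.shutdown` — could look like this. This is a hedged, self-contained sketch: the minimal `Server` class stands in for the real `InferenceServer`, whose constructor and wiring are not shown in this PR excerpt.

```python
from unittest.mock import MagicMock


def test_stop_delegates_to_backend():
    # Stub backend instead of the stale patch.object(serve, "shutdown").
    backend = MagicMock()

    # Hypothetical stand-in for InferenceServer, reduced to the part
    # under test: stop() should delegate to the injected backend.
    class Server:
        def __init__(self, backend):
            self._backend = backend

        def stop(self):
            self._backend.stop()

    Server(backend).stop()
    backend.stop.assert_called_once()
```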
Sequence Diagram

```mermaid
sequenceDiagram
    participant U as User
    participant IS as InferenceServer
    participant B as RayServeBackend
    participant R as Ray / Serve
    U->>IS: start()
    IS->>IS: check _active_servers (singleton guard)
    IS->>IS: atexit.register(self.stop)
    IS->>IS: _create_backend()
    IS->>B: start()
    B->>R: ray.init(ignore_reinit_error=True)
    B->>B: _deploy()
    B->>R: serve.start(http_options={port})
    B->>R: serve.run(app, blocking=False)
    B->>IS: _wait_for_healthy()
    IS-->>B: healthy
    R-->>B: driver disconnects (context exit)
    B-->>IS: start() returns
    IS->>IS: _active_servers.add(name), _started=True
    IS-->>U: Inference server is ready at endpoint
    U->>IS: stop()
    IS->>B: stop()
    B->>R: ray.init(ignore_reinit_error=True)
    B->>R: serve.shutdown()
    R-->>B: done
    B-->>IS: stop() returns
    IS->>IS: _active_servers.discard(name), _started=False
    IS->>IS: atexit.unregister(self.stop)
    IS-->>U: done
```
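The lifecycle in the diagram can be sketched in Python as follows. This is an illustration of the delegation pattern, not the PR's actual code: only `start`, `stop`, `_active_servers`, `_started`, and the atexit handling come from the diagram; the `ServeBackend` protocol name and constructor shape are assumptions.

```python
import atexit
from typing import Protocol


class ServeBackend(Protocol):
    """Contract the server delegates to (sketch of the backend boundary)."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


class InferenceServer:
    # Class-level registry acting as the singleton guard across instances.
    _active_servers: set[str] = set()

    def __init__(self, name: str, backend: ServeBackend) -> None:
        self.name = name
        self._backend = backend
        self._started = False

    def start(self) -> None:
        if self.name in self._active_servers:
            raise RuntimeError(f"server {self.name!r} is already active")
        atexit.register(self.stop)        # ensure cleanup on interpreter exit
        try:
            self._backend.start()         # backend does ray.init / serve.run
        except Exception:
            atexit.unregister(self.stop)  # error path cleans up state
            raise
        self._active_servers.add(self.name)
        self._started = True

    def stop(self) -> None:
        if not self._started:
            return
        self._backend.stop()              # backend does serve.shutdown()
        self._active_servers.discard(self.name)
        self._started = False
        atexit.unregister(self.stop)
```

Keeping `InferenceServer` ignorant of Ray means tests can exercise the full start/stop lifecycle with a plain stub backend, no Ray cluster required.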
Reviews (5): Last reviewed commit: "Merge branch 'main' into praateek/infese..."
Description

- Refactor `nemo_curator.core.serve` into a package and move Ray Serve deployment, shutdown, and quiet runtime-env handling into `nemo_curator/core/serve/ray_serve/backend.py`
- Keep `InferenceServer` focused on lifecycle and backend dispatch, while preserving the existing `InferenceModelConfig` / `InferenceServer` public API for this PR
- Keep `is_inference_server_active()`, and move Ray Serve GPU integration coverage under `tests/core/serve/ray_serve/test_integration.py`

Usage
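A minimal usage sketch under the public API named above (`InferenceModelConfig` / `InferenceServer`). The constructor arguments and import path details are assumptions; only the `start()`/`stop()` lifecycle comes from this PR.

```python
# Hypothetical sketch — not verified against the actual package layout.
from nemo_curator.core.serve import InferenceModelConfig, InferenceServer

config = InferenceModelConfig(...)  # model settings elided
server = InferenceServer(config)
server.start()      # deploys via the Ray Serve backend, waits until healthy
try:
    ...             # send requests to the reported endpoint
finally:
    server.stop()   # serve.shutdown() + atexit.unregister
```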
Checklist