Improve health endpoint by lvhan028 · Pull Request #4615 · InternLM/lmdeploy

lvhan028 · 2026-05-23T07:59:50Z

Improve the API server /health endpoint so it reflects inference engine health instead of only reporting that the HTTP server is alive.

This change adds backend health probing for both PyTorch and TurboMind engines. The API server now runs a background EngineHealthMonitor, caches the latest health snapshot, and
returns 503 when the inference backend is unhealthy while keeping 200 for healthy or sleeping engines.
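
The monitor-and-cache behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `lmdeploy/serve/core/health.py`: the class name `HealthMonitorSketch`, the helper `http_status_for`, the snapshot dict layout, and the status string `'sleeping'` are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


class HealthMonitorSketch:
    """Hypothetical stand-in: polls an async probe and caches the latest snapshot."""

    def __init__(self, probe: Callable[[], Awaitable[Dict[str, str]]], interval: float = 1.0):
        self._probe = probe
        self._interval = interval
        # Until the first probe completes, report unhealthy rather than guess.
        self._snapshot: Dict[str, str] = {'status': 'unhealthy', 'message': 'no probe completed yet'}
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        while True:
            try:
                self._snapshot = await self._probe()
            except asyncio.CancelledError:
                raise
            except Exception as e:
                self._snapshot = {'status': 'unhealthy', 'message': f'probe failed: {e}'}
            await asyncio.sleep(self._interval)

    def start(self) -> None:
        self._task = asyncio.create_task(self._run(), name='HealthMonitorSketch')

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    def snapshot(self) -> Dict[str, str]:
        # The request handler reads the cached value; it never probes the backend itself.
        return dict(self._snapshot)


def http_status_for(snapshot: Dict[str, str]) -> int:
    """Map a snapshot to an HTTP code: 200 for healthy or sleeping, 503 otherwise."""
    return 200 if snapshot.get('status') in ('healthy', 'sleeping') else 503
```

Serving the cached snapshot keeps `/health` cheap and bounded even when the backend probe itself is slow.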

The health probe uses a bounded, non-overlapping backend check and validates scheduler progress with a backend-owned monotonic scheduler_tick. This allows /health to detect cases
where requests have been dispatched but the backend scheduler stops making progress. Idle periods are handled separately so the backend is not marked unhealthy simply because there is
no active work.
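
One way to picture the stall check is the sketch below. It assumes a hypothetical `ProgressTracker` with an illustrative `stall_timeout` threshold; only the names `scheduler_tick` and `num_dispatched` come from this PR, and the actual validation logic in `async_engine.py` may differ.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressTracker:
    """Hypothetical helper: flags a stall when the tick stops advancing under load."""
    stall_timeout: float = 30.0  # illustrative threshold in seconds, not from the PR
    last_tick: Optional[int] = None
    last_progress_time: float = 0.0

    def check(self, scheduler_tick: int, num_dispatched: int, now: Optional[float] = None) -> bool:
        """Return True if the backend looks healthy, False if it appears stalled."""
        now = time.monotonic() if now is None else now
        if self.last_tick is None or scheduler_tick != self.last_tick:
            # First observation, or the scheduler advanced since the last probe.
            self.last_tick = scheduler_tick
            self.last_progress_time = now
            return True
        if num_dispatched == 0:
            # Idle: an unchanged tick is expected, so restart the stall clock.
            self.last_progress_time = now
            return True
        # Work is dispatched but the tick has not moved; allow a grace period.
        return (now - self.last_progress_time) < self.stall_timeout
```

Resetting the clock while idle is what prevents a quiet server from being reported as unhealthy the moment a new request arrives.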

Both engines expose scheduler_tick through schedule metrics, which are updated on every inference iteration, so health probing sees the current sequence/block state.

Besides `scheduler_tick`, the PyTorch engine health status now also checks engine loop/task liveness.

Copilot

Pull request overview

This PR enhances the OpenAI server /health endpoint by adding an engine health monitor that actively probes backend liveness and detects scheduler stalls via a new monotonic scheduler_tick metric surfaced from both TurboMind (C++/pybind) and PyTorch backends.

Changes:

Add scheduler_tick to schedule metrics across TurboMind (C++ + Python binding) and PyTorch scheduler metrics.
Introduce EngineHealthMonitor + AsyncEngine.health_probe() and wire /health to return structured JSON with 200/503 based on engine status.
Add lightweight backend-specific get_health_status() implementations (TurboMind, PyTorch, mp engines) for the health probe.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

File	Description
`src/turbomind/utils/metrics.h`	Adds `scheduler_tick` field to TurboMind schedule metrics and prints it in the stream operator.
`src/turbomind/python/bind.cpp`	Exposes `scheduler_tick` to Python via pybind `ScheduleMetrics`.
`src/turbomind/engine/engine.cc`	Tracks `scheduler_tick` and adjusts schedule-metrics update/get logic (also initializes metrics after seq manager creation).
`lmdeploy/turbomind/turbomind.py`	Propagates `scheduler_tick` into Python `ScheduleMetrics` and adds TurboMind `get_health_status()`.
`lmdeploy/serve/openai/api_server.py`	Switches `/health` to JSON output backed by `EngineHealthMonitor`; wires monitor into FastAPI lifespan.
`lmdeploy/serve/managers/session_manager.py`	Adds `num_dispatched` to track checked-out request handles for stall detection logic.
`lmdeploy/serve/core/health.py`	New `EngineHealthMonitor` background task that periodically probes engine health.
`lmdeploy/serve/core/async_engine.py`	Adds bounded, non-overlapping health probing + scheduler progress validation.
`lmdeploy/serve/core/__init__.py`	Exports `EngineHealthMonitor`.
`lmdeploy/pytorch/paging/scheduler.py`	Adds `scheduler_tick` and includes it in schedule metrics.
`lmdeploy/pytorch/engine/mp_engine/zmq_engine.py`	Adds health status check for ZMQ process liveness before probing.
`lmdeploy/pytorch/engine/mp_engine/base.py`	Adds `get_health_status()` RPC wrapper.
`lmdeploy/pytorch/engine/mp_engine/base_worker.py`	Adds RPC-exposed `get_health_status()` implementation.
`lmdeploy/pytorch/engine/engine.py`	Adds PyTorch engine `get_health_status()` checking request/main loop task liveness.
`lmdeploy/pytorch/engine/engine_loop.py`	Increments scheduler tick on each main-loop iteration.
`lmdeploy/pytorch/engine/base.py`	Adds `get_health_status()` to the engine base interface.
`lmdeploy/messages.py`	Adds `scheduler_tick` field to the Python `ScheduleMetrics` dataclass.

+    @staticmethod
+    def _health_check_tasks(tasks):
+        done_tasks = []
+        for task in list(tasks):
+            if task.done():
+                done_tasks.append(task.get_name())
+        return len(done_tasks) == 0, done_tasks
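
To show what the helper above reports, here is a small standalone demonstration. The function body mirrors the diff; the task names `MainLoop` and `PreprocessLoop` and the `demo` coroutine are hypothetical and exist only for this example.

```python
import asyncio


def health_check_tasks(tasks):
    """Standalone copy of the helper above: any finished task counts as a failure."""
    done_tasks = [task.get_name() for task in list(tasks) if task.done()]
    return len(done_tasks) == 0, done_tasks


async def demo():
    async def runs_forever():
        await asyncio.Event().wait()

    async def crashes():
        raise RuntimeError('loop died')

    # Task names are hypothetical, chosen for illustration only.
    alive = asyncio.create_task(runs_forever(), name='MainLoop')
    dead = asyncio.create_task(crashes(), name='PreprocessLoop')
    await asyncio.sleep(0.01)  # give both tasks a chance to run

    result = health_check_tasks([alive, dead])

    dead.exception()  # mark the exception as retrieved
    alive.cancel()
    try:
        await alive
    except asyncio.CancelledError:
        pass
    return result


if __name__ == '__main__':
    print(asyncio.run(demo()))  # (False, ['PreprocessLoop'])
```

Because engine loops are expected to run for the lifetime of the process, a task that is `done()` for any reason (exception, cancellation, or normal return) is treated as a sign that the loop has exited.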

+        self._health_probe_task = asyncio.create_task(self.engine.get_health_status(), name='EngineHealthProbe')
+        try:
+            backend_status = await asyncio.wait_for(asyncio.shield(self._health_probe_task), timeout=timeout)
+        except asyncio.TimeoutError:
+            return self._make_health_result(
+                status='unhealthy',
+                message=f'Backend health probe timed out after {timeout:.1f}s.',
+            )
+        except Exception as e:
+            self._health_probe_task = None
+            return self._make_health_result(
+                status='unhealthy',
+                message=f'Backend health probe failed: {e}',
+            )
+
+        self._health_probe_task = None
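
The bounded, non-overlapping pattern in this hunk can be exercised in isolation with the sketch below. `BoundedProbe` and its result dicts are hypothetical, and the in-flight guard at the top of `probe` is an assumption about how overlap is avoided, since that part is not visible in the excerpt.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class BoundedProbe:
    """Hypothetical wrapper: at most one backend check in flight, each wait bounded."""

    def __init__(self, backend_check: Callable[[], Awaitable[dict]]):
        self._backend_check = backend_check
        self._task: Optional[asyncio.Task] = None

    async def probe(self, timeout: float) -> dict:
        if self._task is not None:
            if not self._task.done():
                # Assumed guard: a probe that timed out earlier is still running,
                # so report unhealthy instead of stacking another backend call.
                return {'status': 'unhealthy', 'message': 'previous probe still running'}
            self._task = None  # the stale probe has finished; drop it

        self._task = asyncio.create_task(self._backend_check(), name='EngineHealthProbe')
        try:
            # shield() keeps the backend call alive when wait_for() gives up,
            # so only the waiting is cancelled, not the probe itself.
            result = await asyncio.wait_for(asyncio.shield(self._task), timeout=timeout)
        except asyncio.TimeoutError:
            # Keep self._task so the next call can see it is still in flight.
            return {'status': 'unhealthy', 'message': f'timed out after {timeout:.1f}s'}
        except Exception as e:
            self._task = None
            return {'status': 'unhealthy', 'message': f'probe failed: {e}'}

        self._task = None
        return result
```

Leaving the task reference in place on timeout is deliberate in this sketch: it is what lets the next call detect the in-flight probe rather than launching a second one against a backend that may already be stuck.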


                               total_blocks=tm_metrics.total_blocks,
                               active_blocks=tm_metrics.active_blocks,
-                               free_blocks=tm_metrics.free_blocks)
+                               free_blocks=tm_metrics.free_blocks,


lvhan028 added 3 commits May 22, 2026 12:55

Improve api_server health check with backend probes

c04e598

Report /health based on backend liveness, bounded probe execution,

8feca5f

remove rest_health.py

60b7e46

Copilot AI review requested due to automatic review settings May 23, 2026 07:59

Copilot started reviewing on behalf of lvhan028 May 23, 2026 08:00

lvhan028 added the improvement label May 23, 2026

Copilot AI reviewed May 23, 2026

Comment thread lmdeploy/turbomind/turbomind.py Outdated

Comment thread src/turbomind/engine/engine.cc Outdated

Comment thread lmdeploy/serve/core/health.py

lvhan028 added 2 commits May 25, 2026 07:40

fix lint

b05d299

fix according to reviewer comments

2be204d

lvhan028 requested a review from Copilot May 25, 2026 08:13

Copilot started reviewing on behalf of lvhan028 May 25, 2026 08:13

Copilot AI reviewed May 25, 2026

fix according to reviewer comments

f938c6a

lvhan028 requested a review from Copilot May 25, 2026 08:46

Copilot started reviewing on behalf of lvhan028 May 25, 2026 08:46

lvhan028 requested a review from lzhangzz May 25, 2026 08:47

Copilot AI reviewed May 25, 2026

lvhan028 requested a review from grimoire May 25, 2026 09:45

grimoire reviewed May 25, 2026

Comment thread lmdeploy/pytorch/paging/scheduler.py

grimoire approved these changes May 26, 2026

lzhangzz approved these changes May 27, 2026

lvhan028 merged commit 4dad4c9 into InternLM:main May 28, 2026
9 checks passed

lvhan028 merged 6 commits into InternLM:main from lvhan028:improve-health

4 participants