From 6c97e39a50b2aa283e982c3f4755b1d0d939a3d3 Mon Sep 17 00:00:00 2001
From: Youhe Jiang <85312798+Youhe-Jiang@users.noreply.github.com>
Date: Mon, 29 Sep 2025 11:05:29 +0100
Subject: [PATCH] Update README.md

---
 src/scheduling/README.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/scheduling/README.md b/src/scheduling/README.md
index ce6118c2..13b1cf13 100644
--- a/src/scheduling/README.md
+++ b/src/scheduling/README.md
@@ -2,8 +2,17 @@
 
 This directory implements a two-phase scheduler for distributed LLM inference:
 
-- **Phase 1 — Layer allocation**: assign contiguous decoder layer ranges to nodes and rebalance in place.
-- **Phase 2 — Request routing**: compute an end-to-end, minimum-latency path across the assigned node ranges.
+### Phase 1 — Layer allocation
+
+Assign contiguous decoder layer ranges to nodes and rebalance in place, as illustrated below:
+
+[Figure: parallax_1]
+
+### Phase 2 — Request routing
+
+Compute an end-to-end, minimum-latency path across the assigned node ranges, as illustrated below:
+
+[Figure: parallax_2]
 
 The main entrypoint is `scheduling.scheduler.Scheduler`, which orchestrates allocation, dynamic joins/leaves, health checks, and routing.
 
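The README describes Phase 2 (request routing) only at a high level. The sketch below shows one way a minimum-latency path over contiguous layer ranges could be computed. It is not the `scheduling.scheduler.Scheduler` implementation. `Node`, `route`, `compute_ms` and `rtt_ms` are hypothetical names. The sketch assumes that a request may hand off from node `u` to node `v` only when `v`'s range starts exactly where `u`'s ends. It also assumes every per-node compute latency and per-link latency is non-negative, which lets plain Dijkstra find the cheapest chain.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical node model: serves decoder layers in the half-open range [start, end)."""
    name: str
    start: int
    end: int
    compute_ms: float  # assumed latency for this node to run its layers once


def route(
    nodes: List[Node],
    num_layers: int,
    rtt_ms: Callable[[Node, Node], float],
) -> Optional[Tuple[float, List[str]]]:
    """Return (total latency, node names) of the cheapest chain covering
    layers [0, num_layers), or None if no such chain exists.

    A hop u -> v is allowed only when v.start == u.end (assumption); its cost
    is the link latency rtt_ms(u, v) plus v.compute_ms. All costs must be >= 0.
    """
    # Index nodes by the first layer they serve, so successors are a dict lookup.
    by_start: Dict[int, List[Node]] = {}
    for n in nodes:
        by_start.setdefault(n.start, []).append(n)

    # Heap entries: (cost so far, insertion counter as tie-breaker, node, path).
    heap: List[Tuple[float, int, Node, List[str]]] = []
    counter = 0
    for n in by_start.get(0, []):
        heapq.heappush(heap, (n.compute_ms, counter, n, [n.name]))
        counter += 1

    settled = set()
    while heap:
        cost, _, u, path = heapq.heappop(heap)
        if u.end == num_layers:
            # With non-negative costs, the first complete chain popped is optimal.
            return cost, path
        if u in settled:
            continue  # already expanded via a cheaper or equal chain
        settled.add(u)
        for v in by_start.get(u.end, []):
            new_cost = cost + rtt_ms(u, v) + v.compute_ms
            heapq.heappush(heap, (new_cost, counter, v, path + [v.name]))
            counter += 1
    return None


# Two hypothetical pipelines for a 32-layer model, split at different layers.
nodes = [
    Node("a", 0, 16, 12.0),
    Node("b", 16, 32, 10.0),
    Node("c", 0, 20, 15.0),
    Node("d", 20, 32, 6.0),
]
# Assumed uniform 5 ms link latency between any two nodes.
print(route(nodes, 32, lambda u, v: 5.0))  # → (26.0, ['c', 'd'])
```

Here the `c → d` chain wins (15 + 5 + 6 = 26 ms versus 12 + 5 + 10 = 27 ms for `a → b`), even though `a` alone is faster than `c`. This is why routing is framed as an end-to-end path problem rather than a per-hop greedy choice.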