From 6c97e39a50b2aa283e982c3f4755b1d0d939a3d3 Mon Sep 17 00:00:00 2001
From: Youhe Jiang <85312798+Youhe-Jiang@users.noreply.github.com>
Date: Mon, 29 Sep 2025 11:05:29 +0100
Subject: [PATCH] Update README.md

---
 src/scheduling/README.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/src/scheduling/README.md b/src/scheduling/README.md
index ce6118c2..13b1cf13 100644
--- a/src/scheduling/README.md
+++ b/src/scheduling/README.md
@@ -2,8 +2,17 @@
 
 This directory implements a two-phase scheduler for distributed LLM inference:
 
-- **Phase 1 — Layer allocation**: assign contiguous decoder layer ranges to nodes and rebalance in place.
-- **Phase 2 — Request routing**: compute an end-to-end, minimum-latency path across the assigned node ranges.
+### Phase 1 — Layer allocation
+
+Assign contiguous decoder layer ranges to nodes and rebalance in place, as illustrated below:
+
+<img width="1874" height="852" alt="Phase 1: layer allocation of contiguous decoder layer ranges to nodes" src="https://github.com/user-attachments/assets/c57cde77-0cda-48fc-b1ad-6d4aa1b1787b" />
+
+### Phase 2 — Request routing
+
+Compute an end-to-end, minimum-latency path across the assigned node ranges, as illustrated below:
+
+<img width="1828" height="705" alt="Phase 2: request routing along a minimum-latency path across the assigned node ranges" src="https://github.com/user-attachments/assets/8a6b4d8f-8d97-402b-ba84-3ce61e4ee313" />
 
 The main entrypoint is `scheduling.scheduler.Scheduler`, which orchestrates allocation, dynamic joins/leaves, health checks, and routing.