Report inference, forward, and network RPS separately #358

borzunov · 2023-07-16T00:52:12Z

Inference RPS may be very different from forward RPS. For example, bnb currently uses a completely different algorithm for NF4 inference. We now report detailed RPS info that can then be used for shortest-path routing for inference.
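As a rough illustration of why the rates are reported separately, here is a minimal sketch of a per-server report and of how a client might turn it into a cost estimate for one inference step. The `ThroughputReport` class, its field names, and `inference_step_cost` are hypothetical names chosen for this example, and the linear cost model is an assumption rather than the actual Petals implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThroughputReport:
    """Hypothetical container for separately reported rates (requests per second)."""

    inference_rps: float  # rate for single-token autoregressive steps
    forward_rps: float    # rate for batched forward passes
    network_rps: float    # rate the network link can sustain


def inference_step_cost(report: ThroughputReport, num_blocks: int) -> float:
    """Estimate seconds per inference step through `num_blocks` blocks.

    Assumption: compute time grows linearly with the number of blocks,
    and each step also pays for one network transfer.
    """
    compute_time = num_blocks / report.inference_rps
    network_time = 1.0 / report.network_rps
    return compute_time + network_time


# Forward RPS is much higher than inference RPS here, so using it for
# inference routing would badly underestimate the cost of this server.
report = ThroughputReport(inference_rps=50.0, forward_rps=2000.0, network_rps=100.0)
cost = inference_step_cost(report, num_blocks=10)
```

Keeping the three numbers apart lets a router pick the rate that matches the workload instead of relying on a single blended throughput value.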

borzunov · 2023-07-16T00:53:54Z

src/petals/utils/ping.py

@@ -16,7 +16,7 @@ async def ping(
    _dht: hivemind.DHT,
    node: hivemind.dht.DHTNode,
    *,
-    wait_timeout: float = 1,
+    wait_timeout: float = 5,


Peers that use a relay may yield a large first ping, but it will be smoothed out over time.
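One common way to smooth out a single slow measurement is an exponential moving average over round-trip times. The sketch below assumes that approach; the `SmoothedPing` class and its `alpha` parameter are hypothetical and are not taken from `ping.py`.

```python
from typing import Optional


class SmoothedPing:
    """Hypothetical exponential moving average over round-trip times (seconds)."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha  # weight given to the newest measurement
        self.value: Optional[float] = None

    def update(self, rtt: float) -> float:
        if self.value is None:
            # The first measurement is taken as-is, even if it is an outlier.
            self.value = rtt
        else:
            self.value = self.alpha * rtt + (1 - self.alpha) * self.value
        return self.value


ping = SmoothedPing(alpha=0.5)
first = ping.update(4.0)   # slow first ping through a relay
second = ping.update(0.0)  # later pings pull the estimate down
third = ping.update(0.0)
```

With this scheme a large first ping still has to complete within `wait_timeout` to be recorded at all, which is consistent with raising the timeout in the diff above.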

This PR:

1. **Adds shortest path routing for inference.** We build a graph with client-server and server-server latencies and compute costs, as well as empirically measured overheads. For client-server latencies, we ping possible first and last servers in a sequence in `SequenceManager.update()`. We penalize servers that may not have enough cache for our request. This uses info added to DHT in #355, #356, #358.
2. **Makes a server ping neighboring servers in addition to next ones.** This gives the client an opportunity to switch servers before using all of the current server's blocks (e.g., because a neighboring server is faster). This feature is not enabled yet, since it increases the graph size for N servers to O(N^2), but we may enable it if needed.
3. **Fixes a `SequenceManager` bug with the first `update()`.** Previously, this update was likely to produce incorrect information and cause `MissingBlocksError`s until the next update happened.
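The routing idea can be sketched with Dijkstra's algorithm over states of the form (next block to compute, node currently holding the activations). This is a simplified illustration under stated assumptions, not the code from the PR: the `Server` tuple, the `shortest_chain` function, the `latency` dictionary, and `default_latency` are all hypothetical, and the sketch omits the cache penalty and measured overheads mentioned above.

```python
import heapq
from typing import Dict, List, NamedTuple, Tuple


class Server(NamedTuple):
    name: str
    start: int            # first block served (inclusive)
    end: int              # last block served (exclusive)
    sec_per_block: float  # assumed compute time per block, e.g. 1 / inference RPS


def shortest_chain(
    servers: List[Server],
    latency: Dict[Tuple[str, str], float],
    num_blocks: int,
    default_latency: float = 1.0,
) -> Tuple[float, List[str]]:
    """Return (total cost, server names in order) for the cheapest chain."""
    source, target = (0, "client"), (num_blocks, "client")
    best = {source: 0.0}
    parent = {}
    queue = [(0.0, source)]
    while queue:
        cost, state = heapq.heappop(queue)
        if state == target:
            chain = []
            while state in parent:  # walk back to the source
                state = parent[state]
                if state[1] != "client":
                    chain.append(state[1])
            return cost, chain[::-1]
        if cost > best[state]:
            continue  # stale queue entry
        position, holder = state
        if position == num_blocks:
            # All blocks are done: send the result back to the client.
            edges = [(target, latency.get((holder, "client"), default_latency))]
        else:
            # Hand over to any server that holds the next block and run
            # all of its remaining blocks.
            edges = []
            for s in servers:
                if s.start <= position < s.end:
                    stop = min(s.end, num_blocks)
                    hop = latency.get((holder, s.name), default_latency)
                    edges.append(((stop, s.name), hop + (stop - position) * s.sec_per_block))
        for nxt, weight in edges:
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = state
                heapq.heappush(queue, (new_cost, nxt))
    raise ValueError("no chain of servers covers all blocks")


servers = [Server("a", 0, 4, 0.1), Server("b", 0, 2, 0.01), Server("c", 2, 4, 0.01)]
latency = {
    ("client", "a"): 0.05, ("a", "client"): 0.05,
    ("client", "b"): 0.05, ("b", "c"): 0.02, ("c", "client"): 0.05,
}
cost, chain = shortest_chain(servers, latency, num_blocks=4)
```

In this toy setup the two fast servers `b` and `c` beat the single slow server `a` even though the chain needs an extra hop. A penalty for servers that may lack cache could be modeled as an extra term added to the edge weight.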

borzunov added 2 commits July 16, 2023 00:06

Report more throughput info

b4bd918

Increase wait_timeout for peers using relay

f83a3bb

borzunov commented Jul 16, 2023

borzunov requested a review from justheuristic July 16, 2023 00:56

borzunov and others added 4 commits July 16, 2023 01:17

Fix tests

c2a0e2e

Don't use past cache on forward benchmark

3b6cdb7

Merge branch 'main' into report-speeds

726d8be

Pass full server info to make_sequence()

37729c1

borzunov force-pushed the report-speeds branch from ddd8030 to 37729c1 July 17, 2023 06:31

borzunov merged commit 11f0d99 into main Jul 17, 2023
7 checks passed

borzunov deleted the report-speeds branch July 17, 2023 09:46

borzunov mentioned this pull request Jul 18, 2023

Implement shortest-path routing for inference #362

Merged
