Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report inference, forward, and network RPS separately #358

Merged
merged 6 commits into from
Jul 17, 2023
Merged

Conversation

borzunov
Copy link
Collaborator

@borzunov borzunov commented Jul 16, 2023

Inference RPS may be very different from forward RPS. E.g., currently bnb uses a completely different algorithm for NF4 inference. We report detailed RPS info that can be then used for shortest-path routing for inference.

@@ -16,7 +16,7 @@ async def ping(
_dht: hivemind.DHT,
node: hivemind.dht.DHTNode,
*,
wait_timeout: float = 1,
wait_timeout: float = 5,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peers that use relay may yield large first ping, but it'll be smoothed in future.

@borzunov borzunov merged commit 11f0d99 into main Jul 17, 2023
7 checks passed
@borzunov borzunov deleted the report-speeds branch July 17, 2023 09:46
borzunov added a commit that referenced this pull request Jul 18, 2023
This PR:

1. **Adds shortest path routing for inference.** We build a graph with client-server and server-server latencies and compute costs, as well as empirically measured overheads. For client-server latencies, we ping possible first and last servers in a sequence in `SequenceManager.update()`. We penalize servers who may not have enough cache for our request. This uses info added to DHT in #355, #356, #358.

2. **Makes a server ping neighboring servers in addition to next ones.** This is to get an opportunity to change the server even before we use all its blocks (e.g., because a neighboring server is faster). This feature is not enabled though, since it increases graph size for N servers to O(N^2) - but we may enable it if needed.

3. **Fixes a `SequenceManager` bug with the first `update()`.** Previously, this update was likely to produce incorrect information and cause to `MissingBlocksErrors` until the next update happens.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant