Make a server ping next servers #356

Merged
merged 8 commits into from
Jul 15, 2023

@borzunov borzunov commented Jul 15, 2023

This PR makes a server ping the potential next servers in a chain and report the measured RTTs to the DHT:

[Image: screenshot taken 2023-07-15 at 20:14:55]

This will be used for shortest-path routing.
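
To illustrate the idea, here is a minimal sketch of how a server might pick its potential next servers and collect RTTs for them. This is not the code from this PR: `ServerSpan`, `find_next_servers`, `measure_next_pings`, and the injected `ping` callable are hypothetical names, and the sketch assumes that a "next" server is one hosting the block right after the current server's last block.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ServerSpan:
    """Hypothetical record: which contiguous blocks a server hosts."""
    peer_id: str
    start: int  # first hosted block (inclusive)
    end: int    # one past the last hosted block (exclusive)


def find_next_servers(me: ServerSpan, others: List[ServerSpan]) -> List[ServerSpan]:
    """Return servers hosting the block that follows our last one (assumed definition of "next")."""
    return [
        s for s in others
        if s.peer_id != me.peer_id and s.start <= me.end < s.end
    ]


def measure_next_pings(
    me: ServerSpan,
    others: List[ServerSpan],
    ping: Callable[[str], float],
) -> Dict[str, float]:
    """Ping every potential next server and map peer_id -> RTT in seconds.

    `ping` is injected so the sketch stays independent of any networking library.
    """
    return {s.peer_id: ping(s.peer_id) for s in find_next_servers(me, others)}
```

The resulting mapping is the kind of per-server data that could then be published alongside the rest of the server's info, so that clients can read server-to-server latencies without measuring them themselves.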

module_uids,
server_info,
expiration_time=get_dht_time() + expiration,
)

Refactored away

from hivemind.utils.logging import get_logger

logger = get_logger(__name__)


@borzunov borzunov merged commit 81c4a45 into main Jul 15, 2023
7 checks passed
@borzunov borzunov deleted the ping branch July 15, 2023 16:16
borzunov added a commit that referenced this pull request Jul 18, 2023
This PR:

1. **Adds shortest path routing for inference.** We build a graph from client-server and server-server latencies, compute costs, and empirically measured overheads. To obtain client-server latencies, we ping the possible first and last servers of a sequence in `SequenceManager.update()`. We penalize servers that may not have enough cache for our request. This uses the info added to the DHT in #355, #356, and #358.

2. **Makes a server ping neighboring servers in addition to next ones.** This gives us the opportunity to switch to another server before we have used all blocks of the current one (e.g., because a neighboring server is faster). The feature is currently disabled, since it increases the graph size for N servers to O(N^2), but we may enable it if needed.

3. **Fixes a `SequenceManager` bug with the first `update()`.** Previously, this update was likely to produce incorrect information and cause `MissingBlocksError`s until the next update happened.
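
The routing idea in item 1 can be sketched with plain Dijkstra over a small graph. This is a simplified illustration under stated assumptions, not the implementation from the commit: `shortest_chain` and all of its parameters are hypothetical, each server is assumed to run every block from the entry point to the end of its span, edge weights are one-way latency plus per-block compute time, and the cache penalty and measured overheads are left out.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Span = Tuple[int, int]  # (start inclusive, end exclusive)


def shortest_chain(
    n_blocks: int,
    spans: Dict[str, Span],
    block_time: Dict[str, float],
    client_latency: Dict[str, float],
    server_latency: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Return (total cost, ordered peer ids) of the cheapest chain covering blocks 0..n_blocks."""
    SRC, DST = "<client:start>", "<client:end>"

    def edges(node: str) -> Iterator[Tuple[str, float]]:
        if node == SRC:
            # Client -> any server that hosts block 0.
            for peer, (start, end) in spans.items():
                if start == 0:
                    yield peer, client_latency[peer] + (end - start) * block_time[peer]
            return
        if node == DST:
            return
        _, end = spans[node]
        if end == n_blocks:
            # Last server -> back to the client.
            yield DST, client_latency[node]
            return
        # Server -> "next" servers that host the block right after ours
        # and for which a measured latency is available.
        for peer, (start, peer_end) in spans.items():
            if start <= end < peer_end and (node, peer) in server_latency:
                yield peer, server_latency[(node, peer)] + (peer_end - end) * block_time[peer]

    best = {SRC: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, SRC)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        if node == DST:
            break
        for nxt, weight in edges(node):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if DST not in best:
        raise ValueError("no chain of servers covers all blocks")

    path: List[str] = []
    node = prev[DST]
    while node != SRC:
        path.append(node)
        node = prev[node]
    return best[DST], path[::-1]
```

Because every server in this sketch is only left after its last block, one graph node per server is enough. Allowing a switch to a neighboring server mid-span (item 2) would require extra nodes or edges per entry point, which is where the O(N^2) growth mentioned above comes from.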