Share more info about a server in DHT #355

Merged: borzunov merged 10 commits into main from more-dht-info on Jul 14, 2023

borzunov
Collaborator

@borzunov borzunov commented Jul 14, 2023

Servers now share the info below in the DHT, and the DHT record format is validated with pydantic:

```python
@pydantic.dataclasses.dataclass
class ServerInfo:
    state: ServerState
    throughput: pydantic.confloat(ge=0, allow_inf_nan=False, strict=True)

    adapters: Sequence[str] = ()
    version: Optional[str] = None
    torch_dtype: Optional[str] = None
    quant_type: Optional[str] = None
    using_relay: Optional[bool] = None
    cache_tokens_left: Optional[pydantic.conint(ge=0, strict=True)] = None
```

Then it can be displayed in the health service:

[Image: Screenshot 2023-07-15 at 03 35 45]
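
To make the numeric constraints in `ServerInfo` concrete, here is a dependency-free sketch of the kind of checks they express: `throughput` must be a finite, non-negative float, and `cache_tokens_left`, if present, must be a non-negative integer. This is not how the PR does it (the PR relies on pydantic), and `check_numeric_fields` is a hypothetical helper name. The sketch accepts only `float` for `throughput` and rejects `bool` for the integer field; pydantic's exact strict-mode rules may differ in such corner cases.

```python
import math
from typing import Any, Dict


def check_numeric_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: raise ValueError if a record breaks the numeric constraints."""
    throughput = record.get("throughput")
    # Assumption of this sketch: no coercion, so strings and ints are rejected.
    if not isinstance(throughput, float):
        raise ValueError("throughput must be a float")
    # Mirrors ge=0 and allow_inf_nan=False.
    if math.isnan(throughput) or math.isinf(throughput) or throughput < 0:
        raise ValueError("throughput must be finite and non-negative")

    tokens = record.get("cache_tokens_left")
    if tokens is not None:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(tokens, bool) or not isinstance(tokens, int):
            raise ValueError("cache_tokens_left must be an integer")
        if tokens < 0:
            raise ValueError("cache_tokens_left must be non-negative")
    return record


def is_valid(record: Dict[str, Any]) -> bool:
    """Convenience wrapper that returns False instead of raising."""
    try:
        check_numeric_fields(record)
    except ValueError:
        return False
    return True
```

Validating on read matters here because DHT records come from other peers, so a reader cannot assume that a record is well-formed.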

@borzunov borzunov marked this pull request as draft July 14, 2023 20:01
@borzunov borzunov force-pushed the more-dht-info branch 4 times, most recently from 13c5717 to a484cc7 Compare July 14, 2023 21:26
@borzunov borzunov marked this pull request as ready for review July 14, 2023 23:35
@borzunov borzunov merged commit 2c8959e into main Jul 14, 2023
7 checks passed
@borzunov borzunov deleted the more-dht-info branch July 14, 2023 23:36
borzunov added a commit that referenced this pull request Jul 18, 2023
This PR:

1. **Adds shortest path routing for inference.** We build a graph with client-server and server-server latencies and compute costs, as well as empirically measured overheads. For client-server latencies, we ping possible first and last servers in a sequence in `SequenceManager.update()`. We penalize servers that may not have enough cache for our request. This uses info added to the DHT in #355, #356, and #358.

2. **Makes a server ping neighboring servers in addition to next ones.** This gives us an opportunity to switch servers even before we use all of a server's blocks (e.g., because a neighboring server is faster). This feature is not enabled yet, since it increases the graph size for N servers to O(N^2), but we may enable it if needed.

3. **Fixes a `SequenceManager` bug with the first `update()`.** Previously, this update was likely to produce incorrect information and cause `MissingBlocksError`s until the next update happened.
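
The routing idea in item 1 can be sketched with Dijkstra's algorithm. This is a simplified illustration under stated assumptions, not the code from the commit: all names (`shortest_chain`, `spans`, `block_cost`, `penalty`) and all numbers are hypothetical, every server is used through the end of its span (matching the fact that early switching from item 2 is not enabled), and a fixed additive penalty stands in for "may not have enough cache".

```python
import heapq
from typing import Dict, List, Optional, Tuple

CLIENT = "client"      # hypothetical node name for the client
DONE = (-1, CLIENT)    # terminal state: the output is back at the client


def shortest_chain(
    n_blocks: int,
    spans: Dict[str, Tuple[int, int]],      # server -> [first, last) blocks it hosts
    latency: Dict[Tuple[str, str], float],  # (from, to) -> seconds
    block_cost: Dict[str, float],           # server -> seconds per block
    penalty: Optional[Dict[str, float]] = None,  # e.g. servers short on cache
) -> Tuple[float, List[str]]:
    """Return (total cost, servers in order) covering blocks [0, n_blocks), n_blocks >= 1."""
    penalty = penalty or {}
    start = (0, CLIENT)  # state = (next block to run, node holding the activations)
    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        cost, state = heapq.heappop(heap)
        if cost > best.get(state, float("inf")):
            continue  # stale heap entry
        if state == DONE:
            chain = []
            while state in parent:
                state = parent[state]
                if state[1] != CLIENT:
                    chain.append(state[1])
            return cost, chain[::-1]
        block, prev = state
        if block == n_blocks:
            # All blocks are done: pay the latency back to the client.
            edges = [(DONE, latency[(prev, CLIENT)])]
        else:
            edges = []
            for server, (first, last) in spans.items():
                if first <= block < last:
                    step = (latency[(prev, server)]
                            + (last - block) * block_cost[server]
                            + penalty.get(server, 0.0))
                    edges.append(((last, server), step))
        for nxt, step in edges:
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = state
                heapq.heappush(heap, (new_cost, nxt))
    raise ValueError("no chain of servers covers all blocks")


# Made-up example: A and B host half of the model each, C hosts all of it.
spans = {"A": (0, 2), "B": (2, 4), "C": (0, 4)}
latency = {(CLIENT, "A"): 0.01, (CLIENT, "C"): 0.05, ("A", "B"): 0.01,
           ("A", "C"): 0.03, ("B", CLIENT): 0.01, ("C", CLIENT): 0.05}
block_cost = {"A": 0.02, "B": 0.02, "C": 0.05}

cost, chain = shortest_chain(4, spans, latency, block_cost)
cost_pen, chain_pen = shortest_chain(4, spans, latency, block_cost, penalty={"B": 1.0})
```

In this example, penalizing server B moves the best chain away from it, which is the effect that the cache-based penalty is meant to have. Letting a chain leave a server before the end of its span would add more edges per state, which is the graph growth that item 2 refers to.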