
## Overview

The ai "worker" operates at a high level, responding to model inference, training fine tuning requests for specific model parameters.

Currently we only support Llama 2, via the llama-cpp-python library, because it is fast, compatible with any platform and GPU, feature-complete, and handles model splitting. Stable Diffusion will likely be the next target.
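As a rough sketch of what the worker runs internally, here is a minimal llama-cpp-python inference call. The model path and generation parameters are placeholders, not the worker's actual configuration:

```python
# Minimal llama-cpp-python inference sketch (the model path and
# parameters are placeholder assumptions, not the worker's real config).
from llama_cpp import Llama

# n_gpu_layers=-1 offloads every layer to the GPU when one is available;
# llama-cpp-python falls back to CPU otherwise.
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_gpu_layers=-1)

# create_chat_completion returns an OpenAI-style response dict, which is
# the same shape the worker relays back over the websocket.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])
```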

## Protocol overview

```mermaid
graph TD

subgraph "Many Workers"
    W1[Connect to Spider URL via WebSockets]
    W2[Send a Registration Message]
    W3[Listen for Inference & Training Websocket Requests]
    W4[Reply with OpenAI-Formatted Response]
    W1 --> W2
    W2 --> W3
    W3 --> W4
end

subgraph "One Spider"
    S1[Listen for Worker Connections]
    S2[Accumulate List of Active Workers]
    S3[Listen for Inference & Training REST Requests]
    S4[Pick an Available Worker]
    S5[Send Request Over a Websocket]
    S6[Translate Reply Into REST Response]
    S7[Complete Any Billing]
    S1 --> S2
    S2 --> S3
    S3 --> S4
    S4 --> S5
    S5 --> S6
    S6 --> S7
end

subgraph "Many Clients"
    C1[Make REST Requests Against the Spider]
end

%% Interactions
W1 -.-> S1
S2 -.-> W2
S3 -.-> C1
C1 -.-> S3
W3 -.-> S5
S4 -.-> W4
W4 -.-> S6
```
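
The worker side of this protocol can be sketched as a small asyncio loop: connect, register, then serve requests until disconnected. Everything below is illustrative; the spider URL, the registration fields, and the message shapes are assumptions, not the actual wire format:

```python
# Hypothetical worker loop: connect, register, then serve inference
# requests. The URL and all message fields are illustrative assumptions.
import asyncio
import json

import websockets

SPIDER_URL = "wss://spider.example.com/worker"  # placeholder, not the real endpoint

def handle_request(request: dict) -> dict:
    # Stub: a real worker would run the model here (e.g. via
    # llama-cpp-python) and return an OpenAI-formatted completion.
    return {
        "id": "chatcmpl-0",
        "object": "chat.completion",
        "choices": [{"index": 0,
                     "message": {"role": "assistant", "content": "Hello!"},
                     "finish_reason": "stop"}],
    }

async def run_worker() -> None:
    async with websockets.connect(SPIDER_URL) as ws:
        # Steps 1-2: connect and announce capabilities so the spider
        # can route suitable jobs (these fields are assumed, not specified).
        await ws.send(json.dumps({
            "type": "register",
            "models": ["llama-2-7b-chat"],
            "gpu": True,
        }))

        # Steps 3-4: answer each websocket request with an
        # OpenAI-formatted reply.
        async for raw in ws:
            request = json.loads(raw)
            reply = handle_request(request)
            await ws.send(json.dumps(reply))

if __name__ == "__main__":
    asyncio.run(run_worker())
```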
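
From the client's perspective, the spider looks like an ordinary REST API. A minimal sketch, assuming an OpenAI-compatible chat-completions route (the host, path, and auth scheme are placeholders):

```python
# Hypothetical client call; the spider's host, path, and auth scheme
# here are assumptions based on the OpenAI-formatted responses above.
import requests

resp = requests.post(
    "https://spider.example.com/v1/chat/completions",  # placeholder URL
    headers={"Authorization": "Bearer <api-key>"},
    json={
        "model": "llama-2-7b-chat",
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```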