Implement direct server-to-server communication #331

borzunov · 2023-06-24T02:25:16Z

Implement #226.

TODO:

- Replace session pipes with session queues (manager.Queue())
- Send block indices instead of uids to make next_servers more compact
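To illustrate the second TODO item, here is a minimal sketch of collapsing a run of block uids into a half-open (start, end) range and expanding it back. It assumes uids look like `<prefix>.<block index>`; the helper names `compact_span` and `expand_span` are hypothetical and are not taken from this PR.

```python
from typing import List, Tuple


def compact_span(block_uids: List[str]) -> Tuple[int, int]:
    """Collapse consecutive uids such as 'model.3', 'model.4' into a half-open (start, end) range."""
    if not block_uids:
        raise ValueError("block_uids must not be empty")
    # Assumption: the block index is the part after the last dot.
    indices = [int(uid.rsplit(".", 1)[1]) for uid in block_uids]
    start, end = indices[0], indices[-1] + 1
    if indices != list(range(start, end)):
        raise ValueError("block uids must be consecutive")
    return start, end


def expand_span(prefix: str, start: int, end: int) -> List[str]:
    """Rebuild the full uid list on the receiving side."""
    return [f"{prefix}.{idx}" for idx in range(start, end)]
```

The payload then grows with the number of servers in the chain rather than with the number of blocks they hold.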

borzunov · 2023-06-24T02:50:25Z

src/petals/server/handler.py

@@ -47,6 +51,8 @@ def __init__(
        dht: DHT,
        module_backends: Dict[str, TransformerBackend],
        *,
+        push_manager: SyncManager,
+        session_pipes: Dict[str, Tuple[PipeConnection, threading.Lock]],


rpc_push() may be received by a connection handler other than the one holding the inference session, so we use inter-process communication here to pass the request to the handler that owns the session.
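A rough sketch of the routing idea, assuming a shared mapping from session_id to a queue (the description's TODO mentions manager.Queue()). The function names `open_session`, `push`, and `demo_with_manager` are hypothetical and this is not the handler code from the PR; the helpers are duck-typed, so they work with a plain dict and queue.Queue as well as with manager proxies.

```python
import multiprocessing as mp
import queue
from typing import Any, Callable, MutableMapping


def open_session(session_queues: MutableMapping[str, Any], make_queue: Callable[[], Any], session_id: str) -> Any:
    """Run by the handler that owns the session: publish a queue under session_id."""
    assert session_id not in session_queues, f"session id {session_id} is not unique"
    session_queue = make_queue()
    session_queues[session_id] = session_queue
    return session_queue


def push(session_queues: MutableMapping[str, Any], session_id: str, payload: Any) -> bool:
    """Run by whichever handler receives the push: forward the payload to the session owner."""
    session_queue = session_queues.get(session_id)
    if session_queue is None:
        return False  # unknown or already closed session; the caller decides what to do
    session_queue.put(payload)
    return True


def demo_with_manager() -> None:
    """Cross-process variant: the mapping and the queues live in a manager process."""
    with mp.Manager() as manager:
        session_queues = manager.dict()
        owner_queue = open_session(session_queues, manager.Queue, "session-1")
        # A different process plays the role of the handler that received the push.
        worker = mp.Process(target=push, args=(session_queues, "session-1", {"step_id": "abc"}))
        worker.start()
        worker.join()
        print(owner_queue.get(timeout=5))


if __name__ == "__main__":
    demo_with_manager()
```

The handler that owns the session only reads from its own queue, so it does not need to know which process received the push.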

borzunov · 2023-07-01T02:55:03Z

src/petals/server/handler.py

@@ -92,11 +101,20 @@ def _unpack(req: runtime_pb2.ExpertRequest) -> Iterable[runtime_pb2.Tensor]:
        assert isinstance(block_uid, str) and isinstance(metadata, dict)
        return block_uid, inputs, metadata

+    async def rpc_push(self, request: runtime_pb2.ExpertRequest, context: P2PContext) -> runtime_pb2.ExpertResponse:


TODO: This could be a stream-to-unary handler, so that (a) the previous server doesn't have to open a new connection each time and (b) we don't have to parse the metadata at this stage each time (currently we do that to find session_id). I'm not sure whether it affects performance much, though, so I'd postpone it to a later PR.

borzunov · 2023-07-01T23:46:35Z

src/petals/client/inference_session.py

            for attempt_no in itertools.count():
                logger.debug(f"Inference: block {block_idx}, attempt {attempt_no}")
                span = None
                try:
                    if not self._chosen_spans or not self._server_sessions or attempt_no >= 1:
-                        # If there is a failed server session, this code closes it


This code was moved to a separate method InferenceSession._update_sequence() to simplify this method.

borzunov · 2023-07-10T00:11:52Z

src/petals/client/inference_session.py

+            inputs = self.history  # Pass full inputs including prefix
+        else:
+            inputs = inputs[:, -n_input_tokens:]  # No need to pass prefix further
+


Refactored: this code was moved from InferenceSession.step() to _ServerInferenceSession.step(), since it concerns only one server. The overall structure is clearer this way.
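For readers skimming the hunk above, a simplified sketch of the choice between sending the full history and sending only the new tokens. Nested lists stand in for tensors, so `row[-n_input_tokens:]` plays the role of `inputs[:, -n_input_tokens:]`. The class name `ServerSessionSketch` and the `stepped` flag are assumptions made for illustration; the condition guarding the `if` is not visible in the hunk.

```python
from typing import List, Optional

Batch = List[List[int]]  # stand-in for a [batch, seq_len, ...] tensor


class ServerSessionSketch:
    """Hypothetical per-server session that remembers everything it was asked to send."""

    def __init__(self) -> None:
        self.history: Optional[Batch] = None
        self.stepped = False  # assumption: becomes True after the first step on this server

    def select_inputs(self, inputs: Batch, n_input_tokens: int) -> Batch:
        assert n_input_tokens > 0, "a step must carry at least one new token"
        new_tokens = [row[-n_input_tokens:] for row in inputs]
        if self.history is None:
            self.history = [list(row) for row in inputs]
        else:
            self.history = [old + new for old, new in zip(self.history, new_tokens)]
        # First step on this server: pass full inputs including prefix.
        # Later steps: no need to pass the prefix further.
        to_send = self.history if not self.stepped else new_tokens
        self.stepped = True
        return to_send
```

Under these assumptions, a session whose history was filled in before its first step would send the whole prefix once and only the new tokens afterwards.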

Fix nits in rpc_inference()

2f8525a

borzunov force-pushed the server-to-server branch from 4b94939 to 9aebf0a Compare June 24, 2023 02:30

Draft merging requests from rpc_inference() and rpc_push()

234bafb

borzunov force-pushed the server-to-server branch from 9aebf0a to 234bafb Compare June 24, 2023 02:32

borzunov commented Jun 24, 2023

borzunov commented Jul 1, 2023

borzunov added 2 commits July 1, 2023 23:27

InferenceSession: send session_id, request_id

01b8f34

Merge branch 'main' into server-to-server

ece4a4d

borzunov commented Jul 1, 2023

Refactor InferenceSession.step()

49bbce4

borzunov force-pushed the server-to-server branch from 73a0563 to 49bbce4 Compare July 1, 2023 23:50

borzunov added 3 commits July 1, 2023 23:56

Refactor _ServerInferenceSession.create()

a2a180a

Set links to the next _ServerInferenceSession

6af6215

Refactor InferenceSession: remove self._chosen_spans

1ab2706

borzunov force-pushed the server-to-server branch 2 times, most recently from 0e5ce37 to 2445aa5 Compare July 2, 2023 00:36

Move InferenceSession.server_inputs to _ServerInferenceSession.history

cef5662

borzunov force-pushed the server-to-server branch from 2445aa5 to cef5662 Compare July 2, 2023 00:37

borzunov added 2 commits July 2, 2023 00:41

Remove recovery_until

53c3089

Send next_servers when available

e03f531

borzunov force-pushed the server-to-server branch from 7d73698 to 429146e Compare July 2, 2023 02:21

Send step_id instead of request_id

9a23c1e

borzunov force-pushed the server-to-server branch from 429146e to 9a23c1e Compare July 2, 2023 02:24

borzunov added 2 commits July 2, 2023 03:25

Send rpc_push() on server

035ef3d

Fix bugs

ed147d9

borzunov force-pushed the server-to-server branch from bccdc21 to 845c172 Compare July 5, 2023 13:01

Use manager.Queue() instead of pipes and locks

0635968

borzunov force-pushed the server-to-server branch from 845c172 to 0635968 Compare July 5, 2023 13:06

borzunov added 2 commits July 9, 2023 22:37

Fix bugs

1953237

Add config.use_server_to_server flag

714eb93

borzunov force-pushed the server-to-server branch from c5dfba2 to a637fd2 Compare July 9, 2023 22:58

Improve debug logging

7e204d0

borzunov force-pushed the server-to-server branch from a637fd2 to 7e204d0 Compare July 9, 2023 22:58

borzunov added 3 commits July 9, 2023 23:10

isort

26767f6

Merge branch 'main' into server-to-server

e67ad12

Send (start, end) instead of all uids in metadata["next_servers"]

5b3f180

borzunov force-pushed the server-to-server branch from 9427cae to 5b3f180 Compare July 9, 2023 23:32

borzunov added 3 commits July 9, 2023 23:37

Show stats on late pushes in a session

bc36852

Set version to 1.2.0.dev1

5fee13c

Remove debug exception handlers

eadbb87

borzunov commented Jul 10, 2023

borzunov marked this pull request as ready for review July 10, 2023 00:16

Fix typing

3670214

justheuristic approved these changes Jul 10, 2023

src/petals/client/inference_session.py (outdated, resolved)

src/petals/client/inference_session.py (outdated, resolved)

src/petals/client/inference_session.py (outdated, resolved)

src/petals/server/server.py (resolved)

justheuristic reviewed Jul 10, 2023

src/petals/server/handler.py (resolved)

borzunov added 2 commits July 10, 2023 14:57

Fix review comments by @justheuristic

413521a

_push_outputs: Show errors with logger.debug()

6872fee

borzunov merged commit 158013a into main Jul 11, 2023
7 checks passed

borzunov deleted the server-to-server branch July 11, 2023 13:29

artek0chumak pushed a commit to artek0chumak/petals that referenced this pull request Jul 11, 2023

Implement direct server-to-server communication (bigscience-workshop#331)

Implement bigscience-workshop#226.

768f52f

borzunov mentioned this pull request Sep 28, 2023

Fix retries during inference #523

Draft
