
Initial SBD Client and Server #2

Merged: 34 commits into main from initial, May 7, 2024

Conversation

@neonphog (Collaborator) commented Apr 22, 2024

RESOLVES: #6
RESOLVES: #7
RESOLVES: #10
RESOLVES: #12

The autobahn-style runners are largely placeholders at the moment, with single smoke tests in them. We'll need to fill those out in a follow-on PR.

I'm going to copy the spec.md file into this PR description so you can see a rendered version of it:


SBD Spec

This SBD spec defines the protocol for SBD servers and clients to communicate.
SBD is a simple websocket-based message relay protocol.

1. Websocket Stack

The SBD protocol is built upon websockets.

1.1. Websocket Configuration

1.1.1. Message and Frame Size

The maximum SBD message size (including 32 byte header) is 20000 bytes.

SBD clients and servers MAY set the max message size in the websocket library to 20000 to help enforce this.

The maximum frame size MUST be set larger than 20000 so that SBD messages always fit in a single websocket frame.
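
A minimal sketch of enforcing these limits, assuming a tungstenite-style websocket config (the exact field names and defaults vary across libraries and versions, so treat this as illustrative, not the crate's actual setup):

```rust
use tungstenite::protocol::WebSocketConfig;

fn sbd_ws_config() -> WebSocketConfig {
    let mut config = WebSocketConfig::default();
    // Reject anything larger than the 20000 byte SBD maximum
    // (32 byte header included).
    config.max_message_size = Some(20_000);
    // Keep the frame limit above the message limit so every SBD
    // message fits in a single websocket frame.
    config.max_frame_size = Some(32_768);
    config
}
```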

2. Cryptography

2.1. TLS

Over the WAN, SBD servers SHOULD generally be available over TLS (wss://); on a LAN, plain ws:// is acceptable.

2.2. Ed25519

Clients will be identified by an ed25519 public key. Client sessions will be validated by an ed25519 signature.

3. Protocol

3.1. Connect Path

Clients MUST specify exactly one HTTP path segment on the websocket connection URL.
This segment MUST be the base64url encoded public key that this client will be identified by.
This public key MUST be unique to this new connection.
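
As an illustration, a client could build the connect URL like this (a sketch assuming the `base64` crate and unpadded base64url; the spec above does not state whether padding is used):

```rust
use base64::Engine;

/// Build an SBD connect URL from a raw 32 byte ed25519 public key.
fn connect_url(host: &str, pub_key: &[u8; 32]) -> String {
    let seg = base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(pub_key);
    format!("wss://{host}/{seg}")
}
```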

3.2. Messages

3.2.1. Header

SBD messages always begin with a 32 byte header. Messages shorter than 32 bytes are invalid.

If the header starts with 28 zero bytes, the message is a "command". Otherwise, the message is a "forward".

If the header is a "command" the next four literal ascii bytes are interpreted as the command type:

  • lbrt - limit byte nanos - 4 byte i32be limit - server sent
  • lidl - limit idle millis - 4 byte i32be limit - server sent
  • areq - authentication request - 32 byte nonce - server sent
  • ares - authentication response - 64 byte signature - client sent
  • srdy - server ready - no additional data - server sent
  • keep - keepalive - no additional data - client sent

If the header is a "forward" type, the 32 byte header is interpreted as the public key to forward the message to.
The remaining bytes in the message are the data to forward.
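
Put together, classifying an incoming message could look like the following sketch (illustrative types; the crates in this PR may structure this differently):

```rust
/// A parsed SBD message (hypothetical type for illustration).
enum Msg<'a> {
    Command { cmd: [u8; 4], payload: &'a [u8] },
    Forward { peer_pub_key: &'a [u8], payload: &'a [u8] },
}

fn classify(msg: &[u8]) -> Option<Msg<'_>> {
    // Messages shorter than the 32 byte header are invalid.
    if msg.len() < 32 {
        return None;
    }
    if msg[..28].iter().all(|&b| b == 0) {
        // Command: bytes 28..32 are the literal ascii command type.
        let mut cmd = [0u8; 4];
        cmd.copy_from_slice(&msg[28..32]);
        Some(Msg::Command { cmd, payload: &msg[32..] })
    } else {
        // Forward: the full 32 byte header is the peer public key.
        Some(Msg::Forward { peer_pub_key: &msg[..32], payload: &msg[32..] })
    }
}
```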

3.2.2. Forward

When a client sends a "forward" message to the server, the first 32 bytes represent the
peer the message should be forwarded to.

When a server sends a "forward" message to a client, the first 32 bytes will have been altered
to represent the peer from which the message originated.

3.2.3. Flow

  • The server MUST send areq with a random nonce once for every new opened connection.
    The server MAY send any limit messages before or after this areq, but it MUST come before the srdy.
  • The client MUST respond with a signature over the nonce by the private key associated with the public key
    sent in the URL path segment of the websocket request.
  • If the signature is valid the server MUST send the srdy message.
  • After receiving srdy the client MAY begin sending "forward" type messages.
  • At any point, the server MAY send an updated lbrt message.
  • At any point after srdy, the server MAY send "forward" type messages to the client.
  • At any point after srdy, the client MAY send a keep message.

```mermaid
sequenceDiagram
    S->>C: lbrt
    S->>C: lidl
    S->>C: areq
    C->>S: ares
    S->>C: srdy
    C->>S: "forward"
    S->>C: "forward"
    S->>C: lbrt
    C->>S: keep
```
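
On the client side, answering the areq could look like this sketch (assuming the ed25519-dalek crate; the websocket send itself is omitted):

```rust
use ed25519_dalek::{Signer, SigningKey};

/// Build the `ares` message for a received `areq` nonce.
fn auth_response(key: &SigningKey, areq_nonce: &[u8; 32]) -> Vec<u8> {
    // Sign the nonce with the private key matching the public key
    // sent in the connect URL path segment.
    let sig = key.sign(areq_nonce);
    // Header: 28 zero bytes plus the ascii command type, then the
    // 64 byte signature as the command data.
    let mut msg = vec![0u8; 28];
    msg.extend_from_slice(b"ares");
    msg.extend_from_slice(&sig.to_bytes());
    msg
}
```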

3.2.4. lidl and Keepalive

The lidl "Limit Idle Millis" message is the millisecond count the server will keep a connection around
without having seen any messages sent from that client. If a client does not have any forward messages
to send within this time period and wishes to keep the connection open they SHOULD send a keep message
to maintain the connection. Note this keep message will be counted against rate limiting.
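
For instance, a client might schedule keepalives at some fraction of the lidl value; the one-half factor below is an arbitrary illustrative safety margin, not something the spec mandates:

```rust
use std::time::Duration;

/// Send a keep message at half the idle limit to leave headroom for
/// network latency and rate-limit pacing (illustrative policy).
fn keepalive_interval(lidl_millis: u32) -> Duration {
    Duration::from_millis(lidl_millis as u64 / 2)
}
```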

3.2.5. lbrt and rate limiting

The lbrt "Limit Byte Nanos" message indicates the nanoseconds of rate limiting used up by a single byte sent.
This is intuitively backwards of a "limit" because higher values indicate you need to send data more slowly,
but this direction is easier to work with in code. That is, if lbrt is 1, you can send one byte every nanosecond.
A more reasonable librt value of 8000 means you can send 1 byte every 8000 nanoseconds. Or more reasonably,
if you send a 16000 byte message, you should wait 8000 * 16000 nanoseconds before sending the next message.
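
The client-side pacing math is then a single multiplication (illustrative names):

```rust
use std::time::Duration;

/// How long to wait after sending `message_len` bytes, given the
/// server's current lbrt value.
fn wait_after_send(lbrt_nanos: u64, message_len: usize) -> Duration {
    // Each byte consumes lbrt nanoseconds of rate budget.
    Duration::from_nanos(lbrt_nanos * message_len as u64)
}

// e.g. wait_after_send(8_000, 16_000) == Duration::from_millis(128)
```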

A server MUST provide a "burst" grace window to account for message size.

A server MAY track rate limiting by some metric other than the individual connection: IP address, for example.
Then, if additional connections are established sharing that metric, all of them could be notified that they
need to send data more slowly.

3.2.6. Extensibility

In order to make this protocol extensible without versioning, clients and servers MUST ignore unknown command types.
(With the exception that servers should still count the raw bytes in rate limiting.)

3.3. Violations

If a server receives an invalid message from a client it MUST immediately drop the connection with no closing frame.

If a server receives a message that violates the rate limit, the connection MUST similarly be dropped with no closing frame.
The server MAY also block connections (perhaps by IP address) for an unspecified amount of time.

rust/sbd-bench/src/c_turnover.rs (resolved)
rust/sbd-bench/src/thru.rs (resolved)
rust/sbd-client/src/lib.rs (resolved, outdated)
rust/sbd-client/src/raw_client.rs (resolved, outdated)
rust/sbd-server/src/cmd.rs (resolved)
spec.md (resolved)
spec.md (resolved, outdated)
spec.md (resolved, outdated)
spec.md (resolved)
neonphog and others added 2 commits April 22, 2024 13:46
Co-authored-by: ThetaSinner <ThetaSinner@users.noreply.github.com>
@ThetaSinner (Contributor) left a comment:

I think this looks good. It's much clearer on a second read, having been through the spec. It's still a lot of code in one go though :)

rust/sbd-client/src/lib.rs (resolved, outdated)
rust/sbd-client/src/lib.rs (resolved)

match match data.parse() {
Ok(data) => data,
Err(_) => break,
Contributor:

Should the write_task also stop if the read_task gets a bad message from the server?

Contributor:

Maybe it does and I'm missing how they're linked?

Member:

i'm curious about the error cases here as well, because they silently break. @neonphog please explain why this is, and if appropriate make the errors visible at runtime somehow.

Collaborator (author):

Yep, when it breaks out of the loop it closes the connection (through the send_buf2.lock().await.close().await call below).

There really aren't a lot of varied error cases... If you were able to establish the connection, but now it no longer exists, all you know is that the connection was closed. By design, you don't know it was closed due to rate-limiting, for example (because we don't want to waste system resources sending data to a potential denial-of-service node).

rust/sbd-client/src/raw_client.rs (resolved)
let now = self.origin.elapsed().as_nanos() as u64;

// first check if we need to keepalive
if now - self.last_send >= self.idle_keepalive_nanos {
Contributor:

This is skipping the next_step_dur check which seems right to me. The keepalive shouldn't wait on the rate limiting. So maybe the check above isn't needed and the docs for this function can be updated?

@steveej (Member):

i read "Call `next_step_dur()` first" as an instruction to the user. the keepalive check is indeed redundant to the one in that function.

i wonder if it could be simplified by having a single async fn that uses an async sleep to wait and send rather than needing to use two functions sequentially as can be seen here: https://github.com/holochain/sbd/pull/2/files#diff-55f2f95932add9c4e180147077386efd04a8e1ef6635e49bb6fcf514351105b5R125-R128

Collaborator (author):

These functions are called from two distinct code paths across multiple threads/tasks. First, if we have filled up our send buffer, they will be called in a back-pressure loop from the user-facing public send function, potentially many concurrent times (the link @steveej provided). Second, from a single write task that makes sure buffered data gets sent out even if the app isn't doing any follow-on sends.

I decided the best way to do that from the two paths was to have a non-mutable function that would tell you how long to wait before calling the mutable function that may or may not do anything on account of it potentially getting false positives on the wait side.

Let me know if you think there is a clearer way to do this, or a clearer way to document it.

Member:

> I decided the best way to do that from the two paths was to have a non-mutable function that would tell you how long to wait before calling the mutable function that may or may not do anything on account of it potentially getting false positives on the wait side.

this split makes some intuitive sense to me, however i've got a hunch that there isn't much practical gain from it. not worth discussing much more at this point, so i'm complete on this thread; as i wasn't the starter i won't close it.

rust/sbd-server/src/cmd.rs (resolved, outdated)
rust/sbd-server/src/cmd.rs (resolved, outdated)
rust/sbd-server/src/cslot.rs (resolved)
@jost-s commented Apr 23, 2024:

@neonphog What does SBD stand for? I suggest mentioning that somewhere =)

@steveej (Member) left a comment:

first review pass, up to but excluding rust/sbd-client/src/raw_client.rs

.github/workflows/static.yml (resolved)
rust/sbd-bench/src/c_turnover.rs (resolved)
rust/sbd-bench/src/thru.rs (resolved)
rust/sbd-client/src/lib.rs (resolved, outdated)
rust/sbd-client/src/lib.rs (resolved, outdated)
rust/sbd-client/src/send_buf.rs (resolved, outdated)

rust/sbd-client/src/send_buf.rs (resolved)
rust/sbd-client/src/send_buf.rs (resolved, outdated)
rust/sbd-client/src/send_buf.rs (resolved)
@neonphog (Collaborator, author) commented Apr 23, 2024:

@jost-s

> @neonphog What does SBD stand for? I suggest mentioning that somewhere =)

1856628

SBD doesn't stand for anything. Imagine it however you like. Secure By Design... STUN By Default... Silent But Deadly...


SBD messages always contain first a 32 byte header. Messages less than 32 bytes are invalid.

If the header starts with 28 zero bytes, the message is a "command". Otherwise, the message is a "forward".
@maackle (Member):

If I did the math right (2^28), that puts the probability of a "forward" being misinterpreted as a "command" at around 1 in 268,000,000. That's just the floor -- the probability goes up if e.g. a proof-of-work scheme is used which requires leading zeroes for pub keys.

What are the implications of this misinterpretation? How broadly will those effects spread across the entire network? If a network has 10k agents, the probability of a bad agent key is 1 in 30k, which is beginning to look risky if the effects are widespread.

Collaborator (author):

@maackle - though derived from a random seed, ed25519 pub keys are not random numbers; my impression is that it would be even more unlikely to result in 28 initial zeros. But regardless, I can put in a simple check to reject public keys that start with 28 zeroes.

Collaborator (author):

Done: fade996

@neonphog neonphog mentioned this pull request Apr 26, 2024
@steveej (Member) left a comment:

i do have lots of comments and am still vague around rust/sbd-server/src/cslot.rs.

however, i'm content with this being merged to proceed with testing, while also having my comments addressed either via follow-up issues or explained even post-merge.

@@ -0,0 +1,27 @@
#[test]
fn suite() {
Member:

any idea how long the builds take? because they're counted as test runtime, this may appear as a slow test.

ideally we'd build the binaries before running this test. i'm interested in creating a follow-up issue for this comment to see whether we can establish a better pattern for this on CI.

Collaborator (author):

totally dependent on system resources; on my system it's around 30 seconds after deleting target, but caching obviously makes this take no time at all... We could add a job ahead of the test job that causes them to be built with the correct settings, but I don't know how easy it would be to keep those settings in sync. Feels pretty low priority to me.

Text(s) => return Ok(Payload::Vec(s.into_bytes())),
Binary(v) => return Ok(Payload::Vec(v)),
Ping(_) | Pong(_) => (),
Close(_) => return Err(Error::other("closed")),
Member:

i'm surprised to see an error returned here, as it looks like closing a connection is a regular part of the flow.

Collaborator (author):

Yes, this could be an Option instead of a Result, but it's an internal API, and other backends might have more relevant errors to report.

rust/sbd-server/src/ws/ws_tungstenite.rs (resolved)
rust/sbd-server/src/ws/ws_fastwebsockets.rs (resolved)
.serve_connection(
io,
hyper::service::service_fn(move |mut req| {
// TODO report the PUBKEY here
Member:

TODO for when? does it justify a follow-up issue?

Collaborator (author):

the fastwebsockets backend currently doesn't even compile, since it was a dead end. I didn't want to lose the work in case it's viable some day, but I don't think it needs to be reviewed right now.

Member:

i suspect we'll lose that context over time and it deserves to be mentioned in the docstring

rust/sbd-server/src/ip_deny.rs (resolved, outdated)
rust/sbd-server/src/ip_rate.rs (resolved)
rust/sbd-server/src/cslot.rs (resolved)
let rate_send_list = {
let list = {
// WARN - allocation here!
// Also, do we want to limit the max connections from same ip?
@steveej (Member) commented May 7, 2024:

i don't think we should start out with such a limit, because it would make it difficult for groups that share the same public IP to communicate beyond their local group.

i think the limit should somehow understand unique users, e.g. via public IP key.

Collaborator (author):

@steveej - I can't make out the typo here. What did you mean to suggest, if not public IP?

Member:

sorry, quite the typo 😆 i meant public key (corrected above as well)

Collaborator (author):

Ah, gotcha : ) - Yeah, the main concern here would be an attacker blowing out our memory by opening up a bunch of connections all from the same IP address, since the Vec storing those is currently unbounded. Distinguishing by key wouldn't mitigate that, since all the connections would have unique keys.

.await;

if auth_res.is_err() {
return;
Member:

would it be helpful to eprintln this for debugging?

Collaborator (author):

I'm experimenting with not having tracing enabled on the server. There is overhead to tracing that I'd like to avoid if possible. Separately, we may someday want some kind of audit log that could show when clients connect and when/why they are disconnected. This data could make it into that if we choose to implement that.

neonphog and others added 2 commits May 7, 2024 13:11
Co-authored-by: Stefan Junker <1181362+steveej@users.noreply.github.com>
@neonphog neonphog merged commit 1d0af9f into main May 7, 2024
5 checks passed
@neonphog neonphog deleted the initial branch May 7, 2024 19:23