Channel buffer autosize #4

Open · wants to merge 4 commits into master

Conversation

@djmdjm (Owner) commented Oct 2, 2024

Portable OpenSSH version of djmdjm/openssh-wip#29

djmdjm added 4 commits October 3, 2024 02:13
The SSH protocol performs flow control using a channel receive
window that may only be set at the time a channel is created and
cannot be subsequently modified.

The current default window size is sufficient for most uses, but
on high bandwidth x delay links it is not enough. Increasing the
default window size is possible, but carries the cost of additional
worst-case memory usage if the output side is slow.

This implements a protocol extension to vary the maximum channel
window after the channel has been created, and uses this extension
to grow the window if the TCP receive window increases in size.
TCP has had a half-century of optimisation, so we should copy
its homework wherever possible.

This is based on work by Denzel Strauss and Sally Sun during their
STEP internship at Google.
spotted by Alex James
avoids getpeername calls that violate pledge sandbox later
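
(Sketch, not part of the patch: roughly how the receiver-side probe works. The real code lives in ssh_packet_update_tcp_rwin(), visible in the diffs below; the helper name and error handling here are simplified stand-ins.)

#include <sys/types.h>
#include <sys/socket.h>

/*
 * Illustrative only: ask the kernel how large the socket's TCP receive
 * buffer currently is.  The channel code can then raise the SSH channel
 * window toward this value via the max-window@openssh.com extension.
 */
static int
sketch_get_tcp_rwin(int fd)
{
	int rwin;
	socklen_t len = sizeof(rwin);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rwin, &len) == -1)
		return -1;	/* caller treats -1 as "no TCP window known" */
	return rwin;
}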
@anthonyryan1

After applying this patch, my ssh log is filled with the following errors:

Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 32768)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 65536)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 98304)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 131072)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 163840)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 196608)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 32768)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 65536)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 98304)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 131072)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 163840)
Oct 14 21:06:27 receiving-server sshd-session[11366]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 196608)
...
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 196664)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 196672)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 32760/134217728 (excess 8)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 16)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 32784)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 32792)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 65560)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 65568)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 98336)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 98344)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 131112)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 131120)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 163888)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 163896)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 196664)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 196672)
Oct 14 21:14:16 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 32760/134217728 (excess 8)
Oct 14 21:14:17 receiving-server sshd-session[48231]: channel 0: rcvd too much data 8, win 0/134217728 (excess 16)
Oct 14 21:14:17 receiving-server sshd-session[48231]: channel 0: rcvd too much data 32768, win 0/134217728 (excess 32784)

This seems like the next thing to address in this patch. Definitely seeing higher throughput, though.

@anthonyryan1 commented Oct 14, 2024

I tried the following patch, based on v9.8p1, to see how this change behaves without the "rcvd too much data" flood:

diff --git a/channels.c b/channels.c
index 3ee694717..3ce3061c1 100644
--- a/channels.c
+++ b/channels.c
@@ -3526,16 +3526,8 @@
 		return 0;
 	}
 	if (win_len > c->local_window) {
-		c->local_window_exceeded += win_len - c->local_window;
-		logit("channel %d: rcvd too much data %zu, win %u/%u "
-		    "(excess %u)", c->self, win_len, c->local_window,
-		    c->local_window_max, c->local_window_exceeded);
-		c->local_window = 0;
-		/* Allow 10% grace before bringing the hammer down */
-		if (c->local_window_exceeded > (c->local_window_max / 10)) {
-			ssh_packet_disconnect(ssh, "channel %d: peer ignored "
-			    "channel window", c->self);
-		}
+		c->local_window += win_len - c->local_window;
+		c->local_window_exceeded = 0;
 	} else {
 		c->local_window -= win_len;
 		c->local_window_exceeded = 0;

And with that plus this patch I'm now seeing 3 Gbps of rsync throughput over 250 ms of latency, which is a dramatic improvement.

So this patch is definitely working; I feel like the problems I was still seeing were stemming from that error flood and its local_window reset? Hopefully someone more familiar can explore the correct way to resolve this.

@anthonyryan1 commented Oct 14, 2024

I'm also still seeing disconnects after a random amount of transfer time.

Read from remote host receiving-server: Connection reset by peer
client_loop: send disconnect: Broken pipe

I haven't found any correlation yet.

@dgl commented Nov 4, 2024

@anthonyryan1 maybe try something like this:

diff --git a/packet.c b/packet.c
index 046ca67ef..cefe762df 100644
--- a/packet.c
+++ b/packet.c
@@ -698,6 +698,10 @@ ssh_packet_update_tcp_rwin(struct ssh *ssh)
                    ssh->state->connection_in, strerror(errno));
                return;
        }
+#ifdef __linux__
+        /* See tcp(7): "TCP actually allocates twice the size of the buffer requested in the setsockopt" */
+        rwin /= 2;
+#endif
        if (ssh->state->tcp_rwin != -1 && rwin != ssh->state->tcp_rwin) {
                debug3_f("TCP receive window %d -> %d (%+d)",
                    ssh->state->tcp_rwin, rwin, rwin - ssh->state->tcp_rwin);

Without this I saw some strange behaviour: I think channel updates were stuck in a buffer, so it would essentially deadlock. (I wonder if this is related to the comment just above the code you mention for half-open channels...)
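
For anyone reproducing this, here is a minimal standalone check of the tcp(7) behaviour the #ifdef above compensates for (assumes Linux; the requested size is just an example):

#include <stdio.h>
#include <sys/socket.h>

int
main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int requested = 131072, reported;
	socklen_t len = sizeof(reported);

	if (fd == -1)
		return 1;
	/* Per tcp(7), Linux doubles the value set here for bookkeeping, */
	/* so the value read back is roughly 2 * requested. */
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested));
	getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &reported, &len);
	printf("requested %d, kernel reports %d\n", requested, reported);
	return 0;
}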

@dgl commented Nov 5, 2024

Some of this depends on the TCP window size and what your systems will tune it to; for example, with Cloudflare's WAN tuning the maximum is 512MB, which is more than SSHBUF_SIZE_MAX. It is possible to get into a situation where autotuning of the window size will exceed the hard limit on the buffer. I think this is why @anthonyryan1 is seeing some disconnections -- removing the window-limiting code means it can reach the hard limit, which is a fatal error.

The patch below, on top of this one, does a few things which result in better performance without any drops for me (tested on Linux netem with 200 ms latency, as well as a real connection):

  • Calculates a maximum that is 90% of the sshbuf max size, minus a packet (a slight heuristic, but sometimes the window updates seem to be slightly off; this just avoids a debug warning).
  • Preallocates the whole buffer, as TCP has resized its buffer anyway; this avoids many small reallocs, and therefore a drop in speed while the buffer is repeatedly resized. (I tested this along with the sshbuf patch, and with both I see the best performance.)
diff --git a/channels.c b/channels.c
index b36c84799..7cd27f894 100644
--- a/channels.c
+++ b/channels.c
@@ -2409,7 +2409,10 @@ static void
 channel_check_max_window(struct ssh *ssh, Channel *c)
 {
        int r, tcp_rwin;
-       u_int new_window_max;
+       u_int new_window_max, buf_max_size, add_win;
+
+        // XXX: Leave enough space incase we have a TCP window as large as the sshbuf size to allow for a few window updates.
+        buf_max_size = (sshbuf_max_size(c->output) * .9) - CHAN_SES_PACKET_DEFAULT;

        if (ssh->kex == NULL ||
            (ssh->kex->flags & KEX_HAS_CHANNEL_MAX_WINDOW) == 0)
@@ -2421,7 +2424,7 @@ channel_check_max_window(struct ssh *ssh, Channel *c)
                return; /* channel not in open state */
        if ((tcp_rwin = ssh_packet_get_tcp_rwin(ssh)) == -1)
                return; /* no TCP window recorded */
-       if (c->local_window_max >= sshbuf_max_size(c->output))
+       if (c->local_window_max >= buf_max_size)
                return; /* Already at max */

        /*
@@ -2448,10 +2451,11 @@ channel_check_max_window(struct ssh *ssh, Channel *c)

        /* Allow some oversubscription of the TCP window across channels */
        new_window_max = (3 * (u_int)tcp_rwin) / channel_count_open(ssh);
+
        if (new_window_max > (u_int)tcp_rwin)
                new_window_max = (u_int)tcp_rwin;
-       if (new_window_max > sshbuf_max_size(c->output))
-               new_window_max = sshbuf_max_size(c->output);
+       if (new_window_max > buf_max_size)
+               new_window_max = buf_max_size;

        if (new_window_max <= c->local_window_max)
                return; /* don't shrink */
@@ -2464,7 +2468,11 @@ channel_check_max_window(struct ssh *ssh, Channel *c)
        debug2_f("channel %d: sent max-window@openssh.com "
            "%d -> %d (+%d), peer TCP rwin %d", c->self, c->local_window_max,
            new_window_max, new_window_max - c->local_window_max, tcp_rwin);
+        add_win = new_window_max - c->local_window_max;
+        c->local_window += add_win;
        c->local_window_max = new_window_max;
+        /* TCP has resized its buffer, so preallocate the whole buffer to avoid many small resizes. */
+        sshbuf_allocate(c->output, add_win);
 }

 static void

I'm also running this with the Linux-specific patch I posted above to correct the TCP rwin (otherwise this potentially just wastes buffer space).

Posting this in the portable PR as the discussion is here, although the above should be applicable to everything. There is an interesting post from Cloudflare on improving HTTP/2 upload speed, which has a similar flow-control mechanism and uses Linux's tcp_info to optimise the buffer size using the RTT; I think that could be worth investigating if the heuristics in this patch need improving, but obviously it's not portable.
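
A rough sketch of what such a tcp_info-based approach might look like (Linux-only; the sizing policy is hypothetical, not something from the Cloudflare post or this PR):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Illustrative only: estimate a window from the kernel's measured RTT and
 * an assumed throughput target (bytes/sec).  A bandwidth-delay product is
 * a rough lower bound on the window needed to keep the pipe full.
 */
static long long
sketch_window_from_rtt(int fd, unsigned long long target_bytes_per_sec)
{
	struct tcp_info ti;
	socklen_t len = sizeof(ti);

	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == -1)
		return -1;
	/* tcpi_rtt is reported in microseconds. */
	return (long long)(target_bytes_per_sec * ti.tcpi_rtt / 1000000);
}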

@djmdjm (Owner, Author) commented Nov 5, 2024

btw sorry for being inactive here - I haven't had time to work on these patches and probably won't for a while longer.

@anthonyryan1

Some of this depends on TCP window size and what your systems will tune them to. [If the] maximum is 512MB, which is more than SSHBUF_SIZE_MAX, it is possible to get into a situation where autotuning of the window size will exceed the hard limit on the buffer.

That is, in hindsight, obviously something I should have mentioned about my configuration. The machine was tuned with buffers not unlike Cloudflare's to allow TCP throughput over the same long-fat-pipe scenario. I've had them set that way for so many years that I forgot to mention it. My apologies.
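
(For context, by "tuned" I mean raised per-socket TCP buffer ceilings along these lines in sysctl.conf -- illustrative values in the spirit of the Cloudflare tuning dgl mentioned, not my exact settings:)

# Allow large per-socket TCP buffers for long fat pipes;
# the third value of tcp_rmem/tcp_wmem is the per-socket maximum in bytes.
net.ipv4.tcp_rmem = 8192 262144 536870912
net.ipv4.tcp_wmem = 8192 262144 536870912
net.core.rmem_max = 536870912
net.core.wmem_max = 536870912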

I've confirmed your patch iteration on top of this PR resolves all the disconnects I was experiencing. Successfully transferred 1 TB of data at a reasonable speed without any disconnects.

For my purposes, this PR plus your iteration is a world better than the current state. I'm still getting what looks like abnormally high CPU usage at the start of transfers, but it calms down once they get up to speed. I feel like that could be addressed later to avoid growing this PR too large and unwieldy.

We've made it work! So next we need to make it right (probably by getting more eyes on this patch), then we can make it fast(er) later.
