Channel buffer autosize #4
Conversation
The SSH protocol performs flow control using a channel receive window that may only be set at the time a channel is created and cannot be subsequently modified. The current default window size is sufficient for most uses, but on high bandwidth x delay links it is not enough. Increasing the default window size is possible, but carries the cost of additional worst-case memory usage if the output side is slow.

This implements a protocol extension to vary the maximum channel window after the channel has been created, and uses this extension to grow the window if the TCP receive window increases in size. TCP has had a half-century of optimisation, so we should copy its homework wherever possible.

This is based on work by Denzel Strauss and Sally Sun during their STEP internship at Google.
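For readers skimming the patch, the core mechanism is roughly: ask the kernel for the socket's receive buffer size and, when it has autotuned past the channel's current maximum window, advertise a larger maximum to the peer (the patch does this via a "max-window@openssh.com" message, as seen in the diffs below). A minimal standalone sketch of that idea only; the function names (query_tcp_rwin, maybe_grow_window) are illustrative, not the patch's actual code:

#include <sys/types.h>
#include <sys/socket.h>

/* Ask the kernel how much it is currently willing to buffer for this socket. */
static int
query_tcp_rwin(int fd)
{
	int rwin;
	socklen_t len = sizeof(rwin);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rwin, &len) == -1)
		return -1;
	return rwin;
}

/*
 * Sketch only: if the kernel receive buffer has grown past the SSH channel's
 * advertised maximum window, raise the channel maximum so the SSH layer stops
 * being the bottleneck on high bandwidth x delay paths.  A real implementation
 * would also notify the peer of the new limit.
 */
static void
maybe_grow_window(int fd, unsigned int *local_window_max)
{
	int rwin = query_tcp_rwin(fd);

	if (rwin > 0 && (unsigned int)rwin > *local_window_max)
		*local_window_max = (unsigned int)rwin;
}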
spotted by Alex James
avoids getpeername calls that violate pledge sandbox later
Force-pushed from 114a332 to 67a115e
After applying this patch, my ssh log is filled with the following errors:

That seems like the next item to address on this patch. Definitely seeing higher throughput though.
I tried the following patch to see how this patch works without the "rcvd too much data" flood, based on v9.8p1:

diff --git a/channels.c b/channels.c
index 3ee694717..3ce3061c1 100644
--- a/channels.c
+++ b/channels.c
@@ -3526,16 +3526,8 @@
 		return 0;
 	}
 	if (win_len > c->local_window) {
-		c->local_window_exceeded += win_len - c->local_window;
-		logit("channel %d: rcvd too much data %zu, win %u/%u "
-		    "(excess %u)", c->self, win_len, c->local_window,
-		    c->local_window_max, c->local_window_exceeded);
-		c->local_window = 0;
-		/* Allow 10% grace before bringing the hammer down */
-		if (c->local_window_exceeded > (c->local_window_max / 10)) {
-			ssh_packet_disconnect(ssh, "channel %d: peer ignored "
-			    "channel window", c->self);
-		}
+		c->local_window += win_len - c->local_window;
+		c->local_window_exceeded = 0;
 	} else {
 		c->local_window -= win_len;
 		c->local_window_exceeded = 0;

And with that plus this patch I'm now seeing 3 Gbps of rsync throughput over 250 ms of latency, which is a dramatic improvement. So this patch is definitely working; I feel like the problems I was still seeing were stemming from that error flood and its local_window reset? Hopefully someone more familiar can explore the correct way to resolve this.
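For anyone less familiar with the receiver-side accounting these hunks touch: local_window is the credit the peer may still consume; it shrinks as CHANNEL_DATA arrives and is topped back up by sending a window-adjust message once the local consumer has drained the data. A toy model with hypothetical names, just to show the bookkeeping being discussed, not OpenSSH's actual code:

/* Toy model of per-channel receive-window accounting; illustrative only. */
struct toy_channel {
	unsigned int local_window;	/* bytes the peer may still send */
	unsigned int local_window_max;	/* window size we advertise */
};

/* Called for each incoming CHANNEL_DATA payload of win_len bytes. */
static int
toy_recv_data(struct toy_channel *c, unsigned int win_len)
{
	if (win_len > c->local_window)
		return -1;	/* peer overran the advertised window ("rcvd too much data") */
	c->local_window -= win_len;
	return 0;
}

/* Called after the local side drains `consumed` bytes: grant credit back. */
static unsigned int
toy_window_adjust(struct toy_channel *c, unsigned int consumed)
{
	c->local_window += consumed;
	/* a real implementation would now send a CHANNEL_WINDOW_ADJUST for `consumed` bytes */
	return consumed;
}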
I'm also still seeing disconnects after a random amount of transfer time. I haven't found any correlation yet.
@anthonyryan1 maybe try something like this:

diff --git a/packet.c b/packet.c
index 046ca67ef..cefe762df 100644
--- a/packet.c
+++ b/packet.c
@@ -698,6 +698,10 @@ ssh_packet_update_tcp_rwin(struct ssh *ssh)
 		    ssh->state->connection_in, strerror(errno));
 		return;
 	}
+#ifdef __linux__
+	/* See tcp(7): "TCP actually allocates twice the size of the buffer requested in the setsockopt" */
+	rwin /= 2;
+#endif
 	if (ssh->state->tcp_rwin != -1 && rwin != ssh->state->tcp_rwin) {
 		debug3_f("TCP receive window %d -> %d (%+d)",
 		    ssh->state->tcp_rwin, rwin, rwin - ssh->state->tcp_rwin);

Without this I saw some strange behaviour: I think channel updates were stuck in a buffer, so it would essentially deadlock. (I wonder if this is related to the comment just above the code you mention for half-open channels...)
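For reference, the doubling that hunk compensates for is easy to observe outside sshd. A small standalone check, not part of the patch; it assumes a Linux host, and the requested size is still subject to the net.core.rmem_max clamp:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int req = 1 << 20;		/* request a 1 MiB receive buffer */
	int got;
	socklen_t len = sizeof(got);

	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
	getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);

	/*
	 * Per socket(7)/tcp(7), the Linux kernel doubles the value set with
	 * setsockopt(SO_RCVBUF) and getsockopt() reports the doubled value,
	 * hence the rwin /= 2 in the hunk above.
	 */
	printf("requested %d, kernel reports %d\n", req, got);
	close(fd);
	return 0;
}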
Some of this depends on TCP window size and what your systems will tune it to; for example, with Cloudflare's WAN tuning the maximum is 512MB, which is more than

The below patch on top of this one does a few things which result in better performance without any drops for me (tested on Linux netem with 200ms latency, as well as a real connection):

diff --git a/channels.c b/channels.c
index b36c84799..7cd27f894 100644
--- a/channels.c
+++ b/channels.c
@@ -2409,7 +2409,10 @@ static void
 channel_check_max_window(struct ssh *ssh, Channel *c)
 {
 	int r, tcp_rwin;
-	u_int new_window_max;
+	u_int new_window_max, buf_max_size, add_win;
+
+	// XXX: Leave enough space in case we have a TCP window as large as the sshbuf size, to allow for a few window updates.
+	buf_max_size = (sshbuf_max_size(c->output) * .9) - CHAN_SES_PACKET_DEFAULT;
 	if (ssh->kex == NULL ||
 	    (ssh->kex->flags & KEX_HAS_CHANNEL_MAX_WINDOW) == 0)
@@ -2421,7 +2424,7 @@ channel_check_max_window(struct ssh *ssh, Channel *c)
 		return; /* channel not in open state */
 	if ((tcp_rwin = ssh_packet_get_tcp_rwin(ssh)) == -1)
 		return; /* no TCP window recorded */
-	if (c->local_window_max >= sshbuf_max_size(c->output))
+	if (c->local_window_max >= buf_max_size)
 		return; /* Already at max */
 	/*
@@ -2448,10 +2451,11 @@ channel_check_max_window(struct ssh *ssh, Channel *c)
 	/* Allow some oversubscription of the TCP window across channels */
 	new_window_max = (3 * (u_int)tcp_rwin) / channel_count_open(ssh);
+
 	if (new_window_max > (u_int)tcp_rwin)
 		new_window_max = (u_int)tcp_rwin;
-	if (new_window_max > sshbuf_max_size(c->output))
-		new_window_max = sshbuf_max_size(c->output);
+	if (new_window_max > buf_max_size)
+		new_window_max = buf_max_size;
 	if (new_window_max <= c->local_window_max)
 		return; /* don't shrink */
@@ -2464,7 +2468,11 @@ channel_check_max_window(struct ssh *ssh, Channel *c)
 	debug2_f("channel %d: sent max-window@openssh.com "
 	    "%d -> %d (+%d), peer TCP rwin %d", c->self, c->local_window_max,
 	    new_window_max, new_window_max - c->local_window_max, tcp_rwin);
+	add_win = new_window_max - c->local_window_max;
+	c->local_window += add_win;
 	c->local_window_max = new_window_max;
+	/* TCP has resized its buffer, so preallocate the whole buffer to avoid many small resizes. */
+	sshbuf_allocate(c->output, add_win);
 }

 static void

I'm also running this with the Linux-specific patch I posted above to correct the TCP rwin (otherwise this potentially just wastes buffer space). Posting this in the portable PR as the discussion is here, although the above should be applicable to everything.

There is an interesting post from Cloudflare on improving HTTP/2 upload speed, which has a similar flow control mechanism and uses Linux's
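To make the sizing arithmetic in that diff concrete: the new maximum is three times the TCP receive window divided by the number of open channels, then capped at the TCP window itself and at roughly 90% of the output sshbuf minus one default packet. A standalone restatement of that logic, with stand-in constants rather than OpenSSH's real sshbuf_max_size() / CHAN_SES_PACKET_DEFAULT values:

/* Illustrative restatement of the capping logic in the hunk above. */
#define TOY_SSHBUF_MAX		(256u * 1024 * 1024)	/* stand-in for sshbuf_max_size(c->output) */
#define TOY_PACKET_DEFAULT	(32u * 1024)		/* stand-in for CHAN_SES_PACKET_DEFAULT */

static unsigned int
toy_new_window_max(unsigned int tcp_rwin, unsigned int open_channels)
{
	/* leave ~10% of the output buffer free so window updates still fit */
	unsigned int buf_max = (unsigned int)(TOY_SSHBUF_MAX * 0.9) - TOY_PACKET_DEFAULT;
	/* allow 3x oversubscription of the TCP window across channels... */
	unsigned int w = (3u * tcp_rwin) / open_channels;

	/* ...but never more than the TCP window or the usable output buffer */
	if (w > tcp_rwin)
		w = tcp_rwin;
	if (w > buf_max)
		w = buf_max;
	return w;
}

Under these stand-in numbers, a 512 MB TCP window with a single channel gets clamped first to the TCP window and then to buf_max, which is also why the patch preallocates the output buffer once instead of growing it in many small steps.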
btw sorry for being inactive here - I haven't had time to work on these patches and probably won't for a while longer.
That is, in hindsight, obviously something I should have mentioned about my configuration. The machine was tuned with buffers not unlike Cloudflare's to allow TCP throughput over the same long-fat-pipe scenario. I've had them set that way for so many years that I forgot to mention it. My apologies.

I've confirmed your patch iteration on top of this PR resolves all the disconnects I was experiencing. Successfully transferred 1 TB of data at a reasonable speed without any disconnects. For my purposes, this PR plus your iteration is a world better than the current state.

I'm still getting what looks like abnormally high CPU usage at the start of transfers, but it calms down once they get up to speed. I feel like that could be addressed later to avoid growing this PR too large and unwieldy. We've made it work! So next we need to make it right (probably by getting more eyes on this patch), then we can make it fast(er) later.
Portable OpenSSH version of djmdjm/openssh-wip#29