DLPX-73623 SSHJ remote port forwarding buffers can grow without limits #1
I made all the requested changes, but possibly due to an earlier force-push this keeps showing up.
An update on the latest version: there was a race condition in …
This was merged into the parent repo as hierynomus#913.
Problem
When storage is slow, data arriving at a high rate through a remote-forwarding tunnel causes the forwarding buffers to grow without bound until the program runs out of memory (OOM).
Diagnosis
SSHJ's tunneling functionality can pause the data coming from the server, but it does not use this ability effectively. Under specific conditions, such as the Linux ZFS write pauses we observed, it grows the buffer when it does not need to and fails to stop asking the server for more data.
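For context, SSH flow control works per channel (RFC 4254, section 5.2): the receiver grants the sender a window of bytes, each data message consumes part of it, and once the window reaches zero the sender must wait until the receiver sends SSH_MSG_CHANNEL_WINDOW_ADJUST. Not sending that message is how a client pauses the server. The following is a minimal model of that accounting; ChannelWindowModel and its methods are hypothetical names for illustration, not SSHJ's API.

```java
// Hypothetical model of SSH channel flow control (RFC 4254, section 5.2).
// Not SSHJ code: the class and method names are illustrative only.
final class ChannelWindowModel {
    private long window; // bytes the server may still send on this channel

    ChannelWindowModel(long initialWindow) {
        this.window = initialWindow;
    }

    /** Bytes the server may send right now; 0 means the server is paused. */
    long available() {
        return window;
    }

    /** Accounts for an incoming data message; the server must stay within the window. */
    void onData(long len) {
        if (len > window) {
            throw new IllegalStateException("peer sent more than the window allows");
        }
        window -= len;
    }

    /** Models sending SSH_MSG_CHANNEL_WINDOW_ADJUST: the server may send len more bytes. */
    void sendWindowAdjust(long len) {
        window += len;
    }
}
```

In these terms, the problem described above is that the client keeps granting window to the server even while its own buffer is filling up.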
Consider the buffer laid out from start (left) to end (right). Here, rpos is the current read position, where buffered data is read and sent to the destination, whereas wpos is where the next data from the server is written. Both rpos and wpos move to the right as SSHJ reads from and writes to the buffer. When rpos reaches wpos, both are reset to the start of the buffer. However, if rpos does not catch up with wpos, the "tail" (the remainder of the buffer to the right of wpos) shrinks until no space is left, at which point the code grows the buffer to the right. This growth is unnecessary, because there is typically a large "wasted" portion of the buffer to the left of rpos that is not being used.
Solution
First, make the buffer used by remote forwarding circular and ensure that the whole buffer is used before it needs to grow.
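A minimal sketch of the idea, assuming a single-threaded caller: the buffer tracks a read position and a count of unread bytes, writes wrap around to the start of the array, and the array grows only when the unread bytes would no longer fit anywhere in it. RingBuffer and its maxCapacity parameter are hypothetical, illustrative names, not the PR's CircularBuffer.

```java
// Hypothetical sketch of a growable circular byte buffer; not SSHJ's CircularBuffer.
final class RingBuffer {
    private byte[] data;
    private int rpos;              // index of the next byte to read
    private int size;              // number of unread bytes
    private final int maxCapacity; // 0 means unlimited (hypothetical convention)

    RingBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
        this.data = new byte[initialCapacity];
        this.maxCapacity = maxCapacity;
    }

    int available() { return size; }
    int capacity() { return data.length; }

    void put(byte[] src, int off, int len) {
        ensureCapacity(size + len);
        int wpos = (rpos + size) % data.length;
        // First chunk: from wpos up to the physical end of the array.
        int first = Math.min(len, data.length - wpos);
        System.arraycopy(src, off, data, wpos, first);
        // Remaining bytes (if any) wrap around to the start of the array.
        System.arraycopy(src, off + first, data, 0, len - first);
        size += len;
    }

    int get(byte[] dst, int off, int len) {
        int n = Math.min(len, size);
        int first = Math.min(n, data.length - rpos);
        System.arraycopy(data, rpos, dst, off, first);
        System.arraycopy(data, 0, dst, off + first, n - first);
        rpos = (rpos + n) % data.length;
        size -= n;
        return n;
    }

    // Grows only when the unread bytes would not fit in the whole array,
    // so free space on both sides of the unread region is used first.
    private void ensureCapacity(int needed) {
        if (needed <= data.length) return;
        if (maxCapacity > 0 && needed > maxCapacity) {
            throw new IllegalStateException("buffer limit exceeded: " + needed);
        }
        int newCap = Math.max(needed, data.length * 2);
        if (maxCapacity > 0) newCap = Math.min(newCap, maxCapacity);
        byte[] grown = new byte[newCap];
        // Copy unread bytes in order to the start of the new array.
        int first = Math.min(size, data.length - rpos);
        System.arraycopy(data, rpos, grown, 0, first);
        System.arraycopy(data, 0, grown, first, size - first);
        data = grown;
        rpos = 0;
    }
}
```

Unlike the linear layout described in the Diagnosis, bytes freed to the left of rpos are reused as soon as writes wrap around. The exception on exceeding maxCapacity is only a guard in this sketch; with the window limit described in the next step, a well-behaved server should never send enough data to trigger it.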
Second, add a configurable limit to the size of the buffer. By default, use no limit, which preserves the previous behavior. If a limit is set, make sure that SSH_MSG_CHANNEL_WINDOW_ADJUST messages are sent back to the server only when there is space left in the buffer, and cap the size requested from the server so that the buffer cannot grow beyond the limit.
I decided not to fully convert the existing Buffer class, which would have been a larger and riskier change, and instead created a separate CircularBuffer class that contains only the methods used by remote forwarding (ChannelInputStream). I have no evidence that the other areas of SSHJ that use Buffer suffer from this bug, but if they do, this solution can later be extended by adding to CircularBuffer all the Buffer methods used by the rest of SSHJ, and then removing Buffer altogether.
Testing
I created a performance test, RemotePFPerformanceTest, that simulates a fast producer and a slower reader of data in a remote-forwarding SSH tunnel. (I will have to move it out of the unit test suite because of its performance impact.) The test reproduced the bug consistently: it ran out of memory due to unbounded buffer growth. After the fix, the test completes, and I verified that the buffer grows only up to the specified limit.
To do: test this fix in the context of the Delphix product, in the scenario that originally hit this bug.
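The shape of such a test can be illustrated with a deterministic, single-threaded simulation: the server always sends as much as the window allows, the destination drains a little per step, and the client refills the window either without a limit (the old behavior) or under the rule that buffered bytes plus the outstanding window must stay within the limit. This is a hypothetical sketch, not the PR's RemotePFPerformanceTest; all names and numbers are illustrative.

```java
// Hypothetical single-threaded simulation, not the PR's RemotePFPerformanceTest.
// A fast server sends into a client-side buffer that a slow destination drains.
final class ForwardingSimulation {
    long maxBuffered; // peak unread bytes seen in the client buffer
    long delivered;   // bytes handed to the slow destination

    /**
     * @param limit         buffer limit in bytes; 0 means unlimited (old behavior)
     * @param initialWindow window granted at start; assumed <= limit when a limit is set
     */
    void run(long limit, long initialWindow, int steps, long sendPerStep, long readPerStep) {
        long window = initialWindow; // bytes the server may still send
        long buffered = 0;           // bytes received but not yet delivered
        for (int i = 0; i < steps; i++) {
            // Fast producer: the server sends as much as the window allows.
            long sent = Math.min(sendPerStep, window);
            window -= sent;
            buffered += sent;
            maxBuffered = Math.max(maxBuffered, buffered);

            // Slow consumer: only a little data reaches the destination per step.
            long read = Math.min(readPerStep, buffered);
            buffered -= read;
            delivered += read;

            // Client refills the window; with a limit, buffered + window never exceeds it,
            // because everything the server may still send will land in the buffer.
            long wanted = initialWindow - window;
            long adjust = (limit <= 0)
                    ? wanted
                    : Math.max(0, Math.min(wanted, limit - buffered - window));
            window += adjust;
        }
    }
}
```

With limit = 0 the peak grows linearly with the number of steps (the simulated equivalent of running out of memory); with a limit of 4000 bytes it levels off at the limit, while the destination receives data at the same rate in both runs.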
Note: this review is about the SSHJ bug fix only. Separate reviews will be created for updating the use of this library in the Delphix product.