Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DLPX-73623 SSHJ remote port forwarding buffers can grow without limits #1

Merged
merged 2 commits into from
Feb 26, 2021

Conversation

rasantel
Copy link

@rasantel rasantel commented Feb 5, 2021

Problem

When storage is slow, information coming through a tunnel (remote forwarding) at a fast rate causes the forwarding buffers to grow without control and OOM the program.

Diagnosis

SSHJ's tunneling functionality has the ability to pause the data coming from the server, but it does not use it effectively. Under specific conditions such as the Linux ZFS write pauses we observed, that functionality grows the buffer when it does not really need to and fails to stop asking the server for more data.

This is how the buffer looks like from start (left) to end (right):

  start     rpos   wpos   end
    | wasted | data | tail |

Here, rpos is the current read position where buffered data is read and sent to the destination, whereas wpos is where the next data from the server is written.

Both rpos and wpos move to the right as SSHJ reads from and writes to the buffer. When rpos reaches wpos, both are reset to the start of the buffer. However, if rpos does not catch up with wpos, the "tail" (remainder of the buffer to the right of wpos) shrinks until no more space is left there, at which point the code grows the buffer to the right. This, however, is not necessary because there is typically a large "wasted" portion of the buffer to the left of rpos that is not being used.

Solution

First, make the buffer used by remote forwarding circular and ensure that the whole buffer is used before it needs to grow.

Second, add a configurable limit to the size of the buffer. By default, use no limit, which is the previous behavior. But if the limit is set, make sure that the SSH_MSG_CHANNEL_WINDOW_ADJUST messages sent back to the server are only sent when there is space left in the buffer, and limit the size requested from the server so that the buffer cannot grow beyond the limit.

I decided not to fully convert the existing Buffer class, which would have been a larger and riskier change, but to create a separate CircularBuffer which contains only the methods used by remote forwarding (ChannelInputStream). I have no evidence that the other areas of SSHJ that use Buffer suffer from this bug, but if that's the case, in the future this solution can be extended to add to CircularBuffer all methods from Buffer used by the rest of SSHJ, and get rid of Buffer altogether.

Testing

Created a performance test RemotePFPerformanceTest (which I'll have to move it out of the unit test suite because of its performance impact) that simulates a fast producer and slower reader of data in an remote-forwarding ssh tunnel. This test reproduced the bug consistently: it runs out of memory due to unlimited buffer growth. After the fix, the test completes. I verified that the buffer grows only up to the specified limit.

To do: test this fix in the context of the Delphix product in the scenario that originally hit this bug.

Note: this review is about the SSHJ bug fix only. Separate reviews will be created for updating the use of this library in the Delphix product.

sebroy
sebroy previously requested changes Feb 10, 2021
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
src/main/java/net/schmizz/sshj/common/CircularBuffer.java Outdated Show resolved Hide resolved
@rasantel rasantel requested a review from sebroy February 10, 2021 23:44
@rasantel rasantel dismissed sebroy’s stale review February 10, 2021 23:46

I made all the requested changes, but possibly due to an earlier force-push this keeps showing up.

@rasantel
Copy link
Author

An update on the latest version: there was a race condition in ChannelInputStream which caused sometimes to request the server more bytes than the buffer had the capacity to hold. This happened because reading and writing to the buffer was done in separate synchronized blocks, so that after reading from the buffer the reading thread would base its message on a window size that was larger than it actually is,

@rasantel rasantel changed the base branch from master to DLPX-73623 February 26, 2021 20:26
@rasantel rasantel merged commit de9e4af into delphix:DLPX-73623 Feb 26, 2021
@rasantel rasantel mentioned this pull request Mar 4, 2021
@VenkatanadhanG
Copy link

This is merged to parent repo hierynomus#913

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants