
[Java] Memory leak in Flight DoGet when client cancels #23623

Closed
asfimport opened this issue Dec 6, 2019 · 6 comments
Comments

@asfimport

I believe this causes things like ARROW-4765.

If a stream is interrupted or otherwise not drained by the client, the serialized form of the ArrowMessage (DrainableByteBufInputStream) will sit around forever, leaking memory.

Reporter: David Li / @lidavidm
Assignee: David Li / @lidavidm

PRs and other links:

Note: This issue was originally created as ARROW-7343. Please see the migration documentation for further details.

@asfimport

David Li / @lidavidm:
Found via debugging an internal application:

  1. Client cancels a doGet.
  2. Server doesn't observe this and calls putNext (or the cancellation races with the server's call to putNext).
  3. An ArrowMessage is created, retaining references to buffers. The references won't be cleared until ArrowMessage#asInputStream is called.
  4. ServerCallStreamObserverImpl#onNext finds that the call has been cancelled. Because Flight set an onCancel handler in FlightService.GetListener's constructor, it doesn't throw; it simply returns early (before calling ServerCallImpl#sendMessage).
  5. Thus, the message is never freed.
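
For illustration, a minimal sketch (the schema, allocator, numBatches, and fillBatch names are placeholders, not Flight code) of the server-side DoGet producer pattern that hits this: the producer keeps calling putNext, and once the call is cancelled, the ArrowMessage built from each batch is silently dropped by onNext without its buffers ever being released.

```java
// Sketch of a Flight DoGet producer that can trigger the leak
// (schema, allocator, numBatches, and fillBatch are illustrative placeholders).
@Override
public void getStream(CallContext context, Ticket ticket, ServerStreamListener listener) {
  try (VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
    listener.start(root);
    for (int i = 0; i < numBatches; i++) {
      fillBatch(root, i); // populate the root with the next record batch
      // If the client has already cancelled, gRPC's onNext returns early and the
      // ArrowMessage wrapping this batch's buffers is never drained or freed.
      listener.putNext();
    }
    listener.completed();
  }
}
```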

@asfimport

David Li / @lidavidm:
I don't particularly like this, but one way to fix this is to have GetListener retain (Java) references to the ArrowMessages it sends, so it can forcibly clean them up (perhaps on a rolling basis).
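
A rough sketch of that idea (hypothetical field and method names; responseObserver is assumed to be the gRPC observer GetListener already holds): track every sent ArrowMessage and force-close the backlog when the call is cancelled.

```java
// Hypothetical sketch of this approach: keep Java references to sent messages
// so they can be force-closed on cancellation.
private final Queue<ArrowMessage> inFlight = new ConcurrentLinkedQueue<>();

void sendMessage(ArrowMessage message) {
  inFlight.add(message);
  responseObserver.onNext(message);
}

void onCancelled() {
  ArrowMessage message;
  while ((message = inFlight.poll()) != null) {
    try {
      message.close(); // release the buffers the message still retains
    } catch (Exception e) {
      // log and keep draining the queue
    }
  }
}
```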

@asfimport

Wes McKinney / @wesm:
cc [~jacques]

@asfimport

David Li / @lidavidm:
I think a reasonable fix is to have a flag on ArrowMessage indicating whether we've transitioned ownership of the buffer to Netty/gRPC; if we send a message and the flag isn't set, then we should immediately clean up the buffers. (I'm wondering if there is a case where we transition ownership to Netty but Netty itself drops the buffers; I don't believe there is such a case.)
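
A self-contained sketch of that ownership-flag idea (an illustrative class, not the actual Arrow patch): the wrapper frees its buffer on close() unless ownership was explicitly handed to the transport.

```java
import org.apache.arrow.memory.ArrowBuf;

// Illustrative sketch of the proposed fix: track whether the transport took ownership
// of the serialized buffer; if it never did, free it when the message is closed.
final class OwnedMessage implements AutoCloseable {
  private final ArrowBuf buf;           // buffer retained on behalf of this message
  private boolean ownershipTransferred; // set once Netty/gRPC takes over the buffer

  OwnedMessage(ArrowBuf buf) {
    this.buf = buf;
  }

  ArrowBuf transferToTransport() {
    ownershipTransferred = true;
    return buf;
  }

  @Override
  public void close() {
    if (!ownershipTransferred) {
      buf.close(); // never handed off (e.g. the call was cancelled), so free it here
    }
  }
}
```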

There are some follow up issues I want to investigate and maybe file:

  • Is DoPut susceptible to the same issue? (Probably.)
  • ServerStreamListener#isCancelled is rather useless: gRPC actually maintains two distinct (and only somewhat correlated) flags for whether a call has been cancelled, and the one that isn't easily observable by application code is the one that actually matters. (Maybe this is worth filing an upstream bug, and maybe we can wrap gRPC to expose stronger semantics - but I'm not sure, given the asynchronicity of gRPC.)
  • In general, we should review how we're wrapping gRPC concepts and either make precise the guarantees we provide relative to gRPC, or make sure we actually expose the full range of gRPC concepts.
  • Again, maybe we really need an async API, at least in Java; while the sync API is more convenient, it does lead to weird issues like this.
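
On the isCancelled point, a hedged sketch (a generic handler, not Flight code) of the only cancellation signal that application code can easily observe in gRPC: the onCancel handler / isCancelled flag on ServerCallStreamObserver. The transport-level cancelled flag that actually gates sendMessage lives inside ServerCallImpl and isn't visible here, which is why the two can disagree.

```java
import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: observe cancellation via the application-visible flag; the transport may
// already consider the call cancelled before this flag ever flips to true.
final class CancellationAwareSender<T> {
  void sendAll(StreamObserver<T> responseObserver, Iterable<T> messages) {
    ServerCallStreamObserver<T> serverObserver =
        (ServerCallStreamObserver<T>) responseObserver;
    AtomicBoolean cancelled = new AtomicBoolean(false);
    serverObserver.setOnCancelHandler(() -> cancelled.set(true));

    for (T message : messages) {
      if (cancelled.get() || serverObserver.isCancelled()) {
        return; // best effort only; the real cancellation may have happened earlier
      }
      responseObserver.onNext(message);
    }
    responseObserver.onCompleted();
  }
}
```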

@asfimport

David Li / @lidavidm:
I've implemented the fix in the linked PR. I also found that DoPut could leak for different reasons (we call gRPC methods that can throw, causing us to skip cleanup), which I've fixed.
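
For the DoPut side, a rough sketch of that kind of fix (an illustrative helper, not the actual patch): if the gRPC send throws, still release the message's buffers instead of skipping cleanup on the exceptional path.

```java
// Illustrative sketch: guarantee buffer cleanup when the gRPC send throws
// (e.g. because the call was already cancelled or closed).
static void sendOrFree(StreamObserver<ArrowMessage> observer, ArrowMessage message) {
  boolean sent = false;
  try {
    observer.onNext(message); // may throw if the call is already cancelled or closed
    sent = true;
  } finally {
    if (!sent) {
      try {
        message.close(); // release the buffers the message still retains
      } catch (Exception e) {
        // nothing more we can do during cleanup; log if desired
      }
    }
  }
}
```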

@asfimport

David Li / @lidavidm:
Issue resolved by pull request #6003

@asfimport asfimport added this to the 0.16.0 milestone Jan 11, 2023