-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-38318: [Java][FlightRPC] Enable tests that leaked #38719
Conversation
I'm not sure if explicitly closing child allocators is really finding the real problem here. This prevents leaks the same way as FlightServerTestRule. Perhaps a follow-up to find the root cause? |
@lidavidm @danepitkin , is there a way to run this PR against CI repeatedly? Since the tests were flakey failures. |
You can use the GitHub UI to restart the pipeline after it completes (need to be a committer, though), or you could add a temporary commit that configures surefire to repeat tests |
But yeah, we've observed Windows-specific quirks like this for a while and haven't been able to get to the root of them - I never managed to reproduce, either. It would be nice if CI could enable the debug log and dump out the allocation history of leaked buffers |
9d2b4c2
to
199002e
Compare
- Enable tests that were disabled due to flakey memory leaks - Explicitly close child allocators in these tests to match the behavior of FlightServerTestRule which does not leak. - Change TestBasicAuth to allocate only one server - Change TestBasicAuth2 to allocate only one server and client - Fix a bug in testAuth2#asyncPut() not including credentials
This ran 10 iterations of the problematic tests successfully. Rebasing and re-running, then will do another rebase to remove the dummy commit that triggers 10 iterations. |
199002e
to
1edc865
Compare
This reverts commit 1edc865.
@lidavidm @danepitkin , this PR is ready for review. |
@@ -97,7 +96,12 @@ public void didntAuth() { | |||
} | |||
|
|||
@BeforeEach | |||
public void setup() throws IOException { | |||
public void testSetup() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test utilize stateful authentication so a new FlightClient must be created for each test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find! Thanks @jduo
allocator.getChildAllocators().forEach(BufferAllocator::close); | ||
AutoCloseables.close(allocator); | ||
allocator = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were child allocators always not getting closed? If so, is that a design choice with Flight? I wonder if a util function to identify and close all Flight child allocators would be helpful (what if those child allocators create their own child allocators?). Could be a follow up issue if it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were flakey before adding this, so it's likely they do correctly close() but there's a race condition that causes them to get missed. I experimented (changed these tests to use a separate allocator for FlightClient and FlightServer) and found that it's the Flight server's allocator that leaks. I think it relates to the server taking too long to shutdown during FlightServer.close(), which mentions it will leak resources if shutdown is taking too long.
I'll create a new Issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #38735 for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely not a design choice. And the only use of child allocators I see in flight is in tests, except for one created in the FlightClient (which is closed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like what we should do for the future is (1) tests need to explicitly awaitTermination on the server with a long enough timeout that all RPC calls terminate and (2) the weird behavior of FlightServer#close should be removed (it should just use a long timeout or no timeout)
allocator.getChildAllocators().forEach(BufferAllocator::close); | ||
AutoCloseables.close(allocator); | ||
allocator = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely not a design choice. And the only use of child allocators I see in flight is in tests, except for one created in the FlightClient (which is closed).
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 563078f. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them. |
) ### Rationale for this change This enables tests that are currently disabled to improve coverage and help others build tests based on these. ### What changes are included in this PR? - Enable tests that were disabled due to flakey memory leaks - Explicitly close child allocators in these tests to match the behavior of FlightServerTestRule which does not leak. - Change TestBasicAuth to allocate only one server - Change TestBasicAuth2 to allocate only one server and client - Fix a bug in testBasucAuth2#asyncPut() not including credentials ### Are these changes tested? Tested locally. ### Are there any user-facing changes? No. * Closes: apache#38318 Authored-by: James Duong <james.duong@improving.com> Signed-off-by: David Li <li.davidm96@gmail.com>
Rationale for this change
This enables tests that are currently disabled to improve coverage and help others build tests based on these.
What changes are included in this PR?
the behavior of FlightServerTestRule which does not leak.
Are these changes tested?
Tested locally.
Are there any user-facing changes?
No.