Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38318: [Java][FlightRPC] Enable tests that leaked #38719

Merged
merged 3 commits into from
Nov 15, 2023

Conversation

jduo
Copy link
Member

@jduo jduo commented Nov 14, 2023

Rationale for this change

This enables tests that are currently disabled to improve coverage and help others build tests based on these.

What changes are included in this PR?

  • Enable tests that were disabled due to flakey memory leaks
  • Explicitly close child allocators in these tests to match
    the behavior of FlightServerTestRule which does not leak.
  • Change TestBasicAuth to allocate only one server
  • Change TestBasicAuth2 to allocate only one server and client
  • Fix a bug in testBasucAuth2#asyncPut() not including credentials

Are these changes tested?

Tested locally.

Are there any user-facing changes?

No.

@jduo
Copy link
Member Author

jduo commented Nov 14, 2023

I'm not sure if explicitly closing child allocators is really finding the real problem here. This prevents leaks the same way as FlightServerTestRule. Perhaps a follow-up to find the root cause?

@jduo
Copy link
Member Author

jduo commented Nov 14, 2023

@lidavidm @danepitkin , is there a way to run this PR against CI repeatedly? Since the tests were flakey failures.

@lidavidm
Copy link
Member

You can use the GitHub UI to restart the pipeline after it completes (need to be a committer, though), or you could add a temporary commit that configures surefire to repeat tests

@lidavidm
Copy link
Member

But yeah, we've observed Windows-specific quirks like this for a while and haven't been able to get to the root of them - I never managed to reproduce, either. It would be nice if CI could enable the debug log and dump out the allocation history of leaked buffers

@jduo jduo force-pushed the 38318-leaky-flight-tests branch 4 times, most recently from 9d2b4c2 to 199002e Compare November 14, 2023 21:13
- Enable tests that were disabled due to flakey memory leaks
- Explicitly close child allocators in these tests to match
  the behavior of FlightServerTestRule which does not leak.
- Change TestBasicAuth to allocate only one server
- Change TestBasicAuth2 to allocate only one server and client
- Fix a bug in testAuth2#asyncPut() not including credentials
@jduo
Copy link
Member Author

jduo commented Nov 14, 2023

This ran 10 iterations of the problematic tests successfully. Rebasing and re-running, then will do another rebase to remove the dummy commit that triggers 10 iterations.

@jduo jduo marked this pull request as ready for review November 15, 2023 16:29
@jduo jduo requested a review from lidavidm as a code owner November 15, 2023 16:29
@jduo
Copy link
Member Author

jduo commented Nov 15, 2023

@lidavidm @danepitkin , this PR is ready for review.

@@ -97,7 +96,12 @@ public void didntAuth() {
}

@BeforeEach
public void setup() throws IOException {
public void testSetup() throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test utilize stateful authentication so a new FlightClient must be created for each test.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Nov 15, 2023
Copy link
Member

@danepitkin danepitkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! Thanks @jduo

Comment on lines +119 to 121
allocator.getChildAllocators().forEach(BufferAllocator::close);
AutoCloseables.close(allocator);
allocator = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were child allocators always not getting closed? If so, is that a design choice with Flight? I wonder if a util function to identify and close all Flight child allocators would be helpful (what if those child allocators create their own child allocators?). Could be a follow up issue if it makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests were flakey before adding this, so it's likely they do correctly close() but there's a race condition that causes them to get missed. I experimented (changed these tests to use a separate allocator for FlightClient and FlightServer) and found that it's the Flight server's allocator that leaks. I think it relates to the server taking too long to shutdown during FlightServer.close(), which mentions it will leak resources if shutdown is taking too long.

I'll create a new Issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #38735 for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not a design choice. And the only use of child allocators I see in flight is in tests, except for one created in the FlightClient (which is closed).

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like what we should do for the future is (1) tests need to explicitly awaitTermination on the server with a long enough timeout that all RPC calls terminate and (2) the weird behavior of FlightServer#close should be removed (it should just use a long timeout or no timeout)

Comment on lines +119 to 121
allocator.getChildAllocators().forEach(BufferAllocator::close);
AutoCloseables.close(allocator);
allocator = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not a design choice. And the only use of child allocators I see in flight is in tests, except for one created in the FlightClient (which is closed).

@lidavidm lidavidm merged commit 563078f into apache:main Nov 15, 2023
22 checks passed
@lidavidm lidavidm removed the awaiting committer review Awaiting committer review label Nov 15, 2023
@github-actions github-actions bot added awaiting review Awaiting review awaiting merge Awaiting merge labels Nov 15, 2023
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 563078f.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
)

### Rationale for this change
This enables tests that are currently disabled to improve coverage and help others build tests based on these.

### What changes are included in this PR?
- Enable tests that were disabled due to flakey memory leaks
- Explicitly close child allocators in these tests to match
  the behavior of FlightServerTestRule which does not leak.
- Change TestBasicAuth to allocate only one server
- Change TestBasicAuth2 to allocate only one server and client
- Fix a bug in testBasucAuth2#asyncPut() not including credentials

### Are these changes tested?
Tested locally.

### Are there any user-facing changes?
No.
* Closes: apache#38318

Authored-by: James Duong <james.duong@improving.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] Fix memory leaks in disabled Flight tests
3 participants