
[SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher #41692

Closed
wants to merge 1 commit

Conversation

@risyomei (Contributor) commented Jun 21, 2023

What changes were proposed in this pull request?

Use FileSystem.closeAllForUGI to close the cached FileSystem instances and prevent a memory leak.

Why are the changes needed?

There appears to be a memory leak in FileSystem.CACHE when submitting apps to a secure cluster using InProcessLauncher.
For more details, see SPARK-41599.
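
To illustrate the leak, here is a minimal sketch (not the reporter's actual code) of a repeated proxy-user submission, with a hypothetical submitAs helper standing in for the in-process submission path; the finally block shows the closeAllForUGI cleanup this PR proposes:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation

object LeakSketch {
  // Hypothetical helper standing in for one proxy-user submission.
  def submitAs(user: String, conf: Configuration): Unit = {
    // Every call creates a UGI wrapping a fresh Subject, i.e. a brand-new
    // FileSystem.CACHE key that no earlier cache entry can be reused for.
    val proxyUgi = UserGroupInformation.createProxyUser(
      user, UserGroupInformation.getCurrentUser)
    try {
      proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
        override def run(): Unit = {
          // Cached under proxyUgi; without cleanup this entry lives forever.
          val fs = FileSystem.get(conf)
          fs.getUri // stand-in for the real submission work
        }
      })
    } finally {
      // The cleanup this PR proposes: drop everything cached for this UGI.
      FileSystem.closeAllForUGI(proxyUgi)
    }
  }
}
```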

Does this PR introduce any user-facing change?

No

How was this patch tested?

I have tested the patch with my own code, which uses InProcessLauncher, and confirmed that the memory leak is mitigated.

(Screenshot attached: 2023-06-23 at 11:46:52.)

It would be very helpful to get some feedback; I will add test cases if required.
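
For reference, a minimal sketch of the kind of in-process submission involved in this test; the master, application jar, main class, and proxy user below are hypothetical placeholders, not the actual test setup:

```scala
import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle}

object InProcessSubmitSketch {
  // Submit one application inside the current JVM instead of forking spark-submit.
  def submitOnce(): SparkAppHandle = {
    new InProcessLauncher()
      .setMaster("yarn")                          // assumed secure YARN cluster
      .setDeployMode("cluster")
      .setAppResource("/path/to/example-app.jar") // hypothetical jar
      .setMainClass("com.example.ExampleApp")     // hypothetical main class
      .addSparkArg("--proxy-user", "alice")       // hypothetical proxy user
      .startApplication()
  }
}
```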

@github-actions bot added the CORE label on Jun 21, 2023
@risyomei marked this pull request as ready for review on June 27, 2023, 11:59
@risyomei (Contributor Author) commented:

The CI/CD test failure is not related to this PR.

@srowen (Member) commented Jun 28, 2023

Looks reasonable, @steveloughran WDYT?
Try re-running the tests, I agree they're not related and may have been transient.

@steveloughran (Contributor) left a comment:

I'm not familiar enough with the code to say "this is safe", only that "just make sure that nothing has any open filesystem instances retrieved from FileSystem.get() before doing this".

(Note: we are having lots of fun with close() in finalize() in https://issues.apache.org/jira/browse/HADOOP-18781; there's simply no single good solution here. Pity.)

@risyomei (Contributor Author) commented:

@srowen @steveloughran
Thank you very much for your comment.

only that "just make sure that nothing has any open filesystem instances retrieved from FileSystem.get() before doing this".

This is exactly what I'm bearing in mind.
I have added some in-line comments which may help you evaluate the situation.

Additionally, please advise if there is anything I can do to help you review this PR.
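
To illustrate the caveat being discussed, here is a sketch (with a hypothetical proxy user) of the pattern that must not occur: a FileSystem obtained via FileSystem.get() inside the doAs block that is still held after closeAllForUGI runs:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object RetainedFsHazardSketch {
  def example(conf: Configuration): Unit = {
    val proxyUgi = UserGroupInformation.createProxyUser(
      "alice", UserGroupInformation.getCurrentUser) // hypothetical proxy user
    // Anti-pattern: the FileSystem escapes the doAs block.
    val retained = proxyUgi.doAs(new PrivilegedExceptionAction[FileSystem] {
      override def run(): FileSystem = FileSystem.get(conf)
    })
    FileSystem.closeAllForUGI(proxyUgi) // closes the cached instance underneath...
    retained.exists(new Path("/"))      // ...so later use typically fails
                                        // (e.g. "Filesystem closed" on HDFS)
  }
}
```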

@@ -186,6 +186,8 @@ private[spark] class SparkSubmit extends Logging {
             } else {
               throw e
             }
+        } finally {
+          FileSystem.closeAllForUGI(proxyUser)
@risyomei (Contributor Author) commented on this change:

Closing here because UserGroupInformation#createProxyUser creates a new Subject every time, so any FileSystem instances opened by code executed in the doAs section will be leaked if we don't close them explicitly.
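
A small sketch of that point: two proxy UGIs for the same user never compare equal, because UGI equality is based on the identity of the wrapped Subject, so FileSystem.CACHE keys for them never collide (the user name below is hypothetical):

```scala
import org.apache.hadoop.security.UserGroupInformation

object ProxyUgiIdentitySketch {
  def main(args: Array[String]): Unit = {
    val realUser = UserGroupInformation.getCurrentUser
    val first  = UserGroupInformation.createProxyUser("alice", realUser)
    val second = UserGroupInformation.createProxyUser("alice", realUser)
    // Each call wraps a fresh Subject, so the UGIs are distinct and
    // FileSystem.CACHE keeps a separate entry for each of them.
    println(first.equals(second)) // expected: false
  }
}
```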

@@ -149,6 +150,9 @@ private[spark] class HadoopDelegationTokenManager(
         creds.addAll(newTokens)
       }
     })
+    if (!currentUser.equals(freshUGI)) {
+      FileSystem.closeAllForUGI(freshUGI)
+    }
@risyomei (Contributor Author) commented on this change:

Closing here because doLogin() may sometimes create a new proxy user, which can also cause leaks.

@steveloughran (Contributor) commented:

I'm happy; if there are problems, they will surface pretty rapidly.

@risyomei (Contributor Author) commented:

@steveloughran
Thank you for your comment, Steve.

In that case, I will trigger the CI again.

@risyomei (Contributor Author) commented:

@srowen
Hello Sean, may I ask you to take a look at this PR again, please?

@srowen closed this in 7971e1c on Jun 30, 2023
@srowen (Member) commented Jun 30, 2023

Merged to master
