-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher #41692
Conversation
fa3ebfd
to
fb0c8a6
Compare
Looks reasonable, @steveloughran WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not familiar enough with the code to say "this is safe", only that "just make sure that nothing has any open filesystem instances retrieved from FileSystem.get() before doing this".
(note we are having lots of fun with close() in finalize() in https://issues.apache.org/jira/browse/HADOOP-18781); there's simply no one single good solution here. pity)
@srowen @steveloughran
This is exactly what I'm bearing in mind. Additionally, please advice if there is anything I can do to help you review the PR? |
@@ -186,6 +186,8 @@ private[spark] class SparkSubmit extends Logging { | |||
} else { | |||
throw e | |||
} | |||
} finally { | |||
FileSystem.closeAllForUGI(proxyUser) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing here because the UserGroupInformation#createProxyUser will create a new subject every time, and then any code executed in doAs section will be leaked if we don't close it properly.
@@ -149,6 +150,9 @@ private[spark] class HadoopDelegationTokenManager( | |||
creds.addAll(newTokens) | |||
} | |||
}) | |||
if(!currentUser.equals(freshUGI)) { | |||
FileSystem.closeAllForUGI(freshUGI) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing here because doLogin() may create new proxyuser sometime and may cause leaks
I'm happy; if there are problems they will surface pretty rapidly |
@steveloughran In that case, I will trigger the CI again. |
fb0c8a6
to
6e4c34d
Compare
@srowen |
Merged to master |
What changes were proposed in this pull request?
Using
FileSystem.closeAllForUGI
to close the cache to prevent memory leak.Why are the changes needed?
There seems to be a memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher.
For more detail, see SPARK-41599
Does this PR introduce any user-facing change?
No
How was this patch tested?
I have tested the patch with my code which uses inProcessLauncher.
Confirmed that the memory leak issue is mitigated.
I will be very helpful if I can have some feedback and I will add some test cases if required.