-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54528][CONNECT] Close URLClassLoader eagerly to avoid OOM #53233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vicennial
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the improvement!
| // Close classloaders to release resources. | ||
| try { | ||
| state.replClassLoader match { | ||
| case urlClassLoader: URLClassLoader => | ||
| urlClassLoader.close() | ||
| logInfo(log"Closed replClassLoader (URLClassLoader) for evicted session " + | ||
| log"${MDC(SESSION_ID, state.sessionUUID)}") | ||
| case _ => | ||
| } | ||
| } catch { | ||
| case NonFatal(e) => | ||
| logWarning(log"Failed to close replClassLoader for session " + | ||
| log"${MDC(SESSION_ID, state.sessionUUID)}", e) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is redundant since, for isolated sessions, the replClassLoader is always set to ExecutorClassLoader here (and thus, never hits the match case) since the URI is always set
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @cloud-fan .
Merged to master/4.1 for Apache Spark 4.1.0.
### What changes were proposed in this pull request? In Spark Connect, every client session has its own class loader in the server side (both driver and executors), for isolation purpose. However, when closing the session, we don't close the class loader. We rely on GC to close the class loader and release the resource, but it's unpredicatable and not efficient. When sessions are being created frequently, OOM may happen. This PR fixes the issue by closing URLClassLoader eagerly when closing a session. It also adds a config so that we can tune the session cache size in the executor side to save memory. With this patch, OOM is gone. ### Why are the changes needed? avoid OOM ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Our internal CI service has limited memory and `SparkConnectJdbcDataTypeSuite` keeps failing due to OOM. This test suite creates and closes sessions back to back frequently, and GC can't catch up to release the class loaders. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor 2.1.36 Closes #53233 from cloud-fan/mem. Lead-authored-by: Wenchen Fan <wenchen@databricks.com> Co-authored-by: Wenchen Fan <cloud0fan@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit e264965) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
In Spark Connect, every client session has its own class loader in the server side (both driver and executors), for isolation purpose. However, when closing the session, we don't close the class loader. We rely on GC to close the class loader and release the resource, but it's unpredicatable and not efficient. When sessions are being created frequently, OOM may happen.
This PR fixes the issue by closing URLClassLoader eagerly when closing a session. It also adds a config so that we can tune the session cache size in the executor side to save memory. With this patch, OOM is gone.
Why are the changes needed?
avoid OOM
Does this PR introduce any user-facing change?
no
How was this patch tested?
Our internal CI service has limited memory and
SparkConnectJdbcDataTypeSuitekeeps failing due to OOM. This test suite creates and closes sessions back to back frequently, and GC can't catch up to release the class loaders.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 2.1.36