Conversation


@cloud-fan cloud-fan commented Nov 26, 2025

What changes were proposed in this pull request?

In Spark Connect, every client session has its own class loader on the server side (both driver and executors), for isolation purposes. However, when closing the session, we don't close the class loader. We rely on GC to close the class loader and release its resources, but this is unpredictable and inefficient. When sessions are created frequently, OOM may happen.

This PR fixes the issue by closing the URLClassLoader eagerly when closing a session. It also adds a config to tune the session cache size on the executor side to save memory. With this patch, the OOM is gone.
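
The two ideas above (closing the loader eagerly and bounding the session cache) can be sketched in a minimal standalone form. This is not the actual Spark code: `SessionState`, `newSessionCache`, and the cache-size limit are hypothetical stand-ins used only to illustrate the technique.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.LinkedHashMap;
import java.util.Map;

public class SessionCacheSketch {
    // Hypothetical stand-in for a per-session isolated class loader holder.
    static final class SessionState {
        final URLClassLoader loader;
        SessionState(URLClassLoader loader) { this.loader = loader; }

        // Close the URLClassLoader eagerly instead of waiting for GC,
        // releasing any open jar file handles right away.
        void close() {
            try {
                loader.close();
            } catch (Exception e) {
                System.err.println("Failed to close class loader: " + e);
            }
        }
    }

    // A bounded session cache: when the size limit is exceeded, the eldest
    // session is evicted and its class loader is closed eagerly.
    static Map<String, SessionState> newSessionCache(int maxSize) {
        return new LinkedHashMap<String, SessionState>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, SessionState> eldest) {
                if (size() > maxSize) {
                    eldest.getValue().close();
                    return true;
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, SessionState> cache = newSessionCache(2);
        for (int i = 0; i < 5; i++) {
            cache.put("session-" + i, new SessionState(new URLClassLoader(new URL[0])));
        }
        // Only the two most recent sessions remain; the rest were closed eagerly.
        System.out.println(cache.size());
    }
}
```

The eager `close()` matters because a `URLClassLoader` can hold open file handles to the jars it loaded; relying on finalization ties that release to GC timing, which is exactly the problem the PR describes.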

Why are the changes needed?

To avoid OOM.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Our internal CI service has limited memory and `SparkConnectJdbcDataTypeSuite` keeps failing due to OOM. This test suite creates and closes sessions back to back, and GC can't keep up with releasing the class loaders.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 2.1.36

@cloud-fan cloud-fan changed the title [SPARK-XXX][CONNECT] Close URLClassLoader eagerly to avoid OOM [SPARK-54528][CONNECT] Close URLClassLoader eagerly to avoid OOM Nov 26, 2025
@vicennial vicennial left a comment

LGTM, thanks for the improvement!

Comment on lines 223 to 236
// Close classloaders to release resources.
try {
  state.replClassLoader match {
    case urlClassLoader: URLClassLoader =>
      urlClassLoader.close()
      logInfo(log"Closed replClassLoader (URLClassLoader) for evicted session " +
        log"${MDC(SESSION_ID, state.sessionUUID)}")
    case _ =>
  }
} catch {
  case NonFatal(e) =>
    logWarning(log"Failed to close replClassLoader for session " +
      log"${MDC(SESSION_ID, state.sessionUUID)}", e)
}
I believe this is redundant: for isolated sessions, the replClassLoader is always set to ExecutorClassLoader here (and thus never hits the URLClassLoader match case), since the URI is always set.
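
The point above can be illustrated with a small standalone sketch: a type test against `URLClassLoader` never fires for a loader that extends plain `ClassLoader`. The `FakeExecutorClassLoader` class below is a hypothetical stand-in for Spark's `ExecutorClassLoader`, assuming (as the comment states) that it is not a `URLClassLoader` subclass.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class MatchMissSketch {
    // Hypothetical stand-in: like Spark's ExecutorClassLoader, this
    // extends ClassLoader directly, not URLClassLoader.
    static final class FakeExecutorClassLoader extends ClassLoader {}

    // Mirrors the match in the reviewed snippet: only a URLClassLoader is closed.
    static String closeIfUrlClassLoader(ClassLoader loader) {
        if (loader instanceof URLClassLoader) {
            try {
                ((URLClassLoader) loader).close();
            } catch (Exception e) {
                return "error";
            }
            return "closed";
        }
        return "skipped"; // an ExecutorClassLoader-style loader falls through here
    }

    public static void main(String[] args) {
        System.out.println(closeIfUrlClassLoader(new URLClassLoader(new URL[0])));
        System.out.println(closeIfUrlClassLoader(new FakeExecutorClassLoader()));
    }
}
```

So if the reviewer's premise holds, the close path in the evicted-session branch is dead code for isolated sessions, exactly as the comment suggests.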

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @cloud-fan .

Merged to master/4.1 for Apache Spark 4.1.0.

dongjoon-hyun pushed a commit that referenced this pull request Nov 26, 2025
### What changes were proposed in this pull request?

In Spark Connect, every client session has its own class loader on the server side (both driver and executors), for isolation purposes. However, when closing the session, we don't close the class loader. We rely on GC to close the class loader and release its resources, but this is unpredictable and inefficient. When sessions are created frequently, OOM may happen.

This PR fixes the issue by closing the URLClassLoader eagerly when closing a session. It also adds a config to tune the session cache size on the executor side to save memory. With this patch, the OOM is gone.

### Why are the changes needed?

To avoid OOM.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Our internal CI service has limited memory and `SparkConnectJdbcDataTypeSuite` keeps failing due to OOM. This test suite creates and closes sessions back to back, and GC can't keep up with releasing the class loaders.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 2.1.36

Closes #53233 from cloud-fan/mem.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit e264965)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>