@scottme
Contributor

scottme commented Dec 31, 2025

What changes were proposed in this pull request?

As stated in the java.lang.ref.Cleaner documentation (https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/Cleaner.html):
The cleaning action could be a lambda but all too easily will capture the object reference, by referring to fields of the object being cleaned, preventing the object from becoming phantom reachable. Using a static nested class, as above, will avoid accidentally retaining the object reference.
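
The pitfall quoted above can be illustrated with a minimal, self-contained sketch. It is not the Spark patch: `CleanerCaptureDemo`, `LeakyResource`, `SafeResource`, `State`, and `isCollected` are hypothetical names chosen for illustration, and the reachability check relies on `System.gc()`, which a JVM is free to ignore.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.WeakReference;

// Hypothetical demo class; none of these names come from Spark.
final class CleanerCaptureDemo {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Leaky variant: the lambda reads an instance field, so it captures `this`. */
    static final class LeakyResource {
        private final String name;

        LeakyResource(String resourceName) {
            this.name = resourceName;
            // `name` is a field here, so the lambda holds a strong reference to `this`.
            // The Cleaner holds the lambda, so this object never becomes phantom reachable.
            CLEANER.register(this, () -> System.out.println("cleaning " + name));
        }
    }

    /** Safe variant: cleanup state lives in a static nested class with no owner reference. */
    static final class SafeResource {
        private static final class State implements Runnable {
            private final String name;

            State(String name) {
                this.name = name;
            }

            @Override
            public void run() {
                System.out.println("cleaning " + name);
            }
        }

        private final State state;

        SafeResource(String resourceName) {
            this.state = new State(resourceName);
            // Only `state` is registered as the action; it does not refer back to `this`.
            CLEANER.register(this, state);
        }
    }

    /** Returns true if the referent was reclaimed after a bounded number of GC requests. */
    static boolean isCollected(WeakReference<?> ref) throws InterruptedException {
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc(); // a request only; the JVM may ignore it
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    // Helper methods so that no local variable keeps the new object alive.
    static WeakReference<Object> leaky() {
        return new WeakReference<>(new LeakyResource("leaky"));
    }

    static WeakReference<Object> safe() {
        return new WeakReference<>(new SafeResource("safe"));
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("leaky collected: " + isCollected(leaky()));
        System.out.println("safe collected: " + isCollected(safe()));
    }
}
```

On a JVM that honors `System.gc()`, the leaky instance is expected to stay reachable because the registered lambda holds `this`, while the safe instance can be reclaimed and its cleaning action run.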

More details, including the test program and analysis, are in https://issues.apache.org/jira/browse/SPARK-54753

After running with Spark 4.0.1, the ArtifactManager is leaked, and the SessionState/SparkSession it references is leaked as well.

Why are the changes needed?

Use a separate class to hold the cleanup state, so that the cleaning action does not keep a reference to the ArtifactManager.

Does this PR introduce any user-facing change?

No

How was this patch tested?

With the test program in https://issues.apache.org/jira/browse/SPARK-54753, using VisualVM to monitor memory usage.
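
As a complement to watching the heap in VisualVM, retention of this kind can also be checked programmatically. The sketch below is an illustration under stated assumptions, not the test program attached to the JIRA issue: `RetentionCheck` and `countRetained` are hypothetical names, the factory stands in for whatever creates the object under test, and the result depends on the JVM honoring `System.gc()`.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper; not part of Spark or of the JIRA test program.
final class RetentionCheck {

    /**
     * Creates `instances` objects via `factory`, drops them, requests GC a bounded
     * number of times, and returns how many objects are still reachable.
     */
    static int countRetained(Supplier<Object> factory, int instances)
            throws InterruptedException {
        List<WeakReference<Object>> refs = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            // Only a weak reference is kept, so this list does not retain the objects.
            refs.add(new WeakReference<>(factory.get()));
        }
        int retained = instances;
        for (int attempt = 0; attempt < 20 && retained > 0; attempt++) {
            System.gc(); // a request only; the JVM may ignore it
            Thread.sleep(50);
            retained = 0;
            for (WeakReference<Object> ref : refs) {
                if (ref.get() != null) {
                    retained++;
                }
            }
        }
        return retained;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates an accidental strong reference, such as a captured `this`.
        List<Object> held = new ArrayList<>();
        int leaked = countRetained(() -> {
            Object o = new Object();
            held.add(o);
            return o;
        }, 100);
        int freed = countRetained(Object::new, 100);
        System.out.println("retained with strong refs: " + leaked + " of " + held.size());
        System.out.println("retained without strong refs: " + freed);
    }
}
```

A non-zero count after the GC attempts suggests that something still holds the objects strongly; a heap dump in VisualVM can then show which reference chain is responsible.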

Was this patch authored or co-authored using generative AI tooling?

No

This PR backports the fix in #53591 to branch-4.1.
cc @dongjoon-hyun @pranavdev022 @hvanhovell @vicennial @HyukjinKwon

@github-actions

github-actions bot commented Dec 31, 2025

JIRA Issue Information

=== Bug SPARK-54753 ===
Summary: memory leak in Apache Spark 4.0.1 as we persist/unpersist the dataset
Assignee: xihuan
Status: Resolved
Affected: ["4.0.0","4.0.1"]


This comment was automatically generated by GitHub Actions

scottme changed the title from "Spark 54753 4.1" to "[SPARK-54753][SQL] backport the fix in trunk branch to branch-4.1" on Dec 31, 2025
@pan3793
Member

pan3793 commented Dec 31, 2025

@scottme PR title should be [SPARK-54753][SQL][4.1] Fix memory leak of ArtifactManager, and please keep the original PR description with additional info like "This PR backports <PR link> to branch-4.1".

@HyukjinKwon, this looks like a critical issue, will you consider this as a blocker for 4.1.1?

@HyukjinKwon
Member

@hvanhovell @vicennial should this be a release blocker?

scottme changed the title from "[SPARK-54753][SQL] backport the fix in trunk branch to branch-4.1" to "[SPARK-54753][SQL][4.1] Fix memory leak of ArtifactManager" on Dec 31, 2025
@scottme
Contributor Author

scottme commented Dec 31, 2025

> @scottme PR title should be [SPARK-54753][SQL][4.1] Fix memory leak of ArtifactManager, and please keep the original PR description with additional info like "This PR backports <PR link> to branch-4.1".
>
> @HyukjinKwon, this looks like a critical issue, will you consider this as a blocker for 4.1.1?

Updated as suggested, thank you @pan3793.

@HyukjinKwon
Member

Alright let me -1 for 4.1.1 rc1 and start RC2 with this fix tmr. It's my phone so I can't merge now 🫠

@HyukjinKwon
Member

Merged to branch-4.1.

HyukjinKwon pushed a commit that referenced this pull request Jan 2, 2026
### What changes were proposed in this pull request?

As stated in the java.lang.ref.Cleaner documentation (https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/Cleaner.html):
**The cleaning action could be a lambda but all too easily will capture the object reference, by referring to fields of the object being cleaned, preventing the object from becoming phantom reachable. Using a static nested class, as above, will avoid accidentally retaining the object reference.**

More details, including the test program and analysis, are in https://issues.apache.org/jira/browse/SPARK-54753

<img width="1462" height="559" alt="image" src="https://github.com/user-attachments/assets/83de9e8e-8f63-41fe-8318-b1cea6a1de9c" />

After running with Spark 4.0.1, the ArtifactManager is leaked, and the SessionState/SparkSession it references is leaked as well.

### Why are the changes needed?

Use a separate class to hold the cleanup state, so that the cleaning action does not keep a reference to the ArtifactManager.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

With the test program in https://issues.apache.org/jira/browse/SPARK-54753, using VisualVM to monitor memory usage.

### Was this patch authored or co-authored using generative AI tooling?

No

**This PR backports the fix in #53591 to branch-4.1.**
cc dongjoon-hyun pranavdev022 hvanhovell vicennial HyukjinKwon

Closes #53654 from scottme/SPARK-54753-4.1.

Authored-by: xihuan_mstr <xihuan@strategy.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon closed this on Jan 2, 2026