
Conversation

@zuston (Member) commented Sep 25, 2024

What changes were proposed in this pull request?

Fix the resource leak of the YARN allocator.

Why are the changes needed?

When the target number of executors is lower than the number of running containers, newly assigned containers from the ResourceManager are skipped, but they are never released by invoking amClient.releaseAssignedContainer. That leaves these containers reserved in the YARN ResourceManager for at least 10 minutes, so cluster resources are wasted at a high ratio.

This also shows up as the vcore * seconds statistics reported by YARN being greater than the numbers derived from the Spark event logs.

From my statistics, the cluster resource waste ratio is ~25% when the cluster runs Spark jobs exclusively.
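
In sketch form, the essence of the fix is a single release call on each surplus container. The helper below is illustrative only (handleSurplusContainer is a hypothetical name, not the actual YarnAllocator diff); amClient.releaseAssignedContainer is the real AMRMClient API involved:

```scala
import org.apache.hadoop.yarn.api.records.Container
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Hypothetical helper: when the allocator already has enough executors for
// the current target, hand the surplus container back to the ResourceManager
// instead of silently skipping it.
def handleSurplusContainer(
    amClient: AMRMClient[ContainerRequest],
    container: Container,
    runningExecutors: Int,
    targetExecutors: Int): Unit = {
  if (runningExecutors >= targetExecutors) {
    // Without this call, the skipped container stays reserved in the
    // YARN ResourceManager until it expires.
    amClient.releaseAssignedContainer(container.getId)
  }
}
```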

Does this PR introduce any user-facing change?

No

How was this patch tested?

In our internal Hadoop cluster

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the YARN label Sep 25, 2024
@zuston zuston changed the title [SPARK-49783] fix(yarn): Resource leak of yarn allocator [SPARK-49783][YARN] Fix resource leak of yarn allocator Sep 25, 2024
@zuston (Member, Author) commented Sep 26, 2024

@LuciferYang (Contributor) commented Sep 26, 2024

also cc @tgravescs and @pan3793

@LuciferYang (Contributor) commented
Is it possible to add a new test case with MiniYARNCluster? @zuston

@zuston (Member, Author) commented Sep 26, 2024

> Is it possible to add a new test case with MiniYARNCluster? @zuston

From my perspective, this case is hard to simulate and reproduce in a test case. But I have verified it in our internal cluster; the detailed verification can be found in this blog post: https://zuston.vercel.app/publish/resource-leak-of-spark-yarn-allocator#Verification
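
For context on the difficulty: booting a MiniYARNCluster itself is straightforward, but deterministically getting the ResourceManager to assign containers after the executor target has already dropped below the running count depends on allocation timing. A rough sketch of the harness side (using the MiniYARNCluster API from hadoop-yarn-server-tests; the assertion step is the part that is hard to script):

```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.server.MiniYARNCluster

// Boots an in-process ResourceManager plus one NodeManager.
val yarnCluster = new MiniYARNCluster("resource-leak-test", 1, 1, 1)
yarnCluster.init(new YarnConfiguration())
yarnCluster.start()
// Hard part: submit an app, shrink the executor target while container
// requests are still in flight, then assert that surplus containers are
// released rather than left reserved. That race is timing-dependent.
yarnCluster.stop()
```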

@LuciferYang (Contributor) commented Sep 26, 2024

The code looks OK to me, but it would be even better if we could continuously ensure this behavior meets expectations by adding a test case.

Also, could you provide more detailed information in the "How was this patch tested?" section? The description "In our internal Hadoop cluster" is too much of a black box; we need a reproducible verification method so that reviewers can confirm the issue truly exists and has been fixed.

@pan3793 (Member) commented Sep 26, 2024

The changes make sense to me, but I'm confused about the impact.

> cluster resource waste ratio is ~25%

Does it mean that, with this change, the wasted resources could be leveraged so that all Spark jobs can use more resources and execute faster? Or does it just shrink the metric gap between the Spark event logs and YARN?

@zuston (Member, Author) commented Sep 26, 2024

> Does it mean that, with this change, the wasted resources could be leveraged so that all Spark jobs can use more resources and execute faster? Or does it just shrink the metric gap between the Spark event logs and YARN?

The unreleased resources will still be occupied for at least 10 minutes in the YARN ResourceManager, but they are not used by Spark, so these resources are wasted.
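
For reference, the 10-minute figure matches YARN's default container-allocation expiry interval; a sketch of reading it (assuming the standard YarnConfiguration constants):

```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration

// yarn.resourcemanager.rm.container-allocation.expiry-interval-ms,
// 600000 ms (10 minutes) by default in yarn-default.xml.
val expiryMs = new YarnConfiguration().getLong(
  YarnConfiguration.RM_CONTAINER_ALLOC_EXPIRY_INTERVAL_MS,
  YarnConfiguration.DEFAULT_RM_CONTAINER_ALLOC_EXPIRY_INTERVAL_MS)
```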

And the gap between the vcore * seconds metrics collected by YARN and the vcore * seconds computed from the Spark event logs of all finished jobs is exactly the wasted resource.
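
As a purely hypothetical illustration (made-up numbers, not measured values):

```scala
// If YARN charged the queue 1,000,000 vcore-seconds while the Spark event
// logs account for only 750,000, the leaked share is 25%.
val yarnVcoreSeconds  = 1000000L
val sparkVcoreSeconds = 750000L
val wasteRatio = (yarnVcoreSeconds - sparkVcoreSeconds).toDouble / yarnVcoreSeconds
// wasteRatio == 0.25, i.e. the ~25% ratio mentioned above
```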

I will attach some online cluster reports if possible.

@mridulm (Contributor) commented Dec 15, 2024

Looks reasonable to me, but it would be better if @tgravescs could take a look.

@zuston (Member, Author) commented Dec 19, 2024

Could you help review this? @tgravescs

@tgravescs (Contributor) left a comment

The change looks fine to me. I can't think of any reason not to do this, but it's been a while since I did YARN stuff.

Can you clarify what testing you have done? I just see "tested on internal cluster". Was that a single job, or is it running on thousands of jobs every day, etc.?

@zuston (Member, Author) commented Dec 20, 2024

> The change looks fine to me. I can't think of any reason not to do this, but it's been a while since I did YARN stuff.
>
> Can you clarify what testing you have done? I just see "tested on internal cluster". Was that a single job, or is it running on thousands of jobs every day, etc.?

This patch has been applied to our internal Spark 3.5 version and has been running on a Hadoop 3.2.1 cluster for 2+ months, with 150K+ Spark jobs daily.

@tgravescs (Contributor) commented
Thanks, +1.

@cxzl25 (Contributor) commented Dec 23, 2024

This PR looks good. YARN-11702 proposes a general method. I am not sure whether it is related to this PR.

JIRA

YARN-11702: Fix Yarn over allocating containers
https://issues.apache.org/jira/browse/YARN-11702
Fix Version/s: 3.5.0

@zuston (Member, Author) commented Dec 23, 2024

> This PR looks good. YARN-11702 proposes a general method. I am not sure whether it is related to this PR.
>
> JIRA
>
> YARN-11702: Fix Yarn over allocating containers https://issues.apache.org/jira/browse/YARN-11702 Fix Version/s: 3.5.0

Thanks for your reply. I have seen YARN-11702; from my side, it solves the concurrent-allocation problem on the AM-RM connection, and it is only effective within the scope of the resource request (not including the scheduling request).

@zuston (Member, Author) commented Jan 7, 2025

> Looks reasonable to me, but it would be better if @tgravescs could take a look.

Could you help take another look? @mridulm

@dongjoon-hyun (Member) left a comment

+1, LGTM for Improvement.

Thank you, @zuston, @mridulm, @tgravescs, @LuciferYang, @pan3793, @cxzl25.

Merged to master for Apache Spark 4.0.0 (Feature Freeze on January 15th).

turboFei added a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…pache#672)

[SPARK-49783][YARN] Fix resource leak of yarn allocator


Closes apache#48238 from zuston/patch-1.

Authored-by: Junfan Zhang <zuston@apache.org>

Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Junfan Zhang <zuston@apache.org>
