[SPARK-49783][YARN] Fix resource leak of yarn allocator #48238
Conversation
also cc @tgravescs and @pan3793
Is it possible to add a new test case with MiniYARNCluster? @zuston
From my perspective, it is hard to simulate this case in a reproducible test. But I have verified it in our internal cluster; the detailed verification can be found in this blog post: https://zuston.vercel.app/publish/resource-leak-of-spark-yarn-allocator#Verification
The code is OK for me, but it would be even better if we could continuously ensure that this behavior meets expectations by adding an additional test case. Also, could you provide more detailed information in the
Changes make sense to me, but I'm confused about the impact.
Does it mean that, with this change, the wasted resources can be leveraged so that all Spark jobs get more resources and execute faster? Or does it just shrink the metric gap between the Spark event log and YARN?
The unreleased resources will still be occupied for at least 10 minutes in the YARN ResourceManager, but they are not used by Spark, so they are wasted. The gap between the YARN-collected vcore * seconds metric and the Spark-collected vcore * seconds across all finished Spark jobs is the wasted resource. I will attach some online cluster reports if possible.
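To make that gap concrete, here is a hypothetical back-of-the-envelope calculation; the numbers below are made up for illustration, not taken from an actual cluster report:

```scala
// Illustrative only: estimating the waste ratio from the two metrics.
val yarnVcoreSeconds  = 1000000L // vcore * seconds charged by the YARN ResourceManager
val sparkVcoreSeconds = 750000L  // vcore * seconds aggregated from Spark event logs
val wastedVcoreSeconds = yarnVcoreSeconds - sparkVcoreSeconds
val wasteRatio = wastedVcoreSeconds.toDouble / yarnVcoreSeconds // 0.25, i.e. ~25%
```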
Looks reasonable to me, but it would be better if @tgravescs could take a look.
Could you help review this? @tgravescs
tgravescs left a comment
The change looks fine to me. I can't think of any reason not to do this, but it's been a while since I did YARN stuff too.
Can you clarify what testing you have done? I just see "tested on internal cluster". Was that a single job, or is it running on thousands of jobs every day, etc.?
This patch has been applied to our internal Spark 3.5 version and has been running on a Hadoop 3.2.1 cluster for 2+ months with 150K+ Spark jobs daily.
Thanks, +1.
This PR looks good. YARN-11702 proposes a general method; I am not sure whether it is related to this PR. JIRA: YARN-11702: Fix Yarn over allocating containers
Thanks for your reply. I have seen YARN-11702; from my side, it solves the concurrent allocation problem in the AM-RM connection, and it is only effective within the scope of the resource request (not including the scheduling request).
Could you help take another look? @mridulm |
dongjoon-hyun left a comment
+1, LGTM for Improvement.
Thank you, @zuston , @mridulm , @tgravescs , @LuciferYang , @pan3793 , @cxzl25.
Merged to master for Apache Spark 4.0.0 (Feature Freeze on January 15th).
…pache#672) [SPARK-49783][YARN] Fix resource leak of yarn allocator
Closes apache#48238 from zuston/patch-1.
Authored-by: Junfan Zhang <zuston@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Junfan Zhang <zuston@apache.org>
What changes were proposed in this pull request?
Fix the resource leak of the YARN allocator.
Why are the changes needed?
When the target is less than the number of running containers, containers assigned by the ResourceManager are skipped, but they are not released by invoking amClient.releaseAssignedContainer. That leaves these containers reserved in the YARN ResourceManager for at least 10 minutes, so cluster resources are wasted at a high ratio.
This also shows up as the vcore * seconds statistics on the YARN side being greater than the result computed from the Spark event logs.
From my statistics, the cluster resource waste ratio is ~25% when the Spark jobs are exclusive in this cluster.
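For illustration, here is a minimal sketch of the idea, not the actual patch in YarnAllocator: when more containers come back from the ResourceManager than are still needed, release the surplus via amClient.releaseAssignedContainer instead of just skipping it. The method and variable names below are hypothetical.

```scala
import org.apache.hadoop.yarn.api.records.Container
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Hypothetical handler for containers newly allocated by the ResourceManager.
def handleAllocatedContainers(
    amClient: AMRMClient[ContainerRequest],
    allocated: Seq[Container],
    targetNumExecutors: Int,
    runningExecutors: Int): Unit = {
  var running = runningExecutors
  allocated.foreach { container =>
    if (running >= targetNumExecutors) {
      // Without this call the skipped container stays reserved in the
      // ResourceManager for ~10 minutes, which is the leak described above.
      amClient.releaseAssignedContainer(container.getId)
    } else {
      running += 1
      // launch an executor in this container (omitted)
    }
  }
}
```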
Does this PR introduce any user-facing change?
No
How was this patch tested?
In our internal Hadoop cluster.
Was this patch authored or co-authored using generative AI tooling?
No