[SPARK-30069][CORE][YARN] Clean up non-shuffle disk block manager files following executor exits on YARN #26711
Test build #114626 has finished for PR 26711 at commit
retest this please
Test build #114690 has finished for PR 26711 at commit
So this actually writes temp shuffle files into the YARN container local dirs instead of the local dirs? Could there be any performance difference after this PR? (Sorry, I'm not familiar with YARN.)
Yes. I think only this part could make a performance difference in YARN mode:
Ah, but I don't think it's a problem.
gentle ping @cloud-fan @jiangxb1987 @dongjoon-hyun @jerryshao
retest this please
Test build #119692 has finished for PR 26711 at commit
Perhaps you can expand on the description of the approach you are taking? How do you know the temp_ files aren't being used?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
@LantaoJin Has this problem been solved? We are facing the same problem now.
@tianshuang This problem should be resolved by this PR, but it was closed by the GitHub bot. So I think it still exists in 3.0.
After the thrift-server runs for a long time, it fails with a "No space left on device" error. I hope this issue can be reopened so the problem is solved completely.
It can't be reopened now. It seems I have to create a new PR.
@tianshuang Recreated as #29378.
What changes were proposed in this pull request?
Currently we only clean up the local directories when the application is removed. However, when executors die and restart repeatedly, many temp files are left behind in the local directories. This is undesirable and can gradually use up disk space. In a long-running service such as the Spark thrift-server with dynamic resource allocation disabled, it is especially easy for the local disk to fill up.
#21390 fixed the same problem in Standalone mode. On YARN, this issue still exists.
As noted in #21390 (comment), YARN only cleans up the container local dirs when the container (executor) exits. But these temp files are not in the container local dirs, so they are never cleaned up.
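To make "container local dirs" concrete, here is a minimal sketch, assuming the usual NodeManager layout where each container gets a `<appLocalDir>/<containerId>` directory under every entry of the comma-separated `LOCAL_DIRS` environment variable, and the container ID is exposed as `CONTAINER_ID`. The class `YarnContainerDirs` and its method are hypothetical illustrations, not code from this PR or from Spark.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: derives per-container dirs from the YARN container environment.
final class YarnContainerDirs {
    private YarnContainerDirs() {}

    /**
     * Returns "<appLocalDir>/<containerId>" for every entry in LOCAL_DIRS,
     * or an empty list when the variables are missing (i.e. not in a YARN container).
     * Assumption: the NodeManager deletes these dirs when the container exits.
     */
    static List<String> containerLocalDirs(Map<String, String> env) {
        String localDirs = env.get("LOCAL_DIRS");
        String containerId = env.get("CONTAINER_ID");
        List<String> result = new ArrayList<>();
        if (localDirs == null || localDirs.isEmpty()
                || containerId == null || containerId.isEmpty()) {
            return result; // not running inside a YARN container
        }
        for (String dir : localDirs.split(",")) {
            String trimmed = dir.trim();
            if (!trimmed.isEmpty()) {
                // Append the container ID to each application-level local dir
                result.add(new File(trimmed, containerId).getPath());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints an empty list when run outside a YARN container
        System.out.println(containerLocalDirs(System.getenv()));
    }
}
```

Taking the environment as a `Map` rather than calling `System.getenv()` directly keeps the logic testable outside a real cluster.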
So this patch is very straightforward:
We create these "temp_xxx" files under the container local dirs when the executor is running in a YARN container.
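The placement rule can be sketched as follows, assuming the container dirs have already been resolved (empty when not on YARN) and that temp names use the `temp_shuffle_` / `temp_local_` prefixes followed by a UUID. `TempFileLocator.tempFilePath` is a hypothetical name, and the simple hash-based spreading across roots only loosely resembles, and is not, Spark's `DiskBlockManager` logic. It returns a path and does not create the file.

```java
import java.io.File;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch of where a new "temp_*" file would be placed.
final class TempFileLocator {
    private TempFileLocator() {}

    /**
     * Prefer YARN container dirs (cleaned up by YARN when the executor exits);
     * otherwise fall back to the block manager's local dirs.
     */
    static File tempFilePath(String prefix, List<File> containerDirs, List<File> blockManagerDirs) {
        List<File> roots = containerDirs.isEmpty() ? blockManagerDirs : containerDirs;
        if (roots.isEmpty()) {
            throw new IllegalStateException("no local dirs available");
        }
        String name = prefix + UUID.randomUUID();
        // Spread temp files across the chosen roots by a non-negative hash of the name
        int index = Math.floorMod(name.hashCode(), roots.size());
        return new File(roots.get(index), name);
    }
}
```

Falling back to the regular local dirs keeps the behavior for Standalone and other cluster managers unchanged; only executors that actually run in a YARN container get the container-scoped location.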
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test and tested manually.