[Issue]: Hudi spill path not getting cleared when containers getting killed abruptly. #4771

Closed
Prasheel3001 opened this issue Feb 9, 2022 · 1 comment

Comments

@Prasheel3001

Describe the problem you faced
When YARN kills containers abruptly for any reason while a Hudi stage is in progress, the spill path that Hudi created on local disk is not cleaned up. As a result, the nodes in the cluster start running out of disk space, and we have to clear the spill path manually to free it.
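
Until this is handled inside Hudi, the manual cleanup mentioned above could be automated with something like the sketch below. This is not Hudi code and not an official workaround. It assumes the spill files live in a dedicated directory on each node (for example one set through Hudi's spillable map path setting, which I believe is `hoodie.memory.spillable.map.path`; please verify against the configuration reference for your version) and that any file untouched for longer than the longest expected job is orphaned. The function name `remove_stale_spill_files` and the path in the usage note are hypothetical.

```python
import time
from pathlib import Path


def remove_stale_spill_files(spill_dir, max_age_seconds, now=None, dry_run=False):
    """Delete regular files under spill_dir not modified for max_age_seconds.

    Returns the list of paths that were (or, with dry_run, would be) removed.
    Assumption: spill_dir is used only for spill files, so old files are safe
    to delete. Do not point this at a shared directory such as /tmp.
    """
    now = time.time() if now is None else now
    removed = []
    root = Path(spill_dir)
    if not root.is_dir():
        return removed
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(root.rglob("*")):
        try:
            # Skip directories and symlinks; only plain files are candidates.
            if path.is_symlink() or not path.is_file():
                continue
            if now - path.stat().st_mtime > max_age_seconds:
                if not dry_run:
                    path.unlink()
                removed.append(str(path))
        except FileNotFoundError:
            # A running task may have removed the file in the meantime.
            continue
    return removed
```

For example, `remove_stale_spill_files("/data/hudi-spill", 6 * 3600, dry_run=True)` would only list files older than six hours. Running it with `dry_run=True` first seems prudent, since files that belong to a still-running task must not be deleted.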

To Reproduce

Steps to reproduce the behavior:
The problem is reproducible whenever a Hudi stage fails because its containers are killed abruptly.

Expected behavior

Hudi should clear the spill path even if YARN kills the containers abruptly for any reason.

Environment Description

  • Hudi version : 0.7.0

  • Spark version : 2.4.6

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

@nsivabalan
Contributor

Thanks for reporting. I have filed a tracking JIRA. Let us know if you are interested in working on it.
