Task Description
What needs to be done:
Add a new policy for `hoodie.clean.failed.writes.policy`:
LAZY_WITH_PREWRITE : Same as LAZY, except that rollbackFailedWrites will also be attempted in startCommit before creating the new instant
Add a new clean config that, if enabled, attempts a clean in startCommit before creating the new instant
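The two proposed changes can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual HUDI write client: `FailedWritesPolicySketch`, `WriteClientSketch`, the `cleanBeforeCommit` flag, and the `actions` log are all hypothetical names introduced here. The order of rollback before clean is also an assumption. Only LAZY and the proposed LAZY_WITH_PREWRITE are modeled; other policy values are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the failed-writes policy; only the two values
// relevant to this proposal are modeled.
enum FailedWritesPolicySketch { LAZY, LAZY_WITH_PREWRITE }

// Hypothetical, simplified write client used only to show the control flow.
class WriteClientSketch {
    private final FailedWritesPolicySketch policy;
    // Assumed name for the proposed "clean in startCommit" config.
    private final boolean cleanBeforeCommit;
    // Records the order in which operations were attempted.
    final List<String> actions = new ArrayList<>();

    WriteClientSketch(FailedWritesPolicySketch policy, boolean cleanBeforeCommit) {
        this.policy = policy;
        this.cleanBeforeCommit = cleanBeforeCommit;
    }

    String startCommit(String instantTime) {
        // Proposed policy: attempt the rollback of failed writes up front.
        if (policy == FailedWritesPolicySketch.LAZY_WITH_PREWRITE) {
            rollbackFailedWrites();
        }
        // Proposed config: attempt a clean before the new instant exists.
        if (cleanBeforeCommit) {
            clean();
        }
        // Only now is the new instant created.
        actions.add("createInstant:" + instantTime);
        return instantTime;
    }

    void rollbackFailedWrites() { actions.add("rollbackFailedWrites"); }

    void clean() { actions.add("clean"); }
}
```

With `LAZY_WITH_PREWRITE` and the flag enabled, a call to `startCommit` in this sketch attempts the rollback, then the clean, and only then creates the instant. With plain `LAZY` and the flag disabled it goes straight to creating the instant, which matches today's behavior as described in this issue.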
Why this task is needed:
- We have had cases where write ingestion jobs repeatedly fail. Because we use the LAZY policy, files from these failed writes were not implicitly rolled back between ingestion jobs. As a result, files accumulate without bound in the DFS partition directories, which can then hit file quota limits. To mitigate this, we had to manually run clean jobs or temporarily set the policy to EAGER.
- We want to handle potential future cases where write jobs keep failing after completing the commit but before attempting the post-commit clean, which can have the same impact as above. For example, infra issues can cause Spark jobs to fail or time out when the HUDI write reaches the post-commit phase.
We have added this functionality to our internal 0.x HUDI build to address these issues, and we can upstream it once we achieve consensus.
Task Type
Code improvement/refactoring
Related Issues
Parent feature issue: (if applicable )
Related issues: