[HUDI-5954] Infer cleaning policy based on clean configs #8238
    .defaultValue(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name())
    .withInferFunction(cfg -> {
      boolean isCommitsRetainedConfigured = cfg.contains(CLEANER_COMMITS_RETAINED_KEY);
      boolean isHoursRetainedConfigured = cfg.contains(CLEANER_HOURS_RETAINED_KEY);
I'm confused by these options: does the option hoodie.cleaner.policy make any sense here? If each of the specific cleaning params (hoodie.cleaner.commits.retained, hoodie.cleaner.hours.retained, hoodie.cleaner.fileversions.retained) maps to a deterministic policy, then this option should be eliminated.
For example, can we use a combination like the HoodieCleaningPolicy.KEEP_LATEST_COMMITS policy with hoodie.cleaner.fileversions.retained? If not, the redundant option key hoodie.cleaner.policy is totally unnecessary.
As discussed, I marked it as deprecated.
@danny0405 what you mentioned totally makes sense. The reason I keep
Then let's mark this option
Makes sense. Fixed.
+1
This commit adds the logic of inferring the cleaning policy ("hoodie.cleaner.policy") based on clean configs. By default, the cleaning policy is determined based on one of the following configs explicitly set by the user (at most one of them can be set; otherwise, KEEP_LATEST_COMMITS cleaning policy is used):
- "hoodie.cleaner.commits.retained": the KEEP_LATEST_COMMITS cleaning policy is used;
- "hoodie.cleaner.hours.retained": the KEEP_LATEST_BY_HOURS cleaning policy is used;
- "hoodie.cleaner.fileversions.retained": the KEEP_LATEST_FILE_VERSIONS cleaning policy is used.
Now setting only one of the configs above automatically switches the cleaning policy. Setting "hoodie.cleaner.policy" is deprecated.
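The inference rule described in the commit message can be sketched in plain Java as follows. This is a minimal, standalone illustration and not the code in this PR: the class name `CleaningPolicyInferenceSketch`, the nested `Policy` enum, and the `inferPolicy` method are hypothetical, and a plain `Map` stands in for Hudi's config object. Only the three config keys and the three policy names come from the description above.

```java
import java.util.Map;

public class CleaningPolicyInferenceSketch {
    // Hypothetical stand-in for the cleaning policy enum; only the names come from the PR.
    enum Policy { KEEP_LATEST_COMMITS, KEEP_LATEST_BY_HOURS, KEEP_LATEST_FILE_VERSIONS }

    static final String COMMITS_RETAINED = "hoodie.cleaner.commits.retained";
    static final String HOURS_RETAINED = "hoodie.cleaner.hours.retained";
    static final String FILE_VERSIONS_RETAINED = "hoodie.cleaner.fileversions.retained";

    // Infers the policy from the configs the user set explicitly.
    // If none or more than one of the three keys is present, the default
    // KEEP_LATEST_COMMITS is returned, as the description states.
    static Policy inferPolicy(Map<String, String> userConfigs) {
        boolean commits = userConfigs.containsKey(COMMITS_RETAINED);
        boolean hours = userConfigs.containsKey(HOURS_RETAINED);
        boolean versions = userConfigs.containsKey(FILE_VERSIONS_RETAINED);

        if (hours && !commits && !versions) {
            return Policy.KEEP_LATEST_BY_HOURS;
        }
        if (versions && !commits && !hours) {
            return Policy.KEEP_LATEST_FILE_VERSIONS;
        }
        // Covers: only commits set, nothing set, or an ambiguous combination.
        return Policy.KEEP_LATEST_COMMITS;
    }

    public static void main(String[] args) {
        // Only the hours-based config is set, so the hours-based policy is inferred.
        System.out.println(inferPolicy(Map.of(HOURS_RETAINED, "24")));
    }
}
```

With only "hoodie.cleaner.hours.retained" set, the sketch returns KEEP_LATEST_BY_HOURS; with two or more of the keys set, it falls back to KEEP_LATEST_COMMITS.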
Change Logs
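This PR adds the logic of inferring the cleaning policy based on clean configs. By default, the cleaning policy is determined based on one of the following configs explicitly set by the user (at most one of them can be set; otherwise, KEEP_LATEST_COMMITS cleaning policy is used):
- "hoodie.cleaner.commits.retained": the KEEP_LATEST_COMMITS cleaning policy is used;
- "hoodie.cleaner.hours.retained": the KEEP_LATEST_BY_HOURS cleaning policy is used;
- "hoodie.cleaner.fileversions.retained": the KEEP_LATEST_FILE_VERSIONS cleaning policy is used.
Now setting only one of the configs above automatically switches the cleaning policy. Setting hoodie.cleaner.policy is deprecated.
Impact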
A user no longer needs to explicitly set the cleaning policy alongside one of the following configs: "hoodie.cleaner.commits.retained", "hoodie.cleaner.hours.retained", or "hoodie.cleaner.fileversions.retained".
Risk level
none
Documentation Update
Docs update: HUDI-595
Contributor's checklist