Description
Hi,
I would like to understand what the Hudi cleaner actually does.
As I see it, a cleaner could work in one of two ways, depending on the user's business needs:
- Delete old commits together with their data. If the cleaner works this way, the historical data belonging to those commits is deleted as well. This is useful when historical data has no value to the user's business, and it may speed up reads and writes because fewer commits and less data remain.
- Merge the old commits into a new commit, and merge the data belonging to the old commits into that new commit (similar to how a Spark RDD checkpoint truncates a long lineage). If the cleaner works this way, the user keeps the historical data, and because fewer commits remain, incremental reads between commits become faster.
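To make the two options concrete, here is a toy Python sketch of what I mean. It is not Hudi code and does not use any Hudi API: `Commit`, `delete_old`, `merge_old`, and the `retain` parameter are hypothetical names I made up purely to illustrate the question.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Toy stand-in for a commit: a timestamp plus the records it wrote."""
    ts: int
    records: dict  # record key -> value written by this commit


def delete_old(commits, retain):
    """Option 1: keep only the newest `retain` commits; older data is dropped."""
    ordered = sorted(commits, key=lambda c: c.ts)
    return ordered[-retain:] if retain > 0 else []


def merge_old(commits, retain):
    """Option 2: fold everything older than the newest `retain` commits into one commit."""
    ordered = sorted(commits, key=lambda c: c.ts)
    split = max(len(ordered) - retain, 0)
    old, recent = ordered[:split], ordered[split:]
    if not old:
        return recent
    merged = {}
    for c in old:  # later commits overwrite earlier values for the same key
        merged.update(c.records)
    # The merged commit takes the timestamp of the newest commit it absorbed.
    return [Commit(ts=old[-1].ts, records=merged)] + recent


history = [
    Commit(1, {"a": 1}),
    Commit(2, {"a": 2, "b": 1}),
    Commit(3, {"c": 1}),
]
after_delete = delete_old(history, retain=1)
after_merge = merge_old(history, retain=1)
```

With option 1, the records for keys `a` and `b` are gone after cleaning. With option 2, their latest values survive in a single merged commit, but the intermediate versions can no longer be read individually.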
Which of these, if either, describes how the Hudi cleaner works? Thanks.