[SUPPORT] What functionality does the Hudi cleaner provide? #2299

@bithw1

Hi,
I would like to know what the Hudi cleaner actually does.

As I see it, there are two ways a cleaner could work, depending on the user's business needs.

  1. Delete old commits together with their data. If the cleaner works this way, the historical data belonging to those commits is deleted as well. This could be useful when historical data has no value to the end user's business, and it might speed up reads and writes because fewer commits and less data remain.

  2. Merge the old commits into a new commit, and merge the data belonging to the old commits into that new commit as well (similar to how Spark's RDD checkpoint cuts off a long lineage). If the cleaner works this way, the end user keeps the historical data, and since fewer commits remain, incremental reads between commits should be faster.
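
To make the difference between the two options concrete, here is a toy sketch in Python. It is not Hudi code and says nothing about what the cleaner really does: the timeline layout, the `retain` parameter, and both function names are hypothetical, made up only to illustrate the two behaviors I am asking about.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: a timeline is an ordered list of
# (commit_time, {record_key: value}) pairs, oldest first.
Commit = Tuple[str, Dict[str, int]]


def clean_by_deleting(timeline: List[Commit], retain: int) -> List[Commit]:
    """Option 1: keep only the newest `retain` commits; older data is lost."""
    return list(timeline[-retain:]) if retain > 0 else []


def clean_by_merging(timeline: List[Commit], retain: int) -> List[Commit]:
    """Option 2: fold the older commits into one commit; no data is lost."""
    if retain > 0:
        old, recent = timeline[:-retain], timeline[-retain:]
    else:
        old, recent = timeline, []
    if not old:
        return list(recent)
    merged: Dict[str, int] = {}
    for _, records in old:
        merged.update(records)  # later commits overwrite earlier values
    # The merged commit takes the time of the newest commit it absorbed.
    return [(old[-1][0], merged)] + list(recent)


timeline = [
    ("001", {"a": 1}),
    ("002", {"b": 2}),
    ("003", {"a": 3}),
    ("004", {"c": 4}),
]

print(clean_by_deleting(timeline, retain=2))
print(clean_by_merging(timeline, retain=2))
```

With option 1, record `b` from commit `002` disappears entirely; with option 2 it survives inside the merged commit, but the boundary between commits `001` and `002` is gone.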

I would like to know which of these, if either, describes how the Hudi cleaner works. Thanks.
