Fix the problem that incremental clean cannot be executed when the earliest ActiveTimeline is a pending commit. #16115

@hudi-bot

Description

When performing a clean, the earliest commit to retain, returned by the getEarliestCommitToRetain method in CleanPlanner, is used as the endpoint of the clean. However, when a pending commit runs for a long time and every commit earlier than it has been archived, the pending commit becomes the earliest instant in the active timeline. If getEarliestCommitToRetain is called in this situation, it returns empty because there is no commit earlier than the pending commit. An incremental clean uses the previous endpoint, that is, the EarliestCommitToRetain recorded by the previous clean, as its starting point. If that starting point is empty, a full clean is triggered instead, which is very resource-intensive.

To solve this problem without affecting normal cleans, I set EarliestCommitToRetain to the earliest pending commit in this case. Because the endpoint itself is never cleaned in the current clean, the pending commit's files are left untouched, and the next incremental clean still has a non-empty starting point.
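
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the actual CleanPlanner code: `TimelineInstant`, `EarliestRetainSketch`, and the `commitsRetained` parameter are hypothetical stand-ins, and the timeline is assumed to be a list sorted by timestamp in ascending order.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class EarliestRetainSketch {

    /** Hypothetical stand-in for a timeline instant; not Hudi's HoodieInstant. */
    static final class TimelineInstant {
        final String timestamp;
        final boolean completed;

        TimelineInstant(String timestamp, boolean completed) {
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    /**
     * Picks the endpoint of the clean from an active timeline sorted by timestamp.
     * Assumption: keep the latest `commitsRetained` completed commits, but never
     * move the endpoint past the earliest pending commit.
     */
    static Optional<TimelineInstant> earliestCommitToRetain(
            List<TimelineInstant> active, int commitsRetained) {
        List<TimelineInstant> completed = active.stream()
                .filter(i -> i.completed)
                .collect(Collectors.toList());
        if (completed.size() <= commitsRetained) {
            return Optional.empty(); // not enough history to clean anything
        }
        TimelineInstant candidate = completed.get(completed.size() - commitsRetained);

        Optional<TimelineInstant> earliestPending = active.stream()
                .filter(i -> !i.completed)
                .findFirst();
        if (!earliestPending.isPresent()
                || earliestPending.get().timestamp.compareTo(candidate.timestamp) > 0) {
            return Optional.of(candidate); // no pending commit blocks the candidate
        }

        TimelineInstant pending = earliestPending.get();
        // Last completed commit that is still earlier than the pending commit.
        Optional<TimelineInstant> lastCompletedBeforePending = completed.stream()
                .filter(i -> i.timestamp.compareTo(pending.timestamp) < 0)
                .reduce((first, second) -> second);

        // Proposed fallback: if everything before the pending commit has been
        // archived, return the pending commit itself instead of empty.
        if (lastCompletedBeforePending.isPresent()) {
            return lastCompletedBeforePending;
        }
        return Optional.of(pending);
    }
}
```

With a timeline whose first instant is pending and whose later instants are all completed, this sketch returns the pending instant rather than an empty result, so the next incremental clean has a starting point.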

JIRA info
