
Support Rollback for Dropped Partitions  #15714

@hudi-bot

Description

Currently, when a user drops a partition using Spark SQL (https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-alter-table.html#drop-partition) and then rolls back the commit that dropped it, the partition does not appear in the output of the SHOW PARTITIONS command. The reason is that, as part of the drop partition operation, Hudi also deletes the partition from the table metadata, but rolling the operation back does not add the partition back to the table metadata. Hence, SHOW PARTITIONS does not return the rolled-back partition.
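The asymmetry described above can be illustrated with a small toy model. This is only a sketch of the reported behavior, not Hudi code: `ToyTable`, its fields, the commit id `c3`, and the partition names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: data files plus a separate partition listing."""
    data: dict = field(default_factory=dict)             # partition -> file names
    metadata_partitions: set = field(default_factory=set)
    dropped: dict = field(default_factory=dict)          # commit id -> (partition, files)

    def write(self, partition, files):
        self.data[partition] = list(files)
        self.metadata_partitions.add(partition)

    def drop_partition(self, commit_id, partition):
        # The drop removes the data and also removes the partition from the listing.
        self.dropped[commit_id] = (partition, self.data.pop(partition))
        self.metadata_partitions.discard(partition)

    def rollback(self, commit_id):
        # Models the reported behavior: data is restored, the listing is not updated.
        partition, files = self.dropped.pop(commit_id)
        self.data[partition] = files

    def show_partitions(self):
        return sorted(self.metadata_partitions)


table = ToyTable()
table.write("dt=2024-01-01", ["f1.parquet"])
table.write("dt=2024-01-02", ["f2.parquet"])
table.drop_partition("c3", "dt=2024-01-02")
table.rollback("c3")

print(sorted(table.data))       # partitions that have data again
print(table.show_partitions())  # partitions the listing still knows about
```

In this model the data for `dt=2024-01-02` is back after the rollback, while the listing still omits it, which is the mismatch this issue reports.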

As part of the drop partition command, Hudi schedules a clean operation for the partition's data, treating the drop as a hard delete. However, the user may roll back the drop partition commit before the cleaner runs (or may have turned the cleaner off). In such scenarios, even though the data is rolled back, the partition still does not appear in the table metadata, leaving the Hudi table in a corrupt state.

We think this functionality can be enhanced to support rollback for drop partitions. If we decide against it, we should disallow rolling back commits that drop partitions, so that users do not end up in this state.
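The two options above can be contrasted in a minimal sketch. Neither function reflects Hudi's actual rollback code; the function names, the `RollbackNotAllowed` exception, and the set-based partition listing are assumptions made purely for illustration.

```python
class RollbackNotAllowed(Exception):
    """Hypothetical error raised when a commit that dropped partitions is rolled back."""


def rollback_restoring_partitions(listing, dropped_by_commit):
    # Option 1: the rollback also re-adds the partitions the commit removed.
    return set(listing) | set(dropped_by_commit)


def rollback_guarded(listing, dropped_by_commit):
    # Option 2: reject the rollback outright if the commit dropped partitions.
    if dropped_by_commit:
        raise RollbackNotAllowed(
            "commit dropped partitions: " + ", ".join(sorted(dropped_by_commit))
        )
    return set(listing)


listing = {"dt=2024-01-01"}
dropped = {"dt=2024-01-02"}

restored = rollback_restoring_partitions(listing, dropped)
print(sorted(restored))

try:
    rollback_guarded(listing, dropped)
    rejected = False
except RollbackNotAllowed as err:
    rejected = True
    print("rollback rejected:", err)
```

Either way, the partition listing and the data stay consistent: the first option by repairing the listing, the second by never allowing the inconsistent state to arise.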

Labels: from-jira, priority:high (Significant impact; potential bugs), type:devtask (Development tasks and maintenance work)
