[HUDI-6675] Fix Clean action will delete the whole table#9413
danny0405 merged 3 commits into apache:master
  try {
-   deleteFileAndGetResult(table.getMetaClient().getFs(), table.getMetaClient().getBasePath() + "/" + entry);
+   if (!StringUtils.isNullOrEmpty(entry)) {
+     deleteFileAndGetResult(table.getMetaClient().getFs(), table.getMetaClient().getBasePath() + "/" + entry);
I think cleanerPlan.getPartitionsToBeDeleted() itself should be fixed. Can we write a test case for it?
Yes. We do have a property in TableConfig that indicates whether the table is partitioned or non-partitioned. We can consult it and empty out partitionsToBeDeleted if the table is unpartitioned.
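A minimal sketch of that suggestion, in plain Java with no Hudi dependencies. The class name, the `sanitize` method, and the `tablePartitioned` flag are hypothetical stand-ins for whatever TableConfig actually exposes; this is not the code merged in the PR. It also drops empty entries for partitioned tables, as a defensive measure.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class PartitionsToDeleteFilter {

    // Hypothetical helper: `tablePartitioned` stands in for the value read from the table config.
    static List<String> sanitize(boolean tablePartitioned, List<String> partitionsToBeDeleted) {
        if (!tablePartitioned) {
            // A non-partitioned table has no partition directories to drop.
            return Collections.emptyList();
        }
        // Defensive: a null or empty partition path would resolve to the table root, so drop it.
        return partitionsToBeDeleted.stream()
            .filter(p -> p != null && !p.isEmpty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sanitize(false, Arrays.asList("")));              // []
        System.out.println(sanitize(true, Arrays.asList("partition1", ""))); // [partition1]
    }
}
```

Filtering at plan time and guarding at delete time are complementary: the first keeps the plan clean, the second protects against plans that were already written with an empty entry.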
Ideally, cleanerPlan.getPartitionsToBeDeleted() should be empty for a non-partitioned table, and I added a test to verify whether the whole table is deleted or not. But in an unlikely scenario, when the table is partitioned but has an empty partition path, this could still happen.
in an unlikely scenario, when table is partitioned but with empty partition path, this could happen.
It actually happened at my company 🥲. All the data was deleted, and we have no clue how it happened...
Hi @leosanqing
May I ask if you have encountered this issue? A writer incorrectly generated data files in the table root path, even though the table is a partitioned table.
E.g.
hdfs://..../hudi_table_folder/partition1/...parquet
hdfs://..../hudi_table_folder/partition2/...parquet
hdfs://..../hudi_table_folder/...parquet <- this file should not be here.
No, I haven't.
Got it. Thanks for your reply.
At my company, I also encountered a situation where the entire table directory was deleted.
@wqlsdb, would you mind cherry-picking this fix into your local repo?
Hi @wqlsdb, would you like to discuss it offline or by email? We have encountered this issue multiple times internally and are trying to find the root cause. It could be helpful to share what we each know.
@danny0405 Yes. Our company plans to upgrade to the latest version.
@TengHuo OK, my email: wqlxh7891@163.com
Change Logs
Clean action.
Avoid cleaning the whole table's data on HDFS.
Impact
Clean action.
When the table is non-partitioned, the Clean action passes "" as the partition path, which leads the Cleaner to delete the whole table's data on HDFS. The path format is table_basepath + "/" + "", so when the partition path is "", the clean action should not delete the directory and should just skip it.
Risk level (write none, low medium or high below)
low
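To illustrate the failure mode described under Impact, here is a small sketch that uses the local filesystem through java.nio rather than the Hadoop FileSystem API that Hudi uses. The class and method names are hypothetical. It shows that basePath + "/" + "" names the table root itself, and that an empty-string guard of the kind this PR adds turns the delete into a no-op.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

class EmptyPartitionCleanSketch {

    // Same concatenation as in the description: basePath + "/" + partition.
    static String partitionDir(String basePath, String partition) {
        return basePath + "/" + partition;
    }

    // True when the concatenated path names the table root itself.
    static boolean pointsAtTableRoot(String basePath, String partition) {
        Path target = Paths.get(partitionDir(basePath, partition)).normalize();
        return target.equals(Paths.get(basePath).normalize());
    }

    // Hypothetical guarded delete: returns false and deletes nothing for a
    // null or empty partition; returns true when a delete was attempted.
    static boolean deletePartition(Path base, String partition) {
        if (partition == null || partition.isEmpty()) {
            return false;
        }
        Path target = base.resolve(partition);
        if (!Files.exists(target)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(target)) {
            // Reverse order so children are removed before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hudi_table_folder");
        Files.createDirectories(base.resolve("partition1"));
        Files.createFile(base.resolve("partition1").resolve("data.parquet"));

        System.out.println(pointsAtTableRoot(base.toString(), ""));  // true
        System.out.println(deletePartition(base, ""));                // false: skipped
        System.out.println(Files.exists(base.resolve("partition1/data.parquet"))); // true
        System.out.println(deletePartition(base, "partition1"));      // true
    }
}
```

Without the guard, the first delete call would walk and remove everything under the table root, which matches the data loss reported in the conversation.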