[HUDI-5859] Adding standalone restore tool#8044
nsivabalan wants to merge 5 commits into apache:master
  break;
} else {
  // we need to collect only partial list of log files to be deleted
  /*TableSchemaResolver tableSchemaResolver = new TableSchemaResolver(metaClient);
It is feasible to support restoring to any delta commit, but I have not tested that part yet.
038d577 to 8742a53
8742a53 to 1025352
   * Clears hoodie.table.metadata.partitions in hoodie.properties
   */
- private void clearMetadataTablePartitionsConfig(Option<MetadataPartitionType> partitionType, boolean clearAll) {
+ public static void clearMetadataTablePartitionsConfig(Option<MetadataPartitionType> partitionType, boolean clearAll, HoodieTableMetaClient metaClient) {
Let's avoid making methods static. This only makes the code harder to unit test. Also, changing from private to public static suggests that we should maybe move this functionality outside of the HoodieTable class. What do you think?
private final HoodieTableMetaClient metaClient;

public MORRestoreTool(HoodieTableMetaClient metaClient) {
  this.metaClient = metaClient;
This constructor isn't setting the other instance vars, can we set those to avoid NPEs? Are there some vars that only need to exist within the constructor?
      }
    }
  }
}
Can you add some assertions that the table can still be read and that the records retrieved match expectations?
Are there any assertions on the metadata table that also need to be added here to make sure it ends up in the correct state?
  DATA_GENERATOR.close();
}

@Test
Add a sanity-check test that the "dry run" option is taken into account.
HoodieSparkEngineContext engineContext = new HoodieSparkEngineContext(jsc);
FileSystem fs = metaClient.getFs();
if (fs.exists(new Path(basePath + "/" + METADATA_TABLE_FOLDER_PATH))) {
  if (!cfg.cleanupMetadata) {
Should there be a dry-run option for this?
  filesToDelete.add(Pair.of(pPath, logFile.getPath().toString()));
});
LOG.info(fileSlice.getBaseInstantTime() + " Not processing remaining file slices");
break;
What is the purpose of using break here and below?
Let's build this as a CLI enhancement? And also for COW and MOR in general.
Change Logs
For MOR tables, restoring to a very old delta commit is very time-consuming because, internally, we roll back one commit at a time. This standalone tool takes a stab at improving the performance of restore. You can choose a delta commit just before a compaction commit, and the tool will directly delete the files of the newer file slices created after the chosen delta commit.
This tool does not yet support restoring to the middle of a file slice.
The restore timestamp has to be the latest delta commit before a compaction commit.
For example, given this timeline (dc = delta commit, c = compaction commit):
dc1, dc2, c3, dc4, dc5, c6, dc7, dc8, c9, dc10, dc11
Valid commit times to restore to with this tool:
dc2, dc5, or dc8.
In other words, this tool can only clean up entire file slices.
After cleaning up the data files, this tool will also delete the corresponding commit meta files from ".hoodie".
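The rule for valid restore timestamps can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the PR: the class `RestorePointSketch`, its methods, and the convention that labels starting with "c" are compaction commits and all others are delta commits are hypothetical, chosen to match the example timeline above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hudi or of this PR.
class RestorePointSketch {

    // Assumption: labels such as "c3" are compaction commits, labels such as "dc2" are delta commits.
    static boolean isCompaction(String instant) {
        return instant.startsWith("c");
    }

    // Returns every delta commit that sits immediately before a compaction commit.
    static List<String> validRestorePoints(List<String> timeline) {
        List<String> valid = new ArrayList<>();
        for (int i = 1; i < timeline.size(); i++) {
            String previous = timeline.get(i - 1);
            if (isCompaction(timeline.get(i)) && !isCompaction(previous)) {
                valid.add(previous);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<String> timeline = List.of(
            "dc1", "dc2", "c3", "dc4", "dc5", "c6", "dc7", "dc8", "c9", "dc10", "dc11");
        System.out.println(validRestorePoints(timeline)); // prints [dc2, dc5, dc8]
    }
}
```

Trailing delta commits such as dc10 and dc11 are not returned because no compaction follows them.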
Caution:
Metadata has to be disabled.
This tool also takes the unconventional route of not going through rollback. It directly lists the files, deletes them, and also deletes the timeline files if necessary.
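A minimal sketch of the "list and delete directly" approach, including the dry-run behaviour the reviewers ask about, might look as follows. Everything here is an assumption made for illustration: `DirectRestoreSketch`, `DataFile`, the file names, and the short instant times are hypothetical, and the sketch assumes instant times are fixed-width strings so that string comparison matches time order. It is not the PR's MORRestoreTool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the PR's implementation.
class DirectRestoreSketch {

    // A base or log file together with the base instant time of its file slice.
    static final class DataFile {
        final String path;
        final String baseInstantTime;

        DataFile(String path, String baseInstantTime) {
            this.path = path;
            this.baseInstantTime = baseInstantTime;
        }
    }

    // Returns the paths that belong to file slices newer than restoreInstant.
    // With dryRun set, the paths are only reported and the deleter is never called.
    static List<String> restore(List<DataFile> files, String restoreInstant,
                                boolean dryRun, Consumer<String> deleter) {
        List<String> toDelete = new ArrayList<>();
        for (DataFile file : files) {
            // Assumption: fixed-width instant times, so string order equals time order.
            if (file.baseInstantTime.compareTo(restoreInstant) > 0) {
                toDelete.add(file.path);
            }
        }
        if (!dryRun) {
            toDelete.forEach(deleter);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
            new DataFile("p1/f1_003.parquet", "003"),
            new DataFile("p1/.f1_003.log.1", "003"),
            new DataFile("p1/f1_006.parquet", "006"),
            new DataFile("p1/.f1_006.log.1", "006"));
        // Dry run: print the plan for restoring to instant 005 without deleting anything.
        System.out.println(restore(files, "005", true, path -> { }));
    }
}
```

Passing the deleter in as a parameter keeps the selection logic testable without touching a file system, which also makes a dry-run test straightforward.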
Sample command
Impact
Describe any public API or user-facing feature change or any performance impact.
Risk level (write none, low medium or high below)
If medium or high, explain what verification was done to mitigate the risks.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist