Description
Tips before filing an issue
- Have you gone through our FAQs? Yes.
- Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org. I have sent the email; please help add me.
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
My company uses DeltaStreamer to ingest data from Kafka into HDFS, and our existing Hudi tables are of type COW. However, the write latency of a COW table is much higher than that of a MOR table, so we plan to migrate our COW tables to MOR.
According to the FAQ, an existing COW table can be converted to MOR simply by changing the hoodie.table.type property.
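As a hedged sketch of what that FAQ step amounts to, the Java snippet below rewrites the hoodie.table.type key from COPY_ON_WRITE to MERGE_ON_READ in a local properties file (the real one lives under the table's .hoodie directory) using java.util.Properties. The class and method names are hypothetical; the sketch does not touch HDFS, ignores any other bookkeeping Hudi may keep in that file, and is not a substitute for Hudi's own tooling.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class SwitchTableType {
    // Table-type key named in the Hudi FAQ.
    static final String TABLE_TYPE_KEY = "hoodie.table.type";

    /** Sets the table type in memory and returns the previous value (or null). */
    static String setTableType(Properties props, String newType) {
        return (String) props.setProperty(TABLE_TYPE_KEY, newType);
    }

    /** Loads a local properties file, switches it to MERGE_ON_READ, and writes it back. */
    static String switchFileToMergeOnRead(Path propsFile) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(propsFile)) {
            props.load(in);
        }
        String previous = setTableType(props, "MERGE_ON_READ");
        try (Writer out = Files.newBufferedWriter(propsFile)) {
            // store() drops the original comments and ordering and adds a date comment.
            props.store(out, null);
        }
        return previous;
    }

    public static void main(String[] args) throws IOException {
        // Work on a throwaway local file, not on a real table's hoodie.properties.
        Path file = Files.createTempFile("hoodie", ".properties");
        Files.writeString(file, "hoodie.table.name=demo\nhoodie.table.type=COPY_ON_WRITE\n");
        String before = switchFileToMergeOnRead(file);
        Properties after = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            after.load(in);
        }
        System.out.println(before + " -> " + after.getProperty(TABLE_TYPE_KEY));
    }
}
```

Running main prints COPY_ON_WRITE -> MERGE_ON_READ. Note that this only flips the table-type flag; it does nothing about where DeltaStreamer looks for its last checkpoint.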
However, when DeltaStreamer continues writing to the existing path as a MOR table, the checkpoint recorded by the old COW table is lost, which can lead to data loss.
I found the cause of the checkpoint loss.
In the refreshTimeline method, when the table is MOR, the checkpoint is read only from delta commits. The old COW table recorded its checkpoints on commit instants, not delta commits, which is why the checkpoint is lost when migrating from COW to MOR:
hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java (line 325 at commit ce21873):
this.commitTimelineOpt = Option.of(meta.getActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants());
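The effect of this line can be reproduced with a simplified, self-contained model. The Instant record, the latestCheckpoint helper, the metadata key deltastreamer.checkpoint.key and the Kafka-style checkpoint strings are illustrative assumptions, not Hudi's actual timeline API. Because a table written as COW only has commit instants, a scan restricted to deltacommit instants finds no checkpoint.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class CheckpointLookup {
    // Hypothetical, simplified model of a completed timeline instant: its action
    // ("commit" or "deltacommit") plus the extra metadata stored with it.
    record Instant(String timestamp, String action, Map<String, String> extraMetadata) {}

    // Assumed metadata key under which DeltaStreamer records the source checkpoint.
    static final String CHECKPOINT_KEY = "deltastreamer.checkpoint.key";

    /** Returns the checkpoint of the latest instant whose action is in allowedActions. */
    static Optional<String> latestCheckpoint(List<Instant> completed, Set<String> allowedActions) {
        // Instants are ordered oldest to newest, so walk backwards.
        for (int i = completed.size() - 1; i >= 0; i--) {
            Instant instant = completed.get(i);
            if (allowedActions.contains(instant.action())
                    && instant.extraMetadata().containsKey(CHECKPOINT_KEY)) {
                return Optional.of(instant.extraMetadata().get(CHECKPOINT_KEY));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Timeline of a table that was written as COW: only "commit" instants exist.
        List<Instant> timeline = List.of(
            new Instant("001", "commit", Map.of(CHECKPOINT_KEY, "topic,0:100")),
            new Instant("002", "commit", Map.of(CHECKPOINT_KEY, "topic,0:250")));

        // MOR behaviour described in the issue: only delta commits are scanned.
        System.out.println(latestCheckpoint(timeline, Set.of("deltacommit")));
        // Also scanning commits recovers the checkpoint written before the switch.
        System.out.println(latestCheckpoint(timeline, Set.of("deltacommit", "commit")));
    }
}
```

Running main prints Optional.empty for the delta-commit-only scan and Optional[topic,0:250] once commit instants are included.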
In short, I would like DeltaStreamer to support migrating a Hudi table directly via a parameter.
Expected behavior
DeltaStreamer offers a parameter, --migration-type, to support migrating an existing COW table to MOR.
The value of --migration-type can be:
- NONE (default no migration)
- COW_TO_MOR
- MOR_TO_COW (TODO)
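A minimal sketch of how the proposed flag could be parsed and used when looking up the previous checkpoint on a MOR table. MigrationType, parse and checkpointActionsForMergeOnRead are hypothetical names; --migration-type is the option proposed in this issue, not an existing DeltaStreamer option, and this is not necessarily how PR #8247 implements it.

```java
import java.util.Locale;
import java.util.Set;

public class MigrationTypeSketch {
    // Hypothetical enum mirroring the values proposed for --migration-type.
    enum MigrationType { NONE, COW_TO_MOR, MOR_TO_COW }

    /** Parses the flag value; a missing or blank value means NONE (no migration). */
    static MigrationType parse(String value) {
        if (value == null || value.isBlank()) {
            return MigrationType.NONE;
        }
        return MigrationType.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Timeline actions to scan for the previous checkpoint on a MOR table. */
    static Set<String> checkpointActionsForMergeOnRead(MigrationType type) {
        switch (type) {
            case NONE:
                // Current behaviour: only delta commits are considered.
                return Set.of("deltacommit");
            case COW_TO_MOR:
                // Checkpoints written before the switch live on "commit" instants.
                return Set.of("deltacommit", "commit");
            default:
                // MOR_TO_COW is marked TODO in the proposal.
                throw new UnsupportedOperationException(type + " is not supported yet");
        }
    }

    public static void main(String[] args) {
        MigrationType type = parse(args.length > 0 ? args[0] : null);
        System.out.println(type + " -> " + checkpointActionsForMergeOnRead(type));
    }
}
```

Keeping NONE as the default means existing pipelines behave exactly as before; the checkpoint search is widened to commit instants only when COW_TO_MOR is requested explicitly.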
I have raised PR #8247; could someone help review it? Thanks.