[SUPPORT] deltastreamer support migrate COW table to MOR  #8249

@waitingF

Tips before filing an issue

  • Have you gone through our FAQs? yes

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org. I have sent the subscription email; please help add me.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

My company uses deltastreamer to ingest data from Kafka to HDFS, and the existing Hudi tables are COW. Write latency on the COW tables is much higher than on MOR tables, so we are going to migrate the COW tables to MOR.
According to the FAQ, an existing COW table can be changed to MOR just by changing the hoodie.table.type property.
However, when deltastreamer continues writing to the existing path as a MOR table, the checkpoint from the old COW table is lost, so data loss is possible.

I found the cause of the checkpoint loss.
In the refreshTimeline method, when the table is MOR the checkpoint is read only from deltacommits. A table that was written as COW until now has commits rather than deltacommits, so no checkpoint is found after the migration:

this.commitTimelineOpt = Option.of(meta.getActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants());
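
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the Hudi API: the `Instant` class, the method names, and the checkpoint strings are all hypothetical stand-ins that only model "a timeline of completed instants, each carrying a checkpoint". It compares a lookup that sees only deltacommits with one that also falls back to commits.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CheckpointLookupSketch {

    // Hypothetical stand-in for a completed timeline instant; not a Hudi class.
    public static final class Instant {
        final String timestamp;   // fixed-width timestamp, so string order is time order
        final String action;      // "commit" (written as COW) or "deltacommit" (written as MOR)
        final String checkpoint;  // illustrative checkpoint value stored with the instant

        public Instant(String timestamp, String action, String checkpoint) {
            this.timestamp = timestamp;
            this.action = action;
            this.checkpoint = checkpoint;
        }
    }

    // Models the reported behavior: for a MOR table only deltacommits are consulted.
    public static Optional<String> latestCheckpointDeltaOnly(List<Instant> timeline) {
        return latestCheckpoint(timeline, Arrays.asList("deltacommit"));
    }

    // Models one possible fix: also consider commits left over from the COW period.
    public static Optional<String> latestCheckpointWithCommits(List<Instant> timeline) {
        return latestCheckpoint(timeline, Arrays.asList("deltacommit", "commit"));
    }

    // Returns the checkpoint of the newest instant whose action is in the allowed set.
    static Optional<String> latestCheckpoint(List<Instant> timeline, List<String> actions) {
        return timeline.stream()
                .filter(i -> actions.contains(i.action))
                .filter(i -> i.checkpoint != null)
                .max(Comparator.comparing((Instant i) -> i.timestamp))
                .map(i -> i.checkpoint);
    }

    public static void main(String[] args) {
        // A table that has only ever been written as COW: commits, no deltacommits.
        List<Instant> cowHistory = Arrays.asList(
                new Instant("20230320100000", "commit", "topic,0:100"),
                new Instant("20230320110000", "commit", "topic,0:250"));

        System.out.println(latestCheckpointDeltaOnly(cowHistory));
        System.out.println(latestCheckpointWithCommits(cowHistory));
    }
}
```

In this model the deltacommit-only lookup returns an empty result right after the table type is switched, while the lookup that includes commits still finds the last COW checkpoint. Whether the real fix should widen the timeline or copy the checkpoint forward is a design choice for the PR, not something this sketch decides.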

Simply put, I want deltastreamer to support migrating a Hudi table directly via a parameter.

Expected behavior

deltastreamer offers a parameter --migration-type to support migrating an existing COW table to MOR.
The value of --migration-type can be:

  1. NONE (default, no migration)
  2. COW_TO_MOR
  3. MOR_TO_COW (TODO)
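
As a rough illustration of how these three values could be modeled, the sketch below defines a hypothetical `MigrationType` enum and a `parse` helper. Neither name is taken from Hudi or from the PR; the only things carried over from this issue are the three values, the NONE default, and the fact that MOR_TO_COW is still a TODO.

```java
import java.util.Locale;

public class MigrationTypeSketch {

    // Hypothetical enum mirroring the proposed --migration-type values.
    public enum MigrationType { NONE, COW_TO_MOR, MOR_TO_COW }

    // Parses the raw flag value; a missing or blank value means no migration.
    public static MigrationType parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return MigrationType.NONE;
        }
        // valueOf throws IllegalArgumentException for unknown values.
        MigrationType type = MigrationType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        if (type == MigrationType.MOR_TO_COW) {
            // Listed as TODO in the proposal, so reject it explicitly for now.
            throw new UnsupportedOperationException("MOR_TO_COW is not supported yet");
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(parse(null));
        System.out.println(parse("cow_to_mor"));
    }
}
```

Rejecting MOR_TO_COW up front, rather than silently treating it as NONE, keeps a mistyped or premature setting from looking like a successful migration.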

I have raised PR #8247; could someone help review it? Thanks.

Metadata

    Labels

    area:ingest (Ingestion into Hudi), priority:medium (Moderate impact; usability gaps), type:feature (New features and enhancements)

    Projects

    Status

    🏁 Triaged
