[SUPPORT] Time Travel (querying the historical versions of data) ability for Hudi Table #2329

@HengQian-chaine

Hi, all:
We plan to use Hudi to sync MySQL binlog data. A Flink ETL job will consume binlog records from Kafka and write them to Hudi once an hour. The binlog records are also grouped by hour, and all records for a given hour are saved in a single commit. The data pipeline looks like this: binlog -> Kafka -> Flink -> Parquet.
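
To make the hourly grouping concrete, here is a minimal Python sketch of the bucketing step only. It does not use Flink or Hudi APIs. The record layout (a dict with a `ts_ms` event timestamp in milliseconds) and the helper names `hour_bucket` and `group_by_hour` are assumptions made for illustration, not part of our actual job.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts_ms: int) -> str:
    """Return the UTC hour an event falls into, formatted as yyyyMMddHH."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y%m%d%H")


def group_by_hour(records):
    """Group binlog records by hour; each group would become one commit.

    `records` is an iterable of dicts with a hypothetical `ts_ms` field.
    The result maps hour bucket -> list of records, ordered by hour.
    """
    batches = defaultdict(list)
    for rec in records:
        batches[hour_bucket(rec["ts_ms"])].append(rec)
    return dict(sorted(batches.items()))
```

Each key of the returned dict corresponds to one hourly batch, so writing one batch per commit gives one table version per hour.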

After the data is synced to Hudi, we want to query the historical hourly versions of the Hudi table in Hive SQL.
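
As a rough illustration of what querying a historical version means here: given the list of commit times on the table, a time travel query "as of" some point in time should read the latest commit at or before that point. The sketch below shows only that lookup in plain Python. It assumes commit times are fixed-width `yyyyMMddHHmmss` strings (so string order matches time order), and `resolve_as_of` is a hypothetical helper, not a Hudi API.

```python
from bisect import bisect_right


def resolve_as_of(commit_times, as_of):
    """Return the latest commit time <= as_of, or None if none exists.

    Both arguments use fixed-width yyyyMMddHHmmss strings (an assumption),
    so lexicographic comparison is equivalent to chronological comparison.
    """
    ordered = sorted(commit_times)
    idx = bisect_right(ordered, as_of)  # number of commits at or before as_of
    return ordered[idx - 1] if idx else None
```

For example, with hourly commits at 01:00, 02:00 and 03:00, a query as of 02:30 would resolve to the 02:00 commit, and a query as of a time before the first commit would resolve to nothing.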

Here is a more detailed description of our issue, along with a simple design of Time Travel for Hudi. The design is still under development and testing:
https://docs.google.com/document/d/1r0iwUsklw9aKSDMzZaiq43dy57cSJSAqT9KCvgjbtUo/edit?usp=sharing

We need to support Time Travel soon for our business needs. We have also seen RFC 07.
We would be glad to receive any suggestions or discussion.
