Hi, all:
We plan to use Hudi to sync MySQL binlog data. A Flink ETL job will consume binlog records from Kafka and write them to Hudi once an hour. The binlog records are also grouped by hour, and all records from one hour are saved in a single commit. The data pipeline looks like this: binlog -> Kafka -> Flink -> Parquet.
After the data is synced to Hudi, we want to query the historical hourly versions of the Hudi table in Hive SQL.
Here is a more detailed description of our issue, along with a simple design of Time Travel for Hudi. The design is still under development and testing:
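To make the requirement concrete, here is a minimal sketch in plain Python of the behaviour we are after. It is not Hudi or Flink code. The function names, the record layout (event time, key, value), and the choice to stamp each commit with the end of its hour are all assumptions for illustration; only the yyyyMMddHHmmss instant format follows the way Hudi names commits. Deletes are ignored for brevity.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

# Hudi-style commit instant format, e.g. 20201214110000
INSTANT_FORMAT = "%Y%m%d%H%M%S"


def hourly_commit_instant(event_time: datetime) -> str:
    """Instant of the hourly commit that would contain this record.

    Assumption: each commit is stamped with the end of its hour.
    """
    hour_start = event_time.replace(minute=0, second=0, microsecond=0)
    return (hour_start + timedelta(hours=1)).strftime(INSTANT_FORMAT)


def group_into_commits(records):
    """Group (event_time, key, value) records into one commit per hour."""
    commits = defaultdict(list)
    for event_time, key, value in records:
        commits[hourly_commit_instant(event_time)].append((key, value))
    return dict(commits)


def snapshot_as_of(commits, as_of_instant: str):
    """Replay every commit at or before as_of_instant as upserts by key."""
    # Fixed-width instants sort lexicographically in chronological order.
    instants = sorted(commits)
    visible = instants[:bisect_right(instants, as_of_instant)]
    table = {}
    for instant in visible:
        for key, value in commits[instant]:
            table[key] = value  # a later write for the same key wins
    return table


# Hypothetical binlog records: (event time, primary key, row value)
records = [
    (datetime(2020, 12, 14, 10, 5), "id1", "a"),
    (datetime(2020, 12, 14, 10, 50), "id2", "b"),
    (datetime(2020, 12, 14, 11, 10), "id1", "c"),
]
commits = group_into_commits(records)
table_at_11 = snapshot_as_of(commits, "20201214110000")
table_at_12 = snapshot_as_of(commits, "20201214120000")
```

With this sample data, `table_at_11` holds the first hour's rows only, while `table_at_12` also reflects the update to `id1`. A time travel query in Hive SQL would need to return the equivalent of `snapshot_as_of` for any hourly instant.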
[https://docs.google.com/document/d/1r0iwUsklw9aKSDMzZaiq43dy57cSJSAqT9KCvgjbtUo/edit?usp=sharing]
We need to support Time Travel in the near future for our business needs. We have also seen [RFC 07|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table].
We would be glad to receive any suggestions or discussion.
Comments
14/Dec/20 16:07;xleesf;[~qian heng] sorry, I could not access the Google doc you provided. It would be better if you sent a discussion email to the dev ML.;;;
14/Dec/20 18:50;nishith29;[~qian heng] As [~xleesf] pointed out, I was also unable to access the Google doc. Could you please start a discussion thread on the dev mailing list? This will help you get feedback from other members as well. Based on that, we can see whether this needs a separate RFC or whether we can make changes to RFC-07.;;;
15/Dec/20 00:02;vinoth;+1. If we can keep discussions on the mailing list and then move them onto the cWiki, that would be great.
Happy to provide any access/permissions as needed.;;;
15/Dec/20 06:20;qian heng;The doc is accessible now, sorry for the mistake.;;;
12/Mar/22 14:12;xushiyan;[~x1q1j1] can you please go through the description and design doc to see if any further work is needed?;;;
13/Mar/22 05:39;x1q1j1;hi [~qian heng] 1. SparkSQL already supports time travel queries on Hudi tables (HUDI-3221).
2. Hive SQL requires adding syntax support to the Hive source code. (This will be implemented after Presto.)
- Presto/Trino SQL time travel queries on Hudi tables will be implemented next.;;;