Describe the problem you faced
We run a real-time data warehouse built on Flink and Hudi. The ADS layer job failed, while the ODS and DWD layer jobs kept running normally. We restarted the ADS layer job one day after the failure, and since then we see two kinds of data anomalies. First, new data overwrites old data: for example, the newly calculated March monthly report overwrites the existing March monthly report. Second, the restarted job only reads data from the most recent commit instead of the full data set, so the new monthly report values are lower than the actual values.
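To make the two anomalies described above concrete, here is a minimal sketch in plain Python. It does not use any Hudi or Flink API. The record layout, the `monthly_totals` and `upsert` helpers, and all sample values are hypothetical, and it assumes that the ADS table is keyed by month and that the restarted job starts with empty state and reads only the latest upstream commit.

```python
# Hypothetical illustration only: plain Python, no Hudi or Flink APIs.
from collections import defaultdict

# Upstream (DWD) commits, oldest first; each record is (month, amount).
# All values are made up for the example.
dwd_commits = [
    [("March", 100), ("March", 50)],  # commit 1
    [("March", 30)],                  # commit 2
    [("March", 20)],                  # commit 3 (latest)
]


def monthly_totals(commits):
    """Sum amounts per month over the given list of commits."""
    totals = defaultdict(int)
    for commit in commits:
        for month, amount in commit:
            totals[month] += amount
    return dict(totals)


def upsert(table, rows):
    """Key-based upsert: a row with an existing key replaces the old row."""
    table.update(rows)
    return table


# ADS table before the failure, aggregated over commits 1 and 2.
ads_table = monthly_totals(dwd_commits[:2])

# Restarted job (assumed): empty state, reads only the latest commit.
restarted = monthly_totals(dwd_commits[-1:])

# What the report should contain if the full history were counted.
expected = monthly_totals(dwd_commits)

# Anomaly 2: the recomputed total is lower than the full-history total.
print(restarted["March"], "<", expected["March"])

# Anomaly 1: upserting the partial result replaces the old March row.
upsert(ads_table, restarted)
print(ads_table)
```

In this sketch the old March row is lost because the upsert is keyed on the month alone, and the total is too low because the aggregation state from before the failure is not restored.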
How have others solved this?
Environment Description
Hudi version : 0.14.1
Flink version : 1.17
Storage (HDFS/S3/GCS..) : HDFS
Running on Yarn? (yes/no) : yes