ADS layer malfunctioned. After restarting the job, the data statistics were incorrect #11017

@jack1234smith

Describe the problem you faced

We run a real-time data warehouse built on Flink and Hudi. The ADS layer job failed, while the ODS and DWD layer jobs kept running normally. We restarted the ADS layer job one day after the failure. Since the restart we see two kinds of data anomalies. First, new data overwrites old data: for example, the newly calculated March monthly report overwrites the existing March monthly report. Second, the new job only reads data from the most recent commit instead of the full data set, so the new monthly report values are lower than the actual values.
How has everyone else solved this?
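
To make the two symptoms concrete, here is a minimal sketch in plain Python. It does not use any Hudi or Flink API. It assumes the ADS table is keyed by month with upsert semantics and that the restarted job came up without its previous state; neither is confirmed above. The commit contents and the names `dwd_commits`, `monthly_totals`, and `upsert` are hypothetical.

```python
from collections import defaultdict

# Hypothetical upstream (DWD) commits: each commit is a list of (month, amount) rows.
dwd_commits = [
    [("2024-03", 100), ("2024-03", 50)],  # commit 1
    [("2024-03", 70)],                    # commit 2
    [("2024-03", 30)],                    # commit 3, written while the ADS job was down
]


def monthly_totals(commits):
    """Aggregate amounts per month over the given commits."""
    totals = defaultdict(int)
    for commit in commits:
        for month, amount in commit:
            totals[month] += amount
    return dict(totals)


def upsert(table, rows):
    """Upsert by record key (the month): a new row replaces the old one."""
    table.update(rows)
    return table


# Before the failure, the ADS job had consumed commits 1 and 2.
ads_table = upsert({}, monthly_totals(dwd_commits[:2]))

# After the restart (assumed: no restored state), only the latest commit is read.
restarted = monthly_totals(dwd_commits[-1:])

# Symptom 1: the new March row overwrites the old March row.
ads_table = upsert(ads_table, restarted)

# Symptom 2: the new value is lower than the total over all commits.
expected = monthly_totals(dwd_commits)
print("ADS after restart:", ads_table)
print("expected totals:  ", expected)
```

If this model matches what happened, both symptoms would share one cause: the partial aggregate from the latest commit is upserted over the earlier, more complete row.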

Environment Description

Hudi version : 0.14.1
Flink version : 1.17
Storage (HDFS/S3/GCS..) : HDFS
Running on Yarn? (yes/no) : yes
