
[SUPPORT] A streaming task and a batch task cannot write to the same table at the same time (Flink SQL) #7338

@zhaizhenhua

Description


Tips before filing an issue

  • Have you gone through our FAQs?
    yes

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

We use Flink SQL + Hudi + Presto as a unified streaming/batch solution: Hudi for storage, Flink SQL for processing, and Presto as the query engine. Because of the data volume, the streaming job processes only the current day's data. However, historical records (order statuses) still change after the fact, so we run a batch job every morning to repair the historical data.

When we tested this design, we found that while the streaming job is running, the batch job cannot run at the same time: as soon as the batch job starts writing, the streaming job reports an error. Loading data through the streaming job alone is not an option, because historical order statuses change and regular batch repair is required. The goal of the solution is to use the same processing engine, the same code, and the same target table for both jobs.

This is not so much a bug report as a request for a workable approach. There is very little material online about this scenario and we have not found a suitable method, so we are asking for help here.
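The two writers described above can be sketched in Flink SQL as follows. The table name, schema, and paths are hypothetical (the issue does not include the actual DDL); this only illustrates the streaming/batch dual-writer setup:

```sql
-- Hypothetical Hudi sink table; names, schema, and path are illustrative only.
CREATE TABLE orders_hudi (
  order_id     STRING PRIMARY KEY NOT ENFORCED,
  order_status STRING,
  update_time  TIMESTAMP(3),
  dt           STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'hudi',
  'path'       = 'hdfs:///warehouse/orders_hudi',
  'table.type' = 'MERGE_ON_READ'
);

-- Writer 1: the long-running streaming job, upserting only today's partition.
INSERT INTO orders_hudi
SELECT order_id, order_status, update_time, dt
FROM orders_source
WHERE dt = DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd');

-- Writer 2: the nightly repair job runs the same INSERT with the job
-- started in batch execution mode, with the WHERE clause widened to
-- the past 30 days' partitions.
```

Both statements target the same table, which is what triggers the conflict described below.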

To Reproduce

Steps to reproduce the behavior:

1. Start a streaming job that writes to partitioned table A, covering only today's data.

2. Start the same Flink SQL code in batch mode, writing to partitioned table A and reloading the past 30 days of data.

3. Once the batch job has been submitted and starts writing to the Hudi table, the streaming job begins to fail and restarts continuously. The error message is as follows:
Exception:Ignoring the request to fail job because the job is already failing. The ignored failure cause is
org.apache.hudi.exception.HoodieException: Executor executes action [handle write metadata event for instant 20221125180423588] error
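With Hudi's default single-writer mode, two concurrent writers committing to the same table are expected to conflict. A common remedy is to enable optimistic concurrency control with an external lock provider on both writers. The sketch below uses Hudi's documented concurrency-control option keys passed through the Flink `WITH` clause; the ZooKeeper address, lock key, and base path are placeholder values, and this has not been verified against this exact setup:

```sql
-- Same hypothetical table as above, with multi-writer settings added.
CREATE TABLE orders_hudi (
  order_id     STRING PRIMARY KEY NOT ENFORCED,
  order_status STRING,
  update_time  TIMESTAMP(3),
  dt           STRING
) PARTITIONED BY (dt) WITH (
  'connector'  = 'hudi',
  'path'       = 'hdfs:///warehouse/orders_hudi',
  'table.type' = 'MERGE_ON_READ',
  -- Allow multiple writers to commit via optimistic concurrency control;
  -- failed writes are cleaned lazily so one writer's failure does not
  -- roll back another writer's inflight commit.
  'hoodie.write.concurrency.mode'         = 'optimistic_concurrency_control',
  'hoodie.cleaner.policy.failed.writes'   = 'LAZY',
  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider',
  -- Placeholder ZooKeeper settings; substitute the real quorum.
  'hoodie.write.lock.zookeeper.url'       = 'zk-host',
  'hoodie.write.lock.zookeeper.port'      = '2181',
  'hoodie.write.lock.zookeeper.lock_key'  = 'orders_hudi',
  'hoodie.write.lock.zookeeper.base_path' = '/hudi_locks'
);
```

Both the streaming job and the batch job must use the same lock configuration, otherwise the writers will still step on each other's instants.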

Expected behavior

The streaming job and the nightly batch job should be able to write to the same Hudi table concurrently, without the streaming job failing.

Environment Description

  • Hudi version :
    0.13

  • Spark version :

  • Hive version :
    2.1.1

  • Hadoop version :
    3.0.0

  • Storage (HDFS/S3/GCS..) :
    HDFS

  • Running on Docker? (yes/no) :
    NO

Additional context


Stacktrace


Exception:Ignoring the request to fail job because the job is already failing. The ignored failure cause is
org.apache.hudi.exception.HoodieException: Executor executes action [handle write metadata event for instant 20221125180423588] error

Metadata

Assignees: no one assigned

Labels: engine:flink (Flink integration), priority:high (Significant impact; potential bugs)

Status: 🏁 Triaged