
[SUPPORT] Data loss! flink cdc to hudi cow #6375

@sunmingqiaa

Description


Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

A clear and concise description of the problem.

To Reproduce

Steps to reproduce the behavior:

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version : 0.11.1

  • Spark version :

  • Hive version : 2.11

  • Hadoop version : 3.0

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

Additional context

Add any other context about the problem here.
"'connector'='hudi',\n" +
"'path'= %s, \n" +
"'table.type' = 'COPY_ON_WRITE',\n" +
"'hoodie.datasource.write.recordkey.field'= %s,\n" +
"'write.tasks'= %s,\n" +
"'write.task.max.size'=%s,\n" +
"'write.merge.max_memory'=%s,\n" +
"'hoodie.cleaner.commits.retained'=%s,\n" +
"'hive_sync.enable'= 'true',\n" +
"'hive_sync.mode'= 'hms',\n" +
"'hive_sync.metastore.uris'= %s,\n" +
"'hive_sync.jdbc_url'= %s,\n" +
"'hive_sync.table'= %s,\n" +
"'hive_sync.db'= %s,\n" +
"'hive_sync.username'= %s,\n" +
"'hive_sync.password'= %s,\n" +
"'write.rate.limit'= %s,\n" +
"'hive_sync.support_timestamp'= 'false'\n"
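
For reference, a minimal sketch of the Flink SQL DDL that this format string would produce once the %s placeholders are substituted. The table name, schema, record key, HDFS path, Hive Metastore/JDBC addresses, and credentials below are illustrative assumptions, not values taken from the actual job; only 'table.type', the hive_sync settings shown as literals above, and 'write.rate.limit' = '1000' come from the issue itself.

    CREATE TABLE hudi_sink (
      id   BIGINT,
      name STRING,
      ts   TIMESTAMP(3),
      PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
      'connector' = 'hudi',
      'path' = 'hdfs:///warehouse/hudi/hudi_sink',            -- assumed path
      'table.type' = 'COPY_ON_WRITE',
      'hoodie.datasource.write.recordkey.field' = 'id',       -- assumed record key
      'write.tasks' = '4',                                    -- assumed parallelism
      'write.task.max.size' = '1024',
      'write.merge.max_memory' = '100',
      'hoodie.cleaner.commits.retained' = '10',
      'hive_sync.enable' = 'true',
      'hive_sync.mode' = 'hms',
      'hive_sync.metastore.uris' = 'thrift://hms-host:9083',  -- assumed address
      'hive_sync.jdbc_url' = 'jdbc:hive2://hive-host:10000',  -- assumed address
      'hive_sync.table' = 'hudi_sink',
      'hive_sync.db' = 'default',
      'hive_sync.username' = 'hive',
      'hive_sync.password' = '***',
      'write.rate.limit' = '1000',                            -- value reported in the issue
      'hive_sync.support_timestamp' = 'false'
    );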

I tried importing 1 million rows and 10 million rows. In the last few runs, part of the data was missing after the import finished. I repeated the test several times, and the query result was different after each run; the number of lost rows varies. Hive and Flink SQL return the same (incomplete) result, and the Flink job shows no exceptions. A test a few days ago showed no data loss; the difference between the two runs is that this time write.rate.limit=1000 was set.
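
One way to quantify and localize the loss (a sketch only; source_tbl and hudi_sink are hypothetical table names, and id is assumed to be the record key) is to compare row counts and record keys between the CDC source and the Hudi table after the job has quiesced, querying the Hudi side via both Flink SQL and Hive:

    -- Row counts on both sides (hypothetical table names)
    SELECT COUNT(*) FROM source_tbl;
    SELECT COUNT(*) FROM hudi_sink;

    -- Record keys present in the source but missing from the Hudi table
    SELECT s.id
    FROM source_tbl AS s
    LEFT JOIN hudi_sink AS h ON s.id = h.id
    WHERE h.id IS NULL;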
Stacktrace
(two screenshots attached to the original issue)

Add the stacktrace of the error.
