
Initial Kafka Global Offsets in Hudi Kafka Sink Connector #15328

@hudi-bot

Description


Hi team,
I am trying to run the Hudi Sink Connector with Kafka Connect. When the connector starts, it starts the transaction coordinator, which initialises the global committed offsets from the Hudi commit file. On a first-time run there is no commit file, so it outputs:
[2022-08-08 19:58:20,529] INFO Hoodie Extra Metadata from latest commit is absent (org.apache.hudi.connect.writers.KafkaConnectTransactionServices:147)
But if, on that first run, the earliest Kafka offset is not 0, the process keeps cycling through commits on the timeline without making progress. Ideally, on the first run, the global offsets should be set to the earliest Kafka offset.
In the current implementation, the participant compares its local offset with the coordinator's offset and, when they do not match, resets it to 0. This breaks on a fresh run when the global Kafka committed offset is not 0.
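The offset handling described above can be modelled in a few lines. This is a minimal sketch, not Hudi's actual code: the function names `resolve_start_offset_current`, `resolve_start_offset_proposed` and `accepted_records` are hypothetical, and the model assumes the participant only writes a record whose Kafka offset equals the next offset it expects.

```python
from typing import List, Optional


def resolve_start_offset_current(coordinator_offset: Optional[int]) -> int:
    """Behaviour described in the issue: fall back to 0 when the coordinator
    has no global committed offset (e.g. no commit file on a first run)."""
    if coordinator_offset is not None and coordinator_offset >= 0:
        return coordinator_offset
    return 0


def resolve_start_offset_proposed(coordinator_offset: Optional[int],
                                  earliest_offset: int) -> int:
    """Proposed behaviour: on a fresh run, start from the earliest offset
    still available in the Kafka partition instead of 0."""
    if coordinator_offset is not None and coordinator_offset >= 0:
        return coordinator_offset
    return earliest_offset


def accepted_records(expected: int, available_offsets: List[int]) -> List[int]:
    """Hypothetical participant model: write records strictly in order and
    stop at the first record whose offset is not the expected one."""
    accepted = []
    for offset in available_offsets:
        if offset != expected:
            # Gap between the expected offset and the record: nothing is written.
            break
        accepted.append(offset)
        expected += 1
    return accepted


if __name__ == "__main__":
    # Assumed scenario: retention has removed offsets 0..1499 from the partition.
    log = list(range(1500, 1505))
    print(accepted_records(resolve_start_offset_current(None), log))        # []
    print(accepted_records(resolve_start_offset_proposed(None, 1500), log))  # [1500, 1501, 1502, 1503, 1504]
```

In this model, starting from 0 writes nothing because the first available record is at offset 1500, so each commit round ends empty; starting from the earliest available offset lets the records through.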
