[Bug][KafkaSource]Fix commit kafka offset bug. #3933

Merged 1 commit on Jan 16, 2023

Conversation

lightzhao
Contributor

After a checkpoint completes, the offset is committed to Kafka. The offset committed for each partition should be KafkaSourceSplit::getStartOffset. Otherwise, some offsets are not committed successfully, which results in repeated data consumption.
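
As a rough illustration of the flow described above (record offsets at snapshot time, commit them once the checkpoint completes), here is a minimal Java sketch. It is not the SeaTunnel code: the class `CheckpointOffsetSketch`, its methods, and the in-memory `committedOffsets` map standing in for Kafka are all hypothetical, and it assumes that a split's start offset means the next offset to read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the SeaTunnel implementation.
public class CheckpointOffsetSketch {
    // checkpointId -> (partition name -> offset to commit for that checkpoint)
    private final Map<Long, Map<String, Long>> pendingOffsets = new HashMap<>();
    // Stand-in for the offsets Kafka stores for the consumer group.
    private final Map<String, Long> committedOffsets = new HashMap<>();
    // partition name -> next offset to read (assumed meaning of "start offset")
    private final Map<String, Long> nextOffsets = new HashMap<>();

    // Called for every consumed record: the next read starts one past it.
    public void onRecord(String partition, long offset) {
        nextOffsets.put(partition, offset + 1);
    }

    // Called when a checkpoint is taken: freeze a copy of the current positions.
    public void snapshotState(long checkpointId) {
        pendingOffsets.put(checkpointId, new HashMap<>(nextOffsets));
    }

    // Called when the checkpoint completes: only now are the offsets committed.
    public void notifyCheckpointComplete(long checkpointId) {
        Map<String, Long> offsets = pendingOffsets.remove(checkpointId);
        if (offsets != null) {
            committedOffsets.putAll(offsets);
        }
    }

    // Returns the committed offset for a partition, or null if none was committed.
    public Long committed(String partition) {
        return committedOffsets.get(partition);
    }
}
```

Keeping the offsets keyed by checkpoint id means records consumed after the snapshot do not leak into the commit for that checkpoint.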

Purpose of this pull request

Check list

@lightzhao
Contributor Author

@Hisoka-X @TyrantLucifer PTAL.

@lightzhao
Contributor Author

kafka2hdfs test case:
HdfsSink: [image]
KafkaSource: [image] [image] [image]
After the checkpoint completes, the offset is not committed.

@Hisoka-X
Member

Hi, please check KafkaSourceReader::pollNext: the endOffset value is increased as messages are consumed from Kafka. This means endOffset is always the offset of the last message consumed from Kafka.
By the way, the offset not being committed to Kafka may have another cause, because whether we commit endOffset or startOffset, Kafka should accept the commit either way.

@Hisoka-X
Member

I will close this PR. If you have any other suggestions or problems, please reopen it. Thanks!

@lightzhao
Contributor Author

> By the way, the offset not being committed to Kafka may have another cause, because whether we commit endOffset or startOffset, Kafka should accept the commit either way.

@Hisoka-X Offsets start from 0, so the committed offset should be the last consumed offset + 1. Otherwise, the lag is always off by 1. I have verified this in the test environment. You can also try it.
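
The off-by-one can be checked with a little arithmetic. The sketch below assumes Kafka's usual convention that a committed offset names the next record to read, and that lag is the log end offset minus the committed offset. `OffsetLagSketch` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical sketch of the offset arithmetic, not code from this PR.
public class OffsetLagSketch {
    // Assumed convention: the committed offset is the offset of the next record to read.
    public static long offsetToCommit(long lastConsumedOffset) {
        return lastConsumedOffset + 1;
    }

    // Lag as usually reported for a consumer group: log end offset minus committed offset.
    public static long lag(long logEndOffset, long committedOffset) {
        return logEndOffset - committedOffset;
    }

    public static void main(String[] args) {
        long lastConsumed = 9;  // records 0..9 have all been consumed
        long logEndOffset = 10; // the next record written would get offset 10

        // Committing the last consumed offset leaves a phantom lag of 1.
        System.out.println(lag(logEndOffset, lastConsumed)); // prints 1
        // Committing last consumed + 1 reports the partition as fully caught up.
        System.out.println(lag(logEndOffset, offsetToCommit(lastConsumed))); // prints 0
    }
}
```

Committing the last consumed offset as-is would also cause that record to be read again after a restart, which matches the repeated consumption described in this PR.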

@lightzhao
Contributor Author

@Hisoka-X Hello, I don't seem to have permission to reopen this PR. Can you reopen it? My testing shows this really is a bug. You can test it too.

@Hisoka-X
Member

> > By the way, the offset not being committed to Kafka may have another cause, because whether we commit endOffset or startOffset, Kafka should accept the commit either way.
>
> @Hisoka-X Offsets start from 0, so the committed offset should be the last consumed offset + 1. Otherwise, the lag is always off by 1. I have verified this in the test environment. You can also try it.

I got your point.

Member

@Hisoka-X Hisoka-X left a comment

It is indeed a bug. Thank you for your unremitting efforts.

@EricJoy2048 EricJoy2048 merged commit e60ad93 into apache:dev Jan 16, 2023
harveyyue pushed a commit to harveyyue/seatunnel that referenced this pull request Feb 7, 2023
Co-authored-by: zhaoliang01 <zhaoliang01@58.com>
@lightzhao lightzhao deleted the kafka-snapshotState-offset-bug branch March 2, 2023 08:43
4 participants