Aborted exactly-once messages are still consumed #3020
Comments
Hi @edenhill,
I can see ABORT markers in the segment logs.
We experience the same issue. Our producer and consumer are golang applications using version 1.5.0 of https://github.com/confluentinc/confluent-kafka-go. We also tried to consume messages with the 1.5.2 version of the Go client, but the result was the same. Sometimes we can read aborted messages with our app, but kafka-console-consumer with param
@mhowlett Can you try to reproduce this?
Our case faces the same problem: consumption of uncommitted messages despite the isolation level set in the consumer configuration. In our scenario we use https://github.com/confluentinc/confluent-kafka-go v1.5.2. Our consumer is configured to consume only committed messages:
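(The exact configuration was trimmed above; purely as an illustration, a minimal confluent-kafka-go consumer restricted to committed messages could look like the sketch below. The broker address, group id, and topic name are placeholders, not values from the reporter's setup.)

```go
package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// isolation.level is read_committed by default in recent librdkafka
	// versions; it is set explicitly here for clarity.
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",      // placeholder
		"group.id":          "read-committed-test", // placeholder
		"auto.offset.reset": "earliest",
		"isolation.level":   "read_committed",
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	if err := c.SubscribeTopics([]string{"target-topic"}, nil); err != nil {
		panic(err)
	}

	for {
		msg, err := c.ReadMessage(-1) // negative timeout: wait indefinitely
		if err != nil {
			fmt.Println("consume error:", err)
			continue
		}
		// With read_committed, records from aborted transactions should
		// never show up here; the bug is that they sometimes do.
		fmt.Printf("%v: %s\n", msg.TopicPartition, string(msg.Value))
	}
}
```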
@gridaphobe and I were able to reproduce this issue with consumers using read_committed. To reproduce the issue via tests:
Yes, I can reproduce, and the problem doesn't occur in the uncompressed case. The code path for compressed message sets and uncompressed ones is quite different, and none of the tests for this enable compression. Will debug tomorrow.
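(For anyone wanting to exercise the compressed path without Flink, here is a rough sketch of a transactional confluent-kafka-go producer that writes a compressed batch and then aborts the transaction. The broker address, topic, and transactional.id are placeholders, and this is only an illustration of the idea, not the test that was actually used.)

```go
package main

import (
	"context"
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// Transactional producer with compression enabled so the aborted
	// records end up inside compressed message sets.
	p, err := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092", // placeholder
		"transactional.id":  "abort-repro",    // placeholder
		"compression.type":  "lz4",
	})
	if err != nil {
		panic(err)
	}
	defer p.Close()

	ctx := context.Background()
	if err := p.InitTransactions(ctx); err != nil {
		panic(err)
	}
	if err := p.BeginTransaction(); err != nil {
		panic(err)
	}

	topic := "target-topic" // placeholder
	for i := 0; i < 10; i++ {
		err := p.Produce(&kafka.Message{
			TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
			Value:          []byte(fmt.Sprintf("aborted-%d", i)),
		}, nil)
		if err != nil {
			panic(err)
		}
	}
	p.Flush(5000)

	// Aborting leaves the records in the log behind an ABORT control
	// marker; a read_committed consumer is expected to skip them.
	if err := p.AbortTransaction(ctx); err != nil {
		panic(err)
	}
}
```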
Description
I have a case where I use the JVM client (via Flink) to produce exactly-once transactionalized messages, but if the system is stopped and restarted, it aborts previously failed messages. The Kafka JVM consumer properly skips over these aborted messages, but librdkafka clients do not.
How to reproduce
I have put up a repository at https://github.com/cretz/kafkatxn-issue that has the steps in the README to reproduce. Considerable effort went into making it as easy as possible to reproduce (I was unable to reproduce easily w/ just the Kafka JVM producer alone, so it does use Flink, but I packaged it all to be easy to run). For easy reading, here are the contents of that README:
To replicate...
Start local Zookeeper in Kafka dir:
Start local Kafka in Kafka dir:
Run the system for 25 seconds, which writes to the first topic listed every second, then reads from it and writes to the other one with exactly-once semantics (committing every 10s):
This will forcefully kill the Kafka client, meaning the messages will be uncommitted. Now run it again for 25 seconds:
Now target-topic will have some aborted messages that should be skipped over. Run the Kafka consumer in Kafka dir with read_committed to only see the ones properly committed. Shows values like:
Now do the same w/ the built-in Go client that is also set to read_committed. Shows values like:
Notice how records between 2020-08-06T15:42:57.185Z and 2020-08-06T15:43:00.186Z are duplicated. Can dump the Kafka logs:
Full dump:
The abort marker is present for the keys, but I don't know enough about Kafka's log format to know why.
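(Purely as an illustration of how the difference could be checked programmatically rather than by eye, the sketch below counts the records a confluent-kafka-go consumer sees under each isolation level. The broker address, group ids, topic name, and timeout are assumptions, not part of the reproducer.)

```go
package main

import (
	"fmt"
	"time"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

// countMessages reads target-topic from the beginning with the given
// isolation level and returns how many records were delivered before
// the first read timeout (a crude stand-in for "end of topic").
func countMessages(isolation string) (int, error) {
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",                 // placeholder
		"group.id":          "isolation-compare-" + isolation, // placeholder
		"auto.offset.reset": "earliest",
		"isolation.level":   isolation,
	})
	if err != nil {
		return 0, err
	}
	defer c.Close()

	if err := c.SubscribeTopics([]string{"target-topic"}, nil); err != nil {
		return 0, err
	}

	count := 0
	for {
		if _, err := c.ReadMessage(5 * time.Second); err != nil {
			break // treat the first timeout (or error) as end of topic
		}
		count++
	}
	return count, nil
}

func main() {
	committed, err := countMessages("read_committed")
	if err != nil {
		panic(err)
	}
	uncommitted, err := countMessages("read_uncommitted")
	if err != nil {
		panic(err)
	}
	// If transactions were aborted, read_committed should report fewer
	// records than read_uncommitted; equal counts reproduce the bug.
	fmt.Printf("read_committed=%d read_uncommitted=%d\n", committed, uncommitted)
}
```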
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
librdkafka version: v1.4.2 (in the reproducer via confluent-kafka-go, but have confirmed the issue persists with v1.5.0 and in other librdkafka clients)
Apache Kafka version: 2.5.0
Operating system: Ubuntu 18.04
Let me know if there is more information I can provide.