Kafka: Play nice with transactional topics #5404
I have looked into this and found that the issue is not really the transactional world (transactions are part of separate topics) but, as pointed out by @gianm, the optional log compaction that can happen to the Kafka log. Per the docs, Kafka brokers can compact records with the same key, so the Kafka log can have offset gaps. The log cleaner is on by default starting from Kafka 0.9.0.1, though compaction only applies to topics that opt in to it.
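For reference, compaction is a per-topic setting; a minimal sketch of the relevant knobs (values illustrative, shown as a topic-level config fragment):

```properties
# Keep only the newest record per key. The broker's log cleaner
# (log.cleaner.enable) must be on for this to take effect; compacted
# topics are where consumed offsets can show gaps.
cleanup.policy=compact
# Illustrative tuning knob: fraction of "dirty" (uncompacted) log
# before the cleaner considers a partition for compaction.
min.cleanable.dirty.ratio=0.5
```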
Docs can be updated if you are sure the problem in discussion is because of log compaction. I think some form of logging should be there indicating that offsets are being skipped: maybe just count the number of skipped offsets, log it every X seconds, and switch the current log line to debug level if it logs too much.
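The rate-limited logging suggested above could look roughly like this. This is a hypothetical sketch, not Druid's code; the class and method names are invented for illustration:

```java
// Counts skipped offsets and emits a summary at most once per interval,
// instead of logging every individual gap at INFO level.
class OffsetGapReporter {
    private final long reportIntervalMillis;
    private long skippedSinceLastReport = 0;
    private long lastReportTime;
    private long expectedNextOffset = -1; // -1 means no baseline yet

    OffsetGapReporter(long reportIntervalMillis, long nowMillis) {
        this.reportIntervalMillis = reportIntervalMillis;
        this.lastReportTime = nowMillis;
    }

    /** Returns a summary message when it is time to report, else null. */
    String onRecord(long offset, long nowMillis) {
        if (expectedNextOffset >= 0 && offset > expectedNextOffset) {
            skippedSinceLastReport += offset - expectedNextOffset;
        }
        expectedNextOffset = offset + 1;
        if (skippedSinceLastReport > 0
                && nowMillis - lastReportTime >= reportIntervalMillis) {
            String msg = "Skipped " + skippedSinceLastReport
                         + " offsets since last report";
            skippedSinceLastReport = 0;
            lastReportTime = nowMillis;
            return msg;
        }
        return null;
    }
}
```

The caller would feed each consumed record's offset plus a clock reading, and log whatever non-null summaries come back.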
@b-slim are you sure the issue reported in the thread is because of compaction? I don't see a reason why the user's topic would have been compacted. Also, he said that when he turned off transactionality, the issues stopped. It seems more likely to me that Kafka is playing some games and not returning messages for every single offset (e.g. when those messages are part of failed transactions). That would make sense to me if the offset is assigned optimistically before the transaction commits.
@gianm I thought Druid was using the old consumer client, which reads both committed and uncommitted messages, so I'm not sure why the consumer would not see a sequential order of offsets. Also, as I understand it, commits are stored at the brokers, and I don't think they get written to the log itself. The original report says "turned on exactly once"; I'm not sure what that means in practice, since I don't think there is such a flag. Anyway, this seems like a black swan...
Hmm, I don't know enough about how Kafka transactions work to know what to do. Is anyone listening to the thread who has enough Kafka expertise to know for sure whether we should expect offset gaps in transactional topics? Fwiw, we are using the new consumer API, kafka-clients version 0.10.2.0.
Transactions were introduced in 0.11: https://issues.apache.org/jira/browse/KAFKA-4815.
I added more detail to the Google Groups topic. In particular, see Kafka KIP-98 for transactions, section "5.2 WriteTxnMarkerRequest". It's also mentioned in https://www.confluent.io/blog/transactions-apache-kafka/ (search for "transaction commit markers").
Hi @ebolwidt, thanks for the research! Are you interested in contributing a test case (one that fails today) along with a patch to upgrade us to Kafka 0.11? If it turns out that the upgrade is all we need, that's great and it sounds like we should just do it.
I'm interested but time-constrained. I'll look into it at home, since my work infrastructure doesn't allow me to run the Docker-based integration test suite.
@ebolwidt @gianm @b-slim with transactional producers you can no longer expect data message offsets to be sequential. Control messages (used for transaction markers) take up their own offsets. Applications, however, never see those offsets, because they either get filtered out on the broker (when down-converting for older clients) or on the client (during transaction processing). Switching to a >0.11 consumer and enabling read_committed should address this; it should be safe to use read_uncommitted as well, since control messages are filtered out in either mode.
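For concreteness, the consumer-side switch being discussed is a single setting (a consumer properties fragment, assuming kafka-clients >= 0.11):

```properties
# read_committed: return only records from committed transactions and
# filter out records from aborted ones. Transaction control markers are
# never returned to the application in either mode, so their offsets
# appear as gaps in the consumed offset sequence.
isolation.level=read_committed
# The default is read_uncommitted, which returns all non-control records.
```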
@xvrl thanks, good to know. IMO it sounds reasonable to trust Kafka more and remove our sanity check if there is no way to figure out which offsets are taken up by control messages.
@jihoonson One thing we can still check for, though, is that offsets haven't moved from where we expect them to be if, for whatever reason, we have to re-create the consumer in the same consumer group (e.g. on task restart). This would avoid losing data in situations where offsets were reset for some reason.
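A minimal sketch of that re-creation check. The names here are hypothetical, not Druid's; in practice the current position would come from `KafkaConsumer.position(partition)` and the recorded offset from the task's persisted state:

```java
// Sanity check for consumer re-creation: after subscribing again in the
// same consumer group, the reported position should be exactly the offset
// the task last recorded. Any movement (e.g. offsets reset to earliest or
// latest) risks re-reading or silently dropping data.
class OffsetSanityCheck {
    /**
     * @param recordedOffset   offset the task persisted for this partition
     * @param consumerPosition position reported after re-creating the consumer
     * @return true if it is safe to continue reading from here
     */
    static boolean isConsistent(long recordedOffset, long consumerPosition) {
        return consumerPosition == recordedOffset;
    }
}
```

On a mismatch, the task could fail fast (or re-seek to the recorded offset) instead of ingesting from an unexpected position.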
Looking forward to #5674. We are having the same issue as in the original Google Groups topic, where indexing tasks get stuck when seeking to a transaction-marker offset.
* Support kafka transactional topics
* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests
* Remove unused import from test
* Fix compilation
* Invoke transaction api to fix a unit test
* temporary modification of travis.yml for debugging
* another attempt to get travis tasklogs
* update kafka to 2.0.1 at all places
* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes
* Add deprecated in docs for kafka-eight and kafka-simple extensions
* Remove skipOffsetGaps and code changes for transaction support
* Fix indentation
* remove skipOffsetGaps from kinesis
* Add transaction api to KafkaRecordSupplierTest
* Fix indent
* Fix test
* update kafka version to 2.1.0
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time. |
See https://groups.google.com/d/topic/druid-user/QNgVy7IDsgA/discussion for issues reported with transactional topics. It sounds like the "there will be no message gaps" assumption is not valid in the transactional world. So we should at least look at that, and also make sure that in general we are calling the consumer properly.
The message gap check was a nice sanity check on the Kafka consumer and I will be sad to see it go. I guess we'll have to be more trusting of Kafka now :)