[FLINK-31363] Do not checkpoint a KafkaCommittable if the transaction was empty #15
Conversation
Commits:
- …ucer: This new flag tracks whether or not data was actually written to the current transaction
- …as written to the transaction
- …nsaction: Tests that if the KafkaWriter is asked to prepareCommit but its current transaction is empty, it should: NOT emit a KafkaCommittable, immediately commit the empty transaction, and recycle the producer
Thanks for opening this pull request! Please check out our contributing guidelines: https://flink.apache.org/contributing/how-to-contribute.html

cc @RamanVerma @Gerrrr for review
```java
resumedProducer.resumeTransaction(
        snapshottedCommittable.getProducerId(), snapshottedCommittable.getEpoch());

try {
    resumedProducer.commitTransaction();
```
You can use `assertThatThrownBy` here.
```java
@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    if (inTransaction) {
        hasRecordsInTransaction = true;
```
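For context, the intended lifecycle of the flag can be sketched as a minimal, self-contained class (hypothetical names, not the actual `FlinkKafkaInternalProducer`): the flag is set preemptively on `send()` while a transaction is open, and reset when a new transaction begins.

```java
// Minimal sketch (NOT the real FlinkKafkaInternalProducer): illustrates the
// assumed lifecycle of the hasRecordsInTransaction flag discussed below.
class SketchProducer {
    private boolean inTransaction = false;
    private boolean hasRecordsInTransaction = false;

    void beginTransaction() {
        inTransaction = true;
        hasRecordsInTransaction = false; // a fresh transaction starts out empty
    }

    void send(String record) {
        if (inTransaction) {
            // Set preemptively at send time, before the broker acks the record.
            hasRecordsInTransaction = true;
        }
    }

    boolean hasRecordsInTransaction() {
        return hasRecordsInTransaction;
    }
}
```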
Should this boolean be set in the callback, in the successful send scenario?
hmm, that's a good point. I think the question to ask is: is it incorrect to set this flag (to allow a `KafkaCommittable` to be generated for the txn at pre-commit time) preemptively, instead of only setting it when data has actually been successfully written?

I think the answer is that it is not incorrect, so it is ok to leave this as is. Reasoning is as follows:

- At pre-commit time, when performing the flush, if some data failed to be flushed, the pre-commit will fail, so a `KafkaCommittable` will not be checkpointed for the txn anyway. In this scenario, the `hasRecordsInTransaction` flag is irrelevant no matter its value.
- If all records are correctly flushed, then good; a `KafkaCommittable` should be generated for the txn. We're good here because we've already preemptively set the `hasRecordsInTransaction` flag.
Speaking of which, it might give a more complete picture of this interaction after I rebase this PR branch on top of the latest changes (to include the fix that adds `checkAsyncExceptions`).
```java
resumedProducer.resumeTransaction(
        snapshottedCommittable.getProducerId(), snapshottedCommittable.getEpoch());

assertThatThrownBy(resumedProducer::commitTransaction);
```
nit: Can we also check the exception type, using `isInstanceOf(<exception type>.class)`?
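For illustration, the AssertJ chain `assertThatThrownBy(action).isInstanceOf(Expected.class)` verifies two things: that the action throws, and that the throwable has the expected type. A plain-Java sketch of that same check (the helper class and lambda are hypothetical stand-ins; in the actual test the action would be `resumedProducer::commitTransaction`):

```java
// Plain-Java equivalent of what
// assertThatThrownBy(action).isInstanceOf(Expected.class) checks.
// "ThrowingAssert" is a hypothetical helper, not part of AssertJ or Flink.
class ThrowingAssert {
    static void assertThrownIsInstanceOf(Runnable action, Class<? extends Throwable> expected) {
        Throwable thrown = null;
        try {
            action.run();
        } catch (Throwable t) {
            thrown = t;
        }
        if (thrown == null) {
            // assertThatThrownBy fails if nothing is thrown at all
            throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
        }
        if (!expected.isInstance(thrown)) {
            // isInstanceOf additionally pins down the exception type
            throw new AssertionError(
                    "Expected " + expected.getName() + " but got " + thrown.getClass().getName());
        }
    }
}
```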
thanks for the PR, @tzulitai
LGTM
thanks for the review @RamanVerma, merging this now!
…rnalProducer This closes #15.
…rnalProducer This closes apache#15.
This PR fixes FLINK-31363 by changing how we handle empty transactions on `KafkaWriter#prepareCommit()`.

Previously, regardless of whether the current transaction is empty or non-empty, we always emit a `KafkaCommittable` for it to be checkpointed by the `CommitterOperator`. The issue: on restore, when we resume the transaction and commit it, we recreate a `FlinkKafkaInternalProducer` that always has the internal `transactionStarted` flag set to `true`, which means that an `EndTxnRequest` will be sent to the brokers for committing the transaction. This results in an `InvalidTxnState` error, since on the broker side the transaction hasn't actually been started yet (transactions are lazily started on brokers on the first record sent).

I've considered two possible ways to address this:
1. Checkpoint the `transactionStarted` flag in a `KafkaCommittable` alongside other txn metadata. Then, on restore, we set the internal `transactionStarted` flag on the recreated producer according to what the checkpoint says.
2. Do not emit a `KafkaCommittable` if the transaction is empty. In this case, any `KafkaCommittable` restored from a checkpoint always has some data in it, and therefore it is correct to always set the internal `transactionStarted` flag to `true` on the recreated producer.

This PR chooses to go with approach 2.
On `prepareCommit`, if the current ongoing transaction is empty, then:

- Do NOT emit a `KafkaCommittable` (which would be checkpointed by the downstream `CommitterOperator`)
- Immediately commit the empty transaction (which does not send an `EndTxnRequest` at all)
- Recycle the producer
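The branch described above can be sketched as follows. This is a simplified, self-contained illustration with stand-in types (`ProducerSketch`, `Committable`, `KafkaWriterSketch` are all hypothetical), not the actual `KafkaWriter` code:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Stand-in types; only the prepareCommit control flow is of interest here.
class Committable { }

class ProducerSketch {
    boolean hasRecordsInTransaction;
    boolean committed;
    boolean recycled;

    void commitTransaction() {
        committed = true;
    }
}

class KafkaWriterSketch {
    Collection<Committable> prepareCommit(ProducerSketch producer) {
        if (producer.hasRecordsInTransaction) {
            // Non-empty transaction: emit a committable so the downstream
            // CommitterOperator checkpoints it and later commits the txn.
            return List.of(new Committable());
        }
        // Empty transaction: nothing to checkpoint. Commit it immediately
        // (effectively a no-op, no EndTxnRequest is sent) and recycle the producer.
        producer.commitTransaction();
        producer.recycled = true;
        return Collections.emptyList();
    }
}
```

With this shape, a restored `KafkaCommittable` always corresponds to a transaction that had data, so setting `transactionStarted = true` on the recreated producer is always correct.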