
[FLINK-31363] Do not checkpoint a KafkaCommittable if the transaction was empty #15

Closed
wants to merge 5 commits into from

Conversation


@tzulitai tzulitai commented Mar 23, 2023

This PR fixes FLINK-31363 by changing how we handle empty transactions on KafkaWriter#prepareCommit().

Previously, regardless of whether the current transaction is empty or non-empty, we always emit a KafkaCommittable for it to be checkpointed by the CommitterOperator. The issue: on restore, when we resume the transaction and commit it, we recreate a FlinkKafkaInternalProducer that always has the internal transactionStarted flag set to true, which means that an EndTxnRequest will be sent to the brokers for committing the transaction. This results in an InvalidTxnState error since on the broker side the transaction hasn't actually been started yet (transactions are lazily started on brokers on the first record sent).
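
The failure mode described above can be modeled with a small self-contained sketch (illustrative only, not actual connector code; the broker's lazy transaction start is reduced to a single boolean):

```java
// Models the broken restore path: the recreated producer always believes the
// transaction was started, but the broker only starts a transaction lazily on
// the first record sent. Committing an empty resumed transaction therefore
// sends an EndTxnRequest for a transaction the broker never saw.
class RestoreBugSketch {
    static class Broker {
        boolean txnStartedOnBroker; // set lazily, on the first record sent

        void endTxn() {
            if (!txnStartedOnBroker) {
                throw new IllegalStateException("InvalidTxnState");
            }
        }
    }

    static class ResumedProducer {
        final Broker broker;
        final boolean transactionStarted = true; // always true after restore

        ResumedProducer(Broker broker) {
            this.broker = broker;
        }

        void commitTransaction() {
            if (transactionStarted) {
                broker.endTxn(); // throws if the transaction was empty
            }
        }
    }
}
```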

I've considered two possible ways to address this:

  1. Store the transactionStarted flag in a KafkaCommittable alongside other txn metadata. Then, on restore, on the recreated producer, we set the internal transactionStarted accordingly to what the checkpoint says.
  2. Never checkpoint a KafkaCommittable if the transaction is empty. In this case, any KafkaCommittable restored from a checkpoint always corresponds to a transaction with data in it, and therefore it is correct to always set the internal transactionStarted flag to true on the recreated producer.

This PR chooses to go with approach 2.

On prepareCommit, if the current ongoing transaction is empty, then:

  1. Do NOT emit a KafkaCommittable (which will be checkpointed by the downstream CommitterOperator)
  2. Immediately commit the empty transaction (this will be a no-op since the producer would not issue an EndTxnRequest at all)
  3. Recycle the producer back into the idle pool for future reuse.
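
The three steps above can be sketched with plain Java and no Kafka dependency (a rough sketch, not the actual KafkaWriter source; the hasRecordsInTransaction flag and the pool-recycling behavior are taken from the PR description, everything else is illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Self-contained sketch of the prepareCommit() decision described above.
// "Producer" stands in for FlinkKafkaInternalProducer; only the flags
// relevant to FLINK-31363 are modeled.
class EmptyTxnSketch {
    static class Producer {
        boolean hasRecordsInTransaction;
        boolean committed;

        void commitTransaction() {
            // For an empty transaction this is effectively a no-op on the
            // broker side: no EndTxnRequest needs to be issued because no
            // record ever started the transaction there.
            committed = true;
            hasRecordsInTransaction = false;
        }
    }

    final Deque<Producer> idlePool = new ArrayDeque<>();

    // Returns a committable (modeled here as the producer itself) only when
    // the current transaction actually wrote data; otherwise commits the
    // empty transaction immediately and recycles the producer.
    Optional<Producer> prepareCommit(Producer current) {
        if (current.hasRecordsInTransaction) {
            return Optional.of(current); // checkpointed by the CommitterOperator
        }
        current.commitTransaction();
        idlePool.add(current);
        return Optional.empty();
    }
}
```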

…ucer

This new flag tracks whether or not data was actually written to the current transaction
…nsaction

Tests that if the KafkaWriter is asked to prepareCommit but its current transaction is empty,
it should:
- NOT emit a KafkaCommittable
- Immediately commit the empty transaction, and
- Recycle the producer

boring-cyborg bot commented Mar 23, 2023

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

@tzulitai

cc @RamanVerma @Gerrrr for review

resumedProducer.resumeTransaction(
        snapshottedCommittable.getProducerId(), snapshottedCommittable.getEpoch());

try {
    resumedProducer.commitTransaction();


You can use assertThatThrownBy here

@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    if (inTransaction) {
        hasRecordsInTransaction = true;
    }
    return super.send(record, callback);
}


Should this boolean be set in the callback, in the successful send scenario?


@tzulitai tzulitai Mar 28, 2023


hmm, that's a good point. I think the question to ask is: is it incorrect to set this flag (to allow a KafkaCommittable to be generated for the txn at pre-commit time) preemptively, instead of only setting it when data has actually been successfully written?

I think the answer is that it is not incorrect, so it is ok to leave this as is. Reasoning is as follows:

  • At pre-commit time, when performing the flush, if some data failed to be flushed, the pre-commit fails and a KafkaCommittable will not be checkpointed for the txn anyway. In this scenario, the hasRecordsInTransaction flag is irrelevant no matter its value.

  • If all records are correctly flushed, then good; a KafkaCommittable should be generated for the txn. We're good here because we've already preemptively set the hasRecordsInTransaction flag.
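
The two cases above can be made concrete with a small sketch (illustrative only; PreemptiveFlagSketch and pendingSendFailed are hypothetical names, not connector code):

```java
import java.util.Optional;

// Sketch of why setting hasRecordsInTransaction at send() time, before the
// broker acknowledges the record, is safe: a failed flush aborts pre-commit
// before any committable is built, so the flag's value never matters in the
// failure case.
class PreemptiveFlagSketch {
    boolean hasRecordsInTransaction;
    boolean pendingSendFailed; // models an async send error surfaced at flush

    void send(Object record) {
        hasRecordsInTransaction = true; // set preemptively, before the ack
        // ... record is handed to the producer asynchronously ...
    }

    // Models prepareCommit(): flush first, then decide on a committable.
    Optional<String> prepareCommit() {
        if (pendingSendFailed) {
            // Flush surfaces the async failure; pre-commit fails and no
            // committable is checkpointed for this transaction.
            throw new IllegalStateException("flush failed");
        }
        return hasRecordsInTransaction
                ? Optional.of("committable")
                : Optional.empty();
    }
}
```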

tzulitai (Contributor Author)


Speaking of which, rebasing this PR branch on top of the latest changes (to include the fix that adds checkAsyncExceptions) might give a more complete picture of this interaction.

resumedProducer.resumeTransaction(
snapshottedCommittable.getProducerId(), snapshottedCommittable.getEpoch());

assertThatThrownBy(resumedProducer::commitTransaction);


nit: Can we also check the exception type, using isInstanceOf(&lt;exception type&gt;.class)?


@RamanVerma RamanVerma left a comment


thanks for the PR, @tzulitai
LGTM

@tzulitai

thanks for the review @RamanVerma, merging this now!

@tzulitai tzulitai closed this in 150eaf8 Mar 29, 2023
tzulitai added a commit that referenced this pull request Apr 4, 2023
tzulitai added a commit that referenced this pull request Apr 4, 2023
mas-chen pushed a commit to mas-chen/flink-connector-kafka that referenced this pull request Apr 25, 2023