KAFKA-5429: Ignore produce response if batch was previously aborted #3300
private boolean retry;

private enum FinalState { ABORTED, FAILED, SUCCEEDED };
private AtomicReference<FinalState> finalState = new AtomicReference<>(null);
`final` missing.
if (!this.finalState.compareAndSet(null, finalState)) {
    if (this.finalState.get() == FinalState.ABORTED) {
        log.debug("ProduceResponse returned for {} after batch had already been aborted.", topicPartition);
        return;
So, it's expected that we may call `done` (successful or failed) after aborting and we should just ignore the response, in that case?
Yeah, not much else we can do since we've already signaled the future and invoked callbacks. I think it would be justifiable to elevate the log level to info if that helps.
I'd only elevate the log level if there's user-visible impact. Presumably this happens with transactions, and the produced data won't be committed?
Yes, that's right. Another case might be on producer shutdown, but that might be much harder to hit.
The log level is a tricky one. In the producer, messages for errors are all at debug level (for instance when we transition to error state in the transaction manager). So having this higher than debug may not add much value.
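For readers following this thread, here is a minimal sketch of the guard being discussed: the first terminal transition wins, and a produce response that arrives after an abort is dropped. `SketchBatch`, its `completions` counter, and the choice to throw on any other double completion are assumptions made for illustration; this is not the actual producer code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the batch discussed above; not the real Kafka class.
class SketchBatch {
    enum FinalState { ABORTED, FAILED, SUCCEEDED }

    private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);
    // Assumption for illustration: counts how often the future/callbacks would be signaled.
    private final AtomicInteger completions = new AtomicInteger();

    /** Abort the batch. Only the first terminal transition signals completion. */
    void abort() {
        if (!finalState.compareAndSet(null, FinalState.ABORTED))
            throw new IllegalStateException("Batch already completed in state " + finalState.get());
        completions.incrementAndGet();
    }

    /** Handle a produce response. A response arriving after an abort is dropped. */
    void done(boolean success) {
        FinalState target = success ? FinalState.SUCCEEDED : FinalState.FAILED;
        if (!finalState.compareAndSet(null, target)) {
            if (finalState.get() == FinalState.ABORTED)
                return; // already signaled by abort(); nothing more to do
            // Assumption: any other double completion is treated as a bug.
            throw new IllegalStateException("Batch already completed in state " + finalState.get());
        }
        completions.incrementAndGet();
    }

    FinalState state() { return finalState.get(); }

    int completions() { return completions.get(); }
}
```

Calling `abort()` and then `done(true)` on the same `SketchBatch` leaves the state at `ABORTED` and signals completion only once, which is the behavior the reviewers agree on above.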
@@ -366,7 +384,7 @@ public void close() {
         }
     }
 
-    public void abort() {
+    public void abortRecordAppends() {
It may be worth adding a comment in one of the methods explaining how `abort` and `abortRecordAppends` differ, and adding a reference to it in the other. It seems like the reason we have two separate methods is that we do one of them with the lock held and the other with no lock held (as it invokes callbacks).
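A rough sketch of the split described in this comment, assuming the intent is "close the batch to further appends while holding the lock, then run callbacks after releasing it". `SketchAccumulator`, `tryAppend`, and `abortBatch` are hypothetical names, and the real accumulator is structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and structure are not taken from Kafka.
class SketchAccumulator {
    private final Object lock = new Object();
    private final List<Runnable> callbacks = new ArrayList<>();
    private boolean appendsClosed = false;

    /** Register a record's callback; fails once appends have been aborted. */
    boolean tryAppend(Runnable callback) {
        synchronized (lock) {
            if (appendsClosed)
                return false;
            callbacks.add(callback);
            return true;
        }
    }

    void abortBatch() {
        List<Runnable> toInvoke;
        synchronized (lock) {
            // Step 1 (lock held): stop further appends, in the spirit of abortRecordAppends().
            appendsClosed = true;
            toInvoke = new ArrayList<>(callbacks);
            callbacks.clear();
        }
        // Step 2 (no lock held): run user callbacks, in the spirit of abort(exception).
        // User code can block or call back into the accumulator, so the lock is released first.
        for (Runnable callback : toInvoke)
            callback.run();
    }
}
```

Keeping user callbacks outside the synchronized block means a slow callback cannot stall other threads that are waiting to append.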
LGTM!
Thanks for the updates, a few comments/questions.
private boolean retry;

private enum FinalState { ABORTED, FAILED, SUCCEEDED };
private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);
Nit: should this be with other final fields?
 * @param exception The exception to use to complete the future and awaiting callbacks.
 */
public void abort(RuntimeException exception) {
    abortRecordAppends();
Is this safe? It seems like we should document `abort` and `done` as thread-safe and hence we should not invoke `abortRecordAppends`, which is not thread-safe. Or am I missing something?
Ack. The code seemed weird without it, but it's not actually needed.
KafkaException exception = new KafkaException();
batch.abort(exception);
Should we check `future.isDone` here too?
@@ -55,6 +59,67 @@ public void testChecksumNullForMagicV2() {
    }

    @Test
    public void testBatchAbort() throws Exception {
Probably need one like this but where we call `done` with an exception.
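The kind of check being asked for might look roughly like the sketch below: abort first, then deliver a failed response, and confirm the future still carries the abort exception. It uses a `CompletableFuture` and the hypothetical classes `FutureBatchSketch` and `AbortThenFailedDone` instead of the real batch and test classes, so treat it as an outline of the assertion and not the test that was added.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical batch with a future; the real producer uses its own future type.
class FutureBatchSketch {
    private enum FinalState { ABORTED, FAILED, SUCCEEDED }

    private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);
    final CompletableFuture<Long> future = new CompletableFuture<>();

    void abort(RuntimeException exception) {
        if (!finalState.compareAndSet(null, FinalState.ABORTED))
            throw new IllegalStateException("Batch already completed in state " + finalState.get());
        future.completeExceptionally(exception);
    }

    void done(long baseOffset, RuntimeException exception) {
        FinalState target = exception == null ? FinalState.SUCCEEDED : FinalState.FAILED;
        if (!finalState.compareAndSet(null, target)) {
            if (finalState.get() == FinalState.ABORTED)
                return; // late response after abort: ignore it
            throw new IllegalStateException("Batch already completed in state " + finalState.get());
        }
        if (exception == null)
            future.complete(baseOffset);
        else
            future.completeExceptionally(exception);
    }
}

class AbortThenFailedDone {
    /** Returns the exception the future ends up holding after abort followed by a failed done. */
    static Throwable run() {
        FutureBatchSketch batch = new FutureBatchSketch();
        batch.abort(new RuntimeException("aborted"));
        if (!batch.future.isDone())
            throw new AssertionError("future should be done right after abort");

        // A failed response arriving afterwards must not replace the abort exception.
        batch.done(-1L, new RuntimeException("late failure"));
        try {
            batch.future.join();
            throw new AssertionError("future should have completed exceptionally");
        } catch (CompletionException e) {
            return e.getCause();
        }
    }
}
```

`AbortThenFailedDone.run()` returns the exception passed to `abort`, which also covers the `future.isDone` check raised earlier in the review.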
Thanks for the updates, LGTM.
Merging to trunk and 0.11.0.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes #3300 from hachikuji/KAFKA-5429
(cherry picked from commit 6c92fc5)
Signed-off-by: Jason Gustafson <jason@confluent.io>