Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-16375: Fix for rejoin while reconciling #15579
KAFKA-16375: Fix for rejoin while reconciling #15579
Changes from 3 commits
11ba7f0
bd2619f
415092d
6faba89
40f3489
6bacaaa
278521d
f8d1bc9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to actually fail the future here? We will repeat the exception with
Reconciliation failed
at ERROR level 5 lines below. It would be good to separate the "error" path from the "aborted, but that's fine" path.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhm good point, I did consider it but wasn't fully convinced at that time (mostly taking into account that this abort path is not only in the case of rejoin while reconciling, but also in the case of fatal failures while reconciling, for instance). But I do like you point of view about the 2 paths, seeing from the POV that even in the failure scenario, the logging/error handling should be done on that fatal transition path, and the reconciliation itself could only be responsible for aborting quietly. Changed, take a look and let me know your thoughts (note that the tradeoff is that it requires another check before sending the ack, to make sure that the reconciliation hasn't been already aborted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that is at most a
warn
log, possiblyinfo
, right? Because entering this code path is completely fine and expected, and should be handled correctly by the consumer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree,
info
lgtm. Done. Just for the record, this could be the case of a fatal failure in HB while reconciling, but still, all the error handling/logging is the responsibility of the fatal error path, so seems sensible that from the reconciliation POV we end up with just the info saying that it was interrupted because the member transitioned out of the reconciling state into fatal state (which is what we would get in that case)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code path where reconciliation is aborted after assignment is completed,
markReconciliationCompleted
will be called twice. Can we avoid it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to mark the stale reconciliation or the new reconciliation (or both) as completed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stale one. We only have a single
reconciliationInProgress
at a time, and that's the one that is aborted, so it will always be the stale one. To complete the story, we keep the new target to reconcile and on the next poll the new reconciliation will be triggered. All tests related to this case end up with the assertInitialReconciliationDiscardedAfterRejoin that validates that the new assignment received after rejoining is kept as target even after the stale reconciliation is discarded. I just extended it to show how the next poll triggers the new reconciliation.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it possible to have a reconciliation with callbacks (going to the user space) which could return to the background thread when a new reconciliation triggered after the rejoin is ongoing? Do we have a unit test for this case too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We couldn't have that case of "new reconciliation triggered while another one executing callbacks" I expect. If there is a reconciliation that goes into the user space to run callbacks, the background still has the flag
reconciliationInProgress
true, so even in the case that the member gets a new assignment from the broker, all it does is to update thecurrentTargetAssignment
. On every poll, the attempt to trigger a new reconciliation for that new target received will be ignored (here) until the ongoing reconciliation completes. Makes sense?This test shows this path (using commit though), but basically showing one reconciliation triggered at a time, keeping target updated as it's received/discovered, and next reconciliation triggered when the initial one completes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if we get rid of the exception (see comment above), we may want to just inline this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done