New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-15202: Fix MM2 offset translation when syncs are variably spaced #14156
KAFKA-15202: Fix MM2 offset translation when syncs are variably spaced #14156
Conversation
Signed-off-by: Greg Harris <greg.harris@aiven.io>
hey @gharris1727 , can you please guide me how can I start contributing to this project as I am new to open source contribution but I have a strong knowledge of Java. Thank you. |
Hi @monish-byte ! Thanks for considering contributing to Apache Kafka. You can find the contributing guide here: https://kafka.apache.org/contributing.html I look forward to seeing you on the mailing list! |
hey @gharris1727 I have joined the mailing list. I am eager to know what should I do next. |
@monish-byte This specific PR thread isn't the right place to discuss this, so please in the future post your general questions on the dev mailing list. Your next steps should be:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Greg, the description in the PR and Jira ticket, along with comments in the code changes, all really helped to make this easier to review. I've left some minor suggestions but none that are blocker-worthy; LGTM!
@@ -227,7 +227,7 @@ private void clearSyncArray(OffsetSync[] syncs, OffsetSync offsetSync) { | |||
} | |||
} | |||
|
|||
private void updateSyncArray(OffsetSync[] syncs, OffsetSync offsetSync) { | |||
private void updateSyncArray(OffsetSync[] syncs, OffsetSync[] original, OffsetSync offsetSync) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: feels a little strange that we're passing in a to-be-mutated sync array. Any reason not to alter updateSyncArray
to only take in the original sync array and the new sync, and construct and return the new sync array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is already the signature of the caller, so this would amount to inlining this function into updateExistingSyncs
to combine the two.
I think that this function is already large enough, and doesn't also need to be concerned with copying the array or logging the result. I also think it makes batching updates simpler: The caller can manage the lifetimes of the arrays and their mutability, while the inner function can be concerned with just applying the updates.
Also this code was doing that already, here and in the clearSyncArray
function.
connect/mirror/src/test/java/org/apache/kafka/connect/mirror/OffsetSyncStoreTest.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Signed-off-by: Greg Harris <greg.harris@aiven.io>
@gharris1727 Thanks for your work on this! Hopefully it will help KAFKA-15197 too! |
I saw one of the CI builds with testOffsetTranslationBehindReplicationFlow failing, so I think there is still another source of flakiness somewhere, probably #13911. I'll close the fix JIRA but leave the flaky JIRA open. |
The latest CI run has some unrelated flakiness, and tests pass locally. |
#14156) Reviewers: Chris Egerton <chrise@aiven.io>
#14156) Reviewers: Chris Egerton <chrise@aiven.io>
#14156) Reviewers: Chris Egerton <chrise@aiven.io>
apache#14156) Reviewers: Chris Egerton <chrise@aiven.io>
apache#14156) Reviewers: Chris Egerton <chrise@aiven.io>
apache#14156) Reviewers: Chris Egerton <chrise@aiven.io>
The new OffsetSyncStore historical translation cache clears more syncs than necessary when the gap between syncs is variable. This is a problem with the replacement promotion logic, which only covered the base case when promoting an one index to the immediately following index.
This has the effect that if a sync fails to satisfy an invariant for the following index, then it is discarded immediately, even if the value would satisfy the invariants at a different index. In particular, invariant B which enforces a maximum distance between two syncs gets more permissive as the index in the array increases, so a sync which does not satisfy invariant B at index 1 may satisfy it at index j > 1.
Instead of the greedy discarding algorithm, the replacement promotion logic should keep a separate index into the potential replacements from the original array, and delay discarding a sync until it can be determined that the sync is not valid at any place in the array. In particular, syncs are certainly not worth keeping if they are duplicates of other syncs, or if they fail invariant C. Invariant C becomes more strict as the index in the array increases, so a sync which does not satisfy invariant C at index 1 will certainly never satisfy it at index j > 1.
In order to verify the changes, new tests which use Random to generate gaps between syncs, and generalize the test for maintaining unique syncs to an arbitrary stream of gaps. The different tests cover:
The new algorithm is an extension of the existing one, so all of the consistent-spacing tests have the exact same behavior.
Committer Checklist (excluded from commit message)