KAFKA-13452: MM2 shouldn't checkpoint when offset mapping is unavailable #11492


viktorsomogyi
Contributor

MM2 checkpointing reads the offset-syncs topic to create offset mappings for committed consumer group offsets. In some corner cases, a mapping may not be available in offset-syncs. In that case, MM2 simply copies the source offset, which might not be a valid offset in the replica topic at all. This can cause issues when auto offset sync is also turned on.
This PR updates the checkpointing logic so that no checkpoint is created for topic partitions without a valid offset mapping.
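
To illustrate the intended behavior, here is a minimal sketch, not the code from this PR. The class `CheckpointSketch`, its methods, the string keys, and the simple delta-based translation are all assumptions made for illustration. The point it shows is that a partition without a known offset mapping yields no checkpoint at all, instead of a checkpoint that echoes the upstream offset.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch; these are not the actual MirrorMaker 2 classes.
final class CheckpointSketch {

    // Latest known sync per topic-partition: {upstreamOffset, downstreamOffset}.
    // Keys are plain "topic-partition" strings to keep the sketch dependency-free.
    private final Map<String, long[]> offsetSyncs = new HashMap<>();

    void recordSync(String topicPartition, long upstreamOffset, long downstreamOffset) {
        offsetSyncs.put(topicPartition, new long[] {upstreamOffset, downstreamOffset});
    }

    // Returns empty when no mapping is known, rather than the upstream offset.
    // The delta arithmetic is a simplification assumed for this sketch.
    OptionalLong translateDownstream(String topicPartition, long upstreamOffset) {
        long[] sync = offsetSyncs.get(topicPartition);
        if (sync == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(sync[1] + (upstreamOffset - sync[0]));
    }

    // Builds checkpoints only for partitions that have a valid offset mapping.
    Map<String, Long> checkpoints(Map<String, Long> committedUpstreamOffsets) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : committedUpstreamOffsets.entrySet()) {
            OptionalLong translated = translateDownstream(entry.getKey(), entry.getValue());
            if (translated.isPresent()) {
                result.put(entry.getKey(), translated.getAsLong());
            }
            // No mapping: skip the partition, so nothing invalid can be synced later.
        }
        return result;
    }
}
```

With a sync recorded only for `orders-0`, committed offsets for both `orders-0` and `orders-1` would produce a single checkpoint for `orders-0`; `orders-1` is skipped until a mapping shows up.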

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@viktorsomogyi viktorsomogyi marked this pull request as draft November 12, 2021 20:52
@viktorsomogyi viktorsomogyi force-pushed the KAFKA-13452-mm2-checkpoint-issue branch 2 times, most recently from 3684020 to 9d0df27 Compare February 10, 2022 12:51
MM2 originally also copied replica topic offsets back to the upstream
cluster. Checkpointing handled this by creating a checkpoint under the
original topic name, which was a buggy implementation. These offsets were
never meant to be used on the upstream topic, as offsets most likely do
not match between the upstream and the downstream topic. In addition, if
there are multiple replication flows, these checkpoints would collide
when sync.group.offsets is enabled. The functionality is kept, but the
topic name is now transformed as if it had gone through another hop of
replication. This lets client tools use the offset-syncs topic of the
upstream cluster to map these offsets back.
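
As a rough illustration of the renaming described above (a sketch under assumptions, not the code from this PR): the class and method names below are hypothetical, as are the cluster aliases. It assumes the `<sourceAlias>.<topic>` naming that MM2's DefaultReplicationPolicy applies to remote topics with its default `.` separator.

```java
// Hypothetical sketch; names and cluster aliases are illustrative only.
final class ReverseCheckpointTopicSketch {

    // Mirrors the "<sourceAlias>.<topic>" convention used for remote topics
    // by the default replication policy (default "." separator assumed).
    static String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return sourceClusterAlias + "." + topic;
    }

    // Topic name for a checkpoint that carries offsets of a replica topic back
    // to the upstream cluster: rename it as one more replication hop would,
    // rather than reusing the original upstream topic name.
    static String reverseCheckpointTopic(String downstreamClusterAlias, String replicaTopic) {
        return formatRemoteTopic(downstreamClusterAlias, replicaTopic);
    }
}
```

For example, with hypothetical aliases `primary` and `backup`, the replica topic `primary.orders` on `backup` would be checkpointed back as `backup.primary.orders` instead of `orders`, so it cannot be mistaken for offsets of the upstream topic or collide with another flow.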
@viktorsomogyi viktorsomogyi marked this pull request as ready for review April 26, 2022 12:18
@viktorsomogyi
Contributor Author

@mimaison would you please review this?

@mimaison
Member

Thanks @viktorsomogyi for the PR. If I understand correctly, this is an alternative to #11748?

@viktorsomogyi
Contributor Author

@mimaison yes, it is. If you see something useful, feel free to incorporate it into yours. Also, I didn't mean to ping you on this one but on a different one :). Sorry. Once I have that one up here, you can expect me.
