KAFKA-16226; Reduce synchronization between producer threads (#15323) #15493

Merged
merged 1 commit into from
Mar 14, 2024

Conversation

msn-tldr
Contributor

@msn-tldr msn-tldr commented Mar 7, 2024

NOTE: this cherry-picks fix #14522 into the 3.7 branch.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

…15323)

As this [JIRA](https://issues.apache.org/jira/browse/KAFKA-16226) explains, synchronization between the application thread and the background thread increased because the background thread started calling the synchronized method `Metadata.currentLeader()` in the [original PR](apache#14384). This PR makes the following changes:
1. Changes the background thread, i.e. RecordAccumulator's `partitionReady()` and `drainBatchesForOneNode()`, to not use `Metadata.currentLeader()`. Instead, they rely on `MetadataCache`, which is immutable, so access to it is unsynchronized.
2. Repurposes `MetadataCache` as an immutable snapshot of Metadata. It is a wrapper around the public `Cluster`. New functionality for internal client usage should be added to `MetadataCache` rather than to the public `Cluster`. For example, this PR adds `MetadataCache.leaderEpochFor()`.
3. Renames `MetadataCache` to `MetadataSnapshot` to make it explicit that it is immutable.

**Note that neither `Cluster` nor `MetadataCache` is synchronized, so this change removes synchronization from the hot path for high partition counts.**
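
For readers unfamiliar with the pattern, here is a minimal sketch of the immutable-snapshot idea described above. It is not the code in this PR: `LeaderSnapshot` and `MetadataHolder` are hypothetical names, partitions are plain strings, and publishing the snapshot through a `volatile` field is an assumption made for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical immutable snapshot of leader information.
// Illustrative only; this is not Kafka's MetadataSnapshot.
final class LeaderSnapshot {
    private final Map<String, Integer> leaderByPartition;
    private final Map<String, Integer> epochByPartition;

    LeaderSnapshot(Map<String, Integer> leaders, Map<String, Integer> epochs) {
        // Defensive immutable copies: later changes to the caller's maps
        // cannot leak into this snapshot.
        this.leaderByPartition = Map.copyOf(leaders);
        this.epochByPartition = Map.copyOf(epochs);
    }

    // No locking needed: all state is final and never mutated.
    Optional<Integer> leaderFor(String partition) {
        return Optional.ofNullable(leaderByPartition.get(partition));
    }

    Optional<Integer> leaderEpochFor(String partition) {
        return Optional.ofNullable(epochByPartition.get(partition));
    }
}

// Hypothetical holder standing in for the mutable metadata object.
final class MetadataHolder {
    // Assumption: a volatile reference publishes each new snapshot safely.
    private volatile LeaderSnapshot snapshot =
            new LeaderSnapshot(Map.of(), Map.of());

    // Updates are serialized, but they only swap the reference.
    synchronized void update(Map<String, Integer> leaders,
                             Map<String, Integer> epochs) {
        snapshot = new LeaderSnapshot(leaders, epochs);
    }

    // Hot path: one volatile read, no lock taken.
    LeaderSnapshot fetchSnapshot() {
        return snapshot;
    }
}
```

In this sketch a background thread would call `fetchSnapshot()` once per pass and answer every per-partition lookup from that one object, so the number of lock acquisitions no longer grows with the partition count.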

Reviewers: Jason Gustafson <jason@confluent.io>
@msn-tldr msn-tldr marked this pull request as ready for review March 8, 2024 12:11
@msn-tldr
Contributor Author

msn-tldr commented Mar 8, 2024

@hachikuji / @ijuma can you help in merging this?

The test failures on Jenkins are flaky and unrelated.

@ijuma
Contributor

ijuma commented Mar 14, 2024

Test failures are the same as the 3.7 branch.

Contributor

@ijuma ijuma left a comment

LGTM, thanks.

This is a clean cherry-pick from master.

@ijuma ijuma merged commit f31307a into apache:3.7 Mar 14, 2024
1 check failed
@msn-tldr
Contributor Author

@ijuma thanks!
