KAFKA-10074: Improve performance of matchingAcls
#8769
This PR reduces allocations by using a plain old `foreach` in `matchingAcls` and improves `AclSeqs.find` to only search the inner collections that are required to find a match (instead of searching all of them).

A recent change (90bbeed) in `matchingAcls` to remove `filterKeys` in favor of filtering inside `flatMap` caused a performance regression in cases where there are a large number of topics and prefix ACLs, and `TreeMap.from/to` filtering is ineffective. In such cases, we rely on string comparisons to exclude entries from the ACL cache that are not relevant.

This issue is not present in any release yet, so we should include the simple fix in the 2.6 branch.

The original benchmark did not show a performance difference, so I adjusted the benchmark to stress the relevant code more. More specifically, `aclCacheSnapshot.from(...).to(...)` returns nearly 20000 entries where each map value contains 1000 AclEntries. Out of the 200k AclEntries, only 1050 are retained due to the `startsWith` filtering. This is the case where the implementation in master is least efficient when compared to the previous version and the version in this PR.

The adjusted benchmark results for testAuthorizer are 4.532 ms/op for master, 2.903 ms/op for the previous version and 2.877 ms/op for this PR. Normalized allocation rate was 593 KB/op for master, 597 KB/op for the previous version and 101 KB/op for this PR.
Full results follow:

master with adjusted benchmark:

| Benchmark | (aclCount) | (resourceCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AclAuthorizerBenchmark.testAclsIterator | 50 | 200000 | avgt | 5 | 680.805 | ± 44.318 | ms/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 549.879 | ± 36.259 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 411457042.000 | ± 4805.461 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 331.110 | ± 95.821 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 247799480.320 | ± 72877192.319 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 0.891 | ± 3.183 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 667593.387 | ± 2369888.357 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.count | 50 | 200000 | avgt | 5 | 28.000 | | counts |
| AclAuthorizerBenchmark.testAclsIterator:·gc.time | 50 | 200000 | avgt | 5 | 3458.000 | | ms |
| AclAuthorizerBenchmark.testAuthorizer | 50 | 200000 | avgt | 5 | 4.532 | ± 0.546 | ms/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 119.036 | ± 14.261 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 593524.310 | ± 22.452 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 117.091 | ± 1008.188 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 598574.303 | ± 5153905.271 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 0.034 | ± 0.291 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 173.001 | ± 1489.593 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.count | 50 | 200000 | avgt | 5 | 1.000 | | counts |
| AclAuthorizerBenchmark.testAuthorizer:·gc.time | 50 | 200000 | avgt | 5 | 13.000 | | ms |

master with filterKeys like 90bbeed and adjusted benchmark:

| Benchmark | (aclCount) | (resourceCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AclAuthorizerBenchmark.testAclsIterator | 50 | 200000 | avgt | 5 | 729.163 | ± 20.842 | ms/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 513.005 | ± 13.966 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 411459778.400 | ± 3178.045 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 307.041 | ± 94.544 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 246385400.686 | ± 82294899.881 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 1.571 | ± 2.590 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 1258291.200 | ± 2063669.849 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.count | 50 | 200000 | avgt | 5 | 33.000 | | counts |
| AclAuthorizerBenchmark.testAclsIterator:·gc.time | 50 | 200000 | avgt | 5 | 3266.000 | | ms |
| AclAuthorizerBenchmark.testAuthorizer | 50 | 200000 | avgt | 5 | 2.903 | ± 0.175 | ms/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 187.088 | ± 11.301 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 597962.743 | ± 14.237 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 118.602 | ± 1021.202 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 383359.632 | ± 3300842.044 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.count | 50 | 200000 | avgt | 5 | 1.000 | | counts |
| AclAuthorizerBenchmark.testAuthorizer:·gc.time | 50 | 200000 | avgt | 5 | 14.000 | | ms |

This PR with adjusted benchmark:

| Benchmark | (aclCount) | (resourceCount) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AclAuthorizerBenchmark.testAclsIterator | 50 | 200000 | avgt | 5 | 706.774 | ± 32.353 | ms/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 529.879 | ± 25.416 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 411458751.497 | ± 4424.187 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space | 50 | 200000 | avgt | 5 | 310.559 | ± 112.310 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Eden_Space.norm | 50 | 200000 | avgt | 5 | 241364219.611 | ± 97317733.967 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Old_Gen | 50 | 200000 | avgt | 5 | 0.690 | ± 5.937 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Old_Gen.norm | 50 | 200000 | avgt | 5 | 531278.507 | ± 4574468.166 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space | 50 | 200000 | avgt | 5 | 2.550 | ± 17.243 | MB/sec |
| AclAuthorizerBenchmark.testAclsIterator:·gc.churn.G1_Survivor_Space.norm | 50 | 200000 | avgt | 5 | 1969325.592 | ± 13278191.648 | B/op |
| AclAuthorizerBenchmark.testAclsIterator:·gc.count | 50 | 200000 | avgt | 5 | 32.000 | | counts |
| AclAuthorizerBenchmark.testAclsIterator:·gc.time | 50 | 200000 | avgt | 5 | 3489.000 | | ms |
| AclAuthorizerBenchmark.testAuthorizer | 50 | 200000 | avgt | 5 | 2.877 | ± 0.530 | ms/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate | 50 | 200000 | avgt | 5 | 31.963 | ± 5.912 | MB/sec |
| AclAuthorizerBenchmark.testAuthorizer:·gc.alloc.rate.norm | 50 | 200000 | avgt | 5 | 101057.225 | ± 9.468 | B/op |
| AclAuthorizerBenchmark.testAuthorizer:·gc.count | 50 | 200000 | avgt | 5 | ≈ 0 | | counts |
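The filtering pattern described in this PR (a sorted-map range query followed by a `startsWith` check, accumulated with `foreach` into one buffer) can be sketched roughly as follows. This is a minimal illustration and not the actual `matchingAcls` code: `Entry`, `PrefixFilterSketch` and `matchingPrefixed` are hypothetical names, keys are plain strings under natural ordering rather than resource patterns, and Scala 2.13 (`rangeFrom`/`rangeTo`) is assumed.

```scala
import scala.collection.immutable.TreeMap
import scala.collection.mutable.ArrayBuffer

object PrefixFilterSketch {
  // Hypothetical stand-in for the real ACL entry type.
  final case class Entry(principal: String)

  // Collects the entries stored under every key that is a prefix of `resourceName`.
  def matchingPrefixed(cache: TreeMap[String, Seq[Entry]], resourceName: String): Seq[Entry] = {
    val result = ArrayBuffer.empty[Entry]
    cache
      // Range query: every non-empty prefix of resourceName sorts between
      // its first character and resourceName itself (inclusive).
      .rangeFrom(resourceName.take(1))
      .rangeTo(resourceName)
      // The range can still contain many non-prefix keys (e.g. "tb" for
      // "topic-42"), so a string comparison is needed for each candidate.
      // Appending into one buffer avoids building an intermediate
      // collection per map entry, which a flatMap-based version may do.
      .foreach { case (prefix, entries) =>
        if (resourceName.startsWith(prefix)) result ++= entries
      }
    result.toSeq
  }
}
```

When most keys in the range fail the `startsWith` check, as in the adjusted benchmark, the per-entry cost of the loop body dominates, which is why keeping it allocation-free matters.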
def find(p: AclEntry => Boolean): Option[AclEntry] = classes.flatMap(_.find(p)).headOption
def isEmpty: Boolean = !classes.exists(_.nonEmpty)
class AclSeqs(seqs: Seq[AclEntry]*) {
def find(p: AclEntry => Boolean): Option[AclEntry] = {
Should this have a comment to remind readers that this style is used as an optimization?
I think this is obvious, no? `find` should generally short-circuit and not go through all the items. That's how it works for all collection implementations. I think this kind of comment makes sense in `matchingAcls`, where I added one.
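The short-circuiting behavior discussed in this thread can be sketched as below. This is an illustrative comparison under stated assumptions, not the code from the PR: `Entry`, `Seqs`, `findEager` and `findLazy` are hypothetical names, and Scala 2.13 collection semantics are assumed. The eager variant mirrors the `flatMap(...).headOption` shape from the removed line in the diff; the lazy variant goes through an `Iterator`, so inner collections after the first match are never searched.

```scala
object FindSketch {
  // Hypothetical simplified entry type; the real code works on AclEntry.
  final case class Entry(name: String, allowed: Boolean)

  class Seqs(seqs: Seq[Entry]*) {
    // Eager: flatMap on a strict Seq runs `find` on every inner collection
    // before headOption picks the first result.
    def findEager(p: Entry => Boolean): Option[Entry] =
      seqs.flatMap(_.find(p)).headOption

    // Lazy: Iterator.flatMap evaluates inner collections on demand, so the
    // search stops at the first inner collection that yields a match.
    def findLazy(p: Entry => Boolean): Option[Entry] = {
      val it = seqs.iterator.flatMap(_.find(p))
      if (it.hasNext) Some(it.next()) else None
    }

    def isEmpty: Boolean = !seqs.exists(_.nonEmpty)
  }
}
```

Both variants return the same result; they differ only in how many inner collections are visited, which is the kind of intent a short code comment can make explicit.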
@chia7712 I looked at the code again and I guess the intent may not be clear. I added a comment that hopefully clarifies.
@ijuma Thanks for the PR, LGTM
The issue affecting the Scala 2.12 build is unrelated. Looks like Gradle exited while running Streams tests.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Chia-Ping Tsai <chia7712@gmail.com>