identity/cache: only call SortedList for release #27796
Conversation
/test (was completely green, rerunning for second commit)
Force-pushed from 7e73fff to 174bfe9
Force-pushed from 174bfe9 to f5a8e36
/test
LGTM, but it would be nice to have some benchmarks as well to compare before/after.
old is
Force-pushed from f5a8e36 to 880333c
This case is exercised heavily in the toFQDN policy incremental policy computation flow. Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Force-pushed from 880333c to 75cf55b
/test
This is on the hot path when we have a fqdn policy for S3, where a single hostname maps to many IPs. This occurs due to a combination of factors: 1. We allocate a CIDR identitiy for each IP for a hostname which matches a fqdn policy. 2. For each CIDR identity, we generate labels for all super-CIDRs (i.e. CIDRs which contain this CIDR). 3. We opportunistically allocate an identity for all IPs which are matched by a fqdn selector. For those we already knew, other parts of the code decrement the ref count again. 4. We use the SortedList serialization as the key to lookup for existing identities here, which sorts and serializes the labels to a byte array, which is reasonably expensive. Since we can't as easily fix the other factors at play here, at least avoid doing it twice for each label set during the opportunistic acquire/release path. We can lookup by identity for the fast path, and build the string representation for the slower path if need be. We hit this path for every IP returned by S3 that's still being kept alive (be that because of DNS TTL or the zombie mechanism), hence we can easily get to 5k acquire/release pairs per DNS request. While we're at it, also reduce the critical section by moving the SortedList call outside in lookupOrCreate. Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
SortedList appears prominently in both CPU and heap pprofs in a scenario with an FQDN policy matching S3. This is because DNS requests to S3 return short-lived DNS responses from a pool of many IPs, each of which will receive a CIDR identity in Cilium. Since we furthermore call SortedList repeatedly for these CIDR identities (represented as a set of CIDR labels containing all super-CIDRs), avoiding ~32 buffer allocations per call is worth it.

Before:
Labels_SortedList-10          1000000   3079 ns/op    504 B/op   13 allocs/op
Labels_SortedListCIDRIDs-10     52702  21417 ns/op   3680 B/op   41 allocs/op

After:
Labels_SortedList-10          1000000   2164 ns/op    360 B/op    3 allocs/op
Labels_SortedListCIDRIDs-10     72180  15444 ns/op   1624 B/op    3 allocs/op

Benchstat:
                       │    old     │                opt                 │
                       │   sec/op   │   sec/op    vs base                │
Labels_SortedList-10     3.279µ ± 6%  2.209µ ± 6%  -32.65% (p=0.000 n=10)
                       │    B/op    │    B/op     vs base                │
Labels_SortedList-10     504.0 ± 0%   360.0 ± 0%   -28.57% (p=0.000 n=10)
                       │ allocs/op  │  allocs/op  vs base                │
Labels_SortedList-10     13.000 ± 0%  3.000 ± 0%   -76.92% (p=0.000 n=10)

pkg: github.com/cilium/cilium/pkg/labels/cidr
                            │    old     │                opt                 │
                            │   sec/op   │   sec/op    vs base                │
Labels_SortedListCIDRIDs-10   21.23µ ± 5%  14.89µ ± 9%  -29.87% (p=0.000 n=10)
                            │    B/op    │    B/op     vs base                │
Labels_SortedListCIDRIDs-10   3.594Ki ± 0%  1.586Ki ± 0%  -55.87% (p=0.000 n=10)
                            │ allocs/op  │  allocs/op  vs base                │
Labels_SortedListCIDRIDs-10   41.000 ± 0%  3.000 ± 0%   -92.68% (p=0.000 n=10)

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
75cf55b
to
be37350
Compare
I've pushed the above-suggested change to pull the call to
/test
CI triage:
/ci-runtime
The first commit introduces a specific benchmark; the next two commits are optimizations. The commit messages should be descriptive. The first optimization commit does not improve benchmark time because the benchmark doesn't exercise that path, but the second optimization clearly improves benchmark time.
Below are the metrics of a kind cluster running 10 pods which each send a single UDP packet to host1.test.org., while a custom DNS server responds with 5 random IPs for host1.test.org. with a TTL of 5 seconds. This mimics S3 behaviour.

In the graph, you can see a doubling of identities; that's the identity restoration grace period of 10 minutes, and it shows when the pods were restarted with the two commits of this PR applied. They show an approximate reduction of 5MiB/s and 50KHz of allocations.