Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identity/cache: only call SortedList for release #27796

Merged
merged 3 commits into from Sep 8, 2023

Conversation

bimmlerd
Copy link
Member

@bimmlerd bimmlerd commented Aug 29, 2023

First commit introduces a specific benchmark, the next two commits are optimizations. Commit msgs should be descriptive. The first optimization commit does not improve benchmark time as the benchmark doesn't exercise that path, but the second optimization clearly improves benchmark time.

Below the metrics of a kind cluster which runs 10 pods which send a single UDP packet to host1.test.org., while having a custom DNS server which responds with 5 random IPs for host1.test.org. with a TTL of 5 seconds. This mimicks S3 behaviour.

In the graph, you can see that there's a doubling of identities, that's the identity restoration grace period of 10 minutes; it shows when the pods were restarted with the two commits of this PR applied. They show an approximate reduction of 5MiB/s and 50KHz allocations.

image

@bimmlerd bimmlerd added kind/performance There is a performance impact of this. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. release-note/misc This PR makes changes that have no direct user impact. sig/agent Cilium agent related. labels Aug 29, 2023
@bimmlerd bimmlerd marked this pull request as ready for review August 29, 2023 11:41
@bimmlerd bimmlerd requested a review from a team as a code owner August 29, 2023 11:41
@bimmlerd
Copy link
Member Author

bimmlerd commented Aug 29, 2023

/test (was completely green, rerunning for second commit)

@bimmlerd bimmlerd force-pushed the pr/bimmlerd/avoid-sorted-list-calls branch from 7e73fff to 174bfe9 Compare August 30, 2023 10:34
@bimmlerd bimmlerd requested a review from a team as a code owner August 30, 2023 10:34
@bimmlerd bimmlerd requested a review from aanm August 30, 2023 10:34
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/avoid-sorted-list-calls branch from 174bfe9 to f5a8e36 Compare August 30, 2023 10:36
@bimmlerd
Copy link
Member Author

/test

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but would be nice to have some bechmarks as well to compare before/after.

@bimmlerd
Copy link
Member Author

bimmlerd commented Aug 30, 2023

goos: linux
goarch: arm64
pkg: github.com/cilium/cilium/pkg/labels
                     │     old     │                 opt                 │
                     │   sec/op    │   sec/op     vs base                │
Labels_SortedList-10   3.279µ ± 6%   2.209µ ± 6%  -32.65% (p=0.000 n=10)

                     │    old     │                opt                 │
                     │    B/op    │    B/op     vs base                │
Labels_SortedList-10   504.0 ± 0%   360.0 ± 0%  -28.57% (p=0.000 n=10)

                     │     old     │                opt                 │
                     │  allocs/op  │ allocs/op   vs base                │
Labels_SortedList-10   13.000 ± 0%   3.000 ± 0%  -76.92% (p=0.000 n=10)

pkg: github.com/cilium/cilium/pkg/labels/cidr
                            │     old     │                 opt                 │
                            │   sec/op    │   sec/op     vs base                │
Labels_SortedListCIDRIDs-10   21.23µ ± 5%   14.89µ ± 9%  -29.87% (p=0.000 n=10)

                            │     old      │                 opt                  │
                            │     B/op     │     B/op      vs base                │
Labels_SortedListCIDRIDs-10   3.594Ki ± 0%   1.586Ki ± 0%  -55.87% (p=0.000 n=10)

                            │     old     │                opt                 │
                            │  allocs/op  │ allocs/op   vs base                │
Labels_SortedListCIDRIDs-10   41.000 ± 0%   3.000 ± 0%  -92.68% (p=0.000 n=10)

old is main, opt is this branch.

@bimmlerd bimmlerd force-pushed the pr/bimmlerd/avoid-sorted-list-calls branch from f5a8e36 to 880333c Compare August 30, 2023 13:30
This case is exercised heavily in the toFQDN policy incremental policy
computation flow.

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/avoid-sorted-list-calls branch from 880333c to 75cf55b Compare August 30, 2023 13:40
@bimmlerd
Copy link
Member Author

/test

This is on the hot path when we have a fqdn policy for S3, where a
single hostname maps to many IPs. This occurs due to a combination of
factors:

1. We allocate a CIDR identitiy for each IP for a hostname which matches
   a fqdn policy.
2. For each CIDR identity, we generate labels for all super-CIDRs (i.e.
   CIDRs which contain this CIDR).
3. We opportunistically allocate an identity for all IPs which are
   matched by a fqdn selector. For those we already knew, other parts of
   the code decrement the ref count again.
4. We use the SortedList serialization as the key to lookup for existing
   identities here, which sorts and serializes the labels to a byte
   array, which is reasonably expensive.

Since we can't as easily fix the other factors at play here, at least
avoid doing it twice for each label set during the opportunistic
acquire/release path. We can lookup by identity for the fast path, and
build the string representation for the slower path if need be.

We hit this path for every IP returned by S3 that's still being kept
alive (be that because of DNS TTL or the zombie mechanism), hence we can
easily get to 5k acquire/release pairs per DNS request.

While we're at it, also reduce the critical section by moving the
SortedList call outside in lookupOrCreate.

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
SortedList appears prominently in both CPU and Heap pprofs when in a
scenario with an FQDN policy matching S3. This is because DNS requests
to S3 return short-lived DNS responses from a pool of many IPs, each of
which will receive a CIDR identitiy in cilium.

Since we furthermore call SortedList repeatedly for these CIDR
identities (represented as a set of CIDR labels containing all
super-CIDRs), avoiding ~32 buffer allocations per call is worth it.

Before:
Labels_SortedList-10    	1000000	    3079 ns/op	   504 B/op	    13 allocs/op
Labels_SortedListCIDRIDs-10    	  52702	    21417 ns/op	   3680 B/op	     41 allocs/op

After:
Labels_SortedList-10    	1000000	    2164 ns/op	   360 B/op	     3 allocs/op
Labels_SortedListCIDRIDs-10    	  72180	    15444 ns/op	   1624 B/op	      3 allocs/op

Benchstat:
                     │     old     │                 opt                 │
                     │   sec/op    │   sec/op     vs base                │
Labels_SortedList-10   3.279µ ± 6%   2.209µ ± 6%  -32.65% (p=0.000 n=10)

                     │    B/op    │    B/op     vs base                │
Labels_SortedList-10   504.0 ± 0%   360.0 ± 0%  -28.57% (p=0.000 n=10)
                     │  allocs/op  │ allocs/op   vs base                │
Labels_SortedList-10   13.000 ± 0%   3.000 ± 0%  -76.92% (p=0.000 n=10)

pkg: github.com/cilium/cilium/pkg/labels/cidr
                            │     old     │                 opt                 │
                            │   sec/op    │   sec/op     vs base                │
Labels_SortedListCIDRIDs-10   21.23µ ± 5%   14.89µ ± 9%  -29.87% (p=0.000 n=10)
                            │     B/op     │     B/op      vs base                │
Labels_SortedListCIDRIDs-10   3.594Ki ± 0%   1.586Ki ± 0%  -55.87% (p=0.000 n=10)
                            │  allocs/op  │ allocs/op   vs base                │
Labels_SortedListCIDRIDs-10   41.000 ± 0%   3.000 ± 0%  -92.68% (p=0.000 n=10)

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/avoid-sorted-list-calls branch from 75cf55b to be37350 Compare September 5, 2023 12:45
@bimmlerd
Copy link
Member Author

bimmlerd commented Sep 5, 2023

I've pushed the above-suggested change to pull the call to SortedList out of the critical section of lookupOrCreate. Please take another look.

@bimmlerd
Copy link
Member Author

bimmlerd commented Sep 5, 2023

/test

@bimmlerd
Copy link
Member Author

bimmlerd commented Sep 5, 2023

CI triage:

@bimmlerd
Copy link
Member Author

bimmlerd commented Sep 5, 2023

/ci-runtime

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 7, 2023
@youngnick youngnick requested review from schlosna and removed request for schlosna September 8, 2023 00:20
@youngnick youngnick merged commit c1e21d8 into cilium:main Sep 8, 2023
61 checks passed
@bimmlerd bimmlerd deleted the pr/bimmlerd/avoid-sorted-list-calls branch September 8, 2023 07:40
@joamaki joamaki mentioned this pull request Sep 15, 2023
6 tasks
@joamaki joamaki mentioned this pull request Oct 5, 2023
5 tasks
@julianwiedmann julianwiedmann added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. labels Dec 5, 2023
@squeed squeed added the needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch label Feb 19, 2024
@tklauser tklauser mentioned this pull request Feb 20, 2024
6 tasks
@tklauser tklauser added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Feb 20, 2024
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/performance There is a performance impact of this. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/agent Cilium agent related. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants