Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PolicyRepository: index and replace rules by resource. #32703

Merged
merged 3 commits into from
May 30, 2024

Conversation

squeed
Copy link
Contributor

@squeed squeed commented May 24, 2024

(Note for reviewers: this also contains a commit fixing some improper use of shared state in tests; it is mostly mechanical but required for this change).

The PolicyRepository is a database of all known network policies (KNP / CNP / CCNP / gRPC Policies). It references policies by Labels, which are an arbitrary set of identifiers for a policy. When an "upstream" policy is updated, a synthetic set of labels is created and used as a key for replacement.

This label-oriented referencing is inefficient. When a policy is upserted, all existing rules must be scanned to se if they are candidates for replacement. Furthermore, this doesn't reflect how policy rules are actually created. Namely, they have an upstream owning resource, which creates one or more downstream rules in the repository.

This change reorients the PolicyRepository to index and replace rules on a per-resource basis. This pattern is already well established within Cilium, most notably the IPCache, and has proven itself. It also means that the standard policy actions do not require a whole-repository scan or evaluating labels.

The existing label-based replacement mechanism must be preserved for policies managed by the local API, so the existing code cannot be entirely removed, but it will seldom be used in the field.

(This PR was inspired by #27163, but takes a different tack).

Starting cilium-agent with large numbers of network policies should be much faster.

@squeed squeed requested a review from a team as a code owner May 24, 2024 09:57
@squeed squeed requested a review from doniacld May 24, 2024 09:57
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 24, 2024
@squeed squeed added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. kind/performance There is a performance impact of this. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels May 24, 2024
@squeed
Copy link
Contributor Author

squeed commented May 24, 2024

/test

@squeed squeed requested a review from aanm May 27, 2024 12:38
@squeed
Copy link
Contributor Author

squeed commented May 27, 2024

Tagging @aanm for review too; you understand this bit of code somewhat.

pkg/policy/rule.go Outdated Show resolved Hide resolved
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, very nice cleanup! Do we have any performance benchmark numbers that can tell us how much this improves the startup time?

The stateful test scaffolding was getting in the way of future changes,
especially surrounding the shared SelectorCache. Tests were adding
identities, which affected other tests. Furthermore, testing the package
with --count=2 reliably failed due to left-behind state.

This mechanical change aggregates all useful variables behind a single
struct. No test logic has been changed.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
UID is not a safe resource for indexing; multiple rules may have the
same UID, and UID is not guaranteed to be unique across different
resources. Furthermore, informers may coalesce delete + add events in to
an update, thus losing the UID edge regardless.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed squeed force-pushed the policy-repo-track-resource branch from 3dc803d to db1df64 Compare May 30, 2024 09:49
@squeed
Copy link
Contributor Author

squeed commented May 30, 2024

/test

@squeed
Copy link
Contributor Author

squeed commented May 30, 2024

Do we have any performance benchmark numbers that can tell us how much this improves the startup time?

@christarazi I threw together a quick benchmark:

$ benchstat bench-before.txt bench-after.txt 
goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/policy
cpu: 12th Gen Intel(R) Core(TM) i7-1250U
            │ bench-before.txt │           bench-after.txt            │
            │      sec/op      │    sec/op     vs base                │
AddRules-12      241.740µ ± 7%   1.538µ ± 49%  -99.36% (p=0.000 n=10)

Good enough for me :-)

When upserting a CNP or KNP, we identify existing rules in the
repository by a set of labels. However, evaluating this set of labels is
expensive, especially as we must check against all label selectors every
time we want to add or remove a policy.

Rather than using label selectors internally, track policies by
owning resource, much the way that prefixes are tracked in the ipcache.
Then, when upserting policies, the set of existing rules attached to a
given resource can be easily retrieved.

The existing behavior is preserved, as it is also exposed via the local
gRPC API. However, the k8s handlers no longer use it.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed squeed force-pushed the policy-repo-track-resource branch from db1df64 to 77b43c5 Compare May 30, 2024 11:27
@squeed
Copy link
Contributor Author

squeed commented May 30, 2024

Tests caught a flake -- just some test code that expected a certain ordering. Fixed.

@squeed
Copy link
Contributor Author

squeed commented May 30, 2024

/test

@squeed squeed added this pull request to the merge queue May 30, 2024
Merged via the queue into cilium:main with commit df42a7d May 30, 2024
63 of 64 checks passed
@squeed squeed deleted the policy-repo-track-resource branch May 30, 2024 12:46
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@squeed I'm 15 mins too late but I left a comment that I believe it should be addressed.

slim_metav1 "github.com/cilium/cilium/pkg/k8s/slim/k8s/apis/meta/v1"
"github.com/cilium/cilium/pkg/labels"
"github.com/cilium/cilium/pkg/lock"
"github.com/cilium/cilium/pkg/option"
"github.com/cilium/cilium/pkg/policy/api"
)

type ruleKey struct {
resource ipcachetypes.ResourceID
idx uint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index in reference to what? A comment here would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, I'll add a comment in an incoming PR.

@aanm
Copy link
Member

aanm commented May 30, 2024

Do we have any performance benchmark numbers that can tell us how much this improves the startup time?

@christarazi I threw together a quick benchmark:

$ benchstat bench-before.txt bench-after.txt 
goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/policy
cpu: 12th Gen Intel(R) Core(TM) i7-1250U
            │ bench-before.txt │           bench-after.txt            │
            │      sec/op      │    sec/op     vs base                │
AddRules-12      241.740µ ± 7%   1.538µ ± 49%  -99.36% (p=0.000 n=10)

Good enough for me :-)

I also ran a local benchmark to compare the PR against current main and I didn't see any regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/performance There is a performance impact of this. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants