Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

endpoint: Only perform full synchronization periodically #27693

Merged

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Aug 25, 2023

Performing a full dump of the policy map on every policy map synchronization
is expensive and is only necessary to catch rare cases where the agent's view
of the policy map has diverged from the kernel's view which should only happen
either due to kernel or other bugs or some other application modifying the
endpoint policy map.

To reduce the cost of synchronizing large endpoint policy maps perform the
full synchronization only periodically. The full synchronization is defaults
to 15 minutes. Configurable with the hidden option
"--policy-map-full-reconciliation-interval".

Fixes: 9dc1350 ("endpoint: Enhance policy map sync")

endpoint: Only perform the full policy map synchronization periodically (every 15 minutes) to reduce overhead with large endpoint policy maps

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 25, 2023
@joamaki joamaki added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Aug 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 25, 2023
@joamaki joamaki force-pushed the pr/joamaki/policy-map-periodic-full-sync branch from ec7872a to 61a956e Compare August 28, 2023 13:25
@joamaki
Copy link
Contributor Author

joamaki commented Aug 29, 2023

/test

@joamaki joamaki marked this pull request as ready for review August 29, 2023 06:46
@joamaki joamaki requested review from a team as code owners August 29, 2023 06:46
@joamaki joamaki requested a review from derailed August 29, 2023 06:46
pkg/endpoint/bpf.go Outdated Show resolved Hide resolved
Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this, I had no idea we were issuing a full dump on each endpoint (policy) regeneration!

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joamaki Nice work! just a couple comments...

pkg/endpoint/bpf.go Outdated Show resolved Hide resolved
daemon/cmd/daemon_main.go Show resolved Hide resolved
pkg/endpoint/bpf.go Outdated Show resolved Hide resolved
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix makes sense to me. How do we know that Cilium wasn't relying on this controller running frequently for correct operation?

@joamaki
Copy link
Contributor Author

joamaki commented Aug 31, 2023

We know Cilium wasn’t relying on this because we were not seeing the warnings that would be logged if full sync did catch something. Furthermore, @jrajahalme who wrote this originally did not intend this as he expected UpdateController to be a no-op after the first call and not a trigger to run it immediately.

@joamaki joamaki force-pushed the pr/joamaki/policy-map-periodic-full-sync branch 2 times, most recently from aff193d to 8046258 Compare September 1, 2023 10:55
pkg/controller/manager.go Outdated Show resolved Hide resolved
@christarazi christarazi added kind/enhancement This would improve or streamline existing functionality. area/daemon Impacts operation of the Cilium daemon. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. kind/performance There is a performance impact of this. labels Sep 1, 2023
The UpdateController method performs either a creation, or updating
of a controller. If an update is performed the controller is immediately
triggered. In some cases we want to create the controller once without
triggering or modifying it if it already has been created.

Add a CreateController method for those use cases.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Performing a full dump of the policy map on every policy map synchronization
is expensive and is only necessary to catch rare cases where the agent's view
of the policy map has diverged from the kernel's view which should only happen
either due to kernel or other bugs or some other application modifying the
endpoint policy map.

To reduce the cost of synchronizing large endpoint policy maps perform the
full synchronization only periodically. The full synchronization is defaults
to 15 minutes. Configurable with the hidden option
"--policy-map-full-reconciliation-interval".

Fixes: 9dc1350 ("endpoint: Enhance policy map sync")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/policy-map-periodic-full-sync branch from 8046258 to f005f93 Compare September 4, 2023 07:21
@joamaki
Copy link
Contributor Author

joamaki commented Sep 4, 2023

/test

@joamaki joamaki removed the request for review from joestringer September 4, 2023 13:24
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 5, 2023
@joamaki joamaki merged commit 867e41a into cilium:main Sep 5, 2023
61 checks passed
ctrlName := fmt.Sprintf("sync-policymap-%d", e.ID)
e.controllers.UpdateController(ctrlName,
e.controllers.CreateController(ctrlName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor nit but it's odd to return a boolean and ignore it. Looking at the line of code here it's not obvious that the result is being ignored. Ideally the underlying function wouldn't return a boolean if it's unused in production code (I see it's used in tests), but this could at least signal that the result is deliberately ignored with a _ = e.controllers.CreateController(...).

@joestringer
Copy link
Member

Huh, I was pondering why the policy trigger doesn't ratelimit these and then realized that's because the policy trigger is for triggering re-evaluation of the policy, not attempting a fresh synchronization of the synchronized policy to the BPF policymaps. So fair 👍

@joamaki joamaki added needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Sep 11, 2023
@gandro gandro mentioned this pull request Sep 12, 2023
15 tasks
@gandro gandro added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Sep 12, 2023
@gandro
Copy link
Member

gandro commented Sep 12, 2023

I managed to resolve the conflicts for v1.14, but the conflicts for v1.13 and v1.12 are more complex (i.e. there is no managedController type). Marking as backport/author.

@gandro gandro added the backport/author The backport will be carried out by the author of the PR. label Sep 12, 2023
@gandro gandro mentioned this pull request Sep 12, 2023
6 tasks
@joamaki joamaki mentioned this pull request Sep 15, 2023
6 tasks
@gandro gandro added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Sep 25, 2023
@joamaki joamaki added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed needs-backport/1.12 labels Sep 26, 2023
@joamaki joamaki mentioned this pull request Oct 5, 2023
5 tasks
@julianwiedmann julianwiedmann added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. backport/author The backport will be carried out by the author of the PR. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/enhancement This would improve or streamline existing functionality. kind/performance There is a performance impact of this. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants