Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable and deprecate force-local-policy-eval-at-source #22190

Merged
merged 2 commits into from
Nov 24, 2022

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Nov 15, 2022

There isn't much more than what the title says here, but do check the commit descriptions of course.

Tested manually on GKE and AKS (given those workflows are disabled) with a 3-nodes cluster where I ran the connectivity tests 5 times (different pods each time).

Fixes: #15452.

The force-local-policy-eval-at-source flag was introduced in commit
c525c75 ("bpf: Continue to enforce policy at source endpoint unless
disabled"). It is enabled by default and causes Cilium to always enforce
policies at the source when the destination is a local pod.

Unfortunately, this flag is also causing issues when both endpoint
routes and tunneling are enabled [1] (a configuration that was not
possible at the time the flag was introduced).

We have enough test coverage (L7 on multiple cloud providers) now to be
able to safely disable this flag by default. We can remove it after a
couple releases.

1 - cilium#14657
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Nov 15, 2022
@pchaigno
Copy link
Member Author

/test

@pchaigno pchaigno marked this pull request as ready for review November 16, 2022 09:11
@pchaigno pchaigno requested review from a team as code owners November 16, 2022 09:11
Copy link
Contributor

@thorn3r thorn3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. some tests are failing, in particular GKE connectivity- but that seems to be from a cluster creation error (out of private IP space). should we kick that off again or good to go given your manual testing?

@pchaigno
Copy link
Member Author

some tests are failing, in particular GKE connectivity- but that seems to be from a cluster creation error (our of private IP space). should we kick that off again or good to go given your manual testing?

I'll triage the tests once the second review is in. From a quick look, they all appear to be known flakes.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep looks redundant vs. USE_BPF_PROG_FOR_INGRESS_POLICY (auto-enabled from the CNI in relevant environments). This combined with your testing is good enough to convince me 👍

@pchaigno
Copy link
Member Author

pchaigno commented Nov 24, 2022

This took too long to review and the Jenkins results were recycled:
/test-1.16-4.9

@pchaigno
Copy link
Member Author

/test-runtime

@julianwiedmann
Copy link
Member

If I understand commit c525c75 correctly, one concern back then was the reconfig scenario where per-EP-routes gets switched on. And the source program (eg. bpf_overlay) has already been regenerated and trusts that that the pod will enforce policy in to-container... but the pod hasn't been regenerated yet, and so there's no to-container program installed yet.

That feels like a legit concern - but maybe it's no longer valid in today's code base?

@pchaigno
Copy link
Member Author

the reconfig scenario where per-EP-routes gets switched on

We've discussed this and concluded that switching the endpoint routes setting isn't supported today without downtime. We would need additional logic to support it.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 24, 2022
@julianwiedmann
Copy link
Member

the reconfig scenario where per-EP-routes gets switched on

We've discussed this and concluded that switching the endpoint routes setting isn't supported today without downtime. We would need additional logic to support it.

Makes sense 👍. Would be nice to have this conclusion in the docs somewhere.

@pchaigno pchaigno removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 24, 2022
This should never have been exposed to users in the first place. It also
causes issues when set to true, as explained in the previous commit.
There are other ways to control if policy enforcement happens at the
source or not (enable-endpoint-routes).

Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno force-pushed the disable-forcelocalpolicyevalatsource branch from 5a7a34b to a59c368 Compare November 24, 2022 15:22
@pchaigno
Copy link
Member Author

I've pushed a small change to address @AwesomePatrol's review and didn't rebase. As illustrated by MLH's label, all tests were passing before this change. Given the change is trivial and everything still compiles, I think it's fine to merge without rerunning the full CI.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 24, 2022
@aanm aanm merged commit 0ded29b into cilium:master Nov 24, 2022
@pchaigno pchaigno deleted the disable-forcelocalpolicyevalatsource branch November 24, 2022 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove force-local-policy-eval-at-source flag
6 participants