New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add EnableDefaultDeny field to Network Policy #30572
Conversation
Note for reviewers: I don't love this API; this is intended to be a first pass. Feel free to pick it to pieces :-). |
555f021
to
f8ef6f3
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review the implementation since the key question seems to be around the user-facing API at the moment. What do you think about putting together a CFP on https://github.com/cilium/design-cfps ? I am trying to encourage a culture of documenting the reasoning and key questions / design decisions so we can record what we're trying to solve, why we solve it this way, and why we didn't solve it other ways. This can also help us to come to general consensus with others in the community. The CFP doesn't have to be super complicated, mostly just the content you already wrote for this PR description. Beyond the content here, I think that a dedicated document can also help us to consider alternate API representations without tying too much to the implementation.
My initial feeling on this proposal is that this is a solid option for how to express this functionality, but it does also overlap with another CFP that has been out for feedback for a few months so I think it's also worth evaluating how the two CFPs interact (even if we end up deciding that they have different use cases and we want to adopt both).
pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
Show resolved
Hide resolved
Good idea. Filed: cilium/design-cfps#16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature can be availed only with CNP or CCNP. This may not address for vendors who support only k8s network policy. I'm wondering if we can merge passing cmdline changes (#30060) with this PR so that can support both cmdline option and via cnp.
pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
Outdated
Show resolved
Hide resolved
pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
Outdated
Show resolved
Hide resolved
pkg/k8s/apis/cilium.io/client/crds/v2/ciliumnetworkpolicies.yaml
Outdated
Show resolved
Hide resolved
pkg/k8s/apis/cilium.io/client/crds/v2/ciliumnetworkpolicies.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the great PR description and for writing the CFP. I can't comment on the implementation in the second commit, but I have two pieces and read value of the DefaultAction
, it's not the default action for traffic on the endpoint; it' of feedback here on an initial pass:
- When appropriate, could you update the documentation to include a page on this field? Especially the use case examples? The content you wrote in the PR and the CFP is great and could be adapted pretty well to be in the docs.
- The name of the field
DefaultAction
is confusing for me. I associateDefaultAction
with a configuration that lies outside of a policy, ie, a firewall has aDefaultAction
for traffic on an interface and rules are applied on top. It feels off when the word "default" is used on a per-policy basis because the default action an agent takes for an endpoint is impacted by all of theDefaultAction
s of matching policies, meaning, if I go to a policy and read the value of theDefaultAction
, it's not the default action for traffic on the endpoint, it's how a policy contributes to an agent's decision on what the default action is. I feel there is a way the field could be named to communicate this behavior, something like 'triggerDefaultDeny: true/false`. To me, this is more clear, as it makes the contribution aspect clear. It is also self-documenting, making it easy to understand for folks who aren't familiar with the field: "this policy will or will not trigger a default deny rule for the matched endpoints".
WDYT? Thanks!
I think that these are related but distinct efforts, and given that each effort is already 500+ lines of code I'd prefer not to merge them together. I could perhaps see a way for these two PRs to be split into (a) the core functionality for overriding default deny behaviour, (b) the CNP API for this and (c) the way to configure default policies, in which case probably the core (a) functionality is probably overlapping between the two PRs today but could be merged earlier, then we can iterate on the UX for (b) (#30572) / (c) (#30060) |
FYI I explored a little bit around semantics and action-based wording over here, you may be interested to also read through and brainstorm: cilium/design-cfps#16 (comment) |
f8ef6f3
to
a87a771
Compare
This comment was marked as outdated.
This comment was marked as outdated.
The CFP has merged, and this is ready for review. As a follow-up item, I will file a docs PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work! I left a small wording suggestion for the description of the field, but otherwise LGTM.
pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
Outdated
Show resolved
Hide resolved
5ffb95e
to
131d005
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
pkg/k8s/apis/cilium.io/client/crds/v2/ciliumclusterwidenetworkpolicies.yaml
Show resolved
Hide resolved
This adds a new field, EnableDefaultDeny, to CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy, that controls whether or not the subject endpoints of this policy should drop unselected peer traffic. By default, endpoints are in a default-allow mode. When the first policy applies to an endpoint, it flips it to a default-deny mode. This option allows disabling that behavior. If not specified, the existing behavior remains: for each direction, traffic is allowed unless at least one policy rule applies to that endpoint. If multiple policies select an endpoint, then default-deny takes precedence. It is useful in heterogeneous environments, it may not be desirable to implicitly drop all non-matching traffic. Consider, for example, the case where an administrator wishes to ensure a monitoring service can access all namespaces. If they create a cluster-wide policy allowing access from the monitoring service, it may create a deny policy where none was previously; unexpectedly dropping traffic. See: https://github.com/cilium/design-cfps/blob/main/cilium/CFP-30572-non-default-deny-policies.md This commit contains only the API changes; a subsequent commit will introduce the implementation. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Methods called by Sanitize() may alter the underlying structures. For example. selectors are aggregated and address families are upper-cased. However, top-level fields can't be written by Sanitize. In the future, we'd like to do that. So, give it a pointer receiver. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This adjust the policy rule generation to take in to account non-default-deny policies. As before, an endpoint is normally in a policy-disabled state. If any policies select this endpoint, then policy is enabled and all non-allowed traffic is dropped. If, however, an endpoint is only selected by default-allow policies, then policy is enabled, but a special wildcard allow policy is inserted. Since wildcard polcies have very low precedence, this ensures that any Deny or L7-proxy rules will still take effect. This commit also fixes tests that incorrectly failed to sanitize rules before adding to the policy repository, leading to a nil pointer exception. Production code *always* sanitizes rules before adding. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Add some tests that create various mixtures of default-allow and default-deny policies. It is important that default-deny policies always take precedence over default-allow. It is also important that Deny rules take precedence over default-allow. These need to be integration tests, since they rely on specific interactions between the userspace and bpf policy engines. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
131d005
to
8c7d8ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like this approach, but as we add more features to the policy engine this approach may not continue to work.
/test |
@joestringer @christarazi Non-default-deny is on the cusp of being green. I'll merge this in ca. 24 hours, FYI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My review was only for the @cilium/github-sec files.
(see individual commits for a further breakdown of the work included)
(see the CFP / CFP PR for the design motivation)
This adds a new field,
EnableDefaultDeny
, to CiliumNetworkPolicy and CiliumClusterideNetworkPolicy. It configures whether or not selected endpoints are switched to default-deny mode when selected by the given network policy.By default, the presence of at least one network policy selecting an endpoint implies that endpoint should drop all non-matching traffic. This makes sense, given that Kubernetes NetworkPolicy does not even have the ability to explicitly deny traffic. If an endpoint has no policies, then all traffic is allowed.
This presents administrators with a difficult situation. In a heterogeneous cluster, they may wish to apply a cluster-wide policy allowing access to a centralized service (e.g. monitoring or DNS). However, if this cluster-wide policy happens to be the first policy that selects a workload, that workload will abruptly find itself with all other traffic being dropped. Hence, this configurable option. If an endpoint has only
enableDefaultDeny: false
policies, then no implicit default-deny rule is created. This means an administrator can create cluster-wide policies without worry that they will disrupt existing workloads.If an endpoint is subject to at least one
enableDefaultDeny: true
policy, then the existing behavior remains and all traffic not explicitly allowed is dropped.Example 1: Allow Prometheus
This policy ensures that Prometheus can access all workloads in the cluster, without creating a default-deny policy that could block existing traffic.
Example 2: Block cloud-provider metadata APIs
This policy ensures that traffic to the "magic" cloud-provider API is blocked, without affecting existing workloads