Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EnableDefaultDeny field to Network Policy #30572

Merged
merged 4 commits into from Mar 14, 2024

Conversation

squeed
Copy link
Contributor

@squeed squeed commented Feb 1, 2024

(see individual commits for a further breakdown of the work included)
(see the CFP / CFP PR for the design motivation)

This adds a new field, EnableDefaultDeny, to CiliumNetworkPolicy and CiliumClusterideNetworkPolicy. It configures whether or not selected endpoints are switched to default-deny mode when selected by the given network policy.

By default, the presence of at least one network policy selecting an endpoint implies that endpoint should drop all non-matching traffic. This makes sense, given that Kubernetes NetworkPolicy does not even have the ability to explicitly deny traffic. If an endpoint has no policies, then all traffic is allowed.

This presents administrators with a difficult situation. In a heterogeneous cluster, they may wish to apply a cluster-wide policy allowing access to a centralized service (e.g. monitoring or DNS). However, if this cluster-wide policy happens to be the first policy that selects a workload, that workload will abruptly find itself with all other traffic being dropped. Hence, this configurable option. If an endpoint has only enableDefaultDeny: false policies, then no implicit default-deny rule is created. This means an administrator can create cluster-wide policies without worry that they will disrupt existing workloads.

If an endpoint is subject to at least one enableDefaultDeny: true policy, then the existing behavior remains and all traffic not explicitly allowed is dropped.

Example 1: Allow Prometheus

This policy ensures that Prometheus can access all workloads in the cluster, without creating a default-deny policy that could block existing traffic.

kind: CiliumClusterwideNetworkPolicy
spec:
  enableDefaultDeny:
    ingress: false
  endpointSelector: {}
  ingress:
  - fromEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": monitoring

Example 2: Block cloud-provider metadata APIs

This policy ensures that traffic to the "magic" cloud-provider API is blocked, without affecting existing workloads

kind: CiliumClusterwideNetworkPolicy
spec:
  enableDefaultDeny:
    egress: false
  endpointSelector: {}
  egressDeny:
  - toCIDR:
    - 169.254.169.254/32
This adds a new policy field, EnableDefaultDeny, which permits the creation of network polices that do not drop non-matching traffic.

@squeed squeed added sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. labels Feb 1, 2024
@squeed squeed requested review from a team as code owners February 1, 2024 10:48
@squeed
Copy link
Contributor Author

squeed commented Feb 1, 2024

Note for reviewers: I don't love this API; this is intended to be a first pass. Feel free to pick it to pieces :-).

pkg/policy/repository.go Outdated Show resolved Hide resolved
@squeed
Copy link
Contributor Author

squeed commented Feb 1, 2024

/test

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review the implementation since the key question seems to be around the user-facing API at the moment. What do you think about putting together a CFP on https://github.com/cilium/design-cfps ? I am trying to encourage a culture of documenting the reasoning and key questions / design decisions so we can record what we're trying to solve, why we solve it this way, and why we didn't solve it other ways. This can also help us to come to general consensus with others in the community. The CFP doesn't have to be super complicated, mostly just the content you already wrote for this PR description. Beyond the content here, I think that a dedicated document can also help us to consider alternate API representations without tying too much to the implementation.

My initial feeling on this proposal is that this is a solid option for how to express this functionality, but it does also overlap with another CFP that has been out for feedback for a few months so I think it's also worth evaluating how the two CFPs interact (even if we end up deciding that they have different use cases and we want to adopt both).

pkg/policy/api/rule.go Outdated Show resolved Hide resolved
@squeed
Copy link
Contributor Author

squeed commented Feb 2, 2024

I didn't review the implementation since the key question seems to be around the user-facing API at the moment. What do you think about putting together a CFP on https://github.com/cilium/design-cfps ?

Good idea. Filed: cilium/design-cfps#16

Copy link
Contributor

@tamilmani1989 tamilmani1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature can be availed only with CNP or CCNP. This may not address for vendors who support only k8s network policy. I'm wondering if we can merge passing cmdline changes (#30060) with this PR so that can support both cmdline option and via cnp.

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great PR description and for writing the CFP. I can't comment on the implementation in the second commit, but I have two pieces and read value of the DefaultAction, it's not the default action for traffic on the endpoint; it' of feedback here on an initial pass:

  1. When appropriate, could you update the documentation to include a page on this field? Especially the use case examples? The content you wrote in the PR and the CFP is great and could be adapted pretty well to be in the docs.
  2. The name of the field DefaultAction is confusing for me. I associate DefaultAction with a configuration that lies outside of a policy, ie, a firewall has a DefaultAction for traffic on an interface and rules are applied on top. It feels off when the word "default" is used on a per-policy basis because the default action an agent takes for an endpoint is impacted by all of the DefaultActions of matching policies, meaning, if I go to a policy and read the value of the DefaultAction, it's not the default action for traffic on the endpoint, it's how a policy contributes to an agent's decision on what the default action is. I feel there is a way the field could be named to communicate this behavior, something like 'triggerDefaultDeny: true/false`. To me, this is more clear, as it makes the contribution aspect clear. It is also self-documenting, making it easy to understand for folks who aren't familiar with the field: "this policy will or will not trigger a default deny rule for the matched endpoints".

WDYT? Thanks!

@joestringer
Copy link
Member

I'm wondering if we can merge passing cmdline changes (#30060) with this PR

I think that these are related but distinct efforts, and given that each effort is already 500+ lines of code I'd prefer not to merge them together. I could perhaps see a way for these two PRs to be split into (a) the core functionality for overriding default deny behaviour, (b) the CNP API for this and (c) the way to configure default policies, in which case probably the core (a) functionality is probably overlapping between the two PRs today but could be merged earlier, then we can iterate on the UX for (b) (#30572) / (c) (#30060)

@joestringer
Copy link
Member

I feel there is a way the field could be named to communicate this behavior, something like 'triggerDefaultDeny: true/false`

FYI I explored a little bit around semantics and action-based wording over here, you may be interested to also read through and brainstorm: cilium/design-cfps#16 (comment)

@maintainer-s-little-helper

This comment was marked as outdated.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 1, 2024
@squeed squeed removed dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Mar 1, 2024
@squeed squeed requested a review from aanm March 1, 2024 13:58
@squeed
Copy link
Contributor Author

squeed commented Mar 1, 2024

The CFP has merged, and this is ready for review.

As a follow-up item, I will file a docs PR.

@learnitall learnitall self-requested a review March 1, 2024 15:34
Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! I left a small wording suggestion for the description of the field, but otherwise LGTM.

@squeed
Copy link
Contributor Author

squeed commented Mar 5, 2024

/test

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@christarazi christarazi added the kind/feature This introduces new functionality. label Mar 6, 2024
This adds a new field, EnableDefaultDeny, to CiliumNetworkPolicy and
CiliumClusterwideNetworkPolicy, that controls whether or not the subject
endpoints of this policy should drop unselected peer traffic.

By default, endpoints are in a default-allow mode. When the first policy
applies to an endpoint, it flips it to a default-deny mode. This option
allows disabling that behavior. If not specified, the existing behavior
remains: for each direction, traffic is allowed unless at least one
policy rule applies to that endpoint. If multiple policies select an
endpoint, then default-deny takes precedence.

It is useful in heterogeneous environments, it may not be desirable to
implicitly drop all non-matching traffic. Consider, for example, the
case where an administrator wishes to ensure a monitoring service can
access all namespaces. If they create a cluster-wide policy allowing
access from the monitoring service, it may create a deny policy where
none was previously; unexpectedly dropping traffic.

See: https://github.com/cilium/design-cfps/blob/main/cilium/CFP-30572-non-default-deny-policies.md

This commit contains only the API changes; a subsequent commit will
introduce the implementation.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Methods called by Sanitize() may alter the underlying structures. For
example. selectors are aggregated and address families are upper-cased.
However, top-level fields can't be written by Sanitize. In the future,
we'd like to do that. So, give it a pointer receiver.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This adjust the policy rule generation to take in to account
non-default-deny policies.

As before, an endpoint is normally in a policy-disabled state. If any
policies select this endpoint, then policy is enabled and all
non-allowed traffic is dropped.

If, however, an endpoint is only selected by default-allow policies,
then policy is enabled, but a special wildcard allow policy is inserted.
Since wildcard polcies have very low precedence, this ensures that any
Deny or L7-proxy rules will still take effect.

This commit also fixes tests that incorrectly failed to sanitize rules
before adding to the policy repository, leading to a nil pointer
exception. Production code *always* sanitizes rules before adding.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Add some tests that create various mixtures of default-allow and
default-deny policies. It is important that default-deny policies always
take precedence over default-allow. It is also important that Deny rules
take precedence over default-allow.

These need to be integration tests, since they rely on specific
interactions between the userspace and bpf policy engines.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this approach, but as we add more features to the policy engine this approach may not continue to work.

@squeed
Copy link
Contributor Author

squeed commented Mar 13, 2024

/test

@squeed
Copy link
Contributor Author

squeed commented Mar 13, 2024

@joestringer @christarazi Non-default-deny is on the cusp of being green. I'll merge this in ca. 24 hours, FYI.

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review was only for the @cilium/github-sec files.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 13, 2024
@squeed squeed added dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. and removed dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. labels Mar 13, 2024
@dylandreimerink dylandreimerink removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Mar 14, 2024
@squeed squeed added this pull request to the merge queue Mar 14, 2024
Merged via the queue into cilium:main with commit 95c916d Mar 14, 2024
62 checks passed
@squeed squeed deleted the policy-no-default-deny branch March 14, 2024 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature This introduces new functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants