
Kubernetes network policy for api-server #20550

Closed
2 tasks done
djkormo opened this issue Jul 15, 2022 · 25 comments · Fixed by #27464
Assignees
Labels
kind/bug This is a bug in the Cilium logic. sig/agent Cilium agent related. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Comments

@djkormo

djkormo commented Jul 15, 2022

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

  1. My service for the API server:

kubectl describe svc kubernetes -n default

Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.233.0.1
IPs:               10.233.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         10.137.16.20:6443,10.137.16.21:6443,10.137.16.22:6443
Session Affinity:  None
Events:            <none>
  2. All egress is blocked

Trying to whitelist api-server via Kubernetes Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-server-access-egress
  namespace: default
spec:
  podSelector: { } # all pods
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.137.17.30/32 # IP of K8s Api Server Endpoints
      ports:
        - protocol: TCP
          port: 6443
    - to:
        - ipBlock:
            cidr: 10.137.17.31/32 # IP of K8s Api Server Endpoints
      ports:
        - protocol: TCP
          port: 6443
    - to:
        - ipBlock:
            cidr: 10.137.17.32/32 # IP of K8s Api Server Endpoints
      ports:
        - protocol: TCP
          port: 6443  

Traffic is still blocked.
In cilium monitor I see drops for 10.137.17.30:6443, 10.137.17.31:6443, and 10.137.17.32:6443:

Policy verdict log: flow 0x17d61b8f local EP ID 2679, remote ID kube-apiserver, proto 6, egress, action deny, match none, 10.0.10.66:46330 -> 10.137.17.31:6443 tcp SYN
xx drop (Policy denied) flow 0x17d61b8f to endpoint 0, , identity 60570->kube-apiserver: 10.0.10.66:46330 -> 10.137.17.31:6443 tcp SYN
Policy verdict log: flow 0x635df349 local EP ID 2679, remote ID kube-apiserver, proto 6, egress, action deny, match none, 10.0.10.66:46330 -> 10.137.17.31:6443 tcp SYN
xx drop (Policy denied) flow 0x635df349 to endpoint 0, , identity 60570->kube-apiserver: 10.0.10.66:46330 -> 10.137.17.31:6443 tcp SYN
Policy verdict log: flow 0xae4297f5 local EP ID 2679, remote ID host, proto 6, ingress, action allow, match L3-Only, 10.0.10.90:53836 -> 10.0.10.66:8080 tcp SYN

When I use Cilium Network Policy

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cilium-api-server-egress
  namespace: default
spec:
  endpointSelector:
    matchLabels: {} # all pods
  egress:
  - toEntities:
    - kube-apiserver
  - toPorts:
    - ports:
      - port: "6443"
        protocol: TCP

Now everything works fine.
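Note that the policy above contains two independent egress rules: the first allows all ports to the kube-apiserver entity, and the second allows port 6443 to any destination. If the intent is "only port 6443, and only to the apiserver", the two clauses can be combined into a single rule (an untested sketch; within one rule, toEntities and toPorts must both match):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cilium-api-server-egress
  namespace: default
spec:
  endpointSelector: {}      # all pods in the namespace
  egress:
  - toEntities:             # destination must be the apiserver entity...
    - kube-apiserver
    toPorts:                # ...AND on TCP/6443
    - ports:
      - port: "6443"
        protocol: TCP
```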

My goal is to use only Kubernetes NetworkPolicy across all my clusters (some of them run Calico from Tigera).

cilium identity list | grep kube-apiserver

reserved:kube-apiserver

cilium identity get 7

ID   LABELS
7    reserved:kube-apiserver
     reserved:remote-node

Cilium Version

1.11.2

Kernel Version

5.4.182 on Centos 8

Kubernetes Version

1.22.5

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@djkormo djkormo added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. labels Jul 15, 2022
@christarazi
Member

cc @nebril, something to potentially look into. Maybe we can discuss next week.

@christarazi christarazi added sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Jul 16, 2022
@jamallorock

Hi @christarazi. Is there any news on this topic?

@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 29, 2022
@djkormo
Author

djkormo commented Oct 3, 2022

Still not corrected.

@christarazi
Member

@nebril Assigning to you for whenever you have free cycles.

@juergenthomann

I can confirm that it also affects Cilium 1.12.5. I will try to find the cause next week as well, but can't promise anything...

@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 6, 2023
@montag

montag commented Jan 8, 2023

I could also use a fix for this.

@Cajga

Cajga commented Jan 17, 2023

We got the same issue here with the latest EKS Anywhere.

Opening up all egress with a NetworkPolicy works:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress-egress
  namespace: sealed-secrets
spec:
  podSelector: {}
  egress:
  - {}
  policyTypes:
  - Egress

Has anyone found a workaround that does not require opening up everything?

P.S.: EKS Anywhere does not ship the CiliumNetworkPolicy CRD :/
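One possible middle ground that avoids opening everything (an untested sketch, not from this thread): an egress rule with a ports clause but no "to" clause matches any peer, so it may reach the apiserver identity while still blocking all other ports.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-apiserver-port
  namespace: sealed-secrets
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - ports:                  # no "to" clause: any destination, but only this port
    - protocol: TCP
      port: 6443
```

This is broader than a CIDR match (any destination on TCP/6443 is allowed), so whether it is acceptable depends on your threat model.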

@github-actions

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 19, 2023
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 19, 2023
@youngnick youngnick added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/agent Cilium agent related. labels Mar 21, 2023
@gadiener

We have the same problem on EKS.

@mikhail-barg

Same here :(

@nathanjsweet nathanjsweet self-assigned this Jun 27, 2023
@robpearce-flux

Also seeing this on Cilium 1.12.3 against EKS 1.27.

@sathieu
Contributor

sathieu commented Sep 1, 2023

I'll try to fix this issue. Here are my proposals:

The best way to fix this would be to add the IP in the frames from/to the apiserver 1️⃣. I don't know if this is possible (and where?).

Another way would be to create fromEntities and toEntities in the conversion code, either:

  • when the IP is a control-plane node (with /32), passed as a flag 2️⃣, or detected automatically 3️⃣

  • using a selector 4️⃣, for example:

          podSelector:
            matchLabels:
              io.cilium.k8s.reserved: kube-apiserver

I'll start with 2️⃣: create fromEntities and toEntities when the IP is a control-plane node passed as a flag.

@nathanjsweet
Member

This is going to be fixed by #27464.

@squeed
Contributor

squeed commented Oct 5, 2023

An update: This is fixed by #27464, but it requires enabling a specific setting. Documentation to come.

@embik

embik commented Dec 5, 2023

@squeed do I read the PR right that the fix for this bug will only be in 1.15 and is not slated for backporting?

@squeed
Contributor

squeed commented Dec 5, 2023

@embik that's right; the "fix" is an entirely new feature and thus unlikely to be backported.

@embik

embik commented Dec 5, 2023

Okay, thanks for the clarification!

@squeed
Contributor

squeed commented Dec 5, 2023

@embik as it stands right now, this is only fixed when setting policyCIDRMatchMode: nodes in values.yaml. If you would like this to be the default, we can have that discussion. Setting this comes at the cost of some memory usage (plus it is a behavior change), so we're not currently enabling it by default.
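For reference, a minimal sketch of the values change described above (the option name comes from this thread; the surrounding chart layout is assumed):

```yaml
# Helm values fragment for the Cilium chart (Cilium >= 1.15, per this thread).
# Enables CIDR/ipBlock policies to select node IPs, at the cost of some memory.
policyCIDRMatchMode: nodes
```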

@embik

embik commented Dec 5, 2023

@embik as it stands right now, this is only fixed when setting policyCIDRMatchMode: nodes in values.yaml. If you would like this to be the default, we can have that discussion. Setting this comes at the cost of some memory usage (plus it is a behavior change), so we're not currently enabling it by default.

I don't think we have a strong opinion on this per se (we will probably ship Cilium with this setting enabled in our distribution and document the expectation for this to be turned on), but maybe some input from a user perspective: I was very surprised to run into this issue because I didn't expect a NetworkPolicy implementation to "block" access to specific CIDRs when there's a Kubernetes node behind them (that's my understanding of the issue at least). The CIDR option in NetworkPolicies always felt like a network primitive that skips any knowledge of Kubernetes entities outside the Pod.

I would definitely expect this to work by default, or at the very least this should be mentioned in the default installation guide (as an option that you might want to turn on), since "limiting network access to everything except the Kubernetes API" feels like a fairly frequent use case with NetworkPolicies.

@squeed
Contributor

squeed commented Dec 6, 2023

@embik that's basically my perspective as well. However, at least Cilium is consistent: CIDR/ipBlock policies cannot select cluster entities (nodes, pods, service IPs), even if that does not necessarily match intuition.

However, if we made this the default, it could make existing policies less restrictive. If, somehow, a user is depending on the current behavior to block node access, and we change that, it would be a rude surprise. Since this is a security feature, we need to be quite deliberate in how changes are made.

I'm not opposed to changing the default, but it would need to be done with due consideration and consultation with users.

@embik

embik commented Dec 6, 2023

Very valid points; this reminds me of xkcd 1172, just supercharged with security considerations. There is no "good" way out of this apart from documenting the behaviour, IMHO. I can totally understand that changing the default behaviour at this point is not possible.

@squeed
Contributor

squeed commented Dec 6, 2023

You know, this conversation has tickled a memory; we actually have the ability to change a default via helm that is preserved on upgrade.

@ErikEngerd

I migrated from Calico to Cilium yesterday and the migration was a breeze. But it cost me several hours to work around the restriction that ipBlocks with CIDRs do not cover pods. For ordinary pods that is easy enough to fix: allow traffic to the CIDR for external destinations and additionally allow the pods you want to reach. However, that trick does not work with the API server.

For instance:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: nginx-allow-api-server-access
  namespace: nginx
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
      - namespaceSelector: 
          matchLabels:
            kubernetes.io/metadata.name: kube-system
        podSelector:
          matchLabels:
            component: kube-apiserver
      ports:
        - port: 6443

The above network policy should work, or am I missing something? The only weird thing is that the kubernetes service being accessed lives in the 'default' namespace, while its endpoints point to the API server via node IPs in the 'kube-system' namespace. For now, I have worked around the issue with the CiliumNetworkPolicy mentioned above, but I would rather use standard network policies, since that would let me switch to another CNI later if needed. Also, cilium observe was quite useful for finding misconfigurations in other places.

Will it also be possible in 1.15 to configure API server access in a standard way? I think there should be no need to configure controller-node IPs in network policies when the API server runs as a pod in a kubeadm cluster. BTW, I am running Kubernetes 1.28.5 on Debian 12.

@squeed
Contributor

squeed commented Jan 12, 2024

@ErikEngerd your policy as written won't work, since the kube-apiserver pod is a host-network pod. It is currently undefined upstream how to treat host-network pods (most policy providers basically ignore them, which is probably correct).

If you would like to select the apiserver via Kubernetes network policy, you will have to wait for v1.15, enable node selectability, and then select via ipBlock selectors. Alternatively, you can solve this problem now with a CiliumNetworkPolicy toEntities selector.
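Putting the thread together: with v1.15 and policyCIDRMatchMode: nodes enabled, a plain NetworkPolicy using ipBlock should then be able to select the control-plane node IPs. An untested sketch, reusing the IPs from the original report (the three /32s can live in one egress rule):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-server-egress
  namespace: default
spec:
  podSelector: {}            # all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:                    # one rule, several control-plane /32s
        - ipBlock:
            cidr: 10.137.17.30/32
        - ipBlock:
            cidr: 10.137.17.31/32
        - ipBlock:
            cidr: 10.137.17.32/32
      ports:
        - protocol: TCP
          port: 6443
```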
