1.14.0 - Policy Denies occurring when they should not be #27169

Closed
jwitko opened this issue Jul 31, 2023 · 8 comments · Fixed by #27327
Labels
kind/bug: This is a bug in the Cilium logic.
kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
need-more-info: More information is required to further debug or fix the issue.
needs/triage: This issue requires triaging to establish severity and next steps.

Comments


jwitko commented Jul 31, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I upgraded my cluster from 1.13-latest to 1.14.0.
I then switched from IPTables to BPF.
In audit mode everything works fine and all the audited flows look as expected.
When audit mode is disabled I see lots of policy denies that I can't seem to explain.
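
For reference, "switched from IPTables to BPF" corresponds roughly to the Helm values below (a minimal sketch using the upstream chart keys, not my full values file; they match the "KubeProxyReplacement: True", "Host Routing: BPF", "Masquerading: BPF" and "Host firewall: Enabled" lines in the status output further down):

# Sketch only; assumes the standard Cilium 1.14 Helm chart keys
kubeProxyReplacement: true      # "KubeProxyReplacement: True"
bpf:
  masquerade: true              # "Masquerading: BPF"
  hostLegacyRouting: false      # "Host Routing: BPF"
hostFirewall:
  enabled: true                 # "Host firewall: Enabled"
policyEnforcementMode: always   # enforcement mode is set to always (mentioned below)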

Cilium Version

1.14.0

Kernel Version

Linux 5.14.0-284.18.1.el9_2.x86_64 #1 Thu Jun 29 17:06:27 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

1.25.8

Sysdump

No response

Relevant log output

cilium status --verbose:

# cilium status --verbose
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.25 (v1.25.8) [linux/amd64]
Kubernetes APIs:        ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   True      [bond0 REDACTED, private0 172.31.0.90 (Direct Routing)]
Host firewall:          Enabled   [bond0, private0]
CNI Chaining:           none
Cilium:                 Ok   1.14.0 (v1.14.0-b5013e15)
NodeMonitor:            Listening for events on 16 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok
IPAM:                   IPv4: 3/254 allocated from 10.0.16.0/24,
Allocated addresses:
  10.0.16.167 (kube-system/coredns-588bb58b94-4txvt [restored])
  10.0.16.215 (router)
  10.0.16.50 (health)
IPv4 BIG TCP:           Disabled
IPv6 BIG TCP:           Disabled
BandwidthManager:       Disabled
Host Routing:           BPF
Masquerading:           BPF       [bond0, private0]   10.0.0.0/8 [IPv4: Enabled, IPv6: Disabled]
Clock Source for BPF:   jiffies   [1000 Hz]
Controller Status:      27/27 healthy
  Name                                         Last success   Last error    Count   Message
  bpf-map-sync-cilium_lxc                      8s ago         never         0       no error
  cilium-health-ep                             58s ago        never         0       no error
  dns-garbage-collector-job                    1m2s ago       never         0       no error
  endpoint-189-regeneration-recovery           never          never         0       no error
  endpoint-1924-regeneration-recovery          never          never         0       no error
  endpoint-740-regeneration-recovery           never          never         0       no error
  endpoint-gc                                  1m2s ago       never         0       no error
  ipcache-inject-labels                        59s ago        1h46m0s ago   0       no error
  k8s-heartbeat                                32s ago        never         0       no error
  link-cache                                   14s ago        never         0       no error
  metricsmap-bpf-prom-sync                     6s ago         never         0       no error
  neighbor-table-refresh                       29s ago        never         0       no error
  resolve-identity-189                         58s ago        never         0       no error
  restoring-ep-identity (1924)                 1h45m59s ago   never         0       no error
  restoring-ep-identity (740)                  1h45m59s ago   never         0       no error
  sync-host-ips                                59s ago        never         0       no error
  sync-lb-maps-with-k8s-services               1h45m59s ago   never         0       no error
  sync-policymap-189                           40s ago        never         0       no error
  sync-policymap-1924                          40s ago        never         0       no error
  sync-policymap-740                           40s ago        never         0       no error
  sync-to-k8s-ciliumendpoint (189)             8s ago         never         0       no error
  sync-to-k8s-ciliumendpoint (1924)            9s ago         never         0       no error
  sync-to-k8s-ciliumendpoint (740)             9s ago         never         0       no error
  sync-utime                                   59s ago        never         0       no error
  template-dir-watcher                         never          never         0       no error
  waiting-initial-global-identities-ep (740)   1h45m59s ago   never         0       no error
  write-cni-file                               1h46m2s ago    never         0       no error
Proxy Status:            OK, ip 10.0.16.215, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 24.28   Metrics: Ok
KubeProxyReplacement Details:
  Status:                 True
  Socket LB:              Enabled
  Socket LB Tracing:      Enabled
  Socket LB Coverage:     Full
  Devices:                bond0 REDACTED, private0 172.31.0.90 (Direct Routing)
  Mode:                   SNAT
  Backend Selection:      Random
  Session Affinity:       Enabled
  Graceful Termination:   Enabled
  NAT46/64 Support:       Disabled
  XDP Acceleration:       Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767)
  - LoadBalancer:   Enabled
  - externalIPs:    Enabled
  - HostPort:       Enabled
BPF Maps:   dynamic sizing: on (ratio: 0.002500)
  Name                          Size
  Auth                          524288
  Non-TCP connection tracking   589568
  TCP connection tracking       1179136
  Endpoint policy               65535
  IP cache                      512000
  IPv4 masquerading agent       16384
  IPv6 masquerading agent       16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           1179136
  Neighbor table                1179136
  Global policy                 16384
  Session affinity              65536
  Sock reverse NAT              589568
  Tunnel                        65536
Encryption:                                      Disabled
Cluster health:                                  51/51 reachable   (2023-08-01T00:03:23Z)
  Name                                           IP                Node          Endpoints
  znsvc1-3.mydomain.net (localhost)   172.31.0.90       reachable     reachable
  ze2101-1.mydomain.net               172.31.0.16       reachable     reachable
  ze2101-10.mydomain.net              172.31.0.25       reachable     reachable
  ze2101-11.mydomain.net              172.31.0.26       reachable     reachable
  ze2101-12.mydomain.net              172.31.0.27       reachable     reachable
  ze2101-13.mydomain.net              172.31.0.28       reachable     reachable
  ze2101-14.mydomain.net              172.31.0.29       reachable     reachable
  ze2101-15.mydomain.net              172.31.0.30       reachable     reachable
  ze2101-16.mydomain.net              172.31.0.31       reachable     reachable
  ze2101-17.mydomain.net              172.31.0.32       reachable     reachable
  ze2101-18.mydomain.net              172.31.0.33       reachable     reachable
  ze2101-19.mydomain.net              172.31.0.34       reachable     reachable
  ze2101-2.mydomain.net               172.31.0.17       reachable     reachable
  ze2101-20.mydomain.net              172.31.0.35       reachable     reachable
  ze2101-21.mydomain.net              172.31.0.36       reachable     reachable
  ze2101-22.mydomain.net              172.31.0.37       reachable     reachable
  ze2101-23.mydomain.net              172.31.0.38       reachable     reachable
  ze2101-24.mydomain.net              172.31.0.39       reachable     reachable
  ze2101-25.mydomain.net              172.31.0.40       reachable     reachable
  ze2101-26.mydomain.net              172.31.0.41       reachable     reachable
  ze2101-27.mydomain.net              172.31.0.42       reachable     reachable
  ze2101-28.mydomain.net              172.31.0.43       reachable     reachable
  ze2101-29.mydomain.net              172.31.0.44       reachable     reachable
  ze2101-3.mydomain.net               172.31.0.18       reachable     reachable
  ze2101-30.mydomain.net              172.31.0.45       reachable     reachable
  ze2101-31.mydomain.net              172.31.0.46       reachable     reachable
  ze2101-32.mydomain.net              172.31.0.47       reachable     reachable
  ze2101-33.mydomain.net              172.31.0.48       reachable     reachable
  ze2101-34.mydomain.net              172.31.0.49       reachable     reachable
  ze2101-35.mydomain.net              172.31.0.50       reachable     reachable
  ze2101-36.mydomain.net              172.31.0.51       reachable     reachable
  ze2101-37.mydomain.net              172.31.0.52       reachable     reachable
  ze2101-38.mydomain.net              172.31.0.53       reachable     reachable
  ze2101-39.mydomain.net              172.31.0.54       reachable     reachable
  ze2101-4.mydomain.net               172.31.0.19       reachable     reachable
  ze2101-40.mydomain.net              172.31.0.55       reachable     reachable
  ze2101-41.mydomain.net              172.31.0.56       reachable     reachable
  ze2101-42.mydomain.net              172.31.0.57       reachable     reachable
  ze2101-43.mydomain.net              172.31.0.58       reachable     reachable
  ze2101-44.mydomain.net              172.31.0.59       reachable     reachable
  ze2101-45.mydomain.net              172.31.0.60       reachable     reachable
  ze2101-46.mydomain.net              172.31.0.61       reachable     reachable
  ze2101-47.mydomain.net              172.31.0.62       reachable     reachable
  ze2101-48.mydomain.net              172.31.0.63       reachable     reachable
  ze2101-5.mydomain.net               172.31.0.20       reachable     reachable
  ze2101-6.mydomain.net               172.31.0.21       reachable     reachable
  ze2101-7.mydomain.net               172.31.0.22       reachable     reachable
  ze2101-8.mydomain.net               172.31.0.23       reachable     reachable
  ze2101-9.mydomain.net               172.31.0.24       reachable     reachable
  znsvc1-4.mydomain.net               172.31.0.91       reachable     reachable
  znsvc1-5.mydomain.net               172.31.0.92       reachable     reachable

Here is an example.

  1. policy-verdict output captured for ~1 minute on a random node:
$ time kubectl -n kube-system exec -it cilium-298mj -- hubble observe flows -t policy-verdict -f | grep AUDITED
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
Jul 31 22:01:34.453: 162.243.129.7:60929 (world) -> REDACTED:6666 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN)
Jul 31 22:01:38.900: 179.43.167.186:34808 (world) -> REDACTED:2133 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN)
Jul 31 22:01:39.355: 198.235.24.71:56864 (world) -> REDACTED:17516 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN)
Jul 31 22:01:48.727: 185.81.68.102:43562 (world) -> REDACTED:54523 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN)
Jul 31 22:01:49.165: 109.248.6.78:50801 (world) -> REDACTED:1027 (host) policy-verdict:none INGRESS AUDITED (UDP)
Jul 31 22:01:59.260: 89.248.168.235:57596 (world) -> REDACTED:7663 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN)
Jul 31 22:02:04.964: 162.243.142.59:58956 (world) -> REDACTED:3389 (host) policy-verdict:none INGRESS AUDITED (UDP)
Jul 31 22:02:09.086: 45.240.150.10:49146 (world) -> REDACTED:45079 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN)

real    0m41.368s
user    0m0.158s
sys     0m0.080s
  2. Manually edit the cilium-config ConfigMap to disable audit mode (a sketch of this step is shown after the drop log below).
  3. Restart a single Cilium agent pod.
  4. Here is the DROPPED log for ~1 minute afterwards:
 $ kubectl -n kube-system exec -it cilium-zpkzk -- hubble observe flows -t drop -f
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init), install-cni-binaries (init)
Jul 31 22:06:18.510: 172.31.0.30 (host) <> 172.31.0.50 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:19.279: 172.31.0.30 (host) <> 172.31.0.23 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:19.287: 172.31.0.30 (host) <> 172.31.0.24 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:20.685: 205.210.31.145:57248 (world) <> REDACTED:9002 (host) Policy denied DROPPED (TCP Flags: SYN)
Jul 31 22:06:20.721: 172.31.0.30 (host) <> 172.31.0.27 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:20.786: 172.31.0.30 (host) <> 172.31.0.26 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:20.978: 172.31.0.30 (host) <> 172.31.0.54 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:22.599: 172.31.0.30 (host) <> 172.31.0.33 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:24.711: 172.31.0.30 (host) <> 172.31.0.32 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:26.749: 172.31.0.30 (host) <> 172.31.0.48 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:28.002: 172.31.0.30 (host) <> 172.31.0.22 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:28.113: 172.31.0.30 (host) <> 172.31.0.20 (remote-node) Unsupported protocol for NAT masquerade DROPPED (ICMPv4 TimeExceeded(TTLExceeded))
Jul 31 22:06:31.625: 117.9.1.18:29989 (world) <> REDACTED:23 (host) Policy denied DROPPED (TCP Flags: SYN)
Jul 31 22:06:52.788: 92.63.197.91:45830 (world) <> REDACTED:65000 (host) Policy denied DROPPED (TCP Flags: SYN)
Jul 31 22:06:53.890: 78.128.113.102:44215 (world) <> REDACTED:9060 (host) Policy denied DROPPED (TCP Flags: SYN)
Jul 31 22:10:16.000: kube-system/coredns-588bb58b94-vcx55:39097 (ID:43333) <> 1.1.1.1:53 (ID:16777223) policy-verdict:none EGRESS DENIED (UDP)
Jul 31 22:10:16.000: kube-system/coredns-588bb58b94-vcx55:39097 (ID:43333) <> 1.1.1.1:53 (ID:16777223) Policy denied DROPPED (UDP)
Jul 31 22:10:17.501: kube-system/coredns-588bb58b94-vcx55:43595 (ID:43333) <> 1.1.1.1:53 (ID:16777223) policy-verdict:none EGRESS DENIED (UDP)
Jul 31 22:10:17.501: kube-system/coredns-588bb58b94-vcx55:43595 (ID:43333) <> 1.1.1.1:53 (ID:16777223) Policy denied DROPPED (UDP)
Jul 31 22:10:19.003: kube-system/coredns-588bb58b94-vcx55:32808 (ID:43333) <> 1.1.1.1:53 (ID:16777223) policy-verdict:none EGRESS DENIED (UDP)
Jul 31 22:10:19.003: kube-system/coredns-588bb58b94-vcx55:32808 (ID:43333) <> 1.1.1.1:53 (ID:16777223) Policy denied DROPPED (UDP)
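
A rough sketch of steps 2 and 3 above (the patch is equivalent to manually editing the ConfigMap; the pod name is the agent whose drop log is shown above):

$ kubectl -n kube-system patch configmap cilium-config --type merge -p '{"data":{"policy-audit-mode":"false"}}'
$ kubectl -n kube-system delete pod cilium-zpkzk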
  5. The policy that should allow the above denied/dropped coredns traffic:
$ kubectl -n kube-system get ciliumnetworkpolicy coredns -o yaml
...
specs:
- description: All coredns pods allowed egress to DNS servers
  egress:
  - toCIDRSet:
    - cidr: 1.1.1.1/32
    - cidr: 8.8.4.4/32
    - cidr: 8.8.8.8/32
    - cidr: 169.254.25.10/32
    toPorts:
    - ports:
      - port: "53"
  endpointSelector:
    matchLabels:
      k8s:io.cilium.k8s.policy.serviceaccount: coredns
      k8s:io.kubernetes.pod.namespace: kube-system
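
(To double-check that this policy is actually realized on the coredns endpoint, something along these lines can be used; the agent pod name and endpoint ID are placeholders:)

$ kubectl -n kube-system exec cilium-xxxxx -c cilium-agent -- cilium endpoint list
$ kubectl -n kube-system exec cilium-xxxxx -c cilium-agent -- cilium bpf policy get <endpoint-id>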

Anything else?

It is very difficult to test and/or gather logs, because as soon as I leave audit mode a massive amount of DROPs occurs. Cilium does not seem to be respecting the network policies that worked perfectly fine before the move to BPF.

In this image (IPs blacked out) we can see an example of some drops.
[screenshot omitted: example of the drops, IPs blacked out]

These drops should not happen, because of the following CiliumClusterwideNetworkPolicy (shown here as it is defined in our configuration):

    - name: 'nodes'
      policies:
        - description: "Allow all nodes SSH from MyIP"
          nodeSelector:
            matchLabels: {}
          ingress:
            - fromCIDR:
                - <blacked-out-ip>/32 # MyIP (Redacted)
              toPorts:
                - ports:
                    - port: '22'
                      protocol: TCP
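
(The snippet above is the fragment the policy is generated from; the rendered CiliumClusterwideNetworkPolicy should look roughly like this, with the redacted IP kept as a placeholder:)

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: nodes
spec:
  description: "Allow all nodes SSH from MyIP"
  nodeSelector:
    matchLabels: {}
  ingress:
    - fromCIDR:
        - <blacked-out-ip>/32
      toPorts:
        - ports:
            - port: "22"
              protocol: TCP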

When attempting to look up the ID:

Jul 31 19:00:43.148: <my-ip>:53002 (ID:16777231) -> <host-ip>:22 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN)
...
Error: Cannot get identity for given ID 16777231: [GET /identity/{id}][404] getIdentityIdNotFound
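
(The 404 above comes from querying the agent for that identity, presumably equivalent to something like the following; the pod name is a placeholder:)

$ kubectl -n kube-system exec cilium-xxxxx -c cilium-agent -- cilium identity get 16777231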

Code of Conduct

  • I agree to follow this project's Code of Conduct
jwitko added the kind/bug, kind/community-report, and needs/triage labels on Jul 31, 2023
@dylandreimerink (Member)

Thanks for this report! Would you be able to grab a sysdump and upload it so we can better debug the issue? (Using https://docs.cilium.io/en/stable/operations/troubleshooting/#automatic-log-state-collection)

dylandreimerink added the need-more-info label on Aug 1, 2023
@nathanjsweet (Member)

@jwitko In addition to getting that sysdump, can you make sure you don't have a broad deny policy? Deny policies take precedence over allow policies, no matter how narrowly tailored the CIDR selection is, so a broad CIDR deny policy will deny the traffic of a narrow CIDR allow policy.
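
As a hypothetical illustration of that precedence (not a policy from this cluster), a deny as broad as the one below would override any narrower fromCIDR allow, including a /32:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: broad-deny-example      # hypothetical
spec:
  nodeSelector:
    matchLabels: {}
  ingressDeny:
    - fromCIDR:
        - 0.0.0.0/0             # a deny this broad wins over any narrow allow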


jwitko commented Aug 1, 2023

> @jwitko In addition to getting that sysdump, can you ensure you don't have a broad deny policy. Deny policies take precedence over allow policies, no matter how narrowly tailored the CIDR selection is. So, a broad CIDR deny-policy will deny the traffic of a narrow CIDR allow-policy.

I do not have any deny policies at all. I have enforcement mode set to always.
I'm trying to get approval to share a sysdump, but I don't think it's going to happen, unfortunately.

@margamanterola (Member)

Hi! What do you mean by "switched from IPTables to BPF"? What did you change, and how did you change it?
Similarly, for audit mode enabled vs. disabled, how are you enabling or disabling it?

Did you notice anything that could be timing related? In particular, there's a setting called identity-restore-grace-period which defaults to 10 minutes. Could there be a timing factor involved rather than audit vs. non-audit, or are you sure that disabling audit mode is what breaks it?

And could you share how you are configuring and deploying Cilium? If we can reproduce the issue ourselves, then we can get our own sysdump.
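
(A quick way to check whether that grace period is overridden in a cluster would be something like the following; if the key is absent, the agent uses the 10-minute default:)

$ kubectl -n kube-system get configmap cilium-config -o yaml | grep identity-restore-grace-period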

jwitko closed this as not planned on Aug 2, 2023

jwitko commented Aug 2, 2023

Thank you all for the attempted assistance. I have taken an alternative approach and will circle back, since I cannot properly assist in troubleshooting this at the moment.

@carnerito (Contributor)

@dylandreimerink @margamanterola @nathanjsweet I've encountered the same situation, and I've opened #27210. As mentioned in this issue, the problem is easy to reproduce, and I don't think it has anything to do with host routing mode; it is related to logic in the ipcache package.


AlHood77 commented Aug 4, 2023

I upgraded to 1.14 from 1.13 yesterday and also had a simple egress-to-CIDR policy being denied. It was working fine for about 10 minutes, with no other deny rules in place. I had the same policy on another deployment, which worked fine. A rollback to 1.13 fixed it. Happy to upgrade again and get a sysdump if required.


jtgorny commented Aug 10, 2023

Similar to what @AlHood77 mentioned, things work fine for about 10 minutes, then the cilium-agent pod throws:

cilium-agent level=error msg="Unable to find identity of previously used CIDR <kube-apiserverIP1>" subsys=ipcache
cilium-agent level=error msg="Unable to find identity of previously used CIDR <kube-apiserverIP2>" subsys=ipcache

Seems closely related to #24502. Previously we tried upgrading from 1.12.9 to 1.13.1 and ran into the issue of kube-apiserver IP addresses disappearing from the BPF ipcache after 10 minutes, so we decided to stick with Cilium 1.12.9 as it was working great for us. Now trying to upgrade from 1.12.9 to 1.14.0 causes a very similar issue, though I suppose it's slightly different, because if I look at cilium bpf ipcache list I can still see the kube-apiserver IPs:

cilium bpf ipcache list | grep <kube-apiserverIP1>
<kube-apiserverIP1>    identity=16777219 encryptkey=0 tunnelendpoint=0.0.0.0 nodeid=0x0

Once the error is thrown, approximately 10 minutes after the cilium-agent starts, cilium monitor --type drop begins to fill up with denies. Also worth mentioning: we have no NetworkPolicy/CiliumNetworkPolicy denies in place during this testing. cilium status --verbose doesn't seem to indicate anything glaringly wrong either.

Downgrading from 1.14.0 to 1.13.5 seems to work; the BPF ipcache still holds the kube-apiserver IPs. However, one thing I noticed is that in 1.13.5, running cilium identity list yields many identities, most notably three IPs/CIDRs: the VPC CIDR and two IPs for the kube-apiserver. In 1.14.0, after 10 minutes, we're observing the VPC CIDR disappear from the identity list and subsequently from the BPF ipcache.
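
(For anyone trying to reproduce this, a rough way to watch the CIDR identities disappear over time is something like the following, with the agent pod name as a placeholder:)

$ watch -n 30 'kubectl -n kube-system exec cilium-xxxxx -c cilium-agent -- cilium identity list | grep cidr:'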
