
Cilium in DSR mode: Can't reach nodeport service outside cluster from remote node in self managed AWS cluster #26407

Open
teclone opened this issue Jun 21, 2023 · 23 comments
Labels
area/loadbalancing: Impacts load-balancing and Kubernetes service implementations
feature/dsr: Relates to Cilium's Direct-Server-Return feature for KPR.
kind/bug: This is a bug in the Cilium logic.
kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@teclone

teclone commented Jun 21, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I cannot access a NodePort nginx service from outside the cluster through nodePublicIpAddress:nodePort on any node that does not host the pod locally. The connection always times out when the backend pod is on a remote node.

I am running Kubernetes 1.27.3 with Cilium v1.14.0-snapshot.4 in a self-managed dual-stack k8s cluster installed using kubeadm on AWS. There are two EC2 worker nodes and one control plane node, all in the same AWS region and availability zone (us-east-1a).

I deployed Cilium in strict kube-proxy replacement mode and also disabled IP source/destination checks in AWS for all three nodes.

Here is the output of kubectl -n kube-system exec ds/cilium -- cilium status --verbose. It shows that all nodes and endpoints are reachable.

[ec2-user@ip-172-31-15-25 ~]$ kubectl -n kube-system exec ds/cilium -- cilium status --verbose
Defaulted container "cilium-agent" out of: cilium-agent, cilium-monitor, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.27 (v1.27.3) [linux/arm64]
Kubernetes APIs:        ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Strict   [ens5 ipv4 ipv6 (Direct Routing)]
Host firewall:          Disabled
CNI Chaining:           none
Cilium:                 Ok   1.14.0-snapshot.4 (v1.14.0-snapshot.4-6c8db759)
NodeMonitor:            Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok   
IPAM:                   IPv4: 3/254 allocated from 10.10.0.0/24, IPv6: 3/18446744073709551614 allocated from fd10:800b:444f:2b00::/64
Allocated addresses:
  10.10.0.184 (router)
  10.10.0.187 (health)
  10.10.0.73 (kube-system/coredns-6fccb86bcb-5n4bj)
  fd10:800b:444f:2b00::26e3 (kube-system/coredns-6fccb86bcb-5n4bj)
  fd10:800b:444f:2b00::50be (health)
  fd10:800b:444f:2b00::f003 (router)
IPv4 BIG TCP:           Disabled
IPv6 BIG TCP:           Disabled
BandwidthManager:       Disabled
Host Routing:           BPF
Masquerading:           BPF       [ens5]   10.10.0.0/16 [IPv4: Enabled, IPv6: Enabled]
Clock Source for BPF:   jiffies   [100 Hz]
Controller Status:      24/24 healthy
  Name                                  Last success   Last error   Count   Message
  cilium-health-ep                      45s ago        never        0       no error   
  dns-garbage-collector-job             49s ago        never        0       no error   
  endpoint-1837-regeneration-recovery   never          never        0       no error   
  endpoint-572-regeneration-recovery    never          never        0       no error   
  endpoint-891-regeneration-recovery    never          never        0       no error   
  endpoint-gc                           49s ago        never        0       no error   
  ipcache-inject-labels                 46s ago        10m48s ago   0       no error   
  k8s-heartbeat                         19s ago        never        0       no error   
  link-cache                            1s ago         never        0       no error   
  metricsmap-bpf-prom-sync              4s ago         never        0       no error   
  resolve-identity-1837                 46s ago        never        0       no error   
  resolve-identity-572                  30s ago        never        0       no error   
  resolve-identity-891                  45s ago        never        0       no error   
  sync-host-ips                         46s ago        never        0       no error   
  sync-lb-maps-with-k8s-services        10m46s ago     never        0       no error   
  sync-policymap-1837                   10s ago        never        0       no error   
  sync-policymap-572                    10s ago        never        0       no error   
  sync-policymap-891                    10s ago        never        0       no error   
  sync-to-k8s-ciliumendpoint (1837)     6s ago         never        0       no error   
  sync-to-k8s-ciliumendpoint (572)      10s ago        never        0       no error   
  sync-to-k8s-ciliumendpoint (891)      5s ago         never        0       no error   
  sync-utime                            46s ago        never        0       no error   
  template-dir-watcher                  never          never        0       no error   
  write-cni-file                        10m49s ago     never        0       no error   
Proxy Status:            OK, ip 10.10.0.184, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 3911/4095 (95.51%), Flows/s: 6.01   Metrics: Disabled
KubeProxyReplacement Details:
  Status:                 Strict
  Socket LB:              Enabled
  Socket LB Tracing:      Enabled
  Socket LB Coverage:     Full
  Devices:                ens5 ipv4 ipv6 (Direct Routing)
  Mode:                   DSR
  Backend Selection:      Maglev (Table Size: 65521)
  Session Affinity:       Enabled
  Graceful Termination:   Enabled
  NAT46/64 Support:       Disabled
  XDP Acceleration:       Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767) 
  - LoadBalancer:   Enabled 
  - externalIPs:    Enabled 
  - HostPort:       Enabled
BPF Maps:   dynamic sizing: on (ratio: 0.002500)
  Name                          Size
  Auth                          524288
  Non-TCP connection tracking   65536
  TCP connection tracking       131072
  Endpoint policy               65535
  IP cache                      512000
  IPv4 masquerading agent       16384
  IPv6 masquerading agent       16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           131072
  Neighbor table                131072
  Global policy                 16384
  Session affinity              65536
  Sock reverse NAT              65536
  Tunnel                        65536
Encryption:                                  Disabled        
Cluster health:                              3/3 reachable   (2023-06-21T17:25:31Z)
  Name                                       IP              Node        Endpoints
  master-node (localhost)                    internalIPv4    reachable   reachable
  worker-1                                   internalIPv4    reachable   reachable
  worker-2                                   internalIPv4    reachable   reachable

However, I can curl the nginx service from any of the three nodes internally using each node's IP address and port:

# this works from any node internally
curl nodeInternalIp:nodePort
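
For contrast, here is a sketch of the failing external test (placeholder names; nodePublicIp is the node's public address and nodePort is the port assigned to the service):

# this times out unless the target node is the one hosting the nginx pod
curl -m 10 http://nodePublicIp:nodePort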

Cilium Version

v1.14.0-snapshot.4

Kernel Version

6.1.29-50.88.amzn2023.aarch64

Kubernetes Version

1.27.3

Sysdump

cilium-sysdump-20230621-174034.zip

Relevant log output

No response

Anything else?

Here is the deployment file

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 1
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80

Here is the command that I used to expose the service:

kubectl expose deployment my-nginx --type=NodePort --port=80
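
For reference, a quick way to confirm which nodePort was assigned to the service (a sketch; the 31345 value mentioned later in this thread came from a check like this):

kubectl get svc my-nginx -o jsonpath='{.spec.ports[0].nodePort}'
# or inspect the generated Service object in full
kubectl get svc my-nginx -o yaml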

Code of Conduct

  • I agree to follow this project's Code of Conduct
@teclone teclone added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Jun 21, 2023
@ti-mo ti-mo added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Jun 22, 2023
@margamanterola
Member

Hi @teclone, thanks for the report. You mentioned that this is happening with snapshot.4, did you test with any other Cilium versions before? (snapshot.3 or even Cilium 1.13).

@teclone
Author

teclone commented Jun 22, 2023

Hi @margamanterola, yes, I did. I tested with Cilium versions 1.13.3, 1.13.4, and snapshot.3 as well, and I also tested with Kubernetes 1.27.2.

I tried various installations. I also tried using kube-router as the pod networking plugin alongside Cilium (that is, no direct routing, but rather BGP from kube-router), as outlined in this Cilium guide: https://docs.cilium.io/en/stable/network/kube-router/

Everything worked perfectly in all trials, except that I was not able to reach the NodePort service from outside the cluster via the remote nodes.

I also tried with an Ubuntu installation instead of Amazon Linux 2023. Same result.

@margamanterola
Member

Sorry, @teclone, but it's not clear to me from your message in which cases it failed or didn't fail. Are you saying that it also failed in all of those environments, or that it didn't fail in those? Is there any environment where it didn't fail?

@teclone
Author

teclone commented Jun 22, 2023

Ah sorry, @margamanterola, it failed in all the cases. What I meant was that all the installations looked fine when I printed

kubectl -n kube-system exec ds/cilium -- cilium status --verbose

just like the one in this issue report.

@margamanterola
Member

Alright, in that case it's not a 1.14 regression. I'll retitle.

@margamanterola margamanterola changed the title Cilium v1.14.0-snapshot.4 in DSR mode: Can't reach nodeport service outside cluster from remote node in self managed AWS cluster Cilium in DSR mode: Can't reach nodeport service outside cluster from remote node in self managed AWS cluster Jun 22, 2023
@margamanterola
Member

From where are you trying to reach the service? I see mentions of whether or not the pod is running on the node, but you also say from outside the cluster, which is confusing. Can you clarify from which locations it works and from which it doesn't? What response do you get when it works and when it doesn't?

@teclone
Author

teclone commented Jun 22, 2023

@margamanterola I tried to reach the service from two places: outside the cluster and inside the cluster. By outside the cluster, I mean from the public internet via my browser, using the node's public IP address.

Test 1: enter the public IP address of the node that has the pod running locally on it, plus the service nodePort. Here is the output screen:
[Screenshot 2023-06-22 at 14 37 59]

Test 2: enter the public IP address of either of the other two nodes (worker 2 and the master), plus the service nodePort.
Error: connection timed out.

By inside the cluster, I mean when I SSH into each of the nodes and curl the service using that node's internal IP address plus the port. I receive a valid response, basically a 200 response code with the HTML page.

The screenshot below is the curl request executed on the master control plane node. The IP address is the internal IP address of the master control plane node. The nginx service is running on nodePort 31345.

[Screenshot 2023-06-22 at 14 46 12]

I hope this clears up the confusion.

@margamanterola
Member

Try using Hubble or cilium monitor to debug what's going on with your packets. It's likely they're going back via the wrong path.

@margamanterola margamanterola removed the release-blocker/1.14 This issue will prevent the release of the next version of Cilium. label Jun 22, 2023
@teclone
Author

teclone commented Jun 22, 2023

Try using Hubble or cilium monitor to debug what's going on with your packets. It's likely they're going back via the wrong path.

Hi @margamanterola, it is not clear to me what to look out for in cilium monitor; I see a lot of logs when I execute

kubectl -n kube-system exec ds/cilium -- cilium monitor

What exactly do you mean by going back via the wrong path? How do I check for this? I would really appreciate any troubleshooting help I can get.

@margamanterola
Member

margamanterola commented Jun 23, 2023

Hi @teclone! I was able to reproduce behavior similar to what you are seeing by setting up KIND with a misconfiguration of native routing. I believe your problem is very likely also due to misconfiguration. My findings:

In order to use DSR, Cilium needs to be configured to use native routing (also called "direct routing" in parts of the documentation). To reproduce the behavior in KiND, what I did was to set autoDirectNodeRoutes: false. By doing that, I was telling Cilium that packets for the PodCIDRs would be able to get natively routed, without actually setting up the native routes.

Then I used curl to reach the service on each nodePort and ran cilium monitor in the different Cilium pods. I saw that if I ran cilium monitor on the same node where the nginx pod was running, I could see the traffic go by successfully when connecting to that same node, but nothing when trying to curl the "broken" node. When running cilium monitor on the "broken" node, I saw this:

<- network flow 0x346bddf0 , identity unknown->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 172.18.0.1:45640 -> 172.18.0.3:30033 tcp SYN
xx drop (Service backend not found) flow 0x346bddf0 to endpoint 0, ifindex 76, file bpf_host.c:879, , identity world->unknown: 172.18.0.1:45640 -> 172.18.0.3:30033 tcp SYN

So, Cilium was not able to redirect the traffic because the node was not able to reach the backend.

After that, I reconfigured the cluster with autoDirectNodeRoutes: true (which tells Cilium to create the pod routes as needed). In that case I was able to properly reach the service on both nodes.

Now, in your case, you are using AWS. I believe that what you need to do is use the ENI IPAM mode, as documented here:
https://docs.cilium.io/en/stable/network/concepts/routing/#aws-eni-datapath
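
For reference, a minimal sketch of what an ENI-mode installation could look like, with Helm values assumed from the linked documentation (verify the exact flag names against your Cilium version; ens5 is the device from the cilium status output above):

helm install cilium cilium/cilium --version 1.14.0-snapshot.4 \
  --namespace kube-system \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set egressMasqueradeInterfaces=ens5 \
  --set tunnel=disabled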

If this solves the issue for you, please close the bug. Thanks

@teclone
Author

teclone commented Jun 24, 2023

Hi @margamanterola, thank you for taking the time to investigate, but I actually installed Cilium with autoDirectNodeRoutes set to true.

However, I did not use the AWS ENI IPAM mode because the number of AWS IP addresses per node is very limited; it is not a scalable approach for me.

Please, can we set up a time to test this together, so I can show you what is going on? Remember that I am able to access the service from any node using the nodes' internal IP addresses.

Why do the internal IP addresses of the nodes work, but not their external IP addresses, except for the node that hosts the pod?

@margamanterola
Member

Hi @teclone,

First, I'm sorry to say that I can't give you one-on-one support. If you want that level of support you might want to talk to a vendor that offers it.

I do believe that the issue you are seeing is due to a misconfiguration in your cluster. Configuring native routing can be tricky, and the recommended way when using AWS is to use the AWS ENI mode.

As I mentioned above, you can try running cilium monitor on the agents running on the different nodes and see if you can find where and why the packets are being dropped. To make this simpler, what I did was store the output in a file and then view the file with a text editor, where I could search for the ports that I was interested in.
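
A sketch of that workflow, assuming the nodePort from the earlier test (31345) and a placeholder pod name for the Cilium agent on the "broken" node:

# capture only drops from the agent on the node that fails, then stop with Ctrl-C
kubectl -n kube-system exec cilium-xxxxx -- cilium monitor --type drop > monitor.log
# search the capture for the service port you are testing
grep 31345 monitor.log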

@teclone
Author

teclone commented Jun 26, 2023

Hi @margamanterola, this is not a misconfiguration on my part; I followed the documentation carefully.

I have tried cilium monitor as you suggested. I can see in the logs that the request reached the agent on the node that hosts the pod from the other nodes; below are the log lines.

-> network flow 0x3506c3df , identity unknown->unknown state reply ifindex 0 orig-ip 0.0.0.0: 172.31.15.25:31165 -> 197.210.55.103:14731 tcp SYN, ACK
-> network flow 0xaefc16ba , identity unknown->unknown state reply ifindex 0 orig-ip 0.0.0.0: 172.31.15.25:31165 -> 197.210.226.251:18769 tcp SYN, ACK
-> network flow 0xb5e3c06c , identity unknown->unknown state reply ifindex 0 orig-ip 0.0.0.0: 172.31.15.25:31165 -> 197.210.226.251:36918 tcp SYN, ACK
-> network flow 0x2223a415 , identity unknown->unknown state reply ifindex 0 orig-ip 0.0.0.0: 172.31.15.25:31165 -> 197.210.55.103:14731 tcp SYN, ACK
-> network flow 0xa2e16735 , identity unknown->unknown state reply ifindex 0 orig-ip 0.0.0.0: 172.31.15.25:31165 -> 197.210.226.251:18769 tcp SYN, ACK

I am not sure what to make of these logs.

Below are the options I passed to Helm while installing Cilium. Correct me if there is anything wrong with them.

helm repo add cilium https://helm.cilium.io/
SEED=$(head -c12 /dev/urandom | base64 -w0)
helm install cilium cilium/cilium --version 1.14.0-snapshot.4 \
  --namespace kube-system \
  --set k8sServiceHost=hostIP \
  --set k8sServicePort=6443 \
  --set kubeProxyReplacement=strict \
  --set tunnel=disabled \
  --set loadBalancer.mode=dsr \
  --set loadBalancer.algorithm=maglev \
  --set maglev.tableSize=65521 \
  --set ipv6.enabled=true \
  --set bpf.masquerade=true \
  --set enableIPv6Masquerade=true \
  --set autoDirectNodeRoutes=true \
  --set ipv4NativeRoutingCIDR=10.10.0.0/16 \
  --set ipv6NativeRoutingCIDR=fd10:800b:444f:2b00::/56 \
  --set ipam.mode=kubernetes \
  --set monitor.enabled=true \
  --set endpointRoutes.enabled=true
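
For what it's worth, a quick sanity check (a sketch) that autoDirectNodeRoutes actually installed the per-node PodCIDR routes: on each node you would expect a route for every remote node's PodCIDR via that node's internal IP, within the native-routing CIDRs set above.

# run on each node
ip route | grep 10.10.
ip -6 route | grep fd10:800b:444f:2b00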

@teclone
Author

teclone commented Jul 2, 2023

Hi @margamanterola, any update from you on the log I shared above?

@margamanterola
Member

I suggest you give another read to:
https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#direct-server-return-dsr

In particular:

Note that usage of DSR mode might not work in some public cloud provider environments due to the Cilium-specific IP options that could be dropped by an underlying network fabric.
[...]
Also, in some public cloud provider environments, which implement a source / destination IP address checking (e.g. AWS), the checking has to be disabled in order for the DSR mode to work.
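
For reference, disabling that check per instance can be done with the AWS CLI along these lines (the instance ID is a placeholder; it has to be repeated for every node):

aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check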

As I mentioned, I can't really provide 1:1 support. I tried to point you in the right direction, but it seems you need additional support to configure your service and for that you should engage with a vendor, rather than continue this engagement through a bug report.

@julianwiedmann julianwiedmann added the area/loadbalancing Impacts load-balancing and Kubernetes service implementations label Aug 30, 2023
@JamesHawkinss

@teclone I'm able to reproduce the issue you're experiencing when using OVH Dedicated Servers connected at Layer 2 using their vRack. Sounds like we're having an identical issue - the nodes are able to curl the NodePort pod from each other, but I'm only able to access it externally by using the IP of the node running the pod.

I'm assuming that you haven't been able to find a fix for this?

@ninja-

ninja- commented Nov 12, 2023

If there's ever progress on that I would be interested to hear, especially since I might be working with OVH vRack on my next project. However, I can recommend that @JamesHawkinss and @teclone run the backend on each node (like nginx) and use externalTrafficPolicy: Local, which usually works much better for on-prem environments (see the sketch below)...
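
A sketch of what that could look like for the my-nginx service from this issue, assuming the backend is deployed on every node (e.g. as a DaemonSet):

# with externalTrafficPolicy: Local, each node only forwards to its local backend pods
kubectl patch svc my-nginx -p '{"spec":{"externalTrafficPolicy":"Local"}}'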

@carnerito
Contributor

I have the same issue when running Cilium in DSR mode with Geneve as described here.

@giorio94
Member

giorio94 commented Jan 8, 2024

I think I've just hit the same issue mentioned here, which seems specific to DSR + KPR + BPF masquerade disabled.

Reproduced on a two-node kind cluster, with Cilium (v1.15.0-rc.0) configured with:

bpf:
  masquerade: false
kube-proxy-replacement: strict
tunnelProtocol: geneve
loadBalancer:
  mode: dsr
  dsrDispatch: geneve

The NodePort service is reachable when targeting the node hosting the backend pod, but not when targeting the other node. In that case, curl 172.18.0.6:31852 returns: curl: (56) Recv failure: Connection reset by peer. Running tcpdump on the node hosting the backend highlights that the second response is not SNATted correctly (9898 is the port the server is listening on, 172.18.0.5 the IP of the hosting node):

$ tcpdump -ni eth0 port 9898 or port 31852
17:54:46.972737 IP 172.18.0.6.31852 > 172.18.0.1.55830: Flags [S.], seq 239868844, ack 3429881298, win 64308, options [mss 1410,sackOK,TS val 708178691 ecr 830692500,nop,wscale 7], length 0
17:54:46.972818 IP 172.18.0.5.9898 > 172.18.0.1.55830: Flags [.], ack 3429881378, win 502, options [nop,nop,TS val 708178691 ecr 830692500], length 0
17:54:46.972824 IP 172.18.0.1.55830 > 172.18.0.5.9898: Flags [R], seq 3429881378, win 0, length 0
17:54:47.173255 IP 172.18.0.6.31852 > 172.18.0.1.55830: Flags [R], seq 239868845, win 0, length 0

By contrast, the same issue does not reproduce when BPF masquerade is enabled.
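
For anyone trying to reproduce this, the configuration above roughly corresponds to Helm values along these lines (a sketch; names taken from the Cilium Helm chart and worth double-checking against v1.15):

helm install cilium cilium/cilium --version 1.15.0-rc.0 \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set bpf.masquerade=false \
  --set tunnelProtocol=geneve \
  --set loadBalancer.mode=dsr \
  --set loadBalancer.dsrDispatch=geneve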

@teclone
Author

teclone commented Jan 9, 2024

@margamanterola please, might this be worth another look now?

@withinboredom

Let me know if I am hijacking, but after searching multiple issues, this is the closest one to mine. However, I'm using MetalLB to provide a public IP. With IPv4, I see exactly this issue when receiving traffic via the external IP; IPv6 works just fine though.


This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 12, 2024
@withinboredom

I really wish the issue-hiding bot didn't exist. It doesn't make issues magically stop existing.

@giorio94 giorio94 removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 13, 2024
@julianwiedmann julianwiedmann added the feature/dsr Relates to Cilium's Direct-Server-Return feature for KPR. label May 3, 2024