
ClusterIP addresses for Ingress services no longer work when bpf masquerading is enabled in native routing mode #32525

Open
jspaleta opened this issue May 14, 2024 · 9 comments
Labels
  • area/loadbalancing: Impacts load-balancing and Kubernetes service implementations
  • kind/bug: This is a bug in the Cilium logic.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • sig/agent: Cilium agent related.

Comments

@jspaleta
Contributor

jspaleta commented May 14, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I cannot seem to access Ingress clusterIP addresses from a pod on the same node as the service backend pod if bpf masquerading is enabled in native routing mode. I've got a documented Kind cluster environment in a GitHub repo that can be used to reproduce.

In fact I can no longer access either the externalIP or the clusterIP. The loss of the externalIP from inside the cluster isn't necessarily something I would expect to always work, but I do expect the clusterIP to be reachable from inside the cluster.

There seem to be several bpf masquerade issues floating around; I didn't read any of them as being specific to this situation, though they are probably all related.

I was able to isolate the symptoms to just the bpf.masquerade boolean in native routing mode.
I can't seem to trigger this at all in the default tunneling routing mode.

Helm values used:

## baseline values
kubeProxyReplacement: true
routingMode: native
ipv4NativeRoutingCIDR: '10.9.0.0/16'
autoDirectNodeRoutes: true
ingressController:
  # -- Enable cilium ingress controller
  enabled: true
  default: true
  loadbalancerMode: dedicated
gatewayAPI:
  enabled: true
operator:
  replicas: 1
l2announcements:
  enabled: true
ipam:
  mode: 'cluster-pool'
  operator:
    clusterPoolIPv4PodCIDRList:
      - '10.9.0.0/16'
    clusterPoolIPv4MaskSize: 24

## Values under test  
bpf:
  masquerade: true

Cluster operates as expected when baseline helm values are used.
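
For anyone reproducing outside the repo scripts, a minimal sketch of applying these values with Helm; the release name, namespace, chart version, and values file path are assumptions specific to this sketch:

$ helm repo add cilium https://helm.cilium.io/
$ # values.yaml holds the baseline values plus the bpf.masquerade toggle shown above
$ helm upgrade --install cilium cilium/cilium --namespace kube-system --version 1.15.4 -f values.yaml
$ # flipping only the value under test on an existing install
$ helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values --set bpf.masquerade=true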

Cilium Version

cilium v1.15.4

Kernel Version

Linux carbon 6.7.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 22:21:14 UTC 2024 x86_64 GNU/Linux

Kubernetes Version

Using a kind cluster:

kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

Regression

I did not check for regression

Sysdump

cilium-sysdump-20240513-165256.zip

Relevant log output

No response

Anything else?

I've documented the baseline native routing Kind cluster environment I'm using here:
https://github.com/jspaleta/scale21x-demos/tree/main/environments/cilium-l2lb/imperial-gateway-native-routing

I'll update the issue with additional info on how to use this environment to reproduce the symptoms.

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jspaleta jspaleta added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. kind/community-report This was reported by a user in the Cilium community, eg via Slack. labels May 14, 2024
@jspaleta
Contributor Author

Okay, to reproduce the problem using the imperial-gateway-native-routing environment in my demos repo:

First I edit the Cilium Helm values under the cilium/ directory and enable bpf.masquerade.

Then I provision the kind cluster with variations of the Death Star tutorial service, adding Ingress and Gateway API services fronting the deathstar service.

$ ./install.sh
$ # wait for cilium status to go green
$ ./seed.sh
$ kubectl get services -A
NAMESPACE          NAME                               TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
deathstar          cilium-ingress-deathstar-ingress   LoadBalancer   10.96.120.14    172.18.200.105   80:30658/TCP,443:30755/TCP   29m
deathstar          deathstar                          LoadBalancer   10.96.240.62    172.18.200.101   80:31390/TCP                 29m
...
$ kubectl get ingress -A
NAMESPACE          NAME                CLASS    HOSTS   ADDRESS          PORTS   AGE
deathstar          deathstar-ingress   cilium   *       172.18.200.105   80      31m
...

The tiefighter pod, running on a different node than the deathstar backend pod, works when using the Ingress clusterIP:

$ kubectl exec -n imperial-starships -ti tiefighter -- curl -s -XPOST 10.96.120.14/v1/request-landing
Death Star Landing Request: Ship Landed!

The xwing pod, running on the same node as the deathstar backend pod, doesn't work:

$ kubectl exec -n rebel-scum -ti xwing -- curl -s -XPOST 10.96.120.14/v1/request-landing
upstream connect error or disconnect/reset before headers. reset reason: connection timeout
$ kubectl describe node imperial-gateway-worker2
Non-terminated Pods:          (6 in total)
  Namespace                   Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                  ------------  ----------  ---------------  -------------  ---
  deathstar                   deathstar-f64bfbf4d-8v77g             0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  rebel-scum                  xwing                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
...
$ kubectl describe node imperial-gateway-worker2

Non-terminated Pods:          (6 in total)
  Namespace                   Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                       ------------  ----------  ---------------  -------------  ---
  imperial-starships          tiefighter                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         35m
...

Interesting note: both the xwing and tiefighter pods are able to access the deathstar service directly using its ClusterIP. It appears that only the Ingress services (and, I'm assuming, also Gateway services; I still need to test that) fronting the actual deathstar service are impacted.
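
To make the comparison concrete, a sketch of the two curl targets using the ClusterIPs from the service listing above (10.96.240.62 is the deathstar service, 10.96.120.14 is its Ingress); per the observation above, the direct call succeeds while the Ingress ClusterIP call times out when bpf.masquerade is enabled:

$ # direct deathstar service ClusterIP, from the pod on the same node as the backend
$ kubectl exec -n rebel-scum -ti xwing -- curl -s -XPOST 10.96.240.62/v1/request-landing
$ # Ingress ClusterIP fronting the same service, from the same pod
$ kubectl exec -n rebel-scum -ti xwing -- curl -s -XPOST 10.96.120.14/v1/request-landing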

@jspaleta
Contributor Author

jspaleta commented May 14, 2024

Quick check: Gateway has the same issue; I can't access it via the clusterIP from a pod on the same node.

I'm able to see a difference between my tiefighter and xwing pods only because I have a single replica configured for my deathstar backend in this environment.

If I scale the deathstar deployment up so there are backends on both worker nodes, I get stochastic behavior on connection attempts from both the xwing and tiefighter pods, depending on which backend is chosen to service the HTTP request.

So for diagnostic purposes it's easier to keep the target service deployment at 1 backend.
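
A sketch of pinning the backend count for diagnosis; the deployment name and namespace are inferred from the pod listing above:

$ kubectl -n deathstar scale deployment deathstar --replicas=1
$ # confirm which node hosts the single backend
$ kubectl -n deathstar get pods -o wide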

@sayboras
Member

Thanks for your issue. Can you give it a try with bpf.legacyHostRouting enabled?

@sayboras
Member

This could be similar to #31653

@jspaleta
Contributor Author

Enabling bpf.legacyHostRouting did not help the situation.

Trying to curl the clusterIP of the deathstar Ingress service from a pod on the same node as the deathstar backend pod still results in the timeout error.

@jspaleta
Contributor Author

jspaleta commented May 14, 2024

@sayboras this definitely looks similar to the error in the "new" connectivity test.
I'm using the latest cilium-cli release, 0.16.7, and it looks like that test is available.

Running baseline native routing without bpf.masquerade enabled, the pod-to-ingress-service tests pass:

$ cilium connectivity test --test='pod-to-ingress-service'
...
[=] Test [pod-to-ingress-service] [61/78]
......
[=] Test [pod-to-ingress-service-deny-all] [62/78]
W0514 13:40:59.099428   65787 warnings.go:70] unknown field "spec.enableDefaultDeny"
......
[=] Test [pod-to-ingress-service-deny-ingress-identity] [63/78]
W0514 13:41:30.985481   65787 warnings.go:70] unknown field "spec.enableDefaultDeny"
......
[=] Test [pod-to-ingress-service-deny-backend-service] [64/78]
W0514 13:41:44.875364   65787 warnings.go:70] unknown field "spec.enableDefaultDeny"
...

Note: for those following along, disregard the known spurious warning. enableDefaultDeny is added in the 1.16.0 Cilium prereleases as an extension to the network policy spec, and the CLI tool is just being overly verbose about it since I'm running Cilium 1.15.4.

Enabling bpf.masquerade results in errors:

$ cilium connectivity test --test='pod-to-ingress-service' --collect-sysdump-on-failure
...
📋 Test Report
❌ 2/5 tests failed (6/30 actions), 73 tests skipped, 0 scenarios skipped:
Test [pod-to-ingress-service]:
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-1: cilium-test/client-69748f45d8-7ps7g (10.9.1.17) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-f5fnc (10.9.1.236) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-5: cilium-test/client3-868f7b8f6b-v47k9 (10.9.0.51) -> cilium-test/cilium-ingress-other-node (cilium-ingress-other-node.cilium-test:80)
Test [pod-to-ingress-service-allow-ingress-identity]:
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-0: cilium-test/client-69748f45d8-7ps7g (10.9.1.17) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-f5fnc (10.9.1.236) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-5: cilium-test/client3-868f7b8f6b-v47k9 (10.9.0.51) -> cilium-test/cilium-ingress-other-node (cilium-ingress-other-node.cilium-test:80)
connectivity test failed: 2 tests failed

See attached sysdump corresponding to first of six failed actions:
cilium-sysdump-20240514-140902.zip

Enabling bpf.masquerade and bpf.legacyHostRouting results in errors:

$ cilium connectivity test --test='pod-to-ingress-service' --collect-sysdump-on-failure
📋 Test Report
❌ 2/5 tests failed (6/30 actions), 73 tests skipped, 0 scenarios skipped:
Test [pod-to-ingress-service]:
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-0: cilium-test/client-69748f45d8-8pczl (10.9.0.91) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-ndvvt (10.9.0.55) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-5: cilium-test/client3-868f7b8f6b-d8slh (10.9.2.12) -> cilium-test/cilium-ingress-other-node (cilium-ingress-other-node.cilium-test:80)
Test [pod-to-ingress-service-allow-ingress-identity]:
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-0: cilium-test/client-69748f45d8-8pczl (10.9.0.91) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-ndvvt (10.9.0.55) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-5: cilium-test/client3-868f7b8f6b-d8slh (10.9.2.12) -> cilium-test/cilium-ingress-other-node (cilium-ingress-other-node.cilium-test:80)
connectivity test failed: 2 tests failed

See attached sysdump corresponding to first of six failed actions:
cilium-sysdump-20240514-153103.zip

@squeed squeed added sig/agent Cilium agent related. area/loadbalancing Impacts load-balancing and Kubernetes service implementations and removed needs/triage This issue requires triaging to establish severity and next steps. kind/community-report This was reported by a user in the Cilium community, eg via Slack. labels May 15, 2024
@jspaleta
Contributor Author

Just tested with the 1.15.5 release, out today, and it's still not working for me, even with the legacyHostRouting option enabled.

$ cilium version
cilium-cli: v0.16.7 compiled with go1.22.2 on linux/amd64
cilium image (default): v1.15.4
cilium image (stable): v1.15.5
cilium image (running): 1.15.5

kind config

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: imperial-gateway
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  # port forward 80 on the host to 80 on this node
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
- role: worker
networking:
  disableDefaultCNI: true
  kubeProxyMode: "none"
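For completeness, a minimal sketch of bringing the cluster up from this config; the config file name here is an assumption:

$ kind create cluster --config kind-config.yaml
$ # expect imperial-gateway-control-plane plus two workers
$ kubectl get nodes -o wide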

Passing Cilium config

## baseline
kubeProxyReplacement: true
routingMode: native
ipv4NativeRoutingCIDR: '10.9.0.0/16'
autoDirectNodeRoutes: true
ingressController:
  # -- Enable cilium ingress controller
  enabled: true
  default: true
  loadbalancerMode: dedicated
gatewayAPI:
  enabled: true
operator:
  replicas: 1
l2announcements:
  enabled: true
ipam:
  mode: 'cluster-pool'
  operator:
    clusterPoolIPv4PodCIDRList:
      - '10.9.0.0/16'
    clusterPoolIPv4MaskSize: 24

## Under test  
bpf:
  masquerade: false
  legacyHostRouting: false

Failing config

## baseline
kubeProxyReplacement: true
routingMode: native
ipv4NativeRoutingCIDR: '10.9.0.0/16'
autoDirectNodeRoutes: true
ingressController:
  # -- Enable cilium ingress controller
  enabled: true
  default: true
  loadbalancerMode: dedicated
gatewayAPI:
  enabled: true
operator:
  replicas: 1
l2announcements:
  enabled: true
ipam:
  mode: 'cluster-pool'
  operator:
    clusterPoolIPv4PodCIDRList:
      - '10.9.0.0/16'
    clusterPoolIPv4MaskSize: 24

## Under test  
bpf:
  masquerade: true
  legacyHostRouting: true

See attached sysdump for first action failure using 1.15.5

cilium-sysdump-20240515-101911.zip

@squeed squeed added the needs/triage This issue requires triaging to establish severity and next steps. label May 16, 2024
@sayboras
Member

As discussed in the Cilium Slack, the mentioned changes were merged after 1.15.5, so you might need to check out the main branch.
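
For anyone trying the main branch, a rough sketch of installing the chart straight from the Cilium repo; the CI image repository, tag, and values file path below are placeholders/assumptions and would need to point at an actual build (the operator image would likely need a similar override):

$ git clone https://github.com/cilium/cilium.git && cd cilium
$ helm upgrade --install cilium ./install/kubernetes/cilium \
    --namespace kube-system \
    -f /path/to/values.yaml \
    --set image.repository=quay.io/cilium/cilium-ci \
    --set image.tag=<main-branch-build-tag> \
    --set image.useDigest=false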

@jorhett
Copy link

jorhett commented May 23, 2024

@sayboras I used the images you suggested and didn't see a fix. Can you tell me what images I should be using if these weren't the correct ones? https://cilium.slack.com/archives/C1MATJ5U5/p1715887169176509?thread_ts=1715382762.010149&cid=C1MATJ5U5
