Pod to ingress with backend hosted on the same node not working in certain configurations #31653

Open
giorio94 opened this issue Mar 28, 2024 · 4 comments
Labels: area/servicemesh, feature/k8s-ingress, kind/bug, sig/agent

@giorio94
Member

giorio94 commented Mar 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Running an extended version of the connectivity tests, which validates pod-to-ingress-service traffic with the backend pod hosted on either the same node or a different one, highlighted connectivity issues in certain configurations when the backend is hosted on the same node. In particular, the issue occurred in the Conformance Cluster Mesh E2E tests, although it is not clustermesh related (the backend is hosted in the local cluster as well), and more specifically in the following configurations:

          - name: '2'
            tunnel: 'disabled'
            ipfamily: 'dual'
            encryption: 'wireguard'
            kube-proxy: 'none' (KPR + BPF masquerade)

          - name: '4'
            tunnel: 'disabled'
            ipfamily: 'ipv6'
            encryption: 'disabled'
            kube-proxy: 'none' (KPR + BPF masquerade)
❌ 3/63 tests failed (5/654 actions), 12 tests skipped, 0 scenarios skipped:
connectivity test failed: 3 tests failed
Test [pod-to-ingress-service]:
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-0: cilium-test/client-69748f45d8-vbwdx (10.242.1.231) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-z8fmw (10.242.1.100) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
Test [pod-to-ingress-service-allow-ingress-identity]:
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-1: cilium-test/client-69748f45d8-vbwdx (10.242.1.231) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-z8fmw (10.242.1.100) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
Test [outside-to-ingress-service]:
  ❌ outside-to-ingress-service/outside-to-ingress-service/curl-ingress-service-0: cilium-test/host-netns-non-cilium-nz9t8 (172.18.0.3) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)

❌ 3/57 tests failed (5/342 actions), 18 tests skipped, 8 scenarios skipped:
Test [pod-to-ingress-service]:
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-1: cilium-test/client-69748f45d8-r8rpj (fd00:10:242:2::20b0) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-mjxhv (fd00:10:242:2::d899) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
Test [pod-to-ingress-service-allow-ingress-identity]:
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-0: cilium-test/client-69748f45d8-r8rpj (fd00:10:242:2::20b0) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
  ❌ pod-to-ingress-service-allow-ingress-identity/pod-to-ingress-service/curl-2: cilium-test/client2-ccd7b8bdf-mjxhv (fd00:10:242:2::d899) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)
Test [outside-to-ingress-service]:
  ❌ outside-to-ingress-service/outside-to-ingress-service/curl-ingress-service-0: cilium-test/host-netns-non-cilium-x5zz8 (fc00:f853:ccd:e793::4) -> cilium-test/cilium-ingress-same-node (cilium-ingress-same-node.cilium-test:80)

By contrast, it didn't occur when KPR was disabled (the ingress controller is not enabled in that case) or when Cilium was configured in tunnel mode. I'm not sure why it didn't happen in the Conformance E2E tests.
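
For orientation, the settings shared by the two failing matrix entries could be expressed as Cilium Helm values roughly as follows. This is a sketch under assumptions: the mapping from the CI matrix fields (tunnel, kube-proxy) to Helm keys is an assumption and was not taken from the workflow file, and the IP family, encryption and other native-routing settings are left out.

```yaml
# Assumed Helm values approximating the affected setups
# (not copied from the CI workflow).
routingMode: native          # tunnel: 'disabled'
kubeProxyReplacement: true   # kube-proxy: 'none' (KPR)
bpf:
  masquerade: true           # BPF masquerade
ingressController:
  enabled: true              # required for the pod-to-ingress-service tests
```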

Link: https://github.com/cilium/cilium/actions/runs/8456333159

Cilium Version

Tip of main

Sysdump

cilium-sysdump-20240327-180128.zip
cilium-sysdump-20240327-180201.zip

Code of Conduct

  • I agree to follow this project's Code of Conduct

giorio94 added the kind/bug, needs/triage, area/servicemesh, and feature/k8s-ingress labels Mar 28, 2024
@sayboras
Member

The Ingress controller requires KPR to be enabled (or at least NodePort to be enabled). The above configurations don't seem to satisfy that requirement 🤔

@giorio94
Member Author

The Ingress controller requires KPR to be enabled (or at least NodePort to be enabled). The above configurations don't seem to satisfy that requirement 🤔

kube-proxy: none corresponds to KPR enabled. Sorry for not being clear there: that's the setting for kind, which disables the kube-proxy daemonset, and Cilium is then configured with KPR enabled.

lmb added the sig/agent label Apr 8, 2024
sayboras added a commit that referenced this issue Apr 11, 2024
Relates: #31653

Signed-off-by: Tam Mach <tam.mach@cilium.io>
sayboras added a commit that referenced this issue Apr 11, 2024
After #22006, BPF host routing is enabled by default; however, this drops
the reply packet to proxy. This commit is to enable legacy host routing
as a workaround.

Relates: #31653, #22006

Signed-off-by: Tam Mach <tam.mach@cilium.io>
sayboras self-assigned this Apr 11, 2024
sayboras added a commit that referenced this issue May 8, 2024
After #22006, BPF host routing is enabled by default, this commit is to
enable legacy host routing as a workaround, as the response packet might
be dropped. Further investigation is tracked under #31653.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
github-merge-queue bot pushed a commit that referenced this issue May 10, 2024
After #22006, BPF host routing is enabled by default, this commit is to
enable legacy host routing as a workaround, as the response packet might
be dropped. Further investigation is tracked under #31653.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
sayboras added a commit that referenced this issue May 11, 2024
[ upstream commit de9c87b ]

After #22006, BPF host routing is enabled by default, this commit is to
enable legacy host routing as a workaround, as the response packet might
be dropped. Further investigation is tracked under #31653.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
ldelossa pushed a commit that referenced this issue May 12, 2024
[ upstream commit de9c87b ]

After #22006, BPF host routing is enabled by default, this commit is to
enable legacy host routing as a workaround, as the response packet might
be dropped. Further investigation is tracked under #31653.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
@sayboras
Member

Copied the comment here so that it will not get lost.

In short, with the reintroduction of the from-proxy route 2005, there is still an issue with the return traffic if bpf.masquerade is enabled; the workaround is to enable bpf.hostLegacyRouting as well.

#32367 (comment)
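
As a rough illustration, the workaround described above might look like this in Helm values, assuming the bpf.masquerade and bpf.hostLegacyRouting options named in the comment map directly onto nested Helm keys. It is a sketch rather than a verified fix, and it may not help in every environment.

```yaml
# Workaround sketch: keep BPF masquerading enabled but fall back to
# legacy host routing so reply packets to the proxy are not dropped.
bpf:
  masquerade: true
  hostLegacyRouting: true
```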

@jspaleta
Contributor

Just fyi, I'm seeing similar reproducible behavior in a Kind cluster environment but bpf.hostLegacyRouting doesn't appear to be a viable workaround for me...so far. I'll continue testing in #32525

sayboras added a commit that referenced this issue Jun 10, 2024
[ upstream commit de9c87b ]

After #22006, BPF host routing is enabled by default, this commit is to
enable legacy host routing as a workaround, as the response packet might
be dropped. Further investigation is tracked under #31653.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
(cherry picked from commit aa44b70)
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
ti-mo removed the needs/triage label Jun 20, 2024