Envoy DaemonSet fails probes on IPv6 single-stack cluster #30968

Closed
2 of 3 tasks
iandrewt opened this issue Feb 26, 2024 · 3 comments · Fixed by #30970
Assignees
Labels
kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@iandrewt
Contributor

iandrewt commented Feb 26, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

On a true IPv6-only cluster (nodes and pods have only IPv6 addresses, apart from the standard 127.0.0.1 loopback, and there are no IPv4 routes), the readiness, liveness and startup probes for the Envoy DaemonSet fail with "connection refused": the kubelet probes over IPv4 and does not fall back to IPv6 when that fails.

Changing the readiness probes to query ::1 explicitly works, as I'm sure the reverse would for a similar issue on an IPv4-only cluster.

I'm not entirely sure how k8s does DNS lookups for probes (I presume it's just part of the net library), but since the readiness probe has no need to resolve localhost, its host should probably be set to the loopback IP directly. Happy to raise a PR for that.
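As a rough illustration of the point above, the following Go sketch separates hosts that need resolving (such as localhost) from IP literals, and builds the probe URL for each. The helper names `needsResolution` and `probeURL` are hypothetical and are not taken from the kubelet or from Cilium; the port and path are the ones shown in the log output below.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// needsResolution reports whether host is a name (such as "localhost") that
// a prober would have to resolve, rather than an IP literal.
// Hypothetical helper, for illustration only.
func needsResolution(host string) bool {
	return net.ParseIP(host) == nil
}

// probeURL builds an HTTP probe URL. net.JoinHostPort adds the square
// brackets that an IPv6 literal needs in the host part of a URL.
// Hypothetical helper, for illustration only.
func probeURL(host string, port int, path string) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port)) + path
}

func main() {
	for _, h := range []string{"localhost", "127.0.0.1", "::1"} {
		fmt.Println(h, needsResolution(h), probeURL(h, 9878, "/healthz"))
	}
}
```

With an IP literal as the host, the address family the probe uses no longer depends on how the node resolves localhost.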

Cilium Version

v1.15.1

Kernel Version

Linux i-0f0d5a7c8c89a74cb 5.15.0-97-generic #107-Ubuntu SMP Wed Feb 7 13:26:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 22.04.3 LTS

Kubernetes Version

v1.29.2

Regression

No response

Sysdump

No response

Relevant log output

Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  BackOff    7m50s (x58 over 34m)    kubelet  Back-off restarting failed container cilium-envoy in pod cilium-envoy-jfxr5_kube-system(42b62c37-9f54-442b-84d4-4d70e03b8e98)
  Warning  Unhealthy  2m49s (x1306 over 62m)  kubelet  Startup probe failed: Get "http://localhost:9878/healthz": dial tcp 127.0.0.1:9878: connect: connection refused

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@iandrewt iandrewt added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Feb 26, 2024
iandrewt added a commit to iandrewt/cilium that referenced this issue Feb 26, 2024
On IPv6-only clusters, querying localhost for the health check could attempt to check 127.0.0.1, presumably depending on host DNS configuration.

As the health check does not listen on IPv4 when .Values.ipv4.enabled is false, this health check could fail.

This patch uses the same logic as the bootstrap-config.json file to ensure a valid IP is always used for the health check.

Fixes: cilium#30968
Fixes: 859d2a9 ("helm: use /ready from Envoy admin iface for healthprobes on daemonset")

Signed-off-by: Andrew Titmuss <iandrewt@icloud.com>
@ldelossa ldelossa added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Feb 26, 2024
@ldelossa
Contributor

@iandrewt thanks for the bug report. Are you able to provide a sysdump?

@ldelossa ldelossa removed the needs/triage This issue requires triaging to establish severity and next steps. label Feb 26, 2024
@iandrewt
Contributor Author

@sayboras sayboras self-assigned this Feb 27, 2024
sayboras added a commit to sayboras/cilium that referenced this issue Feb 27, 2024
This commit makes sure that the host value for the startup, liveness and
readiness probes is:

- 127.0.0.1 for IPv4 only
- ::1 for IPv6 only, or dual stack

Fixes: cilium#30968
Signed-off-by: Tam Mach <tam.mach@cilium.io>
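
The selection rule in this commit message can be sketched as a small Go function. This is only an illustration: the actual change is made in the Helm chart, and the name `probeHost`, its boolean parameters, and the fallback when neither family is enabled are assumptions, not part of the commit.

```go
package main

import "fmt"

// probeHost returns the loopback literal to use as the probe host,
// following the rule stated in the commit message.
// Hypothetical helper; the real logic lives in the Helm chart.
func probeHost(ipv4Enabled, ipv6Enabled bool) string {
	if ipv6Enabled {
		return "::1" // IPv6 only, or dual stack
	}
	// IPv4 only. Also used when neither flag is set, which is an
	// assumption made here to keep the sketch total.
	return "127.0.0.1"
}

func main() {
	fmt.Println("IPv4 only: ", probeHost(true, false))
	fmt.Println("IPv6 only: ", probeHost(false, true))
	fmt.Println("dual stack:", probeHost(true, true))
}
```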
@sayboras
Member

Thanks for reporting this issue. The host value in all probes is hard-coded as localhost in the Helm chart. Could you help test the change below?

#30994

github-merge-queue bot pushed a commit that referenced this issue Feb 27, 2024
On IPv6-only clusters, querying localhost for the health check could attempt to check 127.0.0.1, presumably depending on host DNS configuration.

As the health check does not listen on IPv4 when .Values.ipv4.enabled is false, this health check could fail.

This patch uses the same logic as the bootstrap-config.json file to ensure a valid IP is always used for the health check.

Fixes: #30968
Fixes: 859d2a9 ("helm: use /ready from Envoy admin iface for healthprobes on daemonset")

Signed-off-by: Andrew Titmuss <iandrewt@icloud.com>
YutaroHayakawa pushed a commit that referenced this issue Feb 27, 2024
[ upstream commit 29a7918 ]

On IPv6-only clusters, querying localhost for the health check could attempt to check 127.0.0.1, presumably depending on host DNS configuration.

As the health check does not listen on IPv4 when .Values.ipv4.enabled is false, this health check could fail.

This patch uses the same logic as the bootstrap-config.json file to ensure a valid IP is always used for the health check.

Fixes: #30968
Fixes: 859d2a9 ("helm: use /ready from Envoy admin iface for healthprobes on daemonset")

Signed-off-by: Andrew Titmuss <iandrewt@icloud.com>
YutaroHayakawa pushed a commit that referenced this issue Feb 28, 2024
[ upstream commit 29a7918 ]
YutaroHayakawa pushed a commit that referenced this issue Feb 29, 2024
[ upstream commit 29a7918 ]
YutaroHayakawa pushed a commit that referenced this issue Mar 1, 2024
[ upstream commit 29a7918 ]
3 participants