Connectivity test fails with Envoy as DaemonSet #28057

Closed
2 tasks done
Smana opened this issue Sep 10, 2023 · 7 comments
Labels
area/servicemesh GH issues or PRs regarding servicemesh kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. need-more-info More information is required to further debug or fix the issue. needs/triage This issue requires triaging to establish severity and next steps. sig/agent Cilium agent related. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments

@Smana
Contributor

Smana commented Sep 10, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When Cilium is deployed on an EKS cluster and configured to run Envoy as a DaemonSet, the cilium connectivity test command fails.

Here is an extract of the output (let me know if you need the whole output):

  ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-to-fqdns-one.one.one.one' to namespace 'cilium-test'..
  [-] Scenario [to-fqdns/pod-to-world]
  [.] Action [to-fqdns/pod-to-world/http-to-one.one.one.one-0: cilium-test/client-6b4b857d98-b2mts (10.0.12.196) -> one.one.one.one-http (one.one.one.one:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 --retry 3 --retry-all-errors --retry-delay 3 http://one.one.one.one:80" failed: command terminated with exit code 22
  ℹ️  curl output:
  
  
  📄 No flows recorded for peer cilium-test/client-6b4b857d98-b2mts during action http-to-one.one.one.one-0
  📄 No flows recorded for peer one.one.one.one-http during action http-to-one.one.one.one-0
  [.] Action [to-fqdns/pod-to-world/https-to-one.one.one.one-0: cilium-test/client-6b4b857d98-b2mts (10.0.12.196) -> one.one.one.one-https (one.one.one.one:443)]
  [.] Action [to-fqdns/pod-to-world/https-to-one.one.one.one-index-0: cilium-test/client-6b4b857d98-b2mts (10.0.12.196) -> one.one.one.one-https-index (one.one.one.one:443)]
  [.] Action [to-fqdns/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-http (one.one.one.one:80)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 --retry 3 --retry-all-errors --retry-delay 3 http://one.one.one.one:80" failed: command terminated with exit code 22
  ℹ️  curl output:
  
  
  📄 No flows recorded for peer cilium-test/client2-646b88fb9b-xsb7z during action http-to-one.one.one.one-1
  📄 No flows recorded for peer one.one.one.one-http during action http-to-one.one.one.one-1
  [.] Action [to-fqdns/pod-to-world/https-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-https (one.one.one.one:443)]
  [.] Action [to-fqdns/pod-to-world/https-to-one.one.one.one-index-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-https-index (one.one.one.one:443)]
  [-] Scenario [to-fqdns/pod-to-world-2]
  [.] Action [to-fqdns/pod-to-world-2/https-cilium-io-0: cilium-test/client-6b4b857d98-b2mts (10.0.12.196) -> cilium-io-https (cilium.io:443)]
  [.] Action [to-fqdns/pod-to-world-2/https-cilium-io-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> cilium-io-https (cilium.io:443)]
  ℹ️  📜 Deleting CiliumNetworkPolicy 'client-egress-to-fqdns-one.one.one.one' from namespace 'cilium-test'..

📋 Test Report
❌ 4/42 tests failed (4/284 actions), 13 tests skipped, 0 scenarios skipped:
Test [client-ingress]:
Test [client-egress-l7]:
  ❌ client-egress-l7/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-http (one.one.one.one:80)
Test [client-egress-l7-named-port]:
  ❌ client-egress-l7-named-port/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-http (one.one.one.one:80)
Test [to-fqdns]:
  ❌ to-fqdns/pod-to-world/http-to-one.one.one.one-0: cilium-test/client-6b4b857d98-b2mts (10.0.12.196) -> one.one.one.one-http (one.one.one.one:80)
  ❌ to-fqdns/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-http (one.one.one.one:80)
connectivity test failed: 4 tests failed
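
To make the pattern in the report easier to see, here is a minimal Python sketch that groups the failed actions by test and collects the destination ports. It is not part of cilium-cli: the function name `failed_actions`, the regular expressions, and the assumption that the report keeps this exact line layout are all mine. The sample text is copied from the report above.

```python
import re
from collections import defaultdict

# Lines copied from the "Test Report" section above.
REPORT = """\
Test [client-ingress]:
Test [client-egress-l7]:
  ❌ client-egress-l7/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-http (one.one.one.one:80)
Test [client-egress-l7-named-port]:
  ❌ client-egress-l7-named-port/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-http (one.one.one.one:80)
Test [to-fqdns]:
  ❌ to-fqdns/pod-to-world/http-to-one.one.one.one-0: cilium-test/client-6b4b857d98-b2mts (10.0.12.196) -> one.one.one.one-http (one.one.one.one:80)
  ❌ to-fqdns/pod-to-world/http-to-one.one.one.one-1: cilium-test/client2-646b88fb9b-xsb7z (10.0.12.156) -> one.one.one.one-http (one.one.one.one:80)
"""

# Assumed line shapes, inferred only from the report text above.
TEST_RE = re.compile(r"^Test \[(?P<name>[^\]]+)\]:\s*$")
FAIL_RE = re.compile(
    r"^\s*❌\s+(?P<action>\S+):\s.*\((?P<host>[^()\s]+):(?P<port>\d+)\)\s*$"
)


def failed_actions(report):
    """Map each test name to a list of (action, host, port) failures."""
    failures = defaultdict(list)
    current = None
    for line in report.splitlines():
        header = TEST_RE.match(line)
        if header:
            current = header.group("name")
            failures[current]  # register tests that list no failed action
            continue
        failed = FAIL_RE.match(line)
        if failed and current is not None:
            failures[current].append(
                (failed.group("action"), failed.group("host"), int(failed.group("port")))
            )
    return dict(failures)


summary = failed_actions(REPORT)
ports = {port for actions in summary.values() for (_, _, port) in actions}
for test, actions in summary.items():
    print(test, len(actions))
print("destination ports:", sorted(ports))
```

For this report, every failed action that is listed targets one.one.one.one on port 80, while the HTTPS actions against the same host in the extract above show no error. That is only an observation about the output, not a confirmed root cause.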

Note that the same test passes when the following Helm chart value is disabled:

envoy:
  enabled: true

The result with Envoy as DaemonSet disabled:

✅ All 42 tests (284 actions) successful, 13 tests skipped, 0 scenarios skipped.

Cilium Version

cilium version
cilium-cli: v0.15.7 compiled with go1.21.0 on linux/amd64
cilium image (default): v1.14.1
cilium image (stable): v1.14.1
cilium image (running): 1.14.1

Kernel Version

Latest bottlerocket AMI

Kubernetes Version

v1.27.4

Sysdump

cilium-sysdump-20230910-102356.zip

Relevant log output

No response

Anything else?

The whole code used is here

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Smana Smana added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Sep 10, 2023
@julianwiedmann julianwiedmann added sig/agent Cilium agent related. area/servicemesh GH issues or PRs regarding servicemesh labels Sep 12, 2023
@aanm
Member

aanm commented Oct 3, 2023

@Smana can you try the test again? Since the failing connections were to 1.1.1.1, the test might have failed simply because it was connecting to the outside world.

@aanm aanm added the need-more-info More information is required to further debug or fix the issue. label Oct 3, 2023
@Smana
Contributor Author

Smana commented Oct 3, 2023

@aanm Of course. Do I have to change anything, such as upgrading the cilium CLI version?

@github-actions github-actions bot added info-completed The GH issue has received a reply from the author and removed need-more-info More information is required to further debug or fix the issue. labels Oct 3, 2023
@aanm
Member

aanm commented Oct 3, 2023

@Smana No, a simple re-run should be enough.

@aanm aanm added need-more-info More information is required to further debug or fix the issue. and removed info-completed The GH issue has received a reply from the author labels Oct 3, 2023
@SmaineTF1

I just ran the command again and I still get the same errors. I don't understand how it would work without changing anything.
I'm going to double-check again without running Envoy as a DaemonSet.

@aanm
Member

aanm commented Oct 9, 2023

@SmaineTF1 I asked only because we have had CI flakes in connectivity tests that connect to the outside world.
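
One way to tell a flake from a persistent failure is to repeat the run several times and compare exit codes. The following Python sketch illustrates that idea only. The helper names `run_once` and `classify` are hypothetical, and it assumes the command under test exits non-zero when a run fails, which is not confirmed in this thread.

```python
import subprocess
from typing import Callable, Sequence


def run_once(cmd: Sequence[str]) -> int:
    """Run the command once and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def classify(
    cmd: Sequence[str],
    attempts: int = 3,
    runner: Callable[[Sequence[str]], int] = run_once,
) -> str:
    """Repeat the command and label the overall outcome.

    Assumption: a non-zero exit code means the run failed.
    """
    codes = [runner(cmd) for _ in range(attempts)]
    failures = sum(1 for code in codes if code != 0)
    if failures == 0:
        return "passing"
    if failures == attempts:
        return "persistent failure"
    return "flaky"


# Example call, left commented out so the sketch has no side effects:
# print(classify(["cilium", "connectivity", "test"], attempts=3))
```

A result of "persistent failure" across several runs, as reported above, would point away from a transient external-connectivity flake, although it would not by itself identify the cause.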


github-actions bot commented Dec 9, 2023

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Dec 9, 2023

This issue has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 24, 2023

4 participants