Agent restart breaks some connectivity tests #32611

Open
2 of 3 tasks
jshr-w opened this issue May 17, 2024 · 1 comment
Labels
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • need-more-info: More information is required to further debug or fix the issue.
  • needs/triage: This issue requires triaging to establish severity and next steps.

Comments

@jshr-w
Contributor

jshr-w commented May 17, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Repro Steps: I set up a Cilium cluster (on Kind) and initialize the connectivity disruption test. Then I restart the Cilium agent and run the cilium connectivity test. The following tests fail:

  • client-egress-l7/pod-to-world/http-to-one.one.one.one
  • client-egress-l7-named-port/pod-to-world/http-to-one.one.one.one
  • pod-to-ingress-service/pod-to-ingress-service
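
For readers trying to reproduce this, the steps above might be sketched as follows. The report does not give the exact commands, so everything here is an assumption: the `--include-conn-disrupt-test` and `--conn-disrupt-test-setup` flags are taken from cilium-cli as used in the upstream CI workflows, and the `kube-system` namespace and `ds/cilium` DaemonSet name are the usual Helm defaults. The script only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess

# Assumed defaults: the report does not state the namespace or DaemonSet name.
NAMESPACE = "kube-system"
AGENT_DAEMONSET = "ds/cilium"


def repro_commands(namespace=NAMESPACE, daemonset=AGENT_DAEMONSET):
    """Return the assumed repro sequence as a list of argv lists."""
    return [
        # 1. Deploy the long-lived pods used by the connectivity disruption test.
        ["cilium", "connectivity", "test",
         "--include-conn-disrupt-test", "--conn-disrupt-test-setup"],
        # 2. Restart the Cilium agent and wait until it is ready again.
        ["kubectl", "-n", namespace, "rollout", "restart", daemonset],
        ["kubectl", "-n", namespace, "rollout", "status", daemonset],
        ["cilium", "status", "--wait"],
        # 3. Run the connectivity test, including the disruption check.
        ["cilium", "connectivity", "test", "--include-conn-disrupt-test"],
    ]


def run_repro(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in repro_commands():
        print("$ " + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_repro(dry_run=True)
```

Keeping the setup and the final test run as separate invocations matches the "initialize, restart, then test" order described above; whether the test suite expects to be driven this way is part of what the issue is asking.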

Other observations:

  • I have never seen these tests pass despite numerous runs, so I'm not sure if they are considered flakes.
  • The same testing setup works if I only restart the operator, so I assumed they should be passing (?)
  • Not sure if there's a configuration issue on my side. I was mainly using the default Helm values from the GitHub Action.
  • I looked into the client-egress-l7 test a bit, and it seems that the CiliumNetworkPolicy's http rule breaks external connectivity after a restart (if I remove those lines, the issue appears to go away). The HTTP request fails with a 403 due to the CNP.
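
To make the last observation concrete, here is a rough sketch of what "removing the http rule" means structurally. The policy below is illustrative only and is not the manifest shipped with the connectivity test: the policy name and selector labels are hypothetical, and the DNS rule that an FQDN policy normally needs is left out for brevity. The field names (`toFQDNs`, `toPorts`, `rules.http`) follow the CiliumNetworkPolicy schema.

```python
import copy

# Illustrative policy only: NOT the manifest used by the connectivity test.
# The name and the selector labels are hypothetical.
L7_POLICY = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumNetworkPolicy",
    "metadata": {"name": "example-client-egress-l7"},
    "spec": {
        "endpointSelector": {"matchLabels": {"kind": "client"}},
        "egress": [
            {
                "toFQDNs": [{"matchName": "one.one.one.one"}],
                "toPorts": [
                    {
                        "ports": [{"port": "80", "protocol": "TCP"}],
                        # The L7 part: only GET / is allowed through the proxy.
                        "rules": {"http": [{"method": "GET", "path": "/"}]},
                    }
                ],
            }
        ],
    },
}


def strip_http_rules(policy):
    """Return a copy of the policy with every egress http rule removed."""
    stripped = copy.deepcopy(policy)
    for rule in stripped["spec"].get("egress", []):
        for port_rule in rule.get("toPorts", []):
            rules = port_rule.get("rules", {})
            rules.pop("http", None)
            # Drop the now-empty "rules" key so the policy is plain L3/L4.
            if not rules:
                port_rule.pop("rules", None)
    return stripped
```

With the http rule stripped, the policy only restricts traffic at L3/L4, so requests are no longer subject to HTTP-level allow or deny decisions. That would be consistent with the 403 disappearing, though it does not explain why the L7 rule starts rejecting requests after an agent restart.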

Context: I was investigating migration from CEP -> CES on a cluster and testing whether connectivity breaks. However, this connectivity break seems to be independent of enabling CES. The steps to initialize a cluster that I was working with were basically the ones here (https://github.com/cilium/cilium/actions/runs/9120715263/job/25078600023).

Cilium Version

1.16.0

Kernel Version

Linux 6.5.0-1017-azure #17~22.04.1-Ubuntu SMP x86_64 GNU/Linux

Kubernetes Version

v1.29.2

Regression

No response

Sysdump

cilium-sysdump-conn.zip

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jshr-w jshr-w added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels May 17, 2024
@lmb
Contributor

lmb commented May 21, 2024

So you're splitting set up and execution of the tests somehow? What commands are you running?

Might just be that you're doing things in a way that the test suite doesn't expect.

@lmb lmb added the need-more-info More information is required to further debug or fix the issue. label May 21, 2024