Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Set hubble.relay.retryTimeout=5s #32066

Merged
merged 1 commit into from
Apr 22, 2024
Merged

Conversation

chancez
Copy link
Contributor

@chancez chancez commented Apr 18, 2024

I noticed CI seemed to flake on hubble-relay being ready. Based on the logs it was due to it being unable to connect to its peers.

Based on the sysdump the endpoints did eventually come up, so I think it just needs to retry more often, and the default is 30s so lower it to 5s in CI.

An example of such a CI run: https://github.com/cilium/cilium/actions/runs/8729875690/job/23952676116?pr=31973.

I've added some temporary commits to test the changes, which I'll remove after I verify the correct options are being used.

I noticed CI seemed to flake on hubble-relay being ready.
Based on the logs it was due to it being unable to connect to it's
peers.

Based on the sysdump the endpoints did eventually come up, so I think it
just needs to retry more often, and the default is 30s so lower it to 5s
in CI.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
@chancez chancez added the release-note/ci This PR makes changes to the CI. label Apr 18, 2024
@chancez chancez self-assigned this Apr 18, 2024
@chancez
Copy link
Contributor Author

chancez commented Apr 18, 2024

Run cilium install --chart-directory=install/kubernetes/cilium --helm-set=debug.enabled=true --helm-set=bpf.monitorAggregation=none --helm-set=dnsProxy.proxyResponseMaxDelay=250ms --helm-set=hubble.relay.retryTimeout=5s --helm-set=image.repository=quay.io/cilium/cilium-ci --helm-set=image.useDigest=false --helm-set=image.tag=a49b5e5d68d4e233ba37117eb7b3e0f0ac1a0a5a --helm-set=operator.image.repository=quay.io/cilium/operator --helm-set=operator.image.suffix=-ci --helm-set=operator.image.tag=a49b5e5d68d4e233ba37117eb7b3e0f0ac1a0a5a --helm-set=operator.image.useDigest=false --helm-set=clustermesh.apiserver.image.repository=quay.io/cilium/clustermesh-apiserver-ci --helm-set=clustermesh.apiserver.image.tag=a49b5e5d68d4e233ba37117eb7b3e0f0ac1a0a5a --helm-set=clustermesh.apiserver.image.useDigest=false --helm-set=hubble.relay.image.repository=quay.io/cilium/hubble-relay-ci --helm-set=hubble.relay.image.tag=a49b5e5d68d4e233ba37117eb7b3e0f0ac1a0a5a --helm-set=hubble.relay.image.useDigest=false --helm-set=hubble.relay.enabled=true --helm-set=cni.chainingMode=portmap --helm-set-string=kubeProxyReplacement=true --helm-set=loadBalancer.l7.backend=envoy --helm-set=tls.secretsBackend=k8s --helm-set=envoy.enabled=false --wait=false

I can see it specified the right option but the test succeeded and there's no sysdumps; so I can't actually confirm it was added to the configmap, but it should be.

@chancez
Copy link
Contributor Author

chancez commented Apr 18, 2024

Removing the tmp commit so this can be marked as ready for review.

@chancez chancez marked this pull request as ready for review April 18, 2024 21:42
@chancez chancez requested review from a team as code owners April 18, 2024 21:42
@chancez chancez requested a review from brlbil April 18, 2024 21:42
@chancez
Copy link
Contributor Author

chancez commented Apr 18, 2024

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 22, 2024
@julianwiedmann julianwiedmann added this pull request to the merge queue Apr 22, 2024
@julianwiedmann
Copy link
Member

julianwiedmann commented Apr 22, 2024

@chancez should we also backport to stable, to prevent similar CI flakes there?

Merged via the queue into main with commit 2428b08 Apr 22, 2024
263 checks passed
@julianwiedmann julianwiedmann deleted the pr/chancez/relay_retries branch April 22, 2024 06:28
@chancez
Copy link
Contributor Author

chancez commented Apr 22, 2024

@julianwiedmann seems reasonable and easy? Do we just apply the necessary labels for that?

@julianwiedmann julianwiedmann added the needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch label Apr 23, 2024
@julianwiedmann
Copy link
Member

@julianwiedmann seems reasonable and easy? Do we just apply the necessary labels for that?

yep. Looks like the helm-default action doesn't exist in older releases, so leaving it up to you whether we want manual backports for those as well :).

@gandro gandro mentioned this pull request Apr 29, 2024
18 tasks
@gandro gandro added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Apr 29, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants