Miscellaneous improvements to the clustermesh upgrade/downgrade test #31958

Merged (5 commits) on Apr 17, 2024

@giorio94 (Member) commented Apr 15, 2024

  • Slow down agent rollout to better highlight possible issues;
  • Hard-code IPAM mode and operator replicas to prevent changes due to defaulting;
  • Fix incorrectly named test to ensure that it is actually executed;
  • Do not wait for the Hubble Relay images, as Hubble Relay is not deployed in this workflow;
  • Enable Hubble and configure medium monitor aggregation, to simplify troubleshooting issues.

The goal is to slow down the rollout process, to better highlight
possible connection disruptions occurring in the meantime. At the same
time, this also reduces the overall CPU load caused by datapath
recompilation, which is another possible cause of connection
disruption flakiness.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The default IPAM mode is cluster-pool, which gets automatically
overwritten by the Cilium CLI to kubernetes when running on kind.
However, the default helm value gets restored upon upgrade due to
--reset-values, causing confusion and possible issues. Hence, let's
explicitly configure it to kubernetes, to prevent changes.

Similarly, let's configure a single replica for the operator.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
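
The effect of `--reset-values` described in the commit message above can be modeled with a minimal Python sketch. It is only an illustration: `effective_values` is a hypothetical helper, not part of Helm or the Cilium CLI, and the dotted keys `ipam.mode` and `operator.replicas` are assumed to be the Helm value names matching the settings named in the commit message.

```python
# Chart default for the IPAM mode, as stated in the commit message.
CHART_DEFAULTS = {"ipam.mode": "cluster-pool"}


def effective_values(chart_defaults, overrides):
    """Model an install or upgrade with --reset-values (hypothetical helper).

    Values from the previous release are discarded, so the result is just
    the chart defaults plus whatever overrides are passed this time.
    """
    return {**chart_defaults, **overrides}


# Install on kind: the CLI overrides the IPAM mode to "kubernetes".
installed = effective_values(CHART_DEFAULTS, {"ipam.mode": "kubernetes"})

# Upgrade without pinning: the chart default silently comes back.
unpinned = effective_values(CHART_DEFAULTS, {})

# Upgrade with the values pinned explicitly: nothing changes.
pinned = effective_values(
    CHART_DEFAULTS,
    {"ipam.mode": "kubernetes", "operator.replicas": 1},
)

print(installed["ipam.mode"], unpinned["ipam.mode"], pinned["ipam.mode"])
```

In this model the unpinned upgrade flips the IPAM mode back to the chart default, which is the change the commit avoids by pinning the value.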
Fix the incorrectly named test, so that it actually gets executed.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 added the area/CI, area/clustermesh, release-note/ci, needs-backport/1.13, needs-backport/1.14 and needs-backport/1.15 labels Apr 15, 2024
@giorio94 requested review from a team as code owners April 15, 2024 07:38
@giorio94 (Member, Author) commented:

/test

Hubble relay is not deployed in this workflow, hence it doesn't make
sense to wait for the image availability.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Enable Hubble, as it simplifies troubleshooting possible connection
disruptions. However, let's configure monitor aggregation to medium
(i.e., the maximum, and default value) to avoid the performance penalty
due to the relatively high traffic load.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
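
As a rough illustration, the two settings from the commit message above could be rendered into a single Helm `--set` argument as follows. The helper `helm_set_args` is hypothetical, and the keys `hubble.enabled` and `bpf.monitorAggregation` are assumed to be the chart values matching the commit message; the actual workflow may pass them differently.

```python
def helm_set_args(values):
    """Render a flat mapping of dotted keys into a Helm --set argument.

    Hypothetical helper for illustration only: booleans are lowercased
    to the "true"/"false" spelling that Helm parses as booleans.
    """
    def render(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    pairs = [f"{key}={render(value)}" for key, value in values.items()]
    return ["--set", ",".join(pairs)]


# Assumed value names for enabling Hubble with medium monitor aggregation.
TROUBLESHOOTING_VALUES = {
    "hubble.enabled": True,
    "bpf.monitorAggregation": "medium",
}

print(" ".join(helm_set_args(TROUBLESHOOTING_VALUES)))
```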
@giorio94 (Member, Author) commented:

/test

@giorio94 (Member, Author) commented:

Sorry @viktor-kurchenko, I've pushed a couple of additional minor changes. Could you please take another look?

@viktor-kurchenko (Contributor) left a comment:

LGTM.

@maintainer-s-little-helper (bot) added the ready-to-merge label Apr 17, 2024
@lmb added this pull request to the merge queue Apr 17, 2024
Merged via the queue into main with commit 0c211e1 Apr 17, 2024
70 checks passed
@lmb deleted the pr/giorio94/main/gha-clustermesh-upgrade-misc-improvements branch April 17, 2024 13:16
@giorio94 mentioned this pull request Apr 18, 2024 (6 tasks)
@giorio94 added the backport-pending/1.13 label Apr 18, 2024
@jschwinger233 mentioned this pull request Apr 22, 2024 (9 tasks)
@jschwinger233 added the backport-pending/1.15 label and removed the needs-backport/1.15 label Apr 22, 2024
@jschwinger233 mentioned this pull request Apr 22, 2024 (5 tasks)
@jschwinger233 added the backport-pending/1.14 label and removed the needs-backport/1.14 label Apr 22, 2024
@github-actions (bot) added the backport-done/1.13, backport-done/1.14 and backport-done/1.15 labels and removed the backport-pending/1.13, backport-pending/1.14 and backport-pending/1.15 labels Apr 22, 2024
Labels: area/CI, area/clustermesh, backport-done/1.13, backport-done/1.14, backport-done/1.15, ready-to-merge, release-note/ci
5 participants