ci/multicluster: Test WireGuard in clustermesh #17453

Merged
merged 2 commits into from Oct 29, 2021

gandro
Member

@gandro gandro commented Sep 22, 2021

This adds an additional WireGuard test to the Multicluster / Cluster mesh (ci-multicluster) CI 3.0 workflow.

The new test enables WireGuard after the regular clustermesh test suite, restarts the Cilium pods in both clusters and then runs the regular intra-cluster connectivity check (with any tests disabled that rely on the L7 proxy).

  • L7 tests are disabled because the L7 proxy is not supported by WireGuard yet.
  • We also fall back on WireGuard user-mode, because GKE COS VM images do not ship with WireGuard kernel support. The mode matters little for this test suite, which is about ensuring that the control plane (i.e. cilium-agent and clustermesh-apiserver) propagates the WireGuard public keys correctly.
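
The sequence described above can be sketched as a list of cilium-cli invocations. This is a minimal sketch, not the workflow's actual steps. The helper `wireguard_test_plan` is hypothetical. The ConfigMap keys `enable-wireguard` and `enable-wireguard-userspace-fallback`, the use of `--context`, restarting only once after the last key is set, and running the connectivity check from the first context are all assumptions. The filter that disables the L7 tests is left to the caller.

```python
from typing import List


def wireguard_test_plan(contexts: List[str],
                        skip_l7_args: List[str]) -> List[List[str]]:
    """Build (but do not run) the commands for the WireGuard clustermesh test.

    Assumptions, none taken from the PR: the ConfigMap key names, the
    --context flag, and restarting once after the last key is set.
    """
    plan: List[List[str]] = []
    for ctx in contexts:
        base = ["cilium", "--context", ctx, "config", "set"]
        # User-mode fallback, since GKE COS images lack WireGuard kernel support.
        plan.append(base + ["enable-wireguard-userspace-fallback", "true"])
        # Enable WireGuard and restart the Cilium pods in this cluster.
        plan.append(base + ["enable-wireguard", "true", "--restart"])
    # Connectivity check with the L7 tests filtered out by the caller's args.
    plan.append(["cilium", "--context", contexts[0], "connectivity", "test"]
                + skip_l7_args)
    return plan


if __name__ == "__main__":
    # Context names and the "--test" pattern are placeholders.
    for cmd in wireguard_test_plan(["cluster1", "cluster2"], ["--test", "!l7"]):
        print(" ".join(cmd))
```

Each entry is an argument vector that could be passed to `subprocess.run(cmd, check=True)`. Building the plan separately from running it keeps the ordering easy to review.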

Now that we have a regression test for WireGuard+clustermesh, the documentation is also updated to mention that WireGuard may be used together with clustermesh.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 22, 2021
@gandro gandro added the release-note/ci This PR makes changes to the CI. label Sep 22, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 22, 2021
@gandro gandro force-pushed the pr/gandro/ci-wireguard-clustermesh branch 8 times, most recently from 25c6e9d to 7a66346 Compare September 23, 2021 13:28
@gandro gandro force-pushed the pr/gandro/ci-wireguard-clustermesh branch 2 times, most recently from 11950c8 to 1fedf99 Compare September 29, 2021 15:16
@gandro gandro force-pushed the pr/gandro/ci-wireguard-clustermesh branch from 1fedf99 to 62c6023 Compare October 25, 2021 14:39
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/ci-wireguard-clustermesh branch from 62c6023 to bba1a55 Compare October 25, 2021 14:40
@gandro
Member Author

gandro commented Oct 25, 2021

Successful test run of the Multicluster / Cluster mesh (ci-multicluster) workflow with the changes applied (based on bba1a553236c8ab4199234138e899e2223ecb8ba):
✔️ https://github.com/cilium/cilium/runs/3998382577

@gandro gandro force-pushed the pr/gandro/ci-wireguard-clustermesh branch from bba1a55 to f8db94f Compare October 25, 2021 15:40
@gandro gandro marked this pull request as ready for review October 25, 2021 15:54
@gandro gandro requested review from a team as code owners October 25, 2021 15:54
@gandro
Member Author

gandro commented Oct 25, 2021

/test

@gandro gandro added the area/CI-improvement Topic or proposal to improve the Continuous Integration workflow label Oct 27, 2021
Member

@nbusseneau nbusseneau left a comment

Code changes LGTM. I have two remarks:

  • The clustermesh workflow would now be the only one where we test WireGuard, as all others are testing IPsec. Is that an issue? Would there be value in testing both WireGuard and IPsec in all configurations?
  • Reconfiguring the Cilium agents via cilium config set is a neat trick, I like it 👍 On other workflows though, what we did was an uninstall followed by another install. Is there a possibility that the config way would behave differently from the reinstall method? If yes, I would argue we keep to the reinstall method to imitate users installing new clusters as closely as possible.

@gandro
Member Author

gandro commented Oct 27, 2021

Thanks for the reviews!

@nbusseneau

The clustermesh workflow would now be the only one where we test WireGuard, as all others are testing IPsec. Is that an issue? Would there be value in testing both WireGuard and IPsec in all configurations?

Generally speaking, yes. The main reason we have not done that yet is that I believe none of the managed K8s providers (GKE, EKS) have WireGuard kernel support enabled by default (AKS might be an exception, need to double check). Now that we have user-mode WireGuard, we could add it.

My main concern, however, is that it will add ~5 minutes of additional run-time to each test. I'm not sure this is worth it.

Reconfiguring the Cilium agents via cilium config set is a neat trick, I like it 👍 On other workflows though, what we did was an uninstall followed by another install. Is there a possibility that the config way would behave differently from the reinstall method? If yes, I would argue we keep to the reinstall method to imitate users installing new clusters as closely as possible.

I chose cilium config because I wanted to avoid having to set up clustermesh and hubble-relay again after the uninstall. Basically, cilium config saves me from doing uninstall + install + enable hubble + enable clustermesh + connect clustermesh, and saves around 2 minutes of run-time.

@nbusseneau
Member

I chose cilium config because I wanted to avoid having to set up clustermesh and hubble-relay again after the uninstall. Basically, cilium config saves me from doing uninstall + install + enable hubble + enable clustermesh + connect clustermesh, and saves around 2 minutes of run-time.

Yes, this is why I find it so neat 😄 If we think there is no difference in behaviour from the reinstall method, I would happily go down the path of doing the same on all workflows!

@gandro
Member Author

gandro commented Oct 27, 2021

Yes, this is why I find it so neat 😄 If we think there is no difference in behaviour from the reinstall method, I would happily go down the path of doing the same on all workflows!

🚀

In terms of behavior, there might be some differences. For IPsec encryption, we will need to use install, because install creates a K8s secret containing the key, something cilium config cannot do.
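
The trade-off discussed in this thread can be sketched as a small lookup: the reinstall path for IPsec (because install creates the key secret) and the single config step otherwise. This is only an illustration. The function `reconfigure_steps` is hypothetical, the subcommand names are the cilium-cli spellings of the steps listed earlier in the thread, and their arguments are deliberately omitted.

```python
from typing import List

# Steps named in the thread, as cilium-cli subcommands (arguments omitted).
REINSTALL_PATH: List[str] = [
    "cilium uninstall",
    "cilium install",
    "cilium hubble enable",
    "cilium clustermesh enable",
    "cilium clustermesh connect",
]
CONFIG_PATH: List[str] = ["cilium config set"]


def reconfigure_steps(encryption: str) -> List[str]:
    """Return the steps needed to switch on the given encryption mode."""
    if encryption == "ipsec":
        # install creates the K8s secret holding the key; config set cannot.
        return list(REINSTALL_PATH)
    if encryption == "wireguard":
        return list(CONFIG_PATH)
    raise ValueError(f"unknown encryption mode: {encryption!r}")


if __name__ == "__main__":
    saved = len(reconfigure_steps("ipsec")) - len(reconfigure_steps("wireguard"))
    print(saved)  # prints 4
```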

Another (albeit small) difference that I can think of is that Hubble Relay will lose its connection to the restarted cilium-agent instances with cilium config. Relay will reconnect to the new instances eventually, but if we want to ensure a standing connection, we might have to restart the Hubble Relay pod after cilium config set --restart.

Edit: On the second point, the impact will be low though: Hubble not being reconnected yet will only affect flow collection in the sysdump for now, as flow-validation is currently disabled.
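
If a standing Relay connection were wanted, the restart mentioned above could look roughly like the following. This is a sketch under assumptions: the helper `relay_restart_commands` is hypothetical, and the deployment name `hubble-relay` and namespace `kube-system` are assumed defaults that a given installation may not match. The commands are built but not executed.

```python
from typing import List


def relay_restart_commands(namespace: str = "kube-system",
                           deployment: str = "hubble-relay") -> List[List[str]]:
    """Commands to restart Hubble Relay and wait for it to come back.

    The default namespace and deployment name are assumptions.
    """
    target = f"deployment/{deployment}"
    return [
        # Trigger a rolling restart so Relay reconnects to the new agents.
        ["kubectl", "-n", namespace, "rollout", "restart", target],
        # Block until the restarted deployment is rolled out again.
        ["kubectl", "-n", namespace, "rollout", "status", target],
    ]


if __name__ == "__main__":
    for cmd in relay_restart_commands():
        print(" ".join(cmd))
```

To run the commands, each list can be passed to `subprocess.run(cmd, check=True)` after the agents have been restarted.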

@gandro
Member Author

gandro commented Oct 28, 2021

I'm marking this ready-to-merge, as this is a CI improvement.

The failures are unrelated, as this PR does not change any code. Running /test was not actually necessary on this PR, as it does not run the modified workflow here (see #17453 (comment) for that).

On whether to enable WireGuard in other CI 3.0 workflows as well: let's do that in a separate PR once we have more experience with how stably WireGuard runs in clustermesh (I don't expect any problems, but just to be on the safe side).

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 28, 2021
@nebril nebril merged commit 203fb2e into master Oct 29, 2021
@nebril nebril deleted the pr/gandro/ci-wireguard-clustermesh branch October 29, 2021 09:43
Labels
area/CI-improvement Topic or proposal to improve the Continuous Integration workflow ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.