Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add cluster mesh conformance tests with Kind #23496

Merged
merged 1 commit into from Mar 10, 2023

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Jan 31, 2023

This PR introduces the new kind-based conformance tests for cluster mesh, implementing a test matrix to validate (a subset of) the following combinations:

  • encryption: "none" | "ipsec" | "wireguard"
  • tunnel: "disabled" | "vxlan"
  • ipfamily: "ipv4-only" | "dual-stack" | ipv6-only"
  • kube-proxy: "iptables" | "kpr"

Additional context can be found in the enhancement proposal: #23322

Fixes: #23322
Fixes: #9994

Co-authored-by: Yutaro Hayakawa yutaro.hayakawa@isovalent.com
Signed-off-by: Marco Iorio marco.iorio@isovalent.com

test: add cluster mesh conformance tests with Kind 

@giorio94 giorio94 added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. labels Jan 31, 2023
@giorio94 giorio94 force-pushed the giorio94/pr/clustermesh-tests branch 4 times, most recently from c918bc3 to 041fa15 Compare February 3, 2023 14:27
@giorio94 giorio94 force-pushed the giorio94/pr/clustermesh-tests branch 7 times, most recently from 6bedf1e to 5c6574a Compare February 21, 2023 17:26
@giorio94 giorio94 added the dont-merge/blocked Another PR must be merged before this one. label Feb 21, 2023
@giorio94
Copy link
Member Author

I'm marking this ready for review, since all new tests passed correctly: https://github.com/cilium/cilium/actions/runs/4235266535/jobs/7358672515

I'm also adding the dont-merge label, since the temporary commit still needs to be dropped (I would postpone that after the reviews).

@reviewers I personally feel it could be appropriate to overwrite the old clusttermesh tests (based on GCP), rather than creating a new suite aside. Still, at the moment I've kept them separate for ease of review. Let me know your preference.

@giorio94 giorio94 marked this pull request as ready for review February 21, 2023 17:58
@giorio94 giorio94 requested review from a team as code owners February 21, 2023 17:58
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome ✔️

.github/workflows/conformance-clustermesh.yaml Outdated Show resolved Hide resolved
.github/workflows/conformance-clustermesh.yaml Outdated Show resolved Hide resolved
@giorio94 giorio94 force-pushed the giorio94/pr/clustermesh-tests branch 2 times, most recently from e1467e6 to 50dc7a5 Compare February 22, 2023 07:55
@sayboras sayboras added dont-merge/preview-only Only for preview or testing, don't merge it. and removed dont-merge/blocked Another PR must be merged before this one. labels Feb 22, 2023
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the second commit is removed.

Is the plan to remove the existing clustermesh workflow once this one is deemed stable?

@giorio94
Copy link
Member Author

Is the plan to remove the existing clustermesh workflow once this one is deemed stable?

Personally I would do so. I wonder if it is better to remove that in this PR or in a later one.

@pchaigno
Copy link
Member

I would remove in a subsequent PR after giving it a bit of time. New tests are often more flaky than existing one so maybe best to wait a bit and see.

@giorio94
Copy link
Member Author

giorio94 commented Feb 28, 2023

One final doubt before marking ready to merge: once merged, renovate will attempt to bump the Cilium CLI version to 1.13, but IPv6 tests would then fail. How should we handle that?

IIUC, the IPv6 tests were fixed by our PR cilium/cilium-cli#1414. If yes, we could tag cilium-cli v0.13.1 with that fix included.

The other option would be to temporarily remove the # renovate ... comment here:

https://github.com/cilium/cilium/pull/23496/files#diff-5ef4c4e5dce8b57b27fa33f08032d353da936eda2ca6e1307b89d83e361b4429R68

and add remember to add it back once cilium-cli is fixed.

@giorio94
Copy link
Member Author

IIUC, the IPv6 tests were fixed by our PR cilium/cilium-cli#1414. If yes, we could tag cilium-cli v0.13.1 with that fix included.

@tklauser IMHO Tagging cilium-cli v0.13.1 with the fix would be the best solution, if possible. Then, I could also drop the workarounds which are currently present due to failing tests in IPv6-only clusters. Otherwise I'll drop the renovate comment.

@tklauser
Copy link
Member

tklauser commented Mar 1, 2023

IIUC, the IPv6 tests were fixed by our PR cilium/cilium-cli#1414. If yes, we could tag cilium-cli v0.13.1 with that fix included.

@tklauser IMHO Tagging cilium-cli v0.13.1 with the fix would be the best solution, if possible. Then, I could also drop the workarounds which are currently present due to failing tests in IPv6-only clusters. Otherwise I'll drop the renovate comment.

We'll probably tag v0.13.1 later this week. Currently waiting for a few PRs to get reviewed and merged.

@giorio94 giorio94 force-pushed the giorio94/pr/clustermesh-tests branch from c68ad0c to 60d4e48 Compare March 3, 2023 15:23
@giorio94
Copy link
Member Author

giorio94 commented Mar 3, 2023

Last push bumped the Cilium CLI version to the newly released 1.13.1, and removed the associated temporary fixes.

@giorio94 giorio94 force-pushed the giorio94/pr/clustermesh-tests branch 2 times, most recently from 4b07d4c to fa8289d Compare March 6, 2023 09:43
This commit introduces the new kind-based conformance tests for
cluster mesh, implementing a test matrix to validate (a subset of)
the following combinations:
* encryption: "none" | "ipsec" | "wireguard"
* tunnel: "disabled" | "vxlan"
* ipfamily: "ipv4-only" | "dual-stack" | ipv6-only"
* kube-proxy: "iptables" | "kpr"

Additional context can be found in the enhancement proposal: #23322

Co-authored-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the giorio94/pr/clustermesh-tests branch from fa8289d to 8b7bbc8 Compare March 6, 2023 10:07
@giorio94
Copy link
Member Author

giorio94 commented Mar 6, 2023

Changes since last reviews:

  • I've disabled flow validation through hubble, for consistency with the other existing tests, since that almost doubled tests duration. The context flag is removed since redundant (it is also specified in the actual invocation).
--- a/.github/workflows/conformance-clustermesh.yaml
+++ b/.github/workflows/conformance-clustermesh.yaml
@@ -274,7 +274,8 @@ jobs:
         CLUSTERMESH_ENABLE_DEFAULTS="--apiserver-image=quay.io/${{ env.QUAY_ORGANIZATION_DEV }}/clustermesh-apiserver-ci \
           --apiserver-version=${SHA} --service-type=NodePort"
 
-        CONNECTIVITY_TEST_DEFAULTS="--context=${{ env.contextName1 }} \
+        CONNECTIVITY_TEST_DEFAULTS="--hubble=false \
+          --flow-validation=disabled \
           --multi-cluster=${{ env.contextName2 }} \
           --external-target=google.com \
           --collect-sysdump-on-failure"
  • I've explicitly configured the coredns upstream DNS servers since, in case of IPv4+IPV6 clusters, the default ones included also IPv6 upstreams, which are not reachable from the GH actions environments, and caused spurious failures.
--- a/.github/workflows/conformance-clustermesh.yaml
+++ b/.github/workflows/conformance-clustermesh.yaml
         config: ./.github/kind-config-cluster2.yaml
         wait: 0 # The control-plane never becomes ready, since no CNI is present
 
+    # Make sure that coredns uses IPv4-only upstream DNS servers also in case of clusters
+    # with IP family dual, since IPv6 ones are not reachable and cause spurious failures.
+    - name: Configure the coredns nameservers
+      if: matrix.ipfamily == 'dual'
+      run: |
+        COREDNS_PATCH="
+        spec:
+          template:
+            spec:
+              dnsPolicy: None
+              dnsConfig:
+                nameservers:
+                - 8.8.4.4
+                - 8.8.8.8
+        "
+
+        kubectl --context ${{ env.contextName1 }} patch deployment -n kube-system coredns --patch="$COREDNS_PATCH"
+        kubectl --context ${{ env.contextName2 }} patch deployment -n kube-system coredns --patch="$COREDNS_PATCH"
+
     - name: Wait for images to be available
       timeout-minutes: 10
       shell: bash

Link to successful run: https://github.com/cilium/cilium/actions/runs/4342212579

@tklauser, @pchaigno, @sayboras, @aanm, @YutaroHayakawa Please, have a final look if you'd like. From my side, then, this is ready to be marked as ready to merge.

@giorio94 giorio94 force-pushed the giorio94/pr/clustermesh-tests branch from 8b7bbc8 to 4b068e2 Compare March 6, 2023 10:35
@giorio94
Copy link
Member Author

giorio94 commented Mar 6, 2023

Ingress related tests failed due to known flake #23960. I'm not re-running them, since this PR does not introduce any code changes.

@sayboras
Copy link
Member

sayboras commented Mar 6, 2023

Ingress related tests failed due to known flake #23960. I'm not re-running them, since this PR does not introduce any code changes.

It should be fixed in master now, but we don't need to re-run it here.

Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM ✔️, this is good stuff 😲

@giorio94
Copy link
Member Author

giorio94 commented Mar 8, 2023

All required reviews are in, and test failures are unrelated, since this PR is only introducing new tests.
Marking as ready to merge.

@giorio94 giorio94 added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. and removed dont-merge/preview-only Only for preview or testing, don't merge it. labels Mar 8, 2023
@sayboras
Copy link
Member

Looks good to me, the issue with Ingress GHA failure is already addressed in master branch.

@borkmann borkmann merged commit e2a01fd into master Mar 10, 2023
@borkmann borkmann deleted the giorio94/pr/clustermesh-tests branch March 10, 2023 18:33
giorio94 added a commit to giorio94/cilium that referenced this pull request Apr 20, 2023
The new kind-based clustermesh workflow introduced in cilium#23496 has now
run for a while in parallel with the old one, and it seems comparably
stable while covering more scenarios. Hence, let's drop the old one.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
pchaigno pushed a commit that referenced this pull request Apr 20, 2023
The new kind-based clustermesh workflow introduced in #23496 has now
run for a while in parallel with the old one, and it seems comparably
stable while covering more scenarios. Hence, let's drop the old one.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
giorio94 added a commit that referenced this pull request Apr 21, 2023
This commit adds a copy of the new clustermesh conformance test
initially introduced in #23496 for each of the currently active stable
branches (i.e., v1.11, v1.12 and v1.13). Trigger phrases and Kubernetes
versions are updated accordingly.

In case of v1.11, the direct-routing IPv6 and IPv4+IPv6 tests have been
disabled since that configuration is currently affected a bug causing
incorrect masquerading of traffic directed to remote pods.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
pchaigno pushed a commit that referenced this pull request Apr 24, 2023
This commit adds a copy of the new clustermesh conformance test
initially introduced in #23496 for each of the currently active stable
branches (i.e., v1.11, v1.12 and v1.13). Trigger phrases and Kubernetes
versions are updated accordingly.

In case of v1.11, the direct-routing IPv6 and IPv4+IPv6 tests have been
disabled since that configuration is currently affected a bug causing
incorrect masquerading of traffic directed to remote pods.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@michi-covalent
Copy link
Contributor

michi-covalent commented May 2, 2023

do we want to backport this to stable branches? 👀

edit:

... never mind i see we already have workflows for stable branches ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
8 participants