Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workflows: add external workload conformance test #16789

Merged
merged 1 commit into from Jul 29, 2021

Conversation

nbusseneau
Copy link
Member

This was ported and adapted from cilium-cli.

Supersedes #16382.

@nbusseneau nbusseneau added area/CI-improvement Topic or proposal to improve the Continuous Integration workflow release-note/ci This PR makes changes to the CI. labels Jul 5, 2021
@nbusseneau nbusseneau requested a review from nebril July 5, 2021 16:05
@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch 5 times, most recently from 43521d0 to 66d8a27 Compare July 12, 2021 14:11
@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch 6 times, most recently from 37943cb to 15d2eab Compare July 19, 2021 17:04
@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch 7 times, most recently from d8db0aa to af700e7 Compare July 20, 2021 20:52
@nbusseneau
Copy link
Member Author

The changes in this PR work using the exact same commits on top of v1.10.3: https://github.com/cilium/cilium/actions/runs/1050340820 (failure is due to flow validation not working, but I expected that. It did run the connectivity test, no VM <-> cluster connectivity issue).

This means we have a regression in master with external workloads.

@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch 3 times, most recently from 27b1bcc to 564845e Compare July 22, 2021 16:08
@nbusseneau
Copy link
Member Author

Regression comes from commit 0681343 (PR #15632):

nbusseneau added a commit to nbusseneau/cilium that referenced this pull request Jul 23, 2021
This reverts commit 0681343, initially
from PR cilium#15632.

Note: we revert the initial changes as-is, but bump up
`CustomResourceDefinitionSchemaVersion` to `1.23.4` since the revert
itself is a change of the CRD schema.

Rationale:

The commit introduced a regression in clustermesh connectivity with
external workloads.

Identifying the regression:

We initially worked on adding external workloads testing to `cilium`
with a new workflow running a `cilium-cli` connectivity test on a GKE
cluster / GCP VM clustermesh in PR cilium#16789, but were consistently hitting
a connectivity issue soon after the GCP VM joined the clustermesh with
the GCP VM suddenly being unable to communicate with the cluster.

Example of failing runs:
- https://github.com/cilium/cilium/actions/runs/1050302990
- https://github.com/cilium/cilium/actions/runs/1056640820

This failure was not happening on the `cilium-cli` repository with a
similar workflow, with the difference being that `cilium-cli` uses a
stable version of Cilium (1.10.3 at the moment).

To check, we cherry-picked the changes from PR cilium#16789 on top of tag
`v1.10.3` in a secondary PR cilium#16946, and verified that it worked. This
indicated the changes from PR cilium#16789 are sound and a regression in
external workloads behavior had happened on `cilium`'s `master` branch.

We bisected / cherry-picked the workflow on top of older commits in
order to find the regression, still via secondary PR cilium#16946.

We confirm the regression is due to
0681343 by using these 3 scenarios:

1. Running `cilium-cli` connectivity test right on top of
  0681343:
  - Link: https://github.com/cilium/cilium/commits/db3e9108b1c9020d9cd0549b85aae552fb0bb7ba
  - Log:
      db3e9108b DO NOT MERGE
      da0fbd714 workflows: add external workload conformance test
      783dc9a62 k8s: Fix External Workloads service access
      0681343 Removes CEP subresource.
      3a55d74 fix warning log for list IPV6 address: move IPV4 to IPv6

2. Running `cilium-cli` connectivity test on top of previous `master`
  commit:
  - Link: https://github.com/cilium/cilium/commits/746f9062dc4fe081e1a6d03921fe9c2abe58acb7
  - Log:
      746f9062d DO NOT MERGE
      6cb37b377 workflows: add external workload conformance test
      627d025e8 k8s: Fix External Workloads service access
      3a55d74 fix warning log for list IPV6 address: move IPV4 to IPv6

3. Running `cilium-cli` connectivity test on top of current `master`
  with 0681343 reverted:
- Link: https://github.com/cilium/cilium/commits/6bea7087a41c149bfa1781ab7b4a6be88764ec45
- Log:
    6bea7087a DO NOT MERGE
    d6a769efc workflows: add external workload conformance test
    55da4e5f0 Revert "Removes CEP subresource."
    189cf7f contrib: Improve release script guard rails

Note: since the regression comes from a commit anterior to the external
workloads compatibility fix for `cilium-cli` with Cilium 1.10, added in
929c28f (PR cilium#16662), backporting only
the GitHub workflow while searching for the regression was insufficient
and we also had cherry-pick the fix in scenarios 1 and 2.

Results:

1. Failing: https://github.com/cilium/cilium/actions/runs/1059788504
2. Successful: https://github.com/cilium/cilium/actions/runs/1059727108
3. Successful: https://github.com/cilium/cilium/actions/runs/1059836909

This failure is consistent, and is the same failure as the one happening
in runs of initial workflow PR cilium#16789. This confirms the regression.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
@nbusseneau
Copy link
Member Author

Opened #16984 to revert regression commit, this PR is now blocked until #16984 is merged.

@nbusseneau nbusseneau added the dont-merge/blocked Another PR must be merged before this one. label Jul 23, 2021
@nbusseneau
Copy link
Member Author

Following discussion in #16984, we found the issue and opened a fix at #16986. This PR is now blocked until #16986 is merged.

@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch from 564845e to 1bec39a Compare July 28, 2021 09:48
@nbusseneau nbusseneau marked this pull request as ready for review July 28, 2021 09:49
@nbusseneau nbusseneau requested review from a team as code owners July 28, 2021 09:49
@nbusseneau nbusseneau requested a review from pchaigno July 28, 2021 09:49
@nbusseneau
Copy link
Member Author

nbusseneau commented Jul 28, 2021

#16986 is merged. Rebased and marked as ready for review.

@nbusseneau nbusseneau removed the dont-merge/blocked Another PR must be merged before this one. label Jul 28, 2021
@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch 2 times, most recently from 71615d0 to ee7a8ca Compare July 28, 2021 11:44
@nbusseneau
Copy link
Member Author

nbusseneau commented Jul 28, 2021

Link to test run of workflow changes: https://github.com/cilium/cilium/actions/runs/1074746971
No need to run rest of CI since it's a new workflow. Removing temporary test commit.

This can be marked as ready-to-merge once reviews are in :)

@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch from ee7a8ca to c8cfbed Compare July 28, 2021 12:05
This was ported and adapted from `cilium-cli`.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
@nbusseneau nbusseneau force-pushed the pr/workflow-external-workloads branch from c8cfbed to 72e8be7 Compare July 28, 2021 13:21
@nbusseneau nbusseneau added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 28, 2021
@nebril nebril merged commit 5726e9f into master Jul 29, 2021
@nebril nebril deleted the pr/workflow-external-workloads branch July 29, 2021 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI-improvement Topic or proposal to improve the Continuous Integration workflow ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants