Removes CEP subresource. #15632
@aanm, as we discussed, this should NOT go into 1.10: we need users to upgrade to part 1 first with 1.10. Is there a way to do that except for manual monitoring?
LGTM. In terms of signaling that this PR should only be merged after the release, we have a label for that. Applying.
CI should be fixed by PR #15731, will rebase once that is merged.
test-me-please
retest-net-next
test-only --focus="K8sServicesTest.*Checks service across nodes Tests NodePort BPF Tests with direct routing Tests LoadBalancer" --kernel_version=net-next
Reran the flaky test (failure in k8s-1.16-kernel-netnext (test-1.16-netnext)) in Focused-Test-Run, it passes, so I think all CI is "green" now, PTAL.
Rebased with current master
test-me-please
Shall we remove
@Weil0ng can you rebase against master so that we can run a full CI in this PR?
This is part 2/2 of trimming CEP subresource to improve scalability. Part 1/2 is PR cilium#15230. This will bump cilium CRD schema version and is only backward-compatible with agent that has part 1/2. Signed-off-by: Weilong Cui <cuiwl@google.com>
test-me-please
test-1.21-4.9 (previous build was gone)
It seems like some vbox/ssh failure? odd...
test-1.21-4.9
The CI hit a flake that was fixed by #16381 so merging.
This reverts commit 0681343, initially from PR cilium#15632.

Note: we revert the initial changes as-is, but bump up `CustomResourceDefinitionSchemaVersion` to `1.23.4` since the revert itself is a change of the CRD schema.

Rationale:

The commit introduced a regression in clustermesh connectivity with external workloads.

Identifying the regression:

We initially worked on adding external workloads testing to `cilium` with a new workflow running a `cilium-cli` connectivity test on a GKE cluster / GCP VM clustermesh in PR cilium#16789, but were consistently hitting a connectivity issue soon after the GCP VM joined the clustermesh, with the GCP VM suddenly being unable to communicate with the cluster.

Example of failing runs:
- https://github.com/cilium/cilium/actions/runs/1050302990
- https://github.com/cilium/cilium/actions/runs/1056640820

This failure was not happening on the `cilium-cli` repository with a similar workflow, with the difference being that `cilium-cli` uses a stable version of Cilium (1.10.3 at the moment).

To check, we cherry-picked the changes from PR cilium#16789 on top of tag `v1.10.3` in a secondary PR cilium#16946, and verified that it worked. This indicated the changes from PR cilium#16789 are sound and a regression in external workloads behavior had happened on `cilium`'s `master` branch.

We bisected / cherry-picked the workflow on top of older commits in order to find the regression, still via secondary PR cilium#16946.

We confirm the regression is due to 0681343 by using these 3 scenarios:

1. Running `cilium-cli` connectivity test right on top of 0681343:
   - Link: https://github.com/cilium/cilium/commits/db3e9108b1c9020d9cd0549b85aae552fb0bb7ba
   - Log:
       db3e9108b DO NOT MERGE
       da0fbd714 workflows: add external workload conformance test
       783dc9a62 k8s: Fix External Workloads service access
       0681343 Removes CEP subresource.
       3a55d74 fix warning log for list IPV6 address: move IPV4 to IPv6
2. Running `cilium-cli` connectivity test on top of previous `master` commit:
   - Link: https://github.com/cilium/cilium/commits/746f9062dc4fe081e1a6d03921fe9c2abe58acb7
   - Log:
       746f9062d DO NOT MERGE
       6cb37b377 workflows: add external workload conformance test
       627d025e8 k8s: Fix External Workloads service access
       3a55d74 fix warning log for list IPV6 address: move IPV4 to IPv6
3. Running `cilium-cli` connectivity test on top of current `master` with 0681343 reverted:
   - Link: https://github.com/cilium/cilium/commits/6bea7087a41c149bfa1781ab7b4a6be88764ec45
   - Log:
       6bea7087a DO NOT MERGE
       d6a769efc workflows: add external workload conformance test
       55da4e5f0 Revert "Removes CEP subresource."
       189cf7f contrib: Improve release script guard rails

Note: since the regression comes from a commit anterior to the external workloads compatibility fix for `cilium-cli` with Cilium 1.10, added in 929c28f (PR cilium#16662), backporting only the GitHub workflow while searching for the regression was insufficient and we also had to cherry-pick the fix in scenarios 1 and 2.

Results:

1. Failing: https://github.com/cilium/cilium/actions/runs/1059788504
2. Successful: https://github.com/cilium/cilium/actions/runs/1059727108
3. Successful: https://github.com/cilium/cilium/actions/runs/1059836909

This failure is consistent, and is the same failure as the one happening in runs of initial workflow PR cilium#16789. This confirms the regression.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
In 0681343 (from PR cilium#15632), we changed CEP CRD schema and removed the `status` subresource. This broke clustermesh logic as it was still trying to update CEP using the now removed `status` subresource. In particular, this resulted in a loss of connectivity in clustermeshes with external workloads: the VM could initially join the cluster but would immediately lose connectivity after failing to update the CEP resource (see cilium#16984 for full context). We change the clustermesh logic to adhere to the new CEP update CRD schema. Fixes: 0681343 Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
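The failure mode described in this commit message can be illustrated with a small, self-contained Go sketch. It does not use Cilium or client-go code: `fakeAPI`, `Endpoint`, and both error values are hypothetical stand-ins. It only models the general Kubernetes CRD behaviour that writes to `/status` are rejected once the CRD no longer declares a `status` subresource, while a regular update then persists the status field.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors standing in for API server "not found" responses.
var (
	ErrSubresourceNotFound = errors.New("status subresource not found")
	ErrObjectNotFound      = errors.New("object not found")
)

// Endpoint is a hypothetical, trimmed-down stand-in for a CEP object.
type Endpoint struct {
	Name   string
	Status string
}

// fakeAPI is a hypothetical in-memory API server for a single resource kind.
type fakeAPI struct {
	hasStatusSubresource bool
	store                map[string]Endpoint
}

// Update writes the whole object. While the status subresource is enabled,
// status changes sent on this path are ignored; once it is removed, the
// status is stored like any other field.
func (a *fakeAPI) Update(ep Endpoint) error {
	old, ok := a.store[ep.Name]
	if !ok {
		return ErrObjectNotFound
	}
	if a.hasStatusSubresource {
		ep.Status = old.Status
	}
	a.store[ep.Name] = ep
	return nil
}

// UpdateStatus writes only the status, and fails when the CRD does not
// declare a status subresource.
func (a *fakeAPI) UpdateStatus(ep Endpoint) error {
	if !a.hasStatusSubresource {
		return ErrSubresourceNotFound
	}
	old, ok := a.store[ep.Name]
	if !ok {
		return ErrObjectNotFound
	}
	old.Status = ep.Status
	a.store[ep.Name] = old
	return nil
}

func main() {
	// New schema: no status subresource.
	api := &fakeAPI{store: map[string]Endpoint{"vm-1": {Name: "vm-1"}}}
	ep := Endpoint{Name: "vm-1", Status: "ready"}

	fmt.Println("UpdateStatus error:", api.UpdateStatus(ep)) // old code path
	fmt.Println("Update error:", api.Update(ep))             // new code path
	fmt.Println("stored status:", api.store["vm-1"].Status)
}
```

In this model, a caller that keeps using `UpdateStatus` after the schema change never gets its status persisted, which matches the symptom reported above: the VM joins, then fails to update its CEP.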
This is part 2/2 of trimming the CEP subresource to improve scalability.
Part 1/2 is PR #15230.
This will bump the cilium CRD schema version and is only backward-compatible
with agents that have part 1/2.
Signed-off-by: Weilong Cui <cuiwl@google.com>
Fixes: #15153
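
Since both this PR and its later revert rely on bumping the CRD schema version, here is a minimal Go sketch of how a `major.minor.patch` schema version could be compared to decide whether an installed CRD is outdated. It is an illustration under assumptions, not Cilium's actual logic: `parseSchemaVersion` and `needsCRDUpdate` are hypothetical helpers, and the only version taken from this thread is `1.23.4`; the other value is made up for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSchemaVersion splits a "major.minor.patch" string into three
// non-negative integers. Hypothetical helper, not part of Cilium.
func parseSchemaVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid schema version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid schema version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// needsCRDUpdate reports whether the installed schema version is strictly
// older than the wanted one, comparing components numerically.
func needsCRDUpdate(installed, wanted string) (bool, error) {
	a, err := parseSchemaVersion(installed)
	if err != nil {
		return false, err
	}
	b, err := parseSchemaVersion(wanted)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i], nil
		}
	}
	return false, nil // equal versions: nothing to do
}

func main() {
	// "1.23.3" is an illustrative installed version, not taken from the thread.
	update, err := needsCRDUpdate("1.23.3", "1.23.4")
	fmt.Println(update, err)
}
```

Comparing components as integers rather than as strings matters once any component reaches two digits: under this scheme `1.23.10` is newer than `1.23.4`, whereas a plain string comparison would order them the other way.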