Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s ingress & gateway api: fix unintentional deletion of shared envoy cluster resource #28896

Merged

Conversation

mhofstetter
Copy link
Member

@mhofstetter mhofstetter commented Oct 31, 2023

Currently, the Envoy Cluster resources in the CiliumEnvoyConfig aren't qualified with the namespace and name of the CiliumEnvoyConfig itself.

This leads to issues when updating a CiliumEnvoyConfig due to changes in the K8s Ingress & Gateway API resources. If multiple resources are referencing the same K8s Service, the Cluster resource gets deleted if one of these K8s Ingress or Gateway resources gets deleted - and therefore breaks the other resources.

With this PR, the name of the Cluster resource and their references gets qualified like the rest of the resources. For the EDS lookup of the endpoint addresses, the field service_name of the Cluster is used. (Currently an implicit 1:1 mapping of the names -> clusters name is used for the lookup)

See Envoy Docs for further information about the property service_name: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto#config-cluster-v3-cluster-edsclusterconfig

@mhofstetter mhofstetter added kind/bug This is a bug in the Cilium logic. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. release-note/bug This PR fixes an issue in a previous release of Cilium. area/servicemesh GH issues or PRs regarding servicemesh feature/k8s-gateway-api feature/k8s-ingress labels Oct 31, 2023
@mhofstetter mhofstetter changed the title k8s ingress & gateway api: qualify envoy clusters and their references k8s ingress & gateway api: fix unintentional deletion of shared envoy cluster resource Oct 31, 2023
@mhofstetter
Copy link
Member Author

/test

@mhofstetter mhofstetter force-pushed the pr/mhofstetter/shared-envoy-cluster branch from ce19e9a to eaab947 Compare October 31, 2023 14:54
@mhofstetter
Copy link
Member Author

/ci-gateway-api

Currently, the Clusters resources in the CiliumEnvoyConfig aren't qualified
with the namespace and name of the CEC itself.

This leads to issues when updating the CiliumEnvoyConfigs due to changes in
the K8s Ingress & Gateway API resources. If multiple resources are referencing
the same K8s Service, the Cluster resource gets deleted if one of these K8s
Ingress or Gateway resources gets deleted - and therefore breaks the other
resources.

With this commit, the name of the Cluster resource and their references gets
qualified like the rest of the resources. For the EDS lookup of the endpoint addresses,
the field `service_name` of the Cluster is used.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/shared-envoy-cluster branch from eaab947 to d51730c Compare October 31, 2023 17:46
@mhofstetter
Copy link
Member Author

/test

@mhofstetter mhofstetter marked this pull request as ready for review November 1, 2023 08:55
@mhofstetter mhofstetter requested review from a team as code owners November 1, 2023 08:55
Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change on ciliumenvoyconfig.go is great, I had similar support for TcpProxy cluster name qualification also in my development branch.

Correct me if I'm wrong, but the EDSClusterConfig.ServiceName allows clusters with different names to be associated with endpoints/backends that have a ClusterName field matching the ServiceName, and that's why there is no change in the way the service backends are synced to Envoy?

Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have similar question like above.

The changes look good in general.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 3, 2023
@sayboras sayboras added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Nov 3, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.4 Nov 3, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.9 Nov 3, 2023
@mhofstetter mhofstetter added dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Nov 3, 2023
@mhofstetter
Copy link
Member Author

mhofstetter commented Nov 3, 2023

Correct me if I'm wrong, but the EDSClusterConfig.ServiceName allows clusters with different names to be associated with endpoints/backends that have a ClusterName field matching the ServiceName, and that's why there is no change in the way the service backends are synced to Envoy?

Exactly. Until now we had a 1:1 relationship between Envoy Clusters and the Endpoints. The default Endpoint lookup happens with the name of the cluster itself. By using the EDSClusterConfig.ServiceName we have the possibility to qualify the Cluster too (and have a Cluster per Ingress / Gateway that is referencing a K8s Service) and referencing the corresponding Envoy Endpoint via ServiceName (n:1 relationship).

This way garbage collection is way easier - without the need to count references.

@mhofstetter mhofstetter added affects/v1.13 This issue affects v1.13 branch and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Nov 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Needs backport from main in 1.13.9 Nov 6, 2023
@jrajahalme jrajahalme added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. and removed dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. labels Nov 6, 2023
@jrajahalme jrajahalme merged commit e72677d into cilium:main Nov 6, 2023
62 checks passed
@mhofstetter mhofstetter deleted the pr/mhofstetter/shared-envoy-cluster branch November 6, 2023 16:33
@mhofstetter mhofstetter added affects/v1.14 This issue affects v1.14 branch and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Nov 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Needs backport from main in 1.14.4 Nov 7, 2023
@mhofstetter mhofstetter added needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch and removed affects/v1.14 This issue affects v1.14 branch labels Nov 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.4 Nov 7, 2023
@thorn3r thorn3r added this to Needs backport from main in 1.14.5 Nov 9, 2023
@thorn3r thorn3r removed this from Needs backport from main in 1.14.4 Nov 9, 2023
@gandro gandro added the backport/author The backport will be carried out by the author of the PR. label Nov 15, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.4 Nov 15, 2023
@gandro
Copy link
Member

gandro commented Nov 15, 2023

This also didn't apply cleanly on v1.14, marking "backport/author"

@gandro gandro mentioned this pull request Nov 15, 2023
6 tasks
@aanm aanm added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Nov 21, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport done to v1.14 in 1.14.4 Nov 21, 2023
@aanm aanm moved this from Needs backport from main to Backport done to v1.14 in 1.14.5 Nov 21, 2023
@aanm aanm removed this from Backport done to v1.14 in 1.14.4 Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. area/servicemesh GH issues or PRs regarding servicemesh backport/author The backport will be carried out by the author of the PR. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. feature/k8s-gateway-api feature/k8s-ingress kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.14.5
Backport done to v1.14
Development

Successfully merging this pull request may close these issues.

None yet

5 participants