services: don't wait for clustermesh to delete local stale backends #28745
Conversation
/test
Force-pushed from 6d41bcc to 3f26503.
/test
fe4dda7 ("services: prevent temporary connectivity loss on agent restart") modified the handling of restored backends to prevent possibly causing temporary connectivity disruption on agent restart if a service is either associated with multiple endpointslices (e.g., it has more than 100 backends, or is dual stack) or has backends spanning across multiple clusters (i.e., it is a global service). At a high level, we now keep a list of restored backends, which continue being merged with the ones we received an update for, until the bootstrap phase completes. At that point, we trigger an update for each service still associated with stale backends, so that they can be removed. One drawback associated with this approach, though, is that when clustermesh is enabled we currently wait for full synchronization from all remote clusters before triggering the removal of stale backends, regardless of whether the given service is global (i.e., possibly includes also remote backends) or not. One specific example in which such behavior is problematic relates to the clustermesh-apiserver Indeed, if it gets restarted at the same time of the agents (e.g., during an upgrade), the associated service might end up including both the address of the previous pod (which is now stale) and that of the new one, which is correct. When kvstoremesh is enabled, local agents connect to it through that service. In this case, there's a circular dependency: the agent may pick the stale backend, and the connection to etcd fails, which in turn prevents the synchronization from being started, and eventually complete to trigger the removal of the stale backend. Although this dependency eventually resolves as a different backend is picked by the service load-balancing algorithm, unnecessary delay is introduced (the same could also happen for remote agents connecting through a NodePort if KPR is enabled). To remove this dependency, let's perform a two-pass cleanup of stale backends: the first one as soon as we synchronize with Kubernetes, targeting non-global services only; the second triggered by full clustermesh synchronization, covering all remaining ones. Hence, non-global services can be fully functional also before the completion of the full clustermesh synchronization. It is worth mentioning that this fix applies to all non-global services, which can now converge faster also in case of large clustermeshes. Co-authored-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Refactor the SyncWithK8sFinished function to return the list of services with stale backends, which should be refreshed, rather than refreshing them directly. This makes the separation clearer, avoids having to pass the refresh function as a parameter, and prevents possible deadlocks due to incorrect mutex locking (given the interdependencies between the service subsystem and the service cache).

Suggested-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
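To illustrate the shape of this refactor, here is a hedged sketch: instead of invoking a refresh callback while holding the cache lock (risking lock-order inversions with the service subsystem), the function returns the affected services and the caller refreshes them after the lock is released. The types and the exact signature are assumptions for illustration, not Cilium's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

type ServiceID string

// cache is a hypothetical stand-in for the service cache.
type cache struct {
	mu    sync.Mutex
	stale map[ServiceID]bool
}

// SyncWithK8sFinished collects the services still holding stale backends
// and clears the bookkeeping under the lock; no callbacks are invoked
// while the mutex is held.
func (c *cache) SyncWithK8sFinished() []ServiceID {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []ServiceID
	for id, isStale := range c.stale {
		if isStale {
			out = append(out, id)
			delete(c.stale, id)
		}
	}
	return out
}

func main() {
	c := &cache{stale: map[ServiceID]bool{"default/foo": true}}
	// The caller refreshes outside the cache lock, so a refresh path that
	// re-enters the cache cannot deadlock on c.mu.
	for _, id := range c.SyncWithK8sFinished() {
		fmt.Println("refreshing", id) // stand-in for the actual refresh
	}
}
```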
Force-pushed from 3f26503 to 5f69e0c.
Rebased onto main to pick up the CI changes necessary to run the tests successfully.
/test
LGTM, thanks!
Reviews from all code owners are in, and tests are green except for one non-required test that is known to be flaky. Marking as ready to merge.
Not sure why MLH removed the ready-to-merge label. Adding it back.
fe4dda7 ("services: prevent temporary connectivity loss on agent restart") modified the handling of restored backends to prevent possibly causing temporary connectivity disruption on agent restart if a service is either associated with multiple endpointslices (e.g., it has more than 100 backends, or is dual stack) or has backends spanning across multiple clusters (i.e., it is a global service). At a high level, we now keep a list of restored backends, which continue being merged with the ones we received an update for, until the bootstrap phase completes. At that point, we trigger an update for each service still associated with stale backends, so that they can be removed.
One drawback associated with this approach, though, is that when clustermesh is enabled we currently wait for full synchronization from all remote clusters before triggering the removal of stale backends, regardless of whether the given service is global (i.e., possibly includes also remote backends) or not. One specific example in which such behavior is problematic relates to the clustermesh-apiserver. Indeed, if it gets restarted at the same time of the agents (e.g., during an upgrade), the associated service might end up including both the address of the previous pod (which is now stale) and that of the new one, which is correct. When kvstoremesh is enabled, local agents connect to it through that service. In this case, there's a circular dependency: the agent may pick the stale backend, and the connection to etcd fails, which in turn prevents the synchronization from being started, and eventually complete to trigger the removal of the stale backend. Although this dependency eventually resolves as a different backend is picked by the service load-balancing algorithm, unnecessary delay is introduced (the same could also happen for remote agents connecting through a NodePort if KPR is enabled).
To remove this dependency, let's perform a two-pass cleanup of stale backends: the first one as soon as we synchronize with Kubernetes, targeting non-global services only; the second triggered by full clustermesh synchronization, covering all remaining ones. Hence, non-global services can be fully functional also before the completion of the full clustermesh synchronization. It is worth mentioning that this fix applies to all non-global services, which can now converge faster also in case of large clustermeshes.
Marking for backport to v1.14, as the issue mentioned above was observed in CI and caused increased flakiness.