
services: don't wait for clustermesh to delete local stale backends #28745

Merged

Conversation

giorio94
Member

fe4dda7 ("services: prevent temporary connectivity loss on agent restart") modified the handling of restored backends to prevent possibly causing temporary connectivity disruption on agent restart if a service is either associated with multiple endpointslices (e.g., it has more than 100 backends, or is dual stack) or has backends spanning across multiple clusters (i.e., it is a global service). At a high level, we now keep a list of restored backends, which continue being merged with the ones we received an update for, until the bootstrap phase completes. At that point, we trigger an update for each service still associated with stale backends, so that they can be removed.

One drawback of this approach, though, is that when clustermesh is enabled we currently wait for full synchronization from all remote clusters before triggering the removal of stale backends, regardless of whether the given service is global (i.e., may also include remote backends) or not. One specific example in which this behavior is problematic is the clustermesh-apiserver. Indeed, if it gets restarted at the same time as the agents (e.g., during an upgrade), the associated service might end up including both the address of the previous pod (which is now stale) and that of the new one, which is correct. When kvstoremesh is enabled, local agents connect to it through that service. In this case, there's a circular dependency: the agent may pick the stale backend, the connection to etcd fails, and this in turn prevents the synchronization from starting and eventually completing, which is what would trigger the removal of the stale backend. Although this dependency eventually resolves once the service load-balancing algorithm picks a different backend, it introduces unnecessary delay (the same could also happen for remote agents connecting through a NodePort if KPR is enabled).

To remove this dependency, let's perform a two-pass cleanup of stale backends: the first as soon as we synchronize with Kubernetes, targeting non-global services only; the second triggered by full clustermesh synchronization, covering all remaining ones. Hence, non-global services can be fully functional even before full clustermesh synchronization completes. It is worth mentioning that this fix applies to all non-global services, which can now converge faster even in the case of large clustermeshes.
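A minimal sketch of the two-pass idea follows; the types, fields, and function names are hypothetical and only illustrate the ordering, not the actual Cilium implementation:

```go
package main

import "fmt"

// service is a simplified stand-in for a cached service entry; isGlobal marks
// a cluster-mesh shared (global) service, hasStale marks that it still carries
// restored backends that have not been reconfirmed since the restart.
type service struct {
	name     string
	isGlobal bool
	hasStale bool
}

// cleanupStale refreshes every service that still carries stale restored
// backends and matches the given filter, dropping those backends.
func cleanupStale(services []*service, include func(*service) bool) {
	for _, svc := range services {
		if svc.hasStale && include(svc) {
			fmt.Println("refreshing", svc.name) // stand-in for the real refresh
			svc.hasStale = false
		}
	}
}

func main() {
	services := []*service{
		{name: "clustermesh-apiserver", isGlobal: false, hasStale: true},
		{name: "global-frontend", isGlobal: true, hasStale: true},
	}

	// Pass 1: Kubernetes synchronization completed -> non-global services only.
	cleanupStale(services, func(s *service) bool { return !s.isGlobal })

	// Pass 2: full clustermesh synchronization completed -> everything left.
	cleanupStale(services, func(*service) bool { return true })
}
```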

Marking for backport to v1.14, as the mentioned issue was observed in CI and caused increased flakiness.

Improve deletion of stale backends associated with non-global services, without waiting for full Cluster Mesh synchronization 

@giorio94 giorio94 added release-note/misc This PR makes changes that have no direct user impact. area/loadbalancing Impacts load-balancing and Kubernetes service implementations needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 23, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.4 Oct 23, 2023
@giorio94
Member Author

/test

@giorio94 giorio94 marked this pull request as ready for review October 24, 2023 07:05
@giorio94 giorio94 requested review from a team as code owners October 24, 2023 07:05
daemon/cmd/state.go (review thread resolved; outdated)
pkg/k8s/service_cache.go (review thread resolved)
@giorio94 giorio94 marked this pull request as draft October 30, 2023 07:51
@giorio94 giorio94 force-pushed the mio/service-obsolete-backends-two-steps branch from 6d41bcc to 3f26503 on October 30, 2023 09:45
@giorio94
Member Author

/test

giorio94 and others added 2 commits October 30, 2023 11:04
fe4dda7 ("services: prevent temporary connectivity loss on agent
restart") modified the handling of restored backends to prevent
possibly causing temporary connectivity disruption on agent restart if
a service is either associated with multiple endpointslices (e.g.,
it has more than 100 backends, or is dual stack) or has backends
spanning across multiple clusters (i.e., it is a global service).
At a high level, we now keep a list of restored backends, which continue
being merged with the ones we received an update for, until the bootstrap
phase completes. At that point, we trigger an update for each service
still associated with stale backends, so that they can be removed.

One drawback of this approach, though, is that when clustermesh
is enabled we currently wait for full synchronization from all
remote clusters before triggering the removal of stale backends,
regardless of whether the given service is global (i.e., may also
include remote backends) or not. One specific example in which
this behavior is problematic is the clustermesh-apiserver.
Indeed, if it gets restarted at the same time as the agents
(e.g., during an upgrade), the associated service might end up
including both the address of the previous pod (which is now stale)
and that of the new one, which is correct. When kvstoremesh is
enabled, local agents connect to it through that service. In this
case, there's a circular dependency: the agent may pick the stale
backend, the connection to etcd fails, and this in turn prevents
the synchronization from starting and eventually completing, which
is what would trigger the removal of the stale backend. Although
this dependency eventually resolves once the service load-balancing
algorithm picks a different backend, it introduces unnecessary delay
(the same could also happen for remote agents connecting through a
NodePort if KPR is enabled).

To remove this dependency, let's perform a two-pass cleanup of stale
backends: the first as soon as we synchronize with Kubernetes,
targeting non-global services only; the second triggered by full
clustermesh synchronization, covering all remaining ones. Hence,
non-global services can be fully functional even before full
clustermesh synchronization completes. It is worth mentioning that
this fix applies to all non-global services, which can now converge
faster even in the case of large clustermeshes.

Co-authored-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Refactor the SyncWithK8sFinished function to return the list of services
with stale backends, which should be refreshed, rather than refreshing
them directly. This makes the separation clearer: it avoids having to
pass the refresh function as a parameter and prevents possible deadlocks
caused by incorrect mutex locking (given the interdependencies between
the service subsystem and the service cache).

Suggested-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
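A rough sketch of the shape of this refactor, using simplified, hypothetical types and fields rather than the real ones in pkg/k8s: the cache no longer invokes a refresh callback while holding its mutex, it just reports which services still need a refresh.

```go
package sketch

import "sync"

// ServiceID and serviceEntry are simplified stand-ins for the real types;
// the fields below are assumptions for illustration only.
type ServiceID string

type serviceEntry struct {
	isGlobal         bool
	hasStaleBackends bool
}

type serviceCache struct {
	mutex    sync.Mutex
	services map[ServiceID]*serviceEntry
}

// syncWithK8sFinished mirrors the idea of the refactor: instead of calling a
// refresh function passed in as a parameter (which would run under the cache
// mutex and could deadlock with the service subsystem), it returns the IDs of
// the services still holding stale backends so the caller can refresh them
// afterwards, outside the lock.
func (c *serviceCache) syncWithK8sFinished(localServicesOnly bool) []ServiceID {
	c.mutex.Lock()
	defer c.mutex.Unlock()

	var toRefresh []ServiceID
	for id, svc := range c.services {
		if svc.hasStaleBackends && (!localServicesOnly || !svc.isGlobal) {
			toRefresh = append(toRefresh, id)
		}
	}
	return toRefresh
}
```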
@giorio94 giorio94 force-pushed the mio/service-obsolete-backends-two-steps branch from 3f26503 to 5f69e0c on October 30, 2023 10:20
@giorio94
Member Author

Rebased onto main to pick up the CI changes necessary to successfully run the tests.

@giorio94
Member Author

/test

@giorio94 giorio94 marked this pull request as ready for review October 30, 2023 10:58
Contributor

@joamaki joamaki left a comment

LGTM thanks!

@giorio94
Member Author

giorio94 commented Nov 2, 2023

Reviews from all code owners are in, and tests are green except for a non-required one (known to be flaky). Marking as ready to merge.

@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 2, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 3, 2023
@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 7, 2023
@giorio94
Member Author

giorio94 commented Nov 7, 2023

Not sure why MLH removed the ready-to-merge label. Adding it back.

@joamaki joamaki merged commit 66ba850 into cilium:main Nov 7, 2023
62 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Nov 7, 2023
@gandro gandro mentioned this pull request Nov 7, 2023
3 tasks
@jibi jibi mentioned this pull request Nov 7, 2023
15 tasks
@jibi jibi added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Nov 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.4 Nov 7, 2023
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Nov 8, 2023