fe4dda7 ("services: prevent temporary connectivity loss on agent
restart") modified the handling of restored backends to prevent
possible temporary connectivity disruptions on agent restart when a
service is either associated with multiple endpointslices (e.g., it
has more than 100 backends, or is dual stack) or has backends
spanning multiple clusters (i.e., it is a global service).
At a high level, we now keep a list of restored backends, which continue
being merged with the ones we received an update for, until the bootstrap
phase completes. At that point, we trigger an update for each service
still associated with stale backends, so that they can be removed.
One drawback associated with this approach, though, is that when
clustermesh is enabled we currently wait for full synchronization from
all remote clusters before triggering the removal of stale backends,
regardless of whether the given service is global (i.e., it may also
include remote backends) or not. One specific example in which such
behavior is problematic relates to the clustermesh-apiserver.
Indeed, if it gets restarted at the same time as the agents
(e.g., during an upgrade), the associated service might end up
including both the address of the previous pod (which is now stale)
and that of the new one, which is correct. When kvstoremesh is
enabled, local agents connect to it through that service. In this
case, there's a circular dependency: the agent may pick the stale
backend, causing the connection to etcd to fail, which in turn
prevents the synchronization from starting and completing, and thus
from triggering the removal of the stale backend. Although this
dependency eventually
resolves as a different backend is picked by the service load-balancing
algorithm, unnecessary delay is introduced (the same could also happen
for remote agents connecting through a NodePort if KPR is enabled).
To remove this dependency, let's perform a two-pass cleanup of stale
backends: the first one as soon as we synchronize with Kubernetes,
targeting non-global services only; the second triggered by full
clustermesh synchronization, covering all remaining ones. Hence,
non-global services can be fully functional even before full
clustermesh synchronization completes. It is worth noting that this
fix applies to all non-global services, which can now converge
faster even in the case of large clustermeshes.
Co-authored-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>