Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClusterMesh/KVStoreMesh Readiness and Bootstrap QPS #30361

Merged
merged 12 commits into from Mar 6, 2024

Conversation

thorn3r
Copy link
Contributor

@thorn3r thorn3r commented Jan 22, 2024

This adds a readiness probe to the kvstoremesh container which will report ready once all remote clusters have been synchronized. A per-cluster timeout on initial sync prevents a deadlock in a situation where the pod is restarted simultaneously on multiple clusters. In addition to the per-cluster timeout, a global timeout is also implemented to catch any other failures.

To increase the speed of the initial synchronization for both clustermesh-apiserver and kvstoremesh, a new kvstore-opt was added, etcd.bootstrapQps, which allows an initial rate-limit to be set for the etcd client. Once the container is ready, the standard rate-limit configured via etcd.qps is applied. etcd.bootstrapQps is set to 10,000 for both containers in the chart.

Some rough benchmarking with simulated load provided promising results:

  • This test was run with 2 connected clusters
  • Load on clustermesh-apiserver/kvstoremesh was simulated by generating 10,000 CiliumIdentities to be synchronized. cilium-operator was disabled to prevent garbage collection

For clustermesh-apiserver, time-to-ready dropped from ~500s to ~14s (roughly a 60x improvement)
apiserver-10000identities

For kvstoremesh, time-to-ready dropped from ~100s to 13s (roughly a 10x improvement)
kvstoremesh-10000identities

Additional details are in each commit message.

Add a readinessProbe to the kvstoremesh container that reports initial synchronization status to support configuring a separate, initial rate-limit to be used while synchronizing. Both clustermesh-apiserver and kvstoremesh now use a high initial rate-limit to decrease start time.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 22, 2024
@thorn3r thorn3r force-pushed the clustermeshQPS branch 3 times, most recently from 185578a to aacf0d6 Compare January 22, 2024 21:48
@thorn3r
Copy link
Contributor Author

thorn3r commented Jan 22, 2024

/test

@thorn3r thorn3r force-pushed the clustermeshQPS branch 2 times, most recently from aaaa15e to 518cf09 Compare January 23, 2024 19:14
@thorn3r
Copy link
Contributor Author

thorn3r commented Jan 23, 2024

/test

@thorn3r
Copy link
Contributor Author

thorn3r commented Jan 24, 2024

/test

Add a 'synced' field to remoteCluster in kvstoremesh to track its
synchronization to the local kvstore. This will be used in a following
commit to add a ReadinessProbe to the kvstoremesh container.

A timeout on remote cluster connection is implemented to prevent
blocking readiness forever if we are unable to establish a connection to
a remote cluster. By default this is set to 15s and can be configured via
the option "per-cluster-ready-timeout".

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Clustermesh-apiserver's Health API Server has been moved to a new
'health' package and modularized for reuse with KVStoreMesh.

A slice of EndpointFunc must be provided to the HealthAPIServerCell to
be registered as endpoints the Health API will serve.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
The /healthz endpoint for ClusterMesh is not currently used. If someone
were to configure a livenessProbe targeting this endpoint, it would
cause clustermesh-apiserver to fail if the kube-apiserver was unhealthy.
This would prevent new agents in other clusters from being able to
synchronize data.

For now, we will remove this endpoint to prevent this scenario.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Make common.ClusterMesh an interface so that it can be mocked in
integration tests. The original type is now an exported struct,
common.clusterMesh.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Register a '/readyz' endpoint for KVStoreMesh that exposes the status of
the initial synchronization of remote clusters to the local kvstore.
The endpoint will return a 500 until the synchronization has completed,
at which point it will respond with 200.

A new cell added to KVStoreMesh executes a OneShot job to wait for
remote cluster synchronization and signal completion to the
HealthAPIServer via the shared SyncState. There are 2 configurable
timeouts to prevent an unreachable cluster or synchronization error
from blocking readiness forever and potentially causing a cascading
failure as more KVStoreMesh instances restart:

- 'per-cluster-ready-timeout' is the duration to wait for a remote cluster
connection before removing it from the readiness check.

- 'global-ready-timeout' is the duration to wait for all remote clusters
  to synchronize before cancelling the wait and reporting ready.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, lgtm!

install/kubernetes/cilium/values.yaml.tmpl Outdated Show resolved Hide resolved
Add a readinessProbe to the kvstoremesh container in the
clustermesh-apiserver deployment utilizing the '/readyz' endpoint of
the health API.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
The following commit will add new functionality to APILimiter and
coverage with tests using testify. To avoid dropping in new tests that
don't match the rest of the suite, all of the pkg/rate tests have been
refactored to use the testify framework.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Add 2 new methods to APILimiter that allow the RateLimit and BurstLimit
to be adjusted externally.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Add a new kvstore-opt 'etcd.bootstrapQps' to configure an initial client
rate limit to use until the caller signals that bootstrap is complete.

This is used in ClusterMesh and KVStoreMesh to allow setting a higher
rate limit during the initial synchronization to etcd. To prevent
changing behavior for other components using etcd clients, this option
will only affect components providing a channel via
extraOptions.BootstrapComplete, which should be closed to signal to the
client that bootstrap is complete.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Remove duplicate import of clustermesh/types from clustermesh/common

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Similar to KVStoreMesh, the settings for clustermesh-apiserver's
readinessProbe should be configurable via helm values.

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
@thorn3r
Copy link
Contributor Author

thorn3r commented Mar 5, 2024

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 5, 2024
@joestringer
Copy link
Member

Should this be release-note/major for bootstrapping performance improvements + better health metrics/observability?

@joestringer
Copy link
Member

joestringer commented Mar 6, 2024

Looks like there's still three unresolved conversations that are blocking the merge. The interface at the bottom also reports that tophat review is necessary, but the codeowners checks in the top right don't specify that. Not sure if it's required, but I can check back later once the unresolved conversations are resolved and provide a rubber stamp if necessary.

@thorn3r thorn3r added release-note/major This PR introduces major new functionality to Cilium. and removed release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Mar 6, 2024
@thorn3r
Copy link
Contributor Author

thorn3r commented Mar 6, 2024

Thanks @joestringer, resolved those 👍 Looks like it is still requiring tophat review, and interestingly:

Merging is blocked
Merging can be performed automatically with 0 approving reviews.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providing the required 0 reviews to satisfy our GitHub overlords.

(We may want to update CODEOWNERS for pkg/rate if we expect further changes there, would skip tophat requirement in future)

@joestringer joestringer added this pull request to the merge queue Mar 6, 2024
Merged via the queue into cilium:main with commit 7e97ea6 Mar 6, 2024
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants