# Consolidation of CRD Scaling Issues #2895
One dimension to consider here is that we expect, at some point in time, the number of CRDs per cluster to become a scalability threshold. Where this threshold will land (500 or 5000) is very relevant to our discussions around this topic, and hopefully with scalability improvements like OpenAPI spec lazy-marshalling or discovery client improvements, it should be feasible to set such thresholds at a higher level. But even so, that threshold will impose a theoretical limit on the combination of Crossplane providers we can install and use in a cluster; i.e., at some point that threshold will be less than the total number of CRDs available across all Crossplane providers. And if the threshold is high enough (or even if there is no threshold defined, but the Kubernetes control plane and its clients are optimized enough), we may not feel the need to tackle these performance problems anymore, because we would consider all practical combinations of provider installations feasible, at least with an appropriately sized cluster. So I believe there is still room for upstream client-side and server-side improvements that we should pursue. But in #2649 we have already considered two alternative approaches as workarounds that will, as @negz stated here, delay these issues:

We are now optimistic about the server-side and client-side optimizations that will make the control plane more scalable. But whatever a future threshold might be, and whatever further optimizations/improvements are implemented, what is our scalability threshold/goal for the (active) number of managed resource types in a cluster of a given size? This is, in my opinion, valuable because:
---
Kubernetes Design: Discovery Client Cache Busting

Just dropping a link to the above doc, which proposes an improvement for discovery caching in client-go. This wouldn't help with the time taken to perform discovery, though the doc does touch on future improvements that would make a small difference for us. Rather, this proposal ensures discovery would happen significantly less often for each client, which should make it significantly less painful.
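The gist of cache busting, in a minimal sketch of my own (not the design doc's actual mechanism, and the ETag source is an assumption): key the on-disk cache by a hash of the server's discovery content, so a client only re-fetches when that hash changes.

```python
import hashlib, json, os, tempfile

def cache_path(cache_dir: str, discovery_etag: str) -> str:
    """Derive a cache file name from a server-provided content hash (ETag),
    so a changed server state automatically 'busts' the old cache entry."""
    digest = hashlib.sha256(discovery_etag.encode()).hexdigest()[:16]
    return os.path.join(cache_dir, f"discovery-{digest}.json")

def get_discovery(cache_dir: str, etag: str, fetch):
    """Return cached discovery data if the ETag still matches; otherwise
    call fetch() (e.g. an HTTP GET against /apis) and cache the result."""
    path = cache_path(cache_dir, etag)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch()
    with open(path, "w") as f:
        json.dump(data, f)
    return data

# Example: the first call fetches, the second is served from cache.
tmp = tempfile.mkdtemp()
calls = []
fetch = lambda: calls.append(1) or {"groups": ["apps/v1"]}
a = get_discovery(tmp, "etag-123", fetch)
b = get_discovery(tmp, "etag-123", fetch)
assert a == b and len(calls) == 1  # second call hit the cache
```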
---
@ulucinar I would like to point out that a sizing guide would not be very applicable for managed-cluster users like us. We are using AKS clusters, and AKS does not provide any way to configure the control plane. Also, when you start using multiple providers in the same cluster, CRD numbers start stacking up. This means that even if you do not use Terrajet providers, which are the main aggressor here, you can still get into trouble. In light of these facts, selective CRD/controller deployment would be invaluable for Crossplane users. For our use cases, we know that we are using at most ~10% of the CRDs and controllers available in our Crossplane clusters.
---
Just want to chime in and stress the point others have made here that having 1000, 2000, or 3000 CRDs around seems unsustainable, regardless of performance improvements in the wider ecosystem. Not to mention all the XRDs people will create on top of the ones that come with the providers. Would it not be better to be able to do something like the following?

```yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-jet-gcp
spec:
  package: "crossplane/provider-jet-gcp:v0.2.0-preview"
  apiGroups:
    - accessapproval.gcp.jet.crossplane.io
    - cloudplatform.gcp.jet.crossplane.io
    - container.gcp.jet.crossplane.io
```

These issues are holding back our adoption of Crossplane since we need to use several of the Terrajet providers for multiple clouds.
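To be clear, the `apiGroups` field above is a proposal, not an existing Crossplane API. The filtering a package manager could apply is simple, though; here's a sketch assuming CRD object names follow the usual `<plural>.<group>` convention:

```python
def filter_crds(crd_names, allowed_groups):
    """Keep only CRDs whose API group is in the allow-list.
    CRD object names have the form '<plural>.<group>'."""
    def group_of(name):
        return name.split(".", 1)[1]  # drop the resource plural
    return [n for n in crd_names if group_of(n) in allowed_groups]

crds = [
    "clusters.container.gcp.jet.crossplane.io",
    "buckets.storage.gcp.jet.crossplane.io",
    "serviceaccounts.cloudplatform.gcp.jet.crossplane.io",
]
allowed = {
    "container.gcp.jet.crossplane.io",
    "cloudplatform.gcp.jet.crossplane.io",
}
print(filter_crds(crds, allowed))
# Only the container and cloudplatform CRDs survive the filter.
```

The hard part, as discussed later in this thread, is not the filtering itself but ensuring controllers and package dependencies stay consistent with the reduced CRD set.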
---
Any folks that are interested in disabling/enabling certain CRDs and API groups when installing a Crossplane provider should put a thumbs up 👍 on this tracking issue: #2869
---
Despite the excellent one-pager having been merged, I'm going to reopen this as a place to continue the conversation around these performance issues.
---
I'm actually having trouble replicating the client-side issues at the moment. 🤔 I believe I'm using a client with 50 QPS / 300 burst per https://github.com/kubernetes/kubernetes/blob/release-1.23/staging/src/k8s.io/kubectl/pkg/cmd/cmd.go#L92. I have ~1,600 CRDs and ~350 API versions, but with a cold cache my VM seems to complete discovery within about a second. 🤔

```console
$ kubectl --kubeconfig=$HOME/.kube/config.eks version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"archive", BuildDate:"1980-01-01T00:00:00Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl --kubeconfig=$HOME/.kube/config.eks get crd|wc -l
1577
$ kubectl --kubeconfig=$HOME/.kube/config.eks api-versions|wc -l
343
$ rm -rf ~/.kube/cache
$ time kubectl --kubeconfig=$HOME/.kube/config.eks get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-103-188.us-west-2.compute.internal   Ready    <none>   8h    v1.22.9-eks-810597c
ip-192-168-63-15.us-west-2.compute.internal     Ready    <none>   8h    v1.22.9-eks-810597c
ip-192-168-93-0.us-west-2.compute.internal      Ready    <none>   8h    v1.22.9-eks-810597c
kubectl --kubeconfig=$HOME/.kube/config.eks get nodes  0.17s user 0.21s system 35% cpu 1.067 total
$ du -hs ~/.kube/cache
7.5M    /home/negz/.kube/cache
$ speedtest
Retrieving speedtest.net configuration...
Testing from CenturyLink (REDACTED)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Ziply Fiber (Seattle, WA) [6.71 km]: 12.355 ms
Testing download speed................................................................................
Download: 443.00 Mbit/s
Testing upload speed......................................................................................................
Upload: 410.68 Mbit/s
```

I'm running the test on a NixOS VM on my home wifi (https://github.com/negz/nix) with a gigabit internet connection. I feel like without (much?) rate limiting in play this kind of makes sense. I wouldn't imagine downloading roughly 7.5 MB of text would take very long with a client-side capacity of ~50 MB/s, especially presuming …
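Back-of-the-envelope arithmetic (my own, using the numbers in the comment above) supports this: 343 API versions against a 50 QPS / 300 burst client leaves almost no throttling wait, and 7.5 MB at ~443 Mbit/s downloads in a fraction of a second.

```python
# Rough model of kubectl discovery time: requests beyond the burst
# allowance are paced at the QPS limit, plus raw transfer time.
def throttle_wait(requests: int, qps: float, burst: int) -> float:
    return max(0, requests - burst) / qps

def transfer_time(size_mb: float, mbit_per_s: float) -> float:
    return size_mb * 8 / mbit_per_s

# Numbers from the comment above: 343 API versions, 50 QPS / 300 burst,
# ~7.5 MB of discovery data on a ~443 Mbit/s connection.
wait = throttle_wait(343, qps=50, burst=300)
xfer = transfer_time(7.5, 443)
print(f"throttle wait ~{wait:.2f}s, transfer ~{xfer:.2f}s")
# Both well under a second, consistent with the ~1s observed.
```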
---
I've run the test on my M1 Mac on home wifi with a gigabit connection. Below are my results:

```console
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-15T14:14:10Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.22) exceeds the supported minor version skew of +/-1
$ rm -rf ~/.kube/cache
$ time kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-103-188.us-west-2.compute.internal   Ready    <none>   46h   v1.22.9-eks-810597c
ip-192-168-63-15.us-west-2.compute.internal     Ready    <none>   46h   v1.22.9-eks-810597c
ip-192-168-93-0.us-west-2.compute.internal      Ready    <none>   46h   v1.22.9-eks-810597c
kubectl get nodes  0.43s user 1.11s system 8% cpu 18.546 total
$ du -sh ~/.kube/cache
3.4M    /Users/tnthornton/.kube/cache
$ speedtest
Retrieving speedtest.net configuration...
Testing from CenturyLink (REDACTED)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Mimo Connect Ltd (Hillsboro, OR) [27.25 km]: 13.147 ms
Testing download speed................................................................................
Download: 369.93 Mbit/s
Testing upload speed......................................................................................................
Upload: 179.55 Mbit/s
```
---
@tnthornton Very strange; your setup is almost identical to mine, but your test looks like it took 18 seconds to my 1. The only possible differences I can spot are different kubectl versions, and the fact that I'm running from a Linux VM rather than macOS.
---
@negz Results with the same kubeconfig on a different client version (with burst + QPS), plus a speedtest, attached as screenshots (not preserved in this transcript).
---
Running on my Linux box (Pop!_OS 22.04 LTS x86_64):

```console
k version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-15T14:22:29Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.22) exceeds the supported minor version skew of +/-1
rm -rf ~/.kube/cache
time kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-103-188.us-west-2.compute.internal   Ready    <none>   47h   v1.22.9-eks-810597c
ip-192-168-63-15.us-west-2.compute.internal     Ready    <none>   47h   v1.22.9-eks-810597c
ip-192-168-93-0.us-west-2.compute.internal      Ready    <none>   47h   v1.22.9-eks-810597c
kubectl get nodes  0.35s user 0.21s system 16% cpu 3.446 total
du -sh ~/.kube/cache
6.0M    /home/decoder/.kube/cache
Retrieving speedtest.net configuration...
Testing from Clouvider Limited (**.***.***.**)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by 23M GmbH (Frankfurt Am Main) [0.89 km]: 9.183 ms
Testing download speed................................................................................
Download: 93.56 Mbit/s
Testing upload speed......................................................................................................
Upload: 20.45 Mbit/s
```
---
Looks like mine took about ~15 seconds. My speedtest numbers from Google (screenshots not preserved in this transcript):
---
For historical context, all the folks posting results here are testing against the same EKS server I used. We're definitely noticing a trend of the slowdown being on macOS. I ran my original test above in qemu: a NixOS guest on an M1 Mac host. If I repeat the test on Linux I still see it complete in around a second, but on the Mac I see it take 12-18 seconds:

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"archive", BuildDate:"1980-01-01T00:00:00Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
$ rmd ~/.kube/cache
$ time k get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-103-188.us-west-2.compute.internal   Ready    <none>   47h   v1.22.9-eks-810597c
ip-192-168-63-15.us-west-2.compute.internal     Ready    <none>   47h   v1.22.9-eks-810597c
ip-192-168-93-0.us-west-2.compute.internal      Ready    <none>   47h   v1.22.9-eks-810597c
kubectl get nodes  0.28s user 0.42s system 5% cpu 12.459 total
```
---
From Turkey, on a Mac with:

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
$ rm -rf ~/.kube/cache
$ time kubectl get nodes
I0624 00:50:43.001250 14168 request.go:665] Waited for 1.198823843s due to client-side throttling, not priority and fairness, request: GET:https://A6285F016C78529CEAB0DFC3AA564637.sk1.us-west-2.eks.amazonaws.com/apis/marketplaceordering.azure.jet.crossplane.io/v1alpha1?timeout=32s
I0624 00:50:53.200546 14168 request.go:665] Waited for 11.396419185s due to client-side throttling, not priority and fairness, request: GET:https://A6285F016C78529CEAB0DFC3AA564637.sk1.us-west-2.eks.amazonaws.com/apis/servicediscovery.aws.jet.crossplane.io/v1alpha1?timeout=32s
I0624 00:51:03.397846 14168 request.go:665] Waited for 21.59258613s due to client-side throttling, not priority and fairness, request: GET:https://A6285F016C78529CEAB0DFC3AA564637.sk1.us-west-2.eks.amazonaws.com/apis/batch.azure.jet.crossplane.io/v1alpha1?timeout=32s
I0624 00:51:13.597394 14168 request.go:665] Waited for 31.791645943s due to client-side throttling, not priority and fairness, request: GET:https://A6285F016C78529CEAB0DFC3AA564637.sk1.us-west-2.eks.amazonaws.com/apis/bigquery.gcp.jet.crossplane.io/v1alpha1?timeout=32s
I0624 00:51:23.797334 14168 request.go:665] Waited for 41.991043352s due to client-side throttling, not priority and fairness, request: GET:https://A6285F016C78529CEAB0DFC3AA564637.sk1.us-west-2.eks.amazonaws.com/apis/cloudtasks.gcp.jet.crossplane.io/v1alpha1?timeout=32s
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-103-188.us-west-2.compute.internal   Ready    <none>   2d    v1.22.9-eks-810597c
ip-192-168-63-15.us-west-2.compute.internal     Ready    <none>   2d    v1.22.9-eks-810597c
ip-192-168-93-0.us-west-2.compute.internal      Ready    <none>   2d    v1.22.9-eks-810597c
kubectl get nodes  0.82s user 1.32s system 4% cpu 50.878 total
```

**Update**

```console
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-15T14:22:29Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.22) exceeds the supported minor version skew of +/-1
$ rm -rf ~/.kube/cache
$ time kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-192-168-103-188.us-west-2.compute.internal   Ready    <none>   2d    v1.22.9-eks-810597c
ip-192-168-63-15.us-west-2.compute.internal     Ready    <none>   2d    v1.22.9-eks-810597c
ip-192-168-93-0.us-west-2.compute.internal      Ready    <none>   2d    v1.22.9-eks-810597c
kubectl get nodes  0.78s user 1.30s system 12% cpu 17.072 total
```
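An aside with my own arithmetic (the older client's limits here are an assumption, roughly client-go's historical 5 QPS with a burst of 100): the ~51 s total of the first run is consistent with pure client-side throttling of ~350 discovery requests, while the 50 QPS / 300 burst limits cited earlier in the thread barely throttle at all, so the remaining ~17 s of the updated run would come from something other than throttling.

```python
# Estimated client-side throttling wait for kubectl discovery:
# requests beyond the burst allowance are paced at the QPS limit.
def throttle_wait(requests: int, qps: float, burst: int) -> float:
    return max(0, requests - burst) / qps

api_versions = 343  # one discovery GET per group/version (from earlier in the thread)

old = throttle_wait(api_versions, qps=5, burst=100)   # assumed older kubectl limits
new = throttle_wait(api_versions, qps=50, burst=300)  # limits @negz cites for v1.23+
print(f"old client ~{old:.1f}s of throttling, new client ~{new:.2f}s")
# ~48.6s, close to the observed 50.878s total; ~0.86s for the newer client.
```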
---
Another Linux user checking in:
---
Mac M1:

```console
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:51:05Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.9-eks-a64ea69", GitCommit:"540410f9a2e24b7a2a870ebfacb3212744b5f878", GitTreeState:"clean", BuildDate:"2022-05-12T19:15:31Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
1577
343
NAME                                            STATUS   ROLES    AGE    VERSION
ip-192-168-103-188.us-west-2.compute.internal   Ready    <none>   2d2h   v1.22.9-eks-810597c
ip-192-168-63-15.us-west-2.compute.internal     Ready    <none>   2d2h   v1.22.9-eks-810597c
ip-192-168-93-0.us-west-2.compute.internal      Ready    <none>   2d2h   v1.22.9-eks-810597c
kubectl --kubeconfig=./kc get nodes  0.68s user 1.34s system 12% cpu 15.929 total
3.4M    /Users/nate/.kube/cache
Testing download speed................................................................................
Download: 16.65 Mbit/s
Testing upload speed......................................................................................................
Upload: 7.53 Mbit/s
```

(Not my home office network)
---
On the API server memory consumption front, I did some not-very-scientific tests with https://github.com/kcp-dev/kcp today and found that it appears to use about half the memory of the Kubernetes API server. I'm not sure why, but the fact that it's possible to serve CRDs while using less memory is encouraging. Rough resident memory observations from the API server (v1.24.0) versus kcp-dev/kcp@6b9b240 followed here (figures not preserved in this transcript).
---
Dropping a breadcrumb to this write-up from @negz about the current state of performance issues, in the context of the desire to disable/limit/select which CRDs will get installed:
---
I think these performance tests would be more useful if always run from the same source server in the cloud to the same target cluster, for comparable results.
---
@jonnylangefeld The idea was to (casually) test multiple different clients against the same server. We all used the same EKS API server. To my knowledge everyone experiencing slowness was being affected by kubernetes/kubernetes#110753.
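For anyone curious why a client-side cache could behave so differently across operating systems, here's a hypothetical micro-benchmark of my own (not from the linked issue): kubectl's discovery cache stores one JSON file per group/version, so a cold cache means hundreds of small file writes, whose cost is highly dependent on the OS and filesystem.

```python
# Illustration only: time writing ~343 small files totalling ~7.5 MB,
# mimicking a cold kubectl discovery cache. Absolute numbers vary widely
# by OS, filesystem, and things like antivirus or sandboxing overhead.
import os, tempfile, time

def write_cache_files(n_files: int, payload: bytes) -> float:
    d = tempfile.mkdtemp()
    start = time.perf_counter()
    for i in range(n_files):
        with open(os.path.join(d, f"group-{i}.json"), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start

# ~343 group/version files totalling ~7.5 MB, as measured in the thread.
elapsed = write_cache_files(343, b"x" * (7_500_000 // 343))
print(f"wrote 343 cache files in {elapsed:.3f}s")
```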
---
Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as `stale`.
---
/fresh |
---
Another CRD scaling issue relates to Kubernetes tooling: IntelliJ currently fails to scale to the Crossplane GCP Upbound provider CRDs; see https://youtrack.jetbrains.com/issue/IDEA-306067/Kubernetes-plugin-fails-to-import-context-fails-on-YAML-document-exceeds-the-limit
---
Coming up on a year since this was opened. Is there any sort of milestone that can include this? I'm honestly just trying to get a feel for Crossplane, and I'm only interested in the S3 portion of the native AWS provider.
---
The memory/API issues become more apparent on lighter deployments where Crossplane is effectively only being used to orchestrate one resource, say S3 buckets (which is a common pattern where I've worked). I'm not sure what the solution is in this scenario, though I'd like to note that Crossplane is by far the best solution I've found for cross-platform orchestration: absolutely everything else I've tried has had a major gotcha blocking use. While this isn't a blocker, it's unhelpful for local testing clusters or "edge"-type deployments where your use case is just provisioning a remote logging bucket off-platform for compliance reasons. I appreciate this use case is like cracking a walnut with a sledgehammer, and Crossplane by default is a much larger-scoped project, but it's a potential use case that might be popular. It's not uncommon to have a "Kubernetes-first" topology where the only exceptions are blob storage buckets and cloud-managed databases.
---
Hey everyone, appreciate the continued interest and enthusiasm on this issue! I wanted to start off by confirming that this issue is important to the community as a whole, and we still consider resolving it completely to be a priority.

After all the performance-related work we've done in upstream Kubernetes, we were expecting most use cases that impact control plane performance to be completely resolved. A great reference for these efforts is in this design doc, especially the sections mentioning the minimum versions of Kubernetes that have all the fixes we've been pushing upstream. Any folks that are still running into performance issues should feel free to share some specifics about their environment here (e.g. k8s version, number of nodes, memory/cpu per node, etc.), so we can get a better idea of which environments may still be affected.

That being said, we also understand that the experience and ergonomics around a high number of CRDs, for use cases that only need a handful, are still important to the community (we still see a lot of interest on #2869), so improving this experience is also still a priority for the project. One possible solution here would be to break up the providers into smaller installable units to support easier installation selectivity, and then rely on the package manager's dependency resolution to ensure everything required is still present at runtime. That's just one possible approach; we'll need to figure out the best way to tackle this in the roadmap.

We can also write clearer documentation and guidance so folks don't run into scenarios that render their cluster unusable in the first place. That would be nice to avoid, especially for first-time Crossplane users just kicking the tires. So be on the lookout for:

Thanks for all the great feedback! 🙇
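The dependency-resolution idea above can be sketched in a toy example (my own, with hypothetical package names): install only the small packages the user asks for, plus the transitive closure of their declared dependencies.

```python
def install_set(requested, deps):
    """Compute the transitive closure of package dependencies, so that
    splitting a provider into small packages still yields a complete
    installation at runtime."""
    result, stack = set(), list(requested)
    while stack:
        pkg = stack.pop()
        if pkg in result:
            continue
        result.add(pkg)
        stack.extend(deps.get(pkg, []))
    return result

# Hypothetical split of a monolithic AWS provider into per-service packages.
deps = {
    "provider-aws-s3": ["provider-aws-core"],
    "provider-aws-rds": ["provider-aws-core"],
    "provider-aws-core": [],
}
print(sorted(install_set(["provider-aws-s3"], deps)))
# ['provider-aws-core', 'provider-aws-s3']
```

A user wanting only S3 would then pay the CRD cost of two small packages rather than the whole provider.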
---
IMO, the only way to do this on the up-and-up is to contribute to upstream Kubernetes. If CNCF Crossplane needs to enable thousands of CRDs in CNCF Kubernetes, then Crossplane should work with Kubernetes to make it work. Just my two cents.
---
Piling on thousands of CRDs also has implications for other components in the cluster; i.e., anything that touches CRDs will need additional resources allocated. As an example, we had to bump our Argo CD instance's resource allocation to 3x the normal because we added Upbound's provider.

I like the idea of breaking down providers into subsets, or perhaps some Crossplane core mechanism to filter out unused CRDs.
---
I'm going to close this analysis-focused issue. The problem this issue outlines is still very real and still our top priority. Going forward we've committed to (somehow) installing fewer CRDs. Exactly how we'll do that is still in design, but I've created #3852 to track the effort. |
## What problem are you facing?
Previously, in crossplane/terrajet#47, #2649, and kubernetes/kubernetes#105932, we observed high resource consumption in the Kubernetes API server when over 500 CRDs are installed. Especially with local `kind` clusters, installing provider-jet-aws with over 700 CRDs could render the cluster unresponsive on machines with relatively low CPU resources, due to the high CPU consumption. And with managed Kubernetes offerings such as GKE, EKS, and AKS, at varying levels depending on the cluster configuration, we observed API service disruptions as we installed hundreds or thousands of CRDs. While discussing those issues as the Crossplane community, we tried a couple of different workarounds detailed in #2649, such as installing CRDs in smaller batches and giving the API server some time to finish aggregating their OpenAPI specs.

Meanwhile, after getting in touch with the upstream Kubernetes maintainers, we learned that they were already working on kubernetes/kube-openapi#251, which would introduce lazy marshalling for the aggregated OpenAPI specs of the installed CRDs served from the `/openapi/v2` endpoint. Collecting profiling data and building `kind` images with that PR and testing them in local `kind` clusters, we observed substantial improvements in peak resource consumption. However, as discussed in #2649, there were still some open points left.

Although we could not actually observe the effects of the aggregated OpenAPI spec lazy-marshalling optimization in a managed Kubernetes offering [1], we expected it to alleviate control plane service disruptions. This fix is available starting with Kubernetes versions `v1.20.13`, `v1.21.7`, `v1.22.4`, and `v1.23.0`, and as of this writing: `v1.23.1` and `v1.22.4` are available for most GKE regions via the `rapid` GKE release channel, and `v1.22.4` and `v1.21.7` are available for most AKS regions.

This allowed us to test the hypothesis with the lazy-marshalling optimization, and we have recently run a number of tests against the managed Kubernetes offerings. Please note that the following are reports from single experiments, and only server-side issues are reported. For all clusters, even if there is no API service disruption, we always experience client-side throttling.
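The lazy-marshalling idea mentioned above can be sketched roughly as follows (my own illustration, not kube-openapi's actual code): instead of re-serializing the full aggregated spec on every CRD change, defer the expensive marshalling until the spec is actually requested, and cache the result until the next change.

```python
import json

class LazySpec:
    """Defer JSON marshalling of an aggregated OpenAPI spec until the
    first request for it, and cache the bytes until the spec changes."""
    def __init__(self):
        self._specs = {}      # per-CRD spec fragments
        self._cached = None   # marshalled aggregate, invalidated on change

    def update(self, crd: str, spec: dict):
        self._specs[crd] = spec
        self._cached = None   # cheap: no marshalling on the write path

    def marshal(self) -> bytes:
        if self._cached is None:  # expensive work happens at most once
            self._cached = json.dumps(self._specs, sort_keys=True).encode()
        return self._cached

agg = LazySpec()
for i in range(700):          # e.g. installing provider-jet-aws's ~700 CRDs
    agg.update(f"crd-{i}", {"type": "object"})
# One marshal for 700 updates, instead of one per update.
body = agg.marshal()
assert agg.marshal() is body  # cached thereafter
```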
### Related issues

### GKE Zonal

On a GKE `v1.23.1` zonal cluster with three worker nodes of type `e2-medium`, it took `56 min` for the `ProviderRevision` to acquire the `Healthy` condition for a provider-jet-aws installation with 763 CRDs. During this period the cluster needed at least one repair, and there was over 50 min of service disruption. After a slight confusion regarding worker-node HA versus control-plane HA, this thread suggests using regional clusters for control-plane high availability, and states that control-plane service disruptions are expected for zonal GKE clusters as GKE scales the managed control plane under load. However, as discussed next, the situation with a regional cluster was not good either.

### GKE Regional

A `v1.23.1` cluster was provisioned in the `us-east1` region with one `e2-medium` node in each availability zone, for a total of 3 worker nodes in 3 zones. Although it took only `~150 s` for the `ProviderRevision` to acquire the `Healthy` condition for the `provider-jet-aws@v0.4.0-preview` installation, the regional GKE cluster went through repairing mode at least three times afterwards. In between these "RUNNING" and "RECONCILING" states of the regional cluster, we observed different kinds of errors in response to the `kubectl` commands run, notably connection errors and I/O timeouts while reaching the API server. It took over an hour for the cluster to stabilize, though the control plane was intermittently available for short periods during that time.

### EKS

When we performed these tests, none of the Kubernetes versions consuming the lazy-marshalling optimization were available for EKS clusters. However, we used a `v1.21.5-eks-bc4871b` 3-worker-node cluster to test a `provider-jet-aws@v0.4.0-preview` installation. Grafana dashboards indicate a control plane with two members from the very beginning, before the provider-jet-aws package is installed into the system. It took the `ProviderRevision` about 2 min to acquire the `Healthy` condition, and the cluster stayed stable after the provider installation. A reexamination of the Grafana dashboards after 3 hours reveals that the two-member control plane was scaled up and down during this period. Apart from the client-side throttling issues, the control plane is stable with a provider-jet-aws installation.

### AKS

A `v1.22.4` cluster was provisioned with two worker nodes. It took `~112 s` for the `ProviderRevision` to acquire the `Healthy` condition for a `provider-jet-aws@v0.4.0-preview` installation, and the cluster stayed stable after the provider installation. We also installed `provider-jet-azure@v0.7.0-preview` on this cluster, bringing the total CRD count to `1430`. It took `91 s` for provider-jet-azure's `ProviderRevision` to acquire the `Healthy` condition. It takes `~40 s` for the discovery client to refresh its cache, and we observe some pod restarts due to timeouts, but the cluster stays stable. However, adding `provider-jet-gcp@v0.2.0-preview` as a third provider finally "breaks" the cluster: Crossplane has a hard time installing the package.

We also repeated these experiments with a two-worker-node AKS `v1.20.13` cluster, with similar results: it took `107 s` for the `ProviderRevision` to acquire the `Healthy` condition, and the cluster stayed stable after the provider installation. Reaching a total of 780 CRDs, the cluster has no severe issues. Client-side throttling kicks in as expected.

We also need a deeper understanding of which metrics are employed by the cloud providers when deciding to scale up their control planes. The usual metrics are CPU/memory consumption and/or utilization of the control-plane components, but they could as well depend on metrics such as the number of installed CRDs and/or API server response-time SLIs, etc.
Another dimension where we need clarification is why we observe control plane service disruptions while the managed control plane is scaling up. It's perfectly expected that, at some point, as we continue adding more CRDs to the cluster, the control plane will need to scale up; but if it has high-availability features, why are we not isolated from the effects?
We are now at a point where there are several upstream Kubernetes issues and PRs, Crossplane issues and discussions happening at different places, which makes it hard to figure out the exact problems and root causes. Also under #2649, there is still some discussion going on. This motivates us to open this issue so that we can decide how to move on.
[1]: Currently, we are not aware of a configuration option or other mechanism that would allow us to replace `kube-openapi` with a custom build in an AKS/EKS/GKE cluster.

## How could Crossplane help solve your problem?
We can have a one-pager that includes:
This would help all stakeholders to see all the data we have, reproduce the problems quickly and make more informed decisions about how to move forward. The tooling could be part of the Kubernetes Conformance tests at one point but it’s not in the scope of this issue.
Another goal we should work towards is to get CRDs as a scalability dimension in the Kubernetes thresholds document as discussed here.