Bump context deadline for ProviderRevision controller to 3 mins #2570
Conversation
Fixes crossplane#2564 Signed-off-by: Hasan Turken <turkenh@gmail.com>
Hi @turkenh,
Another question: although I do not think it is currently needed, will we consider another approach for registering CRDs in the revision controller?
@ulucinar yes, I didn't observe any issues with the controllers (5/5 tries) except the following throttling log (this one has been running for 15 mins now):
Interesting, thanks for sharing your observation. Is your cluster also a kind cluster? I will give provider-tf-azure another try.
Yes, it was a kind cluster.
GKE (autopilot) is performing much worse. The API server stopped responding to some requests while the package was being installed:
And now I am observing these events with this change:
It is the same for a standard (not autopilot) GKE cluster. I think we broke something on the GKE side; my clusters are no longer responding and remain in this state 😕 So we definitely need to understand what is really going on with this number of CRD types.
However, bumping the context deadline helped on my local kind cluster, and I think it still makes sense to bump it to give enough time to install that many CRDs as long as the API server keeps working. I am just wondering if we should give some more wiggle room and set it to 5 mins instead? 🤔
Consider also the case where multiple Terraform-based providers are provisioned on a single cluster in a multi-cloud scenario :P But I think installing every CRD we have generated for a provider is not the way to go: it looks like we are putting too much stress on the API server (something to be investigated), and it is not efficient. Some way of selectively installing CRDs and starting only the associated controllers sounds like the approach we should take.
As we have discussed, this is definitely not the complete answer to the problem of installing CRDs on the order of hundreds. Increasing the timeout doesn't address all concerns, like dealing with slow API servers or installing multiple providers with thousands of CRDs. But bumping it from 1 minute to 3 is a sane increase IMO that gets us to a place where you can install ~500 CRDs in most clusters. Until the real solutions come in, like sharding or maybe even a fix in the API server, a context timeout of 1 minute would still be a problem. So we're just moving one stone out of the way without much compromise, as opposed to jumping to 10+ minutes.
Thanks @turkenh!
Successfully created backport PR #2571 for
Description of your changes
This PR bumps the context deadline for the providerrevision controller from 1 min to 3 mins. The current deadline is not sufficient when the provider package contains a high number of CRDs. Even though it does not make sense to bump the deadline indefinitely to support an arbitrary number of CRDs, supporting around 1000 CRDs is a reasonable expectation, given that provider-tf-aws contains around 750.

I have experimented with installing turkenh/provider-tf-aws:daf1e9f7-2 with this change a couple of times and noticed that it installs in around 2 mins. I believe 3 mins should be a good value, leaving some room for different environments and for around 250 more CRDs (to reach our rough target of 1000).

Fixes #2564
I have:
- Run `make reviewable` to ensure this PR is ready for review.
- Added `backport release-x.y` labels to auto-backport this PR if necessary.

How has this code been tested
Install the provider package in the issue description: