Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase or Decrease polling rate for instance in a final state ready or failed. #53

Merged
merged 13 commits into from
Jul 3, 2024

Conversation

santiago-ventura
Copy link
Contributor

In order to prevent the Operator hits the CF rates limits, we have introduced two new annotations, service-operator.cf.cs.sap.com/polling-interval-ready and service-operator.cf.cs.sap.com/polling-interval-fail to facilitate the increase or decrease of the re-queue time duration when a Custom Resources is in a State final Ready or Failed.

@RalfHammer RalfHammer self-requested a review June 20, 2024 12:00

func TestReconcilerHelpers(t *testing.T) {
RegisterFailHandler(Fail)
// RunSpecs(t, "Reconciler Helpers Suite")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to activate this test?

RalfHammer
RalfHammer previously approved these changes Jun 27, 2024
@santiago-ventura santiago-ventura marked this pull request as ready for review June 27, 2024 11:20
@santiago-ventura santiago-ventura requested review from TheBigElmo and removed request for bKiralyWdf June 27, 2024 11:21
@santiago-ventura
Copy link
Contributor Author

Hi @TheBigElmo, please review the PR.
Ralf has already completed his review.
Thxs!

@TheBigElmo
Copy link
Contributor

Why would you set this on a CRO level, but not in general on startup of the controller, so you can override the default values? Do you have numbers for end users, when to adjust it (e.g. number of resources)? Are the CF rate limits maybe too low?

@santiago-ventura
Copy link
Contributor Author

Hi @TheBigElmo

I understand your question about why not set these time durations for general use on the controller's startup.

In our use case, we might want different timeout values depending on which instance we're creating and its purpose.

So, by making it configurable per instance, we can always set it when we create the resource.

We can differentiate it, so we took this approach because of the ability to decouple it per resource.

@TheBigElmo
Copy link
Contributor

That's completely fine to do this on a CRO level, but it should have a possibility on startup too.

@RalfHammer
Copy link
Contributor

That's completely fine to do this on a CRO level, but it should have a possibility on startup too.
We would prefer to implement this possibility on startup in a separate PR.

@santiago-ventura
Copy link
Contributor Author

@TheBigElmo, you are right; it is possible to have both options.
We have more backlog items and can introduce that feature in another PR next.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mentioning the default values/behavior is helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to the document.
This includes information about the default values and behavior under Default Requeue After Interval.

@TheBigElmo
Copy link
Contributor

ok, let's do it with another PR next.
What about the questions:

Do you have numbers for end users, when to adjust it (e.g. number of resources)? Are the CF rate limits maybe too low?

@santiago-ventura
Copy link
Contributor Author

santiago-ventura commented Jul 3, 2024

ok, let's do it with another PR next. What about the questions:

Do you have numbers for end users, when to adjust it (e.g. number of resources)? Are the CF rate limits maybe too low?

Hi @TheBigElmo

Here is the number we have managed so far without reconciling the limit. It reconciles every 10 min at least; we would have with caching six requests per instance per hour. Keep in mind that Spaces are much more wasteful, taking 240 requests per hour.

So, in theory, we would have problems at around 10k instances per user, assuming no other workload. If you add keys, it becomes more like 5k.

In practice, we have problems around 200 spaces, which makes sense given that it's 200*240 > 40k requests.

We currently have a backlog item for testing how many requests the operator will execute with 10.000 CRs.

Copy link
Contributor

@TheBigElmo TheBigElmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, please create an additional pr for the controller flags.

@TheBigElmo TheBigElmo merged commit a778a14 into main Jul 3, 2024
6 checks passed
@TheBigElmo TheBigElmo deleted the decrease-polling-rate-for-instance-in-a-final-state branch July 3, 2024 11:36
@santiago-ventura
Copy link
Contributor Author

Hi @TheBigElmo
Thank you!

Yes, our product owner will create a backlog item for the controller flags.
We currently have other changes in the making. We will work on controller flags in the following sprints
to have the CF service operator in production as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants