
Consistent reporting of paused resources #10130

Open
JoelSpeed opened this issue Feb 9, 2024 · 33 comments · May be fixed by #10364
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@JoelSpeed
Contributor

What would you like to be added (User Story)?

As a user of Cluster API, I would like to see a condition on each resource's status that tells me whether the resource is paused, so that I know controllers have seen and acknowledged my pause request before I operate on the resources manually.

Detailed Description

Kubernetes is a distributed, event-driven system that relies heavily on caching. This means that, if I set spec.paused on a resource, its controller may be processing an older event for the same resource, and it may take some time before it reconciles the event relating to my change to pause the resource.

Before I make my changes, I want to be sure that the controller has seen my request to pause the resource, so that I am certain that no further changes will be made.

To do this, I think we should have all controllers add a condition that shows whether or not the controller believes the resource is paused.

In the general running case:

conditions:
- type: Paused
  status: "False"
  reason: ResourceNotPaused
  message: "Resource is operating as expected"

However, when paused, I want to see

conditions:
- type: Paused
  status: "True"
  reason: ResourcePaused
  message: "Resource has been paused by spec.paused"

This will allow integrations with kubectl wait to wait for the condition before continuing.

Anything else you would like to add?

Alternatively, you could set this as a pausedGeneration but I feel a condition is more expressive for this use case.

It would be good to define this as part of the contracts so that every CAPI controller eventually sets this in a consistent way.

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2024
@sbueringer
Member

Just for my understanding:

This will allow integrations with kubectl wait to wait for the condition before continuing.

Does kubectl wait without further configuration already look for the Paused condition? Or would we pass the condition name into the kubectl wait command?

@JoelSpeed
Contributor Author

I'd expect the end result to be that users could run kubectl wait --for=condition=Paused=true --timeout=30s in scripts to make sure that, within 30s, the controllers acknowledge the pause has occurred

There are plenty of examples throughout the Kubernetes docs of waiting with --for=condition=Ready=true; this would be similar, but I guess the opposite in a way

@vincepri
Member

vincepri commented Feb 9, 2024

I discussed this a bit more with Joel before he opened the issue. While trying to understand how to detect that the controller actually acknowledged that a resource was paused, it seemed almost impossible to do without a new field or a condition (the latter being easier to add). Another problem this solves is pausing/unpausing against stale caches.

Generally +1 on having some sort of contract for the paused field, which can give users a signal to proceed with other operations.

@vincepri
Member

vincepri commented Feb 9, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2024
@sbueringer
Member

Sounds reasonable in general.

Thinking a bit about it, something like:

  • set pause + take generation from the response
  • wait for status.observedGeneration to be updated

wouldn't work at all, because basically all objects in CAPI can be paused without any change to those objects, simply by pausing the Cluster object.

@sbueringer
Member

cc @fabriziopandini

@fabriziopandini
Member

fabriziopandini commented Apr 3, 2024

Two comments from my side:

  • If we are introducing a sort of contract that providers are also expected to abide by, let's discuss this in the office hours / make sure that everyone has a chance to comment
  • @sbueringer raised a good point above about resources being paused "indirectly" by pausing the cluster. It would be great to figure out how to address this before starting the implementation (cc @theobarberbany)

@sbueringer
Member

sbueringer commented Apr 3, 2024

My point was that we need something in addition to what we have today, because with the current implementation it is not enough to just check .status.observedGeneration.

@theobarberbany

Hey all 👋

Happy to come along to the call later on to discuss this, I'm fairly new to CAPI so all input appreciated! :)

I'm not entirely following the above - I think the way we currently check whether a resource is paused is to check whether the cluster is paused by looking at its spec, before looking at the individual resource's annotations?

@sbueringer
Member

sbueringer commented Apr 3, 2024

That is how our controllers are checking if the current resource is paused. What I meant was the way that a user of CAPI (or another tool building on top of CAPI) would check if a CAPI controller already "noticed" that a resource is paused (and thus stopped reconciling it further).

Concrete example:

  • A user is pausing the cluster via Cluster.spec.paused
  • The MD controller might still reconcile an MD for a (usually pretty short) period of time until the controller notices that Cluster.spec.paused is set.
    • This can happen because the MD controller is using a local cache to retrieve the Cluster object.

@theobarberbany

Ah right, I think I see!

This then justifies the need for a condition, correct?

@sbueringer
Member

I think it justifies the addition of something :). But a condition also seems like a natural choice to me

@theobarberbany

theobarberbany commented Apr 4, 2024

I've had a little bit of a think about the points raised by @fabriziopandini along with @damdo. (I'm still trying to understand CAPI so bear with me :) )

For the example Stefan outlined above, I think the trade-off is how quickly (or slowly) we are willing to have the system react (and therefore update the condition on a child resource) to a cluster being paused.

If we want a more reactive system, then we need to trade that off against more requests to the API server in normal (unpaused) operation. Currently, I think it's probably best to accept a potentially slower-to-react system to avoid this (i.e. accept that there may be some delay, and that the system is eventually consistent).

Currently, we do watch Clusters - so if there is a change to the Cluster object we will re-reconcile. However, we exclude events if the cluster is paused. AFAIK this means we could pause a cluster and then not reconcile any of its child resources (e.g. an MD).

When the MD controller next reconciles, it could be using a local cache - and therefore not realise it has been paused. This means that the messaging around the use of the condition would need to clearly state that any potential consumer of the condition may need to wait for some time (up to the re-sync) for the controllers to stop reconciling. The condition would only be present once this has happened, however...

We could change the current code by updating the watch to not exclude paused clusters, and then to also use a direct client for the get when checking if the cluster is paused. This would ensure that once the .spec.paused was written to the cluster, we would not reconcile again (avoids caching issues).
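For concreteness, a rough sketch of what that direct, uncached read could look like (hedged: this is not existing CAPI code; apiReader would typically come from mgr.GetAPIReader(), and the helper name is made up):

package sketch

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// isClusterPausedDirect reads the Cluster straight from the API server,
// bypassing the controller's local cache, and reports whether spec.paused is set.
func isClusterPausedDirect(ctx context.Context, apiReader client.Reader, namespace, name string) (bool, error) {
	cluster := &clusterv1.Cluster{}
	key := client.ObjectKey{Namespace: namespace, Name: name}
	if err := apiReader.Get(ctx, key, cluster); err != nil {
		return false, err
	}
	return cluster.Spec.Paused, nil
}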

However, I'm not convinced this is the right choice - it would mean that in normal operation we would see more traffic to the API server (as the Get of the Cluster object would no longer be cached). This feels a bit like optimising for the wrong thing (I think we expect the majority of clusters not to be paused for the majority of the time).

As I'm quite new to CAPI, I'm not sure if there are other trade offs here I'm missing. Does this sound reasonable? :)

I'm also not entirely sure that I understand how the contract with providers is impacted here? (Sorry, again I'm new! 😅) If we want to make sure everything reflects the paused condition, then I think that I get how it could be a potential change to the contract. I'm not entirely sure this is what Joel was suggesting though...

So.. to summarise:

  • I think it's probably fine to have some delay between setting spec.paused on a cluster and having the condition propagate to resources of that cluster, so long as we are clear that an end user must wait for the condition to be present prior to starting work.
  • If we want a guarantee that reconciliation stops as soon as spec.paused is set on a cluster, we must make a direct client Get() at the start of every reconcile, which I don't think is the right trade-off
  • I'm not following the contract change with providers 😓

@sbueringer
Member

sbueringer commented Apr 5, 2024

If we want a more reactive system, then we need to trade that off against more requests to the API server in normal (unpaused) operation. Currently, I think it's probably best to accept a potentially slower-to-react system to avoid this (i.e. accept that there may be some delay, and that the system is eventually consistent).

I would say > 99% of our API calls are hitting the local cache (we did some performance optimizations in CAPI v1.5.0).

Currently, we do watch Clusters - so if there is a change to the Cluster object we will re-reconcile. However, we exclude events if the cluster is paused. AFAIK this means we could pause a cluster and then not reconcile any of its child resources (e.g. an MD).

I didn't go through all controllers, but if the exclude is done via the ClusterUnpaused predicate we can drop the predicate and just rely on the check in the reconcile. I think changing that shouldn't really make a difference. But good catch!

When the MD controller next reconciles, it could be using a local cache - and therefore not realise it has been paused. This means that the messaging around the use of the condition would need to clearly state that any potential consumer of the condition may need to wait for some time (up to the re-sync) for the controllers to stop reconciling. The condition would only be present once this has happened, however...

I think no:

  • If setting Cluster paused to true triggers the reconcile, the reconciler will see that on the Cluster resource (controller-runtime first updates the local cache and then broadcasts the event)
  • If setting MachineDeployment to paused triggers the reconcile, the reconciler will see that on the MachineDeployment resource (controller-runtime first updates the local cache and then broadcasts the event)

So once we drop the ClusterUnpaused predicate we have a guarantee that, in whichever way the user sets the pause, the reconcile triggered by that event will realize that the Cluster/MD is paused. That is not the same as guaranteeing that the very next Reconcile after the user sets the pause sees the pause (as things could happen concurrently).
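To illustrate, a minimal sketch of a setup that watches Clusters without a paused-filtering predicate and leaves the pause check to Reconcile (hedged: this is not the actual CAPI MachineDeployment controller code; the Reconciler type and the clusterToMachineDeployments mapper are hypothetical placeholders, and the builder calls assume a recent controller-runtime version):

package sketch

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// Reconciler is a placeholder for a controller such as the MachineDeployment reconciler.
type Reconciler struct {
	Client client.Client
}

func (r *Reconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&clusterv1.MachineDeployment{}).
		// Watch Clusters without a "cluster unpaused" predicate, so the event that
		// sets Cluster.spec.paused still enqueues a reconcile; the pause check then
		// happens inside Reconcile itself.
		Watches(
			&clusterv1.Cluster{},
			handler.EnqueueRequestsFromMapFunc(r.clusterToMachineDeployments),
		).
		Complete(r)
}

// clusterToMachineDeployments would map a Cluster event to requests for the
// MachineDeployments belonging to that Cluster; the mapping logic is omitted here.
func (r *Reconciler) clusterToMachineDeployments(ctx context.Context, o client.Object) []ctrl.Request {
	return nil
}

// Reconcile checks for pause (current object and Cluster) before doing any work.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... fetch the MachineDeployment and its Cluster, check whether either is paused,
	// set/clear the Paused condition, then continue with normal reconciliation ...
	return ctrl.Result{}, nil
}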

We could change the current code by updating the watch to not exclude paused clusters, and then to also use a direct client for the get when checking if the cluster is paused. This would ensure that once the .spec.paused was written to the cluster, we would not reconcile again (avoids caching issues).

See above about order of CR updating the local cache and then broadcasting the event (basically we don't have to use a direct client).

As I'm quite new to CAPI, I'm not sure if there are other trade offs here I'm missing. Does this sound reasonable? :)

Nice analysis so far, with the one miss about ordering in controller-runtime :) (assuming I'm correct, but I spent quite some time on it when doing the performance optimizations; would be good to double check though)

Re: contract

I think this comes down to: if we expect to have the "condition on each resource's status", and "each" includes infra / bootstrap / control plane provider resources, then we have to establish a contract so that everyone adds the condition in basically the same way as core CAPI.

One option here could be:

  • Implement it in core CAPI now
  • Establish an optional contract for everyone else
  • Eventually make the optional contract mandatory when we bump the contract (e.g. to v1beta2/v1/...)

@theobarberbany

I would say > 99% of our API calls are hitting the local cache (we did some performance optimizations in CAPI v1.5.0).

Oh that's cool and good to know! 😄

I didn't go through all controllers, but if the exclude is done via the ClusterUnpaused predicate we can drop the predicate and just rely on the check in the reconcile. I think changing that shouldn't really make a difference. But good catch!

Ok - I will give this a shot on my WIP PR!

controller-runtime first updates the local cache and then broadcasts the event

I see - I wasn't aware of how this works, but that's useful as it makes reasoning about this easier :)

That is not the same as guaranteeing that the very next Reconcile after the user sets the pause sees the pause (as things could happen concurrently).

Ok, so if I'm understanding you correctly, this won't be instant but it should be relatively fast?

would be good to double check though

I'll have a dig into the controller runtime code and see if I come to the same understanding, thanks! :)

RE Contract:

This makes sense to me :)

Having slept on it a bit, I think it may be confusing for an end user if this is something present only on core CAPI resources and only some provider ones - so eventually making the optional contract mandatory makes sense!

@sbueringer
Member

sbueringer commented Apr 5, 2024

Ok, so if I'm understanding you correctly, this won't be instant but it should be relatively fast?

It will be relatively fast, yup (I guess usually at most a few milliseconds). But more importantly it guarantees that we don't have to wait for a resync as it guarantees that the change that set the pause will trigger an event that will trigger a reconcile that will "see" the pause. If there is currently a reconcile in progress when the event arrives another reconcile will be triggered.

I'll have a dig into the controller runtime code and see if I come to the same understanding, thanks! :)

I simplified it a bit :). This basically comes down to step 5 happening before step 6 here: https://github.com/kubernetes/sample-controller/blob/master/docs/images/client-go-controller-interaction.jpeg
(IIRC it's somewhere in the informer code in client-go)

@theobarberbany

@sbueringer, I've had a stab at updating my WIP PR: #10364

Please PTAL when you have a second! I'll try and update comments etc tomorrow!

@sbueringer
Member

@theobarberbany I think before going back and forth on an implementation we should cover this point:

If we are introducing a sort of contract that providers are also expected to abide by, let's discuss this in the office hours / make sure that everyone has a chance to comment

@nrb
Contributor

nrb commented Apr 10, 2024

I do think this needs to be discussed in office hours and maybe have a collaborative doc, because I quite like the option Stefan posted above. However, there might be something I'm missing and a different approach could be helpful.

One option here could be:

  • Implement it in core CAPI now
  • Establish an optional contract for everyone else
  • Eventually make the optional contract mandatory when we bump the contract (e.g. to v1beta2/v1/...)

@nrb
Contributor

nrb commented Apr 10, 2024

From the April 10 2024 community meeting:

If we'd like to propagate the paused condition across providers, we should author a proposal to make it clear what's expected. This could be a PR to amend the existing contract. @sbueringer or @fabriziopandini which proposal(s) are relevant to this contract? I looked in the proposals directory, but none of them were obviously covering the contract to me.

The proposal would need to provide an example or tool of what providers need to do in order to conform. This might be a pseudo-code algorithm for observing the pause at the top of the hierarchy and propagating it down.

This might also apply to bootstrap providers; we should survey the resources affected and document them in the proposal PR(s).

For core CAPI, we can implement it now. For providers, we can make this optional and eventually make it required.

No objections to the idea in general, seems like folks are in agreement with the proposed approach.

@nojnhuh
Contributor

nojnhuh commented Apr 10, 2024

Just want to note that a clusterctl move-specific mechanism to synchronize pause operations already exists and may need to be accounted for here: #8473

@sbueringer
Member

sbueringer commented Apr 11, 2024

If we'd like to propagate the paused condition across providers, we should author a proposal to make it clear what's expected. This could be a PR to amend the existing contract. @sbueringer or @fabriziopandini which proposal(s) are relevant to this contract?

We don't have a proposal for the contract. What I meant is a PR to update the contract instead of an additional proposal. The contract is currently only documented in our book:

(I hope I didn't forget any)

The proposal would need to provide an example or tool of what providers need to do in order to conform. This might be a pseudo-code algorithm for observing the pause at the top of the hierarchy and propagating it down.

I think what it comes down to is that every controller should have something like this: (check the current & the cluster object for pause)

	// Return early if the object or Cluster is paused.
	if annotations.IsPaused(cluster, deployment) {
		log.Info("Reconciliation is paused for this object")
		return ctrl.Result{}, nil
	}

And when we hit this case we should set the Paused condition to true, otherwise false (I think we have to be careful that we set the Paused condition to false before we start making changes again after unpause). So I think what we can do is describe the expectation and then provide an example in some way.
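To make that expectation a bit more concrete, here is a minimal sketch (illustrative only, not an agreed implementation; it assumes the object exposes []metav1.Condition in its status and uses the generic apimachinery helper rather than CAPI's own conditions utilities):

package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setPausedCondition records whether the controller currently considers the object
// paused. The type/reason/message strings mirror the examples from the issue description.
func setPausedCondition(conditions *[]metav1.Condition, paused bool, observedGeneration int64) {
	cond := metav1.Condition{
		Type:               "Paused",
		Status:             metav1.ConditionFalse,
		Reason:             "ResourceNotPaused",
		Message:            "Resource is operating as expected",
		ObservedGeneration: observedGeneration,
	}
	if paused {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "ResourcePaused"
		cond.Message = "Resource has been paused by spec.paused or the paused annotation"
	}
	// SetStatusCondition only updates LastTransitionTime when the status actually changes.
	meta.SetStatusCondition(conditions, cond)
}

In a reconciler this would be called (and the status patched) right before the early return in the paused case, and again with paused=false once reconciliation resumes after unpause.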

As far as I'm aware we only check current object + cluster today in core CAPI. We don't propagate it downwards (e.g. MD => MS => Machine => InfraMachine / BootstrapConfig). Basically each controller can be paused by either pausing the cluster or the reconciled object.

Just want to note that a clusterctl move-specific mechanism to synchronize pause operations already exists and may need to be accounted for here: #8473

Thx for bringing this up. The issue (based on its title) would have had some impact, but I looked over the corresponding PR and it seems very orthogonal to me (it seems to come down to an annotation that blocks clusterctl move, but pause is not affected as far as I can tell). @fabriziopandini I think you reviewed that PR, can you maybe double check that I'm not missing something?

@fabriziopandini
Member

The issue (based on its title) would have had some impact, but I looked over the corresponding PR and it seems very orthogonal to me

This is correct, "clusterctl.cluster.x-k8s.io/block-move" is orthogonal to pause

@nrb
Contributor

nrb commented Apr 11, 2024

As far as I'm aware we only check current object + cluster today in core CAPI. We don't propagate it downwards (e.g. MD => MS => Machine => InfraMachine / BootstrapConfig). Basically each controller can be paused by either pausing the cluster or the reconciled object.

Yeah, I realize this isn't how it works today. My understanding of the request is that the condition would be propagated, though. Off the top of my head, if the controllers see that the Cluster is paused, they'd set the condition to signal acknowledgement. When they observe that it is unpaused, they would update the condition.

To spell it out explicitly - I think the intent is to put a condition on Cluster, MD, MS, Machine, and InfraMachine. If I'm misreading it @JoelSpeed can correct me.

@sbueringer
Member

sbueringer commented Apr 11, 2024

My understanding was also that we want the condition everywhere. What I meant by "we don't propagate" is the following:
a user pausing an MD does not also automatically pause its MachineSets, and the same applies to Machines and resources further down.

Or phrased differently:

  • An MD is considered paused if either the MD itself or its Cluster is paused
  • An MS is considered paused if either the MS itself or its Cluster is paused
  • A Machine is considered paused if either the Machine itself or its Cluster is paused

For example a Machine wouldn't be paused just because its MachineSet is paused.

@nrb
Contributor

nrb commented Apr 11, 2024

I see - so all resources check themselves and the Cluster, it is not transitive down the hierarchy right now.

@fabriziopandini
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 11, 2024
@sbueringer
Member

I see - so all resources check themselves and the Cluster, it is not transitive down the hierarchy right now.

Yup. And I think it makes sense. Basically this allows you to:

  • pause an entire cluster
  • pause every single reconciler without affecting other reconcilers further down the hierarchy (e.g. we can pause MD or MS without pausing Machine reconciliation)

@nrb
Contributor

nrb commented Apr 19, 2024

/assign @theobarberbany

@theobarberbany

@sbueringer @fabriziopandini, I hope you're having a good Friday! :)

I've had a crack at what I've understood about updating the contract!

PTAL and let me know your thoughts :)

#10519

@theobarberbany

I've also noticed that in these same documents (and anywhere else in the book that I could search) we don't specifically state the expectation that controllers will observe the paused annotation, or spec.paused on the Cluster. Although I may not be looking in the right place for this 😓

@sbueringer
Member

I wouldn't be surprised if a lot of these things were just copy&pasted across providers.

@fabriziopandini
Member

I've also noticed that in these same documents (and anywhere else in the book that I could search) we don't specifically state the expectation that controllers will observe the paused annotation, or spec.paused on the Cluster. Although I may not be looking in the right place for this 😓

Probably the best documentation we have about this is in https://cluster-api.sigs.k8s.io/clusterctl/provider-contract#move (as far as I remember, paused was introduced to make move more robust, but I might be wrong)
