One-pager design for ignore changes on updates / granular management policies #3822

Merged
merged 1 commit into from Jun 2, 2023

lsviben
Contributor

@lsviben lsviben commented Mar 6, 2023

Description of your changes

Extended the initial proposal with a one-pager design doc.

Iterating on the solution, we arrived at a more general approach: granular management policies, through which features like ObserveOnly and ignore changes can be realized.

Fixes #3516

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

N/A

Member

@negz negz left a comment

Thanks for tackling this design @lsviben!

@ferpizza

I'm currently impacted by the issues we are trying to solve, shared here

TL;DR: three fields on the NodePool manifest of provider-gcp are causing problems with GKE's autoUpgrade and autoScaling functionality.

These three fields have something in common: none of them is in my initial manifest, and none of them is mandatory.

So I'm thinking that another approach is to stop updating the manifest's spec.forProvider section on every reconcile cycle, and instead record the full current state in status.atProvider.

This way, Crossplane will enforce only the values set by the user while letting all other values change freely. Additionally, providers and other plugins can still read the full state of the resource from status.atProvider, so nothing is lost.

@lsviben
Contributor Author

lsviben commented Mar 14, 2023

I'm currently impacted by the issues we are trying to solve, shared here

TL,DR; 3 different fields on the NodePool manifest of provider-gcp are generating issues with GKE's autoUpgrade and autoScaling functionalities.

These 3 fields have something in common. None of them are on my initial manifest and neither is a mandatory field.

So, I'm thinking that another approach to solving this is to avoid updating the original manifest spec.forProvider section on every reconcile cycle, and instead add a full manifest of current state on status.atProvider

This way, CrossPlane will ensure that only the values set by the user are sustained over time while allowing all other values to flow freely. Additionally, providers and other plugins will be able to get the full info on the resource from the status.atProvider section, so nothing will be missed.

Thanks for the input!

Basically your idea would be to turn off late-initialization and instead rely on status.atProvider which is a new feature coming with Observe-Only Resources. The "ignoring" would be done by just not setting the fields in the spec.forProvider and as there is no late-initialization, there is no undesired conflict between Crossplane and external resource managers. I hope I got it right?

I'll add your use case and alternative solution suggestion to the proposal.

@ferpizza

Your interpretation sounds about right, but I'm a bit new to Crossplane and I'm not fully familiar with what late-initialization is trying to achieve, TBH.

I'm taking some inspiration from ArgoCD, which keeps two versions of each manifest: the one that the user uploads (desired manifest), and the one that is actually running on the cluster (current manifest).

Then, on every cycle, we would make sure that the current manifest includes all the values in the desired manifest, and reconcile only if the two deviate.

It's expected that the current manifest will have more values than the desired manifest, and this is OK.
If users want to take control of these other values, they only need to add them to their desired manifest and upload the change.
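
A minimal sketch of the comparison described above, assuming both manifests are plain nested dictionaries. The function name and the NodePool field names are hypothetical and only illustrate the idea; this is not how Crossplane or ArgoCD implement it.

```python
def drifted_fields(desired: dict, current: dict) -> dict:
    """Return the part of `desired` whose values differ in `current`.

    Fields that exist only in `current` are ignored on purpose: they are
    left to whoever else manages the resource (for example an autoscaler).
    """
    drift = {}
    for key, want in desired.items():
        have = current.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse so that nested objects are also compared field by field.
            nested = drifted_fields(want, have)
            if nested:
                drift[key] = nested
        elif want != have:
            drift[key] = want
    return drift


# Hypothetical field names, for illustration only.
desired = {"machineType": "e2-medium", "management": {"autoUpgrade": True}}
current = {
    "machineType": "e2-medium",
    "nodeCount": 5,  # changed externally, absent from `desired`
    "management": {"autoUpgrade": True, "autoRepair": True},
}

# No reconcile is needed here: every desired value is present in `current`.
needs_reconcile = bool(drifted_fields(desired, current))
```

With this rule, adding nodeCount to the desired manifest is what would bring it back under the user's control.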

Regarding how this overlaps with the Observe-Only goals, I believe this proposal accomplishes both.

@negz
Member

negz commented Mar 22, 2023

Basically your idea would be to turn off late-initialization and instead rely on status.atProvider which is a new feature coming with Observe-Only Resources.

This is along the lines of what I was thinking after our discussion a few weeks back. We can't turn off late-init by default because that would be a breaking API change. We could add a knob to disable it though - for example the new managementPolicy field being added in support of OO resources.

At that point we'd have:

  • The full observed state in status.atProvider.
  • (Optionally) The initial 'bootstrap' desired state in spec.initProvider.
  • (Optionally) The constantly-reconciled desired state in spec.forProvider.

If you wanted to allow the external system (e.g. a GKE node pool autoscaler) to control a particular field you would:

  • (Optionally) add an initial desired value to spec.initProvider.
  • Don't specify a desired value under spec.forProvider.
  • Set spec.managementPolicy: DoNotLateInit (hopefully with a better policy name)

One downside of this approach is that you have to opt-out of late init entirely (not just for the fields you care about). I'm not sure how much of an issue that would actually be in practice.
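
To make the conflict concrete, here is a rough sketch of what late initialization does to spec.forProvider and what opting out would change. It assumes flat dictionaries and a hypothetical boolean switch standing in for the new policy; the function names and the nodeCount field are illustrative assumptions, not Crossplane code.

```python
import copy


def late_initialize(for_provider: dict, observed: dict) -> dict:
    """Fill top-level fields the user left unset with the observed values."""
    merged = copy.deepcopy(for_provider)
    for key, value in observed.items():
        # Only unspecified fields are filled; user-provided values win.
        merged.setdefault(key, copy.deepcopy(value))
    return merged


def next_for_provider(for_provider: dict, observed: dict, late_init: bool) -> dict:
    """Return the desired state after one reconcile, with or without late-init."""
    if late_init:
        return late_initialize(for_provider, observed)
    return copy.deepcopy(for_provider)


spec = {"machineType": "e2-medium"}
observed = {"machineType": "e2-medium", "nodeCount": 3}

# With late-init, nodeCount becomes part of the desired state, so a later
# change made by an autoscaler would look like drift to be corrected.
with_late_init = next_for_provider(spec, observed, late_init=True)

# Without late-init, nodeCount never enters the desired state.
without_late_init = next_for_provider(spec, observed, late_init=False)
```

In both cases the observed value would still be available under status.atProvider.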

@turkenh
Member

turkenh commented Mar 22, 2023

Set spec.managementPolicy: DoNotLateInit (hopefully with a better policy name)

The default management policy is FullControl, meaning that XP will own/control each and every field, full resource control. This is indeed why we are late-initializing; to know the default values and take them under our control (i.e. source of truth).

Stating this, I like the idea of using the managementPolicy to control the late-init behavior since disabling late-init would mean XP managing the resource but only the provided fields. We can find a suitable policy name considering this, just to throw out some ideas: PartialControl, SelectiveControl, FieldControl, ManagedFields or CustomControl.

@negz
Member

negz commented Mar 22, 2023

Throwing out another potential policy name: ExplicitControl (i.e. only control the fields I explicitly tell you to).

@lsviben
Contributor Author

lsviben commented Mar 22, 2023

Basically your idea would be to turn off late-initialization and instead rely on status.atProvider which is a new feature coming with Observe-Only Resources.

This is along the lines of what I was thinking after our discussion a few weeks back. We can't turn off late-init by default because that would be a breaking API change. We could add a knob to disable it though - for example the new managementPolicy field being added in support of OO resources.

At that point we'd have:

  • The full observed state in status.atProvider.
  • (Optionally) The initial 'bootstrap' desired state in spec.initProvider.
  • (Optionally) The constantly-reconciled desired state in spec.forProvider.

If you wanted to allow the external system (e.g. a GKE node pool autoscaler) to control a particular field you would:

  • (Optionally) add an initial desired value to spec.initProvider.
  • Don't specify a desired value under spec.forProvider.
  • Set spec.managementPolicy: DoNotLateInit (hopefully with a better policy name)

One downside of this approach is that you have to opt-out of late init entirely (not just for the fields you care about). I'm not sure how much of an issue that would actually be in practice.

I like this approach of using the managementPolicy and just adding a new policy.

And at the start we don't need to support spec.initProvider, as none of the use cases we have so far really needs it.

Just not sure if this should be optional:

  • (Optionally) The constantly-reconciled desired state in spec.forProvider.

IMO the fields that are in spec.forProvider still need to be constantly-reconciled by Crossplane.

For the naming (hardest part as always) I narrowed down the suggestions to:
PartialControl - shows that Crossplane is only partially controlling the resource, meaning something else could be controlling other parts/fields.
SelectiveControl - Crossplane just controls selected fields of spec.forProvider
ExplicitControl - similar to above, Crossplane just controls the fields that are explicitly set in spec.forProvider.

I'm personally leaning towards PartialControl as it feels most unambiguous, and fits with the existing FullControl.

@ctkeyser

ctkeyser commented Mar 28, 2023

For our use case, we're creating an azuread group (https://marketplace.upbound.io/providers/upbound/provider-azuread/v0.5.0/resources/groups.azuread.upbound.io/Group/v1beta1), and we want to set owners and members only on group creation. The owners of the group then manage owners and members outside of Crossplane, and Crossplane should not fight with other automation when owners or members leave the company. To achieve this, we had to switch to the Terraform provider and use lifecycle { ignore_changes = [ owners, members ] }.
It sounds like there was no use case for a spec.initProvider (short of just adding an ignoreChanges field), so I wanted to make sure our use case was known.
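
A small sketch of how a create-only block such as spec.initProvider could cover this use case, assuming that creation merges both blocks and that updates only ever send spec.forProvider. The helper names, the field values, and the rule that forProvider wins on conflicts are assumptions made for illustration.

```python
def create_payload(init_provider: dict, for_provider: dict) -> dict:
    """Fields sent when the external resource is first created."""
    # Assumption: on conflicting keys the continuously reconciled value wins.
    return {**init_provider, **for_provider}


def update_payload(init_provider: dict, for_provider: dict) -> dict:
    """Fields sent on later reconciles; initProvider is deliberately ignored."""
    return dict(for_provider)


# Hypothetical values for an azuread group.
init_provider = {"owners": ["alice"], "members": ["bob"]}
for_provider = {"displayName": "platform-team"}

at_creation = create_payload(init_provider, for_provider)
afterwards = update_payload(init_provider, for_provider)
```

Because owners and members never appear in the update payload, later changes made outside of Crossplane would not be reverted, provided late initialization does not copy them back into spec.forProvider.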

@lsviben
Contributor Author

lsviben commented Mar 31, 2023

for our use-case, we're creating an azuread group (https://marketplace.upbound.io/providers/upbound/provider-azuread/v0.5.0/resources/groups.azuread.upbound.io/Group/v1beta1), and we want to set owners and members only on group creation. the owners of the group then manage owners+members outside of crossplane, and also crossplane won't fight with other automation if owners/members leave the company. to achieve this, we had to switch to the terraform provider and use lifecycle { ignore_changes = [ owners, members ] }. it's sounding like there was no use-case for a spec.initProvider (if not just doing an ignoreChanges field), so wanted to make sure our use-case was known

Thanks for the input, I included it in the use cases :)

@lsviben
Contributor Author

lsviben commented Mar 31, 2023

Updated the proposal with the solution we discussed above (#3822 (comment)), as it seems like we are converging to a common view.

Now the suggested solution is to add a new Management Policy (name still up for discussion) that would act in all ways as FullControl except for:

  • skipping late initialization
  • orphaning the resource on delete

In addition, as there are use cases which need a way to initialize resources with some parameters, but not use them later, the initProvider field is added to handle these cases.

However, while writing this solution update, it seems to me that these can be divided into two features.
One is the new management policy, which fixes the use cases where late initialization conflicts with external management of certain fields.
The second is initProvider, which allows users to set some fields only at creation time (although this one probably needs to be used with the new management policy to work well).

Not sure if we should split these features?

P.S. It's also worth noting that the previously suggested solution with ignoredChanges solves both issues.

@ferpizza

Now the suggested solution is to add a new Management Policy(name still up for discussion) that would act in all ways as FullControl except for:

  • skipping late initialization
  • orphans the resource on delete

Wondering why we are forcing the Orphan by design.

Following up on the GCP nodepool example having issues with autoUpgrade and autoScaler, I would want to avoid managing certain fields (eg. nodeCount, version) and still eliminate the Nodepool on delete.

While Orphan is a sane default for these cases, users should be able to change spec.deletionPolicy to Delete if it fits their needs better (TL;DR: it should continue to work as it does today).

@lsviben
Contributor Author

lsviben commented Mar 31, 2023

Now the suggested solution is to add a new Management Policy(name still up for discussion) that would act in all ways as FullControl except for:

  • skipping late initialization
  • orphans the resource on delete

Wondering why we are forcing the Orphan by design.

Following up on the GCP nodepool example having issues with autoUpgrade and autoScaler, I would want to avoid managing certain fields (eg. nodeCount, version) and still eliminate the Nodepool on delete.

While Orphan is a sane default value for these cases, the user should be able to change spec.deletionPolicy to Delete if it fits its needs better (TL,DR; should continue to work as today)

Agree that by default we should Orphan but leave an option to specify the resource to be deleted. Not sure how to do that yet though. It would be possible with the DeletionPolicy, but that is planned to be deprecated in favor of ManagementPolicy.

But anyway, a valid point and I think we should keep this question open.

@turkenh
Member

turkenh commented Apr 6, 2023

Agree that by default we should Orphan but leave an option to specify the resource to be deleted. Not sure how to do that yet though. It would be possible with the DeletionPolicy, but that is planned to be deprecated in favor of ManagementPolicy.

But anyway, a valid point and I think we should keep this question open.

Yes, this was something that I realized independently and came here to comment on.

Now the suggested solution is to add a new Management Policy(name still up for discussion) that would act in all ways as FullControl except for:

  • skipping late initialization
  • orphans the resource on delete

Using the new ManagementPolicy instead of the existing DeletionPolicy starts becoming a stretch if we want to introduce a management policy like PartialControl. To cover all cases, we may need to overload the management policy with extended policy names considering the deletion. FullControl, FullControlOrphanOnDelete, PartialControl, PartialControlOrphanOnDelete, ObserveOnly.

Considering all of these, I am wondering whether we should keep the DeletionPolicy to control management and deletion independently 🤔 The main motivation for deprecation was ManagementPolicy covering existing cases, and the new policy ObserveOnly should obviously orphan the resource. However, we can still keep the DeletionPolicy and ignore it when the ManagementPolicy is ObserveOnly as we operate read-only.
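
The rule suggested here (keep DeletionPolicy, but ignore it for ObserveOnly) is small enough to sketch directly. This is an illustration of the proposed semantics under the policy names used in this thread, not existing Crossplane behavior; the function name is hypothetical.

```python
def deletes_external_resource(management_policy: str, deletion_policy: str) -> bool:
    """Decide whether deleting the managed resource also deletes the external one."""
    if management_policy == "ObserveOnly":
        # Read-only mode: the deletionPolicy is ignored and the resource is orphaned.
        return False
    # For every other management policy, the deletionPolicy decides independently.
    return deletion_policy == "Delete"
```

Under this rule PartialControl combined with deletionPolicy: Delete still removes the external resource, which is the behavior asked for in the node pool example.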

@bobh66
Contributor

bobh66 commented Apr 6, 2023

Considering all of these, I am wondering whether we should keep the DeletionPolicy to control management and deletion independently 🤔 The main motivation for deprecation was ManagementPolicy covering existing cases, and the new policy ObserveOnly should obviously orphan the resource. However, we can still keep the DeletionPolicy and ignore it when the ManagementPolicy is ObserveOnly as we operate read-only.

I agree with this - keeping DeletionPolicy separate is a cleaner interface that is easier to understand.

@ferpizza

ferpizza commented Apr 6, 2023

To cover all cases, we may need to overload the management policy with extended policy names considering the deletion. FullControl, FullControlOrphanOnDelete, PartialControl, PartialControlOrphanOnDelete, ObserveOnly.

From my perspective PartialControl and ObserveOnly can become one. ObserveOnly seems an edge case of PartialControl

If we want to observe only, our manifest will include only the resource name (eg. as we do now when importing existing resources from a cloud provider)

If we add any other value to that manifest, then it behaves as partial control, ensuring informed values are always in sync.

It could be confusing to have values in an ObserveOnly manifest that are not kept in sync, so I assume that a name-only manifest is the way to go for this case.

@turkenh
Member

turkenh commented Apr 6, 2023

From my perspective PartialControl and ObserveOnly can become one. ObserveOnly seems an edge case of PartialControl

If we want to observe only, our manifest will include only the resource name (eg. as we do now when importing existing resources from a cloud provider)

If we add any other value to that manifest, then it behaves as partial control, ensuring informed values are always in sync.

The main problem here is what to do regarding the creation of resources. ObserveOnly clearly indicates that XP should operate as read-only including the creation of the resource.

For example, what should the XP controller do after noticing there is no such bucket with the below manifest? Should it create it or error out assuming the user only wants to observe it?

apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  annotations:
    crossplane.io/external-name: example-bucket
  name: example-bucket
spec:
  managementPolicy: PartialControl
  forProvider:
    region: us-west-1

We already had some similar discussions in the OO design PR, please check: #3531 (comment)

Contributor

@amotolani amotolani left a comment

Some thoughts on the deletionPolicy

@lsviben
Contributor Author

lsviben commented Apr 12, 2023

Agree that by default we should Orphan but leave an option to specify the resource to be deleted. Not sure how to do that yet though. It would be possible with the DeletionPolicy, but that is planned to be deprecated in favor of ManagementPolicy.
But anyway, a valid point and I think we should keep this question open.

Yes, this was something that I realized independently and came here to comment on.

Now the suggested solution is to add a new Management Policy(name still up for discussion) that would act in all ways as FullControl except for:

  • skipping late initialization
  • orphans the resource on delete

Using the new ManagementPolicy instead of the existing DeletionPolicy starts becoming a stretch if we want to introduce a management policy like PartialControl. To cover all cases, we may need to overload the management policy with extended policy names considering the deletion. FullControl, FullControlOrphanOnDelete, PartialControl, PartialControlOrphanOnDelete, ObserveOnly.

Considering all of these, I am wondering whether we should keep the DeletionPolicy to control management and deletion independently 🤔 The main motivation for deprecation was ManagementPolicy covering existing cases, and the new policy ObserveOnly should obviously orphan the resource. However, we can still keep the DeletionPolicy and ignore it when the ManagementPolicy is ObserveOnly as we operate read-only.

OK, I see that most people are aligned on separating the ManagementPolicy from the DeletionPolicy. I'm on board with that as well; it makes things cleaner and more explicit IMO.

So we won't need the OrphanOnDelete ManagementPolicy then? We would just have FullControl, PartialControl, and ObserveOnly, and deletion would be managed by the DeletionPolicy. I'm not sure we should have the special case of ignoring the DeletionPolicy for ObserveOnly; it doesn't seem intuitive. I'd rather have it default to Orphan when the ManagementPolicy is ObserveOnly.

@turkenh
Member

turkenh commented Apr 12, 2023

We won't need the OrphanOnDelete ManagementPolicy then? We would just have FullControl, PartialControl, ObserveOnly.

Correct.

I'm not sure if we should have the special case that we ignore the DeletionPolicy for ObserveOnly, seems not intuitive.

Yeah, this is something I am not happy with. However, we can document something like "DeletionPolicy will be ignored when ManagementPolicy is ObserveOnly and the external resource will always be Orphaned".

I'd rather have it Orphan by default when the ManagementPolicy is ObserveOnly.

I don't think setting a conditional default is technically possible, at least with the way we are currently using, i.e. API level defaulting.

@lsviben
Contributor Author

lsviben commented May 17, 2023

I am now thinking whether we should not make such an exception but rather change how we default the deletionPolicy value. For example, it does not default at API level, and we default to Orphan if managementPolicy is ObserveOnly but Delete otherwise.
I am still not happy since ObserveOnly kind of implies that we will only observe (and never delete the resource), but this is a natural outcome of the decision we are making on keeping deletionPolicy together with managementPolicy in the scope of this design.

One caveat with above, even if we set the default of the deletionPolicy, a user might change the managementPolicy from FullControl to ObserveOnly on an existing resource that was initialized as deletionPolicy: Delete in the beginning. I cannot convince myself to delete an external resource with managementPolicy: ObserveOnly in any case.

Yeah this is tricky. I would also prefer to avoid cases where external ObserveOnly resources would accidentally be deleted.
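
The caveat can be reproduced with a few lines. This sketch assumes the conditional default is applied only when deletionPolicy is unset; the function name is hypothetical and the dictionaries stand in for a managed resource's spec.

```python
def default_deletion_policy(spec: dict) -> dict:
    """Apply the conditional default without touching an explicit value."""
    out = dict(spec)
    if "deletionPolicy" not in out:
        observe_only = out.get("managementPolicy") == "ObserveOnly"
        out["deletionPolicy"] = "Orphan" if observe_only else "Delete"
    return out


# A resource created under FullControl gets deletionPolicy: Delete.
created = default_deletion_policy({"managementPolicy": "FullControl"})

# Later the user switches it to ObserveOnly. The stored Delete value is
# already set, so defaulting does not change it.
edited = default_deletion_policy({**created, "managementPolicy": "ObserveOnly"})
```

Here `edited` ends up as ObserveOnly with deletionPolicy: Delete, which is exactly the combination that could delete an external resource by accident.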

We could also go back to just ignoring the deletionPolicy if ObserveOnly is set, forcing users to change the management policy if they want to delete, but I don't like that, as it means the resource needs to be edited before it can be deleted.

I would still like to leave users the option to configure a resource they only want to observe and to set explicitly whether it should be deleted. Maybe the fact that an ObserveOnly resource can be deleted would be easier to digest if the managementPolicy were named just Observe instead of ObserveOnly.

This makes me wonder if we should stick with the decision of deprecating deletionPolicy in favor of managementPolicy but find another way to control the "late-initialization" behavior for this design.

We had an offline discussion with @lsviben on the topic and wanted to record some alternative options as the outcome of that.

Introduce another field for that:

spec:
  managementPolicy: Default # or ObserveOnly, OrphanOnDelete, ObserveDelete
  fieldManagementPolicy: AllFields # or SpecifiedFieldsOnly

Or use a nested structure for managementPolicy:

spec:
  managementPolicy:
    allowedActions: All  # or ObserveOnly, OrphanOnDelete, ObserveDelete
    fieldScope: AllFields  # or SpecifiedFieldsOnly

Deprecating deletionPolicy for both cases.

The thinking behind this was to decouple late initialization from the management policies, and to put the deletion decision back under management policies. If we "map" the management policies to the steps of the external client, the policies should be easy to understand.

BTW we were struggling to find a good name for a field that for now would control if late initialization is on or off and that can possibly be extended later with similar options. Maybe the simplest solution would be to just have a disableLateInitialization field but we are hoping to pick something that can be extended if needed in the future.

@negz
Member

negz commented May 18, 2023

I am now thinking whether we should not make such an exception but rather change how we default the deletionPolicy value. For example, it does not default at API level, and we default to Orphan if managementPolicy is ObserveOnly but Delete otherwise.

This seems like a good compromise to me.

I worry that if we overthink this too much we'll end up with something that is potentially safer, but harder to understand. Perhaps it would be sufficient to add a strong warning to the managementPolicy API field docstring that it does not control what happens when the MR is deleted, and that you must set deletionPolicy accordingly?

@turkenh
Member

turkenh commented May 18, 2023

I worry that if we overthink this too much we'll end up with something that is potentially safer, but harder to understand. Perhaps it would be sufficient to add a strong warning to the managementPolicy API field docstring that it does not control what happens when the MR is deleted, and that you must set deletionPolicy accordingly?

I think my main concern with this design is that we are changing what the managementPolicy controls. Previously (and in provider-kubernetes), it controlled which of the CRUD actions are allowed for the resource (and deprecating deletionPolicy fits well in that case).

Now, we are moving to some other direction:

  • FullControl => All CRUD actions with all fields
  • PartialControl => All CRUD actions with selected fields
  • ObserveOnly => Only Observe

This has two caveats:

  • Interplay with deletion policy (as already mentioned)
  • Further extensibility of the policy, e.g. what would we do if we want to support something like ObserveUpdate

This is why I am proposing to leave managementPolicy as is, but control disabling late-init behavior with another configuration.

@negz
Member

negz commented May 18, 2023

Thinking out loud, I think we want to affect the following things:

  • Should Crossplane late initialize unspecified desired state from the observed state? (A create-time concern)
  • Should Crossplane delete or orphan the external resource (ER) when the MR is deleted? (A delete-time concern)
  • Should Crossplane only observe the ER? Or should it reconcile desired state. (An all-the-time concern)

If I try to frame these as "what is Crossplane allowed to do" (not what can't it do), I think it's something like:

  • Is Crossplane allowed to create and update the external resource?
  • Is Crossplane allowed to delete the external resource?
  • Is Crossplane allowed to late-initialize desired state from the external resource

One option would be to frame this as a set of enums, similar to the verbs array of an RBAC Role.

If we were to frame this as a set of enum values representing what Crossplane should do for a particular managed resource, I think the default behavior would look like this:

spec:
  managementPolicy: [Observe, Create, Initialize, Update, Delete]

Maybe for this case, like RBAC, you could support a shorthand:

spec:
  managementPolicy: ["*"]

An observe-only resource would look like this:

spec:
  managementPolicy: [Observe]

A resource that you didn't want to late-initialize would look like this:

spec:
  managementPolicy: [Observe, Create, Update, Delete]

And a resource that you didn't want to delete (the equivalent to deletionPolicy: Orphan) would look like this:

spec:
  managementPolicy: [Observe, Create, Initialize, Update]

Some further thoughts:

  • Is this still a policy? Would manage: ["*"], lifecycle: ["*"], or allow: ["*"] make more sense?
  • What about the new combinations this would open up?
    • Is managementPolicy: [] equivalent to the crossplane.io/paused annotation?
    • Would anyone find managementPolicy: [Create] useful? i.e. create it, but never update or delete it?
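
As a rough illustration of the "array of enums" idea, the sketch below expands the "*" shorthand and answers whether a given action is allowed. The action names are the ones floated in this comment; the function names and the choice to reject unknown values are assumptions.

```python
from typing import FrozenSet, Iterable

ALL_ACTIONS: FrozenSet[str] = frozenset(
    {"Observe", "Create", "Initialize", "Update", "Delete"}
)


def expand_policy(policy: Iterable[str]) -> FrozenSet[str]:
    """Turn a managementPolicy list into the set of allowed actions."""
    actions = set(policy)
    if "*" in actions:
        # RBAC-style shorthand for "everything".
        return ALL_ACTIONS
    unknown = actions - ALL_ACTIONS
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    return frozenset(actions)


def is_allowed(policy: Iterable[str], action: str) -> bool:
    """Report whether the policy permits the given action."""
    return action in expand_policy(policy)


# Equivalent to deletionPolicy: Orphan in this framing.
orphan_on_delete = ["Observe", "Create", "Initialize", "Update"]
```

An empty list expands to the empty set, which is the case the question about the crossplane.io/paused annotation refers to.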

@lsviben
Contributor Author

lsviben commented May 18, 2023

One option would be to frame this as a set of enums, similar to the verbs array of an RBAC Role.

If we were to frame this as a set of enum values representing what Crossplane should do for a particular managed resource, I think the default behavior would look like this:

spec:
  managementPolicy: [Observe, Create, Initialize, Update, Delete]

Maybe for this case, like RBAC, you could support a shorthand:

spec:
  managementPolicy: ["*"]

An observe-only resource would look like this:

spec:
  managementPolicy: [Observe]

A resource that you didn't want to late-initialize would look like this:

spec:
  managementPolicy: [Observe, Create, Update, Delete]

And a resource that you didn't want to delete (the equivalent to deletionPolicy: Orphan) would look like this:

spec:
  managementPolicy: [Observe, Create, Initialize, Update]

Some further thoughts:

  • Is this still a policy? Would manage: ["*"], lifecycle: ["*"], or allow: ["*"] make more sense?

  • What about the new combinations this would open up?

    • Is managementPolicy: [] equivalent to the crossplane.io/paused annotation?
    • Would anyone find managementPolicy: [Create] useful? i.e. create it, but never update or delete it?

For me this feels like it could solve a lot of confusion about what exactly the different management policies do. I just wonder if we should just have the options [Observe, Create, Update, Delete] as they correspond to the external client interface. The Initialize can be confused with the MR Initialize which from what I see mostly just sets some default tags, while the late initialization update happens elsewhere.

So IMO we would still need a separate field that controls the actions on the managed resource, unlike managementPolicy, which controls what actions happen on the external resource. Something like localManagement (the name is just a WIP, I can't think of anything smarter right now), listing the actions that affect the managed resource, like [Initialize, LateInitialize]; maybe there is something more? So a default resource would look like this:

spec:
  managementPolicy: "*"
  localManagement: "*"

Disable late init:

spec:
  managementPolicy: "*"
  localManagement: [Initialize]

ObserveOnly:

spec:
  managementPolicy: [Observe]
  localManagement: []

On the other hand, this is certainly more complex than just setting, for example, managementPolicy: ObserveOnly, but it's very flexible. Of course some combinations won't make sense, like setting just managementPolicy: Update/Delete initially.

@negz
Member

negz commented May 18, 2023

The Initialize can be confused with the MR Initialize which from what I see mostly just sets some default tags, while the late initialization update happens elsewhere.

We could avoid this by calling it LateInitialize:

spec:
  managementPolicy: [Observe, Create, LateInitialize, Update, Delete]

So IMO we would still need a separate field that controls the actions on the managed resource,

I don't think I agree. I would argue that Observe is an action on the managed resource too - I think of Observe as being short-hand for "observe the external resource, and update the managed resource's status.atProvider accordingly". So Observe already affects how the MR is updated. I think it would be fine for LateInitialize to do the same, and it keeps everything in "one place" (fewer fields to control how the MR is reconciled).

Worth noting that there are even more slightly oddball combinations this would open up, like:

  managementPolicy: [Create, Update, Delete]

I would interpret this as:

  • Create the resource using the desired state specified in spec.forProvider
  • Update the resource using the desired state specified in spec.forProvider
  • Do not update the observed state in status.atProvider.

In practice we'd still be observing the ER in order to know whether we should create or update it; we just wouldn't be recording our observations. I also don't really know why anyone would use this particular policy, but it seems like if we took this "array of enums" path we should support it for consistency.

@bobh66
Contributor

bobh66 commented May 19, 2023

Of course some combinations won't make sense, like setting just managementPolicy: Update/Delete initially.

This might actually be useful for migration scenarios where the resource already exists and Crossplane needs to take ownership of it.
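A minimal sketch of what such a migration manifest might look like, assuming the array-style managementPolicy discussed in this thread. The apiVersion, kind, names, and the forProvider field are hypothetical placeholders; Observe is included here as an assumption so that status.atProvider is still recorded.

```yaml
# Hypothetical managed resource adopting an external resource that already exists.
apiVersion: example.org/v1alpha1
kind: Database
metadata:
  name: adopted-db
  annotations:
    # Existing Crossplane annotation naming the external resource to adopt.
    crossplane.io/external-name: legacy-db
spec:
  # Create is omitted on purpose: under this proposal Crossplane would
  # update and delete the existing resource but never create a new one.
  managementPolicy: [Observe, Update, Delete]
  forProvider:
    size: small  # placeholder field, not from any real provider
```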

I think using managementPolicy to directly control reconciliation and the external interfaces makes a lot of sense - it's better to be explicit about what each enum value is doing. PartialControl is vague and non-specific - using Create, Update etc is much easier to understand.

I'd prefer to have the ability to turn on/off specific external interface functionality, and late initialization, at my discretion.

@negz
Member

negz commented May 19, 2023

spec:
  managementPolicy: [Observe, Create, LateInitialize, Update, Delete]

Let me try to write documentation for this, as I imagine it would appear in the API docs:

managementPolicy controls how Crossplane reconciles this managed resource with its equivalent external resource. It is an array of operations Crossplane should perform. The following operations are supported:

  • Create - Use spec.initProvider and spec.forProvider to create the external resource.
  • Update - Use spec.forProvider to update the external resource.
  • Delete - Delete the external resource when the managed resource is deleted.
  • Observe - Update status.atProvider to reflect the state of the external resource.
  • LateInitialize - Update unspecified spec.forProvider fields to reflect the state of the external resource.

Crossplane will only perform the operations that a resource's managementPolicy allows. The default managementPolicy is [*]. This is semantically equivalent to [Create, Update, Delete, Observe, LateInitialize].
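As a rough illustration of the documentation above, an observe-only resource might look like the following under this proposal. The managementPolicy array follows the shape discussed in this thread; the apiVersion, kind, and names are hypothetical placeholders rather than a real provider API.

```yaml
# Hypothetical managed resource; apiVersion, kind, and names are placeholders.
apiVersion: example.org/v1alpha1
kind: Bucket
metadata:
  name: imported-bucket
  annotations:
    # Existing Crossplane annotation that identifies the external resource.
    crossplane.io/external-name: my-existing-bucket
spec:
  # Observe only: status.atProvider is populated, but Crossplane never
  # creates, updates, or deletes the external resource, and does not
  # late-initialize spec.forProvider.
  managementPolicy: [Observe]
```

Leaving managementPolicy unset would, per the text above, behave like [*], that is, all five operations enabled.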

@lsviben
Contributor Author

lsviben commented May 19, 2023

spec:
  managementPolicy: [Observe, Create, LateInitialize, Update, Delete]

Let me try to write documentation for this, as I imagine it would appear in the API docs:

managementPolicy controls how Crossplane reconciles this managed resource with its equivalent external resource. It is an array of operations Crossplane should perform. The following operations are supported:

  • Create - Use spec.initProvider and spec.forProvider to create the external resource.
  • Update - Use spec.forProvider to update the external resource.
  • Delete - Delete the external resource when the managed resource is deleted.
  • Observe - Update status.atProvider to reflect the state of the external resource.
  • LateInitialize - Update unspecified spec.forProvider fields to reflect the state of the external resource.

Crossplane will only perform the operations that a resource's managementPolicy allows. The default managementPolicy is [*]. This is semantically equivalent to [Create, Update, Delete, Observe, LateInitialize].

For me this seems understandable and flexible. If there are no other concerns, I will update the proposal to reflect the proposed changes to the management policy.

@turkenh
Member

turkenh commented May 22, 2023

spec:
  managementPolicy: [Observe, Create, LateInitialize, Update, Delete]

I like the flexibility and explicitness of this API. My only concern is its verbosity, especially for some common use cases. For example:

spec:
  deletionPolicy: Orphan

becomes

spec:
  managementPolicy: [Observe, Create, LateInitialize, Update]

But I don't think it is a significant problem, and I'm fine with moving forward with it.

@lsviben
Contributor Author

lsviben commented May 24, 2023

spec:
  managementPolicy: [Observe, Create, LateInitialize, Update, Delete]

Let me try to write documentation for this, as I imagine it would appear in the API docs:

managementPolicy controls how Crossplane reconciles this managed resource with its equivalent external resource. It is an array of operations Crossplane should perform. The following operations are supported:

  • Create - Use spec.initProvider and spec.forProvider to create the external resource.
  • Update - Use spec.forProvider to update the external resource.
  • Delete - Delete the external resource when the managed resource is deleted.
  • Observe - Update status.atProvider to reflect the state of the external resource.
  • LateInitialize - Update unspecified spec.forProvider fields to reflect the state of the external resource.

Crossplane will only perform the operations that a resource's managementPolicy allows. The default managementPolicy is [*]. This is semantically equivalent to [Create, Update, Delete, Observe, LateInitialize].

For me this seems understandable and flexible. If there are no other concerns, I will update the proposal to reflect the proposed changes to the management policy.

Updated the proposal, please take another look.

@lsviben
Contributor Author

lsviben commented May 24, 2023

I am not sure if this document should still be a proposal for ignore_changes. I feel that it went beyond that and now the ignore_changes are just one of the symptoms we are solving through a broader solution.

Maybe something like "Crossplane management policy" or "Crossplane managed resource management policy" would be more fitting, but the latter has too many "manage"s in it.

WDYT? I would change the background and goals a bit to fit more with the proposed solution.

@yardbirdsax

yardbirdsax commented May 24, 2023

I'm admittedly a bit late to this party, but I'm wondering if it might be useful to include a policy option along the lines of UpdateWithApproval? My thought here is specifically around cases where changes are going to be made to an existing, Crossplane-managed resource, and we don't want Crossplane providers to apply them until someone says "go ahead". How this is done technically (i.e. which fields need to get updated to which values) is probably best left to the underlying provider. This is particularly relevant for providers such as the Terraform one, which uses a system (Terraform) where the idea of "review what's going to happen before a change is made" is deeply ingrained.

I think this could be done with the current spec by:

  • Setting the policy to Create | Observe on initial creation.
  • Making a change to spec and observing what the change would be.
  • Changing the policy to Create | Observe | Update, so that the change gets applied.
  • Changing the policy back to Create | Observe, so that future changes don't get applied.

While this is functional, it requires quite a few touches to do so.
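Written in the array syntax used earlier in this thread, the policy values for those steps might look like this. It is only a sketch of the spec fragment at each stage, assuming the proposed managementPolicy field; the rest of the resource is omitted.

```yaml
# Step 1: create and observe only; later changes to spec are not applied.
spec:
  managementPolicy: [Create, Observe]
---
# Step 3: temporarily allow updates so the reviewed change is applied.
spec:
  managementPolicy: [Create, Observe, Update]
---
# Step 4: drop Update again so future changes wait for the next approval.
spec:
  managementPolicy: [Create, Observe]
```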

@bobh66
Contributor

bobh66 commented May 24, 2023

Setting the policy to Create | Observe on initial creation.

I believe this will also prevent applying updates to restore the existing state if it gets out of sync, in addition to blocking changes to spec, which I'm not sure is what you want. The reconciler (today) doesn't distinguish between "something changed on the far end and I need to run terraform apply to fix it" and "something changed in spec and I need to run terraform apply to apply the update".

I think controlled apply/approval is a much larger discussion that should be dealt with in a separate issue/design doc.

@yardbirdsax

I believe this will also prevent applying updates to restore the existing state if it gets out of sync, in addition to blocking changes to spec, which I'm not sure is what you want.

Actually, at least in some cases, that might well be desired. Perhaps some out-of-band change was made in a break-glass kind of scenario, and we don't want that to automatically be reverted.

I think controlled apply/approval is a much larger discussion that should be dealt with in a separate issue/design doc.

That's fair, just wanted to bring it up as a thing to consider. I can open a separate issue for it after I read through the relevant contribution guidelines etc.

@bobh66
Contributor

bobh66 commented May 24, 2023

Actually, at least in some cases, that might well be desired. Perhaps some out-of-band change was made in a break-glass kind of scenario, and we don't want that to automatically be reverted.

This is one of the scenarios behind the "pause reconciliation" feature, which uses the annotation crossplane.io/paused: "true".

I agree with @negz that managementPolicy: [] appears to be equivalent to crossplane.io/paused: "true" and is (to me) easier to manage than the annotation.
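For comparison, the two forms might look like this side by side. The crossplane.io/paused annotation is the one named above; the empty managementPolicy assumes the API proposed in this thread, and the apiVersion, kind, and name are hypothetical. Whether the two behave identically is the reading suggested here, not something this sketch confirms.

```yaml
# Option A: pause reconciliation with the existing annotation.
apiVersion: example.org/v1alpha1  # hypothetical group/version
kind: Bucket                      # hypothetical kind
metadata:
  name: my-bucket
  annotations:
    crossplane.io/paused: "true"
---
# Option B: an empty policy under the proposed API, so that Crossplane
# performs none of the listed operations on the external resource.
apiVersion: example.org/v1alpha1
kind: Bucket
metadata:
  name: my-bucket
spec:
  managementPolicy: []
```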

Member

@turkenh turkenh left a comment

Looking great, just left a couple of comments that are nonblocking!

@jbw976
Member

jbw976 commented May 30, 2023

I am not sure if this document should still be a proposal for ignore_changes. I feel that it went beyond that and now the ignore_changes are just one of the symptoms we are solving through a broader solution.

Agreed on this @lsviben - do you want to go ahead and update the PR title to capture this broader scope too?

@lsviben lsviben changed the title from "One-pager design for ignore changes on updates" to "One-pager design for ignore changes on updates / granular management policies" on May 30, 2023
@lsviben
Contributor Author

lsviben commented May 30, 2023

I am not sure if this document should still be a proposal for ignore_changes. I feel that it went beyond that and now the ignore_changes are just one of the symptoms we are solving through a broader solution.

Agreed on this @lsviben - do you want to go ahead and update the PR title to capture this broader scope too?

Done! I kept the "ignore changes" part for historical reasons and to preserve the original intent, and added "granular management policies".

Signed-off-by: lsviben <sviben.lovro@gmail.com>
@turkenh turkenh merged commit a53bd81 into crossplane:master Jun 2, 2023
10 checks passed
Development

Successfully merging this pull request may close these issues.

Granular management policies (e.g. ability to ignore changes etc.)