Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist claim resourceRefs before creating XRs #2467

Merged
merged 7 commits into from
Aug 16, 2021
Merged

Conversation

negz
Copy link
Member

@negz negz commented Aug 10, 2021

Description of your changes

Fixes #2078
Fixes #2464
Fixes #2473

This PR updates the claim reconciler to persist a claim's spec.resourceRef before dynamically provisioning an XR. It uses a similar approach to the XR reconciler in that it now uses a dry-run create to generate an XR name (and validate the XR) without actually creating it. 'Binding' a claim to an XR is now broken up into three stages.

  1. We set the XR's claimRef during Configure.
  2. We set (and persist) the XRC's resourceRef during Bind
  3. We persist the XR's claimRef during Apply.

This approach has a handful of benefits:

  1. It prevents 'leaking' XRs. Previously if anything went wrong between when we first created an XR and when we persisted its resourceRef we'd return from the Reconcile and requeue a new one. The new reconcile would have no way of knowing that we'd already created an XR for this claim so it would create a new one.
  2. It ensures dynamically provisioned XRs never exist without a claimRef - it's set when the XR is created. This can help avoid issues like Race condition while setting XR's spec.claimRef #2464. Note that XRs should still be designed to support static binding - i.e. they should not assume that the claimRef will always be set at XR creation time.
  3. It eliminates one superfluous API server write, because we only update the XR once, rather than applying it then updating it to bind it.

Note that this commit changes the behaviour of the (AFAIK exceedingly obscure) edge case in which someone attempts to claim an XR tha does not exist - i.e. by explicitly setting their claim's resourceRef. Previously we'd assume they did in fact want to claim an XR that hadn't yet been created, and wait around for it to exist. Now we assume that the resourceRef was populated by a previous iteration of the reconciler and proceed to dynamically provision an XR of that name. One interesting side effect here is that this means claim authors can now influence the (Kubernetes) name of the XR they would like to provision, though we'll still bail out if they try to 'claim over' an XR that is claimed by some other entity.

This PR also contains a small additional fix for #2473, which I (re)discovered while addressing the original issue. The original refactoring done in this PR is a partial fix for this issue because we now detect 'misbound' claims earlier and refuse to make any mutations on composite resources we don't own, so I only needed to fix the delete flow to ensure a claim could not delete an XR it was not bound to.

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

I've tested this by walking through the GCP getting started guide and confirming that dynamic provisioning works as expected. I've also followed similar steps to those outlined in #2473 to ensure I can't reproduce that issue with this PR.

@negz
Copy link
Member Author

negz commented Aug 12, 2021

A little more testing - I ran through the slightly more complex AWS (with VPC) getting started guide and that also appears to work as expected.

Copy link
Member

@hasheddan hasheddan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Thanks for this improvement @negz!

I believe this pattern of using ObjectTypers was copied from a past iteration
of the Claim reconciler that dealt in strongly typed Claims. At some point those
types did not have their APIVersion and Kind deserialized when read from the API
server so we needed to look them up in the scheme.

This iteration of the claim reconciler always deals with types that are built on
the *unstructured.Unstructured type, which always has its APIVersion and Kind
set when it is read from the API server. Therefore we can just use GetObjectKind
rather than needing to pass around an ObjectTyper.

Signed-off-by: Nic Cope <negz@rk0n.org>
Just a little cleanup - previously there were two conflicting comments - one
stated we wanted to keep the external name even if it was empty while the other
stated that we did not.

Signed-off-by: Nic Cope <negz@rk0n.org>
This is similar to the approach we take in the XR reconciler when creating
composed resources. The dry-run create causes the XR's metadata.name to be
populated based on its metadata.generateName, without the XR actually being
created. This gives us a generated XR name that is very likely to be available
that we could persist as our claim's resourceRef _before_ we actually create the
XR, thus allowing us to reduce the likelihood that we would leak XRs.

Signed-off-by: Nic Cope <negz@rk0n.org>
This commit leverages our newly implemented ability to name XRs before they're
created by persisting the XR's name to the claim's resourceRef before we create
(i.e. apply) the XR. 'Binding' a claim to an XR is now broken up into three
stages.

1. We set the XR's claimRef during Configure.
2. We set (and persist) the XRC's resourceRef during Bind
3. We persist the XR's claimRef during Apply.

This approach has a handful of benefits:

1. It prevents 'leaking' XRs. Previously if anything went wrong between when we
   first created an XR and when we persisted its resourceRef we'd return from
   the Reconcile and requeue a new one. The new reconcile would have no way of
   knowing that we'd already created an XR for this claim so it would create a
   new one.
2. It ensures dynamically provisioned XRs never exist without a claimRef - it's
   set when the XR is created. This can help avoid issues like crossplane#2464. Note that
   XRs should still be designed to support static binding - i.e. they should not
   assume that the claimRef will always be set at XR creation time.
3. It eliminates one superfluous API server write, because we only update the XR
   once, rather than applying it then updating it to bind it.

Note that this commit changes the behaviour of the (AFAIK _exceedingly_ obscure)
edge case in which someone attempts to claim an XR tha does not exist - i.e. by
explicitly setting their claim's resourceRef. Previously we'd assume they did in
fact want to claim an XR that hadn't yet been created, and wait around for it to
exist. Now we assume that the resourceRef was populated by a previous iteration
of the reconciler and proceed to dynamically provision an XR of that name. One
interesting side effect here is that this means claim authors can now influence
the (Kubernetes) name of the XR they would like to provision, though we'll still
bail out if they try to 'claim over' an XR that is claimed by some other entity.

Signed-off-by: Nic Cope <negz@rk0n.org>
Previously this was done during claim binding. In reality either are fine,
but it seems a little more 'in spirit' of the Configurator, which is broadly
responsible for late-initializing the claim with XR values.

Signed-off-by: Nic Cope <negz@rk0n.org>
Signed-off-by: Nic Cope <negz@rk0n.org>
Previously a malicious (or malformed) claim could reference an XR that was
already bound to another claim (potentially in another namespace), and when the
malicious claim was deleted the XR would be deleted too.

Signed-off-by: Nic Cope <negz@rk0n.org>
@negz
Copy link
Member Author

negz commented Aug 17, 2021

/backport

@github-actions
Copy link

Successfully created backport PR #2483 for release-1.2.

@github-actions
Copy link

Successfully created backport PR #2484 for release-1.3.

@negz
Copy link
Member Author

negz commented Aug 17, 2021

/backport

(Again, because I was too lazy to rebase the previous PRs.)

@github-actions
Copy link

The process '/usr/bin/git' failed with exit code 1

1 similar comment
@github-actions
Copy link

The process '/usr/bin/git' failed with exit code 1

@negz
Copy link
Member Author

negz commented Aug 17, 2021

/backport

@github-actions
Copy link

Successfully created backport PR #2490 for release-1.2.

@github-actions
Copy link

Successfully created backport PR #2491 for release-1.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants