Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate XR name locally by appending random suffix to the claim name #4780

Closed
wants to merge 1 commit into from

Conversation

pedjak
Copy link
Contributor

@pedjak pedjak commented Oct 11, 2023

Description of your changes

There is no need to make a round-trip to k8s api server to get XR name, we
have enough data on the controller side to construct an unique XR name, by
using the convention:

<CLAIM_NAME>-<5_CHAR_RANDOM_SUFFIX>

Change details:

  • removed APIDryRunCompositeConfigurator
  • updated configurator_test.go
  • updated and added more tests in reconciler_test.go

Fixes #4651

I have:

  • Read and followed Crossplane's contribution process.
  • Added or updated unit and E2E tests for my change.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR, if necessary.
  • Opened a PR updating the docs, if necessary.

@pedjak pedjak requested review from turkenh and a team as code owners October 11, 2023 12:56
@pedjak pedjak requested a review from jbw976 October 11, 2023 12:56
@pedjak
Copy link
Contributor Author

pedjak commented Oct 11, 2023

As the discussion in #4742 has shown, we do not want to generate longer suffixes, because this is going to further reduce the max length of a claim name. This PR sticks to 6-char length suffix to avoid breaking API changes. /cc @sttts

@pedjak pedjak force-pushed the generate-xr-names-locally branch 2 times, most recently from 32118b1 to 0631917 Compare October 12, 2023 19:19
@pedjak
Copy link
Contributor Author

pedjak commented Oct 12, 2023

@sttts updated PR to handle situations when the generated name is already taken, ptal

// otherwise, we can move forward even if we requeue
cm.SetResourceReference(nil)
if err := r.client.Update(ctx, cm); err != nil {
return reconcile.Result{}, errors.Wrap(err, errUpdateClaim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see @phisco's PR handling conflict errors

@sttts
Copy link
Contributor

sttts commented Oct 13, 2023

Logic looks good 👍 All comments are about cosmetics, less events and conditions and go docs to be fixed.

@pedjak
Copy link
Contributor Author

pedjak commented Oct 13, 2023

@sttts I have addressed your comments, ptal

Copy link
Contributor

@sttts sttts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconciler logic looks good. PR needs some more eyes, especially for the tests.

@sttts sttts added this to the v1.14 milestone Oct 13, 2023
@pedjak pedjak force-pushed the generate-xr-names-locally branch 2 times, most recently from 1c87065 to c5ef3fe Compare October 16, 2023 11:11
@pedjak pedjak force-pushed the generate-xr-names-locally branch 3 times, most recently from d5bcb75 to e8e9fb5 Compare October 17, 2023 11:15
@turkenh
Copy link
Member

turkenh commented Oct 18, 2023

I believe this should also fix #4712, right?

@negz
Copy link
Member

negz commented Oct 20, 2023

Would it make sense for this to use the NameGenerator added in #4858?

@pedjak
Copy link
Contributor Author

pedjak commented Oct 21, 2023

Would it make sense for this to use the NameGenerator added in #4858?

It looks like overengineering, given that the composite configurator in this PR just became a pure function and it needs just a way to generate some random strings/names with a given base. Therefore, apiserver's NameGenerator is a great fit here:

type NameGenerator interface {
	// GenerateName generates a valid name from the base name, adding a random suffix to
	// the base. If base is valid, the returned name must also be valid. The generator is
	// responsible for knowing the maximum valid name length.
	GenerateName(base string) string
}

and there is already SimpleNameGenerator available for the use. It seems to me very unlikely that we would need to plug in another generator here ever.

On the other side, @sttts NameGenerator is just a wrapper around apiserver's NameGenerator - after getting a name, it invokes client.Get() method to check if the generated name is already taken. If yes, it retries. For that reason, the method signature must look like this:

type NameGenerator interface {
	GenerateName(ctx context.Context, cd resource.Object) error
}

This interface is about trying to guarantee name uniqueness. We need to pass Context and resource.Object just because of client.Get() calls. Therefore, it needs to return an error.

In this PR we do not have such expectations, we just wanna generate a random name, and if it already taken, we are going to recover in the next reconcile loop.

Once this PR gets in, we would like to apply the same approach on the composite side, and replace the work done in #4858 .

@jbw976 jbw976 removed this from the v1.14 milestone Oct 26, 2023
There is no need to make a round-trip to k8s api server
to get XR name, we have enough data on the controller side
to construct an unique XR name, by using the convention:

<CLAIM_NAME>-<5_CHAR_RANDOM_SUFFIX>

Change details:
* removed `APIDryRunCompositeConfigurator`
* updated `configurator_test.go`

Signed-off-by: Predrag Knezevic <predrag.knezevic@upbound.io>
Copy link
Member

@turkenh turkenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, there are two different problems that this PR aims to solve:

  1. Do not rely on Kube API (via dry-run) to generate a random name for the composite of a given claim => Take API server out of XR name generation path #4651
  2. Handle the edge case where the name is taken between the generation and the actual create call.

These problems existed with both claim -> composite and composite -> composed. #4858 solved the 1st problem for composite -> composed.

The reason why we found ourselves dealing with the 2nd problem here, instead of just fixing the 1st one (which is what #4651 asks for) is that, we realized that we increased the probability of the 2nd with the solution we put in this PR, whereas, #4858 claims that the approach used there didn't increase the probability of 2nd compared to the existing solution.

Is my understanding correct?

If yes, would it be ok to apply the approach in #4858 here and leave 2nd problem out of scope for this specific PR? We can discuss it separately with a dedicated issue and decide on whether we can live without it or should fix it.

I can also understand if we want to discuss here as well given there is already an implementation.

@@ -477,6 +480,19 @@ func (r *Reconciler) Reconcile(ctx context.Context, req reconcile.Request) (reco
if kerrors.IsConflict(err) {
return reconcile.Result{Requeue: true}, nil
}
if errors.Is(err, ErrBindCompositeConflict) {
// the claim refers to a composite belonging to a different claim
// this case can occur if:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could other cases where this could occur and I don't think we should handle all like this, i.e. reset the reference.
One example use case is, the user may intentionally get into this situation while moving a claim between namespaces without impacting the composite, see #4081

Considering the probability of hitting this due to a name conflict is low, I would propose just erroring out as before after making sure we did all we can do to decrease that probability to its minimum. Given that we are not hearing this as a pain from users with the current implementation, this feels like a good tradeoff to me.

Copy link
Contributor

@sttts sttts Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must be careful here. The name generation as implemented here increases the probability of conflict considerably. Dry-run (and my implementation) does an ad-hoc freeness check of the generated name. That reduces probability of conflicts to claims in-flight. The code here though does not check freeness and hence you get conflicts between all existing claims and the new claim. Over the runtime of a controlplane with many claim creations this will lead to a broken claim eventually (somewhere I did the math, 50% probability for a thousand claims or so).

Hence, either we keep full conflict resolution here or we switch to a freeness check. Otherwise, we regress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example use case is, the user may intentionally get into this situation while moving a claim between namespaces without impacting the composite

I think the instructions shared in #4081 should be extended in the following way:

  • original claim should be paused first, to avoid reconciliation

  • Get a copy of the claim

  • Update the references in the corresponding composite:

    metadata.labels[crossplane.io/claim-namespace]
    spec.claimRef.namespace

  • Change the metadata.namespace to the new one and apply it.

  • Edit the original claim and remove the finalizers

  • Delete the original claim.

In this way, the proposed change can remain.

switch {
case kerrors.IsAlreadyExists(err):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits:

  • this could be thrown only from the Create above, so bugging me to have here for both.
  • Would it be possible to convert the above else to an early return somehow 🤔

@negz
Copy link
Member

negz commented Nov 20, 2023

If yes, would it be ok to apply the approach in #4858 here and leave 2nd problem out of scope for this specific PR? We can discuss it separately with a dedicated issue and decide on whether we can live without it or should fix it.

I agree with @turkenh. If I understand correctly:

  • In compositor: minimal invasive name generation without dry-run #4858 @sttts introduces a NameGenerator pattern. This generator uses the API server's generateName logic to randomly generate a name, then makes a best-effort attempt to ensure its available before creation.
  • In this PR @pedjak also uses the API server's generateName logic to randomly generate a name. Instead of making a best-effort attempt to ensure the generated name is available before creation, in this PR @pedjak relies on requeues to retry on name collisions. A side-effect of this approach is that sometimes we must revert state (e.g. revert a persisted reference to the XR).

Is there a reason to use a different approach in this controller from the approach we take in the XR controller? If not, I would prefer us to solve the problem consistently (no pun intended 😄).

PS - Given I'm about to be OOO for a month I trust @turkenh to review this. No need to wait for my approval.

@pedjak
Copy link
Contributor Author

pedjak commented Nov 22, 2023

@turkenh @negz

Is my understanding correct?

This PR primary solves #4651 - it generates composite names using the same logic available on the API server side, but without making any requests to API server. In this way, we avoid unnecessary communication with the API server, lowering the number of request that API server needs to process.

In the case of composite creation we save one request per claim, but in the case of managed resource creation, we could avoid sending a significant number of requests, depending how many managed resources are defined within a composition (real examples easily declare 10-20 managed resources).

#4858 does generate the names locally, but checks their availability by issuing Get request to the API server, because Get requests for unstructured objects are not cached. Thus, we have replaced a dry-run patch request with a get one, but we did not reduce the number of calls going to API server.

The logic here checks for name availability at the composite creation - if the name is taken, the create request fails, and we are going to requeue, generate a new name and try to create the composite again.

NameGenerator introduced in #4858 does not give us any guaranties that the name is not going to be taken before we try to create the composite. Given that we use patch request for that under the hood, there is a chance that we are not going to create a new composite - instead we are going to update an existing one and even worse we are going to hijack it from another claim. Probability of such event is not high, but if it happens, it can have even security implications, because suddenly a claim gets attached to composite that could belong to a different team/organization/tenant.

This PR solves the described problem as well.

If unnecessary communication with the API server is not something we are concerned with (both claim and composite controller), this PR can be closed. For the consistency, NameGenerator can be applied on the claim controller side, but the possibility to hijack another composite still remains.

@turkenh
Copy link
Member

turkenh commented Dec 13, 2023

If yes, would it be ok to apply the approach in #4858 here and leave 2nd problem out of scope for this specific PR? We can discuss it separately with a dedicated issue and decide on whether we can live without it or should fix it.

As proposed above, we decided not to proceed with this PR due to the complications it is introducing, especially the necessity to always do a migration in the first call after creation. So, currently we decided to:

@pedjak feel free to close this PR or keep it open by updating against latest master. I don't think it is a priority right now, but feel free to chose whatever makes more sense for you.

@pedjak pedjak closed this Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Take API server out of XR name generation path
5 participants