Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add machinery for tracking external resource create time #280

Closed
wants to merge 2 commits into from

Conversation

negz
Copy link
Member

@negz negz commented Aug 16, 2021

Description of your changes

Adds plumbing to fix at least one variant of crossplane-contrib/provider-aws#802.

Per crossplane-contrib/provider-aws#802 some external APIs (including some AWS APIs) appear to experience some delay between the time a new resource is successfully created and the time at which that resource appears in queries.

This commit adds a new 'crossplane.io/external-create-time' annotation and a new 'ResourcePending' field in the Observation struct. Together these can be used by an Observe method to allow for a small grace period before it determines that a resource does not exist. For example:

func Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
  err := api.Get(AsInput(mg))
  if IsNotFound(err) {
    if t := meta.GetExternalCreateTime(); t != nil && time.Since(t.Time) < 1 * time.Minute {
      // We're in the grace period - wait a bit longer for the resource to appear.
      return managed.ExternalObservation{ResourcePending: true}, nil
    }
    // The resource does not exist.
    return managed.ExternalObservation{ResourceExists: false}, nil
  }
  if err != nil {
    return managed.ExternalObservation{}, err
  }
  return managed.ExternalObervation{ResourceExists: true}
}

func Create(ctx context.Context, mg resource.Managed) (managed.ExternalCreation, error) {
  _ := api.Create(AsInput(mg))
  meta.SetExternalCreateTime()
  return managed.ExternalCreation{ExternalCreateTimeSet: true}, nil
}

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Per #279 I've updated the AWS RouteTable and InternetGateway controllers to use the above pattern and run the script in https://gist.github.com/negz/e1f2e74f18802d15440214a1a1abc981 for 12+ hours. I did not observe any leaked route tables or internet gateways during my testing, whereas without this fix I reliably see 2-3 leaked resources after 2-3 hours.

Per crossplane-contrib/provider-aws#802 some external APIs
(including some AWS APIs) appear to experience some delay between the time a new
resource is successfully created and the time at which that resource appears in
queries.

This commit adds a new 'crossplane.io/external-create-time' annotation and a new
'ResourcePending' field in the Observation struct. Together these can be used by
an Observe method to allow for a small grace period before it determines that a
resource does not exist. For example:

```go
func Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
  err := api.Get(AsInput(mg))
  if IsNotFound(err) {
    if t := meta.GetExternalCreateTime(); t != nil && time.Since(t.Time) < 1 * time.Minute {
      // We're in the grace period - wait a bit longer for the resource to appear.
      return managed.ExternalObservation{ResourcePending: true}, nil
    }
    // The resource does not exist.
    return managed.ExternalObservation{ResourceExists: false}, nil
  }
  if err != nil {
    return managed.ExternalObservation{}, err
  }
  return managed.ExternalObervation{ResourceExists: true}
}

func Create(ctx context.Context, mg resource.Managed) (managed.ExternalCreation, error) {
  _ := api.Create(AsInput(mg))
  meta.SetExternalCreateTime()
  return managed.ExternalCreation{ExternalCreateTimeSet: true}, nil
}
```

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-aws that referenced this pull request Aug 16, 2021
Per crossplane-contrib#802 there seems to be some
lag between when some EC2 networking resources (RouteTables, InternetGateways)
are created and when they actually show up in queries. This commit leverages
crossplane/crossplane-runtime#280 to allow for this.

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-aws that referenced this pull request Aug 16, 2021
Per crossplane-contrib#802 there seems to be some
lag between when some EC2 networking resources (RouteTables, InternetGateways)
are created and when they actually show up in queries. This commit leverages
crossplane/crossplane-runtime#280 to allow for this.

Signed-off-by: Nic Cope <negz@rk0n.org>
pkg/meta/meta.go Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
This commit moves the responsibility of setting the external-create-time
annotation into the managed.Reconciler. As part of this we now persist
annotations after every successful create.

Signed-off-by: Nic Cope <negz@rk0n.org>
@@ -631,6 +650,12 @@ func (r *Reconciler) Reconcile(_ context.Context, req reconcile.Request) (reconc
return reconcile.Result{Requeue: true}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)
}

if observation.ResourcePending {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to calculate this here instead of expecting controller to return? I wonder if we can say ignore ResourceExists: false if the creation time annotation is set and X seconds delay has not passed yet where X is given in Setup function of the controller. I guess the downside is that retries will have some delay (mostly 1-2s maybe?) but it'd let us not have to worry about eventually consistent APIs one by one when people open issues about leaking.

// Create implementations are advised not to alter status, but
// we may revisit this in future.
a := managed.GetAnnotations()
if err := retry.OnError(retry.DefaultRetry, resource.IsAPIError, func() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we run this update with retries only in case external name assigned because we lose the status when we do this operation and there are cases where information is only available as result of creation output. One possible solution could be to save the creation time before we call Create though that would require two updates with retries, the one after create being inside conditional, which increases the complexity here.

@chlunde
Copy link

chlunde commented Aug 27, 2021

I think this issue affect almost all AWS services due to a generated ID in external-name, so I think keeping the code duplication/complexity per controller low would be nice.

So I wonder if it would be possible with an API more like:

// SetupRouteTable adds a controller that reconciles RouteTables.
func SetupRouteTable(mgr ctrl.Manager, l logging.Logger, rl workqueue.RateLimiter, poll time.Duration) error {
	name := managed.ControllerName(v1beta1.RouteTableGroupKind)

	return ctrl.NewControllerManagedBy(mgr).
		Named(name).
		WithOptions(controller.Options{
			RateLimiter: ratelimiter.NewDefaultManagedRateLimiter(rl),
		}).
		For(&v1beta1.RouteTable{}).
...
+			managed.WithRecreateGraceDelay(3*time.Minute),
			managed.WithLogger(l.WithValues("controller", name)),
...
}

Now crossplane-runtime would be responsible for annotating the creation timestamp and delaying re-create. This would be executed whenever managed.ExternalCreation{ExternalNameAssigned: true...

I have not thought through how it would look from the runtime side, I just feel that it would be hard to review all controllers if the per-controller code is similar to provider-aws#805:

  • Includes the time.Since-check in Observe - adding branches
  • You must set ExternalCreateTimeSet: true,
  • You must also call meta.SetExternalCreateTime whenever you have that line of code
  • ...and since you have this logic in the controller code, you will have to add tests too

Isn't there almost enough information in the current observe code to know we should wait, if we just do the small API addition suggested above?

  • Observe returns ResourceExists: false
  • There's a "recently created" annotation on the resource
  • The controller setup has notified that it needs a grace delay with a call like WithRecreateGraceDelay

@negz
Copy link
Member Author

negz commented Aug 30, 2021

@muvaf / @chlunde I like your suggestion to avoid ExternalClient authors needing to deal with (much of) this. I'm working on a new PR that combines this one with #279 (per @turkenh's suggestion). I think it should be possible to make your changes there.

@negz negz marked this pull request as draft August 31, 2021 07:12
@negz
Copy link
Member Author

negz commented Aug 31, 2021

Closing in favor of #283.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants