Do not requeue on successful revision reconcile #2503

hasheddan · 2021-08-19T16:03:26Z

Description of your changes

This is a fairly substantial change in the package revision controller
behavior, but should drastically reduce the number of calls we make to
the API Server. We previously reconciled package revisions every minute
after a successful reconcile, but we now opt for only requeing if a
reconcile is not successful. This means that installed resources could
drift from what is specified in the package, but the odds of this
causing negative impact on an environment are somewhat mininal.

We also update to watch owned Deployments for a ProviderRevision as we
need to be notified immediately if a provider Deployment is unhealthy.
This does have the consequence that we will update all installed
resources on a regular cadence in the case that a provider Deployment
remains unhealthy. That being said, an unhealthy provider Deployment
should call for urgent intervention, either by the controller or the
cluster administrator.

A potential negative side-effect of not regularly reconciling
ProviderRevisions is that an updated ControllerConfig will not flow
through to the provider Deployment within a minute, which was the
previous guarantee. That being said, any update to the Provider itself
will trigger a reconcile, which would cause ControllerConfig changes to
also be picked up.

Update: we are now enqueing reconciles for ProviderRevisions when
an event on a ControllerConfig it references is observed.

Signed-off-by: hasheddan georgedanielmangum@gmail.com

Fixes #2449

Note: I could go either way on this being backported. Typically I would not want to backport a behavior change, but this feels like it could be considered a "bug fix" given that the excessive API calls are causing issues for folks. Because I don't foresee if having negative impact on any existing users, I have opted to backport.

I have:

Read and followed Crossplane's contribution process.
Run make reviewable to ensure this PR is ready for review.
Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Tested this by installing provider-aws and the getting-started-with-gcp Configuration. Once resources were established and Deployment was healthy, I observed no further reconciles taking place. I also observed that when a Deployment became unhealthy, we reconcile on our short wait cadence (i.e. 30s) until it becomes healthy again, at which point we cease requeing.

This is a fairly substantial change in the package revision controller behavior, but should drastically reduce the number of calls we make to the API Server. We previously reconciled package revisions every minute after a successful reconcile, but we now opt for only requeing if a reconcile is not successful. This means that installed resources could drift from what is specified in the package, but the odds of this causing negative impact on an environment are somewhat mininal. We also update to watch owned Deployments for a ProviderRevision as we need to be notified immediately if a provider Deployment is unhealthy. This does have the consequence that we will update all installed resources on a regular cadence in the case that a provider Deployment remains unhealthy. That being said, an unhealthy provider Deployment should call for urgent intervention, either by the controller or the cluster administrator. A potential negative side-effect of not regularly reconciling ProviderRevisions is that an updated ControllerConfig will not flow through to the provider Deployment within a minute, which was the previous guarantee. That being said, any update to the Provider itself will trigger a reconcile, which would cause ControllerConfig changes to also be picked up. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

Updates all package revision tests to not requeue in the event that we successfully reconcile. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

hasheddan

I have some general concerns around the fact that we will still hit the API server a bunch of times in the event that the Deployment is unhealthy or a case where we consistently had issues with "Object has been modified" errors -- i.e. I don't love that we have to check all of the installed resources for a package revision every time we reconcile as it means any reconcile is fairly expensive (depending on the number of resources in the package). That being said, I am also not a fan of conditionally checking resources on reconcile because I want to preserve our guarantee that we won't create a Deployment for a Provider unless all resources can be controlled, and conditionally checking resources would mean there could be a case in which we don't check, but also may create or update a Deployment.

turkenh · 2021-08-19T18:29:14Z

A potential negative side-effect of not regularly reconciling
ProviderRevisions is that an updated ControllerConfig will not flow
through to the provider Deployment within a minute, which was the
previous guarantee. That being said, any update to the Provider itself
will trigger a reconcile, which would cause ControllerConfig changes to
also be picked up.

@hasheddan Isn't it an option to watch ControllerConfig objects as well to fix this?

I have some general concerns around the fact that we will still hit the API server a bunch of times in the event that the Deployment is unhealthy or a case where we consistently had issues with "Object has been modified" errors -- i.e. I don't love that we have to check all of the installed resources for a package revision every time we reconcile as it means any reconcile is fairly expensive (depending on the number of resources in the package). That being said, I am also not a fan of conditionally checking resources on reconcile because I want to preserve our guarantee that we won't create a Deployment for a Provider unless all resources can be controlled, and conditionally checking resources would mean there could be a case in which we don't check, but also may create or update a Deployment.

In such a case, would we hit API server with exponential backoff? If that is the case, I wouldn't be concerned too much given it will be less frequent than the current case.

hasheddan · 2021-08-19T19:41:18Z

Isn't it an option to watch ControllerConfig objects as well to fix this?

@turkenh good point -- we could and use a predicate to only enqueue for ProviderConfig that reference a given ControllerConfig -- Will update 👍🏻

In such a case, would we hit API server with exponential backoff? If that is the case, I wouldn't be concerned too much given it will be less frequent than the current case.

We actually won't hit backoff here because we aren't returning an error. We could opt to return an error, but would likely want to tune backoff because we don't want to retry a bunch of times immediately and reach max backoff in the case that a Deployment is restarting a Pod or something. I agree that this change is an improvement over status quo though. I am planning on looking into the backoff we use across all controllers as a separate effort.

Adds an event handler for enqueing requests for providerrevision when an event is observed on a controllerconfig it references. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

hasheddan · 2021-08-19T21:06:08Z

internal/controller/pkg/revision/watch.go

@@ -0,0 +1,87 @@
+/*


Thank you @negz for basically writing all of this code for me :)

(copied with slight modifications from EnqueueRequestForAllRevisionsWithRequests)

hasheddan

@turkenh updated here if you want to take another look 👍🏻

turkenh

LGTM👍

internal/controller/pkg/revision/watch.go

Updates providerrevision controller to watch for events on controllerconfigs. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

Fixes a small typo in the providerrevision controllerconfig handler docstring. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

muvaf

LGTM! Thanks @hasheddan , I think the tradeoff makes sense here. IMO, it's fine to rely on apiserver eventing for resources that do not get actively changed by the user, like managed resources or composition.

muvaf · 2021-08-20T20:53:25Z

internal/controller/pkg/revision/watch.go

+// EnqueueRequestForReferencingProviderRevisions enqueues a request for all
+// provider revisions that reference a ControllerConfig when the given
+// ControllerConfig changes.
+type EnqueueRequestForReferencingProviderRevisions struct {


I wonder if we can come up with something like this (maybe one that accepts add function?) which would allow us not duplicate code but it's definitely non-blocking.

github-actions · 2021-08-20T22:22:50Z

Backport failed for release-1.2, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.2
git worktree add -d .worktree/backport-2503-to-release-1.2 origin/release-1.2
cd .worktree/backport-2503-to-release-1.2
git checkout -b backport-2503-to-release-1.2
ancref=$(git merge-base 19571ad84bdbe15e36409449c4927275c05803ba f3cdf6af356bda9f59040687669a31da5a756c81)
git cherry-pick -x $ancref..f3cdf6af356bda9f59040687669a31da5a756c81

github-actions · 2021-08-20T22:22:51Z

Backport failed for release-1.1, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.1
git worktree add -d .worktree/backport-2503-to-release-1.1 origin/release-1.1
cd .worktree/backport-2503-to-release-1.1
git checkout -b backport-2503-to-release-1.1
ancref=$(git merge-base 19571ad84bdbe15e36409449c4927275c05803ba f3cdf6af356bda9f59040687669a31da5a756c81)
git cherry-pick -x $ancref..f3cdf6af356bda9f59040687669a31da5a756c81

github-actions · 2021-08-20T22:22:55Z

Successfully created backport PR #2509 for release-1.3.

hasheddan added backport release-1.2 backport release-1.1 backport release-1.3 labels Aug 19, 2021

hasheddan requested review from negz and turkenh August 19, 2021 16:03

hasheddan force-pushed the nicer-pkgs branch from b962052 to c7afedd Compare August 19, 2021 16:04

hasheddan force-pushed the nicer-pkgs branch from c7afedd to 678ce39 Compare August 19, 2021 16:05

hasheddan requested a review from muvaf August 19, 2021 16:05

Update package revision tests to not requeue on success

40251fd

Updates all package revision tests to not requeue in the event that we successfully reconcile. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

hasheddan commented Aug 19, 2021

View reviewed changes

Add event handler for controller configs referenced by providerrevisions

f76d199

Adds an event handler for enqueing requests for providerrevision when an event is observed on a controllerconfig it references. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

hasheddan commented Aug 19, 2021

View reviewed changes

hasheddan commented Aug 20, 2021

View reviewed changes

turkenh approved these changes Aug 20, 2021

View reviewed changes

internal/controller/pkg/revision/watch.go Outdated Show resolved Hide resolved

hasheddan added 2 commits August 20, 2021 16:11

Watch controllerconfigs in providerrevision controller

b675aee

Updates providerrevision controller to watch for events on controllerconfigs. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

Fix minor typo in providerrevision controllerconfig handler

f3cdf6a

Fixes a small typo in the providerrevision controllerconfig handler docstring. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>

hasheddan force-pushed the nicer-pkgs branch from a3c99e9 to f3cdf6a Compare August 20, 2021 20:11

muvaf approved these changes Aug 20, 2021

View reviewed changes

hasheddan merged commit 45a7678 into crossplane:master Aug 20, 2021

github-actions bot mentioned this pull request Aug 20, 2021

[Backport release-1.3] Do not requeue on successful revision reconcile #2509

Merged

This was referenced Aug 22, 2021

[(Manual) Backport release-1.2] Do not requeue on successful revision reconcile #2511

Merged

[(Manual) Backport release-1.1] Do not requeue on successful revision reconcile #2512

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not requeue on successful revision reconcile #2503

Do not requeue on successful revision reconcile #2503

hasheddan commented Aug 19, 2021 •

edited

hasheddan left a comment

turkenh commented Aug 19, 2021

hasheddan commented Aug 19, 2021

hasheddan Aug 19, 2021

hasheddan left a comment

turkenh left a comment

muvaf left a comment

muvaf Aug 20, 2021

github-actions bot commented Aug 20, 2021

github-actions bot commented Aug 20, 2021

github-actions bot commented Aug 20, 2021

Do not requeue on successful revision reconcile #2503

Do not requeue on successful revision reconcile #2503

Conversation

hasheddan commented Aug 19, 2021 • edited

Description of your changes

How has this code been tested

hasheddan left a comment

Choose a reason for hiding this comment

turkenh commented Aug 19, 2021

hasheddan commented Aug 19, 2021

hasheddan Aug 19, 2021

Choose a reason for hiding this comment

hasheddan left a comment

Choose a reason for hiding this comment

turkenh left a comment

Choose a reason for hiding this comment

muvaf left a comment

Choose a reason for hiding this comment

muvaf Aug 20, 2021

Choose a reason for hiding this comment

github-actions bot commented Aug 20, 2021

github-actions bot commented Aug 20, 2021

github-actions bot commented Aug 20, 2021

hasheddan commented Aug 19, 2021 •

edited