This repository has been archived by the owner on May 3, 2022. It is now read-only.

installation controller: reinstall deleted objects #205

Merged
merged 3 commits into from
Oct 10, 2019

Conversation

juliogreff
Contributor

One of the most common complaints from users who cannot finish their
rollouts is that their incumbents are unhealthy, and one of the most
common causes is Deployments disappearing[1] from the target clusters.
This happens because, once its first pass has completed, the
installation controller does not even try to replace objects, and
CanOverride is set to false.

Now, the installation controller will do a full run of the installer on
every sync, reinstalling any objects that are gone, whatever their kind.
Additionally, we'll start to listen to Service and Deployment events, so
that we can sync the InstallationTarget whenever someone messes with
these objects in the target clusters. We do this only for Deployments
and Services because they're the only kinds whose absence can cause
shipper to hang: the capacity controller depends on Deployments and the
traffic controller depends on Services.
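
The behaviour described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not shipper's actual code: `objectKey`, `cluster`, `reinstallMissing`, and `shouldResync` are hypothetical names, the target cluster is modelled as an in-memory map, and leaving existing objects untouched is an assumption about what a run without overriding looks like.

```go
package main

import (
	"fmt"
	"sort"
)

// objectKey identifies a rendered object by kind and name.
// All types here are hypothetical stand-ins, not shipper's real API.
type objectKey struct {
	Kind string
	Name string
}

// cluster is a toy in-memory stand-in for a target cluster:
// it maps each object key to the manifest currently stored there.
type cluster map[objectKey]string

// reinstallMissing walks every object the installer would render and creates
// the ones that are absent from the cluster, whatever their kind. Objects that
// already exist are left untouched (an assumption of this sketch).
// It returns the keys it created, sorted for stable output.
func reinstallMissing(c cluster, desired map[objectKey]string) []objectKey {
	var created []objectKey
	for key, manifest := range desired {
		if _, exists := c[key]; exists {
			continue
		}
		c[key] = manifest
		created = append(created, key)
	}
	sort.Slice(created, func(i, j int) bool {
		if created[i].Kind != created[j].Kind {
			return created[i].Kind < created[j].Kind
		}
		return created[i].Name < created[j].Name
	})
	return created
}

// shouldResync reports whether an event on an object of the given kind should
// trigger a sync of the owning InstallationTarget. Only Deployments and
// Services qualify, matching the description above.
func shouldResync(kind string) bool {
	return kind == "Deployment" || kind == "Service"
}

func main() {
	desired := map[objectKey]string{
		{"Deployment", "web"}: "deployment manifest",
		{"Service", "web"}:    "service manifest",
		{"ConfigMap", "web"}:  "configmap manifest",
	}
	// The Deployment has gone missing; the Service was edited by hand.
	target := cluster{
		{"Service", "web"}:   "hand-edited service",
		{"ConfigMap", "web"}: "configmap manifest",
	}
	// A Deployment event arrives, so the InstallationTarget is synced.
	if shouldResync("Deployment") {
		fmt.Println("created:", reinstallMissing(target, desired))
	}
}
```

In this sketch a second call to `reinstallMissing` on the same cluster creates nothing, which is the property that makes running the full installer on every sync safe.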

This PR closes #110

[1] Users swear they haven't deleted them. We have yet to find another
cause for this mystery.

@juliogreff juliogreff added the bug Something isn't working label Sep 27, 2019
@juliogreff juliogreff added this to the release-0.7 milestone Sep 27, 2019
@parhamdoustdar parhamdoustdar merged commit bd54e12 into master Oct 10, 2019
@parhamdoustdar parhamdoustdar deleted the jgreff/it-reinstall branch October 10, 2019 10:13
Successfully merging this pull request may close these issues.

Installation controller should recreate resources on application clusters if they don't exist