Skip to content
This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

fix(deisctl): use wait groups for stopping and uninstalling store #2307

Merged
merged 4 commits into from Oct 29, 2014

Conversation

carmstrong
Copy link
Contributor

When we start deis-store components, we do so in a specific order to ensure they come up properly. Unfortunately, we don't do this for stop/uninstall, and bring them all down simultaneously. This seems to result in an issue that occurs when upgrading installations where the components don't all come back properly (see #2300, #2301, #2294, #2286). This PR changes deisctl to use wait groups in the stop and uninstall commands to also stop these services in an optimal order. Additionally, this PR also updates the "upgrading Deis" instructions to instruct users to stop services before uninstalling them. This gives components the opportunity to exit in a clean state rather than be forced down uncleanly.

Note that services should always be stopped before an uninstall to guarantee a safe shutdown, but we use a specific start order for uninstall just in case this helps when people don't stop first.

Unfortunately this doesn't help people who are upgrading from v0.14.x to v0.15.0 (as the fix will only be in upcoming releases of deisctl), but a note was added to the "upgrading Deis" instructions with a workaround to address the issue.

TESTING: I tested this PR by installing v0.15.0 of Deis, performing a deisctl install platform && deisctl start platform followed by deisctl stop platform && deisctl uninstall platform, and then again a deisctl install platform && deisctl start platform. Everything came back happily.

closes #2300
refs #2301, refs #2294, refs #2286

@carmstrong carmstrong self-assigned this Oct 29, 2014
@carmstrong carmstrong added this to the 0.16 milestone Oct 29, 2014
@gabrtv
Copy link
Member

gabrtv commented Oct 29, 2014

LGTM. Clean shutdown is a definite improvement. However, I'd also like to do some more testing to make sure this fixes the reported issues.

@mboersma
Copy link
Member

Code LGTM, but we should test updating v0.14.1 -> this.

@bacongobbler
Copy link
Member

I have an 0.14.1 cluster. I'd be happy to test this if it's ready for testing.

@carmstrong
Copy link
Contributor Author

However, I'd also like to do some more testing to make sure this fixes the reported issues.

The only issue I'm asserting this PR resolves is #2300, and that's mostly because of the added workaround to the documentation. Everything else is an optimistic improvement, although I'm fairly confident in the upgrade path moving forward based on my testing of this PR.

Code LGTM, but we should test updating v0.14.1 -> this.

We really can't, as you'd need the logic from this branch to do the stop platform && uninstall platform, but it'll err out because deis-store-volume doesn't exist in v0.14.1.

@mboersma
Copy link
Member

you'd need the logic from this branch to do the stop platform && uninstall platform...

Right. I guess I can stop and uninstall them one-by-one in that order to emulate this change and see if that approach works, then validate that your deisctl changes act the same way.

@carmstrong
Copy link
Contributor Author

Right. I guess I can stop and uninstall them one-by-one in that order to emulate this change and see if that approach works, then validate that your deisctl changes act the same way.

That'd be a good test. 👍

@carmstrong
Copy link
Contributor Author

Just pushed a new commit to start gateway before volume. Some users are seeing volume hang on new installs. I'd really like to avoid having to put sleeps back in the /bin/boot scripts.

@carmstrong
Copy link
Contributor Author

I just deployed a fresh Deis v0.15 three times with this branch and haven't seen a single store issue.

@mboersma
Copy link
Member

I did the same steps as in #2300 again--although I did deisctl stop of each service individually this time while looking at this PR code to emulate how deisctl will handle it. Everything upgraded and started fine. Now I'm "upgrading" again from v0.15.0 to v0.15.0 just to make sure.

@carmstrong
Copy link
Contributor Author

Everything upgraded and started fine. Now I'm "upgrading" again from v0.15.0 to v0.15.0 just to make sure.

Glad to hear it!

@mboersma
Copy link
Member

LGTM. My testing went fine, although it was tricky to do. But this should improve things when it lands in the next deisctl release.

@carmstrong
Copy link
Contributor Author

@gabrtv Are you okay with merging this now that @mboersma and I are fairly confident that this fixes the store startup issues?

@gabrtv
Copy link
Member

gabrtv commented Oct 29, 2014

:shipit:

carmstrong added a commit that referenced this pull request Oct 29, 2014
fix(deisctl): use wait groups for stopping and uninstalling store
@carmstrong carmstrong merged commit 5f510d5 into deis:master Oct 29, 2014
@carmstrong carmstrong deleted the deisctl_wait_on_stop branch October 29, 2014 19:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

store-volume fails after update to v0.15.0
4 participants