fix(app): rollback all process types to previous version when one (or more) process type fails a deploy#1027
@kmala, @bacongobbler and @mboersma are potential reviewers of this pull request based on my analysis of |
e29b5a6 to
164569a
Compare
Current coverage is 87.09% (diff: 70.00%)
@@ master #1027 diff @@
==========================================
Files 42 42
Lines 3567 3573 +6
Methods 0 0
Messages 0 0
Branches 602 603 +1
==========================================
+ Hits 3106 3112 +6
Misses 301 301
Partials 160 160
# This goes in the log before the rollback starts
self.log(err, logging.ERROR)
# revert all process types to old release
self.deploy(release.previous(), force_deploy=True)
If there is an API exception or the API is unavailable, doesn't this call deploy recursively?
It will recurse all the way until it finds a good deploy, and the last one would be no build, which we bail out on at the top, or you hit Python's recursion limit (1000 by default in CPython).
Yep. This code could technically roll back all the way when the controller is refusing connections, as with #1019, and end with a NotExists error once it reaches v1 and previous() raises.
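To make the recursion concern concrete, here is a minimal sketch that simulates it. Everything in it is hypothetical: the `Release` class, the `NotExists` exception, and the always-failing `deploy` are stand-ins, not the Controller's actual code. It assumes every deploy attempt fails (as if the API were refusing connections) and shows the rollback walking back one release at a time until `previous()` raises at v1. With a longer history the walk would instead be cut off by the interpreter's recursion limit, which is typically 1000 in CPython.

```python
class NotExists(Exception):
    """Hypothetical stand-in for the error previous() raises at v1."""


class Release:
    def __init__(self, version):
        self.version = version

    def previous(self):
        # v1 has no predecessor, so the chain ends here.
        if self.version <= 1:
            raise NotExists("no release before v1")
        return Release(self.version - 1)


attempts = []  # versions we tried to deploy, in order


def deploy(release):
    """Always-failing deploy that rolls back by calling itself."""
    attempts.append(release.version)
    try:
        # Simulated outage: every deploy attempt fails.
        raise ConnectionError("API refusing connections")
    except ConnectionError:
        # Unbounded rollback: retry with the previous release.
        deploy(release.previous())


def run(start):
    """Deploy release `start` and report how the rollback chain ended."""
    attempts.clear()
    try:
        deploy(Release(start))
    except NotExists as exc:
        return str(exc)
    return None
```

Calling `run(6)` tries v6 down to v1 and then surfaces the `NotExists` error rather than the original deploy failure.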
I think it is worth letting it go as far back as possible when it is doing a rollback, but another thought would be to have a function arg that says "stop on failure", so we'd let it roll back one level and that's it.
We should just roll back to the previous release, as that is what the user expects. It has to be a successful release; otherwise we are doing deploy wrong.
Then I will implement my idea, so that it only goes to the previous release, by only setting stop_on_failure=True when we hit the rollback.
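A rough sketch of how that stop_on_failure idea might look, under stated assumptions: `App`, `Release`, `DeployError`, `chain`, and `_apply` are hypothetical names invented for illustration, and only the `deploy(..., force_deploy=True)` call shape and the `stop_on_failure` argument come from this thread. The sketch also assumes a previous release always exists when a deploy fails.

```python
import logging

logger = logging.getLogger("rollback-sketch")


class DeployError(Exception):
    """Hypothetical error for a release that fails to deploy."""


class Release:
    def __init__(self, version, healthy, previous=None):
        self.version = version
        self.healthy = healthy
        self._previous = previous

    def previous(self):
        return self._previous


def chain(healths):
    """Build v1..vN from a list of booleans and return the newest release."""
    release = None
    for version, healthy in enumerate(healths, start=1):
        release = Release(version, healthy, previous=release)
    return release


class App:
    def __init__(self):
        self.deployed = None  # version currently running
        self.attempts = []    # versions we tried, in order

    def _apply(self, release):
        self.attempts.append(release.version)
        if not release.healthy:
            raise DeployError(f"There was a problem deploying v{release.version}.")
        self.deployed = release.version

    def deploy(self, release, force_deploy=False, stop_on_failure=False):
        # force_deploy is accepted only to mirror the call in the diff;
        # this sketch does not use it.
        try:
            self._apply(release)
        except DeployError as err:
            if stop_on_failure:
                raise  # already rolling back: do not go any further
            old = release.previous()  # assumed to exist in this sketch
            msg = f"{err} Rolling back process types to release v{old.version}."
            logger.error(msg)
            # Roll back exactly one level; a second failure propagates.
            self.deploy(old, force_deploy=True, stop_on_failure=True)
            raise DeployError(msg) from err
```

With a history built by `chain([True, True, False])`, deploying v3 fails, v2 is redeployed, and the caller still receives the error. With `chain([True, False, False])` the rollback to v2 fails too, and the sketch stops there instead of trying v1.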
1f3127a to
3a3ae0a
Compare
Haven't tested this, but code LGTM.
Previously only the failed process type would roll itself back, which could leave worker on v6 and web on v5.
This moves away from the built-in rollback functionality in Deployments and instead deploys the previous release again to get the same effect.
Rollback in Deployments took each Deployment to its last known good revision, which in this case would be the latest release for some process types. Knowing the revision of the last deploy would be hard as well.
Deploying the old release produces an identical ReplicaSet (and thus an identical pod template hash), so the result looks very much like a native rollback.
If the Controller has changed how it constructs the pod manifest, this could generate a totally new ReplicaSet, but that is also fine since it will still boot the previous release from the DB.
Fixes #1013
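The claim about identical ReplicaSets rests on the pod template hash being a deterministic function of the pod template. The sketch below illustrates only that property. It is not how Kubernetes computes its pod-template-hash label, and `pod_template` and `template_hash` are hypothetical helpers that use SHA-256 over canonical JSON as a stand-in.

```python
import hashlib
import json


def pod_template(app, proc_type, image, env):
    """Hypothetical, heavily simplified pod template for one process type."""
    return {
        "labels": {"app": app, "type": proc_type},
        "containers": [{"name": f"{app}-{proc_type}", "image": image, "env": env}],
    }


def template_hash(template):
    """Stand-in hash: the same template content always gives the same value."""
    canonical = json.dumps(template, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:10]


# The release that was running before the failed deploy.
v5 = pod_template("test-rollback", "cmd", "deis/example-go:latest", {"foo": "bar"})
# The failed release points at a different image.
v6 = pod_template("test-rollback", "cmd", "deis/example-foo:latest", {"foo": "bar"})
# Redeploying v5 rebuilds the same template from the stored release data.
v5_again = pod_template("test-rollback", "cmd", "deis/example-go:latest", {"foo": "bar"})

same_as_before = template_hash(v5) == template_hash(v5_again)
differs_from_failed = template_hash(v5) != template_hash(v6)
```

If the manifest construction changes between Controller versions, the rebuilt template differs and so does the hash, which corresponds to the "totally new ReplicaSet" case described above.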
Test Plan
deis create --no-remote test-rollback
deis pull deis/example-go -a test-rollback
deis config:set foo=bar -a test-rollback
deis pull deis/example-foo -a test-rollback

There should be an error message along the lines of

(app::deploy): There was a problem deploying v6. Rolling back process types to release v5.\nThere was a problem while deploying v6 of test-rollback-cmd. Additional information:\nError: image deis/example-foo:latest not found

Running deis logs -a test-rollback should show some of that, including "There was a problem deploying v6. Rolling back process types to release v5" twice. That's okay and is due to deis/logger#109. The first one is when the internal rollback starts and the second one is when that message is combined with the information returned back to the user.