This repository was archived by the owner on May 6, 2020. It is now read-only.

fix(app): rollback all process types to previous version when one (or more) process type fails a deploy#1027

Merged
helgi merged 1 commit into deis:master from helgi:ticket_1013 on Sep 2, 2016

Conversation

@helgi
Contributor

@helgi helgi commented Aug 30, 2016

Previously, only the failed process type would roll itself back, resulting in a scenario where worker might be on v6 while web stayed on v5.

This moves away from using the built-in rollback functionality in Deployments and instead deploys the previous release again to get the same effect.
Rollback in Deployments would take each process type to its last known good deployment, which in this case would be the latest deployment for some types; knowing the revision of the last deploy would be hard as well.

When the old release is deployed this way, an identical ReplicaSet is created (and thus an identical pod template hash is generated), so the result looks very much like a native rollback.
If the Controller has changed how it constructs the pod manifest, this could generate an entirely new ReplicaSet, but that is also fine since it will still be booting the previous release from the DB.

Fixes #1013
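The redeploy-instead-of-rollback approach described above can be sketched roughly as follows. All names here (App, Release, deploy, rollback_all) are simplified stand-ins for illustration, not the controller's actual code:

```python
# Simplified stand-ins for the controller's models (names and behavior
# are assumptions for illustration, not the actual implementation).

class Release:
    def __init__(self, version, previous=None):
        self.version = version
        self._previous = previous

    def previous(self):
        return self._previous

class App:
    def __init__(self):
        self.deployed = {}  # process type -> release version

    def deploy(self, release, force_deploy=False):
        # Redeploying a release re-creates an identical ReplicaSet (same
        # pod template hash), so this looks like a native rollback.
        for ptype in ("web", "worker"):
            self.deployed[ptype] = release.version

    def rollback_all(self, failed_release):
        # Roll back *every* process type, not just the failed one, so
        # web and worker never end up on different versions.
        self.deploy(failed_release.previous(), force_deploy=True)

v5 = Release(5)
v6 = Release(6, previous=v5)
app = App()
app.rollback_all(v6)
print(app.deployed)  # {'web': 5, 'worker': 5}
```

The key point is that rollback_all redeploys the previous release for all process types at once, rather than letting each Deployment roll back independently.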

Test Plan

  1. deis create --no-remote test-rollback
  2. deis pull deis/example-go -a test-rollback
  3. deis config:set foo=bar -a test-rollback
  4. deis pull deis/example-foo -a test-rollback

There should be an error message along the lines of:

    (app::deploy): There was a problem deploying v6. Rolling back process types to release v5.
    There was a problem while deploying v6 of test-rollback-cmd. Additional information:
    Error: image deis/example-foo:latest not found

Running deis logs -a test-rollback should show some of that, including the message "There was a problem deploying v6. Rolling back process types to release v5" twice. That's okay and is due to deis/logger#109.

The first occurrence is logged when the internal rollback starts; the second is when that message is combined with the information returned to the user.

@helgi helgi added this to the v2.5 milestone Aug 30, 2016
@helgi helgi self-assigned this Aug 30, 2016
@deis-bot

@kmala, @bacongobbler and @mboersma are potential reviewers of this pull request based on my analysis of git blame information. Thanks @helgi!

@helgi helgi force-pushed the ticket_1013 branch 2 times, most recently from e29b5a6 to 164569a Compare August 30, 2016 22:51
@codecov-io

codecov-io commented Aug 30, 2016

Current coverage is 87.09% (diff: 70.00%)

Merging #1027 into master will increase coverage by 0.02%

@@             master      #1027   diff @@
==========================================
  Files            42         42          
  Lines          3567       3573     +6   
  Methods           0          0          
  Messages          0          0          
  Branches        602        603     +1   
==========================================
+ Hits           3106       3112     +6   
  Misses          301        301          
  Partials        160        160          

Powered by Codecov. Last update 3f3a228...a45643c

Comment thread on rootfs/api/models/app.py (Outdated)

    # This goes in the log before the rollback starts
    self.log(err, logging.ERROR)
    # revert all process types to old release
    self.deploy(release.previous(), force_deploy=True)
Contributor

If there is an API exception or unavailability, doesn't this call deploy recursively?

Contributor Author

It will recurse all the way until it finds a good deploy; the last one would be a release with no build, which we bail out on at the top, or you hit Python's recursion limit (1000 by default).

Member

@bacongobbler bacongobbler Aug 31, 2016

Yep. This code could technically roll back all the way when the controller is refusing connections (as with #1019), and raise a NotExists error once it reaches v1 and previous() raises NotExists.

Contributor Author

I think it is worth letting it go back as far as possible when it is doing a rollback, but another thought would be to have a function arg that says "stop on failure", so we'd let it roll back one level and that's it.

Contributor

@kmala kmala Sep 1, 2016

We should just roll back to the previous release, since that is what the user expects; it has to be a successful release, otherwise we are doing deploys wrong.

Contributor Author

Then I will implement my idea, since it only goes back to the previous release: stop_on_failure=True is set only when we hit the rollback.
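The agreed-upon stop_on_failure behavior can be sketched like this. Again a hypothetical simplification, not the controller's actual code: the rollback deploy is made with stop_on_failure=True, so a failing rollback raises immediately instead of cascading further back through the release history:

```python
# Hypothetical sketch of the stop_on_failure idea. `versions_ok`
# simulates which release versions deploy successfully.

class DeployError(Exception):
    pass

def deploy(version, versions_ok, stop_on_failure=False):
    if version in versions_ok:
        return f"v{version} running"
    if stop_on_failure:
        # Already rolling back: do not cascade to even older releases.
        raise DeployError(f"rollback to v{version} failed")
    # First failure: redeploy the previous release for all process
    # types, but only one level back.
    deploy(version - 1, versions_ok, stop_on_failure=True)
    raise DeployError(f"deploy of v{version} failed, rolled back to v{version - 1}")

try:
    deploy(6, versions_ok={5})
except DeployError as err:
    print(err)  # deploy of v6 failed, rolled back to v5
```

If the rollback itself fails, the error surfaces right away rather than walking all the way back to v1.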

@helgi helgi force-pushed the ticket_1013 branch 4 times, most recently from 1f3127a to 3a3ae0a Compare September 1, 2016 23:31
@bacongobbler
Member

Haven't tested this, but code LGTM.


6 participants