This repository was archived by the owner on May 6, 2020. It is now read-only.

fix(app): rollback all process types to previous version when one (or more) process type fails a deploy#1027

Merged
helgi merged 1 commit into deis:master from helgi:ticket_1013 on Sep 2, 2016

Conversation

@helgi
Contributor

@helgi helgi commented Aug 30, 2016

Previously, only the failed process type would roll itself back, resulting in a scenario where worker might be on v6 while web stayed on v5.

This moves away from using the built-in rollback functionality in Deployments and instead deploys the previous release again to get the same effect.
Rollback in Deployments would take each process type to its last known good deployment, which in this case would be the latest deployment for some types; knowing the revision of the last deploy would be hard as well.

When the old release is deployed this way, an identical ReplicaSet is created (and thus an identical pod template hash is generated), so the result looks very much like a native rollback.
If the Controller has changed how it constructs the pod manifest, this could generate an entirely new ReplicaSet, but that is also fine since it will still be booting the previous release from the DB.

Fixes #1013
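The redeploy-instead-of-rollback approach described above can be sketched roughly as follows. All names here (App, Release, deploy, rollback_all) are simplified stand-ins for illustration, not the controller's actual code:

```python
# Simplified stand-ins for the controller's models (names and behavior
# are assumptions for illustration, not the actual implementation).

class Release:
    def __init__(self, version, previous=None):
        self.version = version
        self._previous = previous

    def previous(self):
        return self._previous

class App:
    def __init__(self):
        self.deployed = {}  # process type -> release version

    def deploy(self, release, force_deploy=False):
        # Redeploying a release re-creates an identical ReplicaSet (same
        # pod template hash), so this looks like a native rollback.
        for ptype in ("web", "worker"):
            self.deployed[ptype] = release.version

    def rollback_all(self, failed_release):
        # Roll back *every* process type, not just the failed one, so
        # web and worker never end up on different versions.
        self.deploy(failed_release.previous(), force_deploy=True)

v5 = Release(5)
v6 = Release(6, previous=v5)
app = App()
app.rollback_all(v6)
print(app.deployed)  # {'web': 5, 'worker': 5}
```

The key point is that rollback_all redeploys the previous release for all process types at once, rather than letting each Deployment roll back independently.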

Test Plan

  1. deis create --no-remote test-rollback
  2. deis pull deis/example-go -a test-rollback
  3. deis config:set foo=bar -a test-rollback
  4. deis pull deis/example-foo -a test-rollback

There should be an error message along the lines of:

    (app::deploy): There was a problem deploying v6. Rolling back process types to release v5.
    There was a problem while deploying v6 of test-rollback-cmd. Additional information:
    Error: image deis/example-foo:latest not found

Running deis logs -a test-rollback should show some of that, including the message "There was a problem deploying v6. Rolling back process types to release v5" twice. That's okay and is due to deis/logger#109.

The first occurrence is logged when the internal rollback starts; the second is when that message is combined with the information returned to the user.

@helgi helgi added this to the v2.5 milestone Aug 30, 2016
@helgi helgi self-assigned this Aug 30, 2016
@deis-bot

@kmala, @bacongobbler and @mboersma are potential reviewers of this pull request based on my analysis of git blame information. Thanks @helgi!

@helgi helgi force-pushed the ticket_1013 branch 2 times, most recently from e29b5a6 to 164569a Compare August 30, 2016 22:51
@codecov-io

codecov-io commented Aug 30, 2016

Current coverage is 87.09% (diff: 70.00%)

Merging #1027 into master will increase coverage by 0.02%

@@             master      #1027   diff @@
==========================================
  Files            42         42          
  Lines          3567       3573     +6   
  Methods           0          0          
  Messages          0          0          
  Branches        602        603     +1   
==========================================
+ Hits           3106       3112     +6   
  Misses          301        301          
  Partials        160        160          

Powered by Codecov. Last update 3f3a228...a45643c

Comment thread on rootfs/api/models/app.py (Outdated)

    # This goes in the log before the rollback starts
    self.log(err, logging.ERROR)
    # revert all process types to old release
    self.deploy(release.previous(), force_deploy=True)
Contributor

If there is an API exception or unavailability, doesn't this call deploy recursively?

Contributor Author

It will recurse all the way until it finds a good deploy; the last one would be a release with no build, which we bail out on at the top, or you hit Python's recursion limit (1000 by default).

Member

@bacongobbler bacongobbler Aug 31, 2016

Yep. This code could technically roll back all the way when the controller is refusing connections (as with #1019), and raise a NotExists error once it reaches v1 and previous() raises NotExists.

Contributor Author

I think it is worth letting it go back as far as possible when it is doing a rollback, but another thought would be to have a function arg that says "stop on failure", so we'd let it roll back one level and that's it.

Contributor

@kmala kmala Sep 1, 2016

We should just roll back to the previous release, since that is what the user expects; it has to be a successful release, otherwise we are doing deploys wrong.

Contributor Author

Then I will implement my idea, since it only goes back to the previous release: stop_on_failure=True is set only when we hit the rollback.
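The agreed-upon stop_on_failure behavior can be sketched like this. Again a hypothetical simplification, not the controller's actual code: the rollback deploy is made with stop_on_failure=True, so a failing rollback raises immediately instead of cascading further back through the release history:

```python
# Hypothetical sketch of the stop_on_failure idea. `versions_ok`
# simulates which release versions deploy successfully.

class DeployError(Exception):
    pass

def deploy(version, versions_ok, stop_on_failure=False):
    if version in versions_ok:
        return f"v{version} running"
    if stop_on_failure:
        # Already rolling back: do not cascade to even older releases.
        raise DeployError(f"rollback to v{version} failed")
    # First failure: redeploy the previous release for all process
    # types, but only one level back.
    deploy(version - 1, versions_ok, stop_on_failure=True)
    raise DeployError(f"deploy of v{version} failed, rolled back to v{version - 1}")

try:
    deploy(6, versions_ok={5})
except DeployError as err:
    print(err)  # deploy of v6 failed, rolled back to v5
```

If the rollback itself fails, the error surfaces right away rather than walking all the way back to v1.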

@helgi helgi force-pushed the ticket_1013 branch 4 times, most recently from 1f3127a to 3a3ae0a Compare September 1, 2016 23:31
@bacongobbler
Member

Haven't tested this, but code LGTM.


6 participants