This repository has been archived by the owner on May 6, 2020. It is now read-only.

fix(healthcheck): check if the healthchecks are failing on a new deploy #1036

Merged · 1 commit · Sep 6, 2016

Conversation

kmala
Contributor

@kmala kmala commented Sep 2, 2016

fixes #989

Testing Instructions:

  1. Deploy an app (e.g. example-go) and make sure it is working.
  2. Set a readiness healthcheck that fails, e.g. deis healthchecks:set readiness httpGet 2000 --type web, so that the new deploy fails.
  3. Without this PR's changes the command executes fine; with this PR's changes the failed deploy should roll back to the previous release.

@deis-bot

deis-bot commented Sep 2, 2016

@helgi is a potential reviewer of this pull request based on my analysis of git blame information. Thanks @kmala!

@kmala kmala self-assigned this Sep 2, 2016
@kmala kmala added this to the v2.5 milestone Sep 2, 2016
@helgi
Contributor

helgi commented Sep 2, 2016

Don't deployments handle exactly this? They are supposed to not put a pod live unless liveness and readiness pass, by default if either of those is missing the respective type is true (as in passed)

@helgi
Contributor

helgi commented Sep 2, 2016

The change looks good; I'm just baffled that the other checks didn't catch it.

@kmala
Contributor Author

kmala commented Sep 2, 2016

Yes, deployments catch this and the pod will be in a 0/1 state, which means it is running but not ready. We don't consider that a failure, so the secrets and env of the previous release get deleted, and hence there will be two releases but nothing active.
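The failure mode described above can be sketched as follows. This is a simplified stand-in, not the controller's actual code: the pod dicts, `are_replicas_ready`, and `wait_until_ready` are hypothetical shapes chosen to illustrate the bug.

```python
# Sketch of the bug this PR fixes, using simplified stand-in names:
# a pod stuck in 0/1 (running but not ready) used to be treated as a
# successful deploy, so cleanup of the old release proceeded anyway.

class KubeException(Exception):
    """Stand-in for the controller's deploy-failure exception."""


def are_replicas_ready(pods):
    """Return (all_ready, message) for simplified pod dicts shaped
    like {'running': bool, 'ready': bool}."""
    not_ready = [p for p in pods if p['running'] and not p['ready']]
    if not_ready:
        return False, "%d pod(s) running but not ready (0/1)" % len(not_ready)
    return True, ""


def wait_until_ready(pods, timeout=3):
    """Wait for readiness; after the timeout, fail the deploy instead
    of silently treating running-but-not-ready pods as a success."""
    for _ in range(timeout):
        ready, _ = are_replicas_ready(pods)
        if ready:
            return
    # the fix: re-check after the wait loop and raise on failure,
    # so the caller rolls back instead of deleting the old release
    ready, message = are_replicas_ready(pods)
    if not ready:
        raise KubeException(message)
```

With the pre-PR behaviour the function would simply return after the loop, and the caller would go on to delete the previous release's secrets and env, leaving two releases with nothing active.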

@helgi
Contributor

helgi commented Sep 2, 2016

Ah, I see, so it goes through the full timeout and bails out that way?

I did make it so that we keep as many secrets as there are available RS, so that code may be broken as well. Let's talk about it tomorrow, if I don't get to it first.

@helgi helgi added the LGTM1 label Sep 2, 2016
@@ -346,3 +346,6 @@ def wait_until_ready(self, namespace, name, **kwargs):

waited += 1
time.sleep(1)
ready, _ = self.are_replicas_ready(namespace, name)
Contributor

Can you add a comment and separate it a bit from the for loop.

My only reservation so far is that we have to wait until the timeout is done, which for some people can be a long time, and the connection can be severed. Though if that is the case, our system wouldn't go ahead with any cleanups either, so the problem wouldn't arise.

@codecov-io

codecov-io commented Sep 2, 2016

Current coverage is 86.68% (diff: 11.76%)

Merging #1036 into master will decrease coverage by 0.51%

@@             master      #1036   diff @@
==========================================
  Files            42         42          
  Lines          3540       3590    +50   
  Methods           0          0          
  Messages          0          0          
  Branches        597        610    +13   
==========================================
+ Hits           3087       3112    +25   
- Misses          295        316    +21   
- Partials        158        162     +4   

Powered by Codecov. Last update a303f25...7740fc0

if event['reason'] == 'Unhealthy':
    # strip out whitespaces on either side
    message = "\n".join([x.strip() for x in event['message'].split("\n")])
    if message:
Member

I think you want this indented one level lower so it's checked after the for loop, correct?

Contributor Author

No, I kept it inside the for loop so it stops as soon as it sees the error.

Contributor

Even better, why not raise as soon as you have a message inside the first if statement?

Member

Basically, I'm trying to figure out when this function should raise KubeException:

  • after all pods have been checked for errors
  • after one pod's events are unhealthy
  • after a single event that is unhealthy

Which situation is it?

Contributor Author

Changed.

Contributor Author

This function gets called after all the pods are checked for errors, we have waited for the deploy time plus the healthcheck wait time, and the pod(s) are in a running but not ready state.
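The behaviour the review converges on can be sketched like this; the event dicts and the function name are hypothetical simplifications of the controller's pod-event handling, not its actual API:

```python
# Minimal sketch (hypothetical event shape) of raising on the first
# 'Unhealthy' pod event rather than continuing through the loop.

class KubeException(Exception):
    """Stand-in for the controller's exception type."""


def raise_on_unhealthy(events):
    """Scan pod events and raise as soon as an 'Unhealthy' event with
    a non-empty message is found; healthy event lists pass through."""
    for event in events:
        if event['reason'] == 'Unhealthy':
            # strip out whitespace on either side of each line
            message = "\n".join(x.strip() for x in event['message'].split("\n"))
            if message:
                # stop at the first error instead of scanning the rest
                raise KubeException(message)
```

Raising inside the loop, as discussed above, means the deploy fails fast on the first unhealthy event instead of waiting for every event to be examined.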


Successfully merging this pull request may close these issues.

Deployment errors can lead to wrong pods being deployed
6 participants