fix(healthcheck): check if the healthchecks are failing on a new deploy #1036
Don't deployments handle exactly this? They are supposed to not put a pod live unless its liveness and readiness checks pass, and by default, if either probe is missing, the respective check counts as passed (true).
The change looks good; I'm just baffled that the other checks didn't catch it.
Yes... deployments catch this, and the pod will be in
Ah, I see: so it goes through the full timeout and bails out that way? I also made it so that we keep as many secrets as there are available ReplicaSets (RS), so that code may be broken as well. Let's talk about it tomorrow if I don't get to it first.
@@ -346,3 +346,6 @@ def wait_until_ready(self, namespace, name, **kwargs):
waited += 1
time.sleep(1)
ready, _ = self.are_replicas_ready(namespace, name)
Can you add a comment and separate it a bit from the for loop?
My only reservation so far is that we have to wait until the timeout is done, which for some people can be a long time, and the connection can be severed. At the same time, if that happens our system wouldn't go ahead with any cleanups either, so the problem wouldn't be around.
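The following is a minimal Python sketch of the suggested structure. The final readiness check is kept apart from the polling loop, and an event check runs only when that check fails. It is not the controller's code. The two callables are hypothetical stand-ins for the scheduler's are_replicas_ready and for an event inspection that raises on unhealthy pods. KubeException is redefined locally.

```python
import time
from typing import Callable, Tuple


class KubeException(Exception):
    """Local stand-in for the scheduler's KubeException."""


def wait_until_ready(
    are_replicas_ready: Callable[[], Tuple[bool, int]],
    check_failed_events: Callable[[], None],
    timeout: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll once per second until the replicas are ready or `timeout` expires."""
    waited = 0
    while waited < timeout:
        ready, _ = are_replicas_ready()
        if ready:
            return
        waited += 1
        sleep(1)

    # Kept apart from the polling loop: the wait time is used up, so one
    # last readiness check decides whether the deploy has failed.
    ready, _ = are_replicas_ready()
    if not ready:
        # Hypothetical hook: inspect pod events and raise on 'Unhealthy'.
        check_failed_events()
        # Fallback when no unhealthy event explains the failure.
        raise KubeException("replicas not ready after {}s".format(timeout))
```

Injecting sleep keeps the sketch testable without real waits. It also makes the reservation above visible: a failing healthcheck surfaces only after the full timeout has elapsed.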
Current coverage is 86.68% (diff: 11.76%)

@@            master   #1036   diff @@
=======================================
  Files            42      42
  Lines          3540    3590    +50
  Methods           0       0
  Messages          0       0
  Branches        597     610    +13
=======================================
+ Hits           3087    3112    +25
- Misses          295     316    +21
- Partials        158     162     +4
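The headline percentage can be recomputed from the table. This sketch assumes Codecov's usual definition: hits divided by (hits + misses + partials). It also assumes Codecov's default of rounding down to two decimals. The check confirms that the rows add up and reproduces the drop from 87.2% on master to 86.68% on this PR.

```python
# Figures copied from the coverage report above.
master = {"lines": 3540, "hits": 3087, "misses": 295, "partials": 158}
pr = {"lines": 3590, "hits": 3112, "misses": 316, "partials": 162}


def coverage_pct(report: dict) -> float:
    """Hits as a percentage of tracked lines, rounded down to two decimals."""
    return int(report["hits"] * 10000 / report["lines"]) / 100


for name, report in (("master", master), ("#1036", pr)):
    # Every tracked line is exactly one of: hit, miss, or partial.
    assert report["hits"] + report["misses"] + report["partials"] == report["lines"]
    print(name, coverage_pct(report))
```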
if event['reason'] == 'Unhealthy':
    # strip whitespace from both ends of each line
    message = "\n".join([x.strip() for x in event['message'].split("\n")])
if message:
I think you want this indented one level lower so it's checked after the for loop, correct?
No... I kept it inside the for loop so it stops as soon as it sees the error.
Even better, why not raise as soon as you have a message inside the first if statement?
Basically, I'm trying to figure out when this function should raise KubeException:
- after all pods have been checked for errors
- after one pod's events show it is unhealthy
- after a single event that is unhealthy
Which situation is it?
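The three options above differ only in where the raise happens. The following is a minimal sketch, not the PR's implementation. It assumes events are dicts with 'reason' and 'message' keys, as in the diff. It also groups them under a hypothetical pod-name mapping. With fail_fast=True it raises on the first unhealthy event. With fail_fast=False it scans every pod and reports all messages at once.

```python
from typing import Dict, Iterable, List


class KubeException(Exception):
    """Local stand-in for the scheduler's KubeException."""


def _clean(message: str) -> str:
    # Strip each line of a multi-line event message, then the whole result,
    # so a message made only of whitespace becomes empty.
    return "\n".join(line.strip() for line in message.split("\n")).strip()


def check_unhealthy_events(
    events_by_pod: Dict[str, Iterable[dict]], fail_fast: bool = True
) -> None:
    """Raise KubeException if any pod event has reason 'Unhealthy'."""
    collected: List[str] = []
    for pod, events in events_by_pod.items():
        for event in events:
            if event.get("reason") != "Unhealthy":
                continue
            message = _clean(event.get("message", ""))
            if not message:
                continue
            if fail_fast:
                # Option 3: stop at the first unhealthy event.
                raise KubeException("{}: {}".format(pod, message))
            collected.append("{}: {}".format(pod, message))
    # Option 1: raise once, after every pod has been checked.
    if collected:
        raise KubeException("\n".join(collected))
```

Raising after one pod's events (the middle option) would sit between these two: collect per pod and raise at the end of each pod's inner loop.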
changed.
This function gets called after all the pods have been checked for errors, once we have waited for the deploy time plus the healthcheck wait time, and while the pod(s) are in the Running state but not the Ready state.
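The "Running but not Ready" condition described above can be read from pod status. The following sketch assumes pods are plain dicts shaped like the Kubernetes Pod object: metadata.name, status.phase, and a status.conditions list containing a Ready entry. The function name is hypothetical, not the controller's.

```python
from typing import List


def running_but_not_ready(pods: List[dict]) -> List[str]:
    """Names of pods whose phase is Running but whose Ready condition is not 'True'."""
    names = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            continue  # Pending/Failed pods are handled by other checks
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if not ready:
            names.append(pod["metadata"]["name"])
    return names
```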
fixes #989
Testing Instructions:
deis healthchecks:set readiness httpGet 2000 --type web
so that the new deploy fails.
3) Without this PR's changes, the command should execute fine; with this PR's changes, the deploy should roll back to the previous release.