This repository has been archived by the owner on May 6, 2020. It is now read-only.

fix(healthcheck): check if the healthchecks are failing on a new deploy #1036

Merged · 1 commit · Sep 6, 2016

Conversation

kmala
Contributor

@kmala kmala commented Sep 2, 2016

fixes #989

Testing Instructions:

  1. Deploy an app (e.g. example-go) and make sure it is working.
  2. Set a readiness healthcheck that fails, e.g. deis healthchecks:set readiness httpGet 2000 --type web, so that the new deploy fails.
  3. Without this PR's changes the command executes fine; with this PR's changes the failed deploy should roll back to the previous release.

@deis-bot

deis-bot commented Sep 2, 2016

@helgi is a potential reviewer of this pull request based on my analysis of git blame information. Thanks @kmala!

@kmala kmala self-assigned this Sep 2, 2016
@kmala kmala added this to the v2.5 milestone Sep 2, 2016
@helgi
Contributor

helgi commented Sep 2, 2016

Don't deployments handle exactly this? They are supposed to not put a pod live unless liveness and readiness pass, by default if either of those is missing the respective type is true (as in passed)

@helgi
Contributor

helgi commented Sep 2, 2016

The change looks good; I'm just baffled that the other checks didn't catch it.

@kmala
Contributor Author

kmala commented Sep 2, 2016

Yes, deployments catch this and the pod will be in a 0/1 state, which means it is running but not ready. We don't consider that a failure, so the secrets and env of the previous release get deleted, and hence there will be two releases but nothing active.
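The failure mode described above can be sketched as follows. This is a simplified stand-in, not the controller's actual code: the pod dicts, `are_replicas_ready`, and `wait_until_ready` are hypothetical shapes chosen to illustrate the bug.

```python
# Sketch of the bug this PR fixes, using simplified stand-in names:
# a pod stuck in 0/1 (running but not ready) used to be treated as a
# successful deploy, so cleanup of the old release proceeded anyway.

class KubeException(Exception):
    """Stand-in for the controller's deploy-failure exception."""


def are_replicas_ready(pods):
    """Return (all_ready, message) for simplified pod dicts shaped
    like {'running': bool, 'ready': bool}."""
    not_ready = [p for p in pods if p['running'] and not p['ready']]
    if not_ready:
        return False, "%d pod(s) running but not ready (0/1)" % len(not_ready)
    return True, ""


def wait_until_ready(pods, timeout=3):
    """Wait for readiness; after the timeout, fail the deploy instead
    of silently treating running-but-not-ready pods as a success."""
    for _ in range(timeout):
        ready, _ = are_replicas_ready(pods)
        if ready:
            return
    # the fix: re-check after the wait loop and raise on failure,
    # so the caller rolls back instead of deleting the old release
    ready, message = are_replicas_ready(pods)
    if not ready:
        raise KubeException(message)
```

With the pre-PR behaviour the function would simply return after the loop, and the caller would go on to delete the previous release's secrets and env, leaving two releases with nothing active.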

@helgi
Contributor

helgi commented Sep 2, 2016

Ah, I see, so it goes through the full timeout and bails out that way?

I did make it so that we keep as many secrets as there are available RS, so that code may be broken as well. Let's talk about it tomorrow, if I don't get to it first.

@helgi helgi added the LGTM1 label Sep 2, 2016
@@ -346,3 +346,6 @@ def wait_until_ready(self, namespace, name, **kwargs):

waited += 1
time.sleep(1)
ready, _ = self.are_replicas_ready(namespace, name)
Contributor

Can you add a comment and separate it a bit from the for loop.

My only reservation so far is that we have to wait until the timeout is done, which for some people can be a long time, and the connection can be severed. Though if that is the case, our system wouldn't go ahead with any cleanups either, so the problem wouldn't arise.

@codecov-io

codecov-io commented Sep 2, 2016

Current coverage is 86.68% (diff: 11.76%)

Merging #1036 into master will decrease coverage by 0.51%

@@             master      #1036   diff @@
==========================================
  Files            42         42          
  Lines          3540       3590    +50   
  Methods           0          0          
  Messages          0          0          
  Branches        597        610    +13   
==========================================
+ Hits           3087       3112    +25   
- Misses          295        316    +21   
- Partials        158        162     +4   

Powered by Codecov. Last update a303f25...7740fc0

if event['reason'] == 'Unhealthy':
    # strip out whitespaces on either side
    message = "\n".join([x.strip() for x in event['message'].split("\n")])
    if message:
Member

I think you want this indented one level lower so it's checked after the for loop, correct?

Contributor Author

No, I kept it inside the for loop so it stops as soon as it sees the error.

Contributor

Even better, why not raise as soon as you have a message inside the first if statement?

Member

Basically, I'm trying to figure out when this function should raise KubeException:

  • after all pods have been checked for errors
  • after one pod's events are unhealthy
  • after a single event that is unhealthy

Which situation is it?

Contributor Author

Changed.

Contributor Author

This function gets called after all the pods are checked for errors, we have waited for the deploy time plus the healthcheck wait time, and the pod(s) are in a running but not ready state.
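The behaviour the review converges on can be sketched like this; the event dicts and the function name are hypothetical simplifications of the controller's pod-event handling, not its actual API:

```python
# Minimal sketch (hypothetical event shape) of raising on the first
# 'Unhealthy' pod event rather than continuing through the loop.

class KubeException(Exception):
    """Stand-in for the controller's exception type."""


def raise_on_unhealthy(events):
    """Scan pod events and raise as soon as an 'Unhealthy' event with
    a non-empty message is found; healthy event lists pass through."""
    for event in events:
        if event['reason'] == 'Unhealthy':
            # strip out whitespace on either side of each line
            message = "\n".join(x.strip() for x in event['message'].split("\n"))
            if message:
                # stop at the first error instead of scanning the rest
                raise KubeException(message)
```

Raising inside the loop, as discussed above, means the deploy fails fast on the first unhealthy event instead of waiting for every event to be examined.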


Successfully merging this pull request may close these issues.

Deployment errors can lead to wrong pods being deployed
6 participants