[NEW] Add healthchecks in OpenShift templates #7184

jfchevrette · 2017-06-07T13:09:08Z

@RocketChat/core

This adds healthchecks to the OpenShift app templates.

Also fixes a couple of JSON indentation/formatting details in the template files.

geekgonecrazy

@jfchevrette thanks for opening this. I'm not necessarily requesting changes, but I would definitely like to hear your thoughts on this.

geekgonecrazy · 2017-06-08T17:18:36Z

.openshift/rocket-chat-ephemeral.json

+                                    "timeoutSeconds": 1,
+                                    "periodSeconds": 10,
+                                    "successThreshold": 1,
+                                    "failureThreshold": 3


So on larger servers it takes longer for instances to start up. Might need to add some variation in here.

I'm actually struggling with this a bit myself. Our demo server takes 3-4 minutes on initial load because of all the users and channels it has to load into cache. It also depends a bit on the underlying hardware.

When I put our demo server on k8s I actually had to adjust the health checks to something like:

"livenessProbe": {
    "failureThreshold": 3,
    "httpGet": {
        "path": "/api/info",
        "port": 3000,
        "scheme": "HTTP"
    },
    "initialDelaySeconds": 300,
    "periodSeconds": 10,
    "successThreshold": 1,
    "timeoutSeconds": 1
},
"readinessProbe": {
    "failureThreshold": 32,
    "httpGet": {
        "path": "/api/info",
        "port": 3000,
        "scheme": "HTTP"
    },
    "initialDelaySeconds": 10,
    "periodSeconds": 10,
    "successThreshold": 1,
    "timeoutSeconds": 1
},

My thinking here was that the readiness probe needs to be aggressive: we want to serve traffic to the instance as fast as possible. The liveness check, on the other hand, is pushed back a bit to keep it from killing an instance that is still in the process of starting up.

What are your thoughts here?

@geekgonecrazy I didn't know it was 'normal' for RocketChat to take many seconds (minutes??) to start on larger servers. If this is indeed the norm, I agree that the numbers I submitted in my PR are too low (they are the Kubernetes defaults).

The readiness probe is the one that checks if the pod is ready on startup. It is common to see a larger delay on this one because of situations like the one you describe. That would be the one we would increase. The liveness probe checks every X seconds that the app hasn't transitioned into a bad state.

I assume that on small to medium-sized instances, the startup time is relatively short. I'm not sure 300 seconds is a sane default for most people. And if you manage a very large instance on Kubernetes, I would assume you know you can increase these numbers. How about a delay of 15 seconds? Would that be suitable for most people?
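For illustration only, here is a minimal sketch of what a 15 second delay could look like on the liveness probe. It reuses the `/api/info` path and port 3000 from the configuration quoted earlier and the other values from the PR diff; whether the delay belongs on the liveness probe, the readiness probe, or both is an assumption here, not something settled in this thread.

```json
{
    "livenessProbe": {
        "httpGet": {
            "path": "/api/info",
            "port": 3000,
            "scheme": "HTTP"
        },
        "initialDelaySeconds": 15,
        "timeoutSeconds": 1,
        "periodSeconds": 10,
        "successThreshold": 1,
        "failureThreshold": 3
    }
}
```

With these values, a container that never answers would see its first probe about 15 seconds after start and its third consecutive failure about 20 seconds later, so a restart would happen after very roughly 35 to 45 seconds. That is far shorter than the 3-4 minute startup described above for the demo server.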

@jfchevrette I've been digging in the documentation recently and couldn't tell: does the liveness check wait to start until after the readiness check has passed?

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

If it does wait, the defaults for liveness are probably just fine.

@jfchevrette I did a little verifying. The liveness check does not wait for the readiness check to pass, which is a bit unfortunate. In the future we might be able to substitute it with a command in the container that would know the app is still in the process of starting. I'll go ahead and approve this, and we can tweak it as time goes on based on feedback.

Thanks!
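As a rough sketch of the command-based idea mentioned above, a liveness probe can run a command inside the container through the `exec` handler instead of `httpGet`. The script path below is purely hypothetical and does not exist in the Rocket.Chat image; it stands in for some check that would exit 0 while the app is still starting or is healthy, and non-zero otherwise.

```json
{
    "livenessProbe": {
        "exec": {
            "command": ["/bin/sh", "/hypothetical/startup-aware-check.sh"]
        },
        "initialDelaySeconds": 15,
        "timeoutSeconds": 1,
        "periodSeconds": 10,
        "successThreshold": 1,
        "failureThreshold": 3
    }
}
```

The timing values are carried over from the PR diff, with the 15 second delay taken from the suggestion earlier in the thread; they would likely need tuning alongside whatever the script actually checks.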

add healthchecks + formatting

5fb7955

geekgonecrazy requested changes Jun 8, 2017

geekgonecrazy approved these changes Jun 9, 2017

geekgonecrazy merged commit 02c0d94 into RocketChat:develop Jun 9, 2017
