Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NEW] Add healthchecks in OpenShift templates #7184

Merged

Conversation

jfchevrette
Copy link
Contributor

@RocketChat/core

This adds healthchecks to the OpenShift app templates.

Also fixes a couple of JSON indentation/formatting details in the template files.

Copy link
Contributor

@geekgonecrazy geekgonecrazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfchevrette thanks for opening this. No changes necessarily requested. But for sure would like to hear thoughts on this.

"timeoutSeconds": 1,
"periodSeconds": 10,
"successThreshold": 1,
"failureThreshold": 3
Copy link
Contributor

@geekgonecrazy geekgonecrazy Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So on larger servers it takes longer for instances to start up. Might need to add some variation in here.

I'm actually struggling with this a bit my self. Our demo server takes 3-4 minutes on initial load because of all of the users and channels it has to load into cache. Also a bit depending on hardware underneath.

when I threw our demo server on k8s I had to actually adjust the readiness check to something like:

                        "livenessProbe": {
                            "failureThreshold": 3,
                            "httpGet": {
                                "path": "/api/info",
                                "port": 3000,
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 300,
                            "periodSeconds": 10,
                            "successThreshold": 1,
                            "timeoutSeconds": 1
                        },
                        "readinessProbe": {
                            "failureThreshold": 32,
                            "httpGet": {
                                "path": "/api/info",
                                "port": 3000,
                                "scheme": "HTTP"
                            },
                            "initialDelaySeconds": 10,
                            "periodSeconds": 10,
                            "successThreshold": 1,
                            "timeoutSeconds": 1
                        },

My thoughts here were readiness probe needs to be agressive. We want to serve traffic to the instance as fast as possible. But pushing the liveness check back a bit to keep it from killing an instance that is still in the process of starting up.

What are your thoughts here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geekgonecrazy I didn't know it is 'normal' for RocketChat to take many seconds (minutes??) to start on larger servers. If this is indeed the norm, I agree that the numbers I submitted in my PR are too low (they are the kubernetes defaults).

The readiness probe is the one that checks if the pod is ready on startup. It is common to see a larger delay on this one because of situations like the one you describe. That would be the one we would increase. The liveness probe ensures the app passes a check every X seconds to ensure it hasn't transited into a bad state.

I assume that on small-medium sized instances, the startup time is relatively short. I'm not sure if 300 seconds is a sane default for most people. Then if you manage a very large instance on kubernetes I would assume that you know you can increase these numbers. How about a delay of 15 seconds? Would that be suitable for most people?

Copy link
Contributor

@geekgonecrazy geekgonecrazy Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfchevrette I've been digging in the documentation recently and couldn't tell does the liveness check wait to start until after readiness check?

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

If it does wait, the defaults for liveness are probably just fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfchevrette did a little verifying. So it does not wait on readiness check to pass. Which is a bit unfortunate. In the future might be able to substitute this for a command in the container that would know that it was still in the process of starting. I'll go ahead and approve this and we can tweak this as time goes on based on feedback.

Thanks!

@geekgonecrazy geekgonecrazy merged commit 02c0d94 into RocketChat:develop Jun 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants