-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NEW] Add healthchecks in OpenShift templates #7184
[NEW] Add healthchecks in OpenShift templates #7184
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfchevrette thanks for opening this. No changes necessarily requested. But for sure would like to hear thoughts on this.
"timeoutSeconds": 1, | ||
"periodSeconds": 10, | ||
"successThreshold": 1, | ||
"failureThreshold": 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So on larger servers it takes longer for instances to start up. Might need to add some variation in here.
I'm actually struggling with this a bit my self. Our demo server takes 3-4 minutes on initial load because of all of the users and channels it has to load into cache. Also a bit depending on hardware underneath.
when I threw our demo server on k8s I had to actually adjust the readiness check to something like:
"livenessProbe": {
"failureThreshold": 3,
"httpGet": {
"path": "/api/info",
"port": 3000,
"scheme": "HTTP"
},
"initialDelaySeconds": 300,
"periodSeconds": 10,
"successThreshold": 1,
"timeoutSeconds": 1
},
"readinessProbe": {
"failureThreshold": 32,
"httpGet": {
"path": "/api/info",
"port": 3000,
"scheme": "HTTP"
},
"initialDelaySeconds": 10,
"periodSeconds": 10,
"successThreshold": 1,
"timeoutSeconds": 1
},
My thoughts here were readiness probe needs to be agressive. We want to serve traffic to the instance as fast as possible. But pushing the liveness check back a bit to keep it from killing an instance that is still in the process of starting up.
What are your thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geekgonecrazy I didn't know it is 'normal' for RocketChat to take many seconds (minutes??) to start on larger servers. If this is indeed the norm, I agree that the numbers I submitted in my PR are too low (they are the kubernetes defaults).
The readiness probe is the one that checks if the pod is ready on startup. It is common to see a larger delay on this one because of situations like the one you describe. That would be the one we would increase. The liveness probe ensures the app passes a check every X seconds to ensure it hasn't transited into a bad state.
I assume that on small-medium sized instances, the startup time is relatively short. I'm not sure if 300 seconds is a sane default for most people. Then if you manage a very large instance on kubernetes I would assume that you know you can increase these numbers. How about a delay of 15 seconds? Would that be suitable for most people?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfchevrette I've been digging in the documentation recently and couldn't tell does the liveness check wait to start until after readiness check?
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
If it does wait, the defaults for liveness are probably just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfchevrette did a little verifying. So it does not wait on readiness check to pass. Which is a bit unfortunate. In the future might be able to substitute this for a command in the container that would know that it was still in the process of starting. I'll go ahead and approve this and we can tweak this as time goes on based on feedback.
Thanks!
@RocketChat/core
This adds healthchecks to the OpenShift app templates.
Also fixes a couple of JSON indentation/formatting details in the template files.