Improve fluentd liveness probe #343
@@ -3,5 +3,10 @@
port 24321
bind 0.0.0.0
</source>
<source>
I'm confused why only logs.conf gets this liveness probe. Should this be moved to common.conf?
- "[ $(pgrep ruby | wc -l) -gt 0 ]"
httpGet:
path: /fluentd.pod.healthcheck?json=%7B%22log%22%3A+%22health+check%22%7D
port: 9880
initialDelaySeconds: 300
This seems very generous. Do we need to wait 5 minutes before starting the liveness probe?
We currently wait 5 minutes, so I kept that. A generous delay is recommended; otherwise the pod might end up in an endless loop of restarts during startup.
5 minutes does sound a little high, but the readiness probe will fail first and the pod will stop accepting data. If we need more replicas in the meantime, I think we'll be okay with the autoscaler enabled, so I'm okay with leaving it at 5 minutes.
…hub.com/SumoLogic/sumologic-kubernetes-collection into vsinghal-improve-fluentd-liveness-probe
LGTM, but any reason we would not also want to do this on the event deployment? The same logic holds true I believe.
+1 for doing this on the events deployment, missed that part.
Description
Add new liveness probe to fluentd deployment.
The rationale is that if Fluentd can accept log messages, it must be healthy.
The endpoint itself results in a new fluentd tag, fluentd.pod-healthcheck.
The query parameter in the URL defines a URL-encoded JSON object that looks like this:
{"log": "health check"}
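To double-check the encoding, here is a minimal Python sketch that decodes the probe path from the diff and re-encodes the JSON object. The names `PROBE_PATH` and `decode_probe_payload` are made up for illustration and are not part of this PR; the snippet only exercises the URL encoding and does not talk to Fluentd.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Probe path copied from the httpGet section of the diff.
PROBE_PATH = "/fluentd.pod.healthcheck?json=%7B%22log%22%3A+%22health+check%22%7D"


def decode_probe_payload(path):
    """Return the JSON object carried in the `json` query parameter."""
    query = urlsplit(path).query
    # parse_qs percent-decodes values and turns '+' into a space.
    params = parse_qs(query)
    return json.loads(params["json"][0])


payload = decode_probe_payload(PROBE_PATH)
print(payload)

# Going the other way: build the query string from the JSON object.
encoded = urlencode({"json": json.dumps(payload)})
print(encoded)
```

The re-encoded query string can be compared against the `path` value in the probe definition when the health check payload needs to change.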
Testing performed