
Improve fluentd liveness probe #343

Merged
11 commits merged into master on Jan 6, 2020

Conversation

@vsinghal13 (Contributor) commented Dec 19, 2019

Description

Add a new liveness probe to the fluentd deployment.

httpGet:
    path: /fluentd.pod.healthcheck?json=%7B%22log%22%3A+%22health+check%22%7D
    port: 9880

The rationale is that if Fluentd can accept log messages, it must be healthy.
Hitting the endpoint produces a new fluentd tag, fluentd.pod-healthcheck.
The query parameter in the URL is a URL-encoded JSON object that looks like this:
{"log": "health check"}
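As a quick sanity check, the query string in the probe path round-trips with Python's standard library (this is just an illustration of the encoding, not part of the PR):

```python
from urllib.parse import quote_plus, unquote_plus

# The raw record the probe sends in the "json" query parameter.
payload = '{"log": "health check"}'

# quote_plus percent-encodes the JSON and turns spaces into '+',
# reproducing the query string used in the probe path.
encoded = quote_plus(payload)
print(encoded)  # %7B%22log%22%3A+%22health+check%22%7D

# Decoding the probe's query parameter recovers the JSON object.
print(unquote_plus(encoded))  # {"log": "health check"}
```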

Testing performed
  • ci/build.sh
  • Redeploy fluentd and fluentd-events pods
  • Confirm events, logs, and metrics are coming in

@@ -3,5 +3,10 @@
port 24321
bind 0.0.0.0
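The hunk above only shows the surrounding source context (the existing port 24321 / bind lines); the HTTP source the PR adds is not fully visible in this excerpt. Based on the probe's port, it plausibly looks like the following sketch using fluentd's in_http plugin, where the request path becomes the event tag (the exact options are assumed, not taken from the diff):

    <source>
      @type http
      port 9880
      bind 0.0.0.0
    </source>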
Contributor:

I'm confused why only logs.conf gets this liveness probe. Should this be moved to common.conf ?

- "[ $(pgrep ruby | wc -l) -gt 0 ]"
+ httpGet:
+   path: /fluentd.pod.healthcheck?json=%7B%22log%22%3A+%22health+check%22%7D
+   port: 9880
  initialDelaySeconds: 300
Contributor:

This seems very generous; do we really need to wait 5 minutes before starting the liveness probe?

Contributor (author):

We currently have a wait of 5 minutes, so I kept that. It is recommended to be generous here; otherwise the pod might end up in an infinite loop of restarts during startup.

Contributor:

5 min does sound a little high, but the readiness probe will fail first and the pod will stop accepting data. If we need more replicas in that window, I think we'll be okay with the autoscaler enabled. So I'm fine with leaving it at 5 min.

@frankreno (Contributor) left a comment:

LGTM, but is there any reason we would not also want to do this on the events deployment? The same logic holds true there, I believe.

@rvmiller89 (Contributor):

+1 for doing this on the events deployment; I missed that part.

@vsinghal13 vsinghal13 merged commit 81012f7 into master Jan 6, 2020
@vsinghal13 vsinghal13 deleted the vsinghal-improve-fluentd-liveness-probe branch January 6, 2020 19:43
4 participants