-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add liveness and readiness probes to workers #520
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mtparet for the PR!
The idea of adding a probe to the workers has always been on the back of my mind.
I have left some comments, and I also note that the CI tests are failing.
@@ -162,7 +162,39 @@ spec: | |||
{{- else }} | |||
- "exec airflow celery worker" | |||
{{- end }} | |||
{{- if $volumeMounts }} | |||
{{- if .Values.worker.livenessProbe.enabled }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have made a typo in the value, it should be workers.{...}
not worker.{...}
(with an "s").
(This is what is causing the CI to fail)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I saw this... thanks I will fix this.
- celery@${HOSTNAME} | ||
{{- end }} | ||
{{- if .Values.worker.readinessProbe.enabled }} | ||
readinessProbe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it makes sense to include a readinessProbe
on a worker, as workers don't serve any traffic (in an HTTP service sense).
Read more about "readiness probes" in the official docs.
NOTE: I still like the idea of a livenessProbe
to restart unhealthy workers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right this will not help on the orchestration side as there is no any http traffic.
But from an "human" monitoring side, it will be nice to see when the worker is not working so marked as "not ready" at the pod level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mtparet I think it's best to remove it. Note, a pod with a failing liveness probe will be marked as "unhealthy".
enabled: true | ||
initialDelaySeconds: 10 | ||
periodSeconds: 30 | ||
timeoutSeconds: 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 15 seconds enough to definitely have the celery ...
command return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested in our environment and it was taking 4-5 seconds but we could increase this to be more resilient.
timeoutSeconds: {{ .Values.worker.livenessProbe.timeoutSeconds }} | ||
failureThreshold: {{ .Values.worker.livenessProbe.failureThreshold }} | ||
exec: | ||
command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions about the proposed probe command:
- Will this exact command work on both airflow
1.10.12+
AND2.X.X
? - Can you explain what exactly this is checking for, and under what situations it will return non-0 exit code?
- Is the celery app always called
airflow.executors.celery_executor.app
?
Signed-off-by: Matthieu Paret <matthieu.paret@lifen.fr>
Closing in favor of #766 |
What issues does your PR fix?
What does your PR do?
Add liveness and readiness probes.
Checklist
For all Pull Requests
For releasing ONLY