Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add liveness and readiness probes to workers #520

Closed
wants to merge 1 commit into from

Conversation

mtparet
Copy link

@mtparet mtparet commented Feb 15, 2022

What issues does your PR fix?

  • Worker can stop to work and no liveness probes will restart it.

What does your PR do?

Add liveness and readiness probes.

Checklist

For all Pull Requests

For releasing ONLY

Copy link
Member

@thesuperzapper thesuperzapper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mtparet for the PR!

The idea of adding a probe to the workers has always been on the back of my mind.

I have left some comments, and I also note that the CI tests are failing.

@@ -162,7 +162,39 @@ spec:
{{- else }}
- "exec airflow celery worker"
{{- end }}
{{- if $volumeMounts }}
{{- if .Values.worker.livenessProbe.enabled }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have made a typo in the value, it should be workers.{...} not worker.{...} (with an "s").

(This is what is causing the CI to fail)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I saw this... thanks I will fix this.

- celery@${HOSTNAME}
{{- end }}
{{- if .Values.worker.readinessProbe.enabled }}
readinessProbe:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it makes sense to include a readinessProbe on a worker, as workers don't serve any traffic (in an HTTP service sense).

Read more about "readiness probes" in the official docs.

NOTE: I still like the idea of a livenessProbe to restart unhealthy workers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right this will not help on the orchestration side as there is no any http traffic.

But from an "human" monitoring side, it will be nice to see when the worker is not working so marked as "not ready" at the pod level.

Copy link
Member

@thesuperzapper thesuperzapper Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtparet I think it's best to remove it. Note, a pod with a failing liveness probe will be marked as "unhealthy".

enabled: true
initialDelaySeconds: 10
periodSeconds: 30
timeoutSeconds: 15
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 15 seconds enough to definitely have the celery ... command return?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested in our environment and it was taking 4-5 seconds but we could increase this to be more resilient.

timeoutSeconds: {{ .Values.worker.livenessProbe.timeoutSeconds }}
failureThreshold: {{ .Values.worker.livenessProbe.failureThreshold }}
exec:
command:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions about the proposed probe command:

  1. Will this exact command work on both airflow 1.10.12+ AND 2.X.X?
  2. Can you explain what exactly this is checking for, and under what situations it will return non-0 exit code?
  3. Is the celery app always called airflow.executors.celery_executor.app?

@thesuperzapper thesuperzapper changed the title fix: Add liveness and readiness probes to worker feat: add liveness and readiness probes to workers Feb 17, 2022
Signed-off-by: Matthieu Paret <matthieu.paret@lifen.fr>
@thesuperzapper thesuperzapper added this to PR | Needs Review in Issue Triage and PR Tracking Mar 22, 2022
@thesuperzapper thesuperzapper added this to the airflow-8.7.0 milestone Mar 22, 2022
@thesuperzapper thesuperzapper moved this from PR | Needs Review to PR | Needs Discussion in Issue Triage and PR Tracking Mar 22, 2022
@thesuperzapper thesuperzapper moved this from PR | Needs Discussion to PR | Needs Review in Issue Triage and PR Tracking Mar 23, 2022
@thesuperzapper
Copy link
Member

Closing in favor of #766

Issue Triage and PR Tracking automation moved this from PR | Needs Review to Done Aug 29, 2023
@thesuperzapper thesuperzapper removed this from the airflow-8.7.2 milestone Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants