feat(helm): Add full probe configuration support for Scheduler #60854
Conversation
This change extends the scheduler pod's probe configuration to support the full Kubernetes probe specification, bringing it in line with other chart components like workers and webserver.

Changes:
- Extended livenessProbe schema to accept httpGet, tcpSocket, grpc handlers
- Added readinessProbe support for scheduler (disabled by default)
- Extended startupProbe schema with the same flexibility
- Added an 'enabled' flag to all probes for fine-grained control
- Updated scheduler-deployment.yaml template to dynamically render probes
- Maintains backward compatibility with existing configurations

Fixes apache#60134
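For illustration, the extended schema would allow values like the following under `scheduler` (a sketch based on the description above, not the literal diff; the port shown is Airflow's default scheduler health-check port, 8974, and exact field names may differ in the final schema):

```yaml
scheduler:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 10
    periodSeconds: 60
    timeoutSeconds: 20
    failureThreshold: 5
    # Exactly one handler is used; the existing exec/command remains the default.
    httpGet:
      path: /health
      port: 8974
  readinessProbe:
    # Newly supported for the scheduler; disabled by default.
    enabled: false
    tcpSocket:
      port: 8974
  startupProbe:
    enabled: true
    periodSeconds: 10
    failureThreshold: 6
```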
Pull request overview
This PR extends the Airflow Helm chart's scheduler component to support the full Kubernetes probe specification, aligning it with other components like workers and webserver. Additionally, it includes a significant refactoring of the workers persistence configuration structure (not mentioned in the PR description).
Changes:
- Extended scheduler probe configurations (liveness, readiness, startup) to support all Kubernetes probe handlers (exec, httpGet, tcpSocket, grpc) with enabled flags for fine-grained control
- Added readinessProbe support for scheduler (disabled by default)
- Refactored workers.persistence configuration to workers.celery.persistence with deprecation of the old structure
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| chart/values.yaml | Added deprecation notices for workers.persistence, introduced workers.celery.persistence configuration, and extended scheduler probe configurations with full Kubernetes probe spec support including comments and examples |
| chart/values.schema.json | Updated schema to mark workers.persistence as deprecated, added workers.celery.persistence schema, and extended scheduler probe schemas to include all probe handler types and optional fields like successThreshold and terminationGracePeriodSeconds |
| chart/templates/scheduler/scheduler-deployment.yaml | Implemented backward compatibility logic for workers.persistence migration, updated probe rendering to support multiple handler types (httpGet, tcpSocket, grpc, exec) with conditional rendering based on enabled flags |
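The probe rendering described in the last row presumably follows the usual Helm pattern of selecting exactly one handler per probe. A minimal sketch of that pattern, assuming the value names from the PR description (not the literal template code from the diff):

```yaml
{{- with .Values.scheduler.livenessProbe }}
{{- if .enabled }}
livenessProbe:
  {{- if .httpGet }}
  httpGet: {{- toYaml .httpGet | nindent 4 }}
  {{- else if .tcpSocket }}
  tcpSocket: {{- toYaml .tcpSocket | nindent 4 }}
  {{- else if .grpc }}
  grpc: {{- toYaml .grpc | nindent 4 }}
  {{- else }}
  exec:
    command: {{- toYaml .command | nindent 6 }}
  {{- end }}
  initialDelaySeconds: {{ .initialDelaySeconds }}
  timeoutSeconds: {{ .timeoutSeconds }}
  failureThreshold: {{ .failureThreshold }}
  periodSeconds: {{ .periodSeconds }}
{{- end }}
{{- end }}
```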
Comments suppressed due to low confidence (3)
chart/templates/scheduler/scheduler-deployment.yaml:28
- The backward compatibility logic for workers.celery.persistence.enabled has a flaw. When a user upgrades from an old chart version with 'workers.persistence.enabled: false' in their values, but hasn't added 'workers.celery.persistence' to their configuration, Helm will merge their old value with the new chart's default value for 'workers.celery.persistence.enabled: true'. The template will then evaluate the first part of the 'or' expression as true (using the new default), ignoring the user's original 'false' setting.
The 'has' check only helps when the value is nil/unset, but with a boolean default value defined in values.yaml line 1074, the field will never be nil. Consider using a pattern that checks if the old config should take precedence, such as checking if workers.persistence.enabled is explicitly set by examining if it differs from the default, or removing the default value for workers.celery.persistence.enabled to let it be truly optional during the migration period.
```yaml
{{- $persistence := or .Values.workers.celery.persistence.enabled (and (not (has .Values.workers.celery.persistence.enabled (list true false))) .Values.workers.persistence.enabled) }}
```
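One way to implement the suggested fix: ship no default for workers.celery.persistence.enabled in values.yaml so that nil means "not configured", and fall back to the deprecated flag only in that case. A sketch, assuming Helm 3's variable reassignment and the standard Sprig nil test `kindIs "invalid"`:

```yaml
{{- /* Fall back to the deprecated workers.persistence.enabled only when the
       new flag is truly unset; this requires values.yaml to leave
       workers.celery.persistence.enabled undefined during the migration. */}}
{{- $persistence := .Values.workers.persistence.enabled }}
{{- if not (kindIs "invalid" .Values.workers.celery.persistence.enabled) }}
{{- $persistence = .Values.workers.celery.persistence.enabled }}
{{- end }}
```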
chart/values.yaml:1099
- The PR title and description focus exclusively on scheduler probe configuration changes, but this PR also includes significant refactoring of the workers.persistence configuration structure (deprecating workers.persistence in favor of workers.celery.persistence). These persistence-related changes should either be documented in the PR description or separated into their own PR, as they represent a distinct feature change beyond the probe configuration improvements. This affects multiple files and introduces backward compatibility considerations that deserve explicit discussion.
```yaml
livenessProbe:
  enabled: true
  initialDelaySeconds: 10
  timeoutSeconds: 20
  failureThreshold: 5
  periodSeconds: 60
  command: ~
# List of worker sets. Each item can override values from the parent 'workers' section.
sets: []
# sets:
#   - name: highcpu
#     replicas: 2
#     queue: "highcpu"
#     resources:
#       requests:
#         memory: "2Gi"
#         cpu: "4000m"
#       limits:
#         memory: "4Gi"
#         cpu: "8000m"
#   - name: highmem
#     replicas: 2
#     queue: "highmem"
#     resources:
#       requests:
#         memory: "4Gi"
#         cpu: "2000m"
#       limits:
```
chart/values.schema.json:2881
- The description states that successThreshold must be 1 for liveness probes, but the schema doesn't enforce this constraint (no enum or const limiting it to 1). While Kubernetes will validate this at deployment time, consider adding schema-level validation (e.g., "const": 1) to catch misconfigurations earlier during values validation for liveness and startup probes, while allowing flexibility for readiness probes where values > 1 are permitted.
"storageClassName": {
"description": "If using a custom StorageClass, pass name ref to all StatefulSets here (templated).",
"type": [
"string",
```json
  }
},
"persistence": {
  "description": "Persistence configuration for Airflow Celery workers.",
```
Copilot AI · Jan 21, 2026
The description states that successThreshold must be 1 for startup probes, but the schema doesn't enforce this constraint. While Kubernetes will validate this at deployment time, consider adding schema-level validation (e.g., "const": 1) to catch misconfigurations earlier during values validation.
jscheffl left a comment
I understand that technically it is possible to add many more probe types than the chart currently supports. Nevertheless, adding more probe options also adds maintenance complexity and confuses users.
So far I am not convinced, because for the scheduler component only certain probes make sense. In my view we should provide options that are meaningful and helpful, so that a standard setup runs in a good and stable manner by default, with some options to tweak. Delivering a Swiss Army knife without accompanying guidance is not what I would favor.
Can you please convince me, for the proposed PR, why and how a grpc check could be a better option than the existing one, or how a tcpSocket check could be used for the scheduler in a way that improves things for the admin who deploys?