
Conversation

@GaneshPatil7517

This change extends the scheduler pod's probe configuration to support the full Kubernetes probe specification, bringing it in line with other chart components like workers and webserver.

Changes:

  • Extended livenessProbe schema to accept httpGet, tcpSocket, grpc handlers
  • Added readinessProbe support for scheduler (disabled by default)
  • Extended startupProbe schema with same flexibility
  • Added 'enabled' flag to all probes for fine-grained control
  • Updated scheduler-deployment.yaml template to dynamically render probes
  • Maintains backward compatibility with existing configurations

Fixes #60134
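For illustration, the extended configuration might look like this in values.yaml (a sketch: the `enabled` flag and handler names come from this PR, while the timing values shown are assumptions, not the chart's actual defaults):

```yaml
scheduler:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 10
    periodSeconds: 60
    timeoutSeconds: 20
    failureThreshold: 5
    # Exactly one handler should be configured: command (exec), httpGet, tcpSocket, or grpc.
    command: ~
    httpGet: ~
    tcpSocket: ~
    grpc: ~
  readinessProbe:
    enabled: false   # added by this PR; disabled by default
  startupProbe:
    enabled: true
    periodSeconds: 10       # illustrative values only
    failureThreshold: 6
```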

Copilot AI review requested due to automatic review settings January 21, 2026 04:48
boring-cyborg bot added the area:helm-chart (Airflow Helm Chart) label Jan 21, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR extends the Airflow Helm chart's scheduler component to support the full Kubernetes probe specification, aligning it with other components like workers and webserver. Additionally, it includes a significant refactoring of the workers persistence configuration structure (not mentioned in the PR description).

Changes:

  • Extended scheduler probe configurations (liveness, readiness, startup) to support all Kubernetes probe handlers (exec, httpGet, tcpSocket, grpc) with enabled flags for fine-grained control
  • Added readinessProbe support for scheduler (disabled by default)
  • Refactored workers.persistence configuration to workers.celery.persistence with deprecation of the old structure
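For example, the persistence deprecation described in the last point migrates a configuration as follows (a sketch based on the key names mentioned in this review; `size` is shown only to illustrate that nested keys move unchanged):

```yaml
# Before: deprecated location, still honored for backward compatibility
workers:
  persistence:
    enabled: true
    size: 100Gi
---
# After: new location introduced by this PR
workers:
  celery:
    persistence:
      enabled: true
      size: 100Gi
```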

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • chart/values.yaml: Added deprecation notices for workers.persistence, introduced the workers.celery.persistence configuration, and extended scheduler probe configurations with full Kubernetes probe spec support, including comments and examples
  • chart/values.schema.json: Updated the schema to mark workers.persistence as deprecated, added the workers.celery.persistence schema, and extended the scheduler probe schemas to include all probe handler types and optional fields such as successThreshold and terminationGracePeriodSeconds
  • chart/templates/scheduler/scheduler-deployment.yaml: Implemented backward compatibility logic for the workers.persistence migration, and updated probe rendering to support multiple handler types (httpGet, tcpSocket, grpc, exec) with conditional rendering based on enabled flags
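As an illustration of the conditional probe rendering described for scheduler-deployment.yaml, the template logic might look roughly like this (a sketch under assumed value names; the PR's actual template structure and indentation may differ):

```yaml
{{- with .Values.scheduler.livenessProbe }}
{{- if .enabled }}
livenessProbe:
  {{- if .httpGet }}
  httpGet: {{- toYaml .httpGet | nindent 4 }}
  {{- else if .tcpSocket }}
  tcpSocket: {{- toYaml .tcpSocket | nindent 4 }}
  {{- else if .grpc }}
  grpc: {{- toYaml .grpc | nindent 4 }}
  {{- else }}
  exec:
    command: {{- toYaml .command | nindent 6 }}
  {{- end }}
  initialDelaySeconds: {{ .initialDelaySeconds }}
  periodSeconds: {{ .periodSeconds }}
  timeoutSeconds: {{ .timeoutSeconds }}
  failureThreshold: {{ .failureThreshold }}
{{- end }}
{{- end }}
```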
Comments suppressed due to low confidence (3)

chart/templates/scheduler/scheduler-deployment.yaml:28

  • The backward compatibility logic for workers.celery.persistence.enabled has a flaw. When a user upgrades from an old chart version with 'workers.persistence.enabled: false' in their values, but hasn't added 'workers.celery.persistence' to their configuration, Helm will merge their old value with the new chart's default value for 'workers.celery.persistence.enabled: true'. The template will then evaluate the first part of the 'or' expression as true (using the new default), ignoring the user's original 'false' setting.

The 'has' check only helps when the value is nil/unset, but with a boolean default value defined in values.yaml line 1074, the field will never be nil. Consider using a pattern that checks if the old config should take precedence, such as checking if workers.persistence.enabled is explicitly set by examining if it differs from the default, or removing the default value for workers.celery.persistence.enabled to let it be truly optional during the migration period.

{{- $persistence := or .Values.workers.celery.persistence.enabled (and (not (has .Values.workers.celery.persistence.enabled (list true false))) .Values.workers.persistence.enabled) }}
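One way to implement the suggested fix of removing the default (so workers.celery.persistence.enabled stays nil unless the user sets it) is a pattern like the following sketch, using Sprig's kindIs to detect an explicitly set boolean:

```yaml
{{- /* nil unless the user set it, assuming its default is removed from values.yaml */}}
{{- $celeryEnabled := .Values.workers.celery.persistence.enabled }}
{{- /* prefer the new key when it is an explicit bool; otherwise fall back to the deprecated key */}}
{{- $persistence := ternary $celeryEnabled .Values.workers.persistence.enabled (kindIs "bool" $celeryEnabled) }}
```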

chart/values.yaml:1099

  • The PR title and description focus exclusively on scheduler probe configuration changes, but this PR also includes significant refactoring of the workers.persistence configuration structure (deprecating workers.persistence in favor of workers.celery.persistence). These persistence-related changes should either be documented in the PR description or separated into their own PR, as they represent a distinct feature change beyond the probe configuration improvements. This affects multiple files and introduces backward compatibility considerations that deserve explicit discussion.
    livenessProbe:
      enabled: true
      initialDelaySeconds: 10
      timeoutSeconds: 20
      failureThreshold: 5
      periodSeconds: 60
      command: ~

    # List of worker sets. Each item can override values from the parent 'workers' section.
    sets: []
    # sets:
    #   - name: highcpu
    #     replicas: 2
    #     queue: "highcpu"
    #     resources:
    #       requests:
    #         memory: "2Gi"
    #         cpu: "4000m"
    #       limits:
    #         memory: "4Gi"
    #         cpu: "8000m"
    #   - name: highmem
    #     replicas: 2
    #     queue: "highmem"
    #     resources:
    #       requests:
    #         memory: "4Gi"
    #         cpu: "2000m"
    #       limits:

chart/values.schema.json:2881

  • The description states that successThreshold must be 1 for liveness probes, but the schema doesn't enforce this constraint (no enum or const limiting it to 1). While Kubernetes will validate this at deployment time, consider adding schema-level validation (e.g., "const": 1) to catch misconfigurations earlier during values validation for liveness and startup probes, while allowing flexibility for readiness probes where values > 1 are permitted.
                                "storageClassName": {
                                    "description": "If using a custom StorageClass, pass name ref to all StatefulSets here (templated).",
                                    "type": [
                                        "string",


Comment on lines 3006 to +3009
}
},
"persistence": {
"description": "Persistence configuration for Airflow Celery workers.",
Copilot AI commented Jan 21, 2026

The description states that successThreshold must be 1 for startup probes, but the schema doesn't enforce this constraint. While Kubernetes will validate this at deployment time, consider adding schema-level validation (e.g., "const": 1) to catch misconfigurations earlier during values validation.

@jscheffl (Contributor) left a comment

I understand that technically it is possible to add many more probes than the chart currently supports. Nevertheless, adding more probe options also adds maintenance complexity and confuses users.

So far I am not convinced, because for the scheduler component only certain probes make sense. In my view we should provide options that are meaningful and helpful, so that a standard setup operates in a good and stable manner by default, with some options to tweak. Delivering a swiss army knife without proposed guidance is not what I would favor.

Can you please convince me why and how a grpc check could be a better option than the existing one for the proposed PR, or how a tcpSocket check could be used for the scheduler in a way that improves things for the admin who deploys?



Development

Successfully merging this pull request may close these issues.

Airflow Helm Chart: Scheduler Should Support Full Probe Configuration (livenessProbe/readinessProbe/startupProbe)
