Description
Today the official Airflow Helm chart's migrateDatabaseJob only runs forward airflow db migrate. Doing a helm upgrade that targets an older airflowVersion than the one currently running leaves the metadata DB schema ahead of the running image, and the api-server pod fails to start. The chart should reconcile the DB schema in both directions — upgrade and downgrade — based on the dispatched airflowVersion.
Use case/motivation
We operate Airflow on Kubernetes via this chart and ship to multiple environments through CI. We need rollback to be a first-class operation:
- Deploying an older release tag (helm upgrade with an older airflowVersion) should bring the cluster, schema included, back to that release.
- Today this requires out-of-band tooling: detecting current vs target version, then kubectl execing into the still-running old api-server pod and invoking airflow db downgrade --to-version <target> --yes before helm starts rolling the new image. We've implemented this as a workflow step driven by a small bash script, but it duplicates logic for every team using the chart and only helps people deploying via GitHub Actions, not ArgoCD or manual helm.
A chart-native solution would mean: set airflowVersion: <older> in values, run helm upgrade, the chart reconciles the schema, the new (older) pods come up.
Hard constraint that shapes the design
airflow db downgrade --to-version X.Y.Z requires the alembic revision scripts for every revision between the current head and the target. A revision script ships only in images from the version that introduced it onward, so an older target image lacks the scripts for any newer revisions. So:
| Direction | Image that must run the operation | Why |
| --- | --- | --- |
| Upgrade (current < target) | Target image | Forward revisions ship in the target image |
| Downgrade (current > target) | Currently running image | Reverse-direction code for revisions to undo only exists in the current image |
| Same | none | No-op |
Today's migrateDatabaseJob is correct for the first row only. A pre-upgrade hook running with airflow_image_for_migrations (the target image) cannot perform a downgrade — the target image doesn't carry the scripts that need to be reversed.
Proposed design — single reconcile job, runtime decision
Helm templates render before any cluster read, so the chart can't pick the action at template time. But it doesn't need to: one job decides at runtime which action is required.
Keep the existing migrateDatabaseJob (rendered as <release>-run-airflow-migrations) — same name, same value keys, same ServiceAccount. Only the hook annotations and the container's command change.
helm.sh/hook             helm.sh/hook-weight  Job
───────────────────────  ───────────────────  ────────────────────────────────────────────────────
pre-install,pre-upgrade  1                    <release>-run-airflow-migrations (same job as today)
(Was post-install,post-upgrade — moved to pre-upgrade so the schema is aligned before new pods roll, and so the downgrade branch can kubectl exec into still-running old pods.)
Container runs with the chart-templated target image. Pseudocode:
target="$AIRFLOW_TARGET_VERSION"        # injected from .Values.airflowVersion
current="$(discover_current_version)"   # query alembic_version table + mapping shipped in chart

if [ -z "$current" ]; then
  exec airflow db migrate               # fresh install
elif [ "$current" = "$target" ]; then
  exit 0                                # no-op
elif [ "$(printf '%s\n' "$current" "$target" | sort -V | head -n 1)" = "$current" ]; then
  exec airflow db migrate               # forward (current < target): target image has the scripts
else
  # backward (current > target): must run inside the old image
  old_pod=$(kubectl get pod -n "$NAMESPACE" -l component=api-server -o jsonpath='...' | head -1)
  exec kubectl exec -n "$NAMESPACE" "$old_pod" -c api-server -- airflow db downgrade --to-version "$target" --yes
fi
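The current-versus-target comparison is the part of the logic above that is easiest to get wrong, because plain string comparison orders 3.0.10 before 3.0.9. A minimal sketch of a pure decision helper follows. The function name decide_action is hypothetical, and the sketch assumes a sort that supports -V (GNU coreutils) is available in the image.

```shell
# Hypothetical helper: prints the action the reconcile job should take.
# Usage: decide_action "<current-version-or-empty>" "<target-version>"
decide_action() {
  local current="$1" target="$2" lowest
  if [ -z "$current" ]; then
    echo "migrate"      # fresh install: no schema version discovered
    return 0
  fi
  if [ "$current" = "$target" ]; then
    echo "noop"         # schema already matches the target
    return 0
  fi
  # sort -V orders version strings numerically per component;
  # the first line of its output is the lower of the two versions.
  lowest="$(printf '%s\n%s\n' "$current" "$target" | sort -V | head -n 1)"
  if [ "$lowest" = "$current" ]; then
    echo "migrate"      # current < target: forward migration
  else
    echo "downgrade"    # current > target: must run in the old image
  fi
}
```

Keeping the decision separate from the side effects (airflow db migrate, kubectl exec) would let the chart tests cover all four branches without a cluster.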
Why pre-upgrade for both directions works
- Forward migrate with the target image in pre-upgrade is what the chart already supports today via airflow_image_for_migrations. Moving it from post-upgrade to pre-upgrade just means the schema is correct before the new pods start rolling instead of being raced by the waitForMigrations initContainer. Functionally equivalent for existing users.
- Downgrade in pre-upgrade is the only window that works: the old api-server pods are still alive and reachable via kubectl exec, and their image carries the alembic reverse scripts. Once pre-upgrade returns and helm starts applying manifests, those pods get replaced.
Why a single job rather than two
| Aspect | Two-job design | Single reconcile job |
| --- | --- | --- |
| Templates rendered | 2 | 1 |
| Hook weights to reason about | -10 / 1 | 1 |
| Race between hooks | yes (downgrade must finish before migrate starts) | none |
| "Same version" code path | both jobs no-op | one early-exit |
| Cluster reads | each job re-discovers current | once |
| waitForMigrations race | unchanged | gone: schema is aligned before new pods roll |
Discovery of current
Preference: query alembic_version table + ship a small alembic-rev → Airflow-version map alongside appVersion bumps. Avoids needing extra RBAC for version discovery — DB credentials are already available via standard_airflow_environment.
Alternatives if the mapping is undesirable: read Deployment/<release>-api-server pod spec image (requires deployments.get), or kubectl exec -- airflow version on the running pod (uses the same pods/exec RBAC the downgrade itself needs).
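As a rough illustration of the preferred option, the lookup could be a plain two-column file shipped with the chart. Everything here is an assumption made for the sketch: the function name revision_to_version, the file format, and the revision ids (placeholders, not real Airflow revisions). The revision itself would come from the version_num column of the alembic_version table.

```shell
# Hypothetical mapping format: one "<alembic-revision> <airflow-version>" pair per line.
# Usage: revision_to_version "<revision>" "<path-to-map-file>"
# Prints the matching Airflow version; exits non-zero if the revision is unknown.
revision_to_version() {
  local revision="$1" map_file="$2"
  # Print column 2 of the first line whose column 1 equals the revision.
  awk -v rev="$revision" '
    $1 == rev { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  ' "$map_file"
}
```

An unknown revision returning a non-zero status matters: it lets the job fail loudly rather than guess a direction when the database is ahead of anything the chart knows about.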
RBAC
Extend the existing migrateDatabaseJob ServiceAccount (<release>-migrate-database-job) with a Role scoped to the release namespace:
pods, pods/exec (verbs: get, list, create) — to run airflow db downgrade against the live api-server pod.
Forward migrate doesn't need this — it's only consumed by the downgrade branch.
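For concreteness, here is a sketch of what such a Role might contain, printed from a shell helper so it can be inspected or piped to kubectl. In the chart this would be a Helm template rather than a heredoc. The helper name render_downgrade_role is hypothetical, and splitting the verbs per resource (get and list on pods, create on pods/exec) is an assumption that is narrower than the combined list above. A matching RoleBinding to the ServiceAccount would also be needed.

```shell
# Hypothetical helper: prints a Role manifest for the downgrade branch.
# Usage: render_downgrade_role "<release>" "<namespace>"
render_downgrade_role() {
  local release="$1" namespace="$2"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${release}-migrate-database-job
  namespace: ${namespace}
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]    # locate the running api-server pod
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]         # kubectl exec creates a pods/exec subresource
EOF
}
```

For example, render_downgrade_role my-release airflow prints the manifest to stdout for review.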
Backward compatibility
- Same job name, same value keys (migrateDatabaseJob.resources, tolerations, etc.), same ServiceAccount name. Users' values files don't need changes.
- Hook moves from post-install,post-upgrade to pre-install,pre-upgrade. Functionally equivalent for forward migrations: it just removes the race with the waitForMigrations initContainer on the new pods.
- Upgrade and same-version paths are byte-identical for existing users (they only ever hit the forward-migrate branch).
- Downgrade is always permitted, with no opt-in flag. Today a downgrade helm upgrade half-succeeds and leaves the cluster broken; after this change it completes cleanly. There is no "safer" status quo to preserve.
Test matrix to add under chart/tests/
- Fresh install (no alembic_version row) → forward migrate
- Same version → early-exit no-op
- Forward (current < target) → forward migrate with target image
- Backward (current > target) → kubectl exec into discovered old pod with airflow db downgrade --to-version <target> --yes
- migrate-database-job ServiceAccount's Role renders with pods/exec
End-to-end (kind via breeze k8s tests):
- Install 3.0.x → upgrade to 3.1.x → downgrade to 3.0.x → verify alembic head matches the 3.0.x branch tip and api-server starts.
Why a chart hook rather than out-of-band tooling
Because every operator (GitHub Actions, ArgoCD, Flux, manual helm upgrade) hits the same problem, the chart is the single right place to own the contract. The current state forces each deployer to maintain their own pre-helm script.
Related issues
None I could find on apache/airflow that propose chart-side support for downgrade. Closest relatives: