Helm chart: support bidirectional Airflow metadata DB reconciliation on helm upgrade (downgrade as well as upgrade)

### Description

Today the official Airflow Helm chart's `migrateDatabaseJob` only runs the forward migration, `airflow db migrate`. A `helm upgrade` that targets an **older** `airflowVersion` than the one currently running leaves the metadata DB schema ahead of the new image, and the api-server pod fails to start. The chart should reconcile the DB schema in **both** directions, upgrade *and* downgrade, based on the requested `airflowVersion`.

### Use case/motivation

We operate Airflow on Kubernetes via this chart and ship to multiple environments through CI. We need rollback to be a first-class operation:

- Deploying an older release tag (`helm upgrade` with an older `airflowVersion`) should bring the cluster — schema included — back to that release.
- Today this requires out-of-band tooling: detecting current vs target version, then `kubectl exec`ing into the still-running old api-server pod and invoking `airflow db downgrade --to-version <target> --yes` **before** helm starts rolling the new image. We've implemented this as a workflow step driven by a small bash script, but it duplicates logic for every team using the chart and only helps people deploying via GitHub Actions — not ArgoCD, not manual helm.

A chart-native solution would mean: set `airflowVersion: <older>` in values, run `helm upgrade`, the chart reconciles the schema, the new (older) pods come up.

#### Hard constraint that shapes the design

`airflow db downgrade --to-version X.Y.Z` requires the alembic revision scripts for **every revision between the current head and the target**. Those scripts ship only in images of the version that **introduced** them or later, so an older image never contains the revisions added after it. As a result:

| Direction | Image that must run the operation | Why |
|---|---|---|
| Upgrade (current < target) | **Target** image | Forward revisions ship in the target image |
| Downgrade (current > target) | **Currently running** image | Reverse-direction code for revisions to undo only exists in the current image |
| Same | none | No-op |

Today's `migrateDatabaseJob` is correct for the first row only. A pre-upgrade hook running with `airflow_image_for_migrations` (the target image) cannot perform a downgrade — the target image doesn't carry the scripts that need to be reversed.
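The table above can be read as a small decision function. The following is a minimal sketch, not chart code: `image_for_operation` is a hypothetical name, it assumes plain `X.Y.Z` version strings, and it relies on GNU `sort -V` for version ordering. An empty `current` is treated as a fresh install, which is handled like an upgrade.

```sh
# Print which image must run the schema operation: "target", "current" or "none".
# Assumptions: versions are plain X.Y.Z strings; GNU `sort -V` is available.
image_for_operation() {
  current=$1
  target=$2
  if [ -z "$current" ]; then
    echo target   # fresh install: forward migrate with the target image
    return
  fi
  if [ "$current" = "$target" ]; then
    echo none     # same version: nothing to do
    return
  fi
  # The lower of the two versions sorts first under version sort.
  lowest=$(printf '%s\n%s\n' "$current" "$target" | sort -V | head -n 1)
  if [ "$lowest" = "$current" ]; then
    echo target   # upgrade: forward revisions ship in the target image
  else
    echo current  # downgrade: only the running image has the revisions to undo
  fi
}
```

For example, `image_for_operation 3.1.0 3.0.2` falls into the downgrade row, while `image_for_operation 3.0.2 3.1.0` falls into the upgrade row.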

#### Proposed design — single reconcile job, runtime decision

Helm templates render before any cluster read, so the chart can't pick the action at template time. But it doesn't need to: **one** job decides at runtime which action is required.

Keep the existing `migrateDatabaseJob` (rendered as `<release>-run-airflow-migrations`) — same name, same value keys, same ServiceAccount. Only the hook annotations and the container's command change.

```
helm.sh/hook               helm.sh/hook-weight   job
────────────               ───────────────────   ───
pre-install,pre-upgrade    1                     <release>-run-airflow-migrations (same job as today)
```

(Was `post-install,post-upgrade`; moved to `pre-install,pre-upgrade` so that the schema is aligned before new pods roll, and so that the downgrade branch can `kubectl exec` into the still-running old pods.)

The container runs with the chart-templated **target** image. Sketch of the entrypoint (`discover_current_version` is a helper that is not shown, and the version comparison assumes GNU `sort -V`):

```sh
#!/usr/bin/env bash
set -euo pipefail

target=$AIRFLOW_TARGET_VERSION       # injected from .Values.airflowVersion
current=$(discover_current_version)  # query alembic_version table + mapping shipped in chart

# The lower of the two versions sorts first under version sort.
lowest=$(printf '%s\n%s\n' "$current" "$target" | sort -V | head -n 1)

if [ -z "$current" ]; then
  exec airflow db migrate            # fresh install
elif [ "$current" = "$target" ]; then
  exit 0                             # no-op
elif [ "$lowest" = "$current" ]; then
  exec airflow db migrate            # forward: target image has the scripts
else                                 # backward: must use old image
  old_pod=$(kubectl get pod -n "$NAMESPACE" -l component=api-server -o jsonpath='...' | head -n 1)
  exec kubectl exec -n "$NAMESPACE" "$old_pod" -c api-server -- airflow db downgrade --to-version "$target" --yes
fi
```

##### Why pre-upgrade for *both* directions works

- **Forward migrate** with the target image in pre-upgrade is what the chart already supports today via `airflow_image_for_migrations` — moving it from post-upgrade to pre-upgrade just means the schema is correct *before* the new pods start rolling instead of being raced by the `waitForMigrations` initContainer. Functionally equivalent for existing users.
- **Downgrade** in pre-upgrade is the only window that works: the **old** api-server pods are still alive and reachable via `kubectl exec`, and their image carries the alembic reverse scripts. Once `pre-upgrade` returns and helm starts applying manifests, those pods get replaced.

##### Why a single job rather than two

| Aspect | Two-job design | Single reconcile job |
|---|---|---|
| Templates rendered | 2 | 1 |
| Hook weights to reason about | -10 / 1 | 1 |
| Race between hooks | yes (downgrade must finish before migrate starts) | none |
| "Same version" code path | both jobs no-op | one early-exit |
| Cluster reads | each job re-discovers current | once |
| `waitForMigrations` race | unchanged | gone — schema is aligned before new pods roll |

##### Discovery of `current`

Preference: query `alembic_version` table + ship a small alembic-rev → Airflow-version map alongside `appVersion` bumps. Avoids needing extra RBAC for version discovery — DB credentials are already available via `standard_airflow_environment`.

Alternatives if the mapping is undesirable: read `Deployment/<release>-api-server` pod spec image (requires `deployments.get`), or `kubectl exec -- airflow version` on the running pod (uses the same `pods/exec` RBAC the downgrade itself needs).
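To make the preferred option concrete, here is a minimal sketch of the lookup half of discovery. Everything in it is an assumption for illustration: the function name `version_for_revision`, the two-column map file format, and the revision IDs in the usage example are placeholders, and the chart ships no such file today. Reading the revision itself from the `alembic_version` table is left out.

```sh
# Look up the Airflow version for an alembic revision in a map file with one
# "<revision> <airflow-version>" pair per line (hypothetical format).
# Prints the version and returns 0 on a match; prints nothing and returns 1
# when the revision is unknown.
version_for_revision() {
  revision=$1
  map_file=$2
  awk -v rev="$revision" '
    $1 == rev { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  ' "$map_file"
}
```

With a map file containing the placeholder line `bbbb2222 3.1.0`, calling `version_for_revision bbbb2222 /path/to/map` prints `3.1.0`. An unknown or empty revision returns a non-zero status, which the reconcile job could treat as "no known current version".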

##### RBAC

Extend the existing `migrateDatabaseJob` ServiceAccount (`<release>-migrate-database-job`) with a `Role` scoped to the release namespace:

- `pods, pods/exec` (verbs: `get`, `list`, `create`) — to run `airflow db downgrade` against the live api-server pod.

Forward migrate doesn't need this — it's only consumed by the downgrade branch.
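One way to confirm that the extended Role has taken effect is to ask the API server whether the ServiceAccount may exec into pods. This is a sketch only: `can_exec_into_pods` and the `KUBECTL` override are hypothetical names, and running the check requires credentials that are allowed to impersonate the ServiceAccount.

```sh
# KUBECTL can be overridden, e.g. to point at a wrapper; defaults to kubectl.
KUBECTL=${KUBECTL:-kubectl}

# Succeeds only if the given ServiceAccount may create pods/exec in the
# namespace, which is the permission the downgrade branch depends on.
can_exec_into_pods() {
  namespace=$1
  service_account=$2
  "$KUBECTL" auth can-i create pods --subresource=exec \
    --namespace "$namespace" \
    --as "system:serviceaccount:${namespace}:${service_account}" >/dev/null
}
```

For instance, `can_exec_into_pods airflow my-release-migrate-database-job || echo "missing pods/exec"` would flag a release whose Role was not rendered (namespace and release name here are placeholders).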

##### Backward compatibility

- Same job name, same value keys (`migrateDatabaseJob.resources`, `tolerations`, etc.), same ServiceAccount name. Users' values files don't need changes.
- Hook moves from `post-install,post-upgrade` to `pre-install,pre-upgrade`. Functionally equivalent for forward migrations — just removes the race with the `waitForMigrations` initContainer on the new pods.
- Upgrade and same-version paths behave the same as today for existing users, apart from the hook timing (they only ever hit the forward-migrate or no-op branch).
- Downgrade is always permitted — no opt-in flag. Today a downgrade `helm upgrade` half-succeeds and leaves the cluster broken; after this change it completes cleanly. There is no "safer" status quo to preserve.

##### Test matrix to add under `chart/tests/`

1. Fresh install (no `alembic_version` row) → forward migrate
2. Same version → early-exit no-op
3. Forward (current < target) → forward migrate with target image
4. Backward (current > target) → `kubectl exec` into discovered old pod with `airflow db downgrade --to-version <target> --yes`
5. `migrate-database-job` ServiceAccount's Role renders with `pods/exec`

End-to-end (kind via `breeze k8s tests`):

- Install 3.0.x → upgrade to 3.1.x → downgrade to 3.0.x → verify alembic head matches the 3.0.x branch tip and api-server starts.

#### Why a chart hook rather than out-of-band tooling

Because every operator (GitHub Actions, ArgoCD, Flux, manual `helm upgrade`) hits the same problem, the chart is the single right place to own the contract. The current state forces each deployer to maintain their own pre-helm script.

### Related issues

None I could find on apache/airflow that propose chart-side support for downgrade. Closest relatives:

- #55689 — *Not able to downgrade from AF3 to AF2 without FAB provider* (closed; about provider-side compatibility, not chart behaviour).
- #63532 / #63535 — performance/correctness of specific downgrade migrations (orthogonal — those are about the migrations themselves working at all; this issue is about when/how the chart runs them).

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

