Skip to content

edge3: upgrade from 1.x silently leaves schema stale — BaseDBManager.upgradedb stamps alembic_version_edge3 to head without running migrations #66524

@irakl1s

Description

@irakl1s

Apache Airflow version

3.2.1

What happened

Upgrading an existing deployment from Airflow 3.1.0 + apache-airflow-providers-edge3==1.3.0 to Airflow 3.2.1 + apache-airflow-providers-edge3==3.4.0, the airflow db migrate command reports success but leaves the edge3 schema stale: the alembic_version_edge3 table is stamped to head a09c3ee8e1d3, but the columns that should have been added by migrations 0002 (edge_worker.concurrency) and 0004 (edge_job.team_name, edge_worker.team_name) are missing from the database.

Any Airflow process that subsequently queries the edge tables (e.g. the scheduler) fails with:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn)
column edge_job.team_name does not exist

airflow db migrate itself reports Database migration done! and exits 0. There is no log line, warning, or error that signals the schema is broken — operators have no way to detect this from the migrate output alone.

The smoking-gun lines in the airflow db migrate output are:

Upgrading the EdgeDBManager database
Creating EdgeDBManager tables from the ORM     ← fallback path fires
Running stamp_revision  -> a09c3ee8e1d3        ← stamps head, no migration bodies execute
EdgeDBManager tables have been created from the ORM

There is no Running upgrade 9d34dfc2de06 -> b3c4d5e6f7a8 ... line (or any subsequent edge3 chain entry) — i.e. no op.add_column ever runs.

Expected behavior

airflow db migrate should detect that the edge3 tables already exist (from the pre-EdgeDBManager era — edge3 ≤ 3.0.x had its models registered against Base.metadata and tables were created via metadata.create_all), stamp alembic_version_edge3 to the base revision 9d34dfc2de06, and then walk the Alembic chain to head, applying every op.add_column/op.alter_column along the way.

This logic is already implemented in EdgeDBManager.initdb. The bug is that this code path is unreachable on the actual upgrade scenario — see "Root cause" below.

How to reproduce

Self-contained reproducer using only public images and public PyPI packages.

docker-compose.yml:

services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      timeout: 5s
      retries: 10

  airflow-source:
    image: apache/airflow:3.1.0-python3.10
    profiles: ["source"]
    depends_on:
      postgres: { condition: service_healthy }
    environment:
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
      AIRFLOW__CORE__EXECUTOR: airflow.providers.edge3.executors.EdgeExecutor
      AIRFLOW__CORE__LOAD_EXAMPLES: "false"
      AIRFLOW__CORE__AUTH_MANAGER: airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
    entrypoint: ["bash", "-c"]
    command:
      - |
        set -e
        pip install --quiet apache-airflow-providers-edge3==1.3.0
        airflow db migrate

  airflow-target:
    image: apache/airflow:3.2.1-python3.10
    profiles: ["target"]
    depends_on:
      postgres: { condition: service_healthy }
    environment:
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
      AIRFLOW__CORE__EXECUTOR: airflow.providers.edge3.executors.EdgeExecutor
      AIRFLOW__CORE__LOAD_EXAMPLES: "false"
      AIRFLOW__CORE__AUTH_MANAGER: airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
    entrypoint: ["bash", "-c"]
    command:
      - |
        set -e
        pip install --quiet apache-airflow-providers-edge3==3.4.0
        airflow db migrate

Steps:

# 1. Start Postgres
docker compose up -d postgres

# 2. Bootstrap source state (Airflow 3.1.0 + edge3 1.3.0)
#    This creates edge_job/edge_worker/edge_logs via Base.metadata.create_all
#    and stamps alembic_version + alembic_version_fab. NO alembic_version_edge3.
docker compose --profile source run --rm airflow-source

# 3. Verify source state
docker exec -i $(docker compose ps -q postgres) psql -U airflow -d airflow -c "\dt"
# Expect: edge_job, edge_worker, edge_logs present
# Expect: alembic_version, alembic_version_fab present, NO alembic_version_edge3

# 4. Run the upgrade (Airflow 3.2.1 + edge3 3.4.0)
docker compose --profile target run --rm airflow-target

# 5. Observe broken state
docker exec -i $(docker compose ps -q postgres) psql -U airflow -d airflow <<'SQL'
SELECT version_num FROM alembic_version_edge3;
-- expected fix:    9d34dfc2de06 base, walked up to a09c3ee8e1d3 with op.add_column applied
-- actual bug:      a09c3ee8e1d3 stamped, no op.add_column ever ran

SELECT EXISTS(SELECT 1 FROM information_schema.columns WHERE table_name='edge_job'    AND column_name='team_name')   AS edge_job_team_name,
       EXISTS(SELECT 1 FROM information_schema.columns WHERE table_name='edge_worker' AND column_name='team_name')   AS edge_worker_team_name,
       EXISTS(SELECT 1 FROM information_schema.columns WHERE table_name='edge_worker' AND column_name='concurrency') AS edge_worker_concurrency;
-- actual: f | f | f

SELECT team_name FROM edge_job LIMIT 1;
-- ERROR: column "team_name" does not exist
SQL

airflow db migrate in step 4 exits 0 with Database migration done!. This is the operator-visible failure mode.

Root cause

Verified at tags apache-airflow/3.2.1 (core) and providers-edge3/3.4.0 (provider).

When airflow db migrate runs against a database where core alembic_version already exists (i.e. an upgrade, not a fresh install), the call routes through airflow.utils.db.upgradedb()_run_upgradedb(). After the core upgrade completes, _run_upgradedb calls external_db_manager.upgradedb(...) — not .initdb(...).

RunDBManager.upgradedb iterates registered managers and invokes each manager's upgradedb. EdgeDBManager does not override upgradedb, so it falls through to BaseDBManager.upgradedb:

def upgradedb(self, to_revision=None, from_revision=None, show_sql_only=False, use_migration_files=False):
    self.log.info("Upgrading the %s database", self.__class__.__name__)
    self._release_metadata_locks_if_needed()
    current_revision = self.get_current_revision()

    if not current_revision and not to_revision and not use_migration_files and not show_sql_only:
        self.create_db_from_orm()      # ← FALLBACK fires here (no alembic_version_edge3 row exists)
        return

    config = self.get_alembic_config()
    command.upgrade(config, revision=to_revision or "heads", sql=show_sql_only)

create_db_from_orm does:

def create_db_from_orm(self):
    self.metadata.create_all(engine)   # SQLAlchemy: only creates *missing* tables (no-op here)
    command.stamp(config, "head")      # ← stamps directly to ORM head, no migrations run

End state on a database that came from edge3 1.x (tables exist, no alembic_version_edge3):

  • metadata.create_all is a no-op because the tables already exist.
  • command.stamp(config, "head") writes the current ORM head (a09c3ee8e1d3 for edge3 3.4.0) into alembic_version_edge3.
  • Migrations 0002 (b3c4d5e6f7a8 — adds edge_worker.concurrency), 0003 (8c275b6fbaa8DateTimeTIMESTAMP type fixes), and 0004 (a09c3ee8e1d3 — adds team_name to both tables) never execute, even though Alembic now believes they have.

Notably, EdgeDBManager.initdb does handle this scenario correctly: if edge3 tables exist but alembic_version_edge3 doesn't, it stamps to the base revision 9d34dfc2de06 and runs an incremental upgrade. But initdb is only called on the fresh-install path (when core alembic_version is also empty), not on an upgrade. The careful detection logic is dead code on this scenario.

This bug is not specific to edge3 — it will fire for any provider DB manager whose tables predated the manager itself, on any DB-already-initialized upgrade path.

Operating System

macOS (Docker Desktop) for the reproducer.

Versions of Apache Airflow Providers

  • apache-airflow-providers-edge3==1.3.0 (source)
  • apache-airflow-providers-edge3==3.4.0 (target)

Deployment

Official Apache Airflow Helm Chart

Deployment details

Originally observed on:

  • Official Apache Airflow Helm chart 1.18 → 1.21 upgrade
  • Apache Airflow 3.1.0 → 3.2.1
  • apache-airflow-providers-edge3 1.3.0 → 3.4.0

Reproduced locally on apache/airflow:3.1.0-python3.10apache/airflow:3.2.1-python3.10 (compose stanzas above) with edge3 1.3.0 / 3.4.0 from PyPI — the bug fires without any chart involved.

Anything else

Workaround for affected users

Until an upstream fix lands, affected users can run the migrations manually by invoking EdgeDBManager.initdb() directly from any Python environment that can already reach the Airflow metadata DB (i.e. anywhere airflow db migrate would work). This bypasses the broken upgradedb routing and runs the official, correct upgrade-from-existing-tables logic — stamps to base 9d34dfc2de06, then walks the chain applying every migration body:

from airflow.utils.session import create_session
from airflow.providers.edge3.models.db import EdgeDBManager
with create_session() as session:
    EdgeDBManager(session).initdb()

Empirically verified on apache/airflow:3.2.1-python3.10 + edge3==3.4.0. Log output:

Running stamp_revision  -> 9d34dfc2de06
Upgrading the EdgeDBManager database
Running upgrade 9d34dfc2de06 -> b3c4d5e6f7a8, Add concurrency column to edge_worker table.
Running upgrade b3c4d5e6f7a8 -> 8c275b6fbaa8, Fix migration file/ORM inconsistencies.
Running upgrade 8c275b6fbaa8 -> a09c3ee8e1d3, Add team_name column to edge_job and edge_worker tables.
Migrated the EdgeDBManager database

End state: schema correct, alembic_version_edge3 = a09c3ee8e1d3, all migration bodies (including 0003's DateTime → TIMESTAMP fixes) applied.

Suggested fix paths

Any one of these would resolve the bug. Listed in order of decreasing preference:

  1. Route _run_upgradedb to the manager's initdb when its version table is absent. EdgeDBManager.initdb already contains the correct upgrade-from-existing-tables logic (verified by invoking it directly on the broken state — it stamps to base and walks the chain cleanly, see the workaround section above). The fix is essentially a routing change, no new logic required.

  2. Fix BaseDBManager.upgradedb to mirror initdb's pre-Alembic detection. Before falling into create_db_from_orm, check if any of the manager's tables already exist; if so, stamp to base and run an incremental upgrade. Fixes the bug for every provider DB manager, not just edge3.

  3. Override upgradedb on EdgeDBManager to handle the upgrade-from-existing-tables case the same way initdb does. Local fix; doesn't help other providers in the same situation.

Repro frequency

100% deterministic on the reproducer above.

Related

  • #61155 — Introduce EdgeDBManager. Author handled the upgrade-from-1.x case in initdb but initdb is unreachable on this path.
  • #62234 — Re-introducing --use-migration-files and ORM/migration consistency. Refined initdb.
  • #62308 — Auto-discover DB managers from provider.yaml. Auto-discovery works correctly; the bug is downstream of that.
  • #61646 — AIP-67 multi-team Edge Executor. Added migration 0004 whose absent execution is the operator-visible symptom.

Are you willing to submit PR?

Open to it for option (1) or (2) above with maintainer guidance on direction.

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:corearea:db-migrationsPRs with DB migrationkind:bugThis is a clearly a bugpriority:highHigh priority bug that should be patched quickly but does not require immediate new release

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions