Skip to content

fix: migrate existing deadline rows in migration 0080 upgrade and downgrade#66016

Merged
vatsrahul1001 merged 6 commits into
mainfrom
fix/deadline-migration-0080-downgrade-not-null
May 18, 2026
Merged

fix: migrate existing deadline rows in migration 0080 upgrade and downgrade#66016
vatsrahul1001 merged 6 commits into
mainfrom
fix/deadline-migration-0080-downgrade-not-null

Conversation

@vatsrahul1001
Copy link
Copy Markdown
Contributor

@vatsrahul1001 vatsrahul1001 commented Apr 28, 2026

Fixes both the upgrade and downgrade paths of migration 0080_3_1_0_modify_deadline_callback_schema.py (808787349f22), which added NOT NULL columns to the deadline table without migrating existing row data, causing IntegrityError on non-empty databases.

Root cause

Upgrade (3bda03debd04 → 808787349f22): drops the old callback VARCHAR(500) / callback_kwargs JSON columns and then does ADD COLUMN callback JSON NOT NULL. PostgreSQL raises NotNullViolation immediately because existing rows have no value for the new column.

Downgrade (808787349f22 → 3bda03debd04): drops the new JSON callback column and then does ADD COLUMN callback VARCHAR(500) NOT NULL. Same NotNullViolation on PostgreSQL; MySQL silently writes NULL, which then crashes migration 0094's upgrade step with json.loads(None).

Fix

Both paths now follow the same safe-migration pattern already used by migration 0094:

  1. Read the existing column data before the schema change.
  2. Add the new column(s) as nullable=True so the DDL succeeds on all supported databases.
  3. Back-fill every row using a typed SQLAlchemy table() clause (dialect-aware JSON serialisation).
  4. Enforce NOT NULL only after all rows have valid values.

The upgrade serialises the old (callback VARCHAR, callback_kwargs JSON) pair into the {"__data__": {"path": ..., "kwargs": ...}, "__classname__": ..., "__version__": 0} envelope that migration 0094's upgrade expects.

The downgrade extracts path and kwargs from that same envelope before restoring the old VARCHAR column.

How to reproduce

  1. Start Airflow on any version that already ran migration 0080 (i.e. >= 3.1.0) with at least one deadline row in the database.
  2. Run airflow db downgrade past 808787349f22 (or airflow db upgrade from a pre-3.1.0 snapshot) – observe IntegrityError: NotNullViolation on PostgreSQL.
Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    Claude

@boring-cyborg boring-cyborg Bot added area:db-migrations PRs with DB migration area:deadline-alerts AIP-86 (former AIP-57) labels Apr 28, 2026
@vatsrahul1001 vatsrahul1001 force-pushed the fix/deadline-migration-0080-downgrade-not-null branch 3 times, most recently from 6bd9897 to db962e1 Compare April 29, 2026 09:34
@vatsrahul1001 vatsrahul1001 force-pushed the fix/deadline-migration-0080-downgrade-not-null branch 2 times, most recently from a3771b5 to 435af24 Compare May 4, 2026 08:41
…ngrade

Both the upgrade and downgrade paths of migration 0080
(808787349f22 - Modify deadline callback schema) added NOT NULL columns
to the deadline table without first populating them from the existing
data, causing:

* upgrade:   NotNullViolation when adding callback JSON NOT NULL to a
             non-empty table (existing rows have no value for the new
             column).
* downgrade: NotNullViolation on PostgreSQL / silent NULL on MySQL when
             adding callback VARCHAR(500) NOT NULL after dropping the
             JSON column, crashing the subsequent 0094 upgrade with
             json.loads(None).

Fix both paths with the same pattern used by migration 0094:

1. Read the existing data before any schema change.
2. Add the new column(s) as nullable so the DDL succeeds on all
   supported databases.
3. Back-fill the column using a typed SA table clause (so each dialect
   handles JSON serialisation correctly).
4. Enforce NOT NULL only after every row has a valid value.

Upgrade serialises the old (path, kwargs) pair into the
{"__data__": {"path": ..., "kwargs": ...}, "__classname__": ...,
 "__version__": 0} format expected by migration 0094.

Downgrade extracts path/kwargs back from that same JSON envelope
before restoring the VARCHAR callback column.

Made-with: Cursor
Co-authored-by: Cursor <cursoragent@cursor.com>
@vatsrahul1001 vatsrahul1001 force-pushed the fix/deadline-migration-0080-downgrade-not-null branch from 435af24 to 36eb93a Compare May 4, 2026 09:46
@vatsrahul1001 vatsrahul1001 requested a review from kaxil May 6, 2026 11:09
Copy link
Copy Markdown
Contributor

@ephraimbuddy ephraimbuddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still misses databases that already ran the old 0080. If MySQL created existing deadline.callback = NULL rows and Alembic is already stamped past 808787349f22, this edited 0080 upgrade will not run again, so 0094 can still hit json.loads(row.callback) with None.

I think we need either a repair path in 0094 or a follow-up migration that handles already-applied bad 0080 rows, plus a regression test that starts from a post-0080 schema with callback set to NULL.

Legacy MySQL deployments that ran the original (pre-#66016) 0080
migration silently wrote NULL into deadline.callback. Alembic is already
stamped past 808787349f22 on those deployments, so the fixed 0080 won't
re-run -- 0094 then crashes on json.loads(None).

Defensive handling in 0094:
- _upgrade_mysql_sqlite: detect raw_cb is None, log warning, default to
  empty envelope.
- _upgrade_postgresql: COALESCE on cb_path / cb_kwargs so NULL jsonb
  doesn't propagate into callback.data.

Regression test starts from a post-0080 schema with callback=NULL and
verifies 0094 produces an empty envelope without crashing, plus that a
mixed NULL+valid batch migrates both rows correctly.

Addresses ephraimbuddy's review comment on #66016.
- 0094 PG path: wrap kwargs COALESCE with NULLIF so JSON-literal null
  (e.g. from hand-edited DBs) is also normalized to {} -- matches the
  MySQL/SQLite defensive normalization.
- 0094 MySQL/SQLite: aggregate per-row NULL-callback WARNING into a
  single summary line after the loop to avoid log spam on deployments
  with many legacy NULL rows.
- 0080 downgrade: log a WARNING when an existing row's kwargs is not a
  dict (instead of silently resetting), mirroring the VARCHAR truncate
  warning pattern.
- 0080 tests: add test_upgrade_exact_batch_boundary covering the case
  where rows == batch_size to exercise the loop-continuation path.
@vatsrahul1001 vatsrahul1001 force-pushed the fix/deadline-migration-0080-downgrade-not-null branch from d6620f3 to 058d7e9 Compare May 18, 2026 10:20
… mismatch

`_upgrade_mysql_sqlite` declares the deadline `id` column as `sa.Uuid()`.
On SQLite the SQLAlchemy write path serializes UUID objects as a hex string
without dashes. The previous test inserted ids via `str(uuid.uuid4())`
(dashed form), so the SELECT correctly parsed them back to UUID objects
but the subsequent UPDATE's WHERE clause produced the hex form -- no rows
matched, the WHERE callback_id IS NULL filter never narrowed, and the
loop spun until the 60s execution timeout.

Insert ids via `uuid.uuid4().hex` so read and write round-trip cleanly.
@vatsrahul1001 vatsrahul1001 added this to the Airflow 3.2.2 milestone May 18, 2026
@vatsrahul1001 vatsrahul1001 added the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label May 18, 2026
Without this guard, an empty deadline table causes executemany with an
empty parameter list, which SQLAlchemy rejects with 'A value is required
for bind parameter'. Reverts the removal from e50f106.
@vatsrahul1001 vatsrahul1001 merged commit c8a6c55 into main May 18, 2026
143 checks passed
@vatsrahul1001 vatsrahul1001 deleted the fix/deadline-migration-0080-downgrade-not-null branch May 18, 2026 18:58
@github-actions
Copy link
Copy Markdown
Contributor

Backport successfully created: v3-2-test

Note: As of Merging PRs targeted for Airflow 3.X
the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In matter of doubt please ask in #release-management Slack channel.

Status Branch Result
v3-2-test PR Link

github-actions Bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request May 18, 2026
…rade and downgrade (apache#66016)

* fix: migrate existing deadline rows in migration 0080 upgrade and downgrade

Both the upgrade and downgrade paths of migration 0080
(808787349f22 - Modify deadline callback schema) added NOT NULL columns
to the deadline table without first populating them from the existing
data, causing:

* upgrade:   NotNullViolation when adding callback JSON NOT NULL to a
             non-empty table (existing rows have no value for the new
             column).
* downgrade: NotNullViolation on PostgreSQL / silent NULL on MySQL when
             adding callback VARCHAR(500) NOT NULL after dropping the
             JSON column, crashing the subsequent 0094 upgrade with
             json.loads(None).

Fix both paths with the same pattern used by migration 0094:

1. Read the existing data before any schema change.
2. Add the new column(s) as nullable so the DDL succeeds on all
   supported databases.
3. Back-fill the column using a typed SA table clause (so each dialect
   handles JSON serialisation correctly).
4. Enforce NOT NULL only after every row has a valid value.

Upgrade serialises the old (path, kwargs) pair into the
{"__data__": {"path": ..., "kwargs": ...}, "__classname__": ...,
 "__version__": 0} format expected by migration 0094.

Downgrade extracts path/kwargs back from that same JSON envelope
before restoring the VARCHAR callback column.

Made-with: Cursor
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(deadline): repair NULL callback rows in 0094 upgrade

Legacy MySQL deployments that ran the original (pre-apache#66016) 0080
migration silently wrote NULL into deadline.callback. Alembic is already
stamped past 808787349f22 on those deployments, so the fixed 0080 won't
re-run -- 0094 then crashes on json.loads(None).

Defensive handling in 0094:
- _upgrade_mysql_sqlite: detect raw_cb is None, log warning, default to
  empty envelope.
- _upgrade_postgresql: COALESCE on cb_path / cb_kwargs so NULL jsonb
  doesn't propagate into callback.data.

Regression test starts from a post-0080 schema with callback=NULL and
verifies 0094 produces an empty envelope without crashing, plus that a
mixed NULL+valid batch migrates both rows correctly.

Addresses ephraimbuddy's review comment on apache#66016.

* fix(deadline): address review nits

- 0094 PG path: wrap kwargs COALESCE with NULLIF so JSON-literal null
  (e.g. from hand-edited DBs) is also normalized to {} -- matches the
  MySQL/SQLite defensive normalization.
- 0094 MySQL/SQLite: aggregate per-row NULL-callback WARNING into a
  single summary line after the loop to avoid log spam on deployments
  with many legacy NULL rows.
- 0080 downgrade: log a WARNING when an existing row's kwargs is not a
  dict (instead of silently resetting), mirroring the VARCHAR truncate
  warning pattern.
- 0080 tests: add test_upgrade_exact_batch_boundary covering the case
  where rows == batch_size to exercise the loop-continuation path.

* test(deadline): insert deadline ids in hex form to avoid SA Uuid type mismatch

`_upgrade_mysql_sqlite` declares the deadline `id` column as `sa.Uuid()`.
On SQLite the SQLAlchemy write path serializes UUID objects as a hex string
without dashes. The previous test inserted ids via `str(uuid.uuid4())`
(dashed form), so the SELECT correctly parsed them back to UUID objects
but the subsequent UPDATE's WHERE clause produced the hex form -- no rows
matched, the WHERE callback_id IS NULL filter never narrowed, and the
loop spun until the 60s execution timeout.

Insert ids via `uuid.uuid4().hex` so read and write round-trip cleanly.

* fix(deadline): drop redundant empty-rows break in 0080 batch loops

* fix(deadline): restore empty-rows break in 0080 batch loops

Without this guard, an empty deadline table causes executemany with an
empty parameter list, which SQLAlchemy rejects with 'A value is required
for bind parameter'. Reverts the removal from e50f106.

---------
(cherry picked from commit c8a6c55)

Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
aws-airflow-bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request May 18, 2026
…rade and downgrade (apache#66016)

* fix: migrate existing deadline rows in migration 0080 upgrade and downgrade

Both the upgrade and downgrade paths of migration 0080
(808787349f22 - Modify deadline callback schema) added NOT NULL columns
to the deadline table without first populating them from the existing
data, causing:

* upgrade:   NotNullViolation when adding callback JSON NOT NULL to a
             non-empty table (existing rows have no value for the new
             column).
* downgrade: NotNullViolation on PostgreSQL / silent NULL on MySQL when
             adding callback VARCHAR(500) NOT NULL after dropping the
             JSON column, crashing the subsequent 0094 upgrade with
             json.loads(None).

Fix both paths with the same pattern used by migration 0094:

1. Read the existing data before any schema change.
2. Add the new column(s) as nullable so the DDL succeeds on all
   supported databases.
3. Back-fill the column using a typed SA table clause (so each dialect
   handles JSON serialisation correctly).
4. Enforce NOT NULL only after every row has a valid value.

Upgrade serialises the old (path, kwargs) pair into the
{"__data__": {"path": ..., "kwargs": ...}, "__classname__": ...,
 "__version__": 0} format expected by migration 0094.

Downgrade extracts path/kwargs back from that same JSON envelope
before restoring the VARCHAR callback column.

Made-with: Cursor
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(deadline): repair NULL callback rows in 0094 upgrade

Legacy MySQL deployments that ran the original (pre-apache#66016) 0080
migration silently wrote NULL into deadline.callback. Alembic is already
stamped past 808787349f22 on those deployments, so the fixed 0080 won't
re-run -- 0094 then crashes on json.loads(None).

Defensive handling in 0094:
- _upgrade_mysql_sqlite: detect raw_cb is None, log warning, default to
  empty envelope.
- _upgrade_postgresql: COALESCE on cb_path / cb_kwargs so NULL jsonb
  doesn't propagate into callback.data.

Regression test starts from a post-0080 schema with callback=NULL and
verifies 0094 produces an empty envelope without crashing, plus that a
mixed NULL+valid batch migrates both rows correctly.

Addresses ephraimbuddy's review comment on apache#66016.

* fix(deadline): address review nits

- 0094 PG path: wrap kwargs COALESCE with NULLIF so JSON-literal null
  (e.g. from hand-edited DBs) is also normalized to {} -- matches the
  MySQL/SQLite defensive normalization.
- 0094 MySQL/SQLite: aggregate per-row NULL-callback WARNING into a
  single summary line after the loop to avoid log spam on deployments
  with many legacy NULL rows.
- 0080 downgrade: log a WARNING when an existing row's kwargs is not a
  dict (instead of silently resetting), mirroring the VARCHAR truncate
  warning pattern.
- 0080 tests: add test_upgrade_exact_batch_boundary covering the case
  where rows == batch_size to exercise the loop-continuation path.

* test(deadline): insert deadline ids in hex form to avoid SA Uuid type mismatch

`_upgrade_mysql_sqlite` declares the deadline `id` column as `sa.Uuid()`.
On SQLite the SQLAlchemy write path serializes UUID objects as a hex string
without dashes. The previous test inserted ids via `str(uuid.uuid4())`
(dashed form), so the SELECT correctly parsed them back to UUID objects
but the subsequent UPDATE's WHERE clause produced the hex form -- no rows
matched, the WHERE callback_id IS NULL filter never narrowed, and the
loop spun until the 60s execution timeout.

Insert ids via `uuid.uuid4().hex` so read and write round-trip cleanly.

* fix(deadline): drop redundant empty-rows break in 0080 batch loops

* fix(deadline): restore empty-rows break in 0080 batch loops

Without this guard, an empty deadline table causes executemany with an
empty parameter list, which SQLAlchemy rejects with 'A value is required
for bind parameter'. Reverts the removal from e50f106.

---------
(cherry picked from commit c8a6c55)

Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
vatsrahul1001 added a commit that referenced this pull request May 19, 2026
…rade and downgrade (#66016) (#67129)

[v3-2-test] fix: migrate existing deadline rows in migration 0080 upgrade and downgrade (#66016) (#67129)
---------
(cherry picked from commit c8a6c55)

Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
vatsrahul1001 added a commit that referenced this pull request May 20, 2026
…rade and downgrade (#66016) (#67129)

[v3-2-test] fix: migrate existing deadline rows in migration 0080 upgrade and downgrade (#66016) (#67129)
---------
(cherry picked from commit c8a6c55)

Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
vatsrahul1001 added a commit that referenced this pull request May 20, 2026
…rade and downgrade (#66016) (#67129)

[v3-2-test] fix: migrate existing deadline rows in migration 0080 upgrade and downgrade (#66016) (#67129)
---------
(cherry picked from commit c8a6c55)

Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
vatsrahul1001 added a commit that referenced this pull request May 21, 2026
…rade and downgrade (#66016) (#67129)

[v3-2-test] fix: migrate existing deadline rows in migration 0080 upgrade and downgrade (#66016) (#67129)
---------
(cherry picked from commit c8a6c55)

Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:db-migrations PRs with DB migration area:deadline-alerts AIP-86 (former AIP-57) backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants