
fix: optimize migration 0094 upgrade to use SQL instead of Python deserialization #63628

Open: YoannAbriel wants to merge 1 commit into apache:main from YoannAbriel:fix/issue-63532

@YoannAbriel (Contributor)

Migration 0094 (replace_deadline_inline_callback_with_fkey) processes every deadline row through Python's serde.deserialize() during upgrade, instantiating Python objects for each row. At scale (10M rows), this takes ~33 minutes.

The callback JSON format prior to 3.2.0 is predictable (always AsyncCallback with a fixed serde wrapper structure), so the transformation can be done entirely in SQL.

PostgreSQL: a single writable CTE using gen_random_uuid() and jsonb_build_object() performs the INSERT into callback and the UPDATE of deadline in one statement, with no Python loop and no batching.
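For readers unfamiliar with the pattern, here is a minimal sketch of what such a writable CTE could look like. It is not the statement from this PR: the column names (`callback`, `callback_id`, `data`), the `__data__` wrapper key, and the `run_pg_upgrade` helper are all assumptions made for illustration. `AS MATERIALIZED` (PostgreSQL 12+) is used so that each generated UUID is computed once and shared by the INSERT and the UPDATE.

```python
# Hypothetical sketch only: table and column names are assumed, not taken from the PR.
PG_UPGRADE_SQL = """
WITH mapped AS MATERIALIZED (
    -- Pair every legacy deadline row with a freshly generated callback id.
    SELECT d.id AS deadline_id,
           gen_random_uuid() AS callback_id,
           d.callback::jsonb AS cb
    FROM deadline AS d
    WHERE d.callback IS NOT NULL
),
inserted AS (
    -- Build the new callback payload directly from the legacy JSON wrapper.
    INSERT INTO callback (id, data)
    SELECT m.callback_id,
           jsonb_build_object(
               'path', m.cb -> '__data__' -> 'path',
               'kwargs', m.cb -> '__data__' -> 'kwargs'
           )
    FROM mapped AS m
)
-- Point each deadline row at the callback row created for it.
UPDATE deadline AS d
SET callback_id = m.callback_id
FROM mapped AS m
WHERE d.id = m.deadline_id;
"""


def run_pg_upgrade(execute):
    """Run the single-statement upgrade through any callable that executes SQL."""
    execute(PG_UPGRADE_SQL)
```

In an Alembic migration the callable would typically be `op.execute`; passing it in keeps the sketch importable without Alembic installed.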

MySQL/SQLite: a batched approach that still generates UUIDs in Python but reads the callback JSON as a plain dict instead of round-tripping through serde.deserialize().
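The batched path can be sketched against an in-memory SQLite database as follows. This is an illustration under stated assumptions, not the migration code itself: the simplified two-table schema, the `__classname__`/`__data__` wrapper keys, the class path in the demo payload, and the helper names `make_demo_db` and `migrate_batched` are all hypothetical.

```python
import json
import sqlite3
import uuid


def make_demo_db(n_rows):
    """Create an in-memory database with n_rows legacy deadline rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE callback (id TEXT PRIMARY KEY, data TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE deadline ("
        "id INTEGER PRIMARY KEY, callback TEXT, "
        "callback_id TEXT REFERENCES callback(id))"
    )
    # Illustrative legacy payload; the class path and keys are assumptions.
    legacy = json.dumps(
        {
            "__classname__": "hypothetical.module.AsyncCallback",
            "__version__": 0,
            "__data__": {"path": "pkg.notify", "kwargs": {"channel": "ops"}},
        }
    )
    conn.executemany("INSERT INTO deadline (callback) VALUES (?)", [(legacy,)] * n_rows)
    conn.commit()
    return conn


def migrate_batched(conn, batch_size=1000):
    """Move inline callbacks into the callback table, one batch at a time."""
    migrated = 0
    while True:
        # Rows already linked to a callback drop out of this query, so the loop ends.
        rows = conn.execute(
            "SELECT id, callback FROM deadline "
            "WHERE callback_id IS NULL AND callback IS NOT NULL "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return migrated
        callbacks, links = [], []
        for deadline_id, raw in rows:
            # Plain dict access: no serde import, no Python object instantiation.
            payload = json.loads(raw)["__data__"]
            callback_id = str(uuid.uuid4())
            data = json.dumps(
                {"path": payload["path"], "kwargs": payload.get("kwargs", {})}
            )
            callbacks.append((callback_id, data))
            links.append((callback_id, deadline_id))
        conn.executemany("INSERT INTO callback (id, data) VALUES (?, ?)", callbacks)
        conn.executemany("UPDATE deadline SET callback_id = ? WHERE id = ?", links)
        conn.commit()
        migrated += len(rows)
```

Calling `migrate_batched(make_demo_db(5), batch_size=2)` processes the five demo rows in three batches and returns the number of rows migrated.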

Also removes runtime module imports (airflow.serialization.serde, airflow.models.callback, airflow.models.deadline) from the upgrade path, hardcoding constant values instead.

Closes: #63532


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4, claude-opus-4-6)

Generated-by: Claude Code (Opus 4, claude-opus-4-6) following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.

…erialization

Replace row-by-row Python serde deserialization with pure SQL JSON
manipulation for PostgreSQL (writable CTE with gen_random_uuid()
and jsonb operations). For MySQL/SQLite, use batched approach with
Python UUID generation but direct JSON dict access instead of
importing and invoking serde.deserialize().

This eliminates the Python object instantiation bottleneck that
caused ~33 minute migration times for 10M deadline rows.

Also removes runtime module imports (airflow.serialization.serde,
airflow.models.callback, airflow.models.deadline) from the upgrade
path, hardcoding the constant values instead. This follows the
migration best practice of avoiding ORM/runtime imports.

Closes: apache#63532
boring-cyborg bot added the labels area:db-migrations (PRs with DB migration) and area:deadline-alerts (AIP-86, former AIP-57) on Mar 15, 2026
eladkal requested a review from vatsrahul1001 on March 15, 2026 12:13
eladkal added this to the Airflow 3.2.0 milestone on Mar 15, 2026

Development

Successfully merging this pull request may close these issues.

Migration 0094 upgrade is slow at scale due to row-by-row Python deserialization of deadline callbacks
