Speed up migrations 0094 and 0108 by removing ORM imports and batching large updates#63867
Closed
ephraimbuddy wants to merge 1 commit into apache:main
…g large updates

- Replace ORM model imports (CallbackState, CallbackType, etc.) with inline constants so migration loading doesn't pull in the full Airflow runtime
- Inline Task SDK serde deserialization in 0094 to eliminate the airflow.sdk.serde import
- Use keyset pagination for batch queries in 0094 instead of re-scanning unmigrated rows
- Batch task_instance NULL backfill in 0108 (10k rows at a time) to reduce lock duration on large tables
- Consolidate per-column UPDATE statements into single per-table UPDATEs to reduce database round-trips
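To make the keyset pagination idea concrete, here is a minimal sketch. It is not the code from migration 0094: it uses the standard-library `sqlite3` module instead of SQLAlchemy so that it runs on its own, and the table `deadline_demo`, its columns, the integer primary key, and the batch size are all hypothetical. The point is only that each batch resumes from the last key seen rather than re-scanning for rows that still look unmigrated.

```python
import sqlite3

BATCH_SIZE = 3  # hypothetical; kept tiny so several batches are visible


def iter_batches(conn, batch_size=BATCH_SIZE):
    """Yield batches of (id, payload) rows, resuming after the last id seen."""
    last_id = 0  # assumes positive integer keys; any ordered key works the same way
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM deadline_demo WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Remember the highest key in this batch; the next query starts after it.
        last_id = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deadline_demo (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO deadline_demo (id, payload) VALUES (?, ?)",
    [(i, f"cb-{i}") for i in range(1, 8)],
)

batches = [[row[0] for row in batch] for batch in iter_batches(conn)]
print(batches)
```

Because the `WHERE id > ?` predicate can use the primary-key index, each query reads only the next slice of the table, regardless of how many rows were already processed.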
Contributor
Pull request overview
This PR optimizes two Alembic migrations (0094 and 0108) to reduce import overhead and improve performance on large metadata DBs by avoiding ORM/runtime imports and batching high-volume updates.
Changes:
- Refactors multiple per-column UPDATEs into consolidated per-table UPDATE statements (0108).
- Adds batched backfill for `task_instance` NULL columns to reduce long-running locks (0108).
- Removes `airflow.sdk.serde`/ORM dependencies by inlining constants + a minimal deserializer and switching to keyset pagination for batch scanning (0094).
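
As an illustration of the first change, the sketch below folds two per-column NULL backfills into one per-table UPDATE. The exact SQL used in 0108 is not shown on this page, so the `COALESCE` form, the table `example_table`, its columns, and the default values are assumptions chosen for the example; `sqlite3` is used only to keep it self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table (id INTEGER PRIMARY KEY, retries INTEGER, pool TEXT)"
)
conn.executemany(
    "INSERT INTO example_table (id, retries, pool) VALUES (?, ?, ?)",
    [(1, None, "a"), (2, 5, None), (3, None, None), (4, 2, "b")],
)

# One statement replaces two separate "UPDATE ... SET col = default WHERE col IS NULL"
# statements. COALESCE leaves non-NULL values untouched.
cursor = conn.execute(
    """
    UPDATE example_table
    SET retries = COALESCE(retries, 0),
        pool = COALESCE(pool, 'default')
    WHERE retries IS NULL OR pool IS NULL
    """
)
updated = cursor.rowcount
rows = conn.execute(
    "SELECT id, retries, pool FROM example_table ORDER BY id"
).fetchall()
print(updated, rows)
```

The table is scanned once instead of once per column, and the database receives one statement instead of several.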
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/migrations/versions/0108_3_2_0_fix_migration_file_ORM_inconsistencies.py | Consolidates raw SQL updates and introduces batched task_instance backfill to reduce lock duration/import cost. |
| airflow-core/src/airflow/migrations/versions/0094_3_2_0_replace_deadline_inline_callback_with_fkey.py | Removes SDK serde import by inlining a minimal deserializer and uses keyset pagination to avoid rescanning rows. |
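
The batched backfill described for 0108 can be sketched roughly as follows. This is an assumption-laden illustration rather than the migration's code: it uses `sqlite3` instead of SQLAlchemy, the table `task_instance_demo` and column `max_tries` are stand-ins, the batch size is 2 rather than 10k, and whether each batch is committed separately in the real migration depends on how Alembic runs it.

```python
import sqlite3

BATCH_SIZE = 2  # hypothetical; the PR description mentions 10k rows per batch


def backfill_in_batches(conn, batch_size=BATCH_SIZE):
    """Fill NULL max_tries values a few rows at a time; return the batch count."""
    batches = 0
    while True:
        batch_ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM task_instance_demo "
                "WHERE max_tries IS NULL ORDER BY id LIMIT ?",
                (batch_size,),
            )
        ]
        if not batch_ids:
            return batches
        placeholders = ", ".join("?" for _ in batch_ids)
        conn.execute(
            f"UPDATE task_instance_demo SET max_tries = 0 WHERE id IN ({placeholders})",
            batch_ids,
        )
        # Committing per batch keeps each transaction, and its locks, short.
        conn.commit()
        batches += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance_demo (id INTEGER PRIMARY KEY, max_tries INTEGER)")
conn.executemany(
    "INSERT INTO task_instance_demo (id, max_tries) VALUES (?, ?)",
    [(1, None), (2, None), (3, None), (4, 7), (5, None)],
)
conn.commit()

batch_count = backfill_in_batches(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM task_instance_demo WHERE max_tries IS NULL"
).fetchone()[0]
print(batch_count, remaining)
```

Here the loop terminates because every updated row stops matching `max_tries IS NULL`, so no separate cursor position is needed.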
Comment on lines +194 to +197

    conn.execute(
        task_instance_table.update()
        .where(task_instance_table.c.id.in_(batch_ids))
        .values(

    def _deserialize_task_sdk_value(value):
        """Deserialize a minimal subset of Task SDK serde values used in callback kwargs."""
        if value is None or isinstance(value, bool | float | int | str):
Contributor
Author
Airflow requires Python >= 3.10, and isinstance(value, bool | float | int | str) works fine from 3.10 onward.
Comment on lines +131 to +141

    if isinstance(value, int):
        return timezone(timedelta(seconds=value))

    if isinstance(value, str):
        return ZoneInfo(value)

    if isinstance(value, list) and len(value) == 3:
        data, classname, _version = value
        if classname in _SERDE_TIMEZONE_TYPES:
            return _deserialize_task_sdk_timezone(data)
Was generative AI tooling used to co-author this PR?
GPT-5.4