Add static analysis tests for DB migration anti-patterns#64972
Add static analysis tests for DB migration anti-patterns#64972Dev-iL wants to merge 1 commit intoapache:mainfrom
Conversation
Lee-W
left a comment
There was a problem hiding this comment.
will do another round. but these are the nits found
Prevent migration bugs like apache#64876 (DML before disable_sqlite_fkeys silently breaking SQLite PRAGMA) from reaching production by adding: 1. AST-based static analysis tests (test_migration_patterns.py) that detect three anti-patterns across all migration files: - MIG001: DML before disable_sqlite_fkeys (triggers autobegin) - MIG002: DDL before disable_sqlite_fkeys (triggers autobegin) - MIG003: DML without context.is_offline_mode() guard 2. Unit tests for migration utility functions (test_migration_utils.py) covering disable_sqlite_fkeys, mysql_drop_foreignkey_if_exists, and ignore_sqlite_value_error with real database backends. 3. Fixes confirmed SQLite PRAGMA bugs in migrations 0100 and 0106 by moving all operations inside the disable_sqlite_fkeys block. 4. Adds # noqa: MIG0XX suppression comments (with reasons) to 20 old migrations with acknowledged violations, and configures ruff to preserve them via external = ["MIG"].
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds automated enforcement against common Alembic migration anti-patterns (SQLite FK PRAGMA/autobegin hazards and offline-mode DML) via a new prek hook + pytest harness, plus targeted regression fixes in a few migrations.
Changes:
- Introduce
check_migration_patterns.py(MIG001/MIG002/MIG003) and wire it into pre-commit/prek with accompanying unit tests. - Add DB-backed unit tests for key migration utility helpers (
disable_sqlite_fkeys,mysql_drop_foreignkey_if_exists,ignore_sqlite_value_error). - Fix SQLite FK-guard ordering in migrations 0100/0106 and add
# noqa: MIG003suppressions to acknowledged online-only DML across older migrations.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/tests/ci/prek/test_check_migration_patterns.py | Adds unit tests for the prek migration-pattern checker script. |
| scripts/ci/prek/check_new_airflow_exception_usage.py | Adjusts Rich console configuration used by an existing prek check. |
| scripts/ci/prek/check_migration_patterns.py | New AST-based checker implementing MIG001/MIG002/MIG003 with CLI. |
| pyproject.toml | Configures Ruff to preserve custom # noqa: MIG... suppressions. |
| airflow-core/tests/unit/migrations/test_migration_utils.py | Adds DB-backed tests for commonly used migration utilities. |
| airflow-core/tests/unit/migrations/test_migration_patterns.py | Adds pytest harness that runs MIG001/2/3 across all migration files. |
| airflow-core/src/airflow/migrations/versions/0108_3_2_0_fix_migration_file_ORM_inconsistencies.py | Adds # noqa: MIG003 suppressions for intentional DML backfills. |
| airflow-core/src/airflow/migrations/versions/0106_3_2_0_add_partition_key_to_backfill_dag_run.py | Moves op.add_column under disable_sqlite_fkeys; adds MIG003 suppression in downgrade DML. |
| airflow-core/src/airflow/migrations/versions/0100_3_2_0_add_timetable_type_to_dag_table_for_.py | Moves DDL/DML inside disable_sqlite_fkeys; adds MIG003 suppression for backfill. |
| airflow-core/src/airflow/migrations/versions/0097_3_2_0_enforce_log_event_and_dag_is_stale_not_null.py | Adds MIG003 suppressions for intentional backfill DML. |
| airflow-core/src/airflow/migrations/versions/0093_3_2_0_restructure_callback_table.py | Adds MIG003 suppressions for truncation DML. |
| airflow-core/src/airflow/migrations/versions/0082_3_1_0_make_bundle_name_not_nullable.py | Adds MIG003 suppressions for seed/backfill DML. |
| airflow-core/src/airflow/migrations/versions/0067_3_0_0_rename_is_active_to_is_stale_column_in_.py | Adds MIG003 suppressions for value inversion DML. |
| airflow-core/src/airflow/migrations/versions/0066_3_0_0_rename_dataset_triggered_to_asset_triggered.py | Adds MIG003 suppressions for rename-related DML. |
| airflow-core/src/airflow/migrations/versions/0064_3_0_0_support_bundles_in_.py | Adds MIG003 suppressions for truncation DML. |
| airflow-core/src/airflow/migrations/versions/0063_3_0_0_use_ti_id_as_fk_to_taskreschedule.py | Adds MIG003 suppressions and rewraps multi-line SQL into op.execute(...). |
| airflow-core/src/airflow/migrations/versions/0061_3_0_0_use_ti_id_as_primary_key_to_TINote.py | Adds MIG003 suppressions and rewraps multi-line SQL into op.execute(...). |
| airflow-core/src/airflow/migrations/versions/0060_3_0_0_add_try_id_to_ti_and_tih.py | Adds MIG003 suppressions and rewraps multi-line SQL into op.execute(...). |
| airflow-core/src/airflow/migrations/versions/0058_3_0_0_add_dagrun_run_after.py | Adds MIG003 suppression for backfill DML. |
| airflow-core/src/airflow/migrations/versions/0049_3_0_0_remove_pickled_data_from_xcom_table.py | Adds MIG003 suppression for downgrade conversion DML. |
| airflow-core/src/airflow/migrations/versions/0047_3_0_0_add_dag_versioning.py | Adds MIG003 suppressions for truncation DML. |
| airflow-core/src/airflow/migrations/versions/0042_3_0_0_add_uuid_primary_key_to_task_instance_.py | Adds MIG003 suppression and rewraps multi-line SQL into op.execute(...). |
| airflow-core/src/airflow/migrations/versions/0038_3_0_0_add_asset_active.py | Adds MIG003 suppressions for data migration DML. |
| airflow-core/src/airflow/migrations/versions/0036_3_0_0_add_name_field_to_dataset_model.py | Adds MIG003 suppressions for cleanup DML. |
| airflow-core/src/airflow/migrations/versions/0035_3_0_0_update_user_id_type.py | Adds MIG003 suppressions for downgrade cleanup DML. |
| airflow-core/src/airflow/migrations/versions/0030_3_0_0_rename_schedule_interval_to_timetable_.py | Adds MIG003 suppression for downgrade cleanup DML. |
| airflow-core/src/airflow/migrations/versions/0027_2_10_3_fix_dag_schedule_dataset_alias_reference_naming.py | Adds MIG003 suppressions for data-copy DML. |
| airflow-core/src/airflow/migrations/versions/0003_2_7_0_add_include_deferred_column_to_pool.py | Adds MIG003 suppression for backfill DML. |
| .pre-commit-config.yaml | Adds new prek hook to run the migration-pattern checker on migrations. |
| def _is_op_call(node: ast.Call) -> bool: | ||
| """Check if a Call node is ``op.<something>(...)``.""" | ||
| return ( | ||
| isinstance(node.func, ast.Attribute) | ||
| and isinstance(node.func.value, ast.Name) | ||
| and node.func.value.id == "op" | ||
| ) |
There was a problem hiding this comment.
MIG002 currently treats any op.<anything>(...) as DDL (via _is_op_call), which will also match non-DDL helpers like op.get_bind(), op.get_context(), etc. This will produce false positives and force noisy suppressions/refactors. Consider restricting MIG002 to a concrete allowlist of DDL-ish Alembic operations (e.g. add_column, drop_column, create_table, drop_table, create_index, drop_index, batch_alter_table, rename_table, alter_column, create_foreign_key, ...), and explicitly excluding known non-DDL methods such as get_bind.
| def _find_disable_sqlite_fkeys_line(func_node: ast.FunctionDef) -> int | None: | ||
| """Find the line number of the first ``with disable_sqlite_fkeys(op):`` in a function.""" | ||
| # Collect all matches then take min() — ast.walk BFS order does not guarantee | ||
| # line-number order when guards appear at different nesting depths. | ||
| found: list[int] = [] | ||
| for node in ast.walk(func_node): | ||
| if not isinstance(node, ast.With): | ||
| continue | ||
| for item in node.items: | ||
| ctx = item.context_expr | ||
| if isinstance(ctx, ast.Call): | ||
| func = ctx.func | ||
| if (isinstance(func, ast.Name) and func.id == "disable_sqlite_fkeys") or ( | ||
| isinstance(func, ast.Attribute) and func.attr == "disable_sqlite_fkeys" | ||
| ): | ||
| found.append(node.lineno) | ||
| return min(found) if found else None |
There was a problem hiding this comment.
MIG001/MIG002 rely on comparing node.lineno with guard_line, which misses a key SQLite hazard: with op.batch_alter_table(...), disable_sqlite_fkeys(op): (i.e., disable_sqlite_fkeys not first in the same with statement). In that case, the first context expression is evaluated before the PRAGMA is applied, but both calls share the same lineno, so the current >= guard_line logic won’t flag it. Suggestion: track the guard location more precisely (e.g., detect ast.With nodes and ensure disable_sqlite_fkeys(...) is the first item), and report MIG002 (and/or a dedicated rule) when it is not.
| def _has_offline_mode_check(func_node: ast.FunctionDef) -> bool: | ||
| """Check if function contains ``context.is_offline_mode()`` call.""" | ||
| for node in ast.walk(func_node): | ||
| if isinstance(node, ast.Call): | ||
| func = node.func | ||
| if isinstance(func, ast.Attribute) and func.attr == "is_offline_mode": | ||
| return True | ||
| return False |
There was a problem hiding this comment.
_has_offline_mode_check currently returns true for any call to *.is_offline_mode(), not specifically context.is_offline_mode(). This can cause false negatives for MIG003 if some other object in the migration happens to have an is_offline_mode() method. Tighten the check to require func.value is ast.Name with id == 'context' (and optionally also support the known Alembic patterns used in this repo, if any).
| DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ") | ||
|
|
||
|
|
||
| def _get_noqa_codes(line: str) -> set[str]: | ||
| """Extract noqa codes from a source line.""" | ||
| match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line) | ||
| if match: | ||
| return {code.strip() for code in match.group(1).split(",")} | ||
| return set() | ||
|
|
||
|
|
||
| def _is_dml_string(node: ast.expr) -> bool: | ||
| """Check if an AST expression is a string literal starting with a DML keyword.""" | ||
| if isinstance(node, ast.Constant) and isinstance(node.value, str): | ||
| stripped = node.value.strip().upper() | ||
| return any(stripped.startswith(kw) for kw in DML_KEYWORDS) |
There was a problem hiding this comment.
DML detection uses startswith with keywords that include a trailing space (e.g. \"DELETE \"). This will miss common multi-line SQL styles such as \"DELETE\\nFROM ...\" / \"INSERT\\nINTO ...\", leading to false negatives for MIG001/MIG003. Consider switching to a regex like r'^(UPDATE|INSERT|DELETE)\\b' against the stripped/uppercased prefix so any whitespace formatting is handled.
| # /// script | ||
| # requires-python = ">=3.10,<3.11" | ||
| # dependencies = [ | ||
| # "rich>=13.6.0", | ||
| # ] |
There was a problem hiding this comment.
The PEP 723 requires-python = \">=3.10,<3.11\" upper-bound is unusually restrictive for a repo that typically runs tooling on newer Python versions too, and it can prevent running this script via PEP-723-capable runners (e.g. uv run) on Python 3.11+. If there isn’t a hard incompatibility, widen/remove the <3.11 cap (e.g. >=3.10) or align it with the project’s supported tooling Python versions.
| - id: check-migration-patterns | ||
| name: Check migration files for anti-patterns (MIG001/MIG002/MIG003) | ||
| entry: ./scripts/ci/prek/check_migration_patterns.py | ||
| language: python | ||
| pass_filenames: true | ||
| files: ^airflow-core/src/airflow/migrations/versions/.*\.py$ | ||
| additional_dependencies: [rich>=13.6.0] | ||
| require_serial: true |
There was a problem hiding this comment.
If check_migration_patterns.py truly requires a specific Python minor version (per its PEP 723 header), this prek hook should pin language_version accordingly; otherwise developers/CI may execute it under a different interpreter than intended. Either add an explicit language_version here to match the requirement, or (preferably) relax the script’s requires-python so the hook can run under the repo’s standard tooling Python.
| from rich.panel import Panel | ||
|
|
||
| console = Console() | ||
| console = Console(color_system="standard", width=200) |
There was a problem hiding this comment.
This console configuration change is unrelated to migration-pattern static analysis (MIG001/2/3) and isn’t mentioned in the PR description. Consider either updating the PR description to include the motivation for this change, or moving it into a separate PR to keep scope focused.
Summary
disable_sqlite_fkeys,mysql_drop_foreignkey_if_exists, andignore_sqlite_value_errorwith real database backends# noqa: MIG0XXsuppression comments to 20 old migrations with acknowledged violationsrelated: #64876
Motivation
PR #64876 fixed a bug in migration 0097 where
UPDATEstatements before thedisable_sqlite_fkeyscontext manager triggered SQLAlchemy autobegin, makingPRAGMA foreign_keys=offa no-op on SQLite. The same pattern existed in migrations 0100 and 0106 but had no automated detection.This PR adds two layers of protection:
Static analysis tests (
test_migration_patterns.py) — AST-based parametrized tests that scan every migration file for known anti-patterns:UPDATE/INSERT/DELETE) viaop.execute()beforedisable_sqlite_fkeys— triggers autobegin, making PRAGMA a no-opop.add_column,op.batch_alter_table, etc.) beforedisable_sqlite_fkeys— same autobegin issueop.execute()withoutcontext.is_offline_mode()guard — produces data-dependent SQL in offline modeUtils unit tests (
test_migration_utils.py) — tests for the 3 most-used migration utility functions using real database connections viapytest.mark.db_test.Changes
New files
airflow-core/tests/unit/migrations/test_migration_patterns.py— 336 parametrized tests (3 rules x 112 migration files), with# noqa: MIG0XXsuppression support and ruff-style rule documentation in the module docstringairflow-core/tests/unit/migrations/test_migration_utils.py— 8 tests fordisable_sqlite_fkeys(PRAGMA on/off on SQLite, no-op on others),mysql_drop_foreignkey_if_exists(MySQL-only, skipped elsewhere), andignore_sqlite_value_error(ValueError suppression on SQLite)Bug fixes
add_timetable_type): movedbatch_alter_table(add_column) andop.execute(UPDATE)inside thedisable_sqlite_fkeysblock — same pattern as the fix in fix(migrations): move UPDATEs inside disable_sqlite_fkeys in migration 0097 #64876add_partition_key): movedop.add_column("dag_run", ...)inside thedisable_sqlite_fkeysblockSuppressions
# noqa: MIG003 -- <reason>to 62 DML lines across 22 migration files where the DML is intentional and the migration is designed to run online onlyexternal = ["MIG"]to[tool.ruff.lint]inpyproject.tomlto preventRUF100from stripping the custom noqa commentsWas generative AI tooling used to co-author this PR?
Generated-by: Claude Opus 4.6 following the guidelines
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.