Skip to content

Add static analysis tests for DB migration anti-patterns#64972

Draft
Dev-iL wants to merge 1 commit intoapache:mainfrom
Dev-iL:2604/MIG_rules
Draft

Add static analysis tests for DB migration anti-patterns#64972
Dev-iL wants to merge 1 commit intoapache:mainfrom
Dev-iL:2604/MIG_rules

Conversation

@Dev-iL
Copy link
Copy Markdown
Collaborator

@Dev-iL Dev-iL commented Apr 9, 2026

Summary

  • Add AST-based static analysis tests that detect 3 migration anti-patterns (MIG001/MIG002/MIG003) across all migration files
  • Add unit tests for disable_sqlite_fkeys, mysql_drop_foreignkey_if_exists, and ignore_sqlite_value_error with real database backends
  • Fix confirmed SQLite PRAGMA bugs in migrations 0100 and 0106
  • Add # noqa: MIG0XX suppression comments to 20 old migrations with acknowledged violations

related: #64876

Motivation

PR #64876 fixed a bug in migration 0097 where UPDATE statements before the disable_sqlite_fkeys context manager triggered SQLAlchemy autobegin, making PRAGMA foreign_keys=off a no-op on SQLite. The same pattern existed in migrations 0100 and 0106 but had no automated detection.

This PR adds two layers of protection:

  1. Static analysis tests (test_migration_patterns.py) — AST-based parametrized tests that scan every migration file for known anti-patterns:

    • MIG001: DML (UPDATE/INSERT/DELETE) via op.execute() before disable_sqlite_fkeys — triggers autobegin, making PRAGMA a no-op
    • MIG002: DDL (op.add_column, op.batch_alter_table, etc.) before disable_sqlite_fkeys — same autobegin issue
    • MIG003: DML via op.execute() without context.is_offline_mode() guard — produces data-dependent SQL in offline mode
  2. Utils unit tests (test_migration_utils.py) — tests for the 3 most-used migration utility functions using real database connections via pytest.mark.db_test.

Changes

New files

  • airflow-core/tests/unit/migrations/test_migration_patterns.py — 336 parametrized tests (3 rules x 112 migration files), with # noqa: MIG0XX suppression support and ruff-style rule documentation in the module docstring
  • airflow-core/tests/unit/migrations/test_migration_utils.py — 8 tests for disable_sqlite_fkeys (PRAGMA on/off on SQLite, no-op on others), mysql_drop_foreignkey_if_exists (MySQL-only, skipped elsewhere), and ignore_sqlite_value_error (ValueError suppression on SQLite)

Bug fixes

Suppressions

  • Added # noqa: MIG003 -- <reason> to 62 DML lines across 22 migration files where the DML is intentional and the migration is designed to run online only
  • Added external = ["MIG"] to [tool.ruff.lint] in pyproject.toml to prevent RUF100 from stripping the custom noqa comments

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Opus 4.6 following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg bot added area:db-migrations PRs with DB migration area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels Apr 9, 2026
@Dev-iL Dev-iL changed the title Add static analysis tests for migration anti-patterns Add static analysis tests for DB migration anti-patterns Apr 9, 2026
Copy link
Copy Markdown
Member

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do another round. but these are the nits found

Prevent migration bugs like apache#64876 (DML before disable_sqlite_fkeys
silently breaking SQLite PRAGMA) from reaching production by adding:

1. AST-based static analysis tests (test_migration_patterns.py) that
   detect three anti-patterns across all migration files:
   - MIG001: DML before disable_sqlite_fkeys (triggers autobegin)
   - MIG002: DDL before disable_sqlite_fkeys (triggers autobegin)
   - MIG003: DML without context.is_offline_mode() guard

2. Unit tests for migration utility functions (test_migration_utils.py)
   covering disable_sqlite_fkeys, mysql_drop_foreignkey_if_exists, and
   ignore_sqlite_value_error with real database backends.

3. Fixes confirmed SQLite PRAGMA bugs in migrations 0100 and 0106 by
   moving all operations inside the disable_sqlite_fkeys block.

4. Adds # noqa: MIG0XX suppression comments (with reasons) to 20 old
   migrations with acknowledged violations, and configures ruff to
   preserve them via external = ["MIG"].
@kaxil kaxil requested a review from Copilot April 10, 2026 19:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds automated enforcement against common Alembic migration anti-patterns (SQLite FK PRAGMA/autobegin hazards and offline-mode DML) via a new prek hook + pytest harness, plus targeted regression fixes in a few migrations.

Changes:

  • Introduce check_migration_patterns.py (MIG001/MIG002/MIG003) and wire it into pre-commit/prek with accompanying unit tests.
  • Add DB-backed unit tests for key migration utility helpers (disable_sqlite_fkeys, mysql_drop_foreignkey_if_exists, ignore_sqlite_value_error).
  • Fix SQLite FK-guard ordering in migrations 0100/0106 and add # noqa: MIG003 suppressions to acknowledged online-only DML across older migrations.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
scripts/tests/ci/prek/test_check_migration_patterns.py Adds unit tests for the prek migration-pattern checker script.
scripts/ci/prek/check_new_airflow_exception_usage.py Adjusts Rich console configuration used by an existing prek check.
scripts/ci/prek/check_migration_patterns.py New AST-based checker implementing MIG001/MIG002/MIG003 with CLI.
pyproject.toml Configures Ruff to preserve custom # noqa: MIG... suppressions.
airflow-core/tests/unit/migrations/test_migration_utils.py Adds DB-backed tests for commonly used migration utilities.
airflow-core/tests/unit/migrations/test_migration_patterns.py Adds pytest harness that runs MIG001/2/3 across all migration files.
airflow-core/src/airflow/migrations/versions/0108_3_2_0_fix_migration_file_ORM_inconsistencies.py Adds # noqa: MIG003 suppressions for intentional DML backfills.
airflow-core/src/airflow/migrations/versions/0106_3_2_0_add_partition_key_to_backfill_dag_run.py Moves op.add_column under disable_sqlite_fkeys; adds MIG003 suppression in downgrade DML.
airflow-core/src/airflow/migrations/versions/0100_3_2_0_add_timetable_type_to_dag_table_for_.py Moves DDL/DML inside disable_sqlite_fkeys; adds MIG003 suppression for backfill.
airflow-core/src/airflow/migrations/versions/0097_3_2_0_enforce_log_event_and_dag_is_stale_not_null.py Adds MIG003 suppressions for intentional backfill DML.
airflow-core/src/airflow/migrations/versions/0093_3_2_0_restructure_callback_table.py Adds MIG003 suppressions for truncation DML.
airflow-core/src/airflow/migrations/versions/0082_3_1_0_make_bundle_name_not_nullable.py Adds MIG003 suppressions for seed/backfill DML.
airflow-core/src/airflow/migrations/versions/0067_3_0_0_rename_is_active_to_is_stale_column_in_.py Adds MIG003 suppressions for value inversion DML.
airflow-core/src/airflow/migrations/versions/0066_3_0_0_rename_dataset_triggered_to_asset_triggered.py Adds MIG003 suppressions for rename-related DML.
airflow-core/src/airflow/migrations/versions/0064_3_0_0_support_bundles_in_.py Adds MIG003 suppressions for truncation DML.
airflow-core/src/airflow/migrations/versions/0063_3_0_0_use_ti_id_as_fk_to_taskreschedule.py Adds MIG003 suppressions and rewraps multi-line SQL into op.execute(...).
airflow-core/src/airflow/migrations/versions/0061_3_0_0_use_ti_id_as_primary_key_to_TINote.py Adds MIG003 suppressions and rewraps multi-line SQL into op.execute(...).
airflow-core/src/airflow/migrations/versions/0060_3_0_0_add_try_id_to_ti_and_tih.py Adds MIG003 suppressions and rewraps multi-line SQL into op.execute(...).
airflow-core/src/airflow/migrations/versions/0058_3_0_0_add_dagrun_run_after.py Adds MIG003 suppression for backfill DML.
airflow-core/src/airflow/migrations/versions/0049_3_0_0_remove_pickled_data_from_xcom_table.py Adds MIG003 suppression for downgrade conversion DML.
airflow-core/src/airflow/migrations/versions/0047_3_0_0_add_dag_versioning.py Adds MIG003 suppressions for truncation DML.
airflow-core/src/airflow/migrations/versions/0042_3_0_0_add_uuid_primary_key_to_task_instance_.py Adds MIG003 suppression and rewraps multi-line SQL into op.execute(...).
airflow-core/src/airflow/migrations/versions/0038_3_0_0_add_asset_active.py Adds MIG003 suppressions for data migration DML.
airflow-core/src/airflow/migrations/versions/0036_3_0_0_add_name_field_to_dataset_model.py Adds MIG003 suppressions for cleanup DML.
airflow-core/src/airflow/migrations/versions/0035_3_0_0_update_user_id_type.py Adds MIG003 suppressions for downgrade cleanup DML.
airflow-core/src/airflow/migrations/versions/0030_3_0_0_rename_schedule_interval_to_timetable_.py Adds MIG003 suppression for downgrade cleanup DML.
airflow-core/src/airflow/migrations/versions/0027_2_10_3_fix_dag_schedule_dataset_alias_reference_naming.py Adds MIG003 suppressions for data-copy DML.
airflow-core/src/airflow/migrations/versions/0003_2_7_0_add_include_deferred_column_to_pool.py Adds MIG003 suppression for backfill DML.
.pre-commit-config.yaml Adds new prek hook to run the migration-pattern checker on migrations.

Comment on lines +220 to +226
def _is_op_call(node: ast.Call) -> bool:
"""Check if a Call node is ``op.<something>(...)``."""
return (
isinstance(node.func, ast.Attribute)
and isinstance(node.func.value, ast.Name)
and node.func.value.id == "op"
)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIG002 currently treats any op.<anything>(...) as DDL (via _is_op_call), which will also match non-DDL helpers like op.get_bind(), op.get_context(), etc. This will produce false positives and force noisy suppressions/refactors. Consider restricting MIG002 to a concrete allowlist of DDL-ish Alembic operations (e.g. add_column, drop_column, create_table, drop_table, create_index, drop_index, batch_alter_table, rename_table, alter_column, create_foreign_key, ...), and explicitly excluding known non-DDL methods such as get_bind.

Copilot uses AI. Check for mistakes.
Comment on lines +229 to +245
def _find_disable_sqlite_fkeys_line(func_node: ast.FunctionDef) -> int | None:
"""Find the line number of the first ``with disable_sqlite_fkeys(op):`` in a function."""
# Collect all matches then take min() — ast.walk BFS order does not guarantee
# line-number order when guards appear at different nesting depths.
found: list[int] = []
for node in ast.walk(func_node):
if not isinstance(node, ast.With):
continue
for item in node.items:
ctx = item.context_expr
if isinstance(ctx, ast.Call):
func = ctx.func
if (isinstance(func, ast.Name) and func.id == "disable_sqlite_fkeys") or (
isinstance(func, ast.Attribute) and func.attr == "disable_sqlite_fkeys"
):
found.append(node.lineno)
return min(found) if found else None
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIG001/MIG002 rely on comparing node.lineno with guard_line, which misses a key SQLite hazard: with op.batch_alter_table(...), disable_sqlite_fkeys(op): (i.e., disable_sqlite_fkeys not first in the same with statement). In that case, the first context expression is evaluated before the PRAGMA is applied, but both calls share the same lineno, so the current >= guard_line logic won’t flag it. Suggestion: track the guard location more precisely (e.g., detect ast.With nodes and ensure disable_sqlite_fkeys(...) is the first item), and report MIG002 (and/or a dedicated rule) when it is not.

Copilot uses AI. Check for mistakes.
Comment on lines +248 to +255
def _has_offline_mode_check(func_node: ast.FunctionDef) -> bool:
"""Check if function contains ``context.is_offline_mode()`` call."""
for node in ast.walk(func_node):
if isinstance(node, ast.Call):
func = node.func
if isinstance(func, ast.Attribute) and func.attr == "is_offline_mode":
return True
return False
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_has_offline_mode_check currently returns true for any call to *.is_offline_mode(), not specifically context.is_offline_mode(). This can cause false negatives for MIG003 if some other object in the migration happens to have an is_offline_mode() method. Tighten the check to require func.value is ast.Name with id == 'context' (and optionally also support the known Alembic patterns used in this repo, if any).

Copilot uses AI. Check for mistakes.
Comment on lines +172 to +187
DML_KEYWORDS = ("UPDATE ", "INSERT ", "DELETE ")


def _get_noqa_codes(line: str) -> set[str]:
"""Extract noqa codes from a source line."""
match = re.search(r"#\s*noqa:\s*([\w,\s]+)", line)
if match:
return {code.strip() for code in match.group(1).split(",")}
return set()


def _is_dml_string(node: ast.expr) -> bool:
"""Check if an AST expression is a string literal starting with a DML keyword."""
if isinstance(node, ast.Constant) and isinstance(node.value, str):
stripped = node.value.strip().upper()
return any(stripped.startswith(kw) for kw in DML_KEYWORDS)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DML detection uses startswith with keywords that include a trailing space (e.g. \"DELETE \"). This will miss common multi-line SQL styles such as \"DELETE\\nFROM ...\" / \"INSERT\\nINTO ...\", leading to false negatives for MIG001/MIG003. Consider switching to a regex like r'^(UPDATE|INSERT|DELETE)\\b' against the stripped/uppercased prefix so any whitespace formatting is handled.

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +23
# /// script
# requires-python = ">=3.10,<3.11"
# dependencies = [
# "rich>=13.6.0",
# ]
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PEP 723 requires-python = \">=3.10,<3.11\" upper-bound is unusually restrictive for a repo that typically runs tooling on newer Python versions too, and it can prevent running this script via PEP-723-capable runners (e.g. uv run) on Python 3.11+. If there isn’t a hard incompatibility, widen/remove the <3.11 cap (e.g. >=3.10) or align it with the project’s supported tooling Python versions.

Copilot uses AI. Check for mistakes.
Comment on lines +1037 to +1044
- id: check-migration-patterns
name: Check migration files for anti-patterns (MIG001/MIG002/MIG003)
entry: ./scripts/ci/prek/check_migration_patterns.py
language: python
pass_filenames: true
files: ^airflow-core/src/airflow/migrations/versions/.*\.py$
additional_dependencies: [rich>=13.6.0]
require_serial: true
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If check_migration_patterns.py truly requires a specific Python minor version (per its PEP 723 header), this prek hook should pin language_version accordingly; otherwise developers/CI may execute it under a different interpreter than intended. Either add an explicit language_version here to match the requirement, or (preferably) relax the script’s requires-python so the hook can run under the repo’s standard tooling Python.

Copilot uses AI. Check for mistakes.
from rich.panel import Panel

console = Console()
console = Console(color_system="standard", width=200)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This console configuration change is unrelated to migration-pattern static analysis (MIG001/2/3) and isn’t mentioned in the PR description. Consider either updating the PR description to include the motivation for this change, or moving it into a separate PR to keep scope focused.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:db-migrations PRs with DB migration area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants