fix: add SerializedDagModel.get_count() that raises on None instead of silently returning 0 by gingeekrishna · Pull Request #67572 · apache/airflow

gingeekrishna · 2026-05-26T19:27:11Z

Summary

SELECT COUNT() on an aggregate column always returns an integer — None from session.scalar() is only possible on a transient DB connectivity failure, not when the table is empty (which returns 0). Using or 0 silently swallows that failure and emits a zero count, which can trigger false on-call pages ("all DAGs disappeared!") while hiding the real problem.

Add SerializedDagModel.get_count() classmethod that raises RuntimeError when session.scalar() returns None, instead of masking it with or 0
Emit a serialized_dag.count gauge in dag_processing/manager.py's emit_metrics() using the new method, wrapped in try/except so a DB error logs a warning but never kills the parse loop

Changes

airflow-core/src/airflow/models/serialized_dag.py — add get_count() classmethod with None-guard
airflow-core/src/airflow/dag_processing/manager.py — import SerializedDagModel, add module logger, emit serialized_dag.count metric with error isolation

Test plan

tests/unit/models/test_serialized_dag.py::TestSerializedDagModel::test_get_count_returns_zero_on_empty_table
tests/unit/models/test_serialized_dag.py::TestSerializedDagModel::test_get_count_returns_correct_value
tests/unit/models/test_serialized_dag.py::TestSerializedDagModel::test_get_count_raises_on_none_result
tests/unit/dag_processing/test_manager.py::TestEmitMetrics::test_emit_metrics_emits_serialized_dag_count
tests/unit/dag_processing/test_manager.py::TestEmitMetrics::test_emit_metrics_does_not_raise_on_db_error

…f silently returning 0 COUNT queries on aggregate columns always return an integer — a None result from session.scalar() signals a transient DB connectivity failure, not an empty table. Replacing `or 0` with an explicit RuntimeError ensures the failure surfaces loudly rather than being silently emitted as a zero count that triggers false on-call pages. * Add SerializedDagModel.get_count() classmethod with None-guard * Emit serialized_dag.count metric in dag_processing/manager.py emit_metrics(), wrapped in try/except so a DB error logs a warning but never kills the parse loop * Add unit tests for get_count() and emit_metrics() serialized_dag.count path Fixes: apache#66796

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds visibility into how many serialized DAGs exist in the metadata DB by introducing a SerializedDagModel.get_count() helper and emitting a new serialized_dag.count gauge from the DAG processing manager, along with unit tests.

Changes:

Add SerializedDagModel.get_count() to retrieve the total row count from serialized_dag.
Emit a new StatsD gauge serialized_dag.count from emit_metrics(), swallowing failures.
Add unit tests for get_count() and emit_metrics() behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
airflow-core/src/airflow/models/serialized_dag.py	Adds `get_count()` classmethod with defensive `None` handling.
airflow-core/src/airflow/dag_processing/manager.py	Emits `serialized_dag.count` metric by querying the DB; logs on failure.
airflow-core/tests/unit/models/test_serialized_dag.py	Adds tests validating `get_count()` for empty/non-empty/`None` scalar results.
airflow-core/tests/unit/dag_processing/test_manager.py	Adds tests ensuring `emit_metrics()` emits the new gauge and swallows DB errors.

- Narrow except clause from Exception to (RuntimeError, SQLAlchemyError) so unexpected programming errors still propagate - Replace Unicode em dash with ASCII hyphen in RuntimeError message - Fix test_emit_metrics_emits_serialized_dag_count: mock get_count instead of relying on dag_maker rows being visible across a new session boundary - Update pytest.raises match string to reflect the corrected error message

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

- Switch get_count() from scalar()+None-check to scalar_one(): COUNT(*) always returns exactly one row, so scalar_one() is the correct API and lets SQLAlchemyError surface naturally on DB failure instead of returning None - Narrow except clause to SQLAlchemyError only (RuntimeError no longer raised) - Add performance comment on the COUNT(*) round-trip in emit_metrics() - Fix test_emit_metrics_logs_and_swallows_db_error: assert log.exception is called with the expected message, and use SQLAlchemyError as the side-effect - Rename test to reflect it now validates logging, not just absence of raise - Add session.flush() in test_get_count_returns_correct_value to make the dag_maker-written rows explicitly visible before asserting the count - Replace test_get_count_raises_on_none_result with test_get_count_propagates_db_error that mocks session.execute to raise SQLAlchemyError

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

- Rephrase COUNT(*) comment: remove the inaccurate "index scan on the PK" claim; note the query plan is DB-dependent and can be expensive, with throttling noted as a straightforward future follow-up - Use baseline+2 instead of hard-coded == 2 in test_get_count_returns_correct_value so the assertion holds even if other fixtures leave rows in the table - Replace dict-fold assertion with assert_any_call("serialized_dag.count", 3) to avoid masking duplicate emissions

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

- Fix get_count() docstring: scalar_one() is justified by "exactly-one-row semantics", not by connectivity-error behavior (both scalar() and scalar_one() raise on connectivity failures) - Narrow except clause from SQLAlchemyError to OperationalError: the intent is to swallow transient connectivity failures only; schema/programming errors from SQLAlchemy should still propagate - Update tests to use OperationalError consistently with the new except scope - Add docstring note to test_get_count_returns_zero_on_empty_table explaining it relies on the autouse setup_test_cases fixture for the empty-table state

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

…b clear in test - Change `except OperationalError` to `except SQLAlchemyError` in emit_metrics() so all DB failures (disconnects, timeouts, etc.) are swallowed, not just OperationalError - Update test_emit_metrics_logs_and_swallows_db_error to use SQLAlchemyError side-effect to match the exception actually caught by the production code - Add explicit db.clear_db_serialized_dags() in test_get_count_returns_zero_on_empty_table to make the empty-table precondition self-contained and not rely solely on autouse fixture

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

+    # COUNT(*) on the serialized_dag table adds one DB round-trip per parse loop.
+    # This can be expensive on large deployments (query plan is DB-dependent and
+    # may involve a full table scan).  The call is isolated so that a transient
+    # DB error never kills the parse loop; throttling this metric in the future
+    # is a straightforward follow-up if the round-trip becomes a bottleneck.
+    try:
+        with create_session() as session:
+            stats.gauge("serialized_dag.count", SerializedDagModel.get_count(session=session))
+    except SQLAlchemyError:
+        log.exception("Failed to emit serialized_dag.count metric")


…omment - Add stats.incr("serialized_dag.count_error") in the except block so dashboards receive an explicit failure signal instead of silently showing a stale gauge value - Tighten the inline comment to mention the error-counter rationale - Update test to assert the error counter is incremented on SQLAlchemyError

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

…ationalError orig - Wrap the COUNT(*) query in a conf.getboolean guard (scheduler.emit_serialized_dag_count_metric, default True) so large deployments can opt out of the extra DB round-trip without code changes - Patch create_session in both TestEmitMetrics tests so they are pure unit tests with no real DB session creation - Pass Exception("db failure") as the orig arg to OperationalError in test_get_count_propagates_db_error to match real-world error construction

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

…ix OperationalError params - Mock conf.getboolean explicitly to True in existing emit_metrics tests so they are deterministic regardless of the runtime Airflow config - Add test_emit_metrics_skips_serialized_dag_count_when_disabled to cover the config=False branch: asserts get_count is not called and no gauge is emitted - Pass {} (not None) as the params arg to OperationalError to match real SQLAlchemy error construction and avoid brittle stringification behaviour

Copilot AI review requested due to automatic review settings May 26, 2026 19:27

gingeekrishna requested review from XD-DENG, ashb, ephraimbuddy and jedcunningham as code owners May 26, 2026 19:27

boring-cyborg Bot added the area:DAG-processing label May 26, 2026