Skip to content

fix: add SerializedDagModel.get_count() that raises on None instead of silently returning 0#67572

Open
gingeekrishna wants to merge 13 commits into
apache:mainfrom
gingeekrishna:fix-66796-serialized-dag-model-get-count
Open

fix: add SerializedDagModel.get_count() that raises on None instead of silently returning 0#67572
gingeekrishna wants to merge 13 commits into
apache:mainfrom
gingeekrishna:fix-66796-serialized-dag-model-get-count

Conversation

@gingeekrishna
Copy link
Copy Markdown

@gingeekrishna gingeekrishna commented May 26, 2026

Summary

Fixes #66796

SELECT COUNT() on an aggregate column always returns an integer — None from session.scalar() is only possible on a transient DB connectivity failure, not when the table is empty (which returns 0). Using or 0 silently swallows that failure and emits a zero count, which can trigger false on-call pages ("all DAGs disappeared!") while hiding the real problem.

  • Add SerializedDagModel.get_count() classmethod that raises RuntimeError when session.scalar() returns None, instead of masking it with or 0
  • Emit a serialized_dag.count gauge in dag_processing/manager.py's emit_metrics() using the new method, wrapped in try/except so a DB error logs a warning but never kills the parse loop

Changes

  • airflow-core/src/airflow/models/serialized_dag.py — add get_count() classmethod with None-guard
  • airflow-core/src/airflow/dag_processing/manager.py — import SerializedDagModel, add module logger, emit serialized_dag.count metric with error isolation

Test plan

  • tests/unit/models/test_serialized_dag.py::TestSerializedDagModel::test_get_count_returns_zero_on_empty_table
  • tests/unit/models/test_serialized_dag.py::TestSerializedDagModel::test_get_count_returns_correct_value
  • tests/unit/models/test_serialized_dag.py::TestSerializedDagModel::test_get_count_raises_on_none_result
  • tests/unit/dag_processing/test_manager.py::TestEmitMetrics::test_emit_metrics_emits_serialized_dag_count
  • tests/unit/dag_processing/test_manager.py::TestEmitMetrics::test_emit_metrics_does_not_raise_on_db_error

…f silently returning 0

COUNT queries on aggregate columns always return an integer — a None result from
session.scalar() signals a transient DB connectivity failure, not an empty table.
Replacing `or 0` with an explicit RuntimeError ensures the failure surfaces loudly
rather than being silently emitted as a zero count that triggers false on-call pages.

* Add SerializedDagModel.get_count() classmethod with None-guard
* Emit serialized_dag.count metric in dag_processing/manager.py emit_metrics(),
  wrapped in try/except so a DB error logs a warning but never kills the parse loop
* Add unit tests for get_count() and emit_metrics() serialized_dag.count path

Fixes: apache#66796
Copilot AI review requested due to automatic review settings May 26, 2026 19:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds visibility into how many serialized DAGs exist in the metadata DB by introducing a SerializedDagModel.get_count() helper and emitting a new serialized_dag.count gauge from the DAG processing manager, along with unit tests.

Changes:

  • Add SerializedDagModel.get_count() to retrieve the total row count from serialized_dag.
  • Emit a new StatsD gauge serialized_dag.count from emit_metrics(), swallowing failures.
  • Add unit tests for get_count() and emit_metrics() behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
airflow-core/src/airflow/models/serialized_dag.py Adds get_count() classmethod with defensive None handling.
airflow-core/src/airflow/dag_processing/manager.py Emits serialized_dag.count metric by querying the DB; logs on failure.
airflow-core/tests/unit/models/test_serialized_dag.py Adds tests validating get_count() for empty/non-empty/None scalar results.
airflow-core/tests/unit/dag_processing/test_manager.py Adds tests ensuring emit_metrics() emits the new gauge and swallows DB errors.

Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
Comment thread airflow-core/tests/unit/models/test_serialized_dag.py
Comment thread airflow-core/tests/unit/dag_processing/test_manager.py Outdated
Comment thread airflow-core/src/airflow/models/serialized_dag.py Outdated
- Narrow except clause from Exception to (RuntimeError, SQLAlchemyError)
  so unexpected programming errors still propagate
- Replace Unicode em dash with ASCII hyphen in RuntimeError message
- Fix test_emit_metrics_emits_serialized_dag_count: mock get_count instead
  of relying on dag_maker rows being visible across a new session boundary
- Update pytest.raises match string to reflect the corrected error message
@gingeekrishna gingeekrishna requested a review from Copilot May 26, 2026 19:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread airflow-core/src/airflow/models/serialized_dag.py Outdated
Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
Comment thread airflow-core/tests/unit/dag_processing/test_manager.py Outdated
- Switch get_count() from scalar()+None-check to scalar_one(): COUNT(*) always
  returns exactly one row, so scalar_one() is the correct API and lets
  SQLAlchemyError surface naturally on DB failure instead of returning None
- Narrow except clause to SQLAlchemyError only (RuntimeError no longer raised)
- Add performance comment on the COUNT(*) round-trip in emit_metrics()
- Fix test_emit_metrics_logs_and_swallows_db_error: assert log.exception is
  called with the expected message, and use SQLAlchemyError as the side-effect
- Rename test to reflect it now validates logging, not just absence of raise
- Add session.flush() in test_get_count_returns_correct_value to make the
  dag_maker-written rows explicitly visible before asserting the count
- Replace test_get_count_raises_on_none_result with test_get_count_propagates_db_error
  that mocks session.execute to raise SQLAlchemyError
@gingeekrishna gingeekrishna requested a review from Copilot May 26, 2026 19:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
Comment thread airflow-core/tests/unit/models/test_serialized_dag.py Outdated
Comment thread airflow-core/tests/unit/dag_processing/test_manager.py Outdated
- Rephrase COUNT(*) comment: remove the inaccurate "index scan on the PK"
  claim; note the query plan is DB-dependent and can be expensive, with
  throttling noted as a straightforward future follow-up
- Use baseline+2 instead of hard-coded == 2 in test_get_count_returns_correct_value
  so the assertion holds even if other fixtures leave rows in the table
- Replace dict-fold assertion with assert_any_call("serialized_dag.count", 3)
  to avoid masking duplicate emissions
@gingeekrishna gingeekrishna requested a review from Copilot May 26, 2026 20:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread airflow-core/src/airflow/models/serialized_dag.py Outdated
Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
Comment thread airflow-core/tests/unit/models/test_serialized_dag.py
gingeekrishna and others added 2 commits May 27, 2026 01:37
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Fix get_count() docstring: scalar_one() is justified by "exactly-one-row
  semantics", not by connectivity-error behavior (both scalar() and
  scalar_one() raise on connectivity failures)
- Narrow except clause from SQLAlchemyError to OperationalError: the intent
  is to swallow transient connectivity failures only; schema/programming
  errors from SQLAlchemy should still propagate
- Update tests to use OperationalError consistently with the new except scope
- Add docstring note to test_get_count_returns_zero_on_empty_table explaining
  it relies on the autouse setup_test_cases fixture for the empty-table state
@gingeekrishna gingeekrishna requested a review from Copilot May 26, 2026 20:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
…b clear in test

- Change `except OperationalError` to `except SQLAlchemyError` in emit_metrics()
  so all DB failures (disconnects, timeouts, etc.) are swallowed, not just OperationalError
- Update test_emit_metrics_logs_and_swallows_db_error to use SQLAlchemyError side-effect
  to match the exception actually caught by the production code
- Add explicit db.clear_db_serialized_dags() in test_get_count_returns_zero_on_empty_table
  to make the empty-table precondition self-contained and not rely solely on autouse fixture
@gingeekrishna gingeekrishna requested a review from Copilot May 26, 2026 20:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +1574 to +1583
# COUNT(*) on the serialized_dag table adds one DB round-trip per parse loop.
# This can be expensive on large deployments (query plan is DB-dependent and
# may involve a full table scan). The call is isolated so that a transient
# DB error never kills the parse loop; throttling this metric in the future
# is a straightforward follow-up if the round-trip becomes a bottleneck.
try:
with create_session() as session:
stats.gauge("serialized_dag.count", SerializedDagModel.get_count(session=session))
except SQLAlchemyError:
log.exception("Failed to emit serialized_dag.count metric")
Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
…omment

- Add stats.incr("serialized_dag.count_error") in the except block so dashboards
  receive an explicit failure signal instead of silently showing a stale gauge value
- Tighten the inline comment to mention the error-counter rationale
- Update test to assert the error counter is incremented on SQLAlchemyError
@gingeekrishna gingeekrishna requested a review from Copilot May 26, 2026 20:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread airflow-core/src/airflow/dag_processing/manager.py Outdated
Comment thread airflow-core/tests/unit/dag_processing/test_manager.py Outdated
Comment thread airflow-core/tests/unit/dag_processing/test_manager.py Outdated
Comment thread airflow-core/tests/unit/models/test_serialized_dag.py Outdated
…ationalError orig

- Wrap the COUNT(*) query in a conf.getboolean guard
  (scheduler.emit_serialized_dag_count_metric, default True) so large deployments
  can opt out of the extra DB round-trip without code changes
- Patch create_session in both TestEmitMetrics tests so they are pure unit tests
  with no real DB session creation
- Pass Exception("db failure") as the orig arg to OperationalError in
  test_get_count_propagates_db_error to match real-world error construction
@gingeekrishna gingeekrishna requested a review from Copilot May 26, 2026 20:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread airflow-core/tests/unit/dag_processing/test_manager.py Outdated
Comment thread airflow-core/src/airflow/dag_processing/manager.py
Comment thread airflow-core/tests/unit/models/test_serialized_dag.py Outdated
…ix OperationalError params

- Mock conf.getboolean explicitly to True in existing emit_metrics tests so they
  are deterministic regardless of the runtime Airflow config
- Add test_emit_metrics_skips_serialized_dag_count_when_disabled to cover the
  config=False branch: asserts get_count is not called and no gauge is emitted
- Pass {} (not None) as the params arg to OperationalError to match real SQLAlchemy
  error construction and avoid brittle stringification behaviour
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SerializedDagModel.get_count() should raise on None scalar result instead of silently returning 0

2 participants