@dstandish
Contributor

In SchedulerJobRunner._create_dag_runs, we do except Exception: ... continue, but this does not work when the exception is the result of a DB error; in that case you need to roll the transaction back before the session can be used again.

To fix this we need either smaller transactions (e.g. one per dag run create attempt) or savepoints. Either way, it's a bit of a refactor.
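To make the "smaller transactions" option concrete, here is a minimal sketch that gives each create attempt its own transaction. It is not the scheduler's code: it uses the standard-library sqlite3 module as a stand-in for the real session, and the dag_run table with a single dag_id column, as well as the function name create_runs_one_txn_each, are hypothetical names chosen for illustration.

```python
import sqlite3


def create_runs_one_txn_each(conn, dag_ids):
    """Insert one row per dag id, each attempt in its own transaction.

    Hypothetical helper for illustration; not part of Airflow.
    """
    created, failed = [], []
    for dag_id in dag_ids:
        try:
            conn.execute("BEGIN")
            conn.execute("INSERT INTO dag_run (dag_id) VALUES (?)", (dag_id,))
            conn.execute("COMMIT")
            created.append(dag_id)
        except sqlite3.Error:
            # Roll back only this attempt; earlier attempts are already committed.
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            failed.append(dag_id)
    return created, failed


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT/ROLLBACK statements above fully control the transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE dag_run (dag_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO dag_run (dag_id) VALUES ('dag_b')")  # forces a conflict

created, failed = create_runs_one_txn_each(conn, ["dag_a", "dag_b", "dag_c"])
print(created, failed)
```

The trade-off is one commit per dag run attempt, which is presumably why savepoints are listed as the alternative. Note that SQLite is more forgiving here than a database like PostgreSQL, which rejects further statements in a failed transaction until it is rolled back.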

Issue: #59120



@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Dec 5, 2025
Arunodoy18 added a commit to Arunodoy18/airflow that referenced this pull request Dec 6, 2025
Add savepoint-based transaction management in SchedulerJobRunner to prevent
transaction corruption when database errors occur during dag run creation.

Previously, catching exceptions with 'except Exception' and continuing would
not roll back the transaction when the exception was caused by a database
error (e.g., IntegrityError, OperationalError). This left the session in
an invalid state and prevented subsequent dag runs from being created.

Changes:
- Wrap each dag run creation attempt in _create_dag_runs with a savepoint
- Wrap each asset-triggered dag run creation in _create_dag_runs_asset_triggered
  with a savepoint
- Roll back the savepoint on any exception to isolate failures
- Add test to verify database errors are handled without corrupting the session

This ensures that if one dag fails to create a run due to a database error,
other dags in the same batch can still create their runs successfully.

Fixes: apache#59121
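
The savepoint-per-attempt idea described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the referenced commit's code: it uses the standard-library sqlite3 module with raw SAVEPOINT statements, and the dag_run table, the savepoint name create_dag_run, and the function name create_runs_with_savepoints are all hypothetical. In SQLAlchemy, the corresponding API is Session.begin_nested().

```python
import sqlite3


def create_runs_with_savepoints(conn, dag_ids):
    """Insert one row per dag id inside a single outer transaction.

    Each attempt is wrapped in a savepoint so that a failure undoes only
    that attempt. Hypothetical helper for illustration; not part of Airflow.
    """
    created, failed = [], []
    conn.execute("BEGIN")
    for dag_id in dag_ids:
        conn.execute("SAVEPOINT create_dag_run")
        try:
            conn.execute("INSERT INTO dag_run (dag_id) VALUES (?)", (dag_id,))
        except sqlite3.Error:
            # Undo only the work done since the savepoint, then discard it.
            conn.execute("ROLLBACK TO SAVEPOINT create_dag_run")
            conn.execute("RELEASE SAVEPOINT create_dag_run")
            failed.append(dag_id)
        else:
            conn.execute("RELEASE SAVEPOINT create_dag_run")
            created.append(dag_id)
    # The outer transaction is still usable and commits the successful rows.
    conn.execute("COMMIT")
    return created, failed


# Autocommit mode so the explicit BEGIN/SAVEPOINT/COMMIT statements are in control.
sp_conn = sqlite3.connect(":memory:", isolation_level=None)
sp_conn.execute("CREATE TABLE dag_run (dag_id TEXT PRIMARY KEY)")
sp_conn.execute("INSERT INTO dag_run (dag_id) VALUES ('dag_b')")  # forces a conflict

sp_created, sp_failed = create_runs_with_savepoints(sp_conn, ["dag_a", "dag_b", "dag_c"])
print(sp_created, sp_failed)
```

Compared with one transaction per attempt, this keeps a single commit for the whole batch while still isolating the failing dag.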
@dstandish dstandish merged commit c9e190f into apache:main Dec 11, 2025
68 checks passed
@dstandish dstandish deleted the add-todo-for-scheduler-exc-handling branch December 11, 2025 16:56