Skip to content

fix(backfill): Handle SQLite lock errors with atomic creation and graceful 503#67900

Open
lohitkolluri wants to merge 1 commit into
apache:mainfrom
lohitkolluri:fix/66726-backfill-sqlite-lock-error
Open

fix(backfill): Handle SQLite lock errors with atomic creation and graceful 503#67900
lohitkolluri wants to merge 1 commit into
apache:mainfrom
lohitkolluri:fix/66726-backfill-sqlite-lock-error

Conversation

@lohitkolluri
Copy link
Copy Markdown

@lohitkolluri lohitkolluri commented Jun 2, 2026

Closes: #66726

Summary

Creating a backfill via the API on SQLite can fail under concurrent access (Scheduler, DAG Processor, and API Server sharing the same database).

Currently, this may result in:

  • Partially-created backfills without DagRuns
  • SQLite errors from unsupported row-lock operations
  • HTTP 500 responses exposing internal database errors

This PR improves the experience by making backfill creation atomic, handling SQLite-specific locking behavior correctly, and returning a retryable error when the database is temporarily locked.

Changes

  • Atomic Backfill Creation

    • Replace session.commit() with session.flush() during backfill creation
    • Allows the outer transaction to commit or roll back the entire operation atomically
    • Prevents partially-created backfills when DagRun creation fails
  • SQLite Compatibility

    • Skip with_for_update() for SQLite in with_row_locks()
    • Avoids generating unsupported SELECT ... FOR UPDATE statements
  • Better Error Handling

    • Catch SQLite "database is locked" OperationalErrors
    • Return HTTP 503 Service Unavailable with a clear retry message
    • Continue propagating other OperationalErrors unchanged

Result

  • No partial backfill state on failure
  • No SQLite row-lock compilation issues
  • Clear, retryable API response instead of a generic 500
  • No behavior changes for PostgreSQL or MySQL

Testing

  • Added test_create_backfill_database_locked
  • Verified 503 response and expected error message
  • Existing backfill endpoint tests continue to pass
  • Ruff and formatting checks pass

AI Assistance

  • Yes

Copy link
Copy Markdown
Contributor

@23tae 23tae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

#66726 also reports that the backfill may still be created even though the request fails.

This PR seems to improve the error response from 500 to 503, but it is not clear whether it changes the partial-success behavior. Could we clarify this in the PR description?

Comment thread airflow-core/src/airflow/api_fastapi/core_api/routes/public/backfills.py Outdated
@lohitkolluri lohitkolluri force-pushed the fix/66726-backfill-sqlite-lock-error branch from d9ed03e to 8962d5e Compare June 3, 2026 06:56
@lohitkolluri lohitkolluri requested review from XD-DENG and ashb as code owners June 3, 2026 06:56
@lohitkolluri lohitkolluri force-pushed the fix/66726-backfill-sqlite-lock-error branch from 8962d5e to 2c511e9 Compare June 3, 2026 07:03
@lohitkolluri lohitkolluri requested a review from 23tae June 3, 2026 07:06
@lohitkolluri lohitkolluri changed the title fix(backfill): Return 503 instead of 500 on SQLite lock error fix(backfill): Handle SQLite lock errors with atomic creation and graceful 503 Jun 3, 2026
Copy link
Copy Markdown
Contributor

@23tae 23tae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I left a few comments.

Comment thread airflow-core/src/airflow/api_fastapi/core_api/routes/public/backfills.py Outdated
@lohitkolluri lohitkolluri force-pushed the fix/66726-backfill-sqlite-lock-error branch from c1af990 to cc00931 Compare June 3, 2026 16:01
@lohitkolluri lohitkolluri requested a review from 23tae June 3, 2026 16:05
Copy link
Copy Markdown
Contributor

@henry3260 henry3260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

Comment thread airflow-core/src/airflow/models/backfill.py Outdated
Comment thread airflow-core/newsfragments/67900.bugfix.rst Outdated
@lohitkolluri lohitkolluri requested a review from henry3260 June 3, 2026 17:49
…ceful 503

Closes: apache#66726

- Catch OperationalError in create_backfill, retry 3x with exponential
  backoff before returning HTTP 503 with a clear retry message
- Add same retry+503 handling to create_backfill_dry_run
- Add session.commit() to unpause_backfill for proper state persistence
- Add SQLite 'database is locked' detection to is_lock_not_available_error()
  and skip FOR UPDATE in with_row_locks() for SQLite
- Add test coverage for lock error paths in create and dry_run endpoints
- Keep session.commit() in _create_backfill to prevent duplicate active
  backfills from concurrent requests

Signed-off-by: Lohit Kolluri <lohitkolluri@gmail.com>
@lohitkolluri lohitkolluri force-pushed the fix/66726-backfill-sqlite-lock-error branch from db990c3 to a88b72b Compare June 3, 2026 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Creating a backfill via Airflow API results in internal server error if SQLite is used

3 participants