fix(backfill): Handle SQLite lock errors with atomic creation and graceful 503#67900
Open
lohitkolluri wants to merge 1 commit into
23tae
reviewed
Jun 3, 2026
Contributor
23tae
left a comment
Thanks for working on this.
#66726 also reports that the backfill may still be created even though the request fails.
This PR seems to improve the error response from 500 to 503, but it is not clear whether it changes the partial-success behavior. Could we clarify this in the PR description?
d9ed03e to
8962d5e
Compare
8962d5e to
2c511e9
Compare
23tae
suggested changes
Jun 3, 2026
Contributor
23tae
left a comment
Thanks for the update. I left a few comments.
c1af990 to
cc00931
Compare
henry3260
reviewed
Jun 3, 2026
…ceful 503

Closes: apache#66726

- Catch OperationalError in create_backfill, retry 3x with exponential backoff before returning HTTP 503 with a clear retry message
- Add same retry+503 handling to create_backfill_dry_run
- Add session.commit() to unpause_backfill for proper state persistence
- Add SQLite 'database is locked' detection to is_lock_not_available_error() and skip FOR UPDATE in with_row_locks() for SQLite
- Add test coverage for lock error paths in create and dry_run endpoints
- Keep session.commit() in _create_backfill to prevent duplicate active backfills from concurrent requests

Signed-off-by: Lohit Kolluri <lohitkolluri@gmail.com>
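For readers following the first bullet of the commit message, the retry-then-503 behavior might look roughly like the sketch below. It is not the PR's code: `ServiceUnavailable`, `is_sqlite_locked`, and `run_with_lock_retry` are hypothetical names, the standard-library `sqlite3` module stands in for the SQLAlchemy session, and reading "retry 3x" as three attempts in total is an assumption.

```python
import sqlite3
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 response."""

    status_code = 503


def is_sqlite_locked(exc: Exception) -> bool:
    # SQLite reports lock contention as OperationalError("database is locked").
    return (
        isinstance(exc, sqlite3.OperationalError)
        and "database is locked" in str(exc).lower()
    )


def run_with_lock_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a lock error, back off exponentially and retry.

    After the final failed attempt, raise ServiceUnavailable so the caller
    can answer with a retryable 503 instead of a 500.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            if not is_sqlite_locked(exc):
                raise  # unrelated OperationalErrors propagate unchanged
            if attempt == attempts - 1:
                raise ServiceUnavailable(
                    "Database is temporarily locked; please retry."
                ) from exc
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2**attempt)
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; an endpoint would leave the default in place.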
db990c3 to
a88b72b
Compare
Closes: #66726
Summary
Creating a backfill via the API on SQLite can fail under concurrent access (Scheduler, DAG Processor, and API Server sharing the same database).
Currently, this may result in:
This PR improves the experience by making backfill creation atomic, handling SQLite-specific locking behavior correctly, and returning a retryable error when the database is temporarily locked.
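To illustrate what "atomic" means for the partial-success concern raised in review, here is a minimal sketch using the standard-library `sqlite3` module rather than the project's SQLAlchemy session. The two-table schema and the function name `create_backfill_atomically` are hypothetical and chosen only for the example. The point is that the backfill row and its runs share one transaction, so a failure partway through leaves nothing behind.

```python
import sqlite3


def create_backfill_atomically(conn, dag_id, run_dates):
    """Insert a backfill and its runs in a single transaction.

    Used as a context manager, a sqlite3 connection commits when the block
    succeeds and rolls back when it raises, so no partial backfill survives.
    """
    with conn:
        cur = conn.execute("INSERT INTO backfill (dag_id) VALUES (?)", (dag_id,))
        backfill_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO backfill_run (backfill_id, logical_date) VALUES (?, ?)",
            [(backfill_id, d) for d in run_dates],
        )
    return backfill_id


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE backfill (id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE backfill_run (backfill_id INTEGER NOT NULL, logical_date TEXT NOT NULL)"
)
```

Calling the function with an invalid run date (for example `None`, which violates the `NOT NULL` constraint) raises `sqlite3.IntegrityError`, and the already-inserted `backfill` row is rolled back along with it.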
Changes
Atomic Backfill Creation
session.commit() with session.flush() during backfill creation
SQLite Compatibility
with_for_update() for SQLite in with_row_locks()
SELECT ... FOR UPDATE statements
Better Error Handling
"database is locked" OperationalErrors
OperationalErrors unchanged
Result
Testing
test_create_backfill_database_locked
AI Assistance