Add TOCTOU race recovery for DocumentPath uniqueness conflicts #1236
The unique_active_path_per_corpus partial unique index added in migration 0023 already enforces (corpus, path) uniqueness for active paths, but DocumentFolderService callers handled the resulting IntegrityError by giving up: move_document_to_folder returned a "Path conflict, please retry" error to the caller, while move_documents_to_folder and delete_folder rolled back the entire batch.

Under concurrent moves of different documents to the same target folder, two transactions could observe a candidate path as free in _disambiguate_path's SELECT and race to insert it; the loser saw a user-visible failure even though an immediate retry against a freshly disambiguated path would have succeeded.

Add _create_successor_path_with_retry, a helper that wraps the old-path deactivation and successor insert in a savepoint and retries up to MAX_PATH_CREATE_RETRIES + 1 times on IntegrityError, treating each lost path as occupied so the next disambiguation picks a fresh suffix.

Refactor move_document_to_folder, move_documents_to_folder, and delete_folder to route their successor inserts through the helper. The bulk move now executes sequentially (each commit visible to the next disambiguation) instead of pre-computing all paths up front; the all-or-nothing atomicity guarantee is preserved by the outer transaction.atomic() block.

Cover the new behavior with TestRetry_MoveDocumentIntegrityRecovery, TestRetry_BulkMoveIntegrityRecovery, and TestRetry_DeleteFolderIntegrityRecovery, exercising:

- Transient single-failure recovery
- Disambiguated retry path selection
- Exhausted-retry rollback preserving the original active path
- Bulk-move recovery without aborting the batch
- delete_folder relocation recovery

Closes #1200
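The retry loop described above can be sketched without Django. This is a minimal illustration, not the PR's implementation: `UniqueViolation`, `disambiguate`, `create_with_retry`, the in-memory `table`, and the `_1` suffix format are all hypothetical stand-ins, and the real helper additionally wraps each attempt in a savepoint. The value 5 for `MAX_PATH_CREATE_RETRIES` is the default stated in the PR summary.

```python
import os
from typing import Callable, Optional, Set

MAX_PATH_CREATE_RETRIES = 5  # total attempts = retries + 1


class UniqueViolation(Exception):
    """Hypothetical stand-in for the IntegrityError raised by the unique index."""


def disambiguate(base: str, occupied: Set[str]) -> str:
    """Return `base` if free, else `base` with the first free numeric suffix."""
    if base not in occupied:
        return base
    stem, ext = os.path.splitext(base)
    n = 1
    while f"{stem}_{n}{ext}" in occupied:
        n += 1
    return f"{stem}_{n}{ext}"


def create_with_retry(
    base: str, visible: Set[str], insert: Callable[[str], None]
) -> str:
    """Insert a disambiguated path, treating each lost candidate as occupied."""
    lost: Set[str] = set()
    last_exc: Optional[Exception] = None
    for _ in range(MAX_PATH_CREATE_RETRIES + 1):
        candidate = disambiguate(base, visible | lost)
        try:
            insert(candidate)  # the real helper runs this inside a savepoint
            return candidate
        except UniqueViolation as exc:
            lost.add(candidate)  # a concurrent writer won this slot
            last_exc = exc
    if last_exc is None:
        raise RuntimeError("retry loop ended without recording an exception")
    raise last_exc


# Simulated race: another transaction committed the path after our SELECT ran.
table = {"/folder/report.pdf"}
stale_snapshot: Set[str] = set()


def insert(path: str) -> None:
    if path in table:
        raise UniqueViolation(path)
    table.add(path)


result = create_with_retry("/folder/report.pdf", stale_snapshot, insert)
print(result)  # /folder/report_1.pdf
```

The first attempt loses because the snapshot is stale; the second succeeds because the lost path is fed back into disambiguation rather than re-read from the database.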
Code Review

This is a well-scoped fix for a real concurrency hazard. The savepoint-plus-retry pattern is the correct tool for TOCTOU races on a partial unique index, and the refactoring to a single …

Medium: …
…, discard explicitly

- Replace `assert last_exc is not None` with a proper RuntimeError guard that survives `python -O` (review point 1)
- Remove conditional _disambiguate_path call shape that existed only to accommodate 3-arg test mocks; always pass extra_occupied and update the test mock to match the real interface (review point 2)
- Explicitly assign `_ =` for the discarded return value of _create_successor_path_with_retry in delete_folder (review point 3)
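The first item can be illustrated with a small sketch. `reraise_last` is a hypothetical name used only here; the point is that an `assert` is stripped when Python runs with `-O`, whereas an explicit check is not.

```python
from typing import Optional


def reraise_last(last_exc: Optional[Exception]) -> None:
    """Re-raise the last recorded exception, guarding against a missing one."""
    # `assert last_exc is not None` would vanish under `python -O`, and the
    # following `raise None` would then fail with an unrelated TypeError.
    if last_exc is None:
        raise RuntimeError("retry loop exited without recording an exception")
    raise last_exc
```

With the explicit guard, the failure mode is the same whether or not optimizations are enabled.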
PR Review: Add TOCTOU race recovery for DocumentPath uniqueness conflicts

Overall this is a solid, well-thought-out fix for a genuine concurrency hazard. The core idea, wrapping the deactivate+create pair in a savepoint-retried helper, is correct, and the documentation is excellent. A few items worth addressing before merge:

Issues

1. Redundant local import ( …
Move the hardcoded "Path conflict" string to a named constant in opencontractserver/constants/document_processing.py and update all references in folder_service.py and its test file to use it.
Code Review

Overall this is a clean, well-reasoned solution to a real TOCTOU concurrency problem. The design (savepoint + retry + in-memory occupied set) is the correct approach for this class of race. Docstrings are thorough and test coverage is good. A few issues worth addressing before merge:

Critical: …
…name test classes

- Only retry IntegrityError when it mentions unique_active_path_per_corpus; other IntegrityErrors (null, FK violations) now propagate immediately
- Add comment explaining deferred import for circular import avoidance
- Simplify set(extra_occupied or set()) to set(extra_occupied or ())
- Rename TestRetry_* test classes to PascalCase (TestMoveDocumentIntegrityRecovery, etc.)
- Add test_non_constraint_integrity_error_is_not_retried for constraint filtering
- Add test_persistent_failure_preserves_folder_and_documents for delete_folder exhausted retries
Code Review: TOCTOU Race Recovery (#1236)

This is a well-executed fix. The savepoint-plus-retry pattern is correct, the constraint name filter narrows the retry scope appropriately, and the three affected call sites are consistently updated. The …

✅ Confirmed Fixed (from prior reviews)

Issues to Address

Minor: CHANGELOG references non-existent class names

The CHANGELOG entry names …

Minor: …

# test_persistent_failure_returns_error_after_exhausting_retries (line ~629):
from opencontractserver.constants.document_processing import (
MAX_PATH_CREATE_RETRIES,
)
# test_persistent_failure_preserves_folder_and_documents (line ~846):
from opencontractserver.constants.document_processing import (
MAX_PATH_CREATE_RETRIES,
)

Both should be hoisted to the module-level import block alongside the existing …

Nit: … The test verifies that a non-constraint …

Nit: … The comment says exactly "First doc retried (2 attempts) + second doc (1 attempt) = 3 inserts." If that arithmetic is correct, …

No New Concerns

The …

Overall this is ready to merge once the CHANGELOG class names and the two inline imports are fixed.
…-LGMjQ

# Conflicts:
#   CHANGELOG.md
#   opencontractserver/corpuses/folder_service.py
- Hoist MAX_PATH_CREATE_RETRIES and MAX_PATH_DISAMBIGUATION_SUFFIX imports to module level alongside existing PATH_CONFLICT_MSG import
- Use assertEqual instead of assertGreaterEqual for exact insert count assertion in test_bulk_move_recovers_from_transient_integrity_error
…o claude/fix-issue-1200-LGMjQ
PR Review: TOCTOU Race Recovery for DocumentPath Uniqueness Conflicts

Overall this is a well-structured fix for a real race condition. The savepoint-plus-retry pattern is the right tool for this problem, the constants are properly extracted, and the test coverage is thorough. A few issues worth addressing before merge:

1. …
- constants/document_processing.py: Clarify MAX_PATH_CREATE_RETRIES comment to explicitly document that total attempts = retries + 1, addressing the off-by-one naming confusion raised in review.
- folder_service.py (_create_successor_path_with_retry): Strengthen IntegrityError constraint guard with psycopg2 pgcode check (23505) before string-matching the constraint name. This prevents unrelated IntegrityErrors containing the constraint name as a substring from being retried, and avoids matching on non-English Postgres messages.
- folder_service.py (move_documents_to_folder): Add comment explaining why only committed paths are added to batch_claimed: failed intermediate paths are rolled back and not in the DB, so subsequent batch items can reuse them naturally.
- test_document_folder_service.py: Rename all TestCoverageGap_* classes to pure PascalCase (TestCoverageGapFoo) to match the rest of the file.
- test_document_folder_service.py: Hoist inline MAX_PATH_CREATE_RETRIES import from test_persistent_failure_preserves_folder_and_documents body to module level (already imported there).
- test_document_folder_service.py: Strengthen assertGreaterEqual to assertEqual for deterministic attempt counts in test_first_attempt_fails_second_succeeds and test_delete_folder_recovers_from_transient_integrity_error; both mocks fail exactly once then succeed, so exactly 2 creates run.
- test_document_folder_service.py: Add error-string assertion to test_non_constraint_integrity_error_is_not_retried so the returned error value is verified, not just the success flag.
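The strengthened guard in the second item can be sketched as a two-step predicate: check the SQLSTATE first, then the constraint name. Everything named here (`IntegrityError`, `DriverError`, `is_path_race`, `chained`) is a hypothetical stand-in so the example runs without Django or psycopg2; only the code 23505 (PostgreSQL's `unique_violation`) and the constraint name come from the PR.

```python
UNIQUE_VIOLATION_PGCODE = "23505"  # PostgreSQL SQLSTATE for unique_violation
PATH_CONSTRAINT = "unique_active_path_per_corpus"


class IntegrityError(Exception):
    """Stand-in for django.db.IntegrityError."""


class DriverError(Exception):
    """Stand-in for a psycopg2 error carrying a SQLSTATE in `pgcode`."""

    def __init__(self, message: str, pgcode: str) -> None:
        super().__init__(message)
        self.pgcode = pgcode


def is_path_race(exc: BaseException) -> bool:
    """Return True only for a unique violation on the path constraint."""
    cause = exc.__cause__
    if getattr(cause, "pgcode", None) != UNIQUE_VIOLATION_PGCODE:
        return False  # not a unique violation, or no driver error attached
    return PATH_CONSTRAINT in str(exc)


def chained(message: str, pgcode: str) -> IntegrityError:
    """Build an IntegrityError whose __cause__ is a driver-level error."""
    exc = IntegrityError(message)
    exc.__cause__ = DriverError(message, pgcode)
    return exc


race = chained(
    'duplicate key value violates unique constraint "unique_active_path_per_corpus"',
    "23505",
)
other_unique = chained(
    'duplicate key value violates unique constraint "some_other_index"', "23505"
)
not_null = chained('null value in column "path" violates not-null constraint', "23502")

print(is_path_race(race), is_path_race(other_unique), is_path_race(not_null))
# True False False
```

Only the first error would be retried; the other two would propagate immediately.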
PR Review: TOCTOU Race Recovery for DocumentPath Uniqueness

Overall this is a well-reasoned fix for a real concurrency issue, with good documentation and a solid architectural approach. One significant issue in the test/guard interaction needs addressing before merge, plus a few smaller notes.

Critical: …
…chaining
The pgcode guard in _create_successor_path_with_retry checks
exc.__cause__.pgcode == "23505" before retrying. Test mocks were raising
bare IntegrityError("unique_active_path_per_corpus") with no __cause__,
so pgcode was None and the guard re-raised immediately instead of
retrying. Add _make_constraint_error() helper that constructs a properly
chained IntegrityError matching what Django/psycopg2 produces for
UniqueViolation, and use it in all 10 mock call sites.
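
Why a bare exception fails the guard while a chained one passes can be shown with stand-in classes. This is a sketch under assumptions: `make_constraint_error` is a hypothetical analogue of the PR's `_make_constraint_error()` (whose body is not shown here), and `FakeUniqueViolation` and `IntegrityError` stand in for the psycopg2 and Django classes.

```python
class FakeUniqueViolation(Exception):
    """Stand-in for psycopg2's UniqueViolation, which reports pgcode 23505."""

    pgcode = "23505"


class IntegrityError(Exception):
    """Stand-in for django.db.IntegrityError."""


def make_constraint_error(
    constraint: str = "unique_active_path_per_corpus",
) -> IntegrityError:
    """Return an IntegrityError chained to a driver-level unique violation."""
    cause = FakeUniqueViolation(
        f'duplicate key value violates unique constraint "{constraint}"'
    )
    try:
        # `raise ... from` sets __cause__, mirroring how the ORM wraps
        # driver exceptions.
        raise IntegrityError(str(cause)) from cause
    except IntegrityError as exc:
        return exc


bare = IntegrityError("unique_active_path_per_corpus")
chained_exc = make_constraint_error()

print(getattr(bare.__cause__, "pgcode", None))         # None
print(getattr(chained_exc.__cause__, "pgcode", None))  # 23505
```

A mock configured with `side_effect=make_constraint_error()` would therefore exercise the retry branch, whereas the bare error is re-raised on the first attempt.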
Code Review: TOCTOU Race Recovery for DocumentPath Uniqueness

Overall this is a well-thought-out change that addresses a real concurrency issue. The retry helper is cleanly designed, well-documented, and the test coverage is solid. A few things worth discussing before merge:

Medium: Non-constraint …
Summary

This PR adds automatic retry logic to recover from transient `IntegrityError` exceptions caused by TOCTOU (Time-of-Check-Time-of-Use) races on the `unique_active_path_per_corpus` partial unique constraint. When two concurrent transactions attempt to create a `DocumentPath` with the same filename in the same folder, the second one now automatically retries with a freshly disambiguated path instead of failing.

Key Changes

- New helper method `_create_successor_path_with_retry()`: Encapsulates the atomic operation of deactivating an old `DocumentPath` and creating a successor, with built-in retry logic on `IntegrityError`. The method:
  - Makes up to `MAX_PATH_CREATE_RETRIES + 1` attempts
  - On `IntegrityError`, treats the losing path as occupied and re-disambiguates
  - Wraps each attempt in a `transaction.atomic()` savepoint so failures don't poison the outer transaction
  - Returns the new `DocumentPath` record and the final committed path string
- Refactored `move_document_to_folder()`: Now delegates path creation to `_create_successor_path_with_retry()`, eliminating inline savepoint logic and gaining automatic retry behavior.
- Refactored `move_documents_to_folder()`: Changed from pre-computing all paths upfront to executing moves sequentially, with each move using the retry helper. This simplifies within-batch conflict detection by leveraging the sequential nature of the transaction.
- Refactored `delete_folder()`: Updated document relocation logic to use the new retry helper, inheriting the same race recovery behavior.
- New constant `MAX_PATH_CREATE_RETRIES`: Configurable limit (default 5) on retry attempts before surfacing the `IntegrityError` to the caller.

Implementation Details

- Uses an `occupied_after_loss` set to track paths that failed on previous attempts, ensuring each retry attempts a different disambiguated suffix.
- Callers hold a `select_for_update()` lock on the current `DocumentPath` to prevent races on the same document; this helper only handles races on the target path slot.
- The `extra_occupied` parameter allows batch operations to pass in paths already claimed by earlier items, ensuring within-batch filename collisions are resolved against each other.
- A savepoint rollback restores `current.is_current = True` in the database, but the in-memory attribute is manually reset to ensure the next iteration's `save()` actually writes `False`.
- Raises `IntegrityError` only after exhausting retries, with improved logging at each attempt and final failure.

https://claude.ai/code/session_01HK8jbwCLYut9kqRm6uNonc