Fix collection delete timeout by batching license pool deletions (PP-3683) #3034

Merged
jonathangreen merged 2 commits into main from bugfix/delete-timeout
Feb 11, 2026
@jonathangreen
Member

Description

Refactor Collection.delete() and the collection_delete celery task to process license pools in batches using the task.replace() re-queueing pattern, preventing task timeouts when deleting large collections.

Motivation and Context

The collection_delete celery task was hitting the 1800s (30-minute) hard time limit when deleting large collections. The root cause was that Collection.delete() processed all license pools in a single transaction. For collections with hundreds of thousands of pools, this easily exceeded 30 minutes.

Changes

  • Collection.delete(): Accepts a batch_size parameter (default 1000), processes only that many pools per call, and returns a bool indicating whether deletion is complete. Removed the @inject/ExternalSearchIndex dependency (orphaned works are cleaned up by the existing work_reaper task) and removed the inline commit() call so the caller manages the transaction.
  • collection_delete task: Added batch_size parameter and task.replace() re-queueing when more pools remain. Changed queue from high to default since this is now a background batch job.
  • collection_reaper: Now queues collection_delete tasks for each marked collection instead of calling Collection.delete() directly (avoiding the same timeout risk).
  • Collection.delete() internals: Uses an explicit SELECT query instead of the self.licensepools relationship attribute, which ensures a fresh result from the database across transaction boundaries.
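The batch-and-requeue flow in the list above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from this PR: it uses the standard library `sqlite3` module in place of the SQLAlchemy models, a plain loop in place of Celery's `task.replace()` re-queueing, and hypothetical names (`make_demo_db`, `delete_batch`, `run_until_deleted`) and table layouts. How the real `Collection.delete()` decides that deletion is complete is not shown on this page; the sketch simply checks whether any pools remain.

```python
import sqlite3


def make_demo_db(pool_count: int) -> sqlite3.Connection:
    """Build an in-memory database with one collection and pool_count pools (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE collections (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE licensepools (id INTEGER PRIMARY KEY, collection_id INTEGER)"
    )
    conn.execute("INSERT INTO collections (id) VALUES (1)")
    conn.executemany(
        "INSERT INTO licensepools (collection_id) VALUES (?)", [(1,)] * pool_count
    )
    conn.commit()
    return conn


def delete_batch(
    conn: sqlite3.Connection, collection_id: int, batch_size: int = 1000
) -> bool:
    """Delete at most batch_size pools and return True once deletion is complete.

    Stand-in for Collection.delete(): it does not commit, so the caller
    manages the transaction.
    """
    # Run an explicit query on every call so each batch reflects the current
    # state of the table rather than a previously loaded list of pools.
    rows = conn.execute(
        "SELECT id FROM licensepools WHERE collection_id = ? ORDER BY id LIMIT ?",
        (collection_id, batch_size),
    ).fetchall()
    conn.executemany("DELETE FROM licensepools WHERE id = ?", rows)

    remaining = conn.execute(
        "SELECT 1 FROM licensepools WHERE collection_id = ? LIMIT 1",
        (collection_id,),
    ).fetchone()
    if remaining is not None:
        return False  # More pools remain; the caller should schedule another round.

    # No pools left, so the collection row itself can be removed.
    conn.execute("DELETE FROM collections WHERE id = ?", (collection_id,))
    return True


def run_until_deleted(
    conn: sqlite3.Connection, collection_id: int, batch_size: int
) -> int:
    """Stand-in for the task re-queueing itself; returns the number of rounds run."""
    rounds = 0
    while True:
        rounds += 1
        done = delete_batch(conn, collection_id, batch_size)
        conn.commit()  # One short transaction per round.
        if done:
            return rounds


conn = make_demo_db(pool_count=25)
print(run_until_deleted(conn, 1, batch_size=10))  # 3 rounds: 10 + 10 + 5 pools
```

In the real task each round would be a separate Celery task execution, so no single execution has to run anywhere near the 30-minute hard limit; the loop here only mimics that sequence within one process.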

How Has This Been Tested?

  • Added test_collection_delete_batched: verifies re-queueing with small batch_size across multiple rounds
  • Added test_collection_delete_with_child_records: verifies ORM cascade cleanup of loans, holds, and works
  • Updated TestCollectionReaper to verify delegation to collection_delete
  • Updated test_delete in test_collection.py to match new method signature
  • All 53 tests across the three affected test files pass
  • mypy passes on all changed files

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@jonathangreen jonathangreen added the bug Something isn't working label Feb 10, 2026
@codecov
codecov bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.06%. Comparing base (e979861) to head (0984358).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3034   +/-   ##
=======================================
  Coverage   93.06%   93.06%           
=======================================
  Files         479      479           
  Lines       43651    43650    -1     
  Branches     6026     6026           
=======================================
  Hits        40625    40625           
  Misses       1960     1960           
+ Partials     1066     1065    -1     

Collection.delete() was processing all license pools in a single
transaction, causing the collection_delete celery task to exceed its
30-minute hard time limit for large collections.

- Refactor Collection.delete() to process a configurable batch_size of
  pools per call and return a bool indicating completion
- Add task.replace() re-queueing to collection_delete so it processes
  pools across multiple short-lived transactions
- Delegate collection_reaper to queue collection_delete tasks instead
  of deleting inline (avoiding the same timeout risk)
- Remove ExternalSearchIndex dependency from Collection.delete(); orphaned
  works are cleaned up by the existing work_reaper task
- Use explicit SELECT query instead of relationship attribute for reliable
  cross-transaction batch selection
# https://docs.sqlalchemy.org/en/14/orm/cascades.html#notes-on-delete-deleting-objects-referenced-from-collections-and-scalar-relationships
work.license_pools.remove(pool)
if not work.license_pools:
    work.delete(search_index=search_index)
Member Author

Orphaned works are cleaned up by the existing work_reaper task. This avoids complicating this task, and seems like a better separation of concerns.

Contributor

Minor - this is mentioned in a test comment, as well. Should it be mentioned in a "NOTE:" here, as well?

Member Author

It's mentioned in the doc comment for delete above, so I think this is covered.

@jonathangreen jonathangreen requested a review from a team February 10, 2026 16:54
Contributor

@tdilauro tdilauro left a comment

Looks great! 🚀

# application where LicensePools are permanently deleted.
# We use an explicit query rather than the relationship attribute so
# that each call gets a fresh result from the database.
pool_query = (
Contributor

Minor - Assuming that we're ordering by license pool id here to avoid deadlocks. Might be worth noting that here, if it's important.

@jonathangreen jonathangreen merged commit 6e280a3 into main Feb 11, 2026
19 checks passed
@jonathangreen jonathangreen deleted the bugfix/delete-timeout branch February 11, 2026 15:59