Add startup task to force re-import OPDS for Distributors collections (PP-3684) #3055
Merged
jonathangreen merged 20 commits into main on Feb 18, 2026
Introduce a general-purpose registry for one-time startup tasks that are automatically discovered and dispatched to Celery on the first application start after deployment. This is useful for data backfills, re-imports, or reindexing that are too long-running for database migrations.

- Add StartupTask SQLAlchemy model and migration for tracking queued tasks
- Add auto-discovery system that scans startup_tasks/ for Python files defining a create_signature() callable
- Add StartupTaskRunner integrated into InstanceInitializationScript, protected by the existing advisory lock
- On fresh database installs, tasks are stamped without dispatching (no existing data to migrate)
- Add bin/create_startup_task scaffolding command
- Include example task for force-harvesting OPDS for Distributors
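The auto-discovery step described above might look roughly like the following. This is a minimal sketch, not the code from this PR: the name discover_startup_tasks appears in a later commit, but the signature used here, the use of the file stem as the task key, and the generated module names are all assumptions.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable


def discover_startup_tasks(tasks_dir: Path) -> dict[str, Callable[..., Any]]:
    """Map each task key (the file stem, an assumption) to its create_signature callable."""
    tasks: dict[str, Callable[..., Any]] = {}
    # Iterating in sorted filename order means the dict is built in key order.
    for path in sorted(tasks_dir.glob("*.py")):
        # The module name prefix is hypothetical; it only needs to be unique.
        spec = importlib.util.spec_from_file_location(f"startup_task_{path.stem}", path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        create_signature = getattr(module, "create_signature", None)
        # Files without a create_signature() callable are not startup tasks.
        if callable(create_signature):
            tasks[path.stem] = create_signature
    return tasks
```

Because importing a file executes its top-level code, a scheme like this relies on task modules keeping their work inside create_signature() rather than at import time.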
- Revert bin/util/initialize_instance to pass config_file instead of repo_root, matching the InstanceInitializationScript constructor
- Resolve tasks_dir=None default to STARTUP_TASKS_DIR in run_startup_tasks and stamp_startup_tasks
- Fix test_initialization.py tests to match actual API signatures and return value semantics for initialize_database and run_startup_tasks
- Remove tests for nonexistent error-handling behavior
- Update run_startup_tasks docstring to accurately describe error handling
- Record task execution in the same transaction as the task itself to prevent duplicate Celery dispatches on crash
- Rename queued_at column to recorded_at
- Replace run boolean column with state enum (RUN, MARKED)
- Fix StartupTaskCallable type alias to include logger parameter
- Fix test task module signatures to match actual 3-arg call convention
Move apply_async() inside the transaction so broker failures roll back the task record, allowing retry on next startup. Add enum type cleanup to the migration downgrade to match project conventions. Add migration tests and a Celery dispatch retry test.
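The rollback behaviour this commit describes can be illustrated with a small stand-in. This sketch uses the standard library's sqlite3 instead of the project's SQLAlchemy session, and a plain callable instead of Celery's apply_async(); the table layout, the stored state value, and the name record_and_dispatch are hypothetical.

```python
import sqlite3
from typing import Callable

# Hypothetical table layout, used only for this illustration.
SCHEMA = "CREATE TABLE startup_tasks (key TEXT PRIMARY KEY, state TEXT NOT NULL)"


def record_and_dispatch(
    conn: sqlite3.Connection, key: str, dispatch: Callable[[], None]
) -> bool:
    """Record the task and dispatch it in one transaction.

    Returns True if the task was dispatched, False if it was already recorded.
    """
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        row = conn.execute(
            "SELECT 1 FROM startup_tasks WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return False
        conn.execute(
            "INSERT INTO startup_tasks (key, state) VALUES (?, 'run')", (key,)
        )
        # Dispatching inside the transaction: if the broker call raises, the
        # INSERT above is rolled back, so the next startup can try again.
        dispatch()
    return True
```

The trade-off in this ordering is that a crash after a successful dispatch but before the commit could still lead to a second dispatch on the next startup, so the dispatched work is best kept idempotent.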
Move the run-vs-stamp branching from initialization.py into startup.py behind a single run_startup_tasks() function with an already_initialized parameter. Trim the module docstring to a README pointer. Add slug length truncation to create_startup_task.
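A minimal sketch of a single entry point with run-versus-stamp branching follows. Only the function name run_startup_tasks, the already_initialized parameter, and the RUN and MARKED states come from this PR; the in-memory dict standing in for the database table, the StartupTaskState class name, and the rest of the signature are assumptions.

```python
from enum import Enum
from typing import Callable


class StartupTaskState(Enum):
    RUN = "run"        # the task was actually executed
    MARKED = "marked"  # the task was stamped without executing


def run_startup_tasks(
    tasks: dict[str, Callable[[], None]],
    recorded: dict[str, StartupTaskState],
    already_initialized: bool,
) -> None:
    """Run unrecorded tasks, or only stamp them when the database is fresh."""
    for key in sorted(tasks):
        if key in recorded:
            continue  # each task is handled at most once
        if already_initialized:
            tasks[key]()
            recorded[key] = StartupTaskState.RUN
        else:
            # Fresh install: there is no existing data to migrate.
            recorded[key] = StartupTaskState.MARKED
```

Keeping the branch inside one function means the caller only has to pass along whether the database already existed.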
The glob results are already iterated in sorted order, so the dict is built in key order and the final sorted() call was unnecessary.
- Fix template rendering crash on descriptions with braces by using str.replace instead of str.format
- Rename ambiguous local variable db_initialized to already_initialized in initialization script
- Document that startup tasks block application startup and should dispatch heavy work via Celery
- Fix docstring to say "sorted by filename" instead of "sorted by key"
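To illustrate the first fix in the list above: str.format parses every brace in the string it is called on, while str.replace only touches the exact placeholder. The template text, placeholder name, and function names below are hypothetical, and the failure shown is one way such a crash can arise, not necessarily the exact code path that was fixed.

```python
# Hypothetical scaffolding template with a literal placeholder token.
TEMPLATE = '"""DESCRIPTION_PLACEHOLDER"""\n'


def render_with_replace(description: str) -> str:
    # Only the literal placeholder is substituted; braces pass through untouched.
    return TEMPLATE.replace("DESCRIPTION_PLACEHOLDER", description)


def render_with_format(description: str) -> str:
    # Hypothetical failure mode: once the description is part of the string
    # that format() parses, its braces are treated as replacement fields.
    text = '"""' + description + '"""\n# task: {name}\n'
    return text.format(name="example")
```

With a description such as "Re-import {all} collections", the str.replace version returns the text unchanged inside the docstring, while the str.format version raises KeyError because no "all" argument was supplied.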
- Replace DatabaseTransactionFixture + Session patching with function_database for realistic engine-based test isolation
- Extract RunStartupTasksFixture to share mocked discover_startup_tasks and services across all tests
- Merge duplicate failure test into test_run_handles_failure_gracefully
- Remove _engine context manager and related imports
Replace str.replace with a Jinja2 template for the startup task scaffolding to avoid ambiguity with brace-style placeholders. Also add an else branch so non-Celery tasks log "Executed" while Celery tasks only log "dispatched", avoiding a redundant double log.
Replace per-task "already executed; skipping" log lines with a single summary count to avoid spamming logs on every startup as tasks accumulate.
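The logging change can be sketched as follows. The function name select_pending, its parameters, and the message wording are hypothetical; the point is only that skipped tasks produce one summary line rather than one line each.

```python
import logging


def select_pending(
    task_keys: list[str], recorded: set[str], log: logging.Logger
) -> list[str]:
    """Return the keys that still need to run, logging skipped tasks as one count."""
    pending = [key for key in task_keys if key not in recorded]
    skipped = len(task_keys) - len(pending)
    if skipped:
        # One summary line instead of an "already executed; skipping" line per task.
        log.info("Skipped %d startup task(s) that were already recorded.", skipped)
    return pending
```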
Fix grammar, correct the example filename format, remove the code example with an incorrect function signature, and simplify the documentation.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3055      +/-   ##
==========================================
+ Coverage   93.16%   93.18%   +0.01%
==========================================
  Files         487      489       +2
  Lines       44787    44943     +156
  Branches     6173     6191      +18
==========================================
+ Hits        41726    41880     +154
- Misses       1985     1986       +1
- Partials     1076     1077       +1
86d1de9 to 5c109eb
Description
Adds a startup task that dispatches a forced re-harvest of all OPDS for Distributors collections on the next deployment.
Motivation and Context
After streaming media support landed (PR #3015), existing OPDS for Distributors collections need to be re-imported with the new parsing logic to pick up the changes.
Resolves PP-3684
How Has This Been Tested?
The startup task follows the established pattern and delegates to the existing import_all Celery task with force=True. The import_all task is already covered by existing tests.
Checklist