
Add startup task to force re-import OPDS for Distributors collections (PP-3684) #3055

Merged
jonathangreen merged 20 commits into main from bugfix/opds-for-distributors-reimport
Feb 18, 2026

jonathangreen commented Feb 17, 2026

Description

Note: #3035 must be merged before this PR can go in.

Adds a startup task that dispatches a forced re-harvest of all OPDS for Distributors collections on the next deployment.

Motivation and Context

After streaming media support landed (PR #3015), existing OPDS for Distributors collections need to be re-imported with the new parsing logic to pick up the changes.

Resolves PP-3684

How Has This Been Tested?

The startup task follows the established pattern and delegates to the existing import_all Celery task with force=True. The import_all task is already covered by existing tests.

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

Introduce a general-purpose registry for one-time startup tasks that are
automatically discovered and dispatched to Celery on the first application
start after deployment. This is useful for data backfills, re-imports, or
reindexing that are too long-running for database migrations.

- Add StartupTask SQLAlchemy model and migration for tracking queued tasks
- Add auto-discovery system that scans startup_tasks/ for Python files
  defining a create_signature() callable
- Add StartupTaskRunner integrated into InstanceInitializationScript,
  protected by the existing advisory lock
- On fresh database installs, tasks are stamped without dispatching
  (no existing data to migrate)
- Add bin/create_startup_task scaffolding command
- Include example task for force-harvesting OPDS for Distributors
- Revert bin/util/initialize_instance to pass config_file instead of
  repo_root, matching the InstanceInitializationScript constructor
- Resolve tasks_dir=None default to STARTUP_TASKS_DIR in
  run_startup_tasks and stamp_startup_tasks
- Fix test_initialization.py tests to match actual API signatures and
  return value semantics for initialize_database and run_startup_tasks
- Remove tests for nonexistent error-handling behavior
- Update run_startup_tasks docstring to accurately describe error handling
- Record task execution in the same transaction as the task itself to
  prevent duplicate Celery dispatches on crash
- Rename queued_at column to recorded_at
- Replace run boolean column with state enum (RUN, MARKED)
- Fix StartupTaskCallable type alias to include logger parameter
- Fix test task module signatures to match actual 3-arg call convention
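The auto-discovery described above (scanning startup_tasks/ for Python files that define a create_signature() callable) might look roughly like the sketch below. It is not the code from this PR. The helper names load_module and discover, the use of the file stem as the task key, and the rule of skipping files that start with an underscore are all assumptions made for illustration. The demo tasks also take no arguments, which is a simplification of the call convention mentioned in the commits.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Any, Callable


def load_module(path: Path) -> ModuleType:
    """Import a single file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load startup task from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def discover(tasks_dir: Path) -> dict[str, Callable[..., Any]]:
    """Map file stem -> create_signature for every task file in tasks_dir."""
    tasks: dict[str, Callable[..., Any]] = {}
    # sorted() makes the order deterministic; dicts keep insertion order.
    for path in sorted(tasks_dir.glob("*.py")):
        if path.name.startswith("_"):  # assumption: skip __init__.py and private files
            continue
        create_signature = getattr(load_module(path), "create_signature", None)
        if callable(create_signature):
            tasks[path.stem] = create_signature
    return tasks


# Demo with throwaway files (the file names are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b_second.py").write_text("def create_signature():\n    return 'second'\n")
    (root / "a_first.py").write_text("def create_signature():\n    return 'first'\n")
    (root / "c_helper.py").write_text("VALUE = 1\n")  # no create_signature: ignored
    (root / "__init__.py").write_text("")
    found = discover(root)
    order = list(found)
    results = [fn() for fn in found.values()]
```

Because the files are visited in sorted order and the dict preserves insertion order, the returned mapping is already ordered by filename.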
Move apply_async() inside the transaction so broker failures roll back
the task record, allowing retry on next startup. Add enum type cleanup
to the migration downgrade to match project conventions. Add migration
tests and a Celery dispatch retry test.
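The effect of dispatching inside the transaction can be illustrated with a small standalone sketch. It uses sqlite3 from the standard library rather than SQLAlchemy and Celery, so the table layout, the record_and_dispatch helper, and the dispatch callable (which stands in for apply_async()) are all hypothetical.

```python
import sqlite3
from typing import Callable


def record_and_dispatch(
    conn: sqlite3.Connection, key: str, dispatch: Callable[[], None]
) -> bool:
    """Record the task and dispatch it inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("INSERT INTO startup_tasks (key) VALUES (?)", (key,))
            dispatch()  # stands in for signature.apply_async()
    except ConnectionError:
        return False  # nothing was recorded, so the next startup retries
    return True


def recorded_keys(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT key FROM startup_tasks ORDER BY key")
    return [row[0] for row in rows]


def broken_broker() -> None:
    raise ConnectionError("broker unavailable")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startup_tasks (key TEXT PRIMARY KEY)")

# First startup: the broker is down, so the insert is rolled back.
first_attempt = record_and_dispatch(conn, "reimport_collections", broken_broker)
after_failure = recorded_keys(conn)

# Next startup: the dispatch succeeds and the record is committed.
second_attempt = record_and_dispatch(conn, "reimport_collections", lambda: None)
after_success = recorded_keys(conn)
```

In this sketch a failed dispatch leaves no row behind, which is what allows the retry on the following startup.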
Move the run-vs-stamp branching from initialization.py into
startup.py behind a single run_startup_tasks() function with an
already_initialized parameter. Trim the module docstring to a
README pointer. Add slug length truncation to create_startup_task.
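A minimal sketch of the run-vs-stamp branching behind an already_initialized parameter follows. The function name run_or_stamp, the in-memory recorded dict, and the mapping of "stamped" to the MARKED state are assumptions for illustration; the real implementation stores its records in the database.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    RUN = "run"        # the task was actually executed
    MARKED = "marked"  # the task was only stamped (assumed: fresh install)


def run_or_stamp(
    tasks: dict[str, Callable[[], None]],
    recorded: dict[str, State],
    already_initialized: bool,
) -> None:
    """Execute new tasks on an existing database; only stamp them on a fresh one."""
    for key, task in tasks.items():
        if key in recorded:
            continue  # handled on an earlier startup
        if already_initialized:
            task()
            recorded[key] = State.RUN
        else:
            recorded[key] = State.MARKED


calls: list[str] = []
tasks = {"reimport": lambda: calls.append("reimport")}

fresh: dict[str, State] = {}
run_or_stamp(tasks, fresh, already_initialized=False)
calls_after_fresh = list(calls)

existing: dict[str, State] = {}
run_or_stamp(tasks, existing, already_initialized=True)
run_or_stamp(tasks, existing, already_initialized=True)  # second startup: no-op
```

Keeping the branch in one function means the caller only has to pass whether the database already existed.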
The glob results are already iterated in sorted order, so the dict
is built in key order and the final sorted() call was unnecessary.

- Fix template rendering crash on descriptions with braces by using
  str.replace instead of str.format
- Rename ambiguous local variable db_initialized to already_initialized
  in initialization script
- Document that startup tasks block application startup and should
  dispatch heavy work via Celery
- Fix docstring to say "sorted by filename" instead of "sorted by key"
- Replace DatabaseTransactionFixture + Session patching with
  function_database for realistic engine-based test isolation
- Extract RunStartupTasksFixture to share mocked discover_startup_tasks
  and services across all tests
- Merge duplicate failure test into test_run_handles_failure_gracefully
- Remove _engine context manager and related imports
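The brace problem mentioned in the list above can be reproduced in a few lines. The exact failure in the scaffolding command is not shown in this PR, so the sketch below is only one way such a crash can arise: it assumes the description ends up in the template text before str.format is called. The template strings and function names are hypothetical.

```python
def render_with_format(description: str, key: str) -> str:
    # The description is spliced into the template text *before* formatting,
    # so braces inside it are parsed as replacement fields.
    template = '"""' + description + '"""\nKEY = "{key}"\n'
    return template.format(key=key)


def render_with_replace(description: str, key: str) -> str:
    # Plain substitution never interprets braces in the inserted text.
    template = '"""__DESCRIPTION__"""\nKEY = "__KEY__"\n'
    return template.replace("__DESCRIPTION__", description).replace("__KEY__", key)


description = "Re-import {all} collections"

try:
    render_with_format(description, "reimport")
    format_error = None
except KeyError as exc:  # "{all}" is treated as a missing named field
    format_error = exc

rendered = render_with_replace(description, "reimport")
```

With str.replace the braces in the description pass through untouched.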
Replace str.replace with a Jinja2 template for the startup task
scaffolding to avoid ambiguity with brace-style placeholders. Also
add an else branch so non-Celery tasks log "Executed" while Celery
tasks only log "dispatched", avoiding a redundant double log.

Replace per-task "already executed; skipping" log lines with a single
summary count to avoid spamming logs on every startup as tasks
accumulate.
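The summary-count approach can be sketched as follows. The helper name log_skipped, the message wording, and the list-backed handler used to capture output are assumptions for illustration, not the PR's code.

```python
import logging
from typing import Iterable


def log_skipped(
    logger: logging.Logger, discovered: Iterable[str], recorded: set[str]
) -> int:
    """Emit one summary line instead of one line per already-recorded task."""
    skipped = sum(1 for key in discovered if key in recorded)
    if skipped:
        logger.info("%d startup task(s) already recorded; skipping.", skipped)
    return skipped


class ListHandler(logging.Handler):
    """Collects formatted messages so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = logging.getLogger("startup_tasks_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

count = log_skipped(logger, ["a", "b", "c"], {"a", "b"})
```

However many tasks accumulate over time, each startup produces at most one skip line.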
Fix grammar, correct the example filename format, remove the code
example with an incorrect function signature, and simplify the
documentation.
jonathangreen added the bug label Feb 17, 2026
jonathangreen requested a review from a team February 17, 2026 16:19

codecov bot commented Feb 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.18%. Comparing base (7bd958a) to head (661bdaa).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3055      +/-   ##
==========================================
+ Coverage   93.16%   93.18%   +0.01%     
==========================================
  Files         487      489       +2     
  Lines       44787    44943     +156     
  Branches     6173     6191      +18     
==========================================
+ Hits        41726    41880     +154     
- Misses       1985     1986       +1     
- Partials     1076     1077       +1     

dbernstein left a comment

🚀 Looks good.

Base automatically changed from feature/startup-tasks to main February 18, 2026 18:23
jonathangreen merged commit 1bfe439 into main Feb 18, 2026
16 checks passed
jonathangreen deleted the bugfix/opds-for-distributors-reimport branch February 18, 2026 18:27