fix(api): v1/v1.1 normalize dispatch + stuck-row reconciler (#2870)#2873
Merged
Conversation
…uck rows (#2870) * Wire normalize-task dispatch into v1 and v1.1 ``AssetListView.post`` via a shared ``dispatch_pending_normalize`` helper; before this, v1/v1.1 video uploads landed in whatever codec the operator sent and the per-board passthrough set silently dropped non-h264/hevc files from rotation forever. * Set ``_pending_normalize='video'`` / ``is_processing=1`` on local video uploads in ``CreateAssetSerializerV1_1``. Image normalisation stays opt-in via v1.2 / v2 — promoting it onto the legacy endpoints would shift synchronous availability semantics for older clients. * Add ``reconcile_stuck_processing`` celery-beat task (10-min cadence, 60-min threshold) to recover rows whose normalize task was lost (worker SIGKILL, OOM, backup restore that bypassed dispatch). Ages rows via ``metadata.processing_started_at`` — stamped by each dispatch helper, granted on first sight for legacy / restored rows. * Drop ``hurry.filesize`` for ``django.template.defaultfilters.filesizeformat``. ``free_space`` on ``/api/v1/info``, ``/api/v1.1/info``, ``/api/v2/info`` changes from ``"15G"`` → ``"15.0 GB"`` (NBSP separator, decimal, full unit). Same change on home-page disk fields. * Bump version to 2026.05.1 in pyproject.toml, package.json, uv.lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI failures on the new dispatch tests:
* v1 / v1.1 video and image tests hit ``validate_uri('/data/…')`` →
500 ``Invalid file path``. The mixin patch only covered
``serializers.mixins.validate_uri``; v1 / v1.1 import the same
helper as ``serializers.v1_1.validate_uri``. Patch both binding
sites.
* v1.2 / v2 image test expected ``dispatch_normalize_image`` once but
got zero. The mixin's rename writes ``<asset_id><ext>``; without
``ext`` in the payload the file lands extensionless and
``needs_image_normalisation`` returns False. Pass ``'ext': '.heic'``.
* Also pass ``'ext': '.mp4'`` on the video payload — mixin uses it for
the rename destination too; v1 / v1.1 ignore it.
* Re-run ``ruff format`` on tests/test_processing.py — pre-push run
used ``ruff check`` only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR closes #2870 by ensuring legacy API endpoints (v1/v1.1) dispatch video normalization consistently with v1.2/v2, and by adding a periodic Celery reconciler to recover (or at least unstick) assets that remain is_processing=True after dispatch/task failures. It also removes the hurry.filesize dependency by switching to Django’s filesizeformat, and bumps the project version to 2026.05.1.
Changes:
- Centralize “pending normalize” dispatch via
dispatch_pending_normalize()and wire it into v1/v1.1/v1.2/v2 create flows. - Add
reconcile_stuck_processingperiodic task plusmetadata.processing_started_atstamping to support age-based recovery. - Replace
hurry.filesizewithdjango.template.defaultfilters.filesizeformatand update tests + version pins/locks.
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Bumps version and removes hurry-filesize (and related lock entries). |
| pyproject.toml | Bumps version; drops hurry.filesize dependency and mypy ignore entry. |
| package.json | Bumps version. |
| src/anthias_server/processing.py | Adds processing-start stamp + shared dispatch_pending_normalize() helper; stamps on normalize dispatch. |
| src/anthias_server/celery_tasks.py | Adds periodic reconcile_stuck_processing task and ISO timestamp parser; schedules the task. |
| src/anthias_server/api/views/v1.py | Ensures v1 asset create dispatches pending normalization. |
| src/anthias_server/api/views/v1_1.py | Ensures v1.1 asset create dispatches pending normalization. |
| src/anthias_server/api/views/v1_2.py | Switches to shared pending-normalize dispatcher. |
| src/anthias_server/api/views/v2.py | Switches to shared pending-normalize dispatcher; replaces hurry.filesize with filesizeformat. |
| src/anthias_server/api/views/mixins.py | Replaces hurry.filesize with filesizeformat for info endpoints. |
| src/anthias_server/api/serializers/v1_1.py | Flags local video uploads for normalization (_pending_normalize='video') and sets is_processing=1. |
| src/anthias_server/app/views.py | Routes UI uploads through normalize dispatch helpers so stamping occurs consistently. |
| src/anthias_server/app/page_context.py | Switches disk size formatting to Django filesizeformat. |
| src/anthias_common/youtube.py | Stamps processing_started_at when dispatching YouTube downloads. |
| tests/test_processing.py | Expands unit tests for stamping + pending-normalize routing behavior. |
| tests/test_celery_tasks.py | Adds coverage for reconcile_stuck_processing behavior. |
| src/anthias_server/api/tests/test_assets.py | Adds regression tests ensuring all API versions dispatch video normalization; guards image-normalize behavior by version. |
| src/anthias_server/api/tests/test_info_endpoints.py | Updates mocks/expectations for filesizeformat output shape (NBSP, units). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses Copilot review on #2873: * ``_parse_processing_started_at`` now returns ``None`` for naive (no-tzinfo) ISO-8601 strings. The dispatch helpers stamp via ``timezone.now()`` (tz-aware under ``USE_TZ=True``), so a naive value is by definition a hand-edit of metadata — comparing it to the tz-aware ``cutoff`` would raise ``TypeError`` and abort the sweep for every other stuck row. Route it through the stamp-on-first-sight branch instead. * Wrap the reconciler body in a SETNX + Lua-compare-and-delete lock on ``RECONCILE_STUCK_LOCK_KEY`` — same pattern as ``revalidate_asset_urls``. Default Anthias runs one worker with embedded beat (so single-flighted today), but the lock bounds blast radius if a future deploy ever runs two workers with ``-B``. * Rewrite the ``dispatch_download`` docstring: stamping ``processing_started_at`` does *not* let the reconciler re-download — it routes via mimetype='video' through ``normalize_video_asset``, whose ``path.isfile`` guard fails fast and ``on_failure`` writes ``metadata.error_message`` so the operator sees a "Failed" pill instead of a row stuck on "Processing". Tests: naive-timestamp recovery + lock-prevents-overlap + lock-released-after-clean-run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses second Copilot pass on #2873: * Rename ``_stamp_processing_start`` → ``stamp_processing_start``. The single underscore marked it as module-private but it's already called cross-package (anthias_common.youtube.dispatch_download, anthias_server.celery_tasks.reconcile_stuck_processing). Promoting to a public symbol matches its actual API surface; docstring now states the cross-module contract explicitly. * Wrap the read-modify-write in ``transaction.atomic()``. SQLite is single-writer at the file level so the wrap is mostly documenting intent today, but on Postgres / MySQL it produces an actual transaction boundary against a near-simultaneous operator PATCH or worker landing ``error_message``. * Fix the docstring claim of "no-op via the filter().update() pattern" — the code is a SELECT-then-UPDATE; the no-op semantics come from the ``filter().first() is None`` guard, not from the update query shape. * Tighten ``_parse_processing_started_at`` return type from ``Any`` to ``datetime | None``. Pull ``datetime`` into the module-level ``from datetime import …`` so the annotation resolves; drop the redundant local imports inside the parser and reconciler bodies. * Fix the reconciler docstring's "10-min grace window" — the actual grace is ``RECONCILE_STUCK_THRESHOLD_S`` (60 min) from the first-sweep stamp, with the 10-min cadence just dictating how often the row is re-checked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-3 Copilot review on #2873: * ``stamp_processing_start`` docstring: replace the implied "transaction.atomic() makes this safe" framing with a clear acknowledgement that this is a read-modify-write race against arbitrary other metadata writers (operator PATCH, normalize task landing error_message). Document why we accept it: stamps fire at *dispatch* time, before the worker picks the task up, so the practical window is microscopic — matches the same race the wider codebase already accepts in ``_set_processing_error`` et al. ``select_for_update()`` would close it on Postgres / MySQL but is a no-op on SQLite (Anthias's only deployed backend), so we'd pay complexity for zero practical safety today. The ``transaction.atomic()`` wrap stays — cheap savepoint that survives a future multi-writer-backend switch. * ``InfoViewMixin`` OpenAPI ``example`` field: update ``free_space`` from ``'10G'`` to ``'10.0 GB'`` so the generated docs match the new ``filesizeformat`` shape (NBSP separator, decimal, full unit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-4 Copilot review on #2873: * ``RECONCILE_STUCK_INTERVAL_S`` comment: the previous "one SELECT, one UPDATE per stuck row" undercounted — each per-row branch itself does SELECT + UPDATE inside ``stamp_processing_start``, and the re-dispatch branch may do more on top. Rewrite to reflect the actual cost shape: the initial SELECT plus per-row stamp / dispatch work, with the observation that a healthy fleet has an empty filtered set so the tick costs one SELECT total. * ``RECONCILE_STUCK_THRESHOLD_S`` comment: removed the claim that a row past the threshold "has had its worker time-limit expire" — rows can be stuck for reasons that never involved a worker picking the task up (Redis flake during ``.delay()``, container restart between enqueue and accept). Rewrite to call out both classes of failure so the threshold's role is honest: "if a row has carried is_processing=True for over an hour, something went wrong — recover it." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Issues Fixed
Closes #2870.
Description
Two-line defence against the v1 / v1.1 video-upload normalize gap.
1. Root cause — v1 / v1.1 never dispatched normalize.
A shared
dispatch_pending_normalize(serializer, asset_id)helper nowruns after
Asset.objects.create(...)in all fourAssetListView.postmethods, branching on the serializer's
_pending_normalizeflag.CreateAssetSerializerV1_1was also updated to set_pending_normalize='video'andis_processing=1on local videouploads — pre-fix, those rows landed in whatever codec the operator
sent (MPEG-1 in the issue's repro) and the per-board passthrough set
silently dropped them from rotation forever. Image normalisation stays
opt-in via v1.2 / v2 only; promoting it onto the legacy endpoints
would shift synchronous-availability semantics for older clients.
2. Reconciler safety-net.
New
reconcile_stuck_processingcelery-beat task (10-min cadence,60-min threshold — longer than
NORMALIZE_VIDEO_TIME_LIMIT_S=30min)recovers rows whose normalize task was lost mid-flight (worker
SIGKILL, OOM, backup restore that bypassed dispatch, Redis flake
during
.delay()). Ages rows viametadata.processing_started_at,stamped by each dispatch helper. Rows missing the stamp (legacy /
restored) get stamped on first sight for the full grace window; the
next sweep applies the threshold uniformly.
Bonus: drop
hurry.filesize.Five call sites swapped to
django.template.defaultfilters.filesizeformat.Wire-format change on
/api/v*/info.free_spaceand home-page diskfields:
"15G"→"15.0 GB"(NBSP separator, decimal, full unit).The
hurry.filesizepin is dropped frompyproject.toml(deps +mypy override) and
uv.lock.Version bumped to 2026.05.1 in
pyproject.toml/package.json/uv.lock.Checklist