fix(celery): catch the asset-probe timeout instead of hard-killing the worker #3017
Merged
…e worker

- revalidate_asset_url's 30s hard limit was reachable by a legitimate probe (DNS stall + HEAD 10s + GET 10s), and tripping it SIGKILLs the pool child: three Sentry issues per occurrence (ANTHIAS-A, ANTHIAS-9, ANTHIAS-B)
- Add soft_time_limit=60 / time_limit=90: the soft limit raises inside the task, which records the verdict an HTTP timeout gets (unreachable) instead of dying
- Give the periodic sweep the same treatment: abort cleanly a minute before its hard limit, releasing the singleton lock
- Add regression tests for limits and soft-timeout behaviour

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
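The timing budget in this commit message can be worked through as plain arithmetic. The sketch below uses only the figures quoted above (10s HEAD, 10s GET, the old 30s hard limit, the new 60s soft and 90s hard limits). The constant names are illustrative and not taken from the codebase, and the DNS stall is treated as the unknown quantity.

```python
# Figures quoted in the commit message; names are hypothetical.
HEAD_TIMEOUT_S = 10
GET_TIMEOUT_S = 10
OLD_HARD_LIMIT_S = 30
NEW_SOFT_LIMIT_S = 60
NEW_HARD_LIMIT_S = 90

# Worst case for the HTTP part of a probe: HEAD times out, then GET times out.
http_worst_case_s = HEAD_TIMEOUT_S + GET_TIMEOUT_S

# Time left for a DNS stall before each limit fires.
old_dns_headroom_s = OLD_HARD_LIMIT_S - http_worst_case_s
new_dns_headroom_s = NEW_SOFT_LIMIT_S - http_worst_case_s

# Window between the catchable soft limit and the SIGKILL hard limit.
cleanup_window_s = NEW_HARD_LIMIT_S - NEW_SOFT_LIMIT_S

print(http_worst_case_s, old_dns_headroom_s, new_dns_headroom_s, cleanup_window_s)
```

Under these assumptions, the old limit left about 10 seconds for name resolution before the worker was killed, while the new soft limit leaves about 40 seconds and fails in a way the task can catch.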
Pull request overview
This PR adjusts Celery task time limits and handling for asset reachability probes so that slow/hung probes are handled via soft time limits (caught in-task) instead of reaching the hard time limit that SIGKILLs the pool worker process, reducing the related Sentry noise and improving operational stability.
Changes:
- Introduces explicit soft/hard time-limit constants for the on-demand asset probe and the periodic sweep.
- Catches `SoftTimeLimitExceeded` inside both tasks to abort/record outcomes cleanly (instead of worker SIGKILL).
- Adds unit tests asserting time-limit configuration and verifying soft-limit behavior (unreachable verdict / sweep abort + lock release).
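A minimal sketch of the soft-limit pattern for the on-demand probe, assuming a dict-shaped asset and an injected `probe` callable. The function name, the `is_reachable` field, and the `record` helper are hypothetical; only `SoftTimeLimitExceeded`, the 60s/90s limits, and `last_reachability_check` come from the PR. The fallback class exists only so the sketch runs without Celery installed.

```python
try:
    from celery.exceptions import SoftTimeLimitExceeded
except ImportError:
    class SoftTimeLimitExceeded(Exception):
        """Stand-in so the sketch runs without Celery installed."""


def record(asset, reachable, now):
    # Hypothetical stand-in for the database row UPDATE.
    asset["is_reachable"] = reachable
    asset["last_reachability_check"] = now


# In a real worker this would be registered roughly as:
#   @app.task(soft_time_limit=60, time_limit=90)
def revalidate_asset(asset, probe, now):
    try:
        # The whole body sits inside the try, since the soft signal
        # may arrive at any point, not only during the probe.
        record(asset, probe(asset["uri"]), now)
    except SoftTimeLimitExceeded:
        # Assumption: treat the timeout like an HTTP timeout (unreachable).
        record(asset, False, now)
    return asset["is_reachable"]
```

Injecting `probe` keeps the sketch testable: a callable that raises `SoftTimeLimitExceeded` simulates a hung probe without waiting for a real timer.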
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/anthias_server/celery_tasks.py | Adds soft/hard time limits and catches SoftTimeLimitExceeded in asset revalidation tasks. |
| tests/test_celery_tasks.py | Adds tests validating time-limit configuration and soft-limit handling behavior. |
… catch

- The soft signal is delivered asynchronously, so it can land during the row UPDATE as well as the probe; cover the whole task body in both the on-demand recheck and the sweep
- Re-raise SoftTimeLimitExceeded past the sweep's blanket per-asset handler so the outer abort path sees it
- Satisfy strict mypy on the Optional time-limit comparisons; reword a misleading test comment

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
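The re-raise described in this commit can be sketched as follows. `SoftTimeLimitExceeded` subclasses `Exception`, so a blanket per-asset `except Exception` would swallow it unless it is caught and re-raised first. The `sweep` signature, the return strings, and the use of `threading.Lock` in place of the real singleton lock are all assumptions made for illustration.

```python
import threading

try:
    from celery.exceptions import SoftTimeLimitExceeded
except ImportError:
    class SoftTimeLimitExceeded(Exception):
        """Stand-in so the sketch runs without Celery installed."""


def sweep(assets, check_one, lock):
    # Hypothetical singleton guard: skip if another sweep holds the lock.
    if not lock.acquire(blocking=False):
        return "skipped"
    try:
        for asset in assets:
            try:
                check_one(asset)
            except SoftTimeLimitExceeded:
                # Must escape the blanket handler below.
                raise
            except Exception:
                # One bad asset should not stop the whole sweep.
                continue
        return "done"
    except SoftTimeLimitExceeded:
        # Outer abort path: stop cleanly before the hard limit fires.
        return "aborted"
    finally:
        # Runs on every exit path, so the lock is always released.
        lock.release()
```

Placing the release in `finally` means the abort path and the normal path share the same cleanup, which is the property the lock-release test in the PR is described as checking.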
vpetersson added a commit that referenced this pull request Jun 9, 2026
- CalVer (YYYY.0M.MICRO); still June 2026, micro 2 -> 3
- Gives Sentry a real release boundary: every build since 2026.6.2 reported the same base version (only the +git-hash differed), so resolved-in-next-release never stuck and fixed issues kept reopening on the next event. A version bump lets the deployed fixes actually clear from the board.
- Ships the crash/noise fixes merged since 2026.6.2: SQLite WAL + busy timeout (#3015), celery migration-gate (#3016) and asset-probe soft limits (#3017), transient-redis/CancelledError Sentry filtering + redis healthcheck (#3018/#3028), GitHub update-check log level (#3019), webview respawn on D-Bus death at setup and mid-play (#3020/#3031), resilient static-file scan (#3026), Wayland-socket wait (#3030), and Sentry release/board triage tags (#3021/#3025)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>



Issues Fixed
Sentry: ANTHIAS-A (`Hard time limit (30s) exceeded for revalidate_asset_url`), ANTHIAS-9 (`TimeLimitExceeded`), ANTHIAS-B (`ForkPoolWorker exited with signal 9 (SIGKILL)`). All three are the same hard-kill.

Description
`revalidate_asset_url`'s 30s hard time limit was reachable by a legitimately slow probe: `url_fails` can burn a hanging `getaddrinfo` against a broken resolver (no timeout knob exists for it), then an HTTP HEAD (10s) plus the GET fallback (10s). Tripping the hard limit SIGKILLs the pool child, which surfaces as three separate Sentry issues per occurrence.

- `soft_time_limit=60` / `time_limit=90` on the on-demand probe: the soft limit raises `SoftTimeLimitExceeded` inside the task, which now records the same verdict an HTTP timeout gets (unreachable, `last_reachability_check` stamped) instead of dying

Checklist
🤖 Generated with Claude Code