fix(server): survive unreadable static files instead of crash-looping #3026

Merged
vpetersson merged 3 commits into master from fix/whitenoise-resilient-scan on Jun 7, 2026

@vpetersson

Issues Fixed

Sentry ANTHIAS-Y: OSError: [Errno 117] Structure needs cleaning in WhiteNoise's startup scan; 400+ events from a single crash-looping pi3.

Description

Timeline correlation confirmed this surfaced with the 2026.6.2+rev1 OTA (fleet release 14:26 UTC → first event 15:40 UTC): the deploy rewrote the staticfiles image layer onto a device whose ext4 metadata is corrupted, and the first read of admin/js/vendor/select2/i18n/gl.js returned EUCLEAN. The file is stock Django-admin vendor JS — the corruption is the device's storage, not the release's code.

The failure mode is ours though: stock WhiteNoise lets the scan exception propagate out of ASGI import, so one unreadable vendor file kills uvicorn entirely — no API, no WebSocket, no asset serving. A signage appliance should degrade, not brick.

  • ResilientWhiteNoiseMiddleware mirrors whitenoise 6.x's update_files_dictionary/scantree (both stable one-screen helpers) with per-entry OSError tolerance
  • Skipped entries are collected and logged as one ERROR per startup (with up to 3 example paths) — the storage fault still lands in Sentry once per boot, actionable without flooding
  • Minimal stubs/whitenoise-stubs/ (same pattern as channels-stubs) so the subclass passes strict mypy
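
To make the per-entry tolerance concrete, here is a minimal standalone sketch of the idea, not the code merged in this PR. The names `tolerant_scantree` and `scan_static_root` are hypothetical, and the sketch deliberately does not subclass WhiteNoise; it only shows skipping what the filesystem refuses and emitting a single aggregated ERROR with up to 3 example paths.

```python
import logging
import os
from collections.abc import Iterator

logger = logging.getLogger(__name__)


def tolerant_scantree(
    root: str, skipped: list[str]
) -> Iterator[tuple[str, os.stat_result]]:
    """Yield (path, stat) for every readable file under root.

    Entries that raise OSError (EUCLEAN, EIO, EACCES, ...) are recorded
    in `skipped` and the walk continues with the next entry.
    """
    try:
        # The with-block closes the directory file descriptor promptly.
        with os.scandir(root) as entries:
            for entry in entries:
                try:
                    if entry.is_dir():
                        yield from tolerant_scantree(entry.path, skipped)
                    else:
                        yield entry.path, entry.stat()
                except OSError:
                    skipped.append(entry.path)
    except OSError:
        # The directory itself could not be opened or iterated.
        skipped.append(root)


def scan_static_root(root: str, prefix: str) -> dict[str, str]:
    """Map URLs to file paths, logging one ERROR if anything was skipped."""
    skipped: list[str] = []
    files: dict[str, str] = {}
    for path, _stat in tolerant_scantree(root, skipped):
        relative_url = os.path.relpath(path, root).replace(os.sep, "/")
        files[prefix + relative_url] = path
    if skipped:
        # One aggregated record per scan, with at most 3 example paths.
        logger.error(
            "Skipped %d unreadable static entries under %s (e.g. %s)",
            len(skipped),
            root,
            ", ".join(skipped[:3]),
        )
    return files
```

Collecting the skipped paths and logging once after the walk is what keeps a badly damaged tree from producing one Sentry event per file.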

The affected device itself still needs ops attention (fsck/reflash) — event geo says Jonzac, France.

Checklist

  • I have performed a self-review of my own code.
  • New and existing unit tests pass locally and on CI with my changes.
  • I have done an end-to-end test for Raspberry Pi devices.
  • I have tested my changes for x86 devices.
  • I have added documentation for the changes I have made (when necessary).

🤖 Generated with Claude Code

- A balena OTA rewrote the staticfiles layer onto a device with
  corrupted ext4 metadata; WhiteNoise's one-shot startup scan raised
  OSError 117 (Structure needs cleaning) at ASGI import and uvicorn
  crash-looped, bricking the device over one unreadable Django-admin
  vendor file (Sentry ANTHIAS-Y, 400+ events from one device)
- Subclass the middleware with a per-entry fault-tolerant scan: skip
  what the filesystem refuses, serve the rest, and emit one ERROR per
  startup so the storage fault still reaches Sentry once per boot
- Add a minimal whitenoise stub (channels-stubs pattern) so the
  subclass type-checks under strict mypy

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vpetersson vpetersson requested a review from a team as a code owner June 7, 2026 18:24
@vpetersson vpetersson self-assigned this Jun 7, 2026
@vpetersson vpetersson requested a review from Copilot June 7, 2026 18:24

Copilot AI left a comment

Pull request overview

This PR introduces a fault-tolerant WhiteNoise middleware variant so the server can start and continue serving API/WebSocket traffic even if one or more static files/directories under STATIC_ROOT are unreadable (e.g., ext4 EUCLEAN), avoiding uvicorn crash-loops during ASGI import.

Changes:

  • Added ResilientWhiteNoiseMiddleware that tolerates per-entry OSError during WhiteNoise’s startup scan and logs a single aggregated error per boot.
  • Switched Django MIDDLEWARE to use the resilient subclass instead of whitenoise.middleware.WhiteNoiseMiddleware.
  • Added regression tests and minimal local mypy stubs for whitenoise.middleware.
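
For orientation, the middleware switch would look roughly like the fragment below. The dotted path is inferred from the file location src/anthias_server/lib/whitenoise.py and is an assumption, as is the surrounding list; the actual settings.py is not shown in this conversation.

```python
# settings.py (sketch). The dotted path is inferred from the file location
# src/anthias_server/lib/whitenoise.py and is an assumption.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # was: "whitenoise.middleware.WhiteNoiseMiddleware",
    "anthias_server.lib.whitenoise.ResilientWhiteNoiseMiddleware",
    # ... remaining middleware unchanged ...
]
```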

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_whitenoise_resilient.py | Adds regression tests ensuring unreadable static entries don’t crash startup and are logged once. |
| stubs/whitenoise-stubs/py.typed | Marks the local stub package as PEP 561 typed. |
| stubs/whitenoise-stubs/middleware.pyi | Provides minimal type surface for WhiteNoiseMiddleware used by the subclass under strict mypy. |
| stubs/whitenoise-stubs/__init__.pyi | Declares the stub package module root. |
| src/anthias_server/lib/whitenoise.py | Implements the resilient startup scan and aggregated error logging. |
| src/anthias_server/django_project/settings.py | Replaces the middleware entry to use ResilientWhiteNoiseMiddleware. |

Comment thread src/anthias_server/lib/whitenoise.py Outdated
Comment thread src/anthias_server/lib/whitenoise.py
Comment thread src/anthias_server/lib/whitenoise.py Outdated
Comment thread tests/test_whitenoise_resilient.py Outdated
- Normalise root with a trailing separator before slicing so a root
  without one can't yield a '/static//css/app.css' double-slash URL
  that fails to match requests (this was also the CI test failure)
- Use a module-level logger instead of the root logger, per the
  codebase convention
- Tests now assert the full canonical /static/ URLs so a URL-join
  regression can't slip through

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
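
The double-slash problem described in this commit can be illustrated with a small sketch. `static_url` is a hypothetical helper, not a function from the PR or from WhiteNoise, and the example assumes POSIX paths.

```python
import os


def static_url(root: str, prefix: str, path: str) -> str:
    """Build the URL for `path` under `root` (POSIX paths assumed)."""
    # os.path.join(root, "") guarantees exactly one trailing separator,
    # so the slice below removes the separator along with the root.
    root = os.path.join(root, "")
    relative = path[len(root):].replace("\\", "/")
    return prefix + relative


# Both spellings of the root now produce the same canonical URL.
print(static_url("/srv/static", "/static/", "/srv/static/css/app.css"))
print(static_url("/srv/static/", "/static/", "/srv/static/css/app.css"))
```

Without the normalisation step, a root of `/srv/static` would leave a leading `/` on the relative part and yield `/static//css/app.css`, which no request path would match.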

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 6 changed files in this pull request and generated 5 comments.

Comment thread src/anthias_server/lib/whitenoise.py Outdated
Comment thread tests/test_whitenoise_resilient.py Outdated
Comment thread tests/test_whitenoise_resilient.py
Comment thread stubs/whitenoise-stubs/middleware.pyi
Comment thread stubs/whitenoise-stubs/middleware.pyi
…esponses

- Iterate os.scandir under a with-block so directory FDs close
  promptly on large/deep trees
- Drive the middleware tests through update_files_dictionary directly
  (the test settings enable WHITENOISE_AUTOREFRESH, so __init__ never
  scans) and assert the full canonical /static/ URLs — this is also
  what the prior CI failure was telling us
- get_response returns a real HttpResponse per Django's contract;
  tighten the stub's get_response to HttpResponseBase (no None) and
  mark py.typed partial, matching channels-stubs

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
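
One way a regression test can reproduce the storage fault without a corrupted disk is to inject the error through a patched `os.stat`. The sketch below is illustrative only: `scan` is a hypothetical stand-in for the code under test, not the PR's `update_files_dictionary`, and the errno falls back to the literal 117 from the Sentry report on platforms where `errno.EUCLEAN` is not defined.

```python
import errno
import os
import tempfile
from unittest import mock

# EUCLEAN is Linux-specific; fall back to the value seen in the report.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def scan(root: str, skipped: list[str]) -> list[str]:
    """Hypothetical stand-in for the scan under test."""
    found: list[str] = []
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                os.stat(entry.path)
                found.append(entry.name)
            except OSError:
                skipped.append(entry.name)
    return sorted(found)


def test_scan_survives_euclean() -> None:
    with tempfile.TemporaryDirectory() as root:
        for name in ("ok.js", "gl.js"):
            open(os.path.join(root, name), "w").close()

        real_stat = os.stat

        def flaky_stat(path, *args, **kwargs):
            # Simulate corrupted metadata for exactly one file.
            if str(path).endswith("gl.js"):
                raise OSError(EUCLEAN, "Structure needs cleaning")
            return real_stat(path, *args, **kwargs)

        skipped: list[str] = []
        with mock.patch("os.stat", side_effect=flaky_stat):
            found = scan(root, skipped)

    # The readable file is kept and the faulty one is skipped, not fatal.
    assert found == ["ok.js"]
    assert skipped == ["gl.js"]
```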
@sonarqubecloud

sonarqubecloud Bot commented Jun 7, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated no new comments.

@vpetersson vpetersson merged commit 757feae into master Jun 7, 2026
10 checks passed
vpetersson added a commit that referenced this pull request Jun 9, 2026
- CalVer (YYYY.0M.MICRO); still June 2026, micro 2 -> 3
- Gives Sentry a real release boundary: every build since 2026.6.2
  reported the same base version (only the +git-hash differed), so
  resolved-in-next-release never stuck and fixed issues kept
  reopening on the next event. A version bump lets the deployed
  fixes actually clear from the board.
- Ships the crash/noise fixes merged since 2026.6.2: SQLite WAL +
  busy timeout (#3015), celery migration-gate (#3016) and
  asset-probe soft limits (#3017), transient-redis/CancelledError
  Sentry filtering + redis healthcheck (#3018/#3028), GitHub
  update-check log level (#3019), webview respawn on D-Bus death at
  setup and mid-play (#3020/#3031), resilient static-file scan
  (#3026), Wayland-socket wait (#3030), and Sentry release/board
  triage tags (#3021/#3025)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>