fix(server): survive unreadable static files instead of crash-looping #3026
Merged
- A balena OTA rewrote the staticfiles layer onto a device with corrupted ext4 metadata; WhiteNoise's one-shot startup scan raised OSError 117 (Structure needs cleaning) at ASGI import and uvicorn crash-looped, bricking the device over one unreadable Django-admin vendor file (Sentry ANTHIAS-Y, 400+ events from one device)
- Subclass the middleware with a per-entry fault-tolerant scan: skip what the filesystem refuses, serve the rest, and emit one ERROR per startup so the storage fault still reaches Sentry once per boot
- Add a minimal whitenoise stub (channels-stubs pattern) so the subclass type-checks under strict mypy

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces a fault-tolerant WhiteNoise middleware variant so the server can start and continue serving API/WebSocket traffic even if one or more static files/directories under STATIC_ROOT are unreadable (e.g., ext4 EUCLEAN), avoiding uvicorn crash-loops during ASGI import.
Changes:
- Added `ResilientWhiteNoiseMiddleware` that tolerates per-entry `OSError` during WhiteNoise's startup scan and logs a single aggregated error per boot.
- Switched Django `MIDDLEWARE` to use the resilient subclass instead of `whitenoise.middleware.WhiteNoiseMiddleware`.
- Added regression tests and minimal local mypy stubs for `whitenoise.middleware`.
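The per-entry tolerance described above can be sketched with the standard library alone. This is a minimal illustration, not the code merged in this PR: `tolerant_scantree` and `scan_static_root` are hypothetical names, and the real subclass hooks into WhiteNoise's own scan rather than returning a list.

```python
import logging
import os
from collections.abc import Iterator

logger = logging.getLogger(__name__)  # module-level logger, not the root logger


def tolerant_scantree(root: str, skipped: list[str]) -> Iterator[str]:
    """Yield the path of every readable file under ``root``.

    Directories or entries the filesystem refuses to list or stat are
    appended to ``skipped`` instead of aborting the whole walk.
    """
    try:
        # The with-block closes the directory file descriptor promptly.
        with os.scandir(root) as it:
            entries = list(it)
    except OSError as exc:
        skipped.append(f"{root}: {exc}")
        return
    for entry in entries:
        try:
            is_dir = entry.is_dir()
            if not is_dir:
                entry.stat()  # force the metadata read that a bad inode would fail
        except OSError as exc:
            skipped.append(f"{entry.path}: {exc}")
            continue
        if is_dir:
            yield from tolerant_scantree(entry.path, skipped)
        else:
            yield entry.path


def scan_static_root(root: str) -> list[str]:
    """Return all readable files and log one aggregated ERROR if any were skipped."""
    skipped: list[str] = []
    files = sorted(tolerant_scantree(root, skipped))
    if skipped:
        logger.error(
            "Skipped %d unreadable static entries under %s (first: %s)",
            len(skipped), root, skipped[0],
        )
    return files
```

Collecting failures in a list and logging once after the walk keeps the error count per boot at one, however many entries are unreadable.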
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_whitenoise_resilient.py | Adds regression tests ensuring unreadable static entries don’t crash startup and are logged once. |
| stubs/whitenoise-stubs/py.typed | Marks the local stub package as PEP 561 typed. |
| stubs/whitenoise-stubs/middleware.pyi | Provides minimal type surface for WhiteNoiseMiddleware used by the subclass under strict mypy. |
| stubs/whitenoise-stubs/__init__.pyi | Declares the stub package module root. |
| src/anthias_server/lib/whitenoise.py | Implements the resilient startup scan and aggregated error logging. |
| src/anthias_server/django_project/settings.py | Replaces the middleware entry to use ResilientWhiteNoiseMiddleware. |
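
For orientation, the settings change might look roughly like the fragment below. The dotted path is an assumption inferred from the file location `src/anthias_server/lib/whitenoise.py`, and the surrounding entries are generic Django defaults rather than the project's actual list.

```python
# settings.py (sketch; neighbouring entries are illustrative Django defaults)
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # Assumed import path, derived from src/anthias_server/lib/whitenoise.py.
    # It takes the slot previously held by
    # "whitenoise.middleware.WhiteNoiseMiddleware".
    "anthias_server.lib.whitenoise.ResilientWhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
]
```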
- Normalise root with a trailing separator before slicing so a root without one can't yield a '/static//css/app.css' double-slash URL that fails to match requests (this was also the CI test failure)
- Use a module-level logger instead of the root logger, per the codebase convention
- Tests now assert the full canonical /static/ URLs so a URL-join regression can't slip through

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
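
The double-slash failure mode is easy to reproduce in isolation. The sketch below uses a hypothetical helper, `url_for`, to show why the root is normalised before slicing; it is not the function from the PR.

```python
import os


def url_for(path: str, root: str, prefix: str = "/static/") -> str:
    """Map a file path under ``root`` to its static URL."""
    # os.path.join(root, "") appends a separator only when one is missing,
    # so "/srv/static" and "/srv/static/" slice identically below.
    root = os.path.join(root, "")
    relative = path[len(root):]
    return prefix + relative.replace(os.sep, "/")


def naive_url_for(path: str, root: str, prefix: str = "/static/") -> str:
    """The buggy variant: slices with whatever root it was given."""
    return prefix + path[len(root):].replace(os.sep, "/")
```

With `root="/srv/static"` the naive variant keeps the leading separator of the relative part and produces `/static//css/app.css`, which never matches a request for `/static/css/app.css`.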
…esponses

- Iterate os.scandir under a with-block so directory FDs close promptly on large/deep trees
- Drive the middleware tests through update_files_dictionary directly (the test settings enable WHITENOISE_AUTOREFRESH, so __init__ never scans) and assert the full canonical /static/ URLs; this is also what the prior CI failure was telling us
- get_response returns a real HttpResponse per Django's contract; tighten the stub's get_response to HttpResponseBase (no None) and mark py.typed partial, matching channels-stubs

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 8, 2026
vpetersson added a commit that referenced this pull request Jun 9, 2026
- CalVer (YYYY.0M.MICRO); still June 2026, micro 2 -> 3
- Gives Sentry a real release boundary: every build since 2026.6.2 reported the same base version (only the +git-hash differed), so resolved-in-next-release never stuck and fixed issues kept reopening on the next event. A version bump lets the deployed fixes actually clear from the board.
- Ships the crash/noise fixes merged since 2026.6.2: SQLite WAL + busy timeout (#3015), celery migration-gate (#3016) and asset-probe soft limits (#3017), transient-redis/CancelledError Sentry filtering + redis healthcheck (#3018/#3028), GitHub update-check log level (#3019), webview respawn on D-Bus death at setup and mid-play (#3020/#3031), resilient static-file scan (#3026), Wayland-socket wait (#3030), and Sentry release/board triage tags (#3021/#3025)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>



Issues Fixed
Sentry ANTHIAS-Y: `OSError: [Errno 117] Structure needs cleaning` in WhiteNoise's startup scan; 400+ events from a single crash-looping pi3.

Description
Timeline correlation confirmed this surfaced with the `2026.6.2+rev1` OTA (fleet release 14:26 UTC → first event 15:40 UTC): the deploy rewrote the staticfiles image layer onto a device whose ext4 metadata is corrupted, and the first read of `admin/js/vendor/select2/i18n/gl.js` returned EUCLEAN. The file is stock Django-admin vendor JS; the corruption is the device's storage, not the release's code.

The failure mode is ours though: stock WhiteNoise lets the scan exception propagate out of ASGI import, so one unreadable vendor file kills uvicorn entirely: no API, no WebSocket, no asset serving. A signage appliance should degrade, not brick.

- `ResilientWhiteNoiseMiddleware` mirrors whitenoise 6.x's `update_files_dictionary`/`scantree` (both stable one-screen helpers) with per-entry `OSError` tolerance
- `stubs/whitenoise-stubs/` (same pattern as channels-stubs) so the subclass passes strict mypy

The affected device itself still needs ops attention (fsck/reflash); event geo says Jonzac, France.
Checklist
🤖 Generated with Claude Code