
fix(webview): add page-load watchdog so a stalled fetch can't freeze the display #3003

Merged
vpetersson merged 1 commit into master from fix-webpage-load-watchdog
Jun 7, 2026


@vpetersson

Contributor

Issues Fixed

Fixes #2999

Description

QtWebEngine has no overall navigation timeout. A webpage fetch interrupted mid-flight (WiFi AP dropout — the TCP socket stays ESTABLISHED, no FIN/RST ever arrives) keeps the navigation pending forever: loadFinished never fires, the dual-view swap never happens, and the screen freezes on the previous asset until the container is restarted. The Python scheduler keeps cycling normally (loadPage is fire-and-forget D-Bus), which is exactly the symptom set reported in the issue. Worse, view_webpage() only re-sends loadPage when the URL changes, so a single-webpage playlist can never recover even after connectivity returns.

This adds the missing safety net on the C++ side, analogous to the old VIDEO_TIMEOUT for stalled videos:

  • A single-shot page-load watchdog armed per load attempt (default 30 s; tunable via ANTHIAS_WEBPAGE_TIMEOUT_S on the viewer container, clamped to 5–3600 s).
  • On timeout it stop()s the wedged navigation — cancelling the pending network I/O so dead sockets don't accumulate in Chromium's connection pool — and retries the same URI on a fresh request.
  • Fast failures (DNS / connection refused while the network is down) reuse the watchdog as a paced retry tick, so single-webpage playlists self-heal too.
  • The watchdog is disarmed by every path that supersedes the pending load (successful swap, loadImage, playVideo); the existing generation-ID guard keeps stale retries from racing newer assets.
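
The lifecycle in the bullets above can be modelled without Qt. What follows is a minimal sketch, not the code in view.cpp: the class `WatchdogModel`, its method names, and the integer clock are hypothetical stand-ins for the real QTimer and QtWebEngine view, and the assumption that every new asset bumps the generation ID is mine.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Toy, Qt-free model of the page-load watchdog. Every name here is
// hypothetical; time is a plain integer number of seconds.
class WatchdogModel {
public:
    explicit WatchdogModel(int timeoutSeconds) : timeoutS_(timeoutSeconds) {
        assert(timeoutSeconds > 0);
    }

    // A new webpage asset arrives: bump the generation, arm the watchdog.
    void loadPage(std::string uri, int now) {
        ++generation_;
        uri_ = std::move(uri);
        startAttempt(now);
    }

    // Rotating to an image or video supersedes the pending load.
    void showOtherAsset() {
        ++generation_;
        armed_ = false;  // disarm
    }

    // loadFinished for the attempt that was tagged with `generation`.
    void loadFinished(std::uint64_t generation, bool ok) {
        if (generation != generation_) return;  // stale: a newer asset won
        if (ok) {
            armed_ = false;  // disarm, then swap views
            ++swaps_;
        }
        // On a fast failure the watchdog stays armed, so its expiry
        // doubles as the paced retry tick.
    }

    // Advance the clock; on expiry, cancel and retry the same URI.
    void tick(int now) {
        if (armed_ && now >= deadline_) {
            startAttempt(now);  // stands in for stop() plus a fresh load
        }
    }

    bool armed() const { return armed_; }
    int attempts() const { return attempts_; }
    int swaps() const { return swaps_; }
    std::uint64_t generation() const { return generation_; }

private:
    // Each attempt re-arms the single-shot deadline.
    void startAttempt(int now) {
        ++attempts_;
        deadline_ = now + timeoutS_;
        armed_ = true;
    }

    int timeoutS_;
    int deadline_ = 0;
    bool armed_ = false;
    int attempts_ = 0;
    int swaps_ = 0;
    std::uint64_t generation_ = 0;
    std::string uri_;
};
```

In this model a stalled load is retried once the deadline passes, a successful retry disarms the watchdog, and a `loadFinished` carrying an old generation is ignored, which mirrors the behaviour claimed above for rotating to an image.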

Repro/validation (x86 viewer image on the dev host, webview built from this branch, offscreen QPA, driven over D-Bus like the real viewer; tarpit HTTP server hangs the first /stall-once fetch mid-body and serves it normally on retry):

  • Unpatched: stalled load shows no loadFinished for 90 s+, screen frozen on previous asset, no recovery path.
  • Patched (10 s watchdog for the test): Webpage load did not finish within 10 seconds — cancelling and retrying → retry succeeds → Switching to next web view, with no further D-Bus traffic; healthy pages never trip the watchdog; rotating to an image disarms it (no stale retry).

Also verified the Qt5 (armv7) path still builds: tools.image_builder --build-target pi2 --service viewer succeeds and the resulting binary contains the watchdog. Existing webview QtTest suite passes 9/9.

Checklist

  • I have performed a self-review of my own code.
  • New and existing unit tests pass locally and on CI with my changes.
  • I have done an end-to-end test for Raspberry Pi devices.
  • I have tested my changes for x86 devices.
  • I have added documentation for the changes I have made (when necessary).

🤖 Generated with Claude Code

…the display

- Chromium has no overall navigation timeout: a fetch whose packets
  stop arriving mid-load (WiFi dropout, no FIN/RST) keeps loadFinished
  from ever firing, so the dual-view swap never happens and the screen
  freezes on the previous asset until the container is restarted
- Arm a single-shot QTimer per load attempt (default 30s, tunable via
  ANTHIAS_WEBPAGE_TIMEOUT_S, clamped to 5-3600s)
- On timeout, stop() the wedged navigation (cancelling its pending
  network I/O so dead sockets don't pile up in the connection pool)
  and retry the same URI on a fresh request
- Failed-fast loads (DNS / connection refused) reuse the watchdog as
  a paced retry tick - the viewer only re-sends loadPage when the URL
  changes, so a single-webpage playlist gets no other retry
- Disarm the watchdog on every path that supersedes the pending load
  (success, loadImage, playVideo)

Fixes #2999

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vpetersson vpetersson requested a review from a team as a code owner June 6, 2026 20:35
@vpetersson vpetersson self-assigned this Jun 6, 2026
@vpetersson vpetersson requested a review from Copilot June 6, 2026 20:35
@sonarqubecloud

sonarqubecloud Bot commented Jun 6, 2026


Copilot AI left a comment


Pull request overview

Adds a C++/Qt-side navigation watchdog to AnthiasWebview so stalled QtWebEngine page loads don’t block the dual-view swap indefinitely (fixing the “display freezes on prior asset after network dropout” failure mode described in #2999).

Changes:

  • Introduces a single-shot QTimer page-load watchdog with a configurable timeout (ANTHIAS_WEBPAGE_TIMEOUT_S, clamped to 5–3600s; default 30s).
  • Refactors webpage loading into startPageLoad() to (re)arm the watchdog per attempt, cancel in-flight navigations safely, and attach a one-shot loadFinished handler guarded by generation ID.
  • On watchdog expiry, cancels the wedged navigation and retries the same URI; stops/clears the watchdog when rotating away to image/video.
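
The timeout parsing and clamping summarised above might look roughly like the sketch below. It uses only the C++ standard library rather than Qt, and the function name `parseTimeoutSeconds`, the constant names, and the fallback to the default on empty or non-numeric input are assumptions for illustration. Only the 30 s default, the 5–3600 s clamp, and the variable name ANTHIAS_WEBPAGE_TIMEOUT_S come from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

constexpr long kDefaultTimeoutS = 30;   // default from the PR description
constexpr long kMinTimeoutS = 5;        // lower clamp bound
constexpr long kMaxTimeoutS = 3600;     // upper clamp bound

// Hypothetical helper: turn a raw environment value into a timeout in
// seconds. Unset, empty, or non-numeric input falls back to the default
// (an assumption); numeric input is clamped to [5, 3600].
int parseTimeoutSeconds(const char* raw) {
    if (raw == nullptr || *raw == '\0') {
        return static_cast<int>(kDefaultTimeoutS);
    }
    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    if (end == raw || *end != '\0') {
        return static_cast<int>(kDefaultTimeoutS);  // not a clean integer
    }
    const long clamped = std::max(kMinTimeoutS, std::min(value, kMaxTimeoutS));
    assert(clamped >= kMinTimeoutS && clamped <= kMaxTimeoutS);
    return static_cast<int>(clamped);
}

// Reads the variable named in the PR; returns the clamped timeout.
int webpageTimeoutSeconds() {
    return parseTimeoutSeconds(std::getenv("ANTHIAS_WEBPAGE_TIMEOUT_S"));
}
```

Clamping instead of rejecting out-of-range values keeps a misconfigured container from either disabling the watchdog or firing it before a slow page has a realistic chance to load.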

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/anthias_webview/src/view.h | Adds watchdog-related members and new helper methods (startPageLoad, handlePageLoadTimeout) to support safe retries. |
| src/anthias_webview/src/view.cpp | Implements timeout parsing/clamping, watchdog lifecycle, refactored page-load flow, and timeout-triggered cancel+retry behavior. |


@vpetersson vpetersson merged commit 6f01e0a into master Jun 7, 2026
8 checks passed
@vpetersson vpetersson mentioned this pull request Jun 7, 2026
5 tasks
vpetersson added a commit that referenced this pull request Jun 7, 2026
- CalVer (YYYY.0M.MICRO); still June 2026, micro 1 -> 2
- Ships the Qt 6 video audio fix (#3001) — PulseAudio in the viewer
  container; videos were silent on pi4-64/pi5/x86/arm64 since the
  QtMultimedia migration
- Adds the arm64/Qt6 pi3-64 board and the Rock Pi 4 fleet (#2985)
- Page-load watchdog so a stalled fetch can't freeze the display
  (#3003), Sentry error tracking for the Django services (#3007)
- Redis data persisted to the mounted volume so device identity
  survives recreation (#2983); unpinner also rolls OS + supervisor
  updates (#2984)
- Streamed backup downloads (#3005), 12-hour AM/PM asset times
  (#3002), BuildKit frontend via mirror.gcr.io (#3008)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[BUG] x86: Renderer hangs indefinitely on network interruption mid-page-load — no webpage load timeout
