fix: resolve prefetch timeout and retry path failures (#67) #75

Merged
danmunz merged 1 commit into main from fix/prefetch-timeout on Jun 4, 2026

@danmunz danmunz commented Jun 4, 2026

Problem

Nitrowolf's 525-artwork collection gets ~150 cached thumbnails, but the remaining ~375 always show "Tap to Retry". v1.3.0 addressed connection management, but the overnight prefetch still fails 100% of the time.

Root Causes Identified

  1. _PREFETCH_TIMEOUT = 5 too short — The D2D socket protocol (WS request → TV processing → D2D connection info → TCP socket → file transfer) needs ~10-18s. The 5s timeout caused every prefetch attempt to fail immediately. The ~150 cached thumbnails came from get_thumbnail_list (batch call, 18s timeout) — the individual prefetch path has likely never succeeded.

  2. "Tap to retry" used batch endpoint: showThumbError → loadThumbnailBatch([cid]) → POST /api/thumbnails, subject to circuit breaker. Should use GET /api/thumbnail/{cid} (18s timeout, 3 retries, no circuit breaker).

  3. Invisible logging — All prefetch failures logged at DEBUG level, invisible at default INFO. Zero visibility into what's happening.

  4. Startup retry chain breaks permanently — Auto-retry only fired for source="startup", so user-triggered prefetches never scheduled retries. The retry also silently skipped if another prefetch was running.

  5. retryAllThumbnails race — Fire-and-forget /thumbnails/retry + simultaneous loadThumbnailBatch → first batch immediately re-trips circuit breaker that was just reset.
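
Root cause 3 can be reproduced with the standard library alone. The following is a minimal sketch, not the code in server.py: the logger name and message text are hypothetical, and it only shows that a record logged at DEBUG is dropped when the effective level is INFO, while the same message at WARNING gets through.

```python
import io
import logging


def capture(level: int) -> str:
    """Log one DEBUG and one WARNING record; return what a logger set to `level` emits."""
    stream = io.StringIO()
    # Hypothetical logger name, unique per level so the runs do not interfere.
    logger = logging.getLogger(f"prefetch_demo_{level}")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(level)

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)

    # The same failure message at the old level (DEBUG) and the new one (WARNING).
    logger.debug("prefetch failed for cid=%s", "demo")
    logger.warning("prefetch failed for cid=%s", "demo")
    return stream.getvalue()


at_info = capture(logging.INFO)    # default deployment level per the PR text
at_debug = capture(logging.DEBUG)  # what a developer sees with verbose logging
print(at_info, end="")
```

At INFO only the WARNING line survives, which matches the "zero visibility" symptom described above.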

Changes

| File | Change |
|---|---|
| server.py | Remove _PREFETCH_TIMEOUT; use default 18s timeout |
| server.py | Prefetch logging: failures→WARNING, successes→INFO, zero-cached→WARNING |
| server.py | Auto-retry fires regardless of source, waits for in-progress prefetch |
| server.py | Add GET /api/thumbnails/diag: cache count, prefetch state, circuit breaker status |
| index.html | "Tap to retry" → retrySingleThumb(cid) using GET /api/thumbnail/{cid} |
| index.html | retryAllThumbnails → async, await retry before polling, set fallback mode |
| tests/ | 5 new tests: timeout removal, constant absence, diag endpoint fields/cache/breaker |
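
As an illustration of what the diag endpoint could return, here is a hedged sketch of a payload builder. The PR only states that the response covers cache count, prefetch state, and circuit breaker status; every name below (DiagSnapshot, build_diag, the field names, the threshold parameter) is an assumption made for the example and may not match server.py.

```python
from dataclasses import asdict, dataclass


@dataclass
class DiagSnapshot:
    """Hypothetical shape for a diagnostics response; field names are assumed."""
    cached_count: int
    prefetch_running: bool
    breaker_open: bool


def build_diag(cache: dict, prefetch_running: bool,
               breaker_failures: int, breaker_threshold: int) -> dict:
    """Build a JSON-serialisable snapshot from in-memory state only (no TV connection)."""
    snapshot = DiagSnapshot(
        cached_count=len(cache),
        prefetch_running=prefetch_running,
        # Assumed rule: the breaker is open once failures reach the threshold.
        breaker_open=breaker_failures >= breaker_threshold,
    )
    return asdict(snapshot)


example = build_diag({"cid-1": b"jpeg-bytes"}, prefetch_running=False,
                     breaker_failures=5, breaker_threshold=5)
print(example)
```

Because the snapshot reads only local state, it stays available even when the TV is unreachable, which is the property the table row calls out.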

Tradeoff

Worst-case lock hold before abort increases from 55s to 120s. This is acceptable because the previous 5s timeout caused a 0% success rate on the D2D transfer protocol.

Testing

PYTHONPATH=. uv run pytest -x -q
120 passed, 6 xfailed

Known Limitations

  • If the SSL EOF error (UNEXPECTED_EOF_WHILE_READING) affects individual get_thumbnail() calls too (not just get_thumbnail_list), then the timeout fix alone won't resolve Nitrowolf's issue. The logging upgrade will make this visible.
  • retryAllThumbnails still has a timing window: 8 retries × 3s = 24s polling, but 400 thumbnails × 2-18s = much longer prefetch. Each "Retry All" click will show some more thumbnails as they cache.
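
The timing window in the second limitation can be worked through directly. This is a rough sketch using only the figures quoted above; the constant names are hypothetical, and it assumes thumbnails are fetched strictly one after another, which the PR does not state.

```python
# Figures taken from the limitation above; names are illustrative only.
POLL_RETRIES = 8
POLL_INTERVAL_S = 3
REMAINING_THUMBNAILS = 400
PER_THUMBNAIL_S = (2, 18)  # best case, worst case per thumbnail

# How long the UI keeps polling after one "Retry All" click.
polling_window_s = POLL_RETRIES * POLL_INTERVAL_S

# How long a full sequential prefetch takes (best case, worst case).
prefetch_range_s = tuple(REMAINING_THUMBNAILS * t for t in PER_THUMBNAIL_S)

# Thumbnails that can finish inside one polling window (worst case, best case).
thumbs_per_click = tuple(polling_window_s // t for t in reversed(PER_THUMBNAIL_S))

print(polling_window_s, prefetch_range_s, thumbs_per_click)
```

Under these assumptions a single click covers only a small slice of the backlog, which is why each "Retry All" reveals a few more thumbnails rather than all of them.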

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ff5c3a769

Comment thread index.html Outdated
function retryAllThumbnails() {
async function retrySingleThumb(cid) {
try {
const resp = await fetch(BASE + '/thumbnail/' + encodeURIComponent(cid));
[P1] Define the API URL for single-thumbnail retries

When a user clicks a failed thumbnail, this new path evaluates BASE before issuing the request, but there is no BASE definition anywhere in index.html (the existing api() helper hardcodes the /api prefix). In that retry scenario the click handler throws a ReferenceError, so the intended GET /api/thumbnail/{cid} request is never sent and “Tap to retry” remains broken.

- Remove _PREFETCH_TIMEOUT=5 constant — was too short for D2D socket
  protocol, causing 100% prefetch failure. Prefetch now uses the default
  TV_TIMEOUT+8 (18s), matching the working single-thumbnail endpoint.

- Upgrade prefetch logging from DEBUG to WARNING/INFO so failures are
  visible at default log level. Add warning when 0 thumbnails cached.

- Fix startup retry chain — auto-retry now fires regardless of source
  (was startup-only) and waits for in-progress prefetch instead of
  silently abandoning the retry loop.

- Add GET /api/thumbnails/diag endpoint for remote diagnostics (cache
  count, prefetch state, circuit breaker status). No TV connection needed.

- Fix 'Tap to retry' to use GET /api/thumbnail/{cid} (full timeout,
  3 retries, bypasses circuit breaker) instead of the batch endpoint.

- Fix retryAllThumbnails race: await /thumbnails/retry before polling
  to prevent immediately re-tripping the circuit breaker.

Worst-case arithmetic: 5 failures × 18s = 90s lock hold before abort,
vs previous 5 × 5s = 25s — acceptable tradeoff since 5s caused 0%
success rate on the D2D transfer protocol.
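
The arithmetic above, written out as a small check. The constant names are hypothetical; TV_TIMEOUT_S = 10 is inferred from "TV_TIMEOUT+8 (18s)" in the commit message rather than read from the code.

```python
FAILURES_BEFORE_ABORT = 5      # consecutive failures before the prefetch aborts
OLD_PREFETCH_TIMEOUT_S = 5     # the removed _PREFETCH_TIMEOUT
TV_TIMEOUT_S = 10              # inferred: TV_TIMEOUT + 8 == 18
NEW_PREFETCH_TIMEOUT_S = TV_TIMEOUT_S + 8

# Worst case: every attempt runs to its full timeout while the lock is held.
old_worst_case_s = FAILURES_BEFORE_ABORT * OLD_PREFETCH_TIMEOUT_S
new_worst_case_s = FAILURES_BEFORE_ABORT * NEW_PREFETCH_TIMEOUT_S

print(old_worst_case_s, new_worst_case_s)
```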
@danmunz danmunz force-pushed the fix/prefetch-timeout branch from 8ff5c3a to 4650503 Compare June 4, 2026 03:31
@danmunz danmunz merged commit 7818d6e into main Jun 4, 2026