fix: resolve prefetch timeout and retry path failures (#67) #75
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8ff5c3a769
function retryAllThumbnails() {
async function retrySingleThumb(cid) {
  try {
    const resp = await fetch(BASE + '/thumbnail/' + encodeURIComponent(cid));
Define the API URL for single-thumbnail retries
When a user clicks a failed thumbnail, this new path evaluates BASE before issuing the request, but there is no BASE definition anywhere in index.html (the existing api() helper hardcodes the /api prefix). In that retry scenario the click handler throws a ReferenceError, so the intended GET /api/thumbnail/{cid} request is never sent and “Tap to retry” remains broken.
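One possible shape for the fix, sketched under the assumption that the /api prefix hardcoded by the existing api() helper is the right base for this request. API_PREFIX and thumbnailUrl are hypothetical names, not identifiers from index.html, and the response handling is reduced to a boolean because the source does not show it.

```javascript
// Hypothetical constant: mirrors the '/api' prefix that the existing api() helper hardcodes.
const API_PREFIX = '/api';

// Hypothetical helper: builds the single-thumbnail URL and escapes the content ID.
function thumbnailUrl(cid) {
  return API_PREFIX + '/thumbnail/' + encodeURIComponent(cid);
}

// Sketch of the retry path with the undefined BASE replaced by a defined prefix.
// fetchImpl is injectable so the function can be exercised outside a browser.
async function retrySingleThumb(cid, fetchImpl = fetch) {
  try {
    const resp = await fetchImpl(thumbnailUrl(cid));
    return resp.ok;
  } catch (err) {
    // Network failure: report it so the caller can keep "Tap to retry" visible.
    return false;
  }
}
```

With the prefix defined, clicking a failed thumbnail would issue GET /api/thumbnail/{cid} instead of throwing a ReferenceError before the request is sent.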
- Remove _PREFETCH_TIMEOUT=5 constant — was too short for D2D socket
protocol, causing 100% prefetch failure. Prefetch now uses the default
TV_TIMEOUT+8 (18s), matching the working single-thumbnail endpoint.
- Upgrade prefetch logging from DEBUG to WARNING/INFO so failures are
visible at default log level. Add warning when 0 thumbnails cached.
- Fix startup retry chain — auto-retry now fires regardless of source
(was startup-only) and waits for in-progress prefetch instead of
silently abandoning the retry loop.
- Add GET /api/thumbnails/diag endpoint for remote diagnostics (cache
count, prefetch state, circuit breaker status). No TV connection needed.
- Fix 'Tap to retry' to use GET /api/thumbnail/{cid} (full timeout,
3 retries, bypasses circuit breaker) instead of the batch endpoint.
- Fix retryAllThumbnails race: await /thumbnails/retry before polling
to prevent immediately re-tripping the circuit breaker.
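The ordering behind the last fix above can be sketched as follows. This illustrates only the await-before-poll sequence, not the code in index.html: resetBreaker and loadBatch are hypothetical injected stand-ins for the request to /thumbnails/retry and for loadThumbnailBatch.

```javascript
// Sketch: the reset must complete before the first batch request is issued.
// resetBreaker and loadBatch are hypothetical injected functions.
async function retryAllThumbnails(resetBreaker, loadBatch, cids) {
  // Before the fix this call was fire-and-forget, so the first batch could
  // run concurrently with the reset and re-trip the breaker.
  await resetBreaker();
  return loadBatch(cids);
}

// Usage example with fake functions that record the order of events.
async function demo() {
  const events = [];
  const resetBreaker = () =>
    new Promise((resolve) => {
      setTimeout(() => {
        events.push('reset');
        resolve();
      }, 10);
    });
  const loadBatch = async (cids) => {
    events.push('batch:' + cids.length);
  };
  await retryAllThumbnails(resetBreaker, loadBatch, ['a', 'b']);
  return events; // ['reset', 'batch:2']
}
```

Without the await, the batch would be recorded before the delayed reset, which is the race described in the bullet.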
Worst-case arithmetic: 5 failures × 18s = 90s lock hold before abort,
vs previous 5 × 5s = 25s — acceptable tradeoff since 5s caused 0%
success rate on the D2D transfer protocol.
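The two worst-case figures are the number of failures before abort multiplied by the per-attempt timeout. A minimal check of that arithmetic, assuming TV_TIMEOUT is 10s (inferred from "TV_TIMEOUT+8 (18s)"); the helper name is illustrative:

```javascript
// Values taken from the commit message; TV_TIMEOUT = 10 is inferred, not quoted.
const TV_TIMEOUT = 10;
const DEFAULT_TIMEOUT = TV_TIMEOUT + 8; // 18s, the timeout prefetch now uses
const OLD_PREFETCH_TIMEOUT = 5;         // the removed _PREFETCH_TIMEOUT value
const FAILURES_BEFORE_ABORT = 5;

// Hypothetical helper: worst-case seconds the lock is held if every attempt times out.
function worstCaseLockHold(failures, timeoutSeconds) {
  return failures * timeoutSeconds;
}

console.log(worstCaseLockHold(FAILURES_BEFORE_ABORT, DEFAULT_TIMEOUT));      // 90
console.log(worstCaseLockHold(FAILURES_BEFORE_ABORT, OLD_PREFETCH_TIMEOUT)); // 25
```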
8ff5c3a to 4650503
Problem
Nitrowolf's 525-artwork collection gets ~150 cached thumbnails, but the remaining ~375 always show "Tap to Retry". v1.3.0 addressed connection management, but the overnight prefetch still fails 100% of the time.
Root Causes Identified
- _PREFETCH_TIMEOUT = 5 too short: The D2D socket protocol (WS request → TV processing → D2D connection info → TCP socket → file transfer) needs ~10-18s. The 5s timeout caused every prefetch attempt to fail. The ~150 cached thumbnails came from get_thumbnail_list (batch call, 18s timeout); the individual prefetch path has likely never succeeded.
- "Tap to retry" used batch endpoint: showThumbError → loadThumbnailBatch([cid]) → POST /api/thumbnails, subject to circuit breaker. Should use GET /api/thumbnail/{cid} (18s timeout, 3 retries, no circuit breaker).
- Invisible logging: All prefetch failures were logged at DEBUG level, invisible at the default INFO level. Zero visibility into what's happening.
- Startup retry chain breaks permanently: Auto-retry only fired for source="startup", so user-triggered prefetches never scheduled retries. The retry was also silently skipped if another prefetch was running.
- retryAllThumbnails race: Fire-and-forget /thumbnails/retry + simultaneous loadThumbnailBatch → first batch immediately re-trips the circuit breaker that was just reset.
Changes
- Remove _PREFETCH_TIMEOUT: use default 18s timeout
- Auto-retry fires regardless of source, waits for in-progress prefetch
- Add GET /api/thumbnails/diag: cache count, prefetch state, circuit breaker status
- Add retrySingleThumb(cid) using GET /api/thumbnail/{cid}
- retryAllThumbnails → async, await retry before polling, set fallback mode
Tradeoff
Worst-case lock hold before abort increases from 55s to 120s. This is acceptable because:
Testing
Known Limitations
- UNEXPECTED_EOF_WHILE_READING) affects individual get_thumbnail() calls too (not just get_thumbnail_list), then the timeout fix alone won't resolve Nitrowolf's issue. The logging upgrade will make this visible.
- retryAllThumbnails still has a timing window: 8 retries × 3s = 24s polling, but 400 thumbnails × 2-18s = much longer prefetch. Each "Retry All" click will show some more thumbnails as they cache.
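The retryAllThumbnails timing window can be quantified from the figures given. A rough sketch that uses only those numbers (8 polls at 3s intervals, about 400 remaining thumbnails, 2 to 18s each) and assumes thumbnails are fetched one at a time; the constant and function names are illustrative:

```javascript
// Figures from the limitation above; sequential fetching is an assumption.
const POLL_RETRIES = 8;
const POLL_INTERVAL_S = 3;
const REMAINING_THUMBNAILS = 400;

// Total time the frontend keeps polling after one "Retry All" click.
const pollingWindowS = POLL_RETRIES * POLL_INTERVAL_S;

// Hypothetical helper: best-case and worst-case prefetch duration in seconds.
function prefetchDurationRange(count, minPerItemS, maxPerItemS) {
  return { minS: count * minPerItemS, maxS: count * maxPerItemS };
}

const range = prefetchDurationRange(REMAINING_THUMBNAILS, 2, 18);
console.log(pollingWindowS);         // 24
console.log(range.minS, range.maxS); // 800 7200
```

Even in the best case the prefetch outlasts a single polling window by a wide margin, which is why each click surfaces only the thumbnails cached so far.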