Report
From u/Feastweasel (v1.0.1, 524 artworks, Docker):
Now it doesn't work at all. It reads the cached thumbs, but that's it.
Every TV operation (select, matte, thumbnails) fails with ConnectionFailure and {'reason': 'socket closed'}. The "TV is listening" indicator shows green, the TV is awake, and AI analysis (which doesn't touch the TV) works fine. Docker stop also works now.
Additionally, the UI freezes for 10-45 seconds when clicking buttons (e.g., Settings) while thumbnail fetches are in progress. Operations execute out of order — a stalled Settings dialog pops up over a later-opened modal.
Root Cause Analysis
There are 6 interconnected bugs, with #1 as the likely primary trigger.
Bug 1 (Critical): Background thumbnail pre-fetch bypasses _tv_lock
_fetch_thumbnails_sync() (~line 239) runs in a thread pool executor (~line 451) and opens its own WebSocket connection without acquiring _tv_lock. This creates a concurrent WebSocket to the TV while _tv_lock-protected operations are also connecting.
The Samsung Frame's WebSocket server does not handle concurrent connections well — it drops one or both, causing {'reason': 'socket closed'}. Every subsequent _tv_op then fails because the TV's WebSocket server is in a confused/recovering state. The retry logic makes this worse by hammering the TV with more connection attempts.
This is the smoking gun. The background pre-fetch races with the frontend's /api/thumbnails batch requests, and the TV kills all sockets.
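A minimal sketch of how the background job could be kept behind the same lock, under stated assumptions. An `asyncio.Lock` cannot be acquired from inside the executor thread, so the sketch takes it in the coroutine that schedules the job. `fake_fetch_thumbnails_sync` is a hypothetical stand-in for `_fetch_thumbnails_sync`, and the counters exist only to show that no two "connections" overlap.

```python
import asyncio
import threading
import time

open_connections = 0          # connections currently "open" to the TV
max_seen = 0                  # highest concurrency observed
_count_guard = threading.Lock()


def fake_fetch_thumbnails_sync(ids):
    """Hypothetical stand-in for _fetch_thumbnails_sync (blocking, runs in a thread)."""
    global open_connections, max_seen
    with _count_guard:
        open_connections += 1
        max_seen = max(max_seen, open_connections)
    time.sleep(0.02)          # pretend to talk to the TV
    with _count_guard:
        open_connections -= 1
    return list(ids)


async def guarded_prefetch(tv_lock, ids):
    # Take the asyncio lock in the coroutine, then hand the blocking call
    # to the executor while the lock is still held.
    loop = asyncio.get_running_loop()
    async with tv_lock:
        return await loop.run_in_executor(None, fake_fetch_thumbnails_sync, ids)


async def demo():
    tv_lock = asyncio.Lock()
    return await asyncio.gather(
        guarded_prefetch(tv_lock, ["a", "b"]),
        guarded_prefetch(tv_lock, ["c"]),
        guarded_prefetch(tv_lock, ["d"]),
    )


results = asyncio.run(demo())
peak = max_seen
print(results, peak)
```

Holding the lock for the whole pre-fetch trades the race for longer waits, so this only makes sense together with the prioritization discussed under Bug 3.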
Bug 2 (Critical): Every _tv_op opens and closes a new WebSocket
_tv_op() (~line 166) creates a brand new SamsungTVWS instance, opens a WebSocket (art.open()), runs the operation, and calls tv.close() — every single time.
With 524 artworks on first load:
- /api/info → 1 WebSocket open/close
- /api/art → 1 WebSocket open/close
- /api/mattes → 1 WebSocket open/close
- Background _fetch_thumbnails_sync → 1 long-lived WebSocket (no lock!)
- ~53 /api/thumbnails batches → 53 WebSocket open/close cycles
- Total: ~56+ WebSocket connect/disconnect cycles in rapid succession
The TV's WebSocket server likely rate-limits or crashes under this churn.
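To illustrate the difference in connection churn, here is a rough sketch of a wrapper that keeps one connection open and reconnects only when a call fails. `FakeTV` and `PersistentTV` are hypothetical names invented for this example; they do not model the real `SamsungTVWS` API, and in the server the wrapper would still need to be used under `_tv_lock`.

```python
class FakeTV:
    """Hypothetical stand-in for a TV client; counts how often it connects."""
    connects = 0

    def __init__(self):
        self.is_open = False

    def open(self):
        FakeTV.connects += 1
        self.is_open = True

    def close(self):
        self.is_open = False

    def call(self, name):
        if not self.is_open:
            raise ConnectionError("socket closed")
        return f"{name}: ok"


class PersistentTV:
    """Keeps one connection open and reconnects once if a call fails."""

    def __init__(self, factory):
        self._factory = factory
        self._tv = None

    def _ensure(self):
        if self._tv is None:
            self._tv = self._factory()
            self._tv.open()
        return self._tv

    def call(self, name):
        try:
            return self._ensure().call(name)
        except ConnectionError:
            # Drop the dead connection and retry once on a fresh one.
            self._tv = None
            return self._ensure().call(name)


tv = PersistentTV(FakeTV)
replies = [tv.call(f"op{i}") for i in range(56)]
connects_for_56_ops = FakeTV.connects

tv._tv.close()                      # simulate the TV dropping the socket
reply_after_drop = tv.call("after-drop")
connects_after_drop = FakeTV.connects
print(connects_for_56_ops, connects_after_drop)
```

In this toy run, 56 operations share a single connect, and a dropped socket costs one additional connect instead of failing the call.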
Bug 3 (High): Global _tv_lock serializes ALL TV ops — lock starvation
_tv_lock is a single asyncio.Lock() (~line 88). Every TV operation competes for it: thumbnails, mattes, select, info, art list, favorites, slideshow, filters — everything.
With 53 thumbnail batches queued, a user's "change matte" request sits at position 54 in the queue. Each batch takes 2-4 seconds (connection + fetch + close), so the user waits 60+ seconds for their click to execute.
This explains the 10-45 second stall on the Settings button and the out-of-order execution: the settings fetch was queued behind thumbnail batches, and by the time it completed, the user had opened another modal.
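One possible way around the starvation, sketched with assumptions: replace the bare lock with a single worker that drains an `asyncio.PriorityQueue`, so user actions overtake queued thumbnail batches. The names `USER`, `BACKGROUND`, `tv_worker`, and `submit` are invented for this example and are not part of server.py.

```python
import asyncio
import itertools

USER, BACKGROUND = 0, 1        # lower number runs first
_seq = itertools.count()       # tie-breaker keeps FIFO order within a priority


async def tv_worker(queue, log):
    """Single consumer, so only one TV operation runs at a time."""
    while True:
        _prio, _n, name, done = await queue.get()
        await asyncio.sleep(0.001)     # pretend to talk to the TV
        log.append(name)
        done.set_result(None)
        queue.task_done()


async def submit(queue, prio, name):
    done = asyncio.get_running_loop().create_future()
    await queue.put((prio, next(_seq), name, done))
    return done


async def demo():
    queue, log = asyncio.PriorityQueue(), []
    pending = [await submit(queue, BACKGROUND, f"thumbs-{i}") for i in range(5)]
    pending.append(await submit(queue, USER, "change-matte"))
    worker = asyncio.create_task(tv_worker(queue, log))
    await asyncio.gather(*pending)
    worker.cancel()
    return log


order = asyncio.run(demo())
print(order)
```

In this toy run the matte change executes first even though it was submitted last. In a live server the worker would already be busy, so a user action would overtake the waiting batches but not the one currently in flight.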
Bug 4 (Medium): "TV is listening" indicator never updates
The frontend calls /api/info once during init() (~line 2405 in index.html). If it succeeds, the green dot is set permanently. There is no heartbeat, no periodic recheck, no update on failure. The user sees "TV is listening" while every subsequent operation fails.
Bug 5 (High): Failed retries hold the lock for up to ~58 seconds
When a _tv_op fails and retries (3 attempts × TV_TIMEOUT + WoL delay), it holds _tv_lock the entire time. Worst case: 3 × (timeout + 8s) + 2 × 2s = ~58 seconds of lock hold per single failed operation. With multiple failed operations queued, the total lockout can exceed several minutes.
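The worst-case figure can be checked with a few lines of arithmetic. The 10-second `TV_TIMEOUT` below is an assumption inferred from the ~58 second total, not a value read from server.py, and the queue depth of three is purely illustrative.

```python
ATTEMPTS = 3
TV_TIMEOUT_S = 10     # assumption: inferred from the ~58 s total, not read from server.py
WOL_DELAY_S = 8       # the "+ 8s" wake-on-LAN term in the formula above
RETRY_PAUSE_S = 2     # pause between attempts

# Three attempts, with a pause between each consecutive pair.
worst_case_s = ATTEMPTS * (TV_TIMEOUT_S + WOL_DELAY_S) + (ATTEMPTS - 1) * RETRY_PAUSE_S

# If several failing operations are queued, each holds the lock in turn.
queued_failures = 3   # illustrative queue depth
total_lockout_s = queued_failures * worst_case_s

print(worst_case_s, total_lockout_s)
```

With these numbers a single failed operation holds the lock for 58 seconds, and three queued failures already add up to 174 seconds.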
Bug 6 (High): Frontend fires ~53 serial thumbnail requests for 524 artworks
The IntersectionObserver + loadThumbnailBatch() (~line 2773 in index.html) slices visible thumbnails into batches of 10 and fires them sequentially. Each batch → one _tv_op() → one WebSocket round-trip.
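The ~53 figure follows directly from the batch size. The sketch below mirrors the slicing in Python for illustration (the real logic lives in the frontend's `loadThumbnailBatch()`); the `art-N` IDs are made up.

```python
def batches(ids, size):
    """Split ids into consecutive chunks of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


art_ids = [f"art-{i}" for i in range(524)]     # hypothetical IDs

by_ten = batches(art_ids, 10)
single = batches(art_ids, len(art_ids))

print(len(by_ten), len(by_ten[-1]), len(single))
```

Batches of 10 give 53 requests (the last one carrying 4 IDs), whereas one batch covering every missing thumbnail would need a single round-trip.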
Timeline of Failure (524 artworks, fresh start)
0.0s /api/info → lock acquired, WebSocket #1 open/close → GREEN DOT ✅
0.5s /api/art → lock acquired, WebSocket #2 open/close → 524 items
1.0s Background prefetch → WebSocket #3 opened WITHOUT LOCK ⚠️
1.0s /api/mattes → waiting for lock...
1.2s /api/thumbnails batch 1 → waiting for lock...
1.4s /api/thumbnails batch 2 → waiting for lock...
...
/api/mattes finally gets lock → WebSocket #4 vs concurrent #3 → SOCKET CLOSED 💥
All subsequent _tv_ops fail — TV WebSocket is confused
Each failure triggers 3 retries × 2s delay = lock held 10-20s per failure
User clicks Settings → queued at position 50+ → waits 60+ seconds
User clicks Matte → queued behind Settings → opens after Settings, out of order
Proposed Fixes
Immediate (P0)
- Make _fetch_thumbnails_sync acquire _tv_lock, or better, remove it entirely and let the existing /api/thumbnails endpoint handle all fetches. No concurrent unguarded connections.
- Reuse a persistent WebSocket connection instead of open/close per operation. Create a connection pool (size 1) or a long-lived SamsungTVWS instance that reconnects on failure. This eliminates the 56+ connect/disconnect churn.
Short-term (P1)
- Separate thumbnail lock from user-action lock, or use a priority queue so user-initiated actions (select, matte, settings) jump ahead of background thumbnail fetches. Alternatively, cancel in-progress thumbnail batches when a user action arrives.
- Larger thumbnail batches: fetch all missing thumbnails in a single _tv_op call instead of batches of 10. One WebSocket round-trip instead of 53.
- Update "TV is listening" on failure: if any _tv_op fails after exhausting retries, flip the indicator to red/yellow. Add a periodic heartbeat (every 30s).
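For the status indicator, a server-side tracker along the following lines could back a heartbeat endpoint. This is a sketch only: `TVStatus`, its method names, and the three state strings are hypothetical, and the 30-second staleness window simply reuses the heartbeat interval suggested above.

```python
import time


class TVStatus:
    """Tracks the outcome of the most recent TV operation (hypothetical helper)."""

    def __init__(self, stale_after_s=30.0, clock=time.monotonic):
        self._clock = clock
        self._stale_after_s = stale_after_s
        self._last_ok = None
        self._failed = False

    def record_success(self):
        self._last_ok = self._clock()
        self._failed = False

    def record_failure(self):
        # Call this only after retries are exhausted.
        self._failed = True

    def state(self):
        if self._failed:
            return "unreachable"
        if self._last_ok is None or self._clock() - self._last_ok > self._stale_after_s:
            return "unknown"
        return "listening"


# Drive the tracker with a fake clock so the demo is deterministic.
now = [0.0]
status = TVStatus(clock=lambda: now[0])
states = [status.state()]          # nothing recorded yet
status.record_success()
states.append(status.state())
now[0] = 31.0                      # no success for longer than the window
states.append(status.state())
status.record_failure()
states.append(status.state())
print(states)
```

The frontend could poll such a state periodically and color the dot accordingly instead of setting it once during init().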
Medium-term (P2)
- Non-blocking retry: release _tv_lock between retry attempts so other operations can proceed while waiting for the TV to wake up.
- Request deduplication: if the user clicks a button while thumbnails are loading, cancel or deprioritize the thumbnail queue.
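The non-blocking retry could look roughly like this. It is a sketch under assumptions: `tv_op_with_retry` is a hypothetical name, the operation is any coroutine function that raises `ConnectionError` on failure, and the pause is shortened so the demo runs quickly.

```python
import asyncio


async def tv_op_with_retry(tv_lock, op, attempts=3, pause_s=0.01):
    last_error = None
    for attempt in range(attempts):
        async with tv_lock:            # lock held only while talking to the TV
            try:
                return await op()
            except ConnectionError as exc:
                last_error = exc
        # The lock is released here, so other operations can run during the pause.
        if attempt < attempts - 1:
            await asyncio.sleep(pause_s)
    raise last_error


async def demo():
    tv_lock, log, calls = asyncio.Lock(), [], 0

    async def flaky():
        nonlocal calls
        calls += 1
        if calls < 3:
            log.append("flaky-fail")
            raise ConnectionError("socket closed")
        log.append("flaky-ok")
        return "done"

    async def other():
        log.append("other")
        return "other-done"

    first = asyncio.create_task(tv_op_with_retry(tv_lock, flaky))
    await asyncio.sleep(0)             # let the first attempt fail and release the lock
    second = await tv_op_with_retry(tv_lock, other)
    return await first, second, log


first_result, second_result, log = asyncio.run(demo())
print(log)
```

Here the unrelated operation completes while the failing one is still waiting between attempts, rather than queuing behind all three attempts.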
Affected code
| Location | Issue |
|---|---|
| server.py ~L239-268 | _fetch_thumbnails_sync: unguarded concurrent WebSocket |
| server.py ~L166-173 | _tv_op: new WebSocket per call |
| server.py ~L88 | _tv_lock: single global lock |
| server.py ~L175-190 | Retry logic holds lock during delays |
| server.py ~L451-452 | Background pre-fetch launch (no lock) |
| index.html ~L2773-2807 | Thumbnail batching (groups of 10) |
| index.html ~L2405-2416 | TV status indicator (set once, never updated) |
Environment
- Docent v1.0.1 (Docker)
- 524 artworks on Samsung Frame
- TV awake, on local network
- AI analysis (OpenAI) works fine (no TV connection needed)
Local Reproduction (61 artworks)
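Reproduced on a local setup with only 61 artworks (vs Feastweasel's 524). Results confirm the lock serialization and connection churn; the background pre-fetch race (Bug 1) was not triggered in this run because there were no new IDs to fetch.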
Test: Simulated page load + 6 concurrent thumbnail batches + user action
=== PAGE LOAD (3 concurrent requests) ===
19:59:48 → 19:59:54 (6 seconds for 3 requests that should take ~1s each)
Requests serialized through _tv_lock — each waited for the previous one.
=== THUMBNAIL STORM (6 concurrent batches of 10) ===
batch1 = 3.12s ← queued behind batches that arrived first
batch2 = 2.08s
batch3 = 1.04s ← first to acquire lock
batch4 = 4.16s
batch5 = 5.18s
batch6 = 6.22s ← last in queue, waited for all 5 before it
Total wall time: 6 seconds (perfectly serialized staircase)
=== USER SELECT (queued behind all batches) ===
select = 1.05s ← waited until all 6 batches drained
Started at 20:00:00, right after storm ended at 20:00:00
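The staircase pattern is what a single `asyncio.Lock` produces whenever several equal-cost operations arrive together. The simulation below is illustrative only: it uses a short sleep in place of a real TV round-trip, and the names `batch` and `storm` are made up.

```python
import asyncio
import time


async def batch(tv_lock, name, start, work_s, finished):
    async with tv_lock:
        await asyncio.sleep(work_s)    # stand-in for connect + fetch + close
    finished.append((name, time.monotonic() - start))


async def storm(n=6, work_s=0.02):
    tv_lock, finished = asyncio.Lock(), []
    start = time.monotonic()
    await asyncio.gather(
        *(batch(tv_lock, f"batch{i + 1}", start, work_s, finished) for i in range(n))
    )
    return finished, time.monotonic() - start


finished, total_s = asyncio.run(storm())
for name, latency_s in finished:
    print(f"{name} = {latency_s:.3f}s")
print(f"total = {total_s:.3f}s")
```

Each simulated batch finishes roughly one work unit after the previous one, and the total wall time is about six work units, which matches the shape of the measurements above.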
Key findings
| Metric | 61 artworks (local) | 524 artworks (projected) |
|---|---|---|
| WebSocket connections opened | 11 in 12 seconds | ~56+ in rapid succession |
| Thumbnail batch requests | 6 | ~53 |
| Lock starvation (user action delayed) | ~7 seconds | ~53 seconds |
| Background pre-fetch race | Not triggered (0 new IDs) | Fires on every fresh start |
All 6 thumbnail batch fetches failed
WARNING docent: Batch thumbnail fetch failed: `get_thumbnail_list` request failed with error number -1
WARNING docent: Batch thumbnail fetch failed: `get_thumbnail_list` request failed with error number -1
WARNING docent: Batch thumbnail fetch failed: `get_thumbnail_list` request failed with error number -1
WARNING docent: Batch thumbnail fetch failed: `get_thumbnail_list` request failed with error number -1
WARNING docent: Batch thumbnail fetch failed: `get_thumbnail_list` request failed with error number -1
WARNING docent: Batch thumbnail fetch failed: `get_thumbnail_list` request failed with error number -1
Every batch hit get_thumbnail_list error -1. Thumbnails only appear to work because they're served from disk cache. The live TV thumbnail fetch path is broken even on a small catalog.
Conclusion
The bug is fully reproducible at 61 artworks. At 524 artworks, the lock starvation scales linearly (~53s), and the background pre-fetch race (Bug #1) would also trigger on fresh starts, compounding into the total failure Feastweasel reported.