Add multi-worker serve mode for Sendspin Party #199

Merged
balloob merged 32 commits into main from wip/scalable-demo on Apr 28, 2026

Conversation

@maximmaxim345 (Member)

Summary

This PR adds multi-worker support to sendspin serve so multiple worker processes can accept Sendspin Party listeners while sharing the
same decoded audio timeline.

It introduces a coordinator/worker architecture where:

  • the coordinator decodes the source once and fans out timestamped PCM chunks
  • each worker runs its own HTTP/WebSocket server and serves the embedded web player
  • the web UI can report total connected listeners across workers
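
The coordinator↔worker IPC is built on picklable dataclasses (see tests/serve/test_ipc.py in the file summary below). As a minimal sketch, the audio message could look like the following, reconstructed from the AudioChunk fields quoted later in the review; the actual definition in sendspin/serve/ipc.py may differ:

from dataclasses import dataclass

@dataclass(frozen=True)
class AudioChunk:
    """One timestamped PCM chunk fanned out to every worker's audio queue."""

    pcm_bytes: bytes    # raw decoded PCM payload
    sample_rate: int
    bit_depth: int
    channels: int
    play_start_us: int  # shared-timeline timestamp at which playback starts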

Changes

  • add --workers to sendspin serve
  • add multi-worker serve coordinator/worker implementation
  • add IPC message types for coordinator/worker communication
  • add /api/status endpoint for total listener count
  • update the embedded web player to poll and display listener count
  • reject --client when used with multi-worker mode
  • pause decoding when all listeners disconnect in multi-worker mode
  • handle worker startup failures and unexpected worker crashes correctly
  • fix worker reconnect behavior after the last client disconnects
  • document multi-worker mode in README.md

Notes

  • Multi-worker mode runs one HTTP/WebSocket server per worker
  • place a reverse proxy/load balancer in front of the worker ports
  • --client remains supported in single-worker serve mode only

Commit notes

asyncio.wait_for + run_in_executor with a blocking queue.get() leaves orphaned threads that consume messages and discard them when the future is cancelled. Use queue.get(timeout=0.5) instead so the thread exits cleanly on timeout.
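
A minimal sketch of that pattern, with illustrative names (the coordinator's actual helper may differ):

import queue as _queue

async def _next_status(loop, status_queue):
    """Fetch one status message without stranding an executor thread."""
    while True:
        try:
            # get() returns or raises Empty within 0.5s, so a cancelled
            # caller leaves the thread to exit quickly instead of blocking
            # forever on a message it will then discard.
            return await loop.run_in_executor(
                None, lambda: status_queue.get(timeout=0.5)
            )
        except _queue.Empty:
            continue
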
The web player on a worker port (e.g. :9002) fetches the coordinator's /api/status (e.g. :9000), which is a cross-origin request. Add Access-Control-Allow-Origin: * to allow it.
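
A hedged sketch of the endpoint with that header, assuming an aiohttp handler and a shared multiprocessing counter (the actual route lives in sendspin/serve/server.py):

from aiohttp import web

async def handle_status(request: web.Request) -> web.Response:
    # "total_clients" mirrors the field the web player reads; how the
    # counter is stored on the app is illustrative here.
    total = int(request.app["total_clients"].value)
    return web.json_response(
        {"total_clients": total},
        # The player served from a worker port (e.g. :9002) fetches this
        # endpoint on the coordinator port (e.g. :9000), so the response
        # must allow cross-origin reads.
        headers={"Access-Control-Allow-Origin": "*"},
    )
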
Cancel the running task on SIGINT so it breaks out of the blocking audio decode. Ignore repeated Ctrl+C after the first one.
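
A sketch of that handler, assuming the serve task is tracked by the coordinator (names are illustrative; loop.add_signal_handler is POSIX-only):

import asyncio
import signal

def install_sigint_handler(loop: asyncio.AbstractEventLoop, task: asyncio.Task) -> None:
    state = {"seen": False}

    def _on_sigint() -> None:
        if state["seen"]:
            return  # ignore repeated Ctrl+C after the first one
        state["seen"] = True
        # Cancellation raises CancelledError inside the task, which
        # unwinds out of the blocking audio decode.
        task.cancel()

    loop.add_signal_handler(signal.SIGINT, _on_sigint)
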
When the last client disconnects from a worker, the stream is stopped. Catch StreamStoppedError and clear the stream reference so the worker skips audio chunks until a new client connects and creates a fresh stream.
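
A sketch of that guard in the worker's chunk loop, assuming aiosendspin raises StreamStoppedError on writes to a stopped stream (the attribute and method names are illustrative):

    try:
        self._stream.send_audio(chunk.pcm_bytes, chunk.play_start_us)
    except StreamStoppedError:
        # The last client left and the stream was stopped underneath us.
        # Drop both references so subsequent chunks are skipped until a
        # new client connects and a fresh group + stream are created.
        self._stream = None
        self._active_group = None
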
- Add worker crash detection: the coordinator checks process liveness every 30s and removes dead workers from the redirect pool; it shuts down if all workers crash (see the sketch after this list)
- Validate --workers >= 1 in the CLI
- Add the CORS header to the single-worker /api/status for consistency
- Use _queue.Empty instead of a bare Exception in _drain_status_queue
- Remove an unused TYPE_CHECKING import and an empty pass block
- Redirect handler now uses the live _active_worker_ports (reflects crash removal)
- Frontend: hide the listener count when stopped, show a minimum of 1 while listening

Track already-reported crashes so _check_worker_health doesn't print the same crash message every 30s for workers that are already dead.
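
A reconstructed sketch of that health sweep; beyond _check_worker_health and _active_worker_ports, the attribute names are illustrative:

import asyncio
import logging

async def _check_worker_health(self) -> None:
    while not self._shutdown_requested:
        await asyncio.sleep(30)
        for worker_id, proc in self._processes.items():
            if proc.is_alive() or worker_id in self._reported_crashes:
                continue
            # Report each crash once, then stop routing to the dead port.
            self._reported_crashes.add(worker_id)
            logging.error("Worker %d crashed (exit code %s)", worker_id, proc.exitcode)
            self._active_worker_ports.discard(self._worker_ports[worker_id])
        if not self._active_worker_ports:
            self._request_shutdown()  # every worker is gone; stop serving
            return
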
Clear _active_group alongside _stream on StreamStoppedError so that
the next connecting client creates a fresh group and stream instead of
being added to a stale group with no active stream.
The coordinator was pacing itself to real-time (sleeping when ahead > 0),
resulting in only ~250ms of buffer on workers. Allow up to 5s of buffer
ahead, matching single-worker mode's max_buffer_us=5_000_000.
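
A sketch of the relaxed pacing check inside the decode loop; MAX_BUFFER_US mirrors single-worker max_buffer_us, and the clock helper is illustrative:

import asyncio

MAX_BUFFER_US = 5_000_000  # allow workers to buffer up to 5s ahead

async def _pace(self, play_start_us: int) -> None:
    ahead_us = play_start_us - self._now_us()
    if ahead_us > MAX_BUFFER_US:
        # Sleep only when we are more than a full buffer ahead of the
        # shared clock, instead of throttling to real time (~250ms).
        await asyncio.sleep((ahead_us - MAX_BUFFER_US) / 1_000_000)
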
Workers now use consecutive ports starting at --port instead of --port+1.

Prevents a reconnect race where a client joins a dead group whose stream was already stopped by aiosendspin.
Multi-worker mode does not support outbound client connections; previously the --client flag was silently ignored.

Track successful vs failed workers separately, exit if all workers fail, and remove dead audio queues for failed workers. Previously the coordinator only shut down when all workers crashed.

The coordinator now loops between waiting for clients and streaming, instead of continuously decoding when nobody is listening.
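
A rough sketch of that wait/stream alternation (both method names are hypothetical):

async def _run(self) -> None:
    while not self._shutdown_requested:
        # Stay idle (no decoding) until at least one listener connects...
        await self._wait_for_first_listener()
        # ...then decode and fan out chunks until the last listener leaves.
        await self._stream_until_all_disconnect()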

djwmarcx commented Apr 6, 2026

Bug: sendspin serve crashes on Python 3.12 with:

TypeError: type 'Synchronized' is not subscriptable

Fix: add from __future__ import annotations at the top of sendspin/serve/server.py. With postponed annotation evaluation, the Synchronized[...] type hint is never subscripted at runtime on 3.12:

"""Custom SendspinServer with embedded web player."""

from __future__ import annotations

from importlib.resources import files

balloob marked this pull request as ready for review April 28, 2026 13:43
Copilot AI review requested due to automatic review settings April 28, 2026 13:43
balloob added the new-feature label Apr 28, 2026
balloob merged commit 9e91ded into main Apr 28, 2026 (2 checks passed)
balloob deleted the wip/scalable-demo branch April 28, 2026 13:43
Copilot AI left a comment (Contributor)

Pull request overview

Adds a multi-worker execution mode to sendspin serve, introducing a coordinator/worker model: the coordinator decodes once and fans out timestamped PCM to multiple worker HTTP/WebSocket servers, and the embedded web UI can display the total listener count across workers.

Changes:

  • Add --workers to sendspin serve and route to a new multi-worker coordinator (run_server_multi).
  • Introduce coordinator/worker subprocess implementation plus IPC message types for audio and status reporting.
  • Add /api/status endpoint and update embedded web UI to poll and display listener count; add tests for IPC/coordinator/worker behavior.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.

Per-file summary:

  • sendspin/cli.py: Adds --workers flag and dispatches to multi-worker serve mode; rejects --client when --workers > 1.
  • sendspin/serve/__init__.py: Adds run_server_multi() entrypoint to run the coordinator.
  • sendspin/serve/coordinator.py: New coordinator that spawns workers, decodes audio, fans out PCM chunks, aggregates listener counts, and handles shutdown/health.
  • sendspin/serve/worker.py: New worker subprocess that runs a server on an assigned port, manages client groups/streams, and consumes audio IPC.
  • sendspin/serve/ipc.py: New picklable dataclasses for coordinator↔worker IPC.
  • sendspin/serve/server.py: Adds /api/status endpoint and optional shared total listener counter support.
  • sendspin/serve/web/index.html: Adds listener count UI element.
  • sendspin/serve/web/styles.css: Styles listener count element and visibility transitions.
  • sendspin/serve/web/app.js: Polls /api/status while listening and updates/hides the listener count UI.
  • tests/serve/test_ipc.py: Validates IPC message types are picklable.
  • tests/serve/test_coordinator.py: Unit tests for coordinator listener aggregation and worker startup/crash handling.
  • tests/serve/test_worker.py: Unit tests for worker startup signaling, chunk processing, and disconnect cleanup behavior.
  • tests/serve/test_integration.py: Spawns workers and validates /api/status responds on each worker port.
  • README.md: Documents multi-worker serve mode and reverse-proxy/load-balancer guidance.


Comment on lines +139 to +148
    async def _wait_for_workers_listening(self) -> int:
        """Wait for all workers to report status. Returns count of healthy workers."""
        loop = asyncio.get_running_loop()
        listening_count = 0
        error_count = 0
        failed_workers: set[int] = set()

        while (listening_count + error_count) < self.workers:
            msg = await loop.run_in_executor(None, self._status_queue.get)

Copilot AI, Apr 28, 2026:

_wait_for_workers_listening() waits on status_queue.get() with no timeout and no worker liveness checks. If a worker dies before it can post WorkerListening/WorkerError (e.g., import error, hard crash), the coordinator can hang indefinitely (and tests that call this can hang as well). Consider adding a startup timeout and/or periodically checking Process.is_alive() to fail fast and mark that worker as failed.
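
A hedged sketch of such a fail-fast variant, reusing the WorkerListening/WorkerError message names from the comment and otherwise illustrative attributes:

import asyncio
import queue as _queue

async def _wait_for_workers_listening(self) -> int:
    """Wait for every worker to report in, failing fast on dead workers."""
    loop = asyncio.get_running_loop()
    listening = 0
    failed: set[int] = set()
    while listening + len(failed) < self.workers:
        try:
            msg = await loop.run_in_executor(
                None, lambda: self._status_queue.get(timeout=0.5)
            )
        except _queue.Empty:
            # No message yet: treat workers that died before reporting
            # as failed so the coordinator cannot hang forever.
            for worker_id, proc in self._processes.items():
                if not proc.is_alive():
                    failed.add(worker_id)
            continue
        if isinstance(msg, WorkerListening):
            listening += 1
        elif isinstance(msg, WorkerError):
            failed.add(msg.worker_id)
    return listening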

Comment thread: sendspin/serve/worker.py
Comment on lines +181 to +186
"""Entry point for the worker subprocess."""
logging.basicConfig(
level=getattr(logging, log_level.upper(), logging.INFO),
format=f"%(asctime)s %(levelname)s [W{worker_id}] %(message)s",
)
worker = ServeWorker(
Copilot AI, Apr 28, 2026:

Multi-worker mode doesn’t apply the Windows ConnectionResetError suppression that single-worker run_server() uses (sendspin/serve/__init__.py sets a custom exception handler on win32). Since workers use asyncio.run() with the default event loop, client disconnects may reintroduce noisy WinError 10054 logs in multi-worker mode. Consider setting the same exception handler (or equivalent) in worker_main on Windows before running the worker loop.
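
A sketch of applying the same suppression in the worker before running its loop; the predicate is illustrative, with the single-worker handler in sendspin/serve/__init__.py as the reference:

import asyncio
import sys

def _quiet_windows_disconnects(loop: asyncio.AbstractEventLoop) -> None:
    if sys.platform != "win32":
        return

    def handler(loop: asyncio.AbstractEventLoop, context: dict) -> None:
        if isinstance(context.get("exception"), ConnectionResetError):
            return  # WinError 10054: client dropped the socket; not actionable
        loop.default_exception_handler(context)

    loop.set_exception_handler(handler)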

Comment thread: sendspin/serve/web/app.js
Comment on lines +689 to +696
    if (resp.ok) {
      const data = await resp.json();
      const count = Math.max(data.total_clients ?? 0, 1);
      elements.listenerCountValue.textContent = String(count);
      elements.listenerCount.setAttribute("aria-hidden", "false");
    }
  } catch {
    // Silently ignore - server may be unavailable
Copilot AI, Apr 28, 2026:

updateListenerCount() forces the displayed total to be at least 1 (Math.max(..., 1)). This can over-report listeners when the API returns 0 (e.g., brief race before coordinator updates counts, or if the socket disconnects but polling continues). Consider displaying the API value as-is (allow 0) and handling the “not yet available” case by hiding the element until a valid count is received.

Suggested change

Before:

    if (resp.ok) {
      const data = await resp.json();
      const count = Math.max(data.total_clients ?? 0, 1);
      elements.listenerCountValue.textContent = String(count);
      elements.listenerCount.setAttribute("aria-hidden", "false");
    }
  } catch {
    // Silently ignore - server may be unavailable

After:

    if (!resp.ok) {
      elements.listenerCount.setAttribute("aria-hidden", "true");
      return;
    }
    const data = await resp.json();
    const rawCount = data.total_clients;
    const count =
      typeof rawCount === "number" && Number.isFinite(rawCount) && rawCount >= 0
        ? rawCount
        : null;
    if (count === null) {
      elements.listenerCount.setAttribute("aria-hidden", "true");
      return;
    }
    elements.listenerCountValue.textContent = String(count);
    elements.listenerCount.setAttribute("aria-hidden", "false");
  } catch {
    // Silently ignore - server may be unavailable
    elements.listenerCount.setAttribute("aria-hidden", "true");
Comment on lines +5 to +26
import pytest
from aiohttp import ClientSession

from sendspin.serve.coordinator import ServeCoordinator


@pytest.mark.asyncio
async def test_multi_worker_starts_and_serves_status() -> None:
    """Start coordinator with 2 workers, verify worker /api/status returns shared count."""
    coordinator = ServeCoordinator(
        source="http://retro.dancewave.online/retrodance.mp3",
        source_format=None,
        port=19800,
        name="Integration Test",
        workers=2,
        log_level="WARNING",
    )

    coordinator._spawn_workers()

    try:
        await coordinator._wait_for_workers_listening()
Copilot AI, Apr 28, 2026:

This integration test binds to hard-coded ports (19800/19801). In CI or when running tests in parallel, these ports can already be in use, causing worker startup failures and potentially hanging indefinitely because _wait_for_workers_listening() has no timeout. Consider selecting free ports at runtime (e.g., bind a temporary socket to port 0 to pick an available base port) and/or wrapping the wait in asyncio.wait_for to ensure the test fails fast instead of hanging.

Suggested change

Before:

import pytest
from aiohttp import ClientSession

from sendspin.serve.coordinator import ServeCoordinator


@pytest.mark.asyncio
async def test_multi_worker_starts_and_serves_status() -> None:
    """Start coordinator with 2 workers, verify worker /api/status returns shared count."""
    coordinator = ServeCoordinator(
        source="http://retro.dancewave.online/retrodance.mp3",
        source_format=None,
        port=19800,
        name="Integration Test",
        workers=2,
        log_level="WARNING",
    )

    coordinator._spawn_workers()

    try:
        await coordinator._wait_for_workers_listening()

After:

import asyncio
import socket

import pytest
from aiohttp import ClientSession

from sendspin.serve.coordinator import ServeCoordinator


def _find_free_port_range(count: int) -> int:
    """Return the first port in a consecutive localhost port range that is currently free."""
    while True:
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        probe.bind(("127.0.0.1", 0))
        start_port = probe.getsockname()[1]
        probe.close()
        sockets: list[socket.socket] = []
        try:
            for port in range(start_port, start_port + count):
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                sock.bind(("127.0.0.1", port))
                sockets.append(sock)
        except OSError:
            for sock in sockets:
                sock.close()
            continue
        for sock in sockets:
            sock.close()
        return start_port


@pytest.mark.asyncio
async def test_multi_worker_starts_and_serves_status() -> None:
    """Start coordinator with 2 workers, verify worker /api/status returns shared count."""
    workers = 2
    coordinator = ServeCoordinator(
        source="http://retro.dancewave.online/retrodance.mp3",
        source_format=None,
        port=_find_free_port_range(workers + 1),
        name="Integration Test",
        workers=workers,
        log_level="WARNING",
    )

    coordinator._spawn_workers()

    try:
        await asyncio.wait_for(coordinator._wait_for_workers_listening(), timeout=10)
Copilot uses AI. Check for mistakes.
Comment on lines +269 to +278
        chunk_msg = AudioChunk(
            pcm_bytes=pcm_chunk,
            sample_rate=fmt.sample_rate,
            bit_depth=fmt.bit_depth,
            channels=fmt.channels,
            play_start_us=play_start_us,
        )
        for queue in self._audio_queues:
            queue.put(chunk_msg)

Copilot AI, Apr 28, 2026:

Inside the async decode loop, multiprocessing.Queue.put() is called directly for each worker. Queue.put() can block (pipe backpressure, pickling cost), which will block the event loop and can delay status handling / shutdown responsiveness. Consider offloading fan-out to a thread (asyncio.to_thread/executor) or using a non-blocking/backpressured design (e.g., put_nowait with drop/slowdown) to keep the coordinator loop responsive.
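
One possible shape of that offload, keeping the names from the snippet above:

import asyncio

def _fan_out(queues, chunk_msg) -> None:
    # Blocking puts happen in a worker thread, so pipe backpressure or
    # pickling cost no longer stalls the coordinator's event loop.
    for queue in queues:
        queue.put(chunk_msg)

# inside the decode loop:
#     await asyncio.to_thread(_fan_out, self._audio_queues, chunk_msg)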


    def _handle_sigint(self) -> None:
        if self._shutdown_requested:
            # Second Ctrl+C — force exit
Copilot AI, Apr 28, 2026:

In _handle_sigint, the comment says a second Ctrl+C will “force exit”, but the code just returns and keeps the custom SIGINT handler installed. This makes repeated Ctrl+C a no-op and can leave the process stuck if shutdown hangs. Consider either implementing an actual forced exit on the second SIGINT (e.g., raise KeyboardInterrupt/SystemExit or os._exit) or adjust the comment/behavior so the second signal is not ignored.

Suggested change

Before:

            # Second Ctrl+C — force exit

After:

            # Second Ctrl+C — force exit
            signal.signal(signal.SIGINT, signal.default_int_handler)
            signal.raise_signal(signal.SIGINT)
