make sessions fully transactional with cleanup#222
Conversation
The server never closed a created StreamingSession when the WebSocket init handshake failed. StreamingSession.create() allocates the engine session, prepares the source, extracts upload stems, encodes conditioning, and creates the stream before ws_adapter sends the ready/audio/stem frames. That handshake still has plenty of ways to fail: the client can disconnect, the socket send can raise, or send_stem_payload() can trip while copying and writing the initial stems. Before this change, the only teardown path that closed the stream and engine session lived in StreamingSession.run(). If the connection failed while ws_adapter was still sending the init handshake, run() was never entered, so session.close() was skipped and the process could keep ModelContext/TRT/PyTorch GPU allocations alive until an external watchdog killed it. That showed up in production as one session failing during initial stem payload delivery, followed by the user's immediate retry OOMing in stem extraction because the same process still had nearly all VRAM in use. Add a shared StreamingSession.close() helper, use it from run(), and register a ws_adapter ExitStack cleanup immediately after create(). The callback closes created sessions that fail before run() starts, while normal sessions still use run()'s finalizer. Also move PipelineRunner construction inside run()'s try/finally so constructor failures take the same cleanup path.
Session startup could fail before StreamingSession.create() returned, leaving the cleanup hook in ws_adapter with nothing to close. The observed failure happened inside ModelContext._place_dit(): a partially-meta DiT module raised Cannot copy out of meta tensor while Session.__init__ was still constructing the engine. Clean up at the constructor boundaries that can fail before a StreamingSession exists. ModelContext now releases partially loaded model state, runs gc.collect(), and empties the CUDA cache when _load_models() raises. Session.__init__ initializes its handle fields up front and calls close() if ModelContext succeeds but later backend/TRT wiring fails. Also make the pre-ready 1011 WebSocket message generic. A 1011 during init can be any server-side startup failure, not necessarily a supervisor restart to clear memory, so the browser now says the session failed while starting and asks the user to refresh.
|
Also clean up another session failure path, this time on the constructor: fix: clean up failed session constructors Clean up at the constructor boundaries that can fail before a StreamingSession exists. ModelContext now releases partially loaded model state, runs gc.collect(), and empties the CUDA cache when _load_models() raises. Session.init initializes its handle fields up front and calls close() if ModelContext succeeds but later backend/TRT wiring fails. Also make the pre-ready 1011 WebSocket message generic. A 1011 during init can be any server-side startup failure, not necessarily a supervisor restart to clear memory, so the browser now says the session failed while starting and asks the user to refresh. |
StreamingSession.create() could still leak a fully constructed engine Session when startup failed after Session(...) returned but before the StreamingSession object existed. In that window the WebSocket adapter has no StreamingSession handle to close, so typed init failures like fatal stem extraction, conditioning errors, stream creation errors, or later setup exceptions could leave GPU state behind. Wrap the resource-acquisition tail of create() in an ExitStack. Each owned resource is registered for cleanup immediately after it is acquired: engine_session first, then the stream handle, then the audio engine. Ownership transfers only after cls(...) succeeds, via pop_all(); after that the existing StreamingSession.close() path remains the single owner for handshake failure, live-session exit, and runner teardown. Cleanup callbacks suppress and log their own failures so they do not mask the original startup exception. Add lightweight unit coverage that stubs the heavy model path and verifies failures after Session creation close the engine session, and failures after stream creation close both stream and engine session.
The cleanup stack is intentionally layered: ModelContext and Session clean up constructor failures, StreamingSession.create() owns partial resources before a StreamingSession exists, and ws_adapter owns the post-create/pre-run handshake window. Document those ownership boundaries at the two places that look overlapping so future cleanup work does not accidentally remove one of the still-needed paths.
Keep the append-only history, but undo the explanatory-only comments added while evaluating whether earlier cleanup paths could be removed. The actual constructor, transactional create, and pre-run handshake cleanup paths remain because they cover distinct startup windows.
The cleanup refactor moved PipelineRunner construction inside run()'s teardown guard, but also accidentally put it under the broad pipeline_error handler. That changed the existing contract: constructor failures used to propagate, which lets warmup report failure instead of marking itself done. Keep the constructor inside the outer try/finally so StreamingSession.close() still runs, but narrow the pipeline_error catch to runner.run(). Runtime pipeline failures keep the previous log-and-stop behavior; constructor/setup failures clean up and still bubble to the caller.
StreamingSession.create() now treats AudioEngine as an owned startup resource and stops it if partial setup fails. Mirror that ownership in the successful-session close path too. The WebSocket backend does not start local playback, so this is normally a no-op there, but it keeps StreamingSession.close() complete and safe for warmup or any future caller that starts the audio engine before teardown.
|
The transactional cleanup looks solid — the ExitStack ordering (audio→stream→session),
Completeness nit: the early-close path ( Otherwise this reads well — nice to see the ownership-transfer pattern done properly. |
ryanontheinside
left a comment
There was a problem hiding this comment.
Claude says there is one test bug: test_create_closes_stream_and_engine_session_when_late_setup_fails stubs encode_cond_pair returning a single object() but create() unpacks a 2-tuple, so it fails with TypeError before reaching the path it tests. Fix: lambda *_args, **_kwargs: (object(), object())
The early-init cleanup path closes a returned StreamingSession when the WebSocket handshake/setup fails before streaming.run() takes over. If the failure happened after session_registry.register() but before run(), the existing run-finally unregister block was never active, so the HTTP/MCP session registry could retain a stale handle. Track whether registration has happened and let the same pre-run cleanup callback unregister before closing the session. Normal sessions still unregister in the existing run-finally path because streaming_entered_run is true.
The late-setup cleanup test stubbed encode_cond_pair with a single object, but StreamingSession.create() unpacks that helper as a two-value pair. That made the test fail early with a TypeError before reaching the intended AudioEngine failure path. Return a two-object tuple so the test exercises stream and engine-session cleanup after stream creation.
|
thanks for the reviews @gioelecerati @ryanontheinside
Fixed in 5e4032d
From Codex: The audio_eng.stop() issue does not appear to exist in the current tree. AudioEngine already has the proposed fix both on this branch and on origin/main:
Fixed in e021b21 . Note that there seem to be a number of failing LoRA tests separate from this, also failing on main. From Codex: Ran it on the Vast instance: cd /workspace/DEMON && uv run pytest tests/unit Result: 126 collected AttributeError: 'TRTLoRAManager' object has no attribute '_transpose_for_engine' mgr = TRTLoRAManager.new(TRTLoRAManager) |
Summary
This PR makes streaming session startup and teardown transactional so failed session starts do not leave GPU/model resources alive in the pod process.
The production symptom was a session-start failure followed by retries OOMing because the previous failed start had allocated model/session resources but never reached the normal
StreamingSession.run()teardown path.Problem
Before this branch, most cleanup lived in
StreamingSession.run(). That missed several startup failure windows:ModelContext._load_models()could fail after partially loading model state.Session.__init__could fail after creating aModelContextbut before returning a usableSession.StreamingSession.create()could fail afterSession(...)succeeded but before aStreamingSessionobject existed.StreamingSession.create()could succeed, but the WebSocket init handshake could fail beforerun()started.In cases 3 and 4, the adapter either had nothing to close yet or had not entered the normal run finalizer. That could leave PyTorch/TRT/model allocations behind until an external watchdog killed the process.
Changes
1. Clean up partial
ModelContextloadsModelContext.__init__now wraps_load_models().If loading fails, it:
gc.collect(),This covers failures such as partially loaded/meta model state during model placement.
2. Clean up failed
Session.__init__Session.__init__now initializes close-safe fields up front, then wrapsModelContextcreation and backend/TRT wiring.If anything fails after construction begins, it calls
Session.close()and re-raises.This covers failures after
ModelContextexists but beforeSession(...)returns.3. Make
StreamingSession.create()transactionalStreamingSession.create()now uses anExitStackfor resources acquired before the finalStreamingSessionobject exists.As each resource is acquired, it is registered immediately:
engine_session.closestream.closeaudio_eng.stopOwnership transfers only after
cls(...)succeeds viacleanup.pop_all().This covers startup failures during source resolution, stem extraction, conditioning, stream creation, audio engine setup, or late session-state construction.
4. Close returned sessions that fail during the WebSocket init handshake
ws_adapternow registers a cleanup callback immediately afterStreamingSession.create()returns.If the ready frame, initial audio buffer, or initial stem payload fails before
streaming.run()starts, the adapter closes the returned session.Once
run()starts, the normalStreamingSession.close()finalizer owns teardown.5. Preserve runtime contracts
PipelineRunnerconstruction is inside the cleanupfinally, but constructor failures still propagate. Onlyrunner.run()failures keep the previous log-and-stop behavior.This matters for warmup: setup failures should still be visible as warmup failures, not logged and silently treated as success.
6. Complete
StreamingSession.close()StreamingSession.close()now also stops the ownedAudioEngine.For the WebSocket backend this is normally a no-op because local playback is not started, but it keeps the ownership contract complete for warmup/future callers.
7. Make the frontend startup failure message generic
The pre-ready 1011 close message no longer says the server restarted. A 1011 during startup can be any server-side init failure, so the message now says:
Tests / Verification
StreamingSession.create()cleanup when:Session(...)succeeds,git diff --check.npm run typecheck.Note: local pytest could not be run on this Mac because the local env lacks
pytest/torch, anduvattempts to bootstrap TensorRT CUDA wheels for macOS arm64 before running tests.Deployment Note
These are DEMON server-side changes. After merge, they will not affect fleet pods until the warm pod image is rebuilt/deployed.