Skip to content

make sessions fully transactional with cleanup#222

Merged
j0sh merged 9 commits into
mainfrom
ja/handshake-close
Jun 5, 2026
Merged

make sessions fully transactional with cleanup#222
j0sh merged 9 commits into
mainfrom
ja/handshake-close

Conversation

@j0sh
Copy link
Copy Markdown
Contributor

@j0sh j0sh commented Jun 4, 2026

Summary

This PR makes streaming session startup and teardown transactional so failed session starts do not leave GPU/model resources alive in the pod process.

The production symptom was a session-start failure followed by retries OOMing because the previous failed start had allocated model/session resources but never reached the normal StreamingSession.run() teardown path.

Problem

Before this branch, most cleanup lived in StreamingSession.run(). That missed several startup failure windows:

  1. ModelContext._load_models() could fail after partially loading model state.
  2. Session.__init__ could fail after creating a ModelContext but before returning a usable Session.
  3. StreamingSession.create() could fail after Session(...) succeeded but before a StreamingSession object existed.
  4. StreamingSession.create() could succeed, but the WebSocket init handshake could fail before run() started.

In cases 3 and 4, the adapter either had nothing to close yet or had not entered the normal run finalizer. That could leave PyTorch/TRT/model allocations behind until an external watchdog killed the process.

Changes

1. Clean up partial ModelContext loads

ModelContext.__init__ now wraps _load_models().

If loading fails, it:

  • closes any partially created diffusion engine/model handles,
  • drops tensor-bearing attributes,
  • runs gc.collect(),
  • empties the CUDA cache when available,
  • re-raises the original error.

This covers failures such as partially loaded/meta model state during model placement.

2. Clean up failed Session.__init__

Session.__init__ now initializes close-safe fields up front, then wraps ModelContext creation and backend/TRT wiring.

If anything fails after construction begins, it calls Session.close() and re-raises.

This covers failures after ModelContext exists but before Session(...) returns.

3. Make StreamingSession.create() transactional

StreamingSession.create() now uses an ExitStack for resources acquired before the final StreamingSession object exists.

As each resource is acquired, it is registered immediately:

  • engine_session.close
  • stream.close
  • audio_eng.stop

Ownership transfers only after cls(...) succeeds via cleanup.pop_all().

This covers startup failures during source resolution, stem extraction, conditioning, stream creation, audio engine setup, or late session-state construction.

4. Close returned sessions that fail during the WebSocket init handshake

ws_adapter now registers a cleanup callback immediately after StreamingSession.create() returns.

If the ready frame, initial audio buffer, or initial stem payload fails before streaming.run() starts, the adapter closes the returned session.

Once run() starts, the normal StreamingSession.close() finalizer owns teardown.

5. Preserve runtime contracts

PipelineRunner construction is inside the cleanup finally, but constructor failures still propagate. Only runner.run() failures keep the previous log-and-stop behavior.

This matters for warmup: setup failures should still be visible as warmup failures, not logged and silently treated as success.

6. Complete StreamingSession.close()

StreamingSession.close() now also stops the owned AudioEngine.

For the WebSocket backend this is normally a no-op because local playback is not started, but it keeps the ownership contract complete for warmup/future callers.

7. Make the frontend startup failure message generic

The pre-ready 1011 close message no longer says the server restarted. A 1011 during startup can be any server-side init failure, so the message now says:

Session failed while starting — refresh the page to retry.

Tests / Verification

  • Added lightweight unit coverage for StreamingSession.create() cleanup when:
    • conditioning fails after Session(...) succeeds,
    • late setup fails after stream creation.
  • Ran Python compile checks for changed Python files.
  • Ran git diff --check.
  • Ran web npm run typecheck.

Note: local pytest could not be run on this Mac because the local env lacks pytest/torch, and uv attempts to bootstrap TensorRT CUDA wheels for macOS arm64 before running tests.

Deployment Note

These are DEMON server-side changes. After merge, they will not affect fleet pods until the warm pod image is rebuilt/deployed.

The server never closed a created StreamingSession when the WebSocket init handshake failed. StreamingSession.create() allocates the engine session, prepares the source, extracts upload stems, encodes conditioning, and creates the stream before ws_adapter sends the ready/audio/stem frames. That handshake still has plenty of ways to fail: the client can disconnect, the socket send can raise, or send_stem_payload() can trip while copying and writing the initial stems.

Before this change, the only teardown path that closed the stream and engine session lived in StreamingSession.run(). If the connection failed while ws_adapter was still sending the init handshake, run() was never entered, so session.close() was skipped and the process could keep ModelContext/TRT/PyTorch GPU allocations alive until an external watchdog killed it.

That showed up in production as one session failing during initial stem payload delivery, followed by the user's immediate retry OOMing in stem extraction because the same process still had nearly all VRAM in use.

Add a shared StreamingSession.close() helper, use it from run(), and register a ws_adapter ExitStack cleanup immediately after create(). The callback closes created sessions that fail before run() starts, while normal sessions still use run()'s finalizer. Also move PipelineRunner construction inside run()'s try/finally so constructor failures take the same cleanup path.
Session startup could fail before StreamingSession.create() returned, leaving the cleanup hook in ws_adapter with nothing to close. The observed failure happened inside ModelContext._place_dit(): a partially-meta DiT module raised Cannot copy out of meta tensor while Session.__init__ was still constructing the engine.

Clean up at the constructor boundaries that can fail before a StreamingSession exists. ModelContext now releases partially loaded model state, runs gc.collect(), and empties the CUDA cache when _load_models() raises. Session.__init__ initializes its handle fields up front and calls close() if ModelContext succeeds but later backend/TRT wiring fails.

Also make the pre-ready 1011 WebSocket message generic. A 1011 during init can be any server-side startup failure, not necessarily a supervisor restart to clear memory, so the browser now says the session failed while starting and asks the user to refresh.
@j0sh
Copy link
Copy Markdown
Contributor Author

j0sh commented Jun 4, 2026

Also clean up another session failure path, this time on the constructor:

fix: clean up failed session constructors
Session startup could fail before StreamingSession.create() returned, leaving the cleanup hook in ws_adapter with nothing to close. The observed failure happened inside ModelContext._place_dit(): a partially-meta DiT module raised Cannot copy out of meta tensor while Session.init was still constructing the engine.

Clean up at the constructor boundaries that can fail before a StreamingSession exists. ModelContext now releases partially loaded model state, runs gc.collect(), and empties the CUDA cache when _load_models() raises. Session.init initializes its handle fields up front and calls close() if ModelContext succeeds but later backend/TRT wiring fails.

Also make the pre-ready 1011 WebSocket message generic. A 1011 during init can be any server-side startup failure, not necessarily a supervisor restart to clear memory, so the browser now says the session failed while starting and asks the user to refresh.

j0sh added 5 commits June 3, 2026 17:43
StreamingSession.create() could still leak a fully constructed engine Session when startup failed after Session(...) returned but before the StreamingSession object existed. In that window the WebSocket adapter has no StreamingSession handle to close, so typed init failures like fatal stem extraction, conditioning errors, stream creation errors, or later setup exceptions could leave GPU state behind.

Wrap the resource-acquisition tail of create() in an ExitStack. Each owned resource is registered for cleanup immediately after it is acquired: engine_session first, then the stream handle, then the audio engine. Ownership transfers only after cls(...) succeeds, via pop_all(); after that the existing StreamingSession.close() path remains the single owner for handshake failure, live-session exit, and runner teardown.

Cleanup callbacks suppress and log their own failures so they do not mask the original startup exception. Add lightweight unit coverage that stubs the heavy model path and verifies failures after Session creation close the engine session, and failures after stream creation close both stream and engine session.
The cleanup stack is intentionally layered: ModelContext and Session clean up constructor failures, StreamingSession.create() owns partial resources before a StreamingSession exists, and ws_adapter owns the post-create/pre-run handshake window.

Document those ownership boundaries at the two places that look overlapping so future cleanup work does not accidentally remove one of the still-needed paths.
Keep the append-only history, but undo the explanatory-only comments added while evaluating whether earlier cleanup paths could be removed. The actual constructor, transactional create, and pre-run handshake cleanup paths remain because they cover distinct startup windows.
The cleanup refactor moved PipelineRunner construction inside run()'s teardown guard, but also accidentally put it under the broad pipeline_error handler. That changed the existing contract: constructor failures used to propagate, which lets warmup report failure instead of marking itself done.

Keep the constructor inside the outer try/finally so StreamingSession.close() still runs, but narrow the pipeline_error catch to runner.run(). Runtime pipeline failures keep the previous log-and-stop behavior; constructor/setup failures clean up and still bubble to the caller.
StreamingSession.create() now treats AudioEngine as an owned startup resource and stops it if partial setup fails. Mirror that ownership in the successful-session close path too.

The WebSocket backend does not start local playback, so this is normally a no-op there, but it keeps StreamingSession.close() complete and safe for warmup or any future caller that starts the audio engine before teardown.
@j0sh j0sh changed the title fix: close streaming session on handshake failure make sessions fully transactional with cleanup Jun 4, 2026
@gioelecerati
Copy link
Copy Markdown
Collaborator

The transactional cleanup looks solid — the ExitStack ordering (audio→stream→session), pop_all() only after cls() succeeds, and the idempotent/null-safe close() paths all hold up. Two small things, both low:

audio_eng.stop() isn't actually a no-op on the WS path. AudioEngine._stream is only assigned in start(), which the WebSocket backend never calls — so the new self.audio_eng.stop() in StreamingSession.close() (and the create() ExitStack callback) raises AttributeError: 'AudioEngine' object has no attribute '_stream'. It's caught, so non-fatal, but it logs audio_engine_stop_raised on every clean session teardown — i.e. the PR description's "normally a no-op for the WebSocket backend" is inverted; it always throws. One-line fix: set self._stream = None in AudioEngine.__init__ and guard stop() with if self._stream is not None: so it's a real no-op when playback was never started. Keeps the teardown log clean, which is the spirit of the parallel #313/#318 work.

Completeness nit: the early-close path (_close_streaming_if_init_fails) calls streaming.close() but doesn't session_registry.unregister(). If the post-register()/pre-run() window raises (the LoRA-staging block), the registry leaks the handle. Trigger probability is near-zero (only a lock + list-append in that window) and it's pre-existing, so very low priority — but since the PR's goal is complete teardown on partial failure, the early-close callback could unregister too.

Otherwise this reads well — nice to see the ownership-transfer pattern done properly.

Copy link
Copy Markdown
Collaborator

@ryanontheinside ryanontheinside left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude says there is one test bug: test_create_closes_stream_and_engine_session_when_late_setup_fails stubs encode_cond_pair returning a single object() but create() unpacks a 2-tuple, so it fails with TypeError before reaching the path it tests. Fix: lambda *_args, **_kwargs: (object(), object())

j0sh added 2 commits June 4, 2026 13:58
The early-init cleanup path closes a returned StreamingSession when the WebSocket handshake/setup fails before streaming.run() takes over. If the failure happened after session_registry.register() but before run(), the existing run-finally unregister block was never active, so the HTTP/MCP session registry could retain a stale handle.

Track whether registration has happened and let the same pre-run cleanup callback unregister before closing the session. Normal sessions still unregister in the existing run-finally path because streaming_entered_run is true.
The late-setup cleanup test stubbed encode_cond_pair with a single object, but StreamingSession.create() unpacks that helper as a two-value pair. That made the test fail early with a TypeError before reaching the intended AudioEngine failure path.

Return a two-object tuple so the test exercises stream and engine-session cleanup after stream creation.
@j0sh
Copy link
Copy Markdown
Contributor Author

j0sh commented Jun 5, 2026

thanks for the reviews @gioelecerati @ryanontheinside

Completeness nit:

Fixed in 5e4032d

audio_eng.stop() isn't actually a no-op on the WS path

From Codex:

The audio_eng.stop() issue does not appear to exist in the current tree. AudioEngine already has the proposed fix both on this branch and on origin/main:
audio_engine.py (line 22) sets self._stream = None in init.
audio_engine.py (line 171) guards stop() with if self._stream is None: return.
So on the WS path, where start() is never called, stop() returns cleanly and should not log audio_engine_stop_raised.

Claude says there is one test bug:

Fixed in e021b21 . Note that there seem to be a number of failing LoRA tests separate from this, also failing on main. From Codex:

Ran it on the Vast instance:

cd /workspace/DEMON && uv run pytest tests/unit

Result:

126 collected
114 passed
4 skipped
8 failed
All 8 failures are in tests/unit/test_lora_refit.py, and they all fail the same way:

AttributeError: 'TRTLoRAManager' object has no attribute '_transpose_for_engine'
This does not look like a broad machine/dependency issue. The suite imports, collects, runs quickly, and most tests pass. The failing tests use a helper that deliberately bypasses TRTLoRAManager.init:

mgr = TRTLoRAManager.new(TRTLoRAManager)
But production init now initializes _transpose_for_engine, plus other newer refit fields like _is_fp8, _scratch_acc, _trt_dtype, and _param_numel. So the unit fixture is stale relative to the implementation.

@j0sh j0sh merged commit 181dd00 into main Jun 5, 2026
@j0sh j0sh deleted the ja/handshake-close branch June 5, 2026 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants