Add SSE replay, event persistence, and multi-container support by hwuiwon · Pull Request #20 · AppliedLabsAI/cua

hwuiwon · 2026-03-30T02:24:17Z

Summary

- SSE event stream now replays all past events on connect and supports Last-Event-ID for reconnection
- Run status and events are persisted to a Modal Volume and remain queryable after the sandbox terminates
- Status/stream/stop endpoints work from any API container via Sandbox.from_id() fallback
- Fixed pre-existing bug where SSE streaming never worked (status API and agent were separate processes sharing no state)
- Converted status literals to StrEnum for type safety
- Split README into focused docs under docs/
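
To illustrate the StrEnum conversion, here is a minimal sketch. Only the names `RunStatusValue` and `STARTING` appear in this PR; the other members, the string values, and the `is_terminal` helper are assumptions made for the example. The fallback import is there only so the snippet also runs on interpreters older than Python 3.11.

```python
from enum import Enum

try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:  # older interpreters: equivalent str-mixin fallback
    class StrEnum(str, Enum):
        def __str__(self) -> str:
            return str(self.value)


class RunStatusValue(StrEnum):
    # Member values are hypothetical; the PR only names STARTING.
    STARTING = "starting"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def is_terminal(status: RunStatusValue) -> bool:
    """Hypothetical helper: True once a run can no longer change state."""
    return status in {RunStatusValue.COMPLETED, RunStatusValue.FAILED}


def parse_status(raw: str) -> RunStatusValue:
    """Convert an untrusted string; raises ValueError on unknown values."""
    return RunStatusValue(raw)
```

Compared with a `Literal` annotation, a typo such as `parse_status("runing")` fails at runtime with `ValueError` instead of passing through silently, while members still compare equal to plain strings.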

Key changes

- api/streaming.py: Replay-then-live SSE pattern with dedup, _action_log list, persist_status() to volume
- agent/main.py: Status API (uvicorn) runs in-process as an asyncio task, sharing module globals with the agent loop
- api/server.py: _get_or_reconstruct_handle() for multi-container, _load_persisted_status() / _replay_persisted_events() for volume fallback, recordings volume mounted on API server
- actionlog/actions.py: id: field in SSE events for Last-Event-ID reconnection
- api/models.py / api/run_registry.py: RunStatusValue and RunPhase as StrEnum
- sandbox/entrypoint.sh: Removed separate uvicorn process (now in-process)

Test plan

- 21 new tests in tests/test_streaming.py (SSE replay, event IDs, Last-Event-ID, registry lifecycle)
- All 350 existing tests pass
- Deployed to Modal and validated end-to-end: live streaming, post-termination status retrieval, post-termination event replay, Last-Event-ID filtering, stop endpoint

Summary by CodeRabbit

New Features
- SSE event replay with Last-Event-ID and per-event id sequencing
- Persistent run status and recordings so completed runs can be queried and replayed
- Explicit run phase/status tracking and clearer terminal states
- Automatic in-process status API startup for local runs
Documentation
- New API reference and guides for authentication, configuration, guardrails, observability, playbooks, recording, and tools
Bug Fixes
- Improved run-handle reconstruction and more robust status persistence on termination
Tests
- Added streaming and lifecycle test suite covering SSE replay, formatting, and status behavior

coderabbitai · 2026-03-30T02:24:32Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: daeefd17-d6d4-48c7-b0dd-cb255d8c19f4

📥 Commits

Reviewing files that changed from the base of the PR and between cc21376 and 7768478.

📒 Files selected for processing (4)

api/models.py
api/server.py
api/streaming.py
tests/test_streaming.py

📝 Walkthrough

Moves the status API in-process, adds SSE id fields and replay with persisted status/events via a Modal volume, introduces explicit RunStatusValue/RunPhase enums and RunHandle fields, implements sandbox handle reconstruction for multi-container runs, updates session runner to persist status, and adds docs and tests for streaming and persistence.
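
Why moving the status API in-process fixes the shared-state bug can be shown with a small stand-in. The sketch below uses no uvicorn: `status_api_stub` is a hypothetical placeholder for the server coroutine (with uvicorn this would presumably be `uvicorn.Server(config).serve()`), and `_status`, `agent_loop`, and `main` are names invented for the example.

```python
import asyncio

# Module global visible to both coroutines because they share one process.
_status = {"phase": "starting", "steps": 0}


async def status_api_stub(snapshots: list) -> None:
    """Stand-in for the status server: reads _status until cancelled."""
    while True:
        snapshots.append(dict(_status))
        await asyncio.sleep(0)  # yield control back to the agent loop


async def agent_loop(n_steps: int) -> None:
    """Stand-in for the agent: mutates the shared status as it works."""
    _status["phase"] = "running"
    for _ in range(n_steps):
        _status["steps"] += 1
        await asyncio.sleep(0)
    _status["phase"] = "completed"


async def main(n_steps: int = 3) -> list:
    snapshots: list = []
    api_task = asyncio.create_task(status_api_stub(snapshots))
    try:
        await agent_loop(n_steps)
        await asyncio.sleep(0)  # let the stub observe the final state
    finally:
        api_task.cancel()  # mirror "cancel the in-process task on shutdown"
        try:
            await api_task
        except asyncio.CancelledError:
            pass
    return snapshots
```

With two separate processes, each would hold its own copy of `_status`, so the server side would never see the agent's updates; that matches the failure this PR describes.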

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Status API lifecycle: `agent/main.py`, `sandbox/entrypoint.sh` | Start status API in-process on port 8090 via an asyncio task; remove external status subprocess startup/cleanup and cancel the in-process task on shutdown. |
| SSE formatting & in-memory replay: `actionlog/actions.py`, `api/streaming.py`, `tests/test_streaming.py` | Emit SSE `id: <step>` plus `data: <json>`; retain an in-memory action log, support replaying prior actions on new `/events` subscriptions (honoring `Last-Event-ID`), and add tests for formatting, replay, and lifecycle semantics. |
| Run state and registry models: `api/models.py`, `api/run_registry.py` | Replace RunStatusValue Literal with a `StrEnum`; add `status` to `RunResponse`; introduce `RunPhase` and add `phase` and `error` fields to `RunHandle`. |
| Server endpoints, reconstruction & persistence: `api/server.py`, `api/streaming.py` | Add sandbox handle reconstruction (`_get_or_reconstruct_handle`), Modal volume mount `cua-recordings`, `_load_persisted_status`, SSE replay from persisted `RunStatus.actions`, forward `Last-Event-ID` upstream, and return persisted status when sandbox is cleaned up or unreachable. |
| Session lifecycle persistence: `agent/session_runner.py` | Invoke `persist_status(output_dir)` after `complete_run()` in normal and error paths so status.json is written to recordings volume for replay. |
| Documentation: `docs/api.md`, `docs/*` (authentication, configuration, guardrails, observability, playbooks, recording, tools) | Add comprehensive docs describing the API surface, SSE semantics, recordings, auth/credentials, configurables, guardrails, observability, playbooks, and tools. |
| Tests & CI validation: `tests/test_streaming.py` | New test suite validating SSE `id` formatting, action-log replay, `Last-Event-ID` handling, RunHandle fields, and streaming/status endpoints behavior. |
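
The `id: <step>` plus `data: <json>` framing and the `Last-Event-ID` handling might look roughly like this. Both function names are hypothetical and the payload fields are made up; only the frame layout follows the summary above.

```python
import json
from typing import Optional


def format_sse(step: int, payload: dict, event: Optional[str] = None) -> str:
    """Build one SSE frame; the blank line at the end terminates the event."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"id: {step}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"


def parse_last_event_id(header_value: Optional[str]) -> int:
    """Treat a missing or malformed Last-Event-ID header as 'replay everything'."""
    try:
        return max(int(header_value), 0)
    except (TypeError, ValueError):
        return 0
```

For example, `format_sse(3, {"step": 3, "action": "click"})` produces the frame `id: 3`, then `data: {"step":3,"action":"click"}`, then a blank line. Falling back to 0 on a bad header means a confused client gets a full replay rather than an error.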

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Server
    participant Sandbox as Modal Sandbox
    participant Volume as Modal Volume
    participant ActionLog as In-memory Action Log

    Client->>API: POST /runs
    API->>Sandbox: create sandbox & start agent
    Sandbox->>ActionLog: init_status() (STARTING)
    loop actions
        Sandbox->>API: push_action()
        API->>ActionLog: append action (step N)
    end
    Client->>API: GET /runs/{id}/stream (may include Last-Event-ID)
    API->>ActionLog: replay actions with step > Last-Event-ID
    API->>Client: SSE events (id: step, data: {...})
    Sandbox->>API: complete_run()
    API->>Volume: persist_status(/recordings/{run_id}/status.json)
    API->>Client: SSE event=complete (final status)
    Note right of Client: Later reconnect
    Client->>API: GET /runs/{id}/stream
    API->>Sandbox: attempt reconstruct via Sandbox.from_id()
    alt reconstruct fails or sandbox exited
        API->>Volume: load status.json
        API->>Client: replay persisted actions + complete event
    else reconstruct succeeds
        API->>Sandbox: proxy live events
    end
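
The volume-fallback branch of the diagram can be approximated with ordinary files. This is a sketch under assumptions: a local directory stands in for the mounted Modal Volume, the function signatures are invented (the PR's `persist_status` takes only `output_dir`), and the status dict shape is a guess beyond the `actions` list and `status.json` filename named above.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def persist_status(output_dir: Path, status: dict) -> Path:
    """Write status.json via a temp file so readers never see a partial file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    tmp = output_dir / "status.json.tmp"
    tmp.write_text(json.dumps(status))
    final = output_dir / "status.json"
    tmp.replace(final)
    return final


def load_persisted_status(recordings_root: Path, run_id: str) -> Optional[dict]:
    """Return the persisted status, or None if it is missing or unreadable."""
    path = recordings_root / run_id / "status.json"
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def replay_persisted_events(status: dict, last_event_id: int = 0) -> Iterator[str]:
    """Yield SSE frames for stored actions newer than last_event_id, then 'complete'."""
    for action in status.get("actions", []):
        if action["step"] > last_event_id:
            yield f"id: {action['step']}\ndata: {json.dumps(action)}\n\n"
    final = {"status": status.get("status")}
    yield f"event: complete\ndata: {json.dumps(final)}\n\n"
```

A caller would try the live sandbox first and use `load_persisted_status` only when reconstruction fails or the sandbox has exited, as in the `alt` branch above. On a real Modal Volume the write would likely also need to be committed before other containers can read it.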

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix modal #16 — Related changes to api/server.py around Modal sandbox reconstruction, multi-container behavior, and endpoint semantics.
Add session recording with Playwright tracing and screenshot persistence #8 — Related session recording persistence and Modal volume handling for storing/retrieving status and artifacts.
switch to pydantic models #14 — Related to actionlog changes (SSE/event model); this PR’s SSE id emission intersects with #14’s ActionLog model migration.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.17%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main changes: SSE replay, event persistence, and multi-container support are central features added across streaming, server, and session-runner code. |

Add SSE replay, event persistence, and multi-container support

cc21376

Add SSE replay, event persistence, and multi-container support

7768478

hwuiwon merged commit 4eb2426 into main Mar 30, 2026
1 check failed

hwuiwon deleted the sse branch March 30, 2026 02:29

This was referenced Mar 30, 2026

Refactor API, bridge, guardrails, playbook, and eval orchestration into focused services #29

Merged

Refactor run/session lifecycle into dedicated packages and tighten browser execution boundaries #47

Merged

This was referenced Apr 7, 2026

harden stream fallbacks, namespace action logs by run, and add playbook failure messages #52

Merged

Fix sandbox errors, add project docs, and update stale documentation #56

Merged

hwuiwon merged 2 commits into main from sse
