
Add SSE replay, event persistence, and multi-container support #20

Merged
hwuiwon merged 2 commits into main from sse
Mar 30, 2026
Conversation

Collaborator

@hwuiwon hwuiwon commented Mar 30, 2026

Summary

  • SSE event stream now replays all past events on connect and supports Last-Event-ID for reconnection
  • Run status and events are persisted to a Modal Volume and remain queryable after the sandbox terminates
  • Status/stream/stop endpoints work from any API container via a Sandbox.from_id() fallback
  • Fixed a pre-existing bug where SSE streaming never worked (the status API and the agent ran as separate
    processes and shared no state)
  • Converted status literals to StrEnum for type safety
  • Split README into focused docs under docs/
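
To illustrate the Last-Event-ID reconnection mentioned in the first bullet, here is a minimal sketch of how a server might decide which past events to replay. The function names and the shape of each action (a dict with an integer "step" key, starting at 1) are assumptions made for illustration, not the code in this PR.

```python
from typing import Optional


def parse_last_event_id(header_value: Optional[str]) -> int:
    """Return the last step the client saw, or 0 if the header is absent or malformed."""
    if header_value is None:
        return 0
    try:
        return max(int(header_value.strip()), 0)
    except ValueError:
        # A malformed header is treated like a first-time connection.
        return 0


def events_to_replay(action_log: list, last_event_id: Optional[str]) -> list:
    """Select the logged actions whose step is newer than the client's Last-Event-ID.

    Assumption: every action is a dict with an integer "step" key.
    """
    last_seen = parse_last_event_id(last_event_id)
    return [action for action in action_log if action["step"] > last_seen]
```

With a three-step log, a client that connects without the header would receive all three events, while one that reconnects with Last-Event-ID: 2 would receive only step 3.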

Key changes

  • api/streaming.py — Replay-then-live SSE pattern with dedup, _action_log list, persist_status() to
    volume
  • agent/main.py — Status API (uvicorn) runs in-process as asyncio task, sharing module globals with agent
    loop
  • api/server.py — _get_or_reconstruct_handle() for multi-container, _load_persisted_status() /
    _replay_persisted_events() for volume fallback, recordings volume mounted on API server
  • actionlog/actions.py — id: field in SSE events for Last-Event-ID reconnection
  • api/models.py / api/run_registry.py — RunStatusValue and RunPhase as StrEnum
  • sandbox/entrypoint.sh — Removed separate uvicorn process (now in-process)
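
The replay-then-live pattern with dedup listed for api/streaming.py could look roughly like the sketch below. It is a simplified illustration under stated assumptions: actions are dicts with an integer "step", live actions arrive on an asyncio.Queue, and None is used as a hypothetical end-of-run sentinel. The actual implementation in the PR may differ.

```python
import asyncio
from typing import AsyncIterator


async def replay_then_live(
    action_log: list, live_queue: asyncio.Queue, last_seen: int = 0
) -> AsyncIterator[dict]:
    """Yield logged actions first, then live ones, skipping steps already sent."""
    # Phase 1: replay a snapshot of the log taken at subscription time.
    for action in list(action_log):
        if action["step"] > last_seen:
            last_seen = action["step"]
            yield action
    # Phase 2: switch to live delivery. An action can appear both in the log
    # and in the queue, so anything at or below last_seen is dropped.
    while True:
        action = await live_queue.get()
        if action is None:  # hypothetical sentinel: the run has completed
            return
        if action["step"] <= last_seen:
            continue
        last_seen = action["step"]
        yield action


async def demo() -> list:
    log = [{"step": 1}, {"step": 2}]
    queue: asyncio.Queue = asyncio.Queue()
    # Step 2 is both in the log and in the queue, which is the duplicate case.
    for item in ({"step": 2}, {"step": 3}, None):
        queue.put_nowait(item)
    return [action["step"] async for action in replay_then_live(log, queue)]
```

Running asyncio.run(demo()) yields steps 1, 2, 3, with the duplicate step 2 delivered only once. Tracking the highest step sent is one simple way to dedup; it relies on steps increasing monotonically.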

Test plan

  • 21 new tests in tests/test_streaming.py (SSE replay, event IDs, Last-Event-ID, registry lifecycle)
  • All 350 existing tests pass
  • Deployed to Modal and validated end-to-end: live streaming, post-termination status retrieval,
    post-termination event replay, Last-Event-ID filtering, stop endpoint

Summary by CodeRabbit

  • New Features

    • SSE event replay with Last-Event-ID and per-event id sequencing
    • Persistent run status and recordings so completed runs can be queried and replayed
    • Explicit run phase/status tracking and clearer terminal states
    • Automatic in-process status API startup for local runs
  • Documentation

    • New API reference and guides for authentication, configuration, guardrails, observability, playbooks, recording, and tools
  • Bug Fixes

    • Improved run-handle reconstruction and more robust status persistence on termination
  • Tests

    • Added streaming and lifecycle test suite covering SSE replay, formatting, and status behavior


coderabbitai Bot commented Mar 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: daeefd17-d6d4-48c7-b0dd-cb255d8c19f4

📥 Commits

Reviewing files that changed from the base of the PR and between cc21376 and 7768478.

📒 Files selected for processing (4)
  • api/models.py
  • api/server.py
  • api/streaming.py
  • tests/test_streaming.py

📝 Walkthrough


Moves the status API in-process, adds SSE id fields and replay with persisted status/events via a Modal volume, introduces explicit RunStatusValue/RunPhase enums and RunHandle fields, implements sandbox handle reconstruction for multi-container runs, updates session runner to persist status, and adds docs and tests for streaming and persistence.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Status API lifecycle | agent/main.py, sandbox/entrypoint.sh | Start status API in-process on port 8090 via an asyncio task; remove external status subprocess startup/cleanup and cancel the in-process task on shutdown. |
| SSE formatting & in-memory replay | actionlog/actions.py, api/streaming.py, tests/test_streaming.py | Emit SSE id: <step> plus data: <json>; retain an in-memory action log, support replaying prior actions on new /events subscriptions (honoring Last-Event-ID), and add tests for formatting, replay, and lifecycle semantics. |
| Run state and registry models | api/models.py, api/run_registry.py | Replace RunStatusValue Literal with a StrEnum; add status to RunResponse; introduce RunPhase and add phase and error fields to RunHandle. |
| Server endpoints, reconstruction & persistence | api/server.py, api/streaming.py | Add sandbox handle reconstruction (_get_or_reconstruct_handle), Modal volume mount cua-recordings, _load_persisted_status, SSE replay from persisted RunStatus.actions, forward Last-Event-ID upstream, and return persisted status when sandbox is cleaned up or unreachable. |
| Session lifecycle persistence | agent/session_runner.py | Invoke persist_status(output_dir) after complete_run() in normal and error paths so status.json is written to recordings volume for replay. |
| Documentation | docs/api.md, docs/* (authentication, configuration, guardrails, observability, playbooks, recording, tools) | Add comprehensive docs describing the API surface, SSE semantics, recordings, auth/credentials, configurables, guardrails, observability, playbooks, and tools. |
| Tests & CI validation | tests/test_streaming.py | New test suite validating SSE id formatting, action-log replay, Last-Event-ID handling, RunHandle fields, and streaming/status endpoints behavior. |
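
As an illustration of the SSE framing summarized above (an id: line carrying the step, followed by a data: line carrying JSON), here is a small sketch. The helper name format_sse and its parameters are hypothetical and not taken from actionlog/actions.py; the optional event field mirrors the "complete" event shown in the sequence diagram.

```python
import json
from typing import Optional


def format_sse(step: int, payload: dict, event: Optional[str] = None) -> str:
    """Build one SSE frame: id, optional event name, JSON data, blank-line terminator."""
    lines = [f"id: {step}"]
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps without indent escapes newlines inside strings, so the
    # payload always fits on a single data: line.
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"
```

For example, format_sse(3, {"type": "click"}) produces the frame 'id: 3\ndata: {"type": "click"}\n\n'. Browsers using EventSource remember the most recent id and send it back as Last-Event-ID when they reconnect.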

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Server
    participant Sandbox as Modal Sandbox
    participant Volume as Modal Volume
    participant ActionLog as In-memory Action Log

    Client->>API: POST /runs
    API->>Sandbox: create sandbox & start agent
    Sandbox->>ActionLog: init_status() (STARTING)
    loop actions
        Sandbox->>API: push_action()
        API->>ActionLog: append action (step N)
    end
    Client->>API: GET /runs/{id}/stream (may include Last-Event-ID)
    API->>ActionLog: replay actions with step > Last-Event-ID
    API->>Client: SSE events (id: step, data: {...})
    Sandbox->>API: complete_run()
    API->>Volume: persist_status(/recordings/{run_id}/status.json)
    API->>Client: SSE event=complete (final status)
    Note right of Client: Later reconnect
    Client->>API: GET /runs/{id}/stream
    API->>Sandbox: attempt reconstruct via Sandbox.from_id()
    alt reconstruct fails or sandbox exited
        API->>Volume: load status.json
        API->>Client: replay persisted actions + complete event
    else reconstruct succeeds
        API->>Sandbox: proxy live events
    end
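
The fallback branch in the diagram (load status.json when the sandbox cannot be reconstructed) can be sketched as follows. This is an assumption-laden illustration: a local directory stands in for the mounted Modal volume, write_status_file and read_status_file are hypothetical stand-ins for the PR's persist_status() and _load_persisted_status(), and any volume-specific synchronization is not shown.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_status_file(recordings_root: Path, run_id: str, status: dict) -> Path:
    """Write the final run status to <root>/<run_id>/status.json."""
    run_dir = recordings_root / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "status.json"
    path.write_text(json.dumps(status))
    return path


def read_status_file(recordings_root: Path, run_id: str) -> Optional[dict]:
    """Return the persisted status, or None if it is missing or unreadable."""
    path = recordings_root / run_id / "status.json"
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def demo() -> Optional[dict]:
    # A temporary directory plays the role of the /recordings mount.
    root = Path(tempfile.mkdtemp())
    write_status_file(root, "run-1", {"status": "completed", "actions": [{"step": 1}]})
    return read_status_file(root, "run-1")
```

Returning None for a missing or corrupt file lets the caller distinguish "no persisted run" from a valid status and respond accordingly, for example with a 404.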

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.17%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main changes: SSE replay, event persistence, and multi-container support are central features added across streaming, server, and session-runner code. |

