harden stream fallbacks, namespace action logs by run, and add playbook failure messages by hwuiwon · Pull Request #52 · AppliedLabsAI/cua

hwuiwon · 2026-04-07T15:15:58Z

Summary

Action log isolation: persist_action_log now accepts a run_id and writes logs into per-run subdirectories,
preventing interleaved logs across concurrent runs. ActionRouter generates a fallback UUID when no run ID is
provided.
Finalizer lifecycle split: Split RunFinalizer.persist() into publish_status() (in-sandbox status API) and
persist() (volume commit), with persist() running after cleanup() so recordings are finalized before the
volume is committed. Each phase has independent error handling.
Stream/recording fallback hardening: Track bytes sent during live trace proxy — only fall back to the persisted
trace file when zero live bytes were delivered (prevents corrupt mixed streams). Similarly, RunService.stream_run
marks the run inactive on upstream failure and replays persisted events when no live events were yielded.
Playbook failure_message field: Steps can now declare a custom failure_message that surfaces as the
user-facing error instead of raw exception text. Supports {param} placeholder substitution. LLM recovery failures
now combine the original step error with the recovery error and preserve token accounting.
Bug fix: PlaybookRunner now appends the failed step result to step_results before returning an abort, so
callers always see what failed.
Guardrail setting: Add stuck_revisit_gap to GuardrailSettings for tuning revisit detection.

Test plan

New test_bridge_execution.py — covers execute_dom_action unknown action rejection + SequenceExecutor
happy/failure paths
New test_bridge_router.py — covers DOM filtering, action log namespacing by run ID, and captcha message
prepending
New test_recording_service.py — covers persisted trace fallback (zero bytes) vs. partial live stream (no
fallback)
Updated test_run_service.py — stream proxy marks run inactive and replays persisted events on connection
failure
Updated test_session_finalizer.py — asserts correct publish → cleanup → persist call ordering
Updated test_playbooks.py — failure_message schema, param binding, executor verification error, LLM recovery
failure surfacing, abort includes failed step
Updated test_actionlog.py — per-run-id file namespacing
Updated test_streaming.py — stuck_revisit_gap model field + removed redundant timeout args

Summary by CodeRabbit

Release Notes

New Features

Playbook steps can now include custom failure messages to provide more context when steps cannot complete
Added new guardrail configuration setting stuck_revisit_gap to customize revisit behavior (range: 1–50, default: 5)
Run identifiers now organize action logs, improving traceability and organization

Bug Fixes

Improved fallback mechanisms for trace and event streaming when upstream services are temporarily unavailable
Enhanced error recovery and reporting in playbook execution with better aggregation of failure context

Tests

Comprehensive test coverage added for action routing, playbook execution, recording services, and streaming functionality

…ok failure messages

coderabbitai · 2026-04-07T15:16:15Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c05347a5-507e-4c20-8d7b-f2edf04baa94

📥 Commits

Reviewing files that changed from the base of the PR and between 1ead0e4 and 7bbd670.

📒 Files selected for processing (22)

actionlog/actions.py
agent/session/finalizer.py
agent/session/runner.py
api/models.py
api/recording_service.py
api/runs/service.py
bridge/router.py
evaluation/runner.py
playbooks/executor.py
playbooks/params.py
playbooks/recovery.py
playbooks/runner.py
playbooks/schema.py
scripts/run_local.py
tests/test_actionlog.py
tests/test_bridge_execution.py
tests/test_bridge_router.py
tests/test_playbooks.py
tests/test_recording_service.py
tests/test_run_service.py
tests/test_session_finalizer.py
tests/test_streaming.py

📝 Walkthrough

Walkthrough

This pull request introduces run-aware action logging by threading a run_id parameter through the action routing and persistence pipeline, enabling per-run isolation of action logs on disk. Additionally, it enhances playbook execution with step-level failure messaging, improves recovery error aggregation, and adds fallback streaming mechanisms in recording and event services when live upstream connections fail.

Changes

Cohort / File(s)	Summary
Run ID Threading `actionlog/actions.py`, `bridge/router.py`, `agent/session/runner.py`, `evaluation/runner.py`, `scripts/run_local.py`	Added optional `run_id` parameter to `persist_action_log` to namespace persisted action logs by run; updated `ActionRouter` constructor to accept and generate `run_id` when needed, then pass it into persistence calls. Directory creation now ensured via `os.makedirs` before writing.
Playbook Failure Messaging `playbooks/schema.py`, `playbooks/params.py`, `playbooks/executor.py`	Added `failure_message: str` field to `PlaybookStep`; implemented placeholder binding for failure messages in parameter substitution; introduced `_step_error` helper to conditionally use step's custom failure message or fall back to default error text.
Playbook Recovery Enhancement `playbooks/recovery.py`, `playbooks/runner.py`	Added `_surface_llm_failure` helper to aggregate original step failure and LLM recovery failure into combined `StepResult` fields; guards early return when LLM handoff fails; ensured failed steps are appended to `step_results` before abort.
Fallback Streaming `api/recording_service.py`, `api/runs/service.py`	Added conditional fallback to persisted trace/event streams when live upstream fails; tracks bytes successfully sent to determine whether fallback is safe; persisted fallback only triggered if no bytes were streamed before error.
Session Finalization `agent/session/finalizer.py`	Renamed `persist(outcome)` to `publish_status(outcome)` for status publishing; added new parameterless `persist()` for recording artifact persistence; updated call order to: `publish_status()` → `cleanup()` → `persist()`. Both methods now wrap exceptions in broad try/catch with warning-level logging.
Configuration & Guardrails `api/models.py`	Added `stuck_revisit_gap` field to `GuardrailSettings` with bounds `1 ≤ value ≤ 50` and default `5`.
Test Coverage `tests/test_actionlog.py`, `tests/test_bridge_execution.py`, `tests/test_bridge_router.py`, `tests/test_playbooks.py`, `tests/test_recording_service.py`, `tests/test_run_service.py`, `tests/test_session_finalizer.py`, `tests/test_streaming.py`	Added comprehensive test suite validating run-id namespacing of persisted logs, action router orchestration, bridge execution control flow, playbook failure messaging and recovery behavior, streaming fallback mechanics, and finalizer execution order.

Sequence Diagrams

sequenceDiagram
    participant Agent as Agent Session
    participant Router as ActionRouter
    participant Logger as persist_action_log
    participant Disk as Local Disk

    Agent->>Router: execute_action(run_id="run-123")
    Router->>Router: _record_action(entry, run_id)
    Router->>Logger: persist_action_log(entry, run_id="run-123")
    Logger->>Logger: sanitize_run_id("run-123")
    Logger->>Disk: os.makedirs("run-123/", exist_ok=True)
    Logger->>Disk: write(run-123/action_N.json)
    Logger-->>Router: path to persisted file
    Router-->>Agent: ActionResult

sequenceDiagram
    participant Client as HTTP Client
    participant Service as RecordingService
    participant Upstream as Upstream Stream
    participant Storage as Persisted Storage

    Client->>Service: get_trace(run_id)
    Service->>Upstream: stream upstream
    Upstream->>Service: yield chunk 1
    Service->>Service: bytes_sent += len(chunk)
    Upstream-->>Service: HTTPError before chunk 2
    
    alt bytes_sent == 0
        Service->>Storage: load persisted trace.zip
        Storage-->>Service: bytes(trace.zip)
        Service-->>Client: persisted bytes
    else bytes_sent > 0
        Service-->>Client: partial upstream bytes only
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Stabilize offline testing and add local evaluation suites #17: Modifies actionlog/actions.py's persist_action_log async function signature and on-disk output handling.
switch to pydantic models #14: Changes actionlog/actions.py persistence behavior through Pydantic serialization refactor.
Add SSE replay, event persistence, and multi-container support #20: Threads SSE event identifiers into action/event persistence in actionlog/actions.py.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch harden-stream

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

harden stream fallbacks, namespace action logs by run, and add playbo…

7bbd670

…ok failure messages

hwuiwon merged commit cc90b13 into main Apr 7, 2026
3 checks passed

hwuiwon deleted the harden-stream branch April 7, 2026 15:16

This was referenced Apr 7, 2026

add browser-to-API session handoff with api_request action and session capture #53

Closed

Fix agent looping on missing data and improve stuck detection #54

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

harden stream fallbacks, namespace action logs by run, and add playbook failure messages#52

harden stream fallbacks, namespace action logs by run, and add playbook failure messages#52
hwuiwon merged 1 commit into
mainfrom
harden-stream

hwuiwon commented Apr 7, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hwuiwon commented Apr 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagrams

Estimated code review effort

Possibly related PRs

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hwuiwon commented Apr 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading