Skip to content

harden stream fallbacks, namespace action logs by run, and add playbook failure messages#52

Merged
hwuiwon merged 1 commit into
mainfrom
harden-stream
Apr 7, 2026
Merged

harden stream fallbacks, namespace action logs by run, and add playbook failure messages#52
hwuiwon merged 1 commit into
mainfrom
harden-stream

Conversation

@hwuiwon
Copy link
Copy Markdown
Collaborator

@hwuiwon hwuiwon commented Apr 7, 2026

Summary

  • Action log isolation: persist_action_log now accepts a run_id and writes logs into per-run subdirectories,
    preventing interleaved logs across concurrent runs. ActionRouter generates a fallback UUID when no run ID is
    provided.

  • Finalizer lifecycle split: Split RunFinalizer.persist() into publish_status() (in-sandbox status API) and
    persist() (volume commit), with persist() running after cleanup() so recordings are finalized before the
    volume is committed. Each phase has independent error handling.

  • Stream/recording fallback hardening: Track bytes sent during live trace proxy — only fall back to the persisted
    trace file when zero live bytes were delivered (prevents corrupt mixed streams). Similarly, RunService.stream_run
    marks the run inactive on upstream failure and replays persisted events when no live events were yielded.

  • Playbook failure_message field: Steps can now declare a custom failure_message that surfaces as the
    user-facing error instead of raw exception text. Supports {param} placeholder substitution. LLM recovery failures
    now combine the original step error with the recovery error and preserve token accounting.

  • Bug fix: PlaybookRunner now appends the failed step result to step_results before returning an abort, so
    callers always see what failed.

  • Guardrail setting: Add stuck_revisit_gap to GuardrailSettings for tuning revisit detection.

Test plan

  • New test_bridge_execution.py — covers execute_dom_action unknown action rejection + SequenceExecutor
    happy/failure paths
  • New test_bridge_router.py — covers DOM filtering, action log namespacing by run ID, and captcha message
    prepending
  • New test_recording_service.py — covers persisted trace fallback (zero bytes) vs. partial live stream (no
    fallback)
  • Updated test_run_service.py — stream proxy marks run inactive and replays persisted events on connection
    failure
  • Updated test_session_finalizer.py — asserts correct publish → cleanup → persist call ordering
  • Updated test_playbooks.py — failure_message schema, param binding, executor verification error, LLM recovery
    failure surfacing, abort includes failed step
  • Updated test_actionlog.py — per-run-id file namespacing
  • Updated test_streaming.pystuck_revisit_gap model field + removed redundant timeout args

Summary by CodeRabbit

Release Notes

New Features

  • Playbook steps can now include custom failure messages to provide more context when steps cannot complete
  • Added new guardrail configuration setting stuck_revisit_gap to customize revisit behavior (range: 1–50, default: 5)
  • Run identifiers now organize action logs, improving traceability and organization

Bug Fixes

  • Improved fallback mechanisms for trace and event streaming when upstream services are temporarily unavailable
  • Enhanced error recovery and reporting in playbook execution with better aggregation of failure context

Tests

  • Comprehensive test coverage added for action routing, playbook execution, recording services, and streaming functionality

@hwuiwon hwuiwon merged commit cc90b13 into main Apr 7, 2026
3 checks passed
@hwuiwon hwuiwon deleted the harden-stream branch April 7, 2026 15:16
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 7, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c05347a5-507e-4c20-8d7b-f2edf04baa94

📥 Commits

Reviewing files that changed from the base of the PR and between 1ead0e4 and 7bbd670.

📒 Files selected for processing (22)
  • actionlog/actions.py
  • agent/session/finalizer.py
  • agent/session/runner.py
  • api/models.py
  • api/recording_service.py
  • api/runs/service.py
  • bridge/router.py
  • evaluation/runner.py
  • playbooks/executor.py
  • playbooks/params.py
  • playbooks/recovery.py
  • playbooks/runner.py
  • playbooks/schema.py
  • scripts/run_local.py
  • tests/test_actionlog.py
  • tests/test_bridge_execution.py
  • tests/test_bridge_router.py
  • tests/test_playbooks.py
  • tests/test_recording_service.py
  • tests/test_run_service.py
  • tests/test_session_finalizer.py
  • tests/test_streaming.py

📝 Walkthrough

Walkthrough

This pull request introduces run-aware action logging by threading a run_id parameter through the action routing and persistence pipeline, enabling per-run isolation of action logs on disk. Additionally, it enhances playbook execution with step-level failure messaging, improves recovery error aggregation, and adds fallback streaming mechanisms in recording and event services when live upstream connections fail.

Changes

Cohort / File(s) Summary
Run ID Threading
actionlog/actions.py, bridge/router.py, agent/session/runner.py, evaluation/runner.py, scripts/run_local.py
Added optional run_id parameter to persist_action_log to namespace persisted action logs by run; updated ActionRouter constructor to accept and generate run_id when needed, then pass it into persistence calls. Directory creation now ensured via os.makedirs before writing.
Playbook Failure Messaging
playbooks/schema.py, playbooks/params.py, playbooks/executor.py
Added failure_message: str field to PlaybookStep; implemented placeholder binding for failure messages in parameter substitution; introduced _step_error helper to conditionally use step's custom failure message or fall back to default error text.
Playbook Recovery Enhancement
playbooks/recovery.py, playbooks/runner.py
Added _surface_llm_failure helper to aggregate original step failure and LLM recovery failure into combined StepResult fields; guards early return when LLM handoff fails; ensured failed steps are appended to step_results before abort.
Fallback Streaming
api/recording_service.py, api/runs/service.py
Added conditional fallback to persisted trace/event streams when live upstream fails; tracks bytes successfully sent to determine whether fallback is safe; persisted fallback only triggered if no bytes were streamed before error.
Session Finalization
agent/session/finalizer.py
Renamed persist(outcome) to publish_status(outcome) for status publishing; added new parameterless persist() for recording artifact persistence; updated call order to: publish_status()cleanup()persist(). Both methods now wrap exceptions in broad try/catch with warning-level logging.
Configuration & Guardrails
api/models.py
Added stuck_revisit_gap field to GuardrailSettings with bounds 1 ≤ value ≤ 50 and default 5.
Test Coverage
tests/test_actionlog.py, tests/test_bridge_execution.py, tests/test_bridge_router.py, tests/test_playbooks.py, tests/test_recording_service.py, tests/test_run_service.py, tests/test_session_finalizer.py, tests/test_streaming.py
Added comprehensive test suite validating run-id namespacing of persisted logs, action router orchestration, bridge execution control flow, playbook failure messaging and recovery behavior, streaming fallback mechanics, and finalizer execution order.

Sequence Diagrams

sequenceDiagram
    participant Agent as Agent Session
    participant Router as ActionRouter
    participant Logger as persist_action_log
    participant Disk as Local Disk

    Agent->>Router: execute_action(run_id="run-123")
    Router->>Router: _record_action(entry, run_id)
    Router->>Logger: persist_action_log(entry, run_id="run-123")
    Logger->>Logger: sanitize_run_id("run-123")
    Logger->>Disk: os.makedirs("run-123/", exist_ok=True)
    Logger->>Disk: write(run-123/action_N.json)
    Logger-->>Router: path to persisted file
    Router-->>Agent: ActionResult
Loading
sequenceDiagram
    participant Client as HTTP Client
    participant Service as RecordingService
    participant Upstream as Upstream Stream
    participant Storage as Persisted Storage

    Client->>Service: get_trace(run_id)
    Service->>Upstream: stream upstream
    Upstream->>Service: yield chunk 1
    Service->>Service: bytes_sent += len(chunk)
    Upstream-->>Service: HTTPError before chunk 2
    
    alt bytes_sent == 0
        Service->>Storage: load persisted trace.zip
        Storage-->>Service: bytes(trace.zip)
        Service-->>Client: persisted bytes
    else bytes_sent > 0
        Service-->>Client: partial upstream bytes only
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch harden-stream

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant