Refactor run/session lifecycle into dedicated packages and tighten browser execution boundaries #47
⚠️ Caution: Review failed — the pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration — Configuration used: Organization UI · Review profile: CHILL · Plan: Pro · Run ID:
📒 Files selected for processing (8)
📝 Walkthrough
Reorganizes sandbox session lifecycle into a new
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent as Agent/Main
    participant Runner as run_sandbox_session
    participant Browser as BrowserManager
    participant Recording as RecordingManager
    participant Blinders as DOMBlinders
    participant Router as ActionRouter
    participant LLM as Agent/Model
    participant Finalizer as RunFinalizer

    Agent->>Runner: run_sandbox_session(run_id, config, ...)
    activate Runner
    Runner->>Browser: Launch browser
    activate Browser
    alt recording enabled
        Runner->>Recording: Start recording
        activate Recording
    end
    Runner->>Blinders: Extract task scope & build blinders
    Runner->>Router: Create ActionRouter(browser, blinders, ...)
    loop agent loop
        Runner->>LLM: run_agent(router, ...)
        LLM->>Router: push_action / execute
        Router->>Browser: perform browser action
        Router->>Blinders: apply DOM blinders / guardrails
        Router-->>LLM: action result
    end
    Runner->>Finalizer: finalize(outcome, result)
    activate Finalizer
    Finalizer->>Finalizer: persist status (complete_run / persist_status)
    alt recording present
        Finalizer->>Recording: stop & commit/upload recordings
        deactivate Recording
    end
    Finalizer->>Browser: close browser
    deactivate Browser
    Finalizer->>Finalizer: emit telemetry & update metrics
    deactivate Finalizer
    deactivate Runner
```
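The lifecycle in the diagram can be sketched as a single async orchestrator that always funnels through the finalizer. This is a minimal sketch: the class names follow the diagram, but every signature and field here is an assumption for illustration, not the PR's actual code.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

class BrowserManager:
    # Stand-in for the real browser lifecycle manager.
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True

@dataclass
class RunOutcome:
    run_id: str
    success: bool
    error: Optional[str] = None

class RunFinalizer:
    # Stand-in finalizer: persist status, stop recordings, close browser.
    def __init__(self, run_id: str, browser: BrowserManager) -> None:
        self.run_id, self.browser = run_id, browser

    async def finalize(self, outcome: RunOutcome) -> RunOutcome:
        await self.browser.close()  # browser is closed on every path
        return outcome

async def run_sandbox_session(run_id: str, agent_loop) -> RunOutcome:
    browser = BrowserManager()
    finalizer = RunFinalizer(run_id, browser)
    try:
        # Stands in for the agent loop that routes actions via ActionRouter.
        result = await agent_loop()
        outcome = RunOutcome(run_id, success=bool(result))
    except Exception as exc:  # setup or loop failure
        outcome = RunOutcome(run_id, success=False, error=str(exc))
    return await finalizer.finalize(outcome)
```

Whatever happens inside the loop, the finalizer runs exactly once, which is the invariant the diagram's `finalize(outcome, result)` step encodes.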
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (3)
agent/session/finalizer.py (1)
101-105: Consider adding a type hint for the `result` parameter.

The `result` parameter lacks a type annotation, which reduces IDE support and type checker coverage. If circular import is a concern, consider using `TYPE_CHECKING` with a string annotation.

💡 Suggested type annotation

```diff
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from agent.loop import AgentResult
+
 @classmethod
 def from_agent_result(
     cls,
-    result,
+    result: "AgentResult",
 ) -> RunOutcome:
```

Or at minimum, document the expected interface in a docstring.
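For context, the `TYPE_CHECKING` pattern the suggestion relies on can be shown end to end. In this sketch the module path `agent.loop.AgentResult` and the class `RunOutcome` come from the review, while the fields read off `result` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only the type checker evaluates this import, so the runtime
    # circular import between the two modules never happens.
    from agent.loop import AgentResult


@dataclass
class RunOutcome:
    success: bool
    summary: Optional[str] = None

    @classmethod
    def from_agent_result(cls, result: "AgentResult") -> "RunOutcome":
        # At runtime, any object exposing `.success` (and optionally
        # `.summary`) satisfies this interface.
        return cls(success=result.success,
                   summary=getattr(result, "summary", None))
```

Because the annotation is a forward-reference string, it is never resolved at runtime, so duck-typed stubs (like the test's `SimpleNamespace`) keep working.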
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agent/session/finalizer.py` around lines 101-105: the classmethod `from_agent_result` is missing a type annotation for its `result` parameter. Add an appropriate type hint (e.g., the `AgentResult` type, or a protocol describing the expected interface), importing `AgentResult` under `typing.TYPE_CHECKING` with a forward-reference string like `"AgentResult"` to avoid circular imports, and/or document the expected attributes in the method docstring. Ensure the return type remains `RunOutcome`.

tests/test_session_finalizer.py (1)
16-18: Consider documenting the stub's expected fields.

The `_AgentResult` stub declares only `success: bool` but tests pass additional fields via `SimpleNamespace` kwargs. While this works, it could mask issues if the real `AgentResult` schema changes.

💡 Optional: Add comments documenting expected fields

```diff
 class _AgentResult(SimpleNamespace):
+    """Stub for AgentResult. Expected fields depend on test:
+    - success, summary, data, extracted_texts, error (for from_agent_result)
+    - action_count, total_input_tokens, total_output_tokens, total_duration_ms (for finalize)
+    """
     success: bool
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_session_finalizer.py` around lines 16-18: the `_AgentResult` test stub currently subclasses `SimpleNamespace` and declares only `success: bool`, which can hide missing fields passed via kwargs. Document (via an inline comment or docstring) the fields the tests rely on and, optionally, add explicit typed attributes matching the real `AgentResult` shape so the tests fail if the real schema changes.

agent/session/runner.py (1)
70-76: Consider extracting finalizer creation to avoid duplication.

The `RunFinalizer` is created twice: once before setup (with `recording=None`) and again after successful setup (with the actual `recording`). While this is logically correct (early failure has no recording), the duplication could be reduced.

💡 Optional: Create finalizer only after setup or use a builder pattern

One approach is to defer finalizer creation until after setup:

```diff
 browser = BrowserManager()
-finalizer = RunFinalizer(
-    run_id=run_id,
-    browser=browser,
-    recording=recording,
-    recording_upload=config.recording_config.upload,
-)
+finalizer: RunFinalizer | None = None
 try:
     with tracer.start_as_current_span(AGENT_SETUP) as setup_span:
         # ... setup code ...
 except Exception as exc:
     logger.error("Setup failed: %s", exc)
     run_span.record_exception(exc)
     outcome = RunOutcome.setup_failed(run_id, exc)
     run_span.set_status(outcome.trace_status, outcome.trace_message or "")
+    finalizer = RunFinalizer(run_id=run_id, browser=browser, recording=None, recording_upload=False)
     return await finalizer.finalize(outcome)
 finalizer = RunFinalizer(
     run_id=run_id,
     browser=browser,
     recording=recording,
     recording_upload=config.recording_config.upload,
 )
```

This makes it clearer that the finalizer configuration depends on setup success.
Also applies to: 117-122
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agent/session/runner.py` around lines 70 - 76, Refactor to eliminate the duplicated RunFinalizer construction by deferring or centralizing its creation: remove the initial RunFinalizer(...) instantiation before setup and instead create the finalizer after setup succeeds using the actual recording (use run_id, BrowserManager(), recording, and config.recording_config.upload), or add a small helper/builder function (e.g., build_finalizer(run_id, browser, recording, upload)) that both pre- and post-setup code paths call with the appropriate recording value; update any references to the finalizer variable accordingly (locations around the current RunFinalizer usage and the later re-creation at lines that reference recording and finalizer).
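The helper-function alternative mentioned in the prompt could look like the following sketch. `build_finalizer` is the hypothetical name suggested by the review; the `RunFinalizer` stand-in here only mirrors the constructor arguments the comment names.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class RunFinalizer:
    # Stand-in with the fields named in the review comment.
    run_id: str
    browser: Any
    recording: Optional[Any]
    recording_upload: bool

def build_finalizer(run_id: str, browser: Any,
                    recording: Optional[Any] = None,
                    upload: bool = False) -> RunFinalizer:
    """Single construction point for the pre-setup and post-setup paths."""
    return RunFinalizer(
        run_id=run_id,
        browser=browser,
        recording=recording,
        # An early (pre-setup) failure has no recording to upload.
        recording_upload=upload if recording is not None else False,
    )
```

Both code paths then call `build_finalizer(...)` with whatever `recording` they actually have, so the "no recording means no upload" rule lives in one place.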
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/runs/store.py`:
- Around line 20-22: The constructor __init__ should annotate the volume
parameter as optional (e.g., Optional[...] or the correct volume type) and store
it as an Optional on self; then guard any usage of self._volume.reload.aio()
(and similar calls) with a None check or early return so that when volume is
None (local/non-Modal) you don't call attributes on None; update the type import
(typing.Optional) and ensure methods that call self._volume (referenced by
self._volume.reload.aio()) handle the None case safely.
- Around line 62-64: The SSE "complete" event currently serializes only the
status field; update the yield in the generator that emits the complete event so
it includes the full RunStatus object from the persisted run (the same RunStatus
that contains result, error, data, duration_ms, etc.) instead of {'status':
persisted.status}. Serialize the entire RunStatus (e.g., convert
persisted.status to a dict/JSON via its existing
to_dict()/dict()/asdict()/__dict__ or JSON helper) and use json.dumps on that
object when forming the f"event: complete\ndata: ..." payload so clients receive
the full run outcome.
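Taken together, the two `api/runs/store.py` comments suggest a shape like the sketch below. Everything here is an assumption reconstructed from the review text: the `RunStatus` fields, the volume interface, and the store internals are illustrative, not the actual code.

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Optional

@dataclass
class RunStatus:
    # Illustrative fields; the review mentions result, error, data, duration_ms.
    status: str
    result: Optional[str] = None
    error: Optional[str] = None
    data: Optional[dict] = None
    duration_ms: Optional[int] = None

class RunStore:
    def __init__(self, volume: Optional[Any] = None) -> None:
        # volume is None in local / non-Modal environments.
        self._volume = volume

    async def reload(self) -> None:
        if self._volume is None:
            return  # guard: never touch attributes on a missing volume
        await self._volume.reload.aio()

    def complete_event(self, persisted: RunStatus) -> str:
        # Serialize the whole RunStatus, not just {"status": ...}, so SSE
        # clients receive the full run outcome in the "complete" event.
        return f"event: complete\ndata: {json.dumps(asdict(persisted))}\n\n"
```

With this shape, the SSE generator's final `yield` carries `result`, `error`, `data`, and `duration_ms` alongside `status`, which is what the second comment asks for.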
---
Nitpick comments:
In `@agent/session/finalizer.py`:
- Around line 101-105: The classmethod from_agent_result is missing a type
annotation for its result parameter; add an appropriate type hint (e.g., the
AgentResult type or a protocol describing the expected interface) to the result
parameter in from_agent_result and update imports using typing.TYPE_CHECKING
with a string annotation if needed to avoid circular imports, or use a
forward-reference string like "AgentResult" and/or document the expected
attributes in the method docstring; ensure the return type remains RunOutcome
and adjust imports to include typing.TYPE_CHECKING if you import AgentResult
only for type checking.
In `@agent/session/runner.py`:
- Around line 70-76: Refactor to eliminate the duplicated RunFinalizer
construction by deferring or centralizing its creation: remove the initial
RunFinalizer(...) instantiation before setup and instead create the finalizer
after setup succeeds using the actual recording (use run_id, BrowserManager(),
recording, and config.recording_config.upload), or add a small helper/builder
function (e.g., build_finalizer(run_id, browser, recording, upload)) that both
pre- and post-setup code paths call with the appropriate recording value; update
any references to the finalizer variable accordingly (locations around the
current RunFinalizer usage and the later re-creation at lines that reference
recording and finalizer).
In `@tests/test_session_finalizer.py`:
- Around line 16-18: The _AgentResult test stub currently subclasses
SimpleNamespace and only declares success: bool, which can hide missing fields
passed via kwargs; update the _AgentResult stub in
tests/test_session_finalizer.py to document (via an inline comment or docstring)
the expected fields used by tests (e.g., success, message, output, etc.) and,
optionally, add explicit attributes with types to match the real AgentResult
shape so the tests fail if the real schema changes; reference the _AgentResult
class and SimpleNamespace base when making these additions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3644c5ba-722f-4d45-b4c5-080407ec20c1
📒 Files selected for processing (28)
- README.md
- agent/main.py
- agent/session/__init__.py
- agent/session/finalizer.py
- agent/session/runner.py
- agent/session_runner.py
- api/recording_service.py
- api/runs/__init__.py
- api/runs/registry.py
- api/runs/service.py
- api/runs/store.py
- api/server.py
- blinders/verifier.py
- bridge/browser.py
- bridge/execution.py
- bridge/observation.py
- bridge/router.py
- evaluation/runner.py
- playbooks/recovery.py
- playbooks/runner.py
- scripts/run_local.py
- tests/test_blinders.py
- tests/test_dry_run.py
- tests/test_playbooks.py
- tests/test_run_registry.py
- tests/test_run_service.py
- tests/test_session_finalizer.py
- tests/test_streaming.py
💤 Files with no reviewable changes (3)
- evaluation/runner.py
- scripts/run_local.py
- agent/session_runner.py
Summary
This PR reorganizes the run/session lifecycle code into clearer packages, extracts repeated lifecycle
logic into dedicated helpers, and tightens the bridge/playbook boundaries to make the codebase easier to
reason about and test.
What changed
Refactors
Tests
Added:
Updated:
Verification
test_session_finalizer.py tests/test_dry_run.py tests/test_playbooks.py
Summary by CodeRabbit
New Features
Documentation
Refactor
Tests