Run Ralph read-before-edit UI pilot by bglusman · Pull Request #70 · bglusman/wardwright

bglusman · 2026-05-22T16:58:23Z

Summary

adds the first Ralph UI-loop pilot report for the read-before-edit tool misuse scenario
captures screenshot evidence and follow-up work items under docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse
fixes the Control Debugger default Workbench pattern mapping for read-before-edit and output-contract examples
documents that future Ralph runs must prove UI, MCP, and CLI/debugging-help symmetry from cold start

Validation

mise exec -- mix format --check-formatted
mise exec -- mix test --no-deps-check test/workbench_test.exs
full pre-commit/push gate: mise check

Adversarial Review

Useful: the loop found a real UI mismatch instead of only producing a scripted happy path.
Risk: OpenCode was only exercised through UI handoff export, not native resumed live-agent execution; follow-up RALPH-RBE-002 keeps that explicit.
Risk: example-to-pattern mapping is still a small hardcoded table; acceptable for the current example set but should become data-driven if examples grow.
Test quality: regression tests assert behavior visible to users: read-before-edit saves to tool-governance and output-contract saves to ambiguous-success. The tests are capable of failing on the original mismatch.
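
The "data-driven" direction mentioned in the second risk could look roughly like the sketch below. It is illustrative only: the module name and the fallback value are hypothetical (the real `default_pattern_id/0` value is not shown in this PR), and only the two example-to-pattern pairs come from the description above.

```elixir
defmodule PatternDefaultsSketch do
  @moduledoc "Illustrative sketch only; not the project's implementation."

  # Hypothetical fallback. The real default_pattern_id/0 value is not shown in this PR.
  @fallback_pattern_id "fallback-pattern"

  # Example id => Workbench pattern id, as stated in the PR description.
  @example_patterns %{
    "read-before-edit" => "tool-governance",
    "output-contract" => "ambiguous-success"
  }

  def default_pattern_id, do: @fallback_pattern_id

  # Look the example up in the table and fall back to the generic default.
  def default_pattern_id_for_example(example_id) do
    Map.get(@example_patterns, example_id, default_pattern_id())
  end
end
```

With a table like this, adding an example means adding one map entry (or loading the map from example metadata) rather than adding a function clause.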

Summary by Sourcery

Strengthen the read-before-edit control debugger loop by exposing it consistently across UI, MCP, CLI, and protected HTTP APIs, and by formalizing harness state-fidelity probing and verification without over-claiming equivalent agent resume.

New Features:

Expose Control Debugger workflows (list examples, record examples, load traces, replay/fork at cursors, save evidence) as protected HTTP endpoints and MCP tools, and surface them via wardwright tools.
Add harness adapter state-fidelity probes and a verification API/tool that compare exported trace/tool fingerprints against imported harness state while keeping equivalent resume claims conservative.
Introduce Ralph loop supervision docs and an OpenCode import trial log to track and constrain the ongoing read-before-edit investigation.

Bug Fixes:

Fix default Workbench pattern mapping so read-before-edit examples target tool-governance and output-contract examples target ambiguous-success.
Ensure test router cases use isolated transcript store directories to avoid cross-test leakage.

Enhancements:

Enrich Control Debugger UI with clearer read-before-edit violation text, context-aware fork recommendations, and explicit harness handoff/verification guidance.
Extend harness adapter metadata with resume-claim status, state-fidelity verification checklists, and export sidecar artifacts, plus improved warnings in UI and exports.
Tighten CLI help and agent-authoring docs to position wardwright tools as a cold-start entrypoint and document environment-based storage options and MCP/CLI/UI symmetry expectations.

Tests:

Add MCP, HTTP, CLI, UI, and adapter tests that drive the read-before-edit loop end-to-end through non-UI controls and validate state-fidelity probes and verification behavior.

sourcery-ai · 2026-05-22T16:58:55Z

Reviewer's Guide

Introduce a Ralph read-before-edit UI-loop pilot that aligns Control Debugger, MCP tools, CLI, and harness exports around the read-before-edit and output-contract scenarios, adds harness state-fidelity probes and verification, and fixes default pattern routing and UI messaging, all covered by MCP/HTTP/CLI/UI tests and documentation updates.

Sequence diagram for harness state-fidelity verification via MCP and HTTP

sequenceDiagram
  actor Operator
  participant MCPClient
  participant MCPServer
  participant VerifyTool as VerifyHarnessStateFidelity
  participant Router
  participant Adapters as AgentHarnessAdapters

  Operator->>MCPClient: call verify_harness_state_fidelity(probe, observed)
  MCPClient->>MCPServer: tool invocation
  MCPServer->>VerifyTool: execute(params, frame)
  VerifyTool->>Adapters: verify_state_fidelity(probe, observed)
  Adapters-->>VerifyTool: verification map
  VerifyTool-->>MCPServer: reply_json({verification})
  MCPServer-->>MCPClient: MCP result
  MCPClient-->>Operator: show verification.status

  Operator->>Router: POST /v1/policy-authoring/harness-adapters/state-fidelity/verify
  Router->>Adapters: verify_state_fidelity(probe, observed)
  Adapters-->>Router: verification map
  Router-->>Operator: {"verification": verification}

File-Level Changes

Change	Details	Files
Add harness state-fidelity probing, verification, and explicit resume-claim status to agent harness adapters and exports.	Introduce verify_state_fidelity/2 to compare probe vs observed trace/tool-result fingerprints and read-before-edit cursor identification, returning a structured verification result. Augment adapter metadata with equivalent_agent_resume precompute, resume_claim_status, missing_fidelity, and state_fidelity_verification checklist, plus state_fidelity_probe generation from trace events. Ensure exports include a state-fidelity probe sidecar file, saved alongside artifacts for JSON and prompt handoff formats, and surface verification guidance and warnings in Control Debugger UI and harness tests.	`app/lib/wardwright_web/agent_harness_adapters.ex` `app/src/wardwright/harness_adapter.gleam` `app/lib/wardwright_web/control_debugger_data.ex` `app/test/agent_harness_adapters_test.exs` `app/src/wardwright/lustre_control_debugger_test_support.gleam` `app/test/gleam_policy_core_test.exs`
Expose Control Debugger and harness capabilities via protected HTTP APIs, MCP tools, and CLI discovery to support a UI-free Ralph loop.	Add Control Debugger HTTP endpoints for listing examples, recording examples, loading traces, replaying and forking cursors, and saving evidence, plus a protected harness state-fidelity verification endpoint. Implement ControlDebuggerTools to wrap existing debugger operations into structured maps suitable for HTTP/MCP, including example metadata, trace summaries, replay/fork execution, and scenario creation from trace evidence. Add MCP tool components for Control Debugger flows, harness listing/export, receipt replay, and harness state-fidelity verification, and register them in the MCP server and policy-authoring tool catalog so `wardwright tools` and MCP discovery surface them with safety notes. Extend CLI help and `wardwright tools` tests to cover new tools and environment variables, treating CLI as a cold-start help surface for agents.	`app/lib/wardwright/router.ex` `app/lib/wardwright_web/control_debugger_tools.ex` `app/lib/wardwright_web/mcp_server.ex` `app/lib/wardwright_web/mcp/tools/*.ex` `app/lib/wardwright_web/policy_authoring_tools.ex` `app/lib/wardwright/cli.ex` `app/test/mcp_authoring_test.exs` `app/test/public_api_test.exs` `app/test/cli_test.exs`
Tighten Control Debugger UI behavior for read-before-edit and output-contract examples, including default Workbench pattern routing and violation messaging.	Add default_pattern_id_for_example/1 to route read-before-edit to tool-governance and output-contract to ambiguous-success, and wire the Lustre Control Debugger UI to use example-specific defaults and update pattern selection when the example changes or records. Enhance transcript summaries to compute per-path read-before-edit status, labeling offending edit_file events with explicit violation text and improved fork-point recommendations, and expose scenario storage health messaging for simulator cases. Expand Gleam/Lustre UI test support to assert exact read-before-edit violation text, scenario targeting, updated harness export messaging, and new select value helpers, and update LiveView smoke tests to script the record/save workflow via browser automation.	`app/lib/wardwright_web/control_debugger_data.ex` `app/src/wardwright/lustre_control_debugger.gleam` `app/src/wardwright/lustre_control_debugger_test_support.gleam` `app/test/workbench_test.exs` `scripts/browser-smoke/liveview-playback.mjs`
Document Ralph loop acceptance criteria and the read-before-edit pilot run, emphasizing UI/MCP/CLI symmetry and harness fidelity limits.	Add Ralph Loop Acceptance Criteria to the Control Debugger docs, requiring evidence across UI, MCP, and CLI surfaces and cross-checking of traces and scenarios. Update agent authoring docs to position `wardwright tools` as the cold-start help surface and to describe symmetry expectations between UI, MCP, and CLI, plus new harness fidelity and verification behavior. Introduce a Ralph Loop Supervisor doc tracking the read-before-edit continuation loops, findings, and the RALPH-RBE-002 follow-up about OpenCode import/resume equivalence.	`docs/agent-control-debugger.md` `docs/agent-authoring.md` `docs/ralph-runs/ralph-loop-supervisor.md`


gemini-code-assist

Code Review

This pull request introduces the "Ralph Loop Acceptance Criteria" to ensure functional symmetry across UI, MCP, and CLI surfaces, supported by a detailed pilot run report for tool misuse scenarios. Key technical changes include the implementation of a mapping between counterfactual examples and default workbench patterns, updates to the Lustre Control Debugger model to synchronize these patterns during example selection and transcript loading, and the addition of corresponding regression tests. Feedback was provided regarding the need to consistently update the pattern ID in both the success and failure branches of the transcript loading logic to ensure the UI state remains synchronized.

gemini-code-assist · 2026-05-22T17:03:24Z

+                pattern_id: external_default_pattern_id_for_example(
+                  model.example_id,
+                ),


The pattern_id is updated here when the transcript loads successfully, but not in the failure branch (line 327). ExampleChanged handles the initial sync, but if the intent of this fix is to reset the pattern picker to the example's default after a recording run, the update should apply whenever the recording itself succeeded, regardless of whether the transcript load succeeded.

sourcery-ai

Hey - I've reviewed your changes and they look great!


Copilot

Pull request overview

Adds a first “Ralph UI-loop” pilot report for the read-before-edit tool-misuse scenario while fixing a Control Debugger UX mismatch where recording/selecting certain counterfactual examples defaulted to the wrong Workbench pattern. This helps keep Control Debugger evidence capture aligned with the correct policy/workbench slice (tool governance vs output contract review), and documents expectations for UI/MCP/CLI symmetry in future runs.

Changes:

Document a full Ralph run (steps, evidence, screenshots list, and follow-ups) under docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/.
Update the Lustre Control Debugger to pick a default Workbench pattern based on the selected/recorded counterfactual example.
Add regression tests asserting the example→pattern mapping (read-before-edit → tool-governance, output-contract → ambiguous-success).

Reviewed changes

Copilot reviewed 13 out of 24 changed files in this pull request and generated 1 comment.


File	Description
docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/ui-pilot-final-evidence.txt	Captures final UI evidence text for the pilot after the mapping fix.
docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/ui-pilot-before-change-evidence.json	Structured “before change” evidence index + text samples for screenshots.
docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/ui-friction-default-pattern-before-change.txt	Evidence of the prior default-pattern mismatch in the UI.
docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/README.md	Narrative pilot report: scenario, repro path, observed failure, and outcomes.
docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/followups.yml	Follow-up work items derived from the pilot (UI clarity, capability gaps, reliability, test gap).
docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/evidence.md	Run commands, URLs, screenshots list, receipts/transcripts IDs, and validation results.
docs/agent-control-debugger.md	Adds Ralph loop acceptance criteria focused on UI/MCP/CLI parity and “cold start” usability.
docs/agent-authoring.md	Documents `wardwright tools` as the cold-start entrypoint and reiterates UI/MCP/CLI symmetry expectations.
app/test/workbench_test.exs	Adds a regression test that exercises the example→pattern targeting behavior.
app/src/wardwright/lustre_control_debugger.gleam	Uses an example-aware default-pattern function during init, example changes, and example recording.
app/src/wardwright/lustre_control_debugger_test_support.gleam	Adds test helpers to assert the selected Workbench pattern after recording examples.
app/lib/wardwright/cli.ex	Clarifies CLI help text expectations for `wardwright tools` as an agent-facing discovery surface.
app/lib/wardwright_web/control_debugger_data.ex	Introduces `default_pattern_id_for_example/1` mapping counterfactual examples to the appropriate Workbench pattern id.

+  def default_pattern_id_for_example("read-before-edit"), do: "tool-governance"
+  def default_pattern_id_for_example("output-contract"), do: "ambiguous-success"
+  def default_pattern_id_for_example(_example_id), do: default_pattern_id()


bglusman · 2026-05-22T22:21:50Z

Ralph loop continuation update:

Consolidated loops 1-4 onto this PR branch (codex/ralph-ui-loop-pilot-1), so there is now one branch and PR stream instead of one branch per loop.
Deleted remote follow-up branches codex/ralph-loop-followups-1 and codex/ralph-loop-followups-2 after fast-forwarding their commits here.
Replaced the noisy per-run evidence directories with docs/ralph-runs/ralph-loop-supervisor.md, a compact tracker with the loop budget, validation notes, and open/closed followups.
Local validation and push hooks ran full mise check; GitHub checks are now running on the consolidated PR branch.

Adversarial review: the main tradeoff is losing raw screenshots and JSON captures from earlier Ralph loops. I think that is the right call: the artifacts were large and review-hostile, while the retained tracker plus commits/tests preserve the actionable evidence. Open risk remains RALPH-RBE-002: OpenCode import/resume still should not be claimed as equivalent live-agent resume until a real state-fidelity trial proves it.

bglusman · 2026-05-22T22:48:40Z

Ralph loop 5 update:

Added machine-readable harness resume status so OpenCode export is clearly unverified_best_effort_handoff, not equivalent native resume.
Added state_fidelity_verification metadata and checklist to adapter payloads, MCP/API export responses, and Control Debugger export facts.
Updated tests across Gleam contract, adapter exports, and MCP authoring surfaces.
Updated docs/ralph-runs/ralph-loop-supervisor.md; 8 continuation cycles remain.

Validation:

mise run check:docs
cd app && MIX_ENV=test mise exec -- mix test (393 passed, 21 properties, 6 excluded)
Pre-push mise check passed locally.

Adversarial review: this does not close RALPH-RBE-002. It deliberately makes that proof gap harder to miss by exposing the verification requirement and next-step checklist wherever an adapter/export is consumed.

bglusman · 2026-05-22T23:18:22Z

Ralph loop 6 update:

Added state_fidelity_probe to harness exports so the next OpenCode import/resume trial has concrete trace and tool-result fingerprints to compare, not only prose instructions.
Saved harness exports now include a private wardwright-state-fidelity-probe.json sidecar for both OpenCode session JSON and prompt handoff adapters.
Updated direct export and MCP tests to assert the probe contract.
Fixed RouterCase transcript-store isolation after the push gate caught duplicate full-session transcript events from persistent state leaking across Mix invocations.
Updated docs/ralph-runs/ralph-loop-supervisor.md; 7 continuation cycles remain.
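
The RouterCase isolation fix above amounts to giving each test its own transcript-store directory. A minimal sketch of that idea follows, assuming a hypothetical helper module; how the project actually points its transcript store at a directory is not shown in this PR, so that wiring is left out.

```elixir
defmodule IsolatedStoreSketch do
  @moduledoc "Illustrative sketch only; helper names are hypothetical."

  # Creates a fresh directory per call. The OS timestamp keeps names distinct
  # across separate Mix invocations; the unique integer keeps them distinct
  # within one runtime.
  def fresh_store_dir(prefix \\ "wardwright-transcripts") do
    stamp = System.os_time(:nanosecond)
    unique = System.unique_integer([:positive])
    dir = Path.join(System.tmp_dir!(), "#{prefix}-#{stamp}-#{unique}")
    File.mkdir_p!(dir)
    dir
  end

  # Removes the directory and everything under it.
  def cleanup(dir), do: File.rm_rf!(dir)
end
```

In an ExUnit case template this would typically be called from `setup`, with `on_exit(fn -> IsolatedStoreSketch.cleanup(dir) end)` registered so nothing persists between runs.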

Validation:

cd app && MIX_ENV=test mise exec -- mix test (393 passed, 21 properties, 6 excluded)
mise run check:docs
Full mise check passed locally and in the pre-push gate.

Adversarial review: this still does not close RALPH-RBE-002. The probe makes the future state-fidelity trial falsifiable, and the RouterCase isolation fix removes test pollution without changing production transcript behavior.

bglusman · 2026-05-22T23:34:07Z

Ralph loop 7 update:

Added verify_harness_state_fidelity as a protected API and MCP tool for comparing an exported state_fidelity_probe with observed imported-harness state.
The verifier checks trace fingerprint, tool-result fingerprints, and read-before-edit cursor identification.
Tool-result comparison is multiset-based, so duplicated tool evidence cannot pass with only one observed copy.
The result keeps equivalent_agent_resume_claim_allowed=false even on a probe match, because this still does not prove hidden/private harness state or workspace snapshot fidelity.
Updated the authoring tool catalog, debugger docs, and compact loop tracker; 6 continuation cycles remain.
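
To make the multiset point concrete, here is a small sketch of a probe comparison. It is not the real `verify_state_fidelity/2`: the module name, the map keys, and the `"probe_match"` status string are assumptions for illustration, while `"probe_mismatch"` and `equivalent_agent_resume_claim_allowed` are taken from the updates in this thread.

```elixir
defmodule FidelitySketch do
  @moduledoc "Illustrative sketch only; key names are assumptions."

  # Multiset equality: order is ignored, but each fingerprint must occur
  # the same number of times on both sides.
  def same_multiset?(expected, observed) do
    Enum.frequencies(expected) == Enum.frequencies(observed)
  end

  def verify(probe, observed) do
    checks = %{
      trace_fingerprint:
        Map.get(probe, :trace_fingerprint) == Map.get(observed, :trace_fingerprint),
      tool_result_fingerprints:
        same_multiset?(
          Map.get(probe, :tool_result_fingerprints, []),
          Map.get(observed, :tool_result_fingerprints, [])
        )
    }

    status = if Enum.all?(Map.values(checks)), do: "probe_match", else: "probe_mismatch"

    # A matching probe still does not prove hidden harness state or workspace
    # snapshot fidelity, so the resume claim stays disallowed in every case.
    %{status: status, checks: checks, equivalent_agent_resume_claim_allowed: false}
  end
end
```

A set-based comparison would accept `["a", "b"]` as matching `["a", "a", "b"]`; comparing frequencies is what rejects it.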

Validation:

cd app && MIX_ENV=test mise exec -- mix test (395 passed, 21 properties, 6 excluded)
mise run check:docs
mise run check:maps
Full mise check passed locally and in the pre-push gate.

Adversarial review: this moves RALPH-RBE-002 from an exported probe to a falsifiable verification step, but it intentionally does not close RBE-002. I also caught and fixed a stale docs anchor before push.

bglusman · 2026-05-22T23:47:47Z

Ralph loop 8 update:

Clarified the Control Debugger harness handoff facts so the operator sees the saved handoff artifact, state-fidelity probe file, handoff command, and verify_harness_state_fidelity follow-up from the same UI flow.
Kept the wording adapter-neutral (Handoff artifact / Handoff command) so prompt-only adapters are not misrepresented as native imports.
Added CLI help and JSON registry coverage for verify_harness_state_fidelity so agents discover the verifier from wardwright tools as well as MCP/API metadata.
Updated the compact loop tracker; 5 continuation cycles remain.

Validation:

cd app && gleam format --check src
cd app && gleam check --target erlang
cd app && mix format --check-formatted
cd app && mix test (395 passed, 21 properties, 6 excluded)
Full mise check passed locally and again in the pre-push gate.

Adversarial review: this does not close RALPH-RBE-002. It only removes a usability gap between export and verification. I caught and fixed one wording issue before push: the first draft said Import artifact, which was too strong for lower-fidelity prompt handoffs.

bglusman · 2026-05-23T00:01:14Z

Focused RBE-002 loop update pushed in 6e5a386.

Result: OpenCode import/fork works as best-effort evidence handoff, but does not preserve Wardwright trace/tool results as native OpenCode tool state.

What I verified:

current OpenCode config is forge/forge backed by gemma4:26b-a4b-it-q4_K_M;
opencode import created imported session ses_wwQj1zkBdHxnvqJh4n8JSODMku;
OpenCode DB stores the imported trace as text/step parts, not native tool-call/tool-result parts;
opencode run --session ... --fork --model forge/forge created a fork and correctly concluded that edit_file occurred before any read_file of README.md;
verify_state_fidelity correctly reports probe_mismatch for the actual observed imported state because trace/tool-result fingerprints are not exposed by the import.

Conclusion: keep PR #70's conservative fidelity warnings. The handoff is useful for human/model continuation, but it is not equivalent agent resume.

Local validation before push: full mise check passed.

bglusman · 2026-05-23T02:49:51Z

@sourcery-ai review

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

The new verify_state_fidelity/2 path in AgentHarnessAdapters is doing several distinct things (event extraction, fingerprinting, comparison, status assembly); consider extracting the event-normalization and diffing logic into smaller private helpers or a separate module to keep the main verifier easier to follow and reuse.
ControlDebuggerTools now reimplements several concepts already present in ControlDebuggerData (event labels, details, fork recommendations, suggested cursors); it may be worth centralizing this summarization logic in one place to avoid future drift between UI and MCP/API behavior.

## Individual Comments

### Comment 1
<location path="app/lib/wardwright_web/control_debugger_tools.ex" line_range="318" />
<code_context>
+
+  defp event_detail(_event), do: "recorded event"
+
+  defp compact_json(value), do: value |> JSON.encode!() |> String.slice(0, 180)
+
+  defp overlay_for_example(example_id) do
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider making compact_json resilient to JSON encoding failures

compact_json/1 calls JSON.encode!/1 directly, so any non-encodable value in an event (e.g., structs or unknown types) will raise and crash the endpoint. Since these values come from runtime traces, it’d be safer to follow the safe_json pattern (encode with a rescue that falls back to inspect/1) so we return a structured error instead of failing the request.
</issue_to_address>

### Comment 2
<location path="app/lib/wardwright_web/control_debugger_tools.ex" line_range="199" />
<code_context>
+    end
+  end
+
+  defp selected_trace_events(_events, _cursor), do: {:error, "session_id and trace_cursor are required"}
+
+  defp scenario_from_trace(pattern_id, session_id, cursor, selected, events, params) do
</code_context>
<issue_to_address>
**nitpick:** Error message in selected_trace_events mentions session_id even though it only receives events and cursor

Since this function only takes `events` and `cursor`, the current `{:error, "session_id and trace_cursor are required"}` message can be misleading when it bubbles up via `save_evidence`. The actual precondition here is just a non-blank cursor. Please update the message to something like `"trace_cursor is required"` and let higher-level code add any session_id context if needed.
</issue_to_address>


sourcery-ai · 2026-05-23T02:51:22Z

+
+  defp event_detail(_event), do: "recorded event"
+
+  defp compact_json(value), do: value |> JSON.encode!() |> String.slice(0, 180)


issue (bug_risk): Consider making compact_json resilient to JSON encoding failures

compact_json/1 calls JSON.encode!/1 directly, so any non-encodable value in an event (e.g., structs or unknown types) will raise and crash the endpoint. Since these values come from runtime traces, it’d be safer to follow the safe_json pattern (encode with a rescue that falls back to inspect/1) so we return a structured error instead of failing the request.
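
One way to act on this suggestion is sketched below, assuming Elixir 1.18 or later for the built-in `JSON` module. The module name is hypothetical, and the project's `safe_json` helper is not shown in the review, so this falls back to `inspect/1` directly rather than reproducing it.

```elixir
defmodule CompactJsonSketch do
  @moduledoc "Illustrative sketch only; not the project's safe_json helper."

  @max_length 180

  # Encode to JSON when possible. If the value has no JSON encoding
  # (tuples, PIDs, structs without an encoder), fall back to inspect/1
  # instead of raising, then truncate either way.
  def compact_json(value) do
    value |> JSON.encode!() |> String.slice(0, @max_length)
  rescue
    _error -> value |> inspect() |> String.slice(0, @max_length)
  end
end
```

The fallback keeps the endpoint responding with a readable detail string; whether a structured error would be preferable is a design choice for the caller.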

sourcery-ai · 2026-05-23T02:51:22Z

+    end
+  end
+
+  defp selected_trace_events(_events, _cursor), do: {:error, "session_id and trace_cursor are required"}


nitpick: Error message in selected_trace_events mentions session_id even though it only receives events and cursor

Since this function only takes events and cursor, the current {:error, "session_id and trace_cursor are required"} message can be misleading when it bubbles up via save_evidence. The actual precondition here is just a non-blank cursor. Please update the message to something like "trace_cursor is required" and let higher-level code add any session_id context if needed.
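
A sketch of the split the reviewer describes follows. Everything except the two error strings is hypothetical: the success clause, the assumption that events are maps with an `"id"` key, and the `save_evidence/3` shape are invented for illustration because the real function bodies are not shown here.

```elixir
defmodule TraceSelectionSketch do
  @moduledoc "Illustrative sketch only; the selection logic is hypothetical."

  # Hypothetical success path: keep events up to and including the cursor.
  def selected_trace_events(events, cursor) when is_binary(cursor) do
    if String.trim(cursor) == "" do
      {:error, "trace_cursor is required"}
    else
      case Enum.find_index(events, &(&1["id"] == cursor)) do
        nil -> {:error, "trace_cursor not found"}
        index -> {:ok, Enum.take(events, index + 1)}
      end
    end
  end

  # The only precondition this function owns is a usable cursor.
  def selected_trace_events(_events, _cursor), do: {:error, "trace_cursor is required"}

  # Hypothetical caller: the session_id check lives where session_id is known.
  def save_evidence(session_id, _events, _cursor) when session_id in [nil, ""],
    do: {:error, "session_id is required"}

  def save_evidence(_session_id, events, cursor), do: selected_trace_events(events, cursor)
end
```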

bglusman · 2026-05-23T03:16:44Z

UI follow-up pushed in 908a469. This applies the blind design-review pass directly to this PR: clearer example-vs-live-model framing, simpler labels, a concise latest-run summary, technical rule names hidden behind details, and updated browser smoke coverage for the renamed registered/live model surface. Validation: mise check passed locally and in the push hook.

bglusman · 2026-05-23T03:54:35Z

Updated the Ralph/Lustre branch through 58e14d5. The legacy /policies LiveView workbench code and assets are removed; /policies now only redirects to /admin, preserving model links. The Lustre admin now carries the example model library, and the browser smoke target is the Lustre workbench. Local full mise check and pre-push checks passed; GitHub checks are still running.

Spike Pi and OMP replay exports

bglusman added 2 commits May 22, 2026 12:40

Run Ralph read-before-edit UI pilot

f8a1637

Document Ralph control surface symmetry

9c0e7ab

gemini-code-assist Bot reviewed May 22, 2026


bglusman added 2 commits May 22, 2026 13:28

Run Ralph followups for control debugger

851d4f5

Add Control Debugger MCP trace controls

a434531

bglusman marked this pull request as ready for review May 22, 2026 21:19

Copilot AI review requested due to automatic review settings May 22, 2026 21:19

Copilot started reviewing on behalf of bglusman May 22, 2026 21:19 View session

sourcery-ai Bot reviewed May 22, 2026


Copilot AI reviewed May 22, 2026


Consolidate Ralph run evidence

d6b1bb2

Expose harness resume verification status

0f16020

Add harness state fidelity probe

0b72e77

Add harness state fidelity verifier

267eb03

Clarify harness verification handoff

50a2a86

Record OpenCode import fidelity trial

6e5a386

sourcery-ai Bot reviewed May 23, 2026


Clarify workbench UI language

908a469

bglusman added 2 commits May 22, 2026 23:46

Remove legacy policy LiveView workbench

a0c7412

Clarify legacy policy route redirect

58e14d5

bglusman added 29 commits May 23, 2026 19:06

Complete adapter install validation loop

ed98049

Keep adapter Ralph runner alive after completion

493d455

Handle stale adapter Ralph runner locks

3282b0d

Continue adapter Ralph follow-up backlog

cdd38f5

Add OpenCode surface probe verification

c84757f

Record OpenCode surface probe Ralph loop

f3ebafe

Downgrade OpenCode native install claim

99dd52d

Record OpenCode native adapter Ralph loop

d7339fa

Add Pi adapter metadata lifecycle

d155931

Record Pi adapter Ralph loop

3abf952

Add Claude Code adapter identity lifecycle

13228dc

Record Claude Code adapter Ralph loop

b7d555c

Add OpenClaw runtime resolution

56acbc2

Record OpenClaw adapter Ralph completion

d8c61b7

Fix dynamic adapter metadata drift

27af12f

Record adapter clean-state validation

48afc58

Scope framework adapter priorities

284a7e1

Add framework adapter Ralph loop

d4e74e6

Define framework adapter contract foundation

ed355c7

Add Vercel AI SDK adapter smoke

22180a1

Add LangChain LangGraph adapter smoke

76c30eb

Add Pydantic AI adapter smoke

0284546

Add OpenAI Agents SDK adapter smoke

a144423

Add Microsoft Extensions AI adapter smoke

3378f76

Add LlamaIndex adapter smoke

12f793f

Add reusable framework smoke test support

9b0bfab

Complete framework adapter validation loop

364a9f8

Address adapter review feedback

fcf4a30

Merge pull request #71 from bglusman/codex/pi-replay-spike

4f20fc6

Spike Pi and OMP replay exports

bglusman merged commit d8944a2 into main May 24, 2026
6 checks passed

