Run Ralph read-before-edit UI pilot #70

Merged: bglusman merged 78 commits into main from codex/ralph-ui-loop-pilot-1 on May 24, 2026

@bglusman (Owner) commented May 22, 2026

Summary

  • adds the first Ralph UI-loop pilot report for the read-before-edit tool misuse scenario
  • captures screenshot evidence and follow-up work items under docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse
  • fixes the Control Debugger default Workbench pattern mapping for read-before-edit and output-contract examples
  • documents that future Ralph runs must prove UI, MCP, and CLI/debugging-help symmetry from cold start

Validation

  • mise exec -- mix format --check-formatted
  • mise exec -- mix test --no-deps-check test/workbench_test.exs
  • full pre-commit/push gate: mise check

Adversarial Review

  • Useful: the loop found a real UI mismatch instead of only producing a scripted happy path.
  • Risk: OpenCode was only exercised through UI handoff export, not native resumed live-agent execution; follow-up RALPH-RBE-002 keeps that explicit.
  • Risk: example-to-pattern mapping is still a small hardcoded table; acceptable for the current example set but should become data-driven if examples grow.
  • Test quality: the regression tests assert user-visible behavior (read-before-edit saves to tool-governance; output-contract saves to ambiguous-success) and would fail against the original mismatch.

Summary by Sourcery

Strengthen the read-before-edit control debugger loop by exposing it consistently across UI, MCP, CLI, and protected HTTP APIs, and by formalizing harness state-fidelity probing and verification without over-claiming equivalent agent resume.

New Features:

  • Expose Control Debugger workflows (list examples, record examples, load traces, replay/fork at cursors, save evidence) as protected HTTP endpoints and MCP tools, and surface them via wardwright tools.
  • Add harness adapter state-fidelity probes and a verification API/tool that compare exported trace/tool fingerprints against imported harness state while keeping equivalent resume claims conservative.
  • Introduce Ralph loop supervision docs and an OpenCode import trial log to track and constrain the ongoing read-before-edit investigation.

Bug Fixes:

  • Fix default Workbench pattern mapping so read-before-edit examples target tool-governance and output-contract examples target ambiguous-success.
  • Ensure test router cases use isolated transcript store directories to avoid cross-test leakage.

Enhancements:

  • Enrich Control Debugger UI with clearer read-before-edit violation text, context-aware fork recommendations, and explicit harness handoff/verification guidance.
  • Extend harness adapter metadata with resume-claim status, state-fidelity verification checklists, and export sidecar artifacts, plus improved warnings in UI and exports.
  • Tighten CLI help and agent-authoring docs to position wardwright tools as a cold-start entrypoint and document environment-based storage options and MCP/CLI/UI symmetry expectations.

Tests:

  • Add MCP, HTTP, CLI, UI, and adapter tests that drive the read-before-edit loop end-to-end through non-UI controls and validate state-fidelity probes and verification behavior.

sourcery-ai Bot commented May 22, 2026

Reviewer's Guide

Introduce a Ralph read-before-edit UI-loop pilot that aligns Control Debugger, MCP tools, CLI, and harness exports around the read-before-edit and output-contract scenarios, adds harness state-fidelity probes and verification, and fixes default pattern routing and UI messaging, all covered by MCP/HTTP/CLI/UI tests and documentation updates.

Sequence diagram for harness state-fidelity verification via MCP and HTTP

sequenceDiagram
  actor Operator
  participant MCPClient
  participant MCPServer
  participant VerifyTool as VerifyHarnessStateFidelity
  participant Router
  participant Adapters as AgentHarnessAdapters

  Operator->>MCPClient: call verify_harness_state_fidelity(probe, observed)
  MCPClient->>MCPServer: tool invocation
  MCPServer->>VerifyTool: execute(params, frame)
  VerifyTool->>Adapters: verify_state_fidelity(probe, observed)
  Adapters-->>VerifyTool: verification map
  VerifyTool-->>MCPServer: reply_json({verification})
  MCPServer-->>MCPClient: MCP result
  MCPClient-->>Operator: show verification.status

  Operator->>Router: POST /v1/policy-authoring/harness-adapters/state-fidelity/verify
  Router->>Adapters: verify_state_fidelity(probe, observed)
  Adapters-->>Router: verification map
  Router-->>Operator: {"verification": verification}

File-Level Changes

Add harness state-fidelity probing, verification, and explicit resume-claim status to agent harness adapters and exports.
  • Introduce verify_state_fidelity/2 to compare probe vs observed trace/tool-result fingerprints and read-before-edit cursor identification, returning a structured verification result.
  • Augment adapter metadata with equivalent_agent_resume precompute, resume_claim_status, missing_fidelity, and state_fidelity_verification checklist, plus state_fidelity_probe generation from trace events.
  • Ensure exports include a state-fidelity probe sidecar file, saved alongside artifacts for JSON and prompt handoff formats, and surface verification guidance and warnings in Control Debugger UI and harness tests.
Files:
  • app/lib/wardwright_web/agent_harness_adapters.ex
  • app/src/wardwright/harness_adapter.gleam
  • app/lib/wardwright_web/control_debugger_data.ex
  • app/test/agent_harness_adapters_test.exs
  • app/src/wardwright/lustre_control_debugger_test_support.gleam
  • app/test/gleam_policy_core_test.exs

Expose Control Debugger and harness capabilities via protected HTTP APIs, MCP tools, and CLI discovery to support a UI-free Ralph loop.
  • Add Control Debugger HTTP endpoints for listing examples, recording examples, loading traces, replaying and forking cursors, and saving evidence, plus a protected harness state-fidelity verification endpoint.
  • Implement ControlDebuggerTools to wrap existing debugger operations into structured maps suitable for HTTP/MCP, including example metadata, trace summaries, replay/fork execution, and scenario creation from trace evidence.
  • Add MCP tool components for Control Debugger flows, harness listing/export, receipt replay, and harness state-fidelity verification, and register them in the MCP server and policy-authoring tool catalog so wardwright tools and MCP discovery surface them with safety notes.
  • Extend CLI help and wardwright tools tests to cover new tools and environment variables, treating CLI as a cold-start help surface for agents.
Files:
  • app/lib/wardwright/router.ex
  • app/lib/wardwright_web/control_debugger_tools.ex
  • app/lib/wardwright_web/mcp_server.ex
  • app/lib/wardwright_web/mcp/tools/*.ex
  • app/lib/wardwright_web/policy_authoring_tools.ex
  • app/lib/wardwright/cli.ex
  • app/test/mcp_authoring_test.exs
  • app/test/public_api_test.exs
  • app/test/cli_test.exs

Tighten Control Debugger UI behavior for read-before-edit and output-contract examples, including default Workbench pattern routing and violation messaging.
  • Add default_pattern_id_for_example/1 to route read-before-edit to tool-governance and output-contract to ambiguous-success, and wire the Lustre Control Debugger UI to use example-specific defaults and update pattern selection when the example changes or records.
  • Enhance transcript summaries to compute per-path read-before-edit status, labeling offending edit_file events with explicit violation text and improved fork-point recommendations, and expose scenario storage health messaging for simulator cases.
  • Expand Gleam/Lustre UI test support to assert exact read-before-edit violation text, scenario targeting, updated harness export messaging, and new select value helpers, and update LiveView smoke tests to script the record/save workflow via browser automation.
Files:
  • app/lib/wardwright_web/control_debugger_data.ex
  • app/src/wardwright/lustre_control_debugger.gleam
  • app/src/wardwright/lustre_control_debugger_test_support.gleam
  • app/test/workbench_test.exs
  • scripts/browser-smoke/liveview-playback.mjs

Document Ralph loop acceptance criteria and the read-before-edit pilot run, emphasizing UI/MCP/CLI symmetry and harness fidelity limits.
  • Add Ralph Loop Acceptance Criteria to the Control Debugger docs, requiring evidence across UI, MCP, and CLI surfaces and cross-checking of traces and scenarios.
  • Update agent authoring docs to position wardwright tools as the cold-start help surface and to describe symmetry expectations between UI, MCP, and CLI, plus new harness fidelity and verification behavior.
  • Introduce a Ralph Loop Supervisor doc tracking the read-before-edit continuation loops, findings, and the RALPH-RBE-002 follow-up about OpenCode import/resume equivalence.
Files:
  • docs/agent-control-debugger.md
  • docs/agent-authoring.md
  • docs/ralph-runs/ralph-loop-supervisor.md

gemini-code-assist Bot left a comment

Code Review

This pull request introduces the "Ralph Loop Acceptance Criteria" to ensure functional symmetry across UI, MCP, and CLI surfaces, supported by a detailed pilot run report for tool misuse scenarios. Key technical changes include the implementation of a mapping between counterfactual examples and default workbench patterns, updates to the Lustre Control Debugger model to synchronize these patterns during example selection and transcript loading, and the addition of corresponding regression tests. Feedback was provided regarding the need to consistently update the pattern ID in both the success and failure branches of the transcript loading logic to ensure the UI state remains synchronized.

Comment on lines +315 to +317
pattern_id: external_default_pattern_id_for_example(
model.example_id,
),
medium

The pattern_id is explicitly updated here when the transcript loads successfully, but it is not updated in the failure branch (line 327). While ExampleChanged handles the initial sync, if the intent of this fix is to ensure the pattern picker is reset to the example's default after a recording run, it should be applied consistently regardless of whether the transcript loading succeeded, as long as the recording itself was successful.

@bglusman bglusman marked this pull request as ready for review May 22, 2026 21:19
Copilot AI review requested due to automatic review settings May 22, 2026 21:19

sourcery-ai Bot left a comment

Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a first “Ralph UI-loop” pilot report for the read-before-edit tool-misuse scenario while fixing a Control Debugger UX mismatch where recording/selecting certain counterfactual examples defaulted to the wrong Workbench pattern. This helps keep Control Debugger evidence capture aligned with the correct policy/workbench slice (tool governance vs output contract review), and documents expectations for UI/MCP/CLI symmetry in future runs.

Changes:

  • Document a full Ralph run (steps, evidence, screenshots list, and follow-ups) under docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/.
  • Update the Lustre Control Debugger to pick a default Workbench pattern based on the selected/recorded counterfactual example.
  • Add regression tests asserting the example→pattern mapping (read-before-edit → tool-governance, output-contract → ambiguous-success).

Reviewed changes

Copilot reviewed 13 out of 24 changed files in this pull request and generated 1 comment.

Per-file summary:

  • docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/ui-pilot-final-evidence.txt: Captures final UI evidence text for the pilot after the mapping fix.
  • docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/ui-pilot-before-change-evidence.json: Structured “before change” evidence index + text samples for screenshots.
  • docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/ui-friction-default-pattern-before-change.txt: Evidence of the prior default-pattern mismatch in the UI.
  • docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/README.md: Narrative pilot report: scenario, repro path, observed failure, and outcomes.
  • docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/followups.yml: Follow-up work items derived from the pilot (UI clarity, capability gaps, reliability, test gap).
  • docs/ralph-runs/2026-05-22-read-before-edit-tool-misuse/evidence.md: Run commands, URLs, screenshots list, receipts/transcripts IDs, and validation results.
  • docs/agent-control-debugger.md: Adds Ralph loop acceptance criteria focused on UI/MCP/CLI parity and “cold start” usability.
  • docs/agent-authoring.md: Documents wardwright tools as the cold-start entrypoint and reiterates UI/MCP/CLI symmetry expectations.
  • app/test/workbench_test.exs: Adds a regression test that exercises the example→pattern targeting behavior.
  • app/src/wardwright/lustre_control_debugger.gleam: Uses an example-aware default-pattern function during init, example changes, and example recording.
  • app/src/wardwright/lustre_control_debugger_test_support.gleam: Adds test helpers to assert the selected Workbench pattern after recording examples.
  • app/lib/wardwright/cli.ex: Clarifies CLI help text expectations for wardwright tools as an agent-facing discovery surface.
  • app/lib/wardwright_web/control_debugger_data.ex: Introduces default_pattern_id_for_example/1 mapping counterfactual examples to the appropriate Workbench pattern id.

Comment on lines +25 to +27
def default_pattern_id_for_example("read-before-edit"), do: "tool-governance"
def default_pattern_id_for_example("output-contract"), do: "ambiguous-success"
def default_pattern_id_for_example(_example_id), do: default_pattern_id()
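
The three clauses above hardcode the example-to-pattern table, which the PR summary flags as acceptable for the current example set but a candidate for becoming data-driven if examples grow. A minimal sketch of a table-driven variant follows; the module name `PatternMappingSketch` and the fallback value `"default-pattern"` are hypothetical, since the real `default_pattern_id/0` value is not shown in this PR.

```elixir
defmodule PatternMappingSketch do
  # Hypothetical stand-in for ControlDebuggerData.default_pattern_id/0;
  # the real default id is not shown in this PR.
  def default_pattern_id, do: "default-pattern"

  # Example id => Workbench pattern id, kept as data instead of function heads.
  @example_patterns %{
    "read-before-edit" => "tool-governance",
    "output-contract" => "ambiguous-success"
  }

  def default_pattern_id_for_example(example_id) when is_binary(example_id) do
    Map.get(@example_patterns, example_id, default_pattern_id())
  end

  # Non-string ids (e.g. nil before an example is selected) use the default.
  def default_pattern_id_for_example(_example_id), do: default_pattern_id()
end
```

Adding an example then means adding one map entry, while the catch-all clause keeps the original fallback behavior for unknown ids.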

@bglusman (Owner, Author) commented May 22, 2026

Ralph loop continuation update:

  • Consolidated loops 1-4 onto this PR branch (codex/ralph-ui-loop-pilot-1), so there is now one branch and PR stream instead of one branch per loop.
  • Deleted remote follow-up branches codex/ralph-loop-followups-1 and codex/ralph-loop-followups-2 after fast-forwarding their commits here.
  • Replaced the noisy per-run evidence directories with docs/ralph-runs/ralph-loop-supervisor.md, a compact tracker with the loop budget, validation notes, and open/closed followups.
  • Local validation and push hooks ran full mise check; GitHub checks are now running on the consolidated PR branch.

Adversarial review: the main tradeoff is losing raw screenshots and JSON captures from earlier Ralph loops. I think that is the right call: the artifacts were large and review-hostile, while the retained tracker plus commits/tests preserve the actionable evidence. Open risk remains RALPH-RBE-002: OpenCode import/resume still should not be claimed as equivalent live-agent resume until a real state-fidelity trial proves it.

@bglusman (Owner, Author)

Ralph loop 5 update:

  • Added machine-readable harness resume status so OpenCode export is clearly unverified_best_effort_handoff, not equivalent native resume.
  • Added state_fidelity_verification metadata and checklist to adapter payloads, MCP/API export responses, and Control Debugger export facts.
  • Updated tests across Gleam contract, adapter exports, and MCP authoring surfaces.
  • Updated docs/ralph-runs/ralph-loop-supervisor.md; 8 continuation cycles remain.

Validation:

  • mise run check:docs
  • cd app && MIX_ENV=test mise exec -- mix test (393 passed, 21 properties, 6 excluded)
  • Pre-push mise check passed locally.

Adversarial review: this does not close RALPH-RBE-002. It deliberately makes that proof gap harder to miss by exposing the verification requirement and next-step checklist wherever an adapter/export is consumed.

@bglusman (Owner, Author)

Ralph loop 6 update:

  • Added state_fidelity_probe to harness exports so the next OpenCode import/resume trial has concrete trace and tool-result fingerprints to compare, not only prose instructions.
  • Saved harness exports now include a private wardwright-state-fidelity-probe.json sidecar for both OpenCode session JSON and prompt handoff adapters.
  • Updated direct export and MCP tests to assert the probe contract.
  • Fixed RouterCase transcript-store isolation after the push gate caught duplicate full-session transcript events from persistent state leaking across Mix invocations.
  • Updated docs/ralph-runs/ralph-loop-supervisor.md; 7 continuation cycles remain.
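
One way trace and tool-result fingerprints could be produced is sketched below, assuming events are string-keyed maps and that a tool result is identified by its `"tool"` and `"output"` fields. The module name, the event shape, and the choice of SHA-256 over `:erlang.term_to_binary/2` with the `:deterministic` option are all assumptions; the PR does not show Wardwright's actual probe format.

```elixir
defmodule StateFidelityProbeSketch do
  # Hypothetical event shape:
  #   %{"type" => "tool_result", "tool" => "read_file", "output" => "..."}

  # SHA-256 over a deterministic external term encoding (OTP 24.1+),
  # rendered as lowercase hex (64 characters).
  def fingerprint(term) do
    term
    |> :erlang.term_to_binary([:deterministic])
    |> then(&:crypto.hash(:sha256, &1))
    |> Base.encode16(case: :lower)
  end

  def build_probe(events) when is_list(events) do
    # The generator pattern filters out events that are not tool results.
    tool_results =
      for %{"type" => "tool_result"} = event <- events do
        fingerprint(Map.take(event, ["tool", "output"]))
      end

    %{
      "trace_fingerprint" => fingerprint(events),
      "tool_result_fingerprints" => tool_results
    }
  end
end
```

Because the `:deterministic` encoding is only documented as stable within one Erlang/OTP major release, a probe meant to be compared across machines or harnesses would more plausibly hash a canonical JSON encoding instead.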

Validation:

  • cd app && MIX_ENV=test mise exec -- mix test (393 passed, 21 properties, 6 excluded)
  • mise run check:docs
  • Full mise check passed locally and in the pre-push gate.

Adversarial review: this still does not close RALPH-RBE-002. The probe makes the future state-fidelity trial falsifiable, and the RouterCase isolation fix removes test pollution without changing production transcript behavior.

@bglusman (Owner, Author)

Ralph loop 7 update:

  • Added verify_harness_state_fidelity as a protected API and MCP tool for comparing an exported state_fidelity_probe with observed imported-harness state.
  • The verifier checks trace fingerprint, tool-result fingerprints, and read-before-edit cursor identification.
  • Tool-result comparison is multiset-based, so duplicated tool evidence cannot pass with only one observed copy.
  • The result keeps equivalent_agent_resume_claim_allowed=false even on a probe match, because this still does not prove hidden/private harness state or workspace snapshot fidelity.
  • Updated the authoring tool catalog, debugger docs, and compact loop tracker; 6 continuation cycles remain.
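
The multiset rule in the list above can be sketched with Elixir's `--` list operator, which removes one matching element per occurrence on its right-hand side, so a fingerprint expected twice but observed once is reported as missing. Field names such as `"read_before_edit_cursor"` and the `"probe_match"` status are hypothetical; only `probe_mismatch` and `equivalent_agent_resume_claim_allowed` appear in the PR.

```elixir
defmodule StateFidelityVerifySketch do
  # Hypothetical probe/observed shape (string keys, illustrative only):
  #   "trace_fingerprint"        => String.t()
  #   "tool_result_fingerprints" => [String.t()]
  #   "read_before_edit_cursor"  => term()

  def verify(probe, observed) do
    expected = probe["tool_result_fingerprints"] || []
    seen = observed["tool_result_fingerprints"] || []

    # Multiset difference: each observed copy satisfies at most one expected copy.
    missing = expected -- seen

    checks = %{
      "trace_fingerprint" => probe["trace_fingerprint"] == observed["trace_fingerprint"],
      "tool_result_fingerprints" => missing == [],
      "read_before_edit_cursor" =>
        probe["read_before_edit_cursor"] == observed["read_before_edit_cursor"]
    }

    status = if Enum.all?(Map.values(checks)), do: "probe_match", else: "probe_mismatch"

    %{
      "status" => status,
      "checks" => checks,
      "missing_tool_result_fingerprints" => missing,
      # A full match still does not prove hidden harness state or workspace
      # snapshot fidelity, so the equivalent-resume claim stays disallowed.
      "equivalent_agent_resume_claim_allowed" => false
    }
  end
end
```

This check is one-directional: extra observed tool results do not fail it. Whether surplus evidence should also count as a mismatch is a design choice the sketch leaves open.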

Validation:

  • cd app && MIX_ENV=test mise exec -- mix test (395 passed, 21 properties, 6 excluded)
  • mise run check:docs
  • mise run check:maps
  • Full mise check passed locally and in the pre-push gate.

Adversarial review: this moves RALPH-RBE-002 from an exported probe to a falsifiable verification step, but it intentionally does not close RBE-002. I also caught and fixed a stale docs anchor before push.

@bglusman (Owner, Author)

Ralph loop 8 update:

  • Clarified the Control Debugger harness handoff facts so the operator sees the saved handoff artifact, state-fidelity probe file, handoff command, and verify_harness_state_fidelity follow-up from the same UI flow.
  • Kept the wording adapter-neutral (Handoff artifact / Handoff command) so prompt-only adapters are not misrepresented as native imports.
  • Added CLI help and JSON registry coverage for verify_harness_state_fidelity so agents discover the verifier from wardwright tools as well as MCP/API metadata.
  • Updated the compact loop tracker; 5 continuation cycles remain.

Validation:

  • cd app && gleam format --check src
  • cd app && gleam check --target erlang
  • cd app && mix format --check-formatted
  • cd app && mix test (395 passed, 21 properties, 6 excluded)
  • Full mise check passed locally and again in the pre-push gate.

Adversarial review: this does not close RALPH-RBE-002. It only removes a usability gap between export and verification. I caught and fixed one wording issue before push: the first draft said Import artifact, which was too strong for lower-fidelity prompt handoffs.

@bglusman (Owner, Author)

Focused RBE-002 loop update pushed in 6e5a386.

Result: OpenCode import/fork works as best-effort evidence handoff, but does not preserve Wardwright trace/tool results as native OpenCode tool state.

What I verified:

  • current OpenCode config is forge/forge backed by gemma4:26b-a4b-it-q4_K_M;
  • opencode import created imported session ses_wwQj1zkBdHxnvqJh4n8JSODMku;
  • OpenCode DB stores the imported trace as text/step parts, not native tool-call/tool-result parts;
  • opencode run --session ... --fork --model forge/forge created a fork and correctly concluded that edit_file occurred before any read_file of README.md;
  • verify_state_fidelity correctly reports probe_mismatch for the actual observed imported state because trace/tool-result fingerprints are not exposed by the import.

Conclusion: keep PR #70's conservative fidelity warnings. The handoff is useful for human/model continuation, but it is not equivalent agent resume.

Local validation before push: full mise check passed.

@bglusman (Owner, Author)

@sourcery-ai review

sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The new verify_state_fidelity/2 path in AgentHarnessAdapters is doing several distinct things (event extraction, fingerprinting, comparison, status assembly); consider extracting the event-normalization and diffing logic into smaller private helpers or a separate module to keep the main verifier easier to follow and reuse.
  • ControlDebuggerTools now reimplements several concepts already present in ControlDebuggerData (event labels, details, fork recommendations, suggested cursors); it may be worth centralizing this summarization logic in one place to avoid future drift between UI and MCP/API behavior.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `verify_state_fidelity/2` path in `AgentHarnessAdapters` is doing several distinct things (event extraction, fingerprinting, comparison, status assembly); consider extracting the event-normalization and diffing logic into smaller private helpers or a separate module to keep the main verifier easier to follow and reuse.
- `ControlDebuggerTools` now reimplements several concepts already present in `ControlDebuggerData` (event labels, details, fork recommendations, suggested cursors); it may be worth centralizing this summarization logic in one place to avoid future drift between UI and MCP/API behavior.

## Individual Comments

### Comment 1
<location path="app/lib/wardwright_web/control_debugger_tools.ex" line_range="318" />
<code_context>
+
+  defp event_detail(_event), do: "recorded event"
+
+  defp compact_json(value), do: value |> JSON.encode!() |> String.slice(0, 180)
+
+  defp overlay_for_example(example_id) do
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider making compact_json resilient to JSON encoding failures

compact_json/1 calls JSON.encode!/1 directly, so any non-encodable value in an event (e.g., structs or unknown types) will raise and crash the endpoint. Since these values come from runtime traces, it’d be safer to follow the safe_json pattern (encode with a rescue that falls back to inspect/1) so we return a structured error instead of failing the request.
</issue_to_address>

### Comment 2
<location path="app/lib/wardwright_web/control_debugger_tools.ex" line_range="199" />
<code_context>
+    end
+  end
+
+  defp selected_trace_events(_events, _cursor), do: {:error, "session_id and trace_cursor are required"}
+
+  defp scenario_from_trace(pattern_id, session_id, cursor, selected, events, params) do
</code_context>
<issue_to_address>
**nitpick:** Error message in selected_trace_events mentions session_id even though it only receives events and cursor

Since this function only takes `events` and `cursor`, the current `{:error, "session_id and trace_cursor are required"}` message can be misleading when it bubbles up via `save_evidence`. The actual precondition here is just a non-blank cursor. Please update the message to something like `"trace_cursor is required"` and let higher-level code add any session_id context if needed.
</issue_to_address>



defp event_detail(_event), do: "recorded event"

defp compact_json(value), do: value |> JSON.encode!() |> String.slice(0, 180)

issue (bug_risk): Consider making compact_json resilient to JSON encoding failures

compact_json/1 calls JSON.encode!/1 directly, so any non-encodable value in an event (e.g., structs or unknown types) will raise and crash the endpoint. Since these values come from runtime traces, it’d be safer to follow the safe_json pattern (encode with a rescue that falls back to inspect/1) so we return a structured error instead of failing the request.
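
A minimal sketch of the suggested fallback, assuming Elixir 1.18+ where `JSON.encode!/1` is part of the standard library. The module name is hypothetical, and the existing `safe_json` helper mentioned in the review may differ in detail.

```elixir
defmodule CompactJsonSketch do
  @max_len 180

  # Try JSON first; if the encoder rejects the term (for example a tuple or
  # a PID), fall back to a bounded inspect/2 rendering instead of raising.
  def compact_json(value) do
    encoded =
      try do
        JSON.encode!(value)
      rescue
        _error -> inspect(value, limit: 20, printable_limit: @max_len)
      end

    String.slice(encoded, 0, @max_len)
  end
end
```

Rescuing every exception keeps the endpoint alive for any trace value; a narrower rescue clause limited to the encoder's unsupported-type error would surface unexpected bugs instead of masking them.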

end
end

defp selected_trace_events(_events, _cursor), do: {:error, "session_id and trace_cursor are required"}

nitpick: Error message in selected_trace_events mentions session_id even though it only receives events and cursor

Since this function only takes events and cursor, the current {:error, "session_id and trace_cursor are required"} message can be misleading when it bubbles up via save_evidence. The actual precondition here is just a non-blank cursor. Please update the message to something like "trace_cursor is required" and let higher-level code add any session_id context if needed.
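
The suggested split of responsibilities might look like the sketch below: the low-level selector reports only the precondition it checks, and the caller that knows the session prefixes that context. The event shape (maps with a `"cursor"` key), the not-found branch, and the `save_evidence/3` signature are hypothetical; the real `ControlDebuggerTools` functions are not shown in full.

```elixir
defmodule TraceSelectionSketch do
  # Hypothetical event shape: %{"cursor" => "c1", ...}

  defp blank?(value), do: not is_binary(value) or String.trim(value) == ""

  # Reports only what it checks: the cursor itself.
  def selected_trace_events(events, cursor) do
    if blank?(cursor) do
      {:error, "trace_cursor is required"}
    else
      case Enum.filter(events, &(&1["cursor"] == cursor)) do
        [] -> {:error, "trace_cursor #{cursor} not found"}
        selected -> {:ok, selected}
      end
    end
  end

  # The caller knows the session, so it adds that context to any error.
  def save_evidence(session_id, events, cursor) do
    case selected_trace_events(events, cursor) do
      {:ok, selected} -> {:ok, selected}
      {:error, reason} -> {:error, "session #{inspect(session_id)}: #{reason}"}
    end
  end
end
```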

@bglusman (Owner, Author)

UI follow-up pushed in 908a469. This applies the blind design-review pass directly to this PR: clearer example-vs-live-model framing, simpler labels, a concise latest-run summary, technical rule names hidden behind details, and updated browser smoke coverage for the renamed registered/live model surface. Validation: mise check passed locally and in the push hook.

@bglusman (Owner, Author)

Updated the Ralph/Lustre branch through 58e14d5. The legacy /policies LiveView workbench code and assets are removed; /policies now only redirects to /admin, preserving model links. The Lustre admin now carries the example model library, and the browser smoke target is the Lustre workbench. Local full mise check and pre-push checks passed; GitHub checks are still running.

bglusman merged commit d8944a2 into main on May 24, 2026 (6 checks passed).