
Spike Pi and OMP replay exports #71

Status: Merged. bglusman merged 62 commits into codex/ralph-ui-loop-pilot-1 from codex/pi-replay-spike on May 24, 2026.

@bglusman (Owner) commented May 23, 2026

Summary

  • split the prior Pi/oh-my-pi prompt handoff into native Pi JSONL and OMP replay-bundle adapters
  • add an OMP TTSR rule and extension scaffold so Wardwright can compare the same read-before-edit rule in replay and agent-runtime placement
  • add an OpenCode plugin-bundle spike while keeping OpenCode fidelity claims conservative
  • document adapter policy questions around gateway defaults, adapter identity, graceful degradation, and install/doctor/repair surfaces

Validation

  • MIX_ENV=test mise exec -- mix test from app/: 360 passed, 6 excluded
  • mise run check:docs
  • pre-commit hooks: app checks, docs, gitleaks
  • pre-push mise check: app, docs, maps, style, types, browser smoke
  • live Pi export smoke with @earendil-works/pi-coding-agent@0.75.5 --export
  • live OMP export smoke with downloaded omp/15.2.4 --export
  • isolated OpenCode import/export smoke confirmed imported plugin path still has zero native tool parts

Review notes

Post-commit review found and fixed one OMP TTSR correctness issue: OMP matches a rule's condition against streamed content and tool arguments, and uses the rule's scope to select tool names. The exported rule therefore now uses a broad condition, with the edit/write/patch tools listed as scoped targets.
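The condition/scope split described above can be illustrated with a small model. This is a minimal TypeScript sketch, not OMP's implementation: the Rule and StreamEvent shapes, the regex-based condition, and the ruleFires helper are hypothetical simplifications chosen only to show why tool names belong in scope rather than in condition.

```typescript
// Hypothetical, simplified model of a TTSR-style rule. Not OMP's real API:
// `condition` is tested against streamed text or serialized tool arguments,
// while `scope` restricts which tool names the rule applies to.
interface StreamEvent {
  kind: "text" | "toolCall";
  toolName?: string; // present when kind === "toolCall"
  content: string; // streamed text, or JSON-encoded tool arguments
}

interface Rule {
  condition: RegExp;
  scope: readonly string[]; // tool names; empty means "any stream content"
}

function ruleFires(rule: Rule, event: StreamEvent): boolean {
  if (rule.scope.length > 0) {
    // A scoped rule only sees tool calls whose name is listed in scope.
    if (event.kind !== "toolCall" || !rule.scope.includes(event.toolName ?? "")) {
      return false;
    }
  }
  return rule.condition.test(event.content);
}

// Illustrative "before" shape: tool names written into the condition, so the
// regex is tested against argument text and does not match an edit call.
const nameInCondition: Rule = { condition: /^(edit|write|patch)$/, scope: [] };

// Illustrative "after" shape: broad condition, tool names moved into scope.
const nameInScope: Rule = { condition: /[\s\S]/, scope: ["edit", "write", "patch"] };

const editCall: StreamEvent = {
  kind: "toolCall",
  toolName: "edit",
  content: JSON.stringify({ path: "lib/app.ex", old: "a", new: "b" }),
};

console.log(ruleFires(nameInCondition, editCall)); // false
console.log(ruleFires(nameInScope, editCall)); // true
```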

Remaining risk: the tests assert the exported artifact shape and full Wardwright-side behavior, but they do not exercise the OMP runtime itself. The next useful slice is a true OMP runtime equivalence test proving that the exported TTSR rule fires on edit/write tool streams with the same failure classification as Wardwright replay.
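For such a test, both sides need to agree on what a read-before-edit failure is and where it occurs. A minimal TypeScript sketch of a shared classifier, assuming tool calls are reduced to hypothetical { cursor, tool, path } records and a single hypothetical failure label, edit_without_read; Wardwright's real classification and OMP's tool-stream format are likely richer.

```typescript
// Hypothetical tool-call record; `cursor` is the call's position in the trace.
interface ToolCall {
  cursor: number;
  tool: string;
  path: string;
}

// Hypothetical classification shape shared by both sides of the test.
type Classification =
  | { status: "pass" }
  | { status: "fail"; reason: "edit_without_read"; cursor: number; path: string };

const WRITE_TOOLS = new Set(["edit", "write", "patch"]);

// Report the first write-like call that touches a path with no earlier read.
function classifyReadBeforeEdit(calls: readonly ToolCall[]): Classification {
  const readPaths = new Set<string>();
  for (const call of calls) {
    if (call.tool === "read") {
      readPaths.add(call.path);
    } else if (WRITE_TOOLS.has(call.tool) && !readPaths.has(call.path)) {
      return { status: "fail", reason: "edit_without_read", cursor: call.cursor, path: call.path };
    }
  }
  return { status: "pass" };
}

const stream: ToolCall[] = [
  { cursor: 1, tool: "read", path: "a.ex" },
  { cursor: 2, tool: "edit", path: "a.ex" },
  { cursor: 3, tool: "write", path: "b.ex" },
];

const result = classifyReadBeforeEdit(stream); // fails at cursor 3 (b.ex was never read)
```

An equivalence test could run the same stream through Wardwright replay and the OMP runtime and require both to report the same status and cursor. The sketch treats a write to an unread path as a violation; a real rule may relax this for new-file creation.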

Summary by Sourcery

Add native Pi session and oh-my-pi/omp replay-bundle adapters alongside an OpenCode plugin-bundle spike, and extend the harness export pipeline to generate corresponding artifacts, commands, and state-fidelity scaffolding.

New Features:

  • Expose separate Pi, oh-my-pi/omp, and OpenCode plugin adapters with explicit capabilities and fidelity claims in the harness adapter list.
  • Support exporting Wardwright sessions as native Pi JSONL, omp replay bundles with TTSR rule and extension files, and OpenCode plugin bundles.
  • Persist new text-based artifacts and reconstruct saved export commands for the new adapter types so users can resume or fork sessions from saved files.

Enhancements:

  • Tighten state-fidelity probing and verification around tool-result fingerprints and read-before-edit cursor identification, including in omp TTSR exports.
  • Clarify and extend the agent harness adapter contract and add spike documentation for Pi/omp replay, OpenCode plugins, and adapter policy questions.
  • Reformat and slightly restructure existing code and tests to accommodate the new adapter types and artifact formats.

Tests:

  • Add and update adapter tests to cover the new Pi, omp, and OpenCode plugin adapters, their exported artifact shapes, saved files, commands, and state-fidelity behavior.

Copilot AI review requested due to automatic review settings May 23, 2026 18:11
@sourcery-ai (bot) commented May 23, 2026

Reviewer's Guide

Refactors the Wardwright agent harness to treat Pi and oh-my-pi/omp as first-class native replay adapters (JSONL sessions and OMP replay bundles) alongside a new OpenCode plugin-bundle spike, adds corresponding export/write logic and saved-command handling, and documents the adapter policy and Pi/omp replay spike behavior, with tests asserting artifact shapes and state-fidelity behavior.

Flow diagram for new Pi/OMP/OpenCode plugin export paths

```mermaid
flowchart TD
  A[export session_id, adapter_id] --> B{adapter_id}

  B -->|opencode| C[opencode_export -> artifact_format opencode_session_json]
  B -->|opencode-plugin| D[opencode_plugin_export -> artifact_format opencode_plugin_bundle]
  B -->|pi| E[pi_session_export -> artifact_format pi_session_jsonl]
  B -->|oh-my-pi| F[oh_my_pi_export -> artifact_format oh_my_pi_replay_bundle]
  B -->|claude/codex| G[handoff_export -> artifact_format prompt_handoff]

  C --> H[write_export]
  D --> H
  E --> H
  F --> H
  G --> H

  H --> I{artifact_format}

  I -->|opencode_session_json| J[write_json_artifact + write_probe_artifact]
  I -->|pi_session_jsonl| K[write_text_artifact + write_probe_artifact]
  I -->|oh_my_pi_replay_bundle| L[write_prompt_handoff_files + write_probe_artifact]
  I -->|opencode_plugin_bundle| M[write_prompt_handoff_files + write_probe_artifact]
  I -->|prompt_handoff| L

  J --> N[saved_export_commands -> opencode import/fork]
  K --> O[saved_export_commands -> npx pi_coding_agent --session/--fork]
  L --> P[saved_export_commands -> omp rules/extensions + omp --session/--fork]
  M --> Q[saved_export_commands -> opencode plugins setup + import]
```
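The diagram's two-stage dispatch (adapter ID to exporter and artifact format, then artifact format to writers) can be read as a pair of lookup tables. This TypeScript sketch only mirrors the diagram's labels; the actual dispatch lives in Elixir in app/lib/wardwright_web/agent_harness_adapters.ex and is not reproduced here.

```typescript
// Illustrative only: mirrors the diagram's labels, not the Elixir source.
// Stage 1: adapter ID -> artifact format.
const ARTIFACT_FORMAT: Record<string, string> = {
  opencode: "opencode_session_json",
  "opencode-plugin": "opencode_plugin_bundle",
  pi: "pi_session_jsonl",
  "oh-my-pi": "oh_my_pi_replay_bundle",
  claude: "prompt_handoff",
  codex: "prompt_handoff",
};

// Stage 2: artifact format -> writers. Every branch ends with the
// state-fidelity probe writer.
const WRITERS: Record<string, readonly string[]> = {
  opencode_session_json: ["write_json_artifact", "write_probe_artifact"],
  pi_session_jsonl: ["write_text_artifact", "write_probe_artifact"],
  oh_my_pi_replay_bundle: ["write_prompt_handoff_files", "write_probe_artifact"],
  opencode_plugin_bundle: ["write_prompt_handoff_files", "write_probe_artifact"],
  prompt_handoff: ["write_prompt_handoff_files", "write_probe_artifact"],
};

function writersFor(adapterId: string): readonly string[] {
  // Own-property check so inherited keys like "toString" are not adapters.
  if (!Object.prototype.hasOwnProperty.call(ARTIFACT_FORMAT, adapterId)) {
    throw new Error(`unknown adapter: ${adapterId}`);
  }
  return WRITERS[ARTIFACT_FORMAT[adapterId]];
}

console.log(writersFor("pi").join(", ")); // write_text_artifact, write_probe_artifact
```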

File-Level Changes

Promote Pi from prompt-handoff to native session import with JSONL export and dedicated commands, and split oh-my-pi/omp into its own replay-bundle adapter.
  • Introduce separate adapter entries for Pi and oh-my-pi/omp in the adapter list, with Pi marked as session_import_best_effort and native_tool_results enabled.
  • Add pi_session_export, which builds a Pi v3 session JSONL file containing a session header, imported Wardwright trace context, synthetic assistant tool calls, and toolResult entries with cursor/fingerprint metadata, plus npx-based session/fork commands.
  • Add oh_my_pi_export, which packages the Pi session JSONL together with a TTSR rule file and a state-fidelity extension file into an omp replay bundle, with commands for installing the rule and extension into .omp and running omp session/fork.
  • Extend write_export and saved_export_commands to handle the pi_session_jsonl and oh_my_pi_replay_bundle formats, writing text artifacts and generating path-aware commands for persisted exports.
  • Add helper functions for Pi session IDs, filenames, entry IDs, and event-to-message translation, including synthetic assistant/toolResult messages and non-tool custom_message entries.
  Files:
    app/lib/wardwright_web/agent_harness_adapters.ex
    app/test/agent_harness_adapters_test.exs
    contracts/agent-harness-adapter-contract.md
    docs/ralph-runs/pi-oh-my-pi-replay-spike.md

Add an OpenCode plugin-bundle export mode that wraps the existing OpenCode session JSON artifact with a plugin scaffold for replay metadata, and adjust persistence/commands accordingly.
  • Introduce an opencode-plugin adapter with capabilities mirroring OpenCode but explicitly without native_tool_results.
  • Implement opencode_plugin_export, which reuses the OpenCode export, wraps it into a files-plus-session_file artifact, adds a wardwright-state-fidelity.ts plugin scaffold, and defines commands to install the plugin and run OpenCode with a fork from the imported session.
  • Update write_export and saved_export_commands to handle the opencode_plugin_bundle artifact_format and to generate commands that reference the stored session and plugin paths.
  • Provide an opencode_state_fidelity_plugin implementation that injects replay reminders into session compaction and annotates edit/write tool executions with Wardwright replay metadata.
  Files:
    app/lib/wardwright_web/agent_harness_adapters.ex
    app/test/agent_harness_adapters_test.exs
    contracts/agent-harness-adapter-contract.md

Tighten state-fidelity behavior and OMP TTSR rule semantics, and ensure probe artifacts are written consistently for all export types.
  • Ensure all export branches (OpenCode, Pi, OMP replay bundle, OpenCode plugin bundle, prompt handoff) write a state-fidelity probe file via write_probe_artifact and return combined artifact/probe paths.
  • Refine the read_before_edit check aggregation so that the final pass/fail status depends on trace fingerprint equality, tool-result fingerprint coverage, and read_before_edit_cursor_identified.
  • Add an OMP TTSR rule definition that uses a broad condition over streamed content and a scoped set of edit/write/patch tools, with interruptMode set to always, intended to enforce the rule on write-like operations.
  • Add a pi_state_fidelity_extension that defines a wardwright_verify_state_fidelity tool comparing an exported probe to observed replay state, returning a structured verification result without granting equivalent-resume claims.
  Files:
    app/lib/wardwright_web/agent_harness_adapters.ex
    app/test/agent_harness_adapters_test.exs
    docs/ralph-runs/pi-oh-my-pi-replay-spike.md

Expand tests and documentation to cover new adapters, export formats, and adapter policy questions.
  • Update adapter list tests to expect opencode, opencode-plugin, pi, and oh-my-pi entries, and to assert Pi’s fidelity, capabilities, and missing_fidelity fields.
  • Add tests verifying Pi export JSONL structure, presence of native toolResult entries with fingerprints/cursors, saved-file naming/modes, and regenerated commands for resumed sessions.
  • Add tests verifying OMP replay bundle content (session JSONL, rule, extension, probe), rule semantics (interruptMode and scoped tool entries), and that saved commands wire these files correctly into .omp and omp CLI invocations.
  • Add tests for opencode-plugin export shape, plugin content, and saved commands for installing the plugin when persisted.
  • Document the Pi/OMP replay spike and adapter policy questions, including rule placement tradeoffs, behavioral equivalence expectations between Wardwright replay and the OMP runtime, and gateway behavior around adapter identity, defaults, and install/doctor/repair flows.
  • Update the agent harness adapter contract to describe the new Pi, oh-my-pi/omp, and OpenCode plugin spike fidelity claims and open questions about Pi/omp imports and adapter policy.
  Files:
    app/test/agent_harness_adapters_test.exs
    contracts/agent-harness-adapter-contract.md
    docs/ralph-runs/pi-oh-my-pi-replay-spike.md
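The refined read_before_edit aggregation amounts to a conjunction of three checks. A minimal TypeScript sketch, assuming hypothetical input fields (only read_before_edit_cursor_identified is named in the summary) and interpreting "tool-result fingerprint coverage" as every tool result carrying a non-empty fingerprint.

```typescript
// Hypothetical shapes for an exported probe compared with observed replay state.
interface ToolResultRecord {
  id: string;
  fingerprint?: string;
}

interface FidelityInput {
  expectedTraceFingerprint: string;
  observedTraceFingerprint: string;
  toolResults: readonly ToolResultRecord[];
  read_before_edit_cursor_identified: boolean;
}

interface FidelityCheck {
  trace_fingerprint_matches: boolean;
  tool_result_fingerprints_complete: boolean;
  read_before_edit_cursor_identified: boolean;
  status: "pass" | "fail";
}

function checkReadBeforeEdit(input: FidelityInput): FidelityCheck {
  const trace_fingerprint_matches =
    input.expectedTraceFingerprint === input.observedTraceFingerprint;
  // Assumed meaning of "coverage": every tool result has a non-empty fingerprint.
  const tool_result_fingerprints_complete = input.toolResults.every(
    (r) => typeof r.fingerprint === "string" && r.fingerprint.length > 0,
  );
  const pass =
    trace_fingerprint_matches &&
    tool_result_fingerprints_complete &&
    input.read_before_edit_cursor_identified;
  return {
    trace_fingerprint_matches,
    tool_result_fingerprints_complete,
    read_before_edit_cursor_identified: input.read_before_edit_cursor_identified,
    status: pass ? "pass" : "fail",
  };
}

const check = checkReadBeforeEdit({
  expectedTraceFingerprint: "abc",
  observedTraceFingerprint: "abc",
  toolResults: [{ id: "t1", fingerprint: "f1" }, { id: "t2" }],
  read_before_edit_cursor_identified: true,
});
console.log(check.status); // fail (t2 has no fingerprint)
```

Reporting each sub-check next to the overall status keeps a failure explainable. Note that Array.prototype.every returns true for an empty list, so a trace with no tool results passes the coverage check vacuously.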

@sourcery-ai (bot) left a comment

Hey, I've found 1 issue and left some high-level feedback:

  • The Pi/OMP export paths duplicate several constants and command fragments (e.g., @pi_package, wardwright-read-before-edit.md, wardwright-state-fidelity.ts) in both export construction and saved_export_commands; consider centralizing these names/command templates so future changes don’t drift between the initial export and the saved-commands reconstruction.
  • The new pi_session_jsonl and associated pi_event_entries logic encode a specific Pi session schema (e.g., version, message shapes, usage fields) inline; it may be worth extracting a small helper/module that encapsulates the Pi session format so future Pi version changes or schema tweaks are localized rather than spread across this module.

Comment on lines +753 to +767 of app/lib/wardwright_web/agent_harness_adapters.ex:

```elixir
    {_parent_id, event_entries} =
      Enum.reduce(
        Enum.with_index(events, 1),
        {pi_entry_id(session_id, "trace-summary"), []},
        fn {event, index}, {parent_id, acc} ->
          {next_parent, entries} =
            pi_event_entries(session_id, event, index, parent_id, now, now_ms)

          {next_parent, acc ++ entries}
        end
      )

    [header | entries ++ event_entries]
    |> Enum.map_join("\n", &JSON.encode!/1)
    |> Kernel.<>("\n")
```

suggestion (performance): Event entry accumulation uses acc ++ entries, which is quadratic in the number of events.

The Enum.reduce/3 in pi_session_jsonl/5 appends with acc ++ entries for each event. Because ++ copies its left operand, the growing accumulator is copied on every iteration, giving O(n²) behavior on large traces. Prefer prepending and fixing the order once at the end: accumulate chunks as [entries | acc] and finish with Enum.reverse/1 followed by List.flatten/1, or prepend individual entries (e.g. [result | [assistant | acc]] in pi_event_entries/…) and finish with Enum.reverse/1 alone.

Suggested change:

```elixir
    {_parent_id, event_entry_chunks} =
      Enum.reduce(
        Enum.with_index(events, 1),
        {pi_entry_id(session_id, "trace-summary"), []},
        fn {event, index}, {parent_id, acc} ->
          {next_parent, entries} =
            pi_event_entries(session_id, event, index, parent_id, now, now_ms)

          {next_parent, [entries | acc]}
        end
      )

    event_entries =
      event_entry_chunks
      |> Enum.reverse()
      |> List.flatten()

    [header | entries ++ event_entries]
    |> Enum.map_join("\n", &JSON.encode!/1)
    |> Kernel.<>("\n")
```
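For comparison outside Elixir: with mutable arrays the quadratic pattern is acc = acc.concat(entries) (a full copy per event, like acc ++ entries), and the linear fix is simply push. A TypeScript sketch of the same parent-threading loop; the entry fields (type, id, parentId, tool) and the ID format are hypothetical stand-ins, not Pi's session schema.

```typescript
// Hypothetical entry shape; Pi's real session format has more fields.
interface Entry {
  type: string;
  id: string;
  parentId: string | null;
  tool?: string;
}

interface TraceEvent {
  tool: string;
}

// Each event becomes a synthetic assistant tool call plus its tool result,
// chained through parentId. Returns the new chain tail and the entries.
function eventEntries(
  sessionId: string,
  event: TraceEvent,
  index: number,
  parentId: string,
): [string, Entry[]] {
  const callId = `${sessionId}-call-${index}`;
  const resultId = `${sessionId}-result-${index}`;
  return [
    resultId,
    [
      { type: "assistant_tool_call", id: callId, parentId, tool: event.tool },
      { type: "tool_result", id: resultId, parentId: callId },
    ],
  ];
}

function sessionJsonl(sessionId: string, events: readonly TraceEvent[]): string {
  const summaryId = `${sessionId}-trace-summary`;
  const out: Entry[] = [
    { type: "session", id: sessionId, parentId: null },
    { type: "trace_summary", id: summaryId, parentId: null },
  ];
  let parentId = summaryId;
  events.forEach((event, i) => {
    const [next, entries] = eventEntries(sessionId, event, i + 1, parentId);
    out.push(...entries); // amortized O(1) per entry; `out` is never copied
    parentId = next;
  });
  // One JSON object per line, with a trailing newline like the Elixir code.
  return out.map((entry) => JSON.stringify(entry)).join("\n") + "\n";
}

const jsonl = sessionJsonl("s1", [{ tool: "read" }, { tool: "edit" }]);
```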

Copilot AI left a comment

Pull request overview

This PR expands Wardwright’s agent-harness export surface to support separate Pi and oh-my-pi/omp replay artifacts (including an exported TTSR rule + extension scaffold), and adds an OpenCode plugin-bundle spike—while documenting the resulting policy/fidelity questions and updating tests/contracts to match.

Changes:

  • Split prior Pi/omp handling into distinct adapters: native Pi JSONL session export, and an omp replay bundle that includes a TTSR rule + extension scaffold.
  • Add an OpenCode “plugin bundle” export path (session JSON + plugin scaffold) while keeping fidelity claims explicitly best-effort.
  • Update the adapter contract docs and expand test coverage for the new adapter IDs, artifact formats, saved files, and commands.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/ralph-runs/pi-oh-my-pi-replay-spike.md | New spike doc describing Pi/omp/OpenCode adapter results and adapter policy questions. |
| contracts/agent-harness-adapter-contract.md | Updates the adapter table and open questions to reflect the new Pi, omp, and OpenCode plugin spike adapters. |
| app/test/agent_harness_adapters_test.exs | Adds/updates tests for new adapter IDs, export artifacts, saved files, and commands. |
| app/lib/wardwright_web/agent_harness_adapters.ex | Implements new adapter entries and export/write paths for Pi JSONL, omp replay bundle, and OpenCode plugin bundle. |

Comment on lines +753 to +762:

```elixir
    {_parent_id, event_entries} =
      Enum.reduce(
        Enum.with_index(events, 1),
        {pi_entry_id(session_id, "trace-summary"), []},
        fn {event, index}, {parent_id, acc} ->
          {next_parent, entries} =
            pi_event_entries(session_id, event, index, parent_id, now, now_ms)

          {next_parent, acc ++ entries}
        end
```

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces new agent harness adapters for Pi, oh-my-pi, and an OpenCode plugin spike, enabling native session exports and support for oh-my-pi TTSR rules. Key additions include Pi-compatible JSONL session generation, automated rule and extension scaffolding for oh-my-pi, and updated documentation on adapter fidelity. The review feedback highlights three items: optimizing list construction in the Pi export logic to avoid O(N²) complexity, dynamically selecting the correct executable name for oh-my-pi commands based on what is installed, and adding error handling for file operations in the state-fidelity TypeScript extension.

Comment on lines +753 to +765:

```elixir
    {_parent_id, event_entries} =
      Enum.reduce(
        Enum.with_index(events, 1),
        {pi_entry_id(session_id, "trace-summary"), []},
        fn {event, index}, {parent_id, acc} ->
          {next_parent, entries} =
            pi_event_entries(session_id, event, index, parent_id, now, now_ms)

          {next_parent, acc ++ entries}
        end
      )

    [header | entries ++ event_entries]
```

Priority: medium

The current implementation uses acc ++ entries inside Enum.reduce, which results in O(N²) time complexity because ++ copies the entire accumulated list on every iteration. Prepending to the list and reversing it at the end (then flattening the per-event chunks) is the idiomatic Elixir way to build lists efficiently.

```elixir
    {_parent_id, event_entries} =
      Enum.reduce(
        Enum.with_index(events, 1),
        {pi_entry_id(session_id, "trace-summary"), []},
        fn {event, index}, {parent_id, acc} ->
          {next_parent, entries} =
            pi_event_entries(session_id, event, index, parent_id, now, now_ms)

          {next_parent, [entries | acc]}
        end
      )

    [header | entries ++ (event_entries |> Enum.reverse() |> List.flatten())]
```

Comment on lines +656 to +657:

```elixir
      "omp --session #{shell_quote(session_path)}",
      "omp --fork #{shell_quote(session_path)} \"Continue from the Wardwright trace cursor you want to investigate.\""
```

Priority: medium

The commands are hardcoded to use omp, but the adapter detection logic (line 61) allows for either omp or oh-my-pi. If only oh-my-pi is available on the system, these commands will fail. Consider using the detected executable name.

```elixir
      "#{if installed?("omp"), do: "omp", else: "oh-my-pi"} --session #{shell_quote(session_path)}",
      "#{if installed?("omp"), do: "omp", else: "oh-my-pi"} --fork #{shell_quote(session_path)} \"Continue from the Wardwright trace cursor you want to investigate.\""
```

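One way to apply this suggestion is to resolve the executable once and pass it to every command template, instead of re-checking installation per command. A TypeScript sketch, assuming an injected isInstalled predicate standing in for the Elixir installed?/1 helper and a POSIX single-quote shellQuote; both are hypothetical here, not the module's actual helpers.

```typescript
// Hypothetical helper: pick the first installed candidate, falling back to
// the first name so generated commands still show the preferred executable.
function resolveExecutable(
  candidates: readonly string[],
  isInstalled: (name: string) => boolean,
): string {
  return candidates.find(isInstalled) ?? candidates[0];
}

// Hypothetical POSIX quoting: wrap in single quotes, escape ' as '\''.
function shellQuote(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

// Both commands share one resolved executable and one quoted path.
function ompSessionCommands(exe: string, sessionPath: string): string[] {
  const quoted = shellQuote(sessionPath);
  return [
    `${exe} --session ${quoted}`,
    `${exe} --fork ${quoted} "Continue from the Wardwright trace cursor you want to investigate."`,
  ];
}

const exe = resolveExecutable(["omp", "oh-my-pi"], (name) => name === "oh-my-pi");
console.log(ompSessionCommands(exe, "/tmp/it's here.jsonl")[0]);
// oh-my-pi --session '/tmp/it'\''s here.jsonl'
```

Unlike the inline suggestion, which falls back to oh-my-pi whenever omp is missing, this sketch falls back to the first candidate (omp) when neither name is installed, so generated commands keep the name the existing commands already use.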
Comment on lines +898 to +911:

```typescript
        async execute(_toolCallId, params, _signal, _onUpdate, _ctx) {
          const probe = JSON.parse(readFileSync(params.probePath, "utf8"));
          const observed = params.observed ?? {};
          return {
            content: [{ type: "text", text: JSON.stringify({
              schema: "wardwright.pi_state_fidelity_verification_spike.v0",
              adapter_id: probe.adapter_id,
              trace_fingerprint_matches: probe.trace_fingerprint === observed.trace_fingerprint,
              observed_digest: digest(observed),
              equivalent_agent_resume_claim_allowed: false,
            }) }],
            details: {},
          };
        },
```

Priority: medium

The execute function in the extension lacks error handling for file system and JSON parsing operations. If the file is missing or malformed, it will throw an unhandled exception. Adding a try-catch block would make the tool more resilient.

```typescript
        async execute(_toolCallId, params, _signal, _onUpdate, _ctx) {
          try {
            const probe = JSON.parse(readFileSync(params.probePath, "utf8"));
            const observed = params.observed ?? {};
            return {
              content: [{ type: "text", text: JSON.stringify({
                schema: "wardwright.pi_state_fidelity_verification_spike.v0",
                adapter_id: probe.adapter_id,
                trace_fingerprint_matches: probe.trace_fingerprint === observed.trace_fingerprint,
                observed_digest: digest(observed),
                equivalent_agent_resume_claim_allowed: false,
              }) }],
              details: {},
            };
          } catch (error) {
            const message = error instanceof Error ? error.message : String(error);
            return {
              content: [{ type: "text", text: `Error verifying fidelity: ${message}` }],
              details: { error },
            };
          }
        },
```
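A standalone variant of the same idea that also keeps the error payload serializable: an Error stored directly in details would encode as {} if details is ever passed through JSON.stringify, because its message and stack are non-enumerable. This TypeScript sketch assumes verifyStateFidelity, the injected readText parameter (used so the logic can be exercised without a real file), and a SHA-256 digest; none of these are the extension's actual helpers.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

interface VerificationResult {
  content: { type: "text"; text: string }[];
  details: Record<string, unknown>;
}

// Assumed digest: SHA-256 over the JSON encoding (key order not canonicalized).
function digest(value: unknown): string {
  return createHash("sha256").update(JSON.stringify(value)).digest("hex");
}

// Hypothetical standalone version of the tool's execute body.
function verifyStateFidelity(
  probePath: string,
  observed: { trace_fingerprint?: string } = {},
  readText: (path: string) => string = (p) => readFileSync(p, "utf8"),
): VerificationResult {
  try {
    const probe = JSON.parse(readText(probePath));
    const payload = {
      schema: "wardwright.pi_state_fidelity_verification_spike.v0",
      adapter_id: probe.adapter_id,
      trace_fingerprint_matches: probe.trace_fingerprint === observed.trace_fingerprint,
      observed_digest: digest(observed),
      equivalent_agent_resume_claim_allowed: false,
    };
    return { content: [{ type: "text", text: JSON.stringify(payload) }], details: {} };
  } catch (error) {
    // Keep only serializable fields; a raw Error would stringify as "{}".
    const message = error instanceof Error ? error.message : String(error);
    return {
      content: [{ type: "text", text: `Error verifying fidelity: ${message}` }],
      details: { error: message, probePath },
    };
  }
}

const missing = verifyStateFidelity("missing-probe.json", {}, () => {
  throw new Error("ENOENT: no such file");
});
console.log(missing.content[0].text); // Error verifying fidelity: ENOENT: no such file
```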

@bglusman merged commit 4f20fc6 into codex/ralph-ui-loop-pilot-1 on May 24, 2026
1 check passed