Spike Pi and OMP replay exports#71
Conversation
Reviewer's GuideRefactors the Wardwright agent harness to treat Pi and oh-my-pi/omp as first-class native replay adapters (JSONL sessions and OMP replay bundles) alongside a new OpenCode plugin-bundle spike, adds corresponding export/write logic and saved-command handling, and documents the adapter policy and Pi/omp replay spike behavior, with tests asserting artifact shapes and state-fidelity behavior. Flow diagram for new Pi/OMP/OpenCode plugin export pathsflowchart TD
A[export session_id, adapter_id] --> B{adapter_id}
B -->|opencode| C[opencode_export -> artifact_format opencode_session_json]
B -->|opencode-plugin| D[opencode_plugin_export -> artifact_format opencode_plugin_bundle]
B -->|pi| E[pi_session_export -> artifact_format pi_session_jsonl]
B -->|oh-my-pi| F[oh_my_pi_export -> artifact_format oh_my_pi_replay_bundle]
B -->|claude/codex| G[handoff_export -> artifact_format prompt_handoff]
C --> H[write_export]
D --> H
E --> H
F --> H
G --> H
H --> I{artifact_format}
I -->|opencode_session_json| J[write_json_artifact + write_probe_artifact]
I -->|pi_session_jsonl| K[write_text_artifact + write_probe_artifact]
I -->|oh_my_pi_replay_bundle| L[write_prompt_handoff_files + write_probe_artifact]
I -->|opencode_plugin_bundle| M[write_prompt_handoff_files + write_probe_artifact]
I -->|prompt_handoff| L
J --> N[saved_export_commands -> opencode import/fork]
K --> O[saved_export_commands -> npx pi_coding_agent --session/--fork]
L --> P[saved_export_commands -> omp rules/extensions + omp --session/--fork]
M --> Q[saved_export_commands -> opencode plugins setup + import]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The Pi/OMP export paths duplicate several constants and command fragments (e.g.,
@pi_package,wardwright-read-before-edit.md,wardwright-state-fidelity.ts) in both export construction andsaved_export_commands; consider centralizing these names/command templates so future changes don’t drift between the initial export and the saved-commands reconstruction. - The new
pi_session_jsonland associatedpi_event_entrieslogic encode a specific Pi session schema (e.g., version, message shapes, usage fields) inline; it may be worth extracting a small helper/module that encapsulates the Pi session format so future Pi version changes or schema tweaks are localized rather than spread across this module.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Pi/OMP export paths duplicate several constants and command fragments (e.g., `@pi_package`, `wardwright-read-before-edit.md`, `wardwright-state-fidelity.ts`) in both export construction and `saved_export_commands`; consider centralizing these names/command templates so future changes don’t drift between the initial export and the saved-commands reconstruction.
- The new `pi_session_jsonl` and associated `pi_event_entries` logic encode a specific Pi session schema (e.g., version, message shapes, usage fields) inline; it may be worth extracting a small helper/module that encapsulates the Pi session format so future Pi version changes or schema tweaks are localized rather than spread across this module.
## Individual Comments
### Comment 1
<location path="app/lib/wardwright_web/agent_harness_adapters.ex" line_range="753-767" />
<code_context>
+ }
+ ]
+
+ {_parent_id, event_entries} =
+ Enum.reduce(
+ Enum.with_index(events, 1),
+ {pi_entry_id(session_id, "trace-summary"), []},
+ fn {event, index}, {parent_id, acc} ->
+ {next_parent, entries} =
+ pi_event_entries(session_id, event, index, parent_id, now, now_ms)
</code_context>
<issue_to_address>
**suggestion (performance):** Event entry accumulation uses `acc ++ entries`, which is quadratic in the number of events.
The `Enum.reduce/3` in `pi_session_jsonl/5` appends with `acc ++ entries` for each event, giving O(n²) behavior on large traces. Prefer pre-pending and fixing order once at the end, e.g. accumulate as `[entries | acc]` (or `[result | [assistant | acc]]` in `pi_event_entries/…`) and then call `List.flatten/1` or `Enum.reverse/1` at the end for better scalability.
```suggestion
{_parent_id, event_entry_chunks} =
Enum.reduce(
Enum.with_index(events, 1),
{pi_entry_id(session_id, "trace-summary"), []},
fn {event, index}, {parent_id, acc} ->
{next_parent, entries} =
pi_event_entries(session_id, event, index, parent_id, now, now_ms)
{next_parent, [entries | acc]}
end
)
event_entries =
event_entry_chunks
|> Enum.reverse()
|> List.flatten()
[header | entries ++ event_entries]
|> Enum.map_join("\n", &JSON.encode!/1)
|> Kernel.<>("\n")
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| {_parent_id, event_entries} = | ||
| Enum.reduce( | ||
| Enum.with_index(events, 1), | ||
| {pi_entry_id(session_id, "trace-summary"), []}, | ||
| fn {event, index}, {parent_id, acc} -> | ||
| {next_parent, entries} = | ||
| pi_event_entries(session_id, event, index, parent_id, now, now_ms) | ||
|
|
||
| {next_parent, acc ++ entries} | ||
| end | ||
| ) | ||
|
|
||
| [header | entries ++ event_entries] | ||
| |> Enum.map_join("\n", &JSON.encode!/1) | ||
| |> Kernel.<>("\n") |
There was a problem hiding this comment.
suggestion (performance): Event entry accumulation uses acc ++ entries, which is quadratic in the number of events.
The Enum.reduce/3 in pi_session_jsonl/5 appends with acc ++ entries for each event, giving O(n²) behavior on large traces. Prefer pre-pending and fixing order once at the end, e.g. accumulate as [entries | acc] (or [result | [assistant | acc]] in pi_event_entries/…) and then call List.flatten/1 or Enum.reverse/1 at the end for better scalability.
| {_parent_id, event_entries} = | |
| Enum.reduce( | |
| Enum.with_index(events, 1), | |
| {pi_entry_id(session_id, "trace-summary"), []}, | |
| fn {event, index}, {parent_id, acc} -> | |
| {next_parent, entries} = | |
| pi_event_entries(session_id, event, index, parent_id, now, now_ms) | |
| {next_parent, acc ++ entries} | |
| end | |
| ) | |
| [header | entries ++ event_entries] | |
| |> Enum.map_join("\n", &JSON.encode!/1) | |
| |> Kernel.<>("\n") | |
| {_parent_id, event_entry_chunks} = | |
| Enum.reduce( | |
| Enum.with_index(events, 1), | |
| {pi_entry_id(session_id, "trace-summary"), []}, | |
| fn {event, index}, {parent_id, acc} -> | |
| {next_parent, entries} = | |
| pi_event_entries(session_id, event, index, parent_id, now, now_ms) | |
| {next_parent, [entries | acc]} | |
| end | |
| ) | |
| event_entries = | |
| event_entry_chunks | |
| |> Enum.reverse() | |
| |> List.flatten() | |
| [header | entries ++ event_entries] | |
| |> Enum.map_join("\n", &JSON.encode!/1) | |
| |> Kernel.<>("\n") |
There was a problem hiding this comment.
Pull request overview
This PR expands Wardwright’s agent-harness export surface to support separate Pi and oh-my-pi/omp replay artifacts (including an exported TTSR rule + extension scaffold), and adds an OpenCode plugin-bundle spike—while documenting the resulting policy/fidelity questions and updating tests/contracts to match.
Changes:
- Split prior Pi/omp handling into distinct adapters: native Pi JSONL session export, and an omp replay bundle that includes a TTSR rule + extension scaffold.
- Add an OpenCode “plugin bundle” export path (session JSON + plugin scaffold) while keeping fidelity claims explicitly best-effort.
- Update the adapter contract docs and expand test coverage for the new adapter IDs, artifact formats, saved files, and commands.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/ralph-runs/pi-oh-my-pi-replay-spike.md | New spike doc describing Pi/omp/OpenCode adapter results and adapter policy questions. |
| contracts/agent-harness-adapter-contract.md | Updates adapter table + open questions to reflect the new Pi, omp, and OpenCode plugin spike adapters. |
| app/test/agent_harness_adapters_test.exs | Adds/updates tests for new adapter IDs, export artifacts, saved files, and commands. |
| app/lib/wardwright_web/agent_harness_adapters.ex | Implements new adapter entries and export/write paths for Pi JSONL, omp replay bundle, and OpenCode plugin bundle. |
| {_parent_id, event_entries} = | ||
| Enum.reduce( | ||
| Enum.with_index(events, 1), | ||
| {pi_entry_id(session_id, "trace-summary"), []}, | ||
| fn {event, index}, {parent_id, acc} -> | ||
| {next_parent, entries} = | ||
| pi_event_entries(session_id, event, index, parent_id, now, now_ms) | ||
|
|
||
| {next_parent, acc ++ entries} | ||
| end |
There was a problem hiding this comment.
Code Review
This pull request introduces new agent harness adapters for Pi, oh-my-pi, and an OpenCode plugin spike, enabling native session exports and support for Trace-Triggered Step Rules (TTSR). Key additions include the implementation of Pi-compatible JSONL session generation, automated rule and extension scaffolding for oh-my-pi, and updated documentation regarding adapter fidelity. Feedback from the review highlights an opportunity to optimize list construction in the Pi export logic to avoid O(N^2) complexity, the need to dynamically select the correct executable name for oh-my-pi commands based on system availability, and a recommendation to add error handling for file operations within the state fidelity TypeScript extension.
| {_parent_id, event_entries} = | ||
| Enum.reduce( | ||
| Enum.with_index(events, 1), | ||
| {pi_entry_id(session_id, "trace-summary"), []}, | ||
| fn {event, index}, {parent_id, acc} -> | ||
| {next_parent, entries} = | ||
| pi_event_entries(session_id, event, index, parent_id, now, now_ms) | ||
|
|
||
| {next_parent, acc ++ entries} | ||
| end | ||
| ) | ||
|
|
||
| [header | entries ++ event_entries] |
There was a problem hiding this comment.
The current implementation uses acc ++ entries inside Enum.reduce, which results in
{_parent_id, event_entries} =
Enum.reduce(
Enum.with_index(events, 1),
{pi_entry_id(session_id, "trace-summary"), []},
fn {event, index}, {parent_id, acc} ->
{next_parent, entries} =
pi_event_entries(session_id, event, index, parent_id, now, now_ms)
{next_parent, [entries | acc]}
end
)
[header | entries ++ (event_entries |> Enum.reverse() |> List.flatten())]
| "omp --session #{shell_quote(session_path)}", | ||
| "omp --fork #{shell_quote(session_path)} \"Continue from the Wardwright trace cursor you want to investigate.\"" |
There was a problem hiding this comment.
The commands are hardcoded to use omp, but the adapter detection logic (line 61) allows for either omp or oh-my-pi. If only oh-my-pi is available on the system, these commands will fail. Consider using the detected executable name.
"#{if installed?("omp"), do: "omp", else: "oh-my-pi"} --session #{shell_quote(session_path)}",
"#{if installed?("omp"), do: "omp", else: "oh-my-pi"} --fork #{shell_quote(session_path)} \"Continue from the Wardwright trace cursor you want to investigate.\""
| async execute(_toolCallId, params, _signal, _onUpdate, _ctx) { | ||
| const probe = JSON.parse(readFileSync(params.probePath, "utf8")); | ||
| const observed = params.observed ?? {}; | ||
| return { | ||
| content: [{ type: "text", text: JSON.stringify({ | ||
| schema: "wardwright.pi_state_fidelity_verification_spike.v0", | ||
| adapter_id: probe.adapter_id, | ||
| trace_fingerprint_matches: probe.trace_fingerprint === observed.trace_fingerprint, | ||
| observed_digest: digest(observed), | ||
| equivalent_agent_resume_claim_allowed: false, | ||
| }) }], | ||
| details: {}, | ||
| }; | ||
| }, |
There was a problem hiding this comment.
The execute function in the extension lacks error handling for file system and JSON parsing operations. If the file is missing or malformed, it will throw an unhandled exception. Adding a try-catch block would make the tool more resilient.
async execute(_toolCallId, params, _signal, _onUpdate, _ctx) {
try {
const probe = JSON.parse(readFileSync(params.probePath, "utf8"));
const observed = params.observed ?? {};
return {
content: [{ type: "text", text: JSON.stringify({
schema: "wardwright.pi_state_fidelity_verification_spike.v0",
adapter_id: probe.adapter_id,
trace_fingerprint_matches: probe.trace_fingerprint === observed.trace_fingerprint,
observed_digest: digest(observed),
equivalent_agent_resume_claim_allowed: false,
}) }],
details: {},
};
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return {
content: [{ type: "text", text: `Error verifying fidelity: ${message}` }],
details: { error },
};
}
},
Summary
Validation
MIX_ENV=test mise exec -- mix testfromapp/: 360 passed, 6 excludedmise run check:docsmise check: app, docs, maps, style, types, browser smoke@earendil-works/pi-coding-agent@0.75.5 --exportomp/15.2.4 --exportReview notes
Post-commit review found and fixed one OMP TTSR correctness issue: OMP matches
conditionagainst streamed content/tool arguments and usesscopefor tool names, so the exported rule now uses a broad condition and tool-scoped edit/write/patch targets.Remaining risk: tests assert exported artifact shape and full Wardwright behavior, but the next useful slice is a true OMP runtime equivalence test proving the exported TTSR rule fires on edit/write tool streams with the same failure classification as Wardwright replay.
Summary by Sourcery
Add native Pi session and oh-my-pi/omp replay-bundle adapters alongside an OpenCode plugin-bundle spike, and extend the harness export pipeline to generate corresponding artifacts, commands, and state-fidelity scaffolding.
New Features:
Enhancements:
Tests: