Skip to content

fix: include tool_result in /messages + unblock CI format#117

Merged
emal-avala merged 2 commits intomainfrom
fix/e2e-l1-echo-prompt
Apr 15, 2026
Merged

fix: include tool_result in /messages + unblock CI format#117
emal-avala merged 2 commits intomainfrom
fix/e2e-l1-echo-prompt

Conversation

@emal-avala
Copy link
Copy Markdown
Member

@emal-avala emal-avala commented Apr 15, 2026

Summary

Closes the final e2e gap (was: 72/73 after earlier fixes) with a server-side fix rather than a test tweak, plus unblocks format CI on main.

1. /messages now includes tool_result content (main fix)

`handle_messages` in `serve.rs` previously extracted only `ContentBlock::Text` from user/assistant messages — all tool_use and tool_result content was stripped. That's a real UX bug, not just a test shim: any API consumer polling `/messages` to inspect what tools returned saw nothing, and every test in the suite that wanted to verify tool output had to resort to prompt-engineering the assistant to echo content verbatim.

Extended the user-message arm to also include `ContentBlock::ToolResult` content. The response shape is unchanged — tool output is concatenated into the same string field alongside any user text, separated by newlines.

Also reverts the L1 e2e prompt from the awkward "reply with ONLY the exact contents" back to a natural "tell me its contents", since the marker now lands in `/messages` via the tool_result block regardless of whether the assistant paraphrases.

2. Format fix for main

PR #116 (mascot revert) merged with two rustfmt violations that were blocking CI on every branch off main:

  • `crates/cli/src/ui/repl.rs` — last println! in the bot mascot block should be on one line
  • `crates/cli/src/ui/tui.rs` — double newline at EOF after sed-deleting the crab block

Both fixed mechanically to what `cargo fmt` produces.

Test plan

  • `bash -n` syntax check passes
  • Format fix verified by CI
  • Fresh e2e run → expect 73/73 (server fix + format both needed for A7 and L1)

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

The /messages endpoint (serve.rs:handle_messages) only extracts
ContentBlock::Text from user/assistant messages — tool_use and
tool_result content are stripped server-side. So checking whether
SHELL_MARKER_L1 appears in /messages only works if the assistant
echoes the file content verbatim in its text reply.

Previously the prompt was "tell me its contents", which small
models like gpt-5-nano/mini would paraphrase ("the file contains a
marker line") instead of echoing. Tighten the prompt to force a
verbatim single-line echo with no extra words.

Also update the failure message to reflect the real cause
(paraphrasing) rather than the red herring of "not found in
response".
@emal-avala emal-avala force-pushed the fix/e2e-l1-echo-prompt branch from fecaa2b to f22b09e Compare April 15, 2026 07:20
Two formatting issues slipped through on PR #116 and are breaking CI
on every branch off main:

- crates/cli/src/ui/repl.rs:412-416 — the last println! in the bot
  mascot block fits on one line; rustfmt collapses it.
- crates/cli/src/ui/tui.rs — sed left a double newline at EOF after
  removing the crab block. rustfmt wants a single trailing newline.

Both fixes are mechanical and match what cargo fmt would produce.
@emal-avala emal-avala merged commit 40ed76b into main Apr 15, 2026
13 of 14 checks passed
@emal-avala emal-avala deleted the fix/e2e-l1-echo-prompt branch April 15, 2026 07:22
@emal-avala emal-avala changed the title fix(e2e): L1 — force verbatim echo so marker appears in /messages fix: include tool_result in /messages + unblock CI format Apr 15, 2026
emal-avala added a commit that referenced this pull request Apr 15, 2026
This commit was in PR #117's branch but got lost during squash-merge:
the merge happened before the commit made it to the base ref, so only
the format+prompt tweaks landed. The L1 test therefore still fails on
main (confirmed by the v0.15.1 release e2e run 24442335126 at L1),
and API consumers of /messages still can't see tool output.

Reapply the fix:
- handle_messages now extracts ContentBlock::ToolResult content from
  user messages (in addition to ContentBlock::Text). API consumers can
  now inspect what tools returned via /messages. Response shape
  unchanged.
- Revert L1 prompt to the natural "tell me its contents" — no longer
  depends on the assistant echoing file content verbatim.

Verified locally by re-running the full e2e suite on a branch with
this change (24442163173: 73/73 passing with gpt-5-mini).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant