Serialize stored shell tool calls correctly on subsequent requests #41

Merged: ScriptSmith merged 2 commits into main from shell-persistence on May 31, 2026

@ScriptSmith
Owner

No description provided.

@greptile-apps
Contributor

greptile-apps Bot commented May 31, 2026

Greptile Summary

This PR fixes a serialization bug where shell_call / shell_call_output items reconstructed from previous_response_id history were forwarded verbatim to function-mode providers, causing OpenAI-compatible upstreams to reject the request (array output field where a string is expected) and Anthropic/Bedrock/Vertex to silently drop the tool results.

  • Adds rewrite_shell_history_to_function_calls, which rewrites stored hosted-shell items to the function_call / function_call_output pair the model originally exchanged. It is called unconditionally at the top of preprocess_shell_tools, before the early return for the no-tools case, so continuations that no longer re-declare the shell tool still get their history normalized.
  • Adds render_shell_output_text to flatten the array output chunks back into the plain-text exit_code/stdout/stderr blob the live tool loop sends, and updates provider documentation to describe this rewrite contract.
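The rewrite described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from src/services/shell_tool.rs: the `InputItem` and `Chunk` types, the `rewrite_history` and `render_output_text` names, and the exact layout of the text blob are all assumptions made for the example. Only the tool name `shell` and the overall shape (swap two variants, flatten the array output to a string) come from the summary.

```rust
// Hypothetical, simplified stand-ins for the real item types. The actual
// definitions in src/services/shell_tool.rs are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    stdout: String,
    stderr: String,
    exit_code: i32,
}

#[derive(Debug, Clone, PartialEq)]
enum InputItem {
    ShellCall { call_id: String, command: String },
    ShellCallOutput { call_id: String, output: Vec<Chunk> },
    FunctionCall { call_id: String, name: String, arguments: String },
    FunctionCallOutput { call_id: String, output: String },
    Message(String),
}

// Flatten the array of output chunks into one plain-text blob.
// The "exit_code / stdout / stderr" layout here is an assumed format.
fn render_output_text(chunks: &[Chunk]) -> String {
    chunks
        .iter()
        .map(|c| {
            format!(
                "exit_code: {}\nstdout:\n{}\nstderr:\n{}",
                c.exit_code, c.stdout, c.stderr
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}

// Rewrite stored shell items in place; every other item is left untouched.
// Runs whether or not the current request declares the shell tool.
fn rewrite_history(items: &mut [InputItem]) {
    for item in items.iter_mut() {
        let rewritten = match &*item {
            InputItem::ShellCall { call_id, command } => Some(InputItem::FunctionCall {
                call_id: call_id.clone(),
                name: "shell".to_string(),
                // A real implementation would serialize JSON arguments;
                // the command is passed through as an opaque string here.
                arguments: command.clone(),
            }),
            InputItem::ShellCallOutput { call_id, output } => {
                Some(InputItem::FunctionCallOutput {
                    call_id: call_id.clone(),
                    output: render_output_text(output),
                })
            }
            _ => None,
        };
        if let Some(new_item) = rewritten {
            *item = new_item;
        }
    }
}
```

Keeping the rewrite independent of the tool declarations is what lets a continuation that omits the shell tool still send string-valued outputs upstream.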

Confidence Score: 5/5

Safe to merge — the rewrite is well-scoped, only touches function-mode provider paths, and is guarded by an existing openai_keep_native_shell gate for native passthrough; a focused integration test covers the round-trip.

The core logic is straightforward: iterate input items, swap two variant types, flatten array output to a string. The output text format matches the live executor for the common single-chunk case. The only discrepancy found is cosmetic — timed-out calls reconstruct exit_code 124 instead of -1 — which is unlikely to affect downstream model behavior.

The render_shell_output_text timeout branch in src/services/shell_tool.rs is worth a second look given the exit-code mismatch with the live executor path.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/services/shell_tool.rs | Adds rewrite_shell_history_to_function_calls and render_shell_output_text to convert stored shell_call/shell_call_output history items into function_call/function_call_output for function-mode providers; minor fidelity discrepancy for timed-out calls (exit code 124 vs -1 in reconstructed text) |
| src/services/responses_chain.rs | Doc-comment only: explains that ShellCall/ShellCallOutput items are replayed verbatim here and normalized by preprocess_shell_tools downstream; no logic changes |
| agent_instructions/adding_provider.md | Documentation update explaining the shell history rewrite contract for provider implementors; accurately describes the rewrite behavior and its motivation |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Hadrian
    participant Provider

    Note over Client,Provider: Turn 1 (function mode)
    Client->>Hadrian: POST /responses (shell tool declared)
    Hadrian->>Provider: "function_call { name: shell, arguments: {...} }"
    Provider-->>Hadrian: shell executes
    Hadrian-->>Client: shell_call + shell_call_output (array output, persisted)

    Note over Client,Provider: Turn 2 - continuation (previous_response_id)
    Client->>Hadrian: POST /responses (previous_response_id)
    Hadrian->>Hadrian: output_item_to_input() replays shell_call/shell_call_output verbatim
    Hadrian->>Hadrian: preprocess_shell_tools() calls rewrite_shell_history_to_function_calls()
    Note right of Hadrian: ShellCall to OutputFunctionCall, ShellCallOutput to FunctionCallOutput, array output flattened to string
    Hadrian->>Provider: function_call + function_call_output (string output)
    Provider-->>Client: next model response
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/services/shell_tool.rs:838-843
**Timeout exit-code mismatch in reconstructed history**

The live executor writes `exit_code: {exit_for_report}` into the continuation text blob, where `exit_for_report = final_exit.unwrap_or(-1)`. A killed/timeout call has `final_exit = None`, so the model originally saw `exit_code: -1`. On continuation, `render_shell_output_text` maps `ShellCallOutcome::Timeout` to `124` — a `timeout(1)` sentinel — so the reconstructed history tells the model `exit_code: 124` rather than what it saw. The comment ("matches how the live loop reports a killed call") is therefore inaccurate. Most models tolerate any non-zero exit code for a failed call, so this is unlikely to change behavior, but the infidelity is worth noting for future multi-turn debugging.
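The mismatch described in this issue can be reproduced in isolation. The sketch below is illustrative only: `ShellCallOutcome::Timeout`, the `final_exit.unwrap_or(-1)` expression, and the `124` value are taken from the issue text, while the `Exit` variant, the function names, and the "aligned" variant are assumptions and not necessarily what the follow-up commit did.

```rust
// Hypothetical mirror of the outcome type named in the review; the `Exit`
// variant and its payload are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ShellCallOutcome {
    Exit(i32),
    Timeout,
}

// What the live executor reports, per the review: a killed or timed-out
// call has no exit status, so the model sees -1.
fn live_exit_for_report(final_exit: Option<i32>) -> i32 {
    final_exit.unwrap_or(-1)
}

// The behavior flagged in the review: timeouts are rendered as 124,
// the sentinel used by timeout(1).
fn reconstructed_exit_flagged(outcome: ShellCallOutcome) -> i32 {
    match outcome {
        ShellCallOutcome::Exit(code) => code,
        ShellCallOutcome::Timeout => 124,
    }
}

// One possible alignment (an assumption, not the merged fix): reuse the
// live path's fallback so replayed history matches what the model saw.
fn reconstructed_exit_aligned(outcome: ShellCallOutcome) -> i32 {
    match outcome {
        ShellCallOutcome::Exit(code) => code,
        ShellCallOutcome::Timeout => live_exit_for_report(None),
    }
}
```

Routing the timeout branch through the same fallback as the live path keeps the two values from drifting apart if the sentinel ever changes.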

Reviews (3): Last reviewed commit: "Review fixes"

Comment thread src/services/shell_tool.rs
Comment thread src/services/shell_tool.rs Outdated
@ScriptSmith
Owner Author

@greptile-apps

@ScriptSmith ScriptSmith merged commit 9b9efe1 into main May 31, 2026
20 checks passed
@ScriptSmith ScriptSmith deleted the shell-persistence branch May 31, 2026 10:30