fix(tools): append_messages() inside the tool runner loop no longer infinite-loops or drops the assistant turn#1538
Open
Zelys-DFKH wants to merge 1 commit into
Conversation
…ite-loops or drops the assistant turn Fixes anthropics#1536. The runner used a _messages_modified boolean that could not distinguish between two distinct append behaviors: a user supplying their own tool_result (skip auto-append) vs. a user appending extra context (auto-append still needed). Both set the flag True, so calling append_messages() inside the loop caused either an infinite loop or a 400 API error depending on what was appended. Replace the flag with a message-count snapshot taken before each yield. After yield returns, slice the list to get what the user appended and branch on whether those messages contain a tool_result block. If yes, insert the assistant turn before the user's result only when they did not include it themselves. If no, generate tool responses as normal and splice the (assistant, tool_result) pair in before any extra messages. Add _set_messages_list helper to BaseToolRunner so the four call sites that replace the full message list can capture the list as a method parameter rather than a loop variable, satisfying both ruff B023 and mypy's callable-type inference. Remove the xfail from test_custom_message_handling and update its HTTP fixture to return a valid 200 response. Add three regression tests covering: plain user message append (no infinite loop), manual tool_result (assistant turn auto-inserted), and user-supplied assistant + tool_result (no duplicate assistant message).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1536.
What was broken
Calling
append_messages()inside a tool runner loop had two failure modes, depending on what you appended.If you added a plain context message like
{"role": "user", "content": "Remember: be concise."}, the loop ran forever. The runner used a_messages_modifiedflag to decide whether to skip the auto-generated(assistant, tool_result)pair. The problem was that both user appends and the auto-append itself set the flag toTrue, so once you calledappend_messages(), the flag never came back down and the loop never stopped.If you manually provided a
tool_resultblock (for cases where you want full control over what gets returned), the runner sent it to the API without the preceding assistant turn. That produced a 400unexpected tool_use_iderror every time.How the fix works
Before yielding in the loop, the runner now snapshots the current message count. After yield returns, anything between that index and the end of the list is what the user appended. The runner checks whether any of those messages contain a
tool_resultblock.If yes: the user is handling tool results themselves. The runner inserts the assistant turn before their result only if they didn't include it already.
If no: the runner generates tool responses as usual, then splices the auto-generated
(assistant, tool_result)pair in before any extra messages the user added.The logic is the same in both the sync (
BaseSyncToolRunner) and async (BaseAsyncToolRunner) paths.BetaStreamingToolRunnerandBetaAsyncStreamingToolRunnerboth inherit__run__from the base classes, so they get the fix for free.The four call sites that set the full message list now use a private
_set_messages_listhelper instead of inline lambdas. Capturing a loop variable in a lambda triggersruff B023, and the default-arg workaround (msgs=new_messages) confuses mypy's callable-type inference. A method parameter solves both at once.Tests
test_custom_message_handlingwas previously markedxfailwith a# TODO: fix the append_messages methodcomment. The xfail is removed and the HTTP fixture updated to replay a proper 200 response instead of the 400 the bug used to produce.Three new regression tests cover the scenarios from the issue: appending a plain user message (must not loop, must place the tool result correctly), providing a manual tool result (runner must prepend the assistant turn), and providing both the assistant turn and the tool result manually (runner must not insert a second assistant message).