
feat(agent-runner): add tool_choice parameter to fix empty tool calls response in "skills-like" tool call mode#7101

Merged
Soulter merged 1 commit into master from fixs/7049 on Mar 28, 2026

Conversation

Member

@Soulter Soulter commented Mar 28, 2026

fixes: #7049

Modifications

  • This is NOT a breaking change.

Screenshots or Test Results


Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Ensure tools are invoked reliably in skills-like tool call mode by making tool usage configurable and improving fallback behavior when models return no tool calls.

New Features:

  • Introduce a configurable tool_choice parameter across provider text_chat and streaming APIs to control whether models must call tools or can choose automatically.

Bug Fixes:

  • Prevent empty responses in skills-like tool schema mode by re-querying with required tool calls, retrying with stronger instructions when necessary, and gracefully falling back to a plain assistant reply when no tool calls are returned.

Enhancements:

  • Refactor assistant completion handling into a dedicated helper to consistently finalize runs and record assistant messages without tool calls.
  • Propagate tool_choice handling through Gemini, Anthropic, and OpenAI providers so their underlying SDKs receive explicit function-calling configuration.

… response in "skills-like" tool call mode

fixes: #7049
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 28, 2026
@Soulter Soulter merged commit 7db7f4a into master Mar 28, 2026
6 checks passed
@Soulter Soulter deleted the fixs/7049 branch March 28, 2026 15:06
@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Mar 28, 2026
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The _has_meaningful_assistant_reply helper currently only checks completion_text; consider also treating reasoning_content or a non-empty result_chain as meaningful to avoid unnecessary second-stage retries when the model actually returned a useful explanation.
  • In openai_source._query and _query_stream, payloads["tool_choice"] = payloads.get("tool_choice", "auto") is effectively a no-op given how payloads is constructed; you can either drop this assignment or wire the tool_choice parameter through more explicitly for clarity.
## Individual Comments

### Comment 1
<location path="astrbot/core/agent/runners/tool_loop_agent_runner.py" line_range="988-996" />
<code_context>
+                # If the re-query still returns no tool calls, and also does not have a meaningful assistant reply,
+                # we consider it as a failure of the LLM to follow the tool-use instruction,
+                # and we will retry once with a stronger instruction that explicitly requires the LLM to either call the tool or give an explanation.
+                if (
+                    not llm_resp.tools_call_name
+                    and not self._has_meaningful_assistant_reply(llm_resp)
+                ):
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Second-stage repair still enforces `tool_choice="required"`, which conflicts with allowing a no-tool explanation.

In the second-stage repair call you instruct the model that it may either call a tool or return an explanation, but you still pass `tool_choice="required"`:

```python
repair_resp = await self.provider.text_chat(
    contexts=repair_contexts,
    func_tool=param_subset,
    model=self.req.model,
    session_id=self.req.session_id,
    extra_user_content_parts=self.req.extra_user_content_parts,
    tool_choice="required",
    abort_signal=self._abort_signal,
)
```

For Gemini/Anthropic this maps to `ANY`/`any` and for OpenAI to `required`, all of which bias strongly toward always calling a tool. This conflicts with the intent to allow a no-tool explanation and may weaken the fallback behavior. Consider using `tool_choice="auto"` (or omitting it) here so the API constraint matches the instruction that a pure explanation is allowed.

```suggestion
        repair_resp = await self.provider.text_chat(
            contexts=repair_contexts,
            func_tool=param_subset,
            model=self.req.model,
            session_id=self.req.session_id,
            extra_user_content_parts=self.req.extra_user_content_parts,
            tool_choice="auto",
            abort_signal=self._abort_signal,
        )
```
</issue_to_address>

### Comment 2
<location path="astrbot/core/agent/runners/tool_loop_agent_runner.py" line_range="919-922" />
<code_context>
             contexts.insert(0, {"role": "system", "content": instruction})
         return contexts

+    @staticmethod
+    def _has_meaningful_assistant_reply(llm_resp: LLMResponse) -> bool:
+        text = (llm_resp.completion_text or "").strip()
+        return bool(text)
+
     def _build_tool_subset(self, tool_set: ToolSet, tool_names: list[str]) -> ToolSet:
</code_context>
<issue_to_address>
**suggestion:** `_has_meaningful_assistant_reply` only looks at `completion_text`, which can misclassify valid replies.

The current implementation only inspects `completion_text`:

```python
@staticmethod
def _has_meaningful_assistant_reply(llm_resp: LLMResponse) -> bool:
    text = (llm_resp.completion_text or "").strip()
    return bool(text)
```

If a provider returns the assistant reply in another field (e.g. `result_chain` or message parts) and leaves `completion_text` empty, this will incorrectly treat a valid reply as missing and trigger an unnecessary second repair attempt. If `LLMResponse` guarantees that any assistant NL reply is always mirrored to `completion_text`, this is fine; otherwise, this helper should also consider other fields that can contain assistant text before deciding there was “no explanation.”

Suggested implementation:

```python
    @staticmethod
    def _has_meaningful_assistant_reply(llm_resp: LLMResponse) -> bool:
        """
        Return True if the LLMResponse appears to contain a non‑empty natural‑language
        assistant reply in any of the known fields, not just `completion_text`.

        This is intentionally conservative: it only returns False when we are reasonably
        sure there is no assistant text at all, to avoid triggering unnecessary repair
        attempts when providers populate alternative fields.
        """
        # Primary / normalized path: most providers should mirror assistant text here.
        text = (getattr(llm_resp, "completion_text", None) or "").strip()
        if text:
            return True

        # Fallback 1: some providers may populate `result_chain` instead of `completion_text`.
        result_chain = getattr(llm_resp, "result_chain", None)
        if result_chain:
            # If it's already a string, check it directly.
            if isinstance(result_chain, str):
                if result_chain.strip():
                    return True
            # If it's a list/sequence of "messages" or nodes, look for any non‑empty content.
            elif isinstance(result_chain, (list, tuple)):
                for node in result_chain:
                    content = ""

                    # Common patterns: dicts with "content" or "parts".
                    if isinstance(node, dict):
                        if "content" in node:
                            content = node.get("content") or ""
                        elif "parts" in node and isinstance(node["parts"], list):
                            # Join parts that are simple strings.
                            str_parts = [
                                p for p in node["parts"] if isinstance(p, str)
                            ]
                            content = " ".join(str_parts)
                    # Or objects with a `content` attribute.
                    elif hasattr(node, "content"):
                        content = getattr(node, "content") or ""

                    if isinstance(content, str) and content.strip():
                        return True

        # Fallback 2: generic `message` / `messages` fields if present.
        message = getattr(llm_resp, "message", None)
        if isinstance(message, str) and message.strip():
            return True

        messages = getattr(llm_resp, "messages", None)
        if isinstance(messages, (list, tuple)):
            for msg in messages:
                content = ""
                if isinstance(msg, dict):
                    content = msg.get("content") or ""
                elif hasattr(msg, "content"):
                    content = getattr(msg, "content") or ""
                if isinstance(content, str) and content.strip():
                    return True

        # If none of the known fields contain text, treat it as no meaningful reply.
        return False

```

This implementation assumes `LLMResponse` may expose assistant content via `result_chain`, `message`, or `messages` fields, and that these follow typical "message with content" conventions. If your actual `LLMResponse` type uses different field names or shapes (e.g., `output_messages`, `assistant_message`, etc.), you should:
1. Update `_has_meaningful_assistant_reply` to explicitly inspect those real fields.
2. Optionally, centralize the “extract assistant text” logic in a helper on `LLMResponse` (e.g., a `get_assistant_text()` method) and have `_has_meaningful_assistant_reply` call that instead, to keep this runner decoupled from provider‑specific details.
</issue_to_address>



dosubot bot commented Mar 28, 2026

Documentation Updates

1 document was updated by changes in this PR:

pr4697 changes
 
 ---
 
### 15. tool_choice parameter and retry mechanism for the skills_like tool mode (PR #7101)

#### Overview
[PR #7101](https://github.com/AstrBotDevs/AstrBot/pull/7101) adds a `tool_choice` parameter for controlling the LLM's tool-calling strategy and introduces an enhanced retry mechanism in the skills_like tool mode, fixing cases where the LLM returned an empty response during the parameter-completion stage ([Issue #7049](https://github.com/AstrBotDevs/AstrBot/issues/7049)).

#### The tool_choice parameter

##### Parameter description
The new `tool_choice` parameter lets callers specify the tool-calling strategy in the provider API's `text_chat()` and `text_chat_stream()` methods:

- **Type**: `Literal["auto", "required"]`
- **Default**: `"auto"`
- **Semantics**:
  - `"auto"`: the LLM decides on its own whether to call a tool or reply directly
  - `"required"`: the LLM is forced to call one of the available tools

##### Provider adaptation
The `tool_choice` parameter is implemented in all three provider sources, adapted to each provider's API conventions:

- **OpenAI**: the `tool_choice` value is passed through directly (`"auto"` or `"required"`)
- **Anthropic**: converted to a structured object (`{"type": "auto"}` or `{"type": "any"}`)
- **Gemini**: mapped to the `FunctionCallingConfigMode.AUTO` or `FunctionCallingConfigMode.ANY` enum value

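The per-provider adaptation can be sketched as a small mapping helper. The function name and the plain-dict/string shapes below are illustrative; the actual providers build these values inside their own request payloads, and the Gemini SDK uses real enum members rather than strings:

```python
def map_tool_choice(provider: str, tool_choice: str) -> object:
    """Translate the unified tool_choice value into each SDK's native shape.

    Hypothetical helper; shapes follow the per-provider adaptation described
    above (OpenAI string, Anthropic dict, Gemini function-calling config).
    """
    if tool_choice not in ("auto", "required"):
        raise ValueError(f"unsupported tool_choice: {tool_choice}")
    if provider == "openai":
        # OpenAI's Chat Completions API accepts the string directly.
        return tool_choice
    if provider == "anthropic":
        # Anthropic expects a structured object; "required" maps to "any".
        return {"type": "auto" if tool_choice == "auto" else "any"}
    if provider == "gemini":
        # Gemini uses FunctionCallingConfigMode.AUTO / .ANY; the enum values
        # are represented here as plain strings for illustration.
        mode = "AUTO" if tool_choice == "auto" else "ANY"
        return {"function_calling_config": {"mode": mode}}
    raise ValueError(f"unknown provider: {provider}")
```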
#### Enhanced retry mechanism in skills_like mode

##### Background
In the skills_like tool mode, the system interacts with the LLM in two rounds:
1. The first round uses lightweight tool descriptions to obtain the tool name
2. The second round (re-query) uses the full parameter schema to complete the arguments

Before this fix, if the LLM returned an empty response during parameter completion (neither a tool call nor meaningful text), the system failed outright, aborting the tool-execution flow.

##### Two-stage retry mechanism
PR #7101 introduces a two-stage retry mechanism in the `_resolve_tool_exec()` method of `tool_loop_agent_runner.py`:

**Stage 1 - retry with a forced tool call**:
- When the initial re-query returns an empty response, the system re-queries the LLM with `tool_choice="required"`
- This forces the LLM to call a tool or provide an explanation

**Stage 2 - retry with a reinforced instruction**:
- If the first retry still fails, the system injects a stronger instruction and retries once more with `tool_choice="required"`
- The injected instruction explicitly requires the LLM to do one of two things:
  1. Call a tool using the provided tool schema
  2. Explain to the user why it cannot call a tool

**Injected instruction text**:
```
This is the second-stage tool execution step. 
You must do exactly one of the following: 
1. Call one of the selected tools using the provided tool schema. 
2. If calling a tool is no longer possible or appropriate, reply to the user with a brief explanation of why. 
Do not return an empty response. 
Do not ignore the selected tools without explanation.
```

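The two-stage flow can be sketched as a simplified synchronous loop. Here `query_llm`, `has_tool_calls`, and `has_text` are stand-ins for the provider call and the response checks; the real implementation is asynchronous and lives in `_resolve_tool_exec()`:

```python
STRONG_INSTRUCTION = (
    "Call one of the selected tools using the provided tool schema, "
    "or reply with a brief explanation of why you cannot."
)

def requery_with_repairs(query_llm, has_tool_calls, has_text):
    """Sketch of the two-stage repair loop; all names are illustrative.

    query_llm(tool_choice, extra_instruction) -> provider response
    """
    # Initial re-query with the full parameter schema.
    resp = query_llm("auto", None)
    if not has_tool_calls(resp) and not has_text(resp):
        # Stage 1: force a tool call with tool_choice="required".
        resp = query_llm("required", None)
    if not has_tool_calls(resp) and not has_text(resp):
        # Stage 2: retry once more with a reinforced instruction.
        resp = query_llm("required", STRONG_INSTRUCTION)
    if has_tool_calls(resp):
        return ("tool_call", resp)
    if has_text(resp):
        # Graceful fallback: treat the text as a plain assistant reply.
        return ("assistant_reply", resp)
    return ("failed", resp)
```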
##### Graceful fallback
If both retry stages fail but a meaningful assistant reply was returned, the system:
1. Treats the response as a plain assistant text reply
2. Finalizes the run via the `_complete_with_assistant_response()` method
3. Surfaces the LLM's reply in the message stream
4. Avoids interrupting the conversation because of a failed tool call

#### Code refactoring

##### New helper methods

**`_complete_with_assistant_response()`**:
- Centralizes the finalization logic for assistant responses
- Builds a message containing the thinking content (if any) and the text content
- Appends the assistant message to the conversation history
- Invokes the `on_agent_done` hook and cleans up unconsumed follow-up messages
- Reused across multiple scenarios to keep behavior consistent

**`_has_meaningful_assistant_reply()` static method**:
- Checks whether the LLM response contains meaningful text content
- Strips whitespace to decide whether the reply is empty
- Used to decide whether further retries are needed

**`_build_tool_requery_context()` enhancement**:
- Adds an optional `extra_instruction` parameter
- Supports injecting an additional instruction during the second-stage retry
- The extra instruction is appended to the system message to steer the LLM toward an explicit tool call

#### Behavior

##### Tool-call flow (skills_like mode)
1. **Initial re-query**: request parameter completion with the full tool schema
2. **Empty-response detection**: check whether a tool call or meaningful text was returned
3. **Stage 1 retry**: force a tool call with `tool_choice="required"`
4. **Stage 2 retry**: inject a stronger instruction and retry with `tool_choice="required"`
5. **Graceful fallback**: if a meaningful text reply exists, convert it into a plain assistant response

##### Improved observability
The system logs detailed warnings during retries to aid diagnosis:

- **Initial empty response**:
  ```
  skills_like tool re-query returned no tool calls; fallback to assistant response.
  ```
- **Stage 1 retry triggered**:
  ```
  skills_like tool re-query returned no tool calls and no explanation; retrying with stronger instruction.
  ```

#### User benefits
- **Higher reliability**: prevents tool-execution failures caused by empty LLM responses, improving system stability
- **Better user experience**: even when a tool call fails, the user receives the LLM's explanation or reply rather than a silent failure
- **Improved observability**: tiered logs clearly record the retry flow, helping developers diagnose and optimize
- **Cross-provider consistency**: the `tool_choice` parameter is implemented uniformly across all providers (OpenAI, Anthropic, Gemini), ensuring consistent behavior

#### Implementation notes

##### Retry-logic integration
The retry logic is fully contained in the `_resolve_tool_exec()` method, so other tool modes are unaffected:
- Activated only when `tool_schema_mode == "skills_like"`
- Retry condition: no tool calls and no meaningful assistant reply
- Retry count: at most two (stage 1 + stage 2)

##### Context-building optimization
The enhanced `_build_tool_requery_context()` method ensures the extra instruction is injected correctly:
- If the first message in the context is a system message, the instruction is appended to its existing content
- Otherwise a new system message is inserted at the beginning of the context
- The structure and consistency of the context are preserved

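Under those rules, the injection can be sketched as follows. This is a minimal sketch: the real method also builds the base skills_like instruction itself, and the function and variable names here are illustrative:

```python
def build_requery_context(contexts, instruction, extra_instruction=None):
    """Append the (optional) extra instruction to the system message,
    inserting a new system message only when none exists."""
    text = instruction
    if extra_instruction:
        text = f"{instruction}\n\n{extra_instruction}"
    # Shallow-copy messages so the caller's conversation history is untouched.
    contexts = [dict(m) for m in contexts]
    if contexts and contexts[0].get("role") == "system":
        # Append to the existing system message.
        contexts[0]["content"] = f"{contexts[0]['content']}\n\n{text}"
    else:
        # No system message yet: insert one at the front.
        contexts.insert(0, {"role": "system", "content": text})
    return contexts
```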
##### Fallback strategy
The graceful fallback keeps the user experience intact:
- The LLM's text reply is returned normally through `AgentResponse`
- The assistant message is recorded correctly in the conversation history
- The agent transitions to the DONE state and the run finishes normally

---

### 16. Other optimizations
- Hardened JWT handling and error handling, improving security and stability
- UI polish for a better user experience
- Enhanced logging and exception handling for easier troubleshooting


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a tool_choice parameter across the LLM provider interface and its implementations (OpenAI, Anthropic, Gemini), allowing for explicit control over whether tool calls are required or optional. Additionally, the tool_loop_agent_runner has been refactored to centralize assistant response finalization and now includes a retry mechanism with reinforced instructions when a tool re-query fails to yield a tool call or a meaningful explanation. I have no feedback to provide as there were no review comments to evaluate.


Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Tool calling is incorrectly disabled

1 participant