fix(skills): record tool call outcomes in native tool path #1462

Merged
bug-ops merged 3 commits into main from
wilson-score-active-skill-names
Mar 9, 2026
Conversation


@bug-ops bug-ops commented Mar 9, 2026

Problem

handle_native_tool_calls never called record_skill_outcomes(), so Wilson score re-ranking received zero signal from actual tool execution when using providers with native tool_use support (Claude, OpenAI, OpenAI-compatible).

Only the legacy text-extraction path called record_skill_outcomes (via handle_tool_result), meaning skill rankings were based solely on explicit user feedback corrections — never on whether tool calls actually succeeded or failed.
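For context, Wilson score re-ranking orders skills by the lower bound of a confidence interval on their observed success rate, which is why dropped outcomes matter: with no recorded signal, every skill sits at the zero-trials baseline. A minimal sketch of the standard Wilson lower bound (the exact formula, z-value, and integration with the ranker in zeph-core are assumptions, not taken from this PR):

```rust
/// Lower bound of the Wilson score interval for a success proportion.
/// Skills with few trials get a conservative (low) bound, so ranking by
/// this value avoids over-promoting a skill after one lucky success.
fn wilson_lower_bound(successes: u32, trials: u32, z: f64) -> f64 {
    if trials == 0 {
        return 0.0; // no signal recorded yet
    }
    let n = trials as f64;
    let p = successes as f64 / n;
    let z2 = z * z;
    let center = p + z2 / (2.0 * n);
    let margin = z * ((p * (1.0 - p) + z2 / (4.0 * n)) / n).sqrt();
    (center - margin) / (1.0 + z2 / n)
}
```

Note that the same success rate with more trials yields a higher lower bound, so skills only climb the ranking as evidence accumulates.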

Root Cause

handle_native_tool_calls (tool_execution.rs) processes tool results and sends outputs back to the LLM, but contained no calls to record_skill_outcomes. The legacy handle_tool_result path had these calls at lines ~638, 644, 659, 736.

Fix

Added record_skill_outcomes calls in the results loop of handle_native_tool_calls, after the match tool_result block, mirroring legacy behavior:

| Result | Outcome |
| --- | --- |
| `Ok(Some(out))` — success output | `"success"` |
| `Ok(Some(out))` — `[error]`/`[exit code` | `"tool_failure"` + `attempt_self_reflection` |
| `Ok(Some(out))` — empty output | `"success"` |
| `Ok(None)` | `"success"` (via `"(no output)"` → else branch) |
| `Err(e)` | `"tool_failure"` |

attempt_self_reflection is included on the native failure path (matching legacy behavior). At the insertion point, persist_message/push_message have not yet been called, so the early return Ok(()) after self-reflection is safe.
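The branch table above can be sketched as a small classifier. The `[error]`/`[exit code` markers and the outcome labels come from the PR description; the result type, the prefix-matching details, and the helper name are simplifying assumptions rather than the actual code in tool_execution.rs:

```rust
/// Maps a tool result to the outcome label recorded for Wilson scoring,
/// mirroring the branch table above. Hypothetical stand-in for the logic
/// added after the `match tool_result` block in handle_native_tool_calls.
fn outcome_label(result: Result<Option<&str>, &str>) -> &'static str {
    match result {
        // failure markers in the output trigger attempt_self_reflection
        Ok(Some(out)) if out.starts_with("[error]") || out.starts_with("[exit code") => {
            "tool_failure"
        }
        // success output, empty output, and Ok(None) all count as success
        Ok(_) => "success",
        // executor-level error
        Err(_) => "tool_failure",
    }
}
```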

Tests

5 regression tests added covering all branches:

  • native_tool_success_outcome_does_not_panic
  • native_tool_error_output_does_not_panic
  • native_tool_exit_code_output_does_not_panic
  • native_tool_executor_error_does_not_panic
  • native_tool_no_active_skills_does_not_panic
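These are smoke tests: each one drives a single branch and asserts the call completes without panicking. A hedged sketch of that shape, using `std::panic::catch_unwind`; `record_outcome` here is a stand-in, as the real tests exercise `handle_native_tool_calls` with a stubbed executor:

```rust
use std::panic;

// Stand-in for the code under test (hypothetical; the real tests call
// into handle_native_tool_calls).
fn record_outcome(output: Option<&str>) -> &'static str {
    match output {
        Some(out) if out.starts_with("[error]") => "tool_failure",
        _ => "success",
    }
}

// A "does not panic" regression test reduces to: run the branch and
// assert the closure returned Ok rather than unwinding.
fn runs_without_panic(output: Option<&'static str>) -> bool {
    panic::catch_unwind(|| record_outcome(output)).is_ok()
}
```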

Fixes #1436

@github-actions github-actions bot added bug Something isn't working size/L Large PR (201-500 lines) rust Rust code changes core zeph-core crate and removed size/L Large PR (201-500 lines) labels Mar 9, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 9, 2026 17:09
bug-ops added 3 commits March 9, 2026 18:22
`handle_native_tool_calls` never called `record_skill_outcomes()`, so
Wilson score received no signal from actual tool execution when using
providers with native tool_use support (Claude, OpenAI).

Add `record_skill_outcomes` calls after each tool result in the
native path, mirroring the legacy `handle_tool_result` behavior:
- success output → "success" outcome
- `[error]`/`[exit code` output → "tool_failure" + attempt_self_reflection
- executor error (`Err(e)`) → "tool_failure"

Also add 5 regression tests covering all branches.

Fixes #1436

Apply ContentSource sanitization to tool output before passing it to
attempt_self_reflection(), matching the legacy error path behavior.

Also document the known limitation that a self-reflection early return
from handle_native_tool_calls drops remaining batch results.

Update config snapshot after LSP defaults removal in main.
The lsp section was incorrectly removed from the snapshot during branch
sync. Config::default() with --features full still includes it.
@bug-ops bug-ops force-pushed the wilson-score-active-skill-names branch from fd6de07 to 76024e2 Compare March 9, 2026 17:22
@bug-ops bug-ops merged commit 3bdd8f2 into main Mar 9, 2026
18 checks passed
@bug-ops bug-ops deleted the wilson-score-active-skill-names branch March 9, 2026 17:34


Development

Successfully merging this pull request may close these issues.

Wilson score: tool call outcomes silently dropped when active_skill_names is empty