
fix: Add tool-honesty instructions to prevent workers from fabricating results #371

Merged
PureWeen merged 2 commits into main from fix/session-uitestvalidationskill-orchestrat-20260313-1820
Mar 13, 2026

Conversation

@PureWeen
Owner

Problem

Workers in multi-agent orchestration could fabricate results when their CLI tools failed to run, instead of honestly reporting the failure. The user expectation is: if tools don't run, workers should fail and report that — never fall back to self-evaluation or making up results.

Root Cause

Three prompt construction paths lacked tool-usage honesty instructions:

  • Worker prompt (ExecuteWorkerAsync): No instruction about tool failures
  • Synthesis prompt (BuildSynthesisPrompt): No guidance to verify tool usage in worker outputs
  • Evaluator prompts (both BuildEvaluatorPrompt paths): No tool-verification scoring dimension

Fix

  1. Worker prompt: Added a "CRITICAL: Tool Usage & Honesty Policy" section requiring actual tool execution, honest failure reporting, and a TOOL_FAILURE sentinel for unrecoverable errors
  2. Synthesis prompt: Added instructions to flag fabricated results and preserve TOOL_FAILURE reports as-is
  3. Evaluator prompts: Added Tool Verification as a 5th scoring dimension (multi-agent evaluator) and as a verification check (reflection cycle evaluator)
  4. Synthesis-with-eval prompt: Added Tool Verification assessment dimension
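
The worker and synthesis changes above can be pictured with a minimal sketch. It is written in Python for brevity, while the actual change lives in the project's C# prompt builders, which are not quoted in this description. The policy wording, `build_worker_prompt`, and `split_worker_outputs` are illustrative assumptions; only the section title and the TOOL_FAILURE sentinel come from the PR text.

```python
from __future__ import annotations

TOOL_FAILURE = "TOOL_FAILURE"

# Hypothetical policy wording; the real prompt text is not shown in this PR description.
HONESTY_POLICY = f"""\
## CRITICAL: Tool Usage & Honesty Policy
- Run the required tools; do not simulate or guess their output.
- If a tool fails, report the failure and the error you observed.
- If you cannot recover, start your reply with `{TOOL_FAILURE}:` and stop.
"""


def build_worker_prompt(task: str) -> str:
    """Append the honesty policy to a worker task (illustrative only)."""
    return f"{task.strip()}\n\n{HONESTY_POLICY}"


def split_worker_outputs(
    outputs: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Separate normal results from TOOL_FAILURE reports.

    Failure reports are kept verbatim so a synthesis step can pass them
    through as-is instead of guessing the missing results.
    """
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    for worker, text in outputs.items():
        target = failures if text.lstrip().startswith(TOOL_FAILURE) else results
        target[worker] = text
    return results, failures
```

Keeping the sentinel as a fixed prefix means downstream steps can detect a failed worker with a plain string check rather than interpreting free-form prose.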

Tests

Added 7 unit tests in WorkerToolHonestyTests.cs covering all modified prompt paths:

  • Worker prompt contains tool-honesty instructions
  • Synthesis prompt contains tool-verification guidance
  • Synthesis prompt preserves TOOL_FAILURE reports
  • Multi-agent evaluator includes Tool Verification dimension
  • Synthesis-with-eval prompt includes Tool Verification
  • ReflectionCycle evaluator includes tool verification
  • ReflectionCycle evaluator preserves existing functionality

All 2544 previously passing tests continue to pass; the 4 pre-existing font-sizing failures are unrelated to this change.
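
As a rough illustration of what the evaluator tests check, the sketch below appends a Tool Verification dimension to a rubric and asserts that it shows up in the rendered prompt. It is Python rather than the project's C#, and `build_evaluator_prompt`, the prompt wording, and the example dimension names "Completeness" and "Accuracy" are assumptions; the PR does not list the existing dimensions.

```python
from __future__ import annotations


def build_evaluator_prompt(worker_output: str, dimensions: list[str]) -> str:
    """Render a scoring rubric with Tool Verification added as the last dimension."""
    rubric = [*dimensions, "Tool Verification"]
    lines = [f"{i}. {name}" for i, name in enumerate(rubric, start=1)]
    return (
        "Score the worker output on each dimension:\n"
        + "\n".join(lines)
        + "\nPenalize results that lack evidence of actual tool execution.\n\n"
        + f"Worker output:\n{worker_output}"
    )


def test_evaluator_includes_tool_verification() -> None:
    # Hypothetical existing dimensions, used only to show they are preserved.
    prompt = build_evaluator_prompt("3 tests passed", ["Completeness", "Accuracy"])
    assert "Tool Verification" in prompt
    assert "Completeness" in prompt
```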

PureWeen and others added 2 commits March 13, 2026 13:38
…r prompts

Workers in multi-agent orchestration could fabricate results when CLI tools
failed to run, instead of reporting the failure. This adds explicit instructions
across all orchestration prompt paths:

- Worker prompt: CRITICAL tool usage policy requiring actual tool execution,
  honest failure reporting, and TOOL_FAILURE sentinel for unrecoverable errors
- Synthesis prompt: Instructions to flag fabricated results and preserve
  TOOL_FAILURE reports as-is without guessing missing results
- Evaluator prompts (both multi-agent and reflection cycle): New 'Tool
  Verification' scoring dimension that penalizes results lacking evidence
  of actual tool execution
- Synthesis-with-eval prompt: Tool Verification assessment dimension

Includes 7 unit tests covering all modified prompt paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The test was constructing its own string and asserting against it,
providing no real regression protection. Now extracts the prompt
building into a BuildWorkerPrompt static method and tests it via
reflection, matching the pattern of the other 6 tests in the file.

Co-authored-by: GitHub Copilot <copilot@github.com>
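
The difference between the original test and the fixed one in the second commit can be sketched as follows. This is a Python analogy, not the project's C# code: the `Orchestrator` class, its `_build_worker_prompt` method, and the prompt text are hypothetical stand-ins, and `getattr` plays the role that reflection plays in the real test.

```python
class Orchestrator:
    """Hypothetical stand-in for the class that owns the prompt builder."""

    @staticmethod
    def _build_worker_prompt(task: str) -> str:
        return (
            f"{task}\n\n"
            "CRITICAL: Tool Usage & Honesty Policy\n"
            "Report TOOL_FAILURE if tools cannot run."
        )


def tautological_test() -> None:
    # Anti-pattern: the test builds its own string, so it still passes
    # even if the production builder drops the policy section.
    prompt = "task\n\nCRITICAL: Tool Usage & Honesty Policy"
    assert "Tool Usage & Honesty Policy" in prompt


def builder_test() -> None:
    # Look the builder up by name and call it, so the assertion runs
    # against the string that production code actually produces.
    build = getattr(Orchestrator, "_build_worker_prompt")
    prompt = build("Run the UI tests")
    assert "Tool Usage & Honesty Policy" in prompt
    assert "TOOL_FAILURE" in prompt
```

Only the second test would fail if the honesty section were removed from the builder, which is the regression protection the commit message describes.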
@PureWeen PureWeen merged commit 79417d5 into main Mar 13, 2026
@PureWeen PureWeen deleted the fix/session-uitestvalidationskill-orchestrat-20260313-1820 branch March 13, 2026 21:30