Skip to content

agent_workflow_baseline silently scores ~0 against tool-less agents #117

@Dongbumlee

Description

@Dongbumlee

Source

Discovered during E2E validation of #78.

Problem

Running agent_workflow_baseline against a Foundry agent that has no tool definitions registered (FoundryAgent in the test environment) produces near-zero scores:

TaskCompletionEvaluator      = 0.4
ToolCallAccuracyEvaluator    = 2.0
IntentResolutionEvaluator    = 3.6
TaskAdherenceEvaluator       = 1.0
ToolSelectionEvaluator       = 0.2
ToolInputAccuracyEvaluator   = 0.8

These look like "the agent is broken", but they actually mean "the agent doesn't have tools at all". A user new to the toolkit will misdiagnose this.

Options

A. Runner-side warning — when the dataset has tool_definitions rows but the agent doesn't expose any tools (detectable via the Foundry agent definition), log:

WARN: agent_workflow_baseline expects an agent with tool definitions. Configured agent 'FoundryAgent' has 0 tools registered. Scores will be low.

B. Doc note in docs/bundles.md and the tutorial: "Use this bundle only with agents that have the tool definitions in your dataset registered."

Recommendation: both.

Severity

Low-Medium (UX / first-impression)

Metadata

Metadata

Assignees

No one assigned

    Labels

    sprint3Sprint 3 test scenarios

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions