Source
Discovered during E2E validation of #78.
Problem
Running agent_workflow_baseline against a Foundry agent that has no tool definitions registered (FoundryAgent in the test environment) produces low scores across the bundle's evaluators:
TaskCompletionEvaluator = 0.4
ToolCallAccuracyEvaluator = 2.0
IntentResolutionEvaluator = 3.6
TaskAdherenceEvaluator = 1.0
ToolSelectionEvaluator = 0.2
ToolInputAccuracyEvaluator = 0.8
These scores read as "the agent is broken", but they actually mean "the agent has no tools at all". A user new to the toolkit is likely to misdiagnose this.
Options
A. Runner-side warning: when the dataset contains rows with tool_definitions but the agent exposes no tools (detectable via the Foundry agent definition), log:
WARN: agent_workflow_baseline expects an agent with tool definitions. Configured agent 'FoundryAgent' has 0 tools registered. Scores will be low.
B. Doc note in docs/bundles.md and the tutorial: "Use this bundle only with agents that have the tool definitions in your dataset registered."
Recommendation: both.
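A minimal sketch of the check in option A, to make the proposal concrete. Everything here is hypothetical: the function names, the "runner" logger, and the assumption that dataset rows are dicts with an optional tool_definitions key are for illustration only. How the runner actually obtains the agent's tool list from the Foundry agent definition is not shown; the sketch takes it as a plain list.

```python
import logging

# Hypothetical logger name; the real runner may use a different one.
logger = logging.getLogger("runner")


def dataset_expects_tools(rows):
    """Return True if any dataset row has a non-empty tool_definitions field."""
    return any(row.get("tool_definitions") for row in rows)


def warn_if_agent_has_no_tools(bundle_name, agent_name, agent_tools, rows):
    """Warn when the dataset expects tools but the agent exposes none.

    agent_tools is assumed to be the list of tools read from the agent
    definition (empty or None when nothing is registered).
    Returns True if the warning was emitted, False otherwise.
    """
    if dataset_expects_tools(rows) and not agent_tools:
        logger.warning(
            "%s expects an agent with tool definitions. "
            "Configured agent '%s' has 0 tools registered. Scores will be low.",
            bundle_name,
            agent_name,
        )
        return True
    return False


# Example: a dataset row that references a tool, evaluated against an
# agent with no registered tools, triggers the warning.
example_rows = [{"query": "weather in Oslo", "tool_definitions": [{"name": "get_weather"}]}]
warned = warn_if_agent_has_no_tools("agent_workflow_baseline", "FoundryAgent", [], example_rows)
```

Returning a boolean keeps the check easy to unit test and lets the runner decide later whether to stop at a log line or surface the condition in the results summary as well.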
Severity
Low-Medium (UX / first-impression)