Objective
Define and implement the Evaluation Bundle abstraction that standardizes how Foundry evaluators are configured and grouped.
Scope
Introduce reusable evaluation bundles that map to common agent patterns such as RAG agents and tool-using agents.
Tasks
-
Define EvaluationBundle model
-
Create bundle registry mechanism
-
Implement two initial bundles:
- rag_baseline
- tool_agent_baseline
-
Bundle configuration must support:
- Evaluator list
- Threshold definitions
- Dataset reference
-
Implement bundle resolution logic from YAML config
-
Add bundle validation logic
Acceptance Criteria
- YAML config can reference a bundle by name
- Bundle loads correct evaluator configuration
- Thresholds are applied correctly
- Bundle system is extensible without modifying core CLI logic
Out of Scope
- Observability integration
- Telemetry export
- Dashboard generation
Objective
Define and implement the Evaluation Bundle abstraction that standardizes how Foundry evaluators are configured and grouped.
Scope
Introduce reusable evaluation bundles that map to common agent patterns such as RAG agents and tool-using agents.
Tasks
Define EvaluationBundle model
Create bundle registry mechanism
Implement two initial bundles:
Bundle configuration must support:
Implement bundle resolution logic from YAML config
Add bundle validation logic
Acceptance Criteria
Out of Scope