feat(eval): Add span output attributes and metadata for evaluation spans #1079
Summary
This PR adds comprehensive span output attributes and metadata for evaluation spans, following the AgentOutputSpanAttributes pattern with Pydantic models.
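As a rough illustration of the idea (not the code in this PR), the sketch below averages per-evaluator scores, wraps the result in a typed output object, and writes it to a span as a JSON string. It uses a stdlib dataclass as a stand-in for a Pydantic model so it runs without dependencies. The names `EvalOutputSketch`, `FakeSpan`, `average_score`, and `set_output`, the `"output"` attribute key, the rounding to int, and the 0.0 result for an empty input are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class EvalOutputSketch:
    """Hypothetical stand-in for a Pydantic output model (score as int)."""
    score: int


class FakeSpan:
    """Minimal stand-in for a tracing span; records attributes in a dict."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def average_score(scores: Dict[str, float]) -> float:
    """Mean of per-evaluator scores; 0.0 for an empty input (assumption)."""
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


def set_output(span: FakeSpan, scores: Dict[str, float]) -> None:
    """Serialize the output object and store it under an assumed 'output' key."""
    output = EvalOutputSketch(score=round(average_score(scores)))
    span.set_attribute("output", json.dumps(asdict(output)))


span = FakeSpan()
set_output(span, {"evaluator_a": 80.0, "evaluator_b": 90.0})
print(span.attributes["output"])
```

Keeping the payload in a typed object and serializing it once gives every span of a given kind the same output shape, which is presumably the point of following a model-based pattern.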
Changes
New Module: _span_utils.py
Created a centralized utility module for evaluation span configuration:
Pydantic Models:
- EvalSetRunOutput: Output model for "Evaluation Set Run" spans (score as int)
- EvaluationOutput: Output model for "Evaluation" spans (score as int)
- EvaluationOutputSpanOutput: Output model for "Evaluation output" spans (type, value, justification)
Calculation Functions:
- calculate_overall_score(): Calculates average across all evaluators
- calculate_evaluation_average_score(): Calculates average for a single evaluation
Low-level Attribute Setters:
- set_eval_set_run_output_and_metadata(): Sets output and metadata for eval set run spans
- set_evaluation_output_and_metadata(): Sets output and metadata for evaluation spans
- set_evaluation_output_span_output(): Sets output for evaluation output spans
High-level Configuration Functions:
- configure_eval_set_run_span(): Complete configuration including schema retrieval and score calculation
- configure_evaluation_span(): Complete configuration with error handling
Updated: _runtime.py
_span_utils.py
Span Attributes Added
All evaluation spans now include:
Tests
- Unit Tests (test_eval_span_utils.py): 19 new tests
- Integration Tests (test_eval_tracing_integration.py): 3 new tests
- Span Attribute Tests (test_eval_runtime_spans.py): 13 new tests
Test Infrastructure Fix:
- SpanCapturingTracer now captures attributes set via span.set_attribute() (not just initial attributes)
Test Results
Files Changed
- src/uipath/_cli/_evals/_span_utils.py (new, 290 lines)
- src/uipath/_cli/_evals/_runtime.py (40 lines modified)
- tests/cli/eval/test_eval_span_utils.py (new, 462 lines)
- tests/cli/eval/test_eval_runtime_spans.py (184 lines added)
- tests/cli/eval/test_eval_tracing_integration.py (297 lines added)
Total: 1,268 lines added, 5 lines removed
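For readers unfamiliar with the test infrastructure fix mentioned under Tests, here is a minimal stdlib-only sketch of a tracer test double that sees attributes set after span creation as well as the initial ones. It is not the `SpanCapturingTracer` from this PR and does not use the OpenTelemetry API. `RecordingSpan`, `RecordingTracer`, and the attribute keys are hypothetical names chosen for the example.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional


class RecordingSpan:
    """Hypothetical span double that keeps one mutable attribute dict."""

    def __init__(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        # Copy the initial attributes so later updates do not mutate the caller's dict.
        self.attributes: Dict[str, Any] = dict(attributes or {})

    def set_attribute(self, key: str, value: Any) -> None:
        # Late attributes land in the same dict as the initial ones.
        self.attributes[key] = value


class RecordingTracer:
    """Hypothetical tracer double that retains every span it creates."""

    def __init__(self) -> None:
        self.spans: List[RecordingSpan] = []

    @contextmanager
    def start_as_current_span(
        self, name: str, attributes: Optional[Dict[str, Any]] = None
    ) -> Iterator[RecordingSpan]:
        span = RecordingSpan(name, attributes)
        # Store the span object itself, not a snapshot of its attributes,
        # so values set inside the `with` block are visible to assertions.
        self.spans.append(span)
        yield span


tracer = RecordingTracer()
with tracer.start_as_current_span("Evaluation", {"initial": 1}) as span:
    span.set_attribute("output", "set later")

print(tracer.spans[0].attributes)
```

The design choice that matters is storing a reference to the span rather than copying its attributes at creation time. A snapshot would miss anything written through `set_attribute()`.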
🤖 Generated with Claude Code
Development Package
Use uipath pack --nolock to get the latest dev build from this PR (requires version range).