@AAgnihotry AAgnihotry commented Jan 9, 2026

Summary

This PR adds output attributes and metadata to all three evaluation span types, following the AgentOutputSpanAttributes pattern with Pydantic models.

Changes

New Module: _span_utils.py

Created a centralized utility module for evaluation span configuration:

Pydantic Models:

  • EvalSetRunOutput: Output model for "Evaluation Set Run" spans (score as int)
  • EvaluationOutput: Output model for "Evaluation" spans (score as int)
  • EvaluationOutputSpanOutput: Output model for "Evaluation output" spans (type, value, justification)

Calculation Functions:

  • calculate_overall_score(): Calculates average across all evaluators
  • calculate_evaluation_average_score(): Calculates average for a single evaluation
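
As a rough illustration of the two averaging steps, here is a minimal sketch. The function names, signatures, empty-input behaviour, and evaluator names below are assumptions made for this example; they are not taken from _span_utils.py, and the PR may aggregate scores differently.

```python
from statistics import mean
from typing import Mapping, Sequence


def average_score(scores: Sequence[float]) -> float:
    """Mean of the scores produced for one evaluation; 0.0 when empty (assumed)."""
    return mean(scores) if scores else 0.0


def overall_score(scores_by_evaluator: Mapping[str, Sequence[float]]) -> float:
    """Mean of the per-evaluator averages; 0.0 when nothing was scored (assumed)."""
    per_evaluator = [average_score(s) for s in scores_by_evaluator.values() if s]
    return mean(per_evaluator) if per_evaluator else 0.0


# Hypothetical evaluator names and scores, for illustration only.
run_scores = {"exact_match": [100.0, 50.0], "llm_judge": [80.0, 90.0]}
print(average_score(run_scores["exact_match"]))  # 75.0
print(overall_score(run_scores))  # 80.0
```

Returning 0.0 for empty input keeps the span output well defined even when no evaluator produced a result; whether the PR makes the same choice is not stated in the description.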

Low-level Attribute Setters:

  • set_eval_set_run_output_and_metadata(): Sets output and metadata for eval set run spans
  • set_evaluation_output_and_metadata(): Sets output and metadata for evaluation spans
  • set_evaluation_output_span_output(): Sets output for evaluation output spans

High-level Configuration Functions:

  • configure_eval_set_run_span(): Complete configuration including schema retrieval and score calculation
  • configure_evaluation_span(): Complete configuration with error handling

Updated: _runtime.py

  • Refactored to use utility functions from _span_utils.py
  • Simplified code from ~30 lines to ~6 lines per span type
  • All three span types now properly set output and metadata attributes

Span Attributes Added

All evaluation spans now include:

  • output: JSON string containing score (for eval set run and evaluation) or type/value/justification (for evaluation output)
  • agentId: execution ID
  • agentName: "N/A"
  • inputSchema: runtime input schema as JSON string
  • outputSchema: runtime output schema as JSON string
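
The attribute layout above can be sketched without any tracing dependency. RecordingSpan and set_output_and_metadata are hypothetical names used only for this example; the real setters live in _span_utils.py and serialize Pydantic models rather than calling json.dumps directly.

```python
import json
from typing import Any, Dict


class RecordingSpan:
    """Hypothetical stand-in for a tracing span: stores attributes in a dict."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def set_output_and_metadata(
    span: RecordingSpan,
    score: int,
    execution_id: str,
    input_schema: Dict[str, Any],
    output_schema: Dict[str, Any],
) -> None:
    """Set the five attributes listed in the PR description on one span."""
    span.set_attribute("output", json.dumps({"score": score}))
    span.set_attribute("agentId", execution_id)
    span.set_attribute("agentName", "N/A")
    span.set_attribute("inputSchema", json.dumps(input_schema))
    span.set_attribute("outputSchema", json.dumps(output_schema))


span = RecordingSpan()
# "exec-123" and the schemas are made-up values for illustration.
set_output_and_metadata(span, 85, "exec-123", {"type": "object"}, {"type": "object"})
print(span.attributes["output"])  # {"score": 85}
```

Storing output and the schemas as JSON strings rather than nested objects matches the description above, where each is listed as a JSON string.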

Tests

Unit Tests (test_eval_span_utils.py): 19 new tests

  • Pydantic model serialization tests
  • Calculation function tests
  • Low-level span attribute setting tests
  • High-level configuration function tests with async/await

Integration Tests (test_eval_tracing_integration.py): 3 new tests

  • Verification that "Evaluation Set Run" span has output with score
  • Verification that "Evaluation" span has metadata attributes
  • Verification that "Evaluation output" span has type, value, and justification

Span Attribute Tests (test_eval_runtime_spans.py): 13 new tests

  • Tests for output attributes on all three span types
  • Tests for metadata attributes (agentId, agentName, schemas)
  • Tests for proper JSON structure and types

Test Infrastructure Fix:

  • Fixed SpanCapturingTracer to capture attributes set via span.set_attribute() (not just initial attributes)
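
The idea behind this fix can be sketched as follows. CapturedSpan and CapturingTracer are hypothetical classes written for this example and are not the test suite's SpanCapturingTracer; the point is only that the tracer keeps a reference to a mutable span, so attributes set after creation are visible to assertions.

```python
from typing import Any, Dict, List, Optional


class CapturedSpan:
    """Records initial attributes and any added later via set_attribute()."""

    def __init__(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        # Copy so later mutation of the caller's dict does not leak in.
        self.attributes: Dict[str, Any] = dict(attributes or {})

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


class CapturingTracer:
    """Keeps every span it creates so tests can inspect them afterwards."""

    def __init__(self) -> None:
        self.spans: List[CapturedSpan] = []

    def start_span(
        self, name: str, attributes: Optional[Dict[str, Any]] = None
    ) -> CapturedSpan:
        span = CapturedSpan(name, attributes)
        self.spans.append(span)
        return span


tracer = CapturingTracer()
# "span_type" is a made-up initial attribute for illustration.
span = tracer.start_span("Evaluation", {"span_type": "evaluation"})
span.set_attribute("agentName", "N/A")  # set after the span was created
print(tracer.spans[0].attributes)
```

A tracer that snapshots only the attributes passed at creation would miss agentName here, which is the kind of gap the fix addresses.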

Test Results

  • ✅ All 1531 tests passing (7 skipped for authentication)
  • ✅ 19 new unit tests for span utilities
  • ✅ 3 new integration tests for span attributes
  • ✅ 13 new span attribute tests
  • ✅ Linting: ruff check passed
  • ✅ Formatting: ruff format passed
  • ✅ Type checking: mypy passed

Files Changed

  • src/uipath/_cli/_evals/_span_utils.py (new, 290 lines)
  • src/uipath/_cli/_evals/_runtime.py (40 lines modified)
  • tests/cli/eval/test_eval_span_utils.py (new, 462 lines)
  • tests/cli/eval/test_eval_runtime_spans.py (184 lines added)
  • tests/cli/eval/test_eval_tracing_integration.py (297 lines added)

Total: 1,268 lines added, 5 lines removed

🤖 Generated with Claude Code

Development Package

  • Use uipath pack --nolock to get the latest dev build from this PR (requires version range).
  • Add this package as a dependency in your pyproject.toml:
[project]
dependencies = [
  # Exact version:
  "uipath==2.4.9.dev1010793680",

  # Any version from PR
  "uipath>=2.4.9.dev1010790000,<2.4.9.dev1010800000"
]

[[tool.uv.index]]
name = "testpypi"
url = "https://test.pypi.org/simple/"
publish-url = "https://test.pypi.org/legacy/"
explicit = true

[tool.uv.sources]
uipath = { index = "testpypi" }

[tool.uv]
override-dependencies = [
    "uipath>=2.4.9.dev1010790000,<2.4.9.dev1010800000",
]

- Created _span_utils.py module with Pydantic models for span outputs
  - EvalSetRunOutput: for "Evaluation Set Run" spans
  - EvaluationOutput: for "Evaluation" spans
  - EvaluationOutputSpanOutput: for "Evaluation output" spans
- Added calculation functions for overall and evaluation average scores
- Added low-level functions to set span attributes (output, agentId, agentName, schemas)
- Added high-level configuration functions for complete span setup
- Refactored _runtime.py to use utility functions (reduced from ~30 to ~6 lines per span)
- Added comprehensive unit tests (19 tests in test_eval_span_utils.py)
- Added integration tests (3 tests in test_eval_tracing_integration.py)
- Added span attribute tests (13 tests in test_eval_runtime_spans.py)
- Fixed SpanCapturingTracer to capture attributes set via set_attribute()

All spans now include:
- output: JSON with score for eval set run and evaluation spans
- output: JSON with type, value, justification for evaluation output spans
- agentId: execution ID
- agentName: "N/A"
- inputSchema: runtime input schema as JSON
- outputSchema: runtime output schema as JSON

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@github-actions github-actions bot added the test:uipath-langchain (Triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (Triggers tests in the uipath-llamaindex-python repository) labels Jan 9, 2026

- Changed import from opentelemetry.sdk.trace.Span to opentelemetry.trace.Span (protocol)
- Added proper type annotations to MockSpan class
- Added None checks before accessing Status attributes (status_code, description)
- Fixed __str__ mock configuration with proper lambda signature
- Added type: ignore comments for MockSpan arg-type compatibility in tests

All mypy checks now pass with no errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@AAgnihotry AAgnihotry added the build:dev (Create a dev build from the pr) label Jan 9, 2026
@AAgnihotry AAgnihotry merged commit 626f0c2 into main Jan 9, 2026
120 of 121 checks passed
@AAgnihotry AAgnihotry deleted the feat/spanAttr branch January 9, 2026 20:10