Merged
Makes score optional and introduces an explicit `skipped: bool` flag so metrics can signal "no applicable data" distinctly from an error. Allows downstream consumers to treat skipped metrics as a first-class state instead of inferring it from None scores. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
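For readers skimming the thread, the shape described in this commit could look roughly like the sketch below. The class name `MetricScore`, the `name` and `error` fields, and the `describe` helper are assumptions made for illustration; only the optional score and the `skipped: bool` flag come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricScore:
    """Hypothetical result container; not the project's actual class."""
    name: str
    score: Optional[float] = None   # optional, per the commit message
    skipped: bool = False           # explicit "no applicable data" flag
    error: Optional[str] = None     # assumed field for genuine failures


def describe(result: MetricScore) -> str:
    """Classify a result without inferring anything from a None score alone."""
    if result.skipped:
        return "skipped"
    if result.error is not None or result.score is None:
        return "error"
    return "scored"
```

With this shape a real zero (`score=0.0`) stays "scored", while a skipped metric is reported as its own state rather than being guessed from a missing score.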
Agent speech fidelity (S2S) and transcription accuracy key entities both have legitimate cases where no entities exist to score. Previously these returned `score=0.0` with `error="Aggregation failed"` (for S2S) or a zero-valued score that could not be distinguished from a real zero score. Now they set `skipped=True` with `score=None` so consumers can handle the case correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
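A minimal sketch of the "no entities to score" case, under stated assumptions: `KeyEntityResult` and `score_key_entities` are hypothetical names, and the case-insensitive substring match is a stand-in for whatever matching the real metric uses. Only the `skipped=True` / `score=None` outcome is taken from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class KeyEntityResult:
    """Hypothetical result type for illustration only."""
    score: Optional[float]
    skipped: bool = False


def score_key_entities(expected: Sequence[str], transcript: str) -> KeyEntityResult:
    """Fraction of expected entities found in the transcript."""
    if not expected:
        # Nothing to score: report a skip instead of a misleading 0.0.
        return KeyEntityResult(score=None, skipped=True)
    text = transcript.lower()
    hits = sum(1 for entity in expected if entity.lower() in text)
    return KeyEntityResult(score=hits / len(expected))
```

A transcript that misses every expected entity still returns a genuine `0.0`, which is the distinction the commit is after.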
- `validation_runner`: skipped metrics no longer fail validation; they are excluded from threshold checks.
- `pass_at_k`: skipped trials are excluded from the n (total) and c (correct) counts, so pass@k is computed over the remaining valid trials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
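The two behaviours above can be sketched as follows. This is not the repository's code: the function names, the use of `None` to mark a skipped trial, and returning `None` when fewer than `k` valid trials remain are all assumptions. The estimator is the standard unbiased pass@k, `1 - C(n-c, k) / C(n, k)`.

```python
from math import comb
from typing import Optional, Sequence


def pass_at_k(trials: Sequence[Optional[bool]], k: int) -> Optional[float]:
    """pass@k over valid trials; a None entry marks a skipped trial (assumption)."""
    valid = [t for t in trials if t is not None]
    n = len(valid)   # valid trials only
    c = sum(valid)   # passing trials among them
    if k <= 0 or n < k:
        return None  # assumed behaviour when too few valid trials remain
    return 1.0 - comb(n - c, k) / comb(n, k)


def meets_threshold(score: Optional[float], skipped: bool, threshold: float) -> Optional[bool]:
    """Threshold check that leaves skipped metrics out instead of failing them."""
    if skipped:
        return None  # excluded from the check
    return score is not None and score >= threshold
```

For example, `pass_at_k([True, None, False], 1)` is computed over the two valid trials only.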
…osite Previously any None component (missing, errored, or legitimately skipped) would collapse EVA-A_pass to None, excluding the record from composite pass statistics. Now a skipped component is excluded from the pass check while remaining applicable components still determine pass/fail. Missing or errored components still collapse the composite to None, since that represents genuine data absence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
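One way to read the composite rule, as a hedged sketch: `composite_pass` and the `(passed, skipped)` tuple layout are hypothetical, and returning `None` when every component is skipped is an assumption the commit message does not address.

```python
from typing import Mapping, Optional, Tuple

# (passed, skipped): passed is None when the component is missing or errored.
Component = Tuple[Optional[bool], bool]


def composite_pass(components: Mapping[str, Component]) -> Optional[bool]:
    """Combine component results; skipped components do not affect the outcome."""
    applicable = []
    for passed, skipped in components.values():
        if skipped:
            continue      # legitimately skipped: excluded from the pass check
        if passed is None:
            return None   # missing or errored: genuine data absence
        applicable.append(passed)
    if not applicable:
        return None       # assumption: nothing applicable means no verdict
    return all(applicable)
```

Under this reading a record with one passing component and one skipped component counts as a pass, while a record with one passing component and one errored component is still excluded from composite pass statistics.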
7d34aa6 to 7093a0a
tara-servicenow approved these changes Apr 21, 2026
No description provided.