fix(ai): normalize boolean scores in onlineEval scoresSummary #263

Merged
lukasmalkmus merged 1 commit into main from lukasmalkmus/yoxwysopstxv on Feb 25, 2026

@lukasmalkmus (Contributor) commented Feb 24, 2026

Overview

  • onlineEval() was writing raw boolean scores (true/false) into the parent eval span's eval.case.scores attribute, while child scorer spans correctly normalized them to 1/0 with eval.score.is_boolean metadata via normalizeBooleanScore()
  • Apply the same normalizeBooleanScore() call when building scoresSummary so both parent and child spans produce consistent numeric scores
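The fix can be pictured with a minimal, hypothetical TypeScript sketch. This is not the code from `onlineEval.ts`: the `ScorerResult` and `NormalizedScore` shapes, the `normalizeBooleanScoreSketch` and `buildScoresSummary` names, and storing the flag under an `eval.score.is_boolean` metadata key are all assumptions. It maps `true`/`false` to `1`/`0`, tags boolean scores, passes numeric scores through, and leaves metadata off entries that have none, so the parent summary matches what the child scorer spans record.

```typescript
// Hypothetical shapes; the real axiom types may differ.
type ScoreValue = number | boolean | null;

interface ScorerResult {
  score: ScoreValue;
  metadata?: Record<string, unknown>;
}

interface NormalizedScore {
  score: number | null;
  metadata?: Record<string, unknown>;
}

// Sketch of a normalizeBooleanScore-style helper (assumed behavior):
// booleans become 1/0 and are tagged with an is_boolean flag;
// numeric and null scores pass through unchanged.
function normalizeBooleanScoreSketch(result: ScorerResult): NormalizedScore {
  if (typeof result.score === "boolean") {
    return {
      score: result.score ? 1 : 0,
      metadata: { ...result.metadata, "eval.score.is_boolean": true },
    };
  }
  return { score: result.score, metadata: result.metadata };
}

// Build the parent span's summary with the same helper the child spans use,
// attaching metadata only when it is non-empty.
function buildScoresSummary(
  results: Record<string, ScorerResult>,
): Record<string, NormalizedScore> {
  const summary: Record<string, NormalizedScore> = {};
  for (const [name, result] of Object.entries(results)) {
    const { score, metadata } = normalizeBooleanScoreSketch(result);
    summary[name] =
      metadata && Object.keys(metadata).length > 0 ? { score, metadata } : { score };
  }
  return summary;
}

const summary = buildScoresSummary({
  exactMatch: { score: true },
  relevance: { score: 0.82 },
});
console.log(JSON.stringify(summary));
// → {"exactMatch":{"score":1,"metadata":{"eval.score.is_boolean":true}},"relevance":{"score":0.82}}
```

Routing both the parent summary and the child spans through one helper is what keeps the two views from drifting apart again.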

Note

Low Risk
Small telemetry-only change that affects how scores are serialized into span attributes; low risk aside from potential downstream expectations of boolean values.

Overview
Ensures onlineEval() writes consistent numeric scores into the parent eval span’s eval.case.scores summary by normalizing boolean score values (true/false → 1/0) and propagating the corresponding eval.score.is_boolean metadata.

This updates onlineEval.ts to call normalizeBooleanScore() while building scoresSummary, and attaches the normalized metadata only when it is non-empty.

Written by Cursor Bugbot for commit bfa6ce7.
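The risk note mentions downstream consumers that may still expect boolean values. A minimal, hypothetical sketch of how such a consumer could recover the boolean from a normalized entry, assuming each entry has a numeric `score` and a `metadata` object carrying an `eval.score.is_boolean` flag (the `SummaryEntry` type and `readScore` name are illustrative, not part of the axiom package):

```typescript
// Hypothetical shape of one entry in eval.case.scores after this change.
interface SummaryEntry {
  score: number | null;
  metadata?: Record<string, unknown>;
}

// Recover the original boolean for consumers that still expect true/false.
// Entries without the is_boolean flag (or with a null score) are returned as-is.
function readScore(entry: SummaryEntry): number | boolean | null {
  if (entry.metadata?.["eval.score.is_boolean"] === true && entry.score !== null) {
    return entry.score === 1;
  }
  return entry.score;
}

console.log(readScore({ score: 1, metadata: { "eval.score.is_boolean": true } })); // true
console.log(readScore({ score: 0.5 })); // 0.5
```

Keying the conversion on the metadata flag, rather than on the value being 0 or 1, avoids misreading a genuine numeric score of 1 as a boolean.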

When a scorer returned `{ score: true }` or `{ score: false }`, the
parent eval span's `eval.case.scores` attribute contained raw booleans
instead of normalized numeric values with `eval.score.is_boolean`
metadata. This was inconsistent with the individual scorer child spans,
which already called `normalizeBooleanScore()` via `executor.ts`.

Apply the same normalization when building `scoresSummary` so both
the parent eval span and child scorer spans produce consistent
numeric scores.
@lukasmalkmus self-assigned this Feb 24, 2026

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

@pkg-pr-new bot commented Feb 24, 2026

npm i https://pkg.pr.new/axiomhq/ai/axiom@263

commit: bfa6ce7

@lukasmalkmus enabled auto-merge (squash) February 24, 2026 14:18
@lukasmalkmus merged commit ff75842 into main Feb 25, 2026
11 checks passed
@lukasmalkmus deleted the lukasmalkmus/yoxwysopstxv branch February 25, 2026 13:32
lukasmalkmus pushed a commit that referenced this pull request Feb 25, 2026
🤖 I have created a release *beep* *boop*
---


## [0.46.1](axiom-v0.46.0...axiom-v0.46.1) (2026-02-25)


### Bug Fixes

* **ai:** move online eval scorer counters to eval.* namespace ([#264](#264)) ([bef94db](bef94db))
* **ai:** normalize boolean scores in onlineEval scoresSummary ([#263](#263)) ([ff75842](ff75842))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Release metadata/changelog-only changes with no functional code modifications in this PR.
> 
> **Overview**
> Publishes `packages/ai` version `0.46.1` by updating the release manifest, `package.json` version, and `CHANGELOG.md`.
> 
> The changelog for `0.46.1` notes two bug fixes: moving online eval scorer counters into the `eval.*` namespace and normalizing boolean scores in `onlineEval` `scoresSummary`.
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 90b0bd1. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->