
Fix NaN in Results Overview when eval metric score is not a number #179

Draft: Copilot wants to merge 2 commits into main

@Copilot Copilot AI commented Jun 20, 2025

When eval metrics return non-numeric scores (like "❓" for unknown results), the Results Overview table incorrectly shows "NaN" instead of the appropriate fallback value.

Problem

The issue occurs in the computeOverview function when:

  1. An eval metric returns a non-numeric response (e.g., "❓", "Unknown", etc.)
  2. The parseScore function returns undefined for these cases
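  3. The scorer detection logic ms.some((m) => !isNaN(m.score)) marks the whole group of metrics as numeric as soon as any single score passes the check, even when other scores in the group are undefined
  4. The average calculation ms.reduce((total, m) => total + m.score, 0) / ms.length then adds those undefined values to the running total, and a number plus undefined is NaN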
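
The failure mode above can be reproduced in isolation. This is a minimal sketch, not the project's code: the Metric shape is an assumption (only the score field is named in this description), and the casts exist only to get the original expressions past the type checker so the runtime behavior is visible.

```typescript
// Hypothetical shape; only `score` is taken from the PR description.
interface Metric {
    score: number | undefined
}

// A mixed group: two numeric scores and one that failed to parse.
const ms: Metric[] = [{ score: 100 }, { score: undefined }, { score: 50 }]

// Original detection: true as soon as one score is a real number.
const looksNumeric = ms.some((m) => !isNaN(m.score as number))

// Original average: 100 + undefined is NaN, and NaN propagates.
const average =
    ms.reduce((total, m) => total + (m.score as number), 0) / ms.length

console.log(looksNumeric) // true
console.log(average) // NaN
```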

Example

Before fix:

model | tests | tests positive | accuracy with eval
-- | -- | -- | --
qwen2.5:3b | 20 | 20 | 100
llama3.2:1b | 20 | 20 | NaN  ← Problem: should show count of OK results

After fix:

model | tests | tests positive | accuracy with eval
-- | -- | -- | --
qwen2.5:3b | 20 | 20 | 100
llama3.2:1b | 20 | 20 | 15   ← Fixed: shows count of OK results

Solution

Updated the scorer detection and calculation logic to:

  • Only consider metrics with valid numeric scores: typeof m.score === "number" && !isNaN(m.score)
  • Calculate averages using only the filtered numeric scores
  • Fall back to counting "ok" outcomes when no numeric scores are available
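
A minimal sketch of the three rules above, under stated assumptions: the function name summarize and the outcome field are hypothetical and chosen for illustration; the actual change lives in computeOverview and may be structured differently.

```typescript
// Hypothetical shape; `outcome` is an assumed field name for the "ok" marker.
interface Metric {
    score?: number
    outcome?: string
}

// Returns the average of the valid numeric scores, or the count of
// "ok" outcomes when no metric carries a usable number.
function summarize(ms: Metric[]): number {
    // Keep only scores that are real numbers (rejects undefined and NaN).
    const scores = ms
        .map((m) => m.score)
        .filter((s): s is number => typeof s === "number" && !isNaN(s))

    if (scores.length > 0) {
        // Average over the filtered scores only, not over ms.length.
        return scores.reduce((total, s) => total + s, 0) / scores.length
    }

    // Fallback: count metrics whose outcome is "ok".
    return ms.filter((m) => m.outcome === "ok").length
}
```

Dividing by scores.length rather than ms.length matters in the mixed case: otherwise the unparsed entries would still drag the average down even though they no longer produce NaN.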

Testing

Verified the fix handles all scenarios correctly:

  • ✅ All numeric scores: No regression, averages calculated correctly
  • ✅ Mixed numeric/non-numeric: NaN issue resolved, averages only numeric scores
  • ✅ All non-numeric scores: Shows count of successful outcomes
  • ✅ Empty metrics: Handles gracefully
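
The four scenarios can be written down as a small table of inputs and expected values. This is an illustrative sketch only: overviewValue is a hypothetical, compact restatement of the rule from the Solution section, the outcome field is an assumed name, and the sample data is invented for the example rather than taken from the project's test runs.

```typescript
type Metric = { score?: number; outcome?: string }

// Compact stand-in for the fixed logic (assumed shape, not the real code).
const overviewValue = (ms: Metric[]): number => {
    const nums = ms
        .map((m) => m.score)
        .filter((s): s is number => typeof s === "number" && !isNaN(s))
    return nums.length > 0
        ? nums.reduce((a, b) => a + b, 0) / nums.length
        : ms.filter((m) => m.outcome === "ok").length
}

const scenarios: { name: string; ms: Metric[]; expected: number }[] = [
    { name: "all numeric", ms: [{ score: 80 }, { score: 100 }], expected: 90 },
    { name: "mixed", ms: [{ score: 80 }, { outcome: "ok" }], expected: 80 },
    {
        name: "all non-numeric",
        ms: [{ outcome: "ok" }, { outcome: "err" }],
        expected: 1,
    },
    { name: "empty", ms: [], expected: 0 },
]

for (const { name, ms, expected } of scenarios) {
    const actual = overviewValue(ms)
    const status = actual === expected ? "ok" : "MISMATCH"
    console.log(`${name}: ${actual} (${status})`)
}
```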

Fixes #174.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cdn.sheetjs.com
    • Triggering command: npm install (dns block)

If you need me to access, download, or install something from one of these locations, you can either:



Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] Results Overview shows NaN when eval metric score is not a number" to "Fix NaN in Results Overview when eval metric score is not a number" Jun 20, 2025
@Copilot Copilot AI requested a review from pelikhan June 20, 2025 20:37
Copilot finished work on behalf of pelikhan June 20, 2025 20:37
@pelikhan pelikhan requested a review from bzorn June 20, 2025 20:42