Fix NaN in Results Overview when eval metric score is not a number #179

Copilot · 2025-06-20T20:27:06Z

When eval metrics return non-numeric scores (like "❓" for unknown results), the Results Overview table incorrectly shows "NaN" instead of the appropriate fallback value.

Problem

The issue occurs in the computeOverview function when:

- An eval metric returns a non-numeric response (e.g., "❓", "Unknown", etc.)
- The parseScore function returns undefined for these cases
- The scorer detection logic ms.some((m) => !isNaN(m.score)) treats the whole metric set as numeric as soon as any one score passes the check, even when other scores are undefined
- The average calculation ms.reduce((total, m) => total + m.score, 0) / ms.length then adds those undefined values into the total, resulting in NaN
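
The failure mode described above can be reproduced in isolation. This is a minimal sketch, not the code from reports.mts: the Metric shape, the outcome field, and parseScoreSketch are hypothetical stand-ins, and only the two expressions quoted in the list come from this PR.

```typescript
// Hypothetical metric shape; the real type in reports.mts may differ.
interface Metric {
  outcome: string;
  score?: number; // undefined when the raw response could not be parsed
}

// Hypothetical stand-in for parseScore: undefined for non-numeric input.
function parseScoreSketch(raw: string): number | undefined {
  const n = parseFloat(raw);
  return isNaN(n) ? undefined : n;
}

const ms: Metric[] = [
  { outcome: "ok", score: parseScoreSketch("100") },
  { outcome: "unknown", score: parseScoreSketch("❓") },
];

// Old detection: true as soon as a single score is numeric.
const looksNumeric = ms.some((m) => !isNaN(m.score as number));

// Old average: 100 + undefined evaluates to NaN, and NaN propagates
// through the rest of the reduction and the division.
const buggyAverage =
  ms.reduce((total, m) => total + (m.score as number), 0) / ms.length;

console.log(looksNumeric, buggyAverage);
```

The casts to number are only there to keep the sketch compiling under strict type checking while still exercising the same runtime behavior.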

Example

Before fix:

model | tests | tests positive | accuracy with eval
-- | -- | -- | --
qwen2.5:3b | 20 | 20 | 100
llama3.2:1b | 20 | 20 | NaN  ← Problem: should show count of OK results

After fix:

model | tests | tests positive | accuracy with eval
-- | -- | -- | --
qwen2.5:3b | 20 | 20 | 100
llama3.2:1b | 20 | 20 | 15   ← Fixed: shows count of OK results

Solution

Updated the scorer detection and calculation logic to:

- Only consider metrics with valid numeric scores: typeof m.score === "number" && !isNaN(m.score)
- Calculate averages using only the filtered numeric scores
- Fall back to counting "ok" outcomes when no numeric scores are available
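
The three steps above could look roughly like the following. This is a sketch under assumptions rather than the actual change in computeOverview: the Metric shape, the outcome field name, and the helper names isNumericScore and summarize are hypothetical, and only the typeof/isNaN check is taken from the description.

```typescript
// Hypothetical metric shape; the real type in reports.mts may differ.
interface Metric {
  outcome: string;
  score?: unknown; // parsed score, possibly undefined or non-numeric
}

// The check quoted in the PR description, written as a type guard.
function isNumericScore(score: unknown): score is number {
  return typeof score === "number" && !isNaN(score);
}

// Hypothetical helper: average of the numeric scores when any exist,
// otherwise the count of metrics whose outcome is "ok".
function summarize(ms: Metric[]): number {
  const scores = ms.map((m) => m.score).filter(isNumericScore);
  if (scores.length > 0) {
    return scores.reduce((total, s) => total + s, 0) / scores.length;
  }
  return ms.filter((m) => m.outcome === "ok").length;
}

const mixed = summarize([
  { outcome: "ok", score: 100 },
  { outcome: "ok", score: 50 },
  { outcome: "unknown", score: undefined },
]);

const nonNumeric = summarize([
  { outcome: "ok", score: "❓" },
  { outcome: "ok" },
  { outcome: "err" },
]);

const empty = summarize([]);

console.log(mixed, nonNumeric, empty);
```

In this sketch the mixed case averages only the two numeric scores, the all-non-numeric case falls back to the number of "ok" outcomes, and an empty list yields 0 instead of dividing by zero.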

Testing

Verified the fix handles all scenarios correctly:

✅ All numeric scores: No regression, averages calculated correctly
✅ Mixed numeric/non-numeric: NaN issue resolved, averages only numeric scores
✅ All non-numeric scores: Shows count of successful outcomes
✅ Empty metrics: Handles gracefully

Fixes #174.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

cdn.sheetjs.com
- Triggering command: npm install (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to my firewall allow list

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan · 2025-07-01T19:49:45Z

merge with latest 'dev' branch

- Merged dev branch containing Docker configurations, CI/CD improvements, and enhanced eval model support
- Resolved conflict in reports.mts by applying NaN fix to new multi-eval-model structure
- Preserved original NaN handling fix while adopting new evalModels functionality

Copilot · 2025-07-01T19:53:51Z

merge with latest 'dev' branch

Done! Successfully merged the latest dev branch (92 commits) while preserving the NaN fix. Resolved merge conflict in reports.mts by applying the original fix to the new multi-eval-model structure. The fix now works with both single and multiple eval models. Commit: 10bd4a6

The base branch was changed.

Initial plan for issue

438e004

Copilot AI assigned Copilot and pelikhan Jun 20, 2025

Copilot started work on behalf of pelikhan June 20, 2025 20:27

Fix NaN in Results Overview when eval metric score is not a number

2f4870b

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Results Overview shows NaN when eval metric score is not a number~~ Fix NaN in Results Overview when eval metric score is not a number Jun 20, 2025

Copilot AI requested a review from pelikhan June 20, 2025 20:37

Copilot finished work on behalf of pelikhan June 20, 2025 20:37

pelikhan approved these changes Jun 20, 2025

pelikhan requested a review from bzorn June 20, 2025 20:42

bzorn previously approved these changes Jun 20, 2025

Copilot started work on behalf of pelikhan July 1, 2025 19:49

pelikhan changed the base branch from main to dev July 1, 2025 19:49

Copilot AI requested a review from pelikhan July 1, 2025 19:54

Copilot finished work on behalf of pelikhan July 1, 2025 19:54

Base automatically changed from dev to main July 1, 2025 19:54

pelikhan closed this Jul 18, 2025
