Skip to content

Fix NaN in Results Overview when eval metric score is not a number #179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Copilot
Copy link

@Copilot Copilot AI commented Jun 20, 2025

When eval metrics return non-numeric scores (like "❓" for unknown results), the Results Overview table incorrectly shows "NaN" instead of the appropriate fallback value.

Problem

The issue occurs in the computeOverview function when:

  1. An eval metric returns a non-numeric response (e.g., "❓", "Unknown", etc.)
  2. The parseScore function returns undefined for these cases
  3. The scorer detection logic ms.some((m) => !isNaN(m.score)) incorrectly identifies these as numeric scorers
  4. The average calculation ms.reduce((total, m) => total + m.score, 0) / ms.length includes undefined values, resulting in NaN

Example

Before fix:

model | tests | tests positive | accuracy with eval
-- | -- | -- | --
qwen2.5:3b | 20 | 20 | 100
llama3.2:1b | 20 | 20 | NaN  ← Problem: should show count of OK results

After fix:

model | tests | tests positive | accuracy with eval
-- | -- | -- | --
qwen2.5:3b | 20 | 20 | 100
llama3.2:1b | 20 | 20 | 15   ← Fixed: shows count of OK results

Solution

Updated the scorer detection and calculation logic to:

  • Only consider metrics with valid numeric scores: typeof m.score === "number" && !isNaN(m.score)
  • Calculate averages using only the filtered numeric scores
  • Fall back to counting "ok" outcomes when no numeric scores are available

Testing

Verified the fix handles all scenarios correctly:

  • ✅ All numeric scores: No regression, averages calculated correctly
  • ✅ Mixed numeric/non-numeric: NaN issue resolved, averages only numeric scores
  • ✅ All non-numeric scores: Shows count of successful outcomes
  • ✅ Empty metrics: Handles gracefully

Fixes #174.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cdn.sheetjs.com
    • Triggering command: npm install (dns block)

If you need me to access, download, or install something from one of these locations, you can either:


💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] Results Overview shows NaN when eval metric score is not a number Fix NaN in Results Overview when eval metric score is not a number Jun 20, 2025
@Copilot Copilot AI requested a review from pelikhan June 20, 2025 20:37
Copilot finished work on behalf of pelikhan June 20, 2025 20:37
@pelikhan pelikhan requested a review from bzorn June 20, 2025 20:42
bzorn
bzorn previously approved these changes Jun 20, 2025
@pelikhan
Copy link
Member

pelikhan commented Jul 1, 2025

merge with latest 'dev' branch

@pelikhan pelikhan changed the base branch from main to dev July 1, 2025 19:49
- Merged dev branch containing Docker configurations, CI/CD improvements, and enhanced eval model support
- Resolved conflict in reports.mts by applying NaN fix to new multi-eval-model structure
- Preserved original NaN handling fix while adopting new evalModels functionality
Copy link
Author

Copilot AI commented Jul 1, 2025

merge with latest 'dev' branch

Done! Successfully merged the latest dev branch (92 commits) while preserving the NaN fix. Resolved merge conflict in reports.mts by applying the original fix to the new multi-eval-model structure. The fix now works with both single and multiple eval models. Commit: 10bd4a6

@Copilot Copilot AI requested a review from pelikhan July 1, 2025 19:54
Copilot finished work on behalf of pelikhan July 1, 2025 19:54
Base automatically changed from dev to main July 1, 2025 19:54
@pelikhan pelikhan dismissed bzorn’s stale review July 1, 2025 19:54

The base branch was changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Error loading sessions

Retrying...

Successfully merging this pull request may close these issues.

Results Overview shows NaN when eval metric score is not a number
3 participants