
[evals] Fix tool calls score rendering #1613

Merged: dgageot merged 1 commit into docker:main from dgageot:improve-evals-4 (Feb 5, 2026)

Conversation


@dgageot (Member) commented on Feb 5, 2026:

No description provided.

Signed-off-by: David Gageot <david.gageot@docker.com>
@dgageot requested a review from a team as a code owner on February 5, 2026 at 14:12
@dgageot changed the title from "Fix tool calls score rendering" to "[evals] Fix tool calls score rendering" on Feb 5, 2026
@github-actions bot (Contributor) left a comment:


Review Summary

Found 1 issue: precision loss in the Relevance metric display. Casting from float64 to int truncates fractional values, which is inconsistent with how these values are stored and accumulated.

printMetric(out, "Sizes", summary.SizesPassed, summary.SizesTotal)
printMetricFloat(out, "Tool Calls", summary.ToolsPassed, summary.ToolsTotal)
printF1Score(out, "Tool Calls", summary.ToolsF1Sum, summary.ToolsCount)
printMetric(out, "Handoffs", summary.HandoffsPassed, summary.HandoffsTotal)
@github-actions bot (Contributor) commented on the diff:

Loss of precision when casting Relevance scores from float64 to int

The RelevancePassed and RelevanceTotal fields in the Summary struct are defined as float64 (types.go lines 99-100) and can contain fractional values from accumulated relevance scores. Casting them to int here truncates decimal portions, losing precision.

This is inconsistent with:

  • How the Tool Calls metric preserves float precision (line 120 uses printF1Score)
  • How the underlying values are stored (float64 in the Summary struct)
  • How relevance scores are accumulated (can be fractional)

For example, a RelevancePassed value of 8.5 would display as 8, making the displayed metric inconsistent with the actual accumulated scores.

Suggestion: If you want to maintain float precision for Relevance like Tool Calls, consider creating a similar display function or using the previous printMetricFloat approach. If integer display is intentional, consider documenting why relevance is treated differently than other float-based metrics.
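To make the truncation concrete, here is a self-contained sketch; the printMetric and printMetricFloat bodies below are assumed shapes inferred from their call sites, not the actual evals code:

```go
// Illustration of the truncation the review describes. Both helper bodies
// are hypothetical; only their names and the float64 fields come from the PR.
package main

import (
	"fmt"
	"io"
	"os"
)

// printMetric renders integer pass counts; the int cast at its call site is
// where fractional relevance scores lose their decimal portion.
func printMetric(out io.Writer, name string, passed, total int) {
	fmt.Fprintf(out, "%s: %d/%d\n", name, passed, total)
}

// printMetricFloat keeps fractional accumulated scores intact.
func printMetricFloat(out io.Writer, name string, passed, total float64) {
	fmt.Fprintf(out, "%s: %.1f/%.1f\n", name, passed, total)
}

func main() {
	relevancePassed, relevanceTotal := 8.5, 10.0

	// The pattern the review flags: int(8.5) truncates to 8.
	printMetric(os.Stdout, "Relevance", int(relevancePassed), int(relevanceTotal)) // Relevance: 8/10

	// A float-preserving alternative, as the review suggests.
	printMetricFloat(os.Stdout, "Relevance", relevancePassed, relevanceTotal) // Relevance: 8.5/10.0
}
```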

dgageot merged commit 69fa161 into docker:main on Feb 5, 2026
8 checks passed