Pass LLM as a judge on the full transcript #1605

Merged
dgageot merged 1 commit into docker:main from dgageot:judge-full-transcript
Feb 5, 2026

Conversation

dgageot (Member) commented Feb 5, 2026

No description provided.

Signed-off-by: David Gageot <david.gageot@docker.com>
dgageot requested a review from a team as a code owner February 5, 2026 09:04

github-actions bot (Contributor) left a comment

Code Review Summary

I reviewed the code changes in this PR and found the implementation to be generally well-structured with comprehensive test coverage. The buildTranscript function correctly preserves temporal ordering of events for LLM evaluation.

However, I identified one issue with defensive programming consistency that should be addressed.

func getToolCallInfo(event map[string]any) (name, args string) {
    tc, _ := event["tool_call"].(map[string]any)
    fn, _ := tc["function"].(map[string]any)
    name, _ = fn["name"].(string)

Unsafe type assertions that silently fail on malformed events

The getToolCallInfo function uses two-value type assertions but discards the ok result with the blank identifier _, so it silently returns empty strings if the event structure is malformed:

tc, _ := event["tool_call"].(map[string]any)
fn, _ := tc["function"].(map[string]any)

While Go handles nil map access safely (won't panic), this produces confusing output like [Agent root calls tool "" with arguments: ] when events are malformed, instead of indicating a data problem.
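The silent failure described above can be reproduced with a short sketch. The function name getToolCallInfoUnchecked and the "arguments" key are assumptions made for illustration (the quoted snippet is cut off before the arguments are read), and the format string is modeled on the output quoted in this comment rather than taken from the PR.

```go
package main

import "fmt"

// getToolCallInfoUnchecked mirrors the pattern quoted in the review: every
// type assertion discards its ok result. The "arguments" key is an
// assumption; the snippet in the PR is truncated before that line.
func getToolCallInfoUnchecked(event map[string]any) (name, args string) {
	tc, _ := event["tool_call"].(map[string]any) // tc is a nil map on failure
	fn, _ := tc["function"].(map[string]any)     // indexing a nil map is safe
	name, _ = fn["name"].(string)
	args, _ = fn["arguments"].(string)
	return name, args
}

func main() {
	// "tool_call" has the wrong type, so every later lookup yields a zero value.
	malformed := map[string]any{"tool_call": "not a map"}
	name, args := getToolCallInfoUnchecked(malformed)
	fmt.Printf("[Agent root calls tool %q with arguments: %s]\n", name, args)
	// prints: [Agent root calls tool "" with arguments: ]
}
```

Nothing panics and no error is reported, so the malformed event is indistinguishable from a tool call that has an empty name.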

This is inconsistent with the defensive programming style used in parseContainerEvents (lines 481-489), which checks the assertion results:

if tc, ok := event["tool_call"].(map[string]any); ok {
    if fn, ok := tc["function"].(map[string]any); ok {
        // ...
    }
}

Recommendation: Use the comma-ok idiom to check type assertions and either return empty strings explicitly when checks fail, or add a comment explaining that empty strings are the intended behavior for malformed events.
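One way the recommendation could look in practice is sketched below. This is not the code from the PR: toolCallInfo is a hypothetical variant that adds an ok result so callers can tell a malformed event from an empty tool name, and the "arguments" key is an assumption because the quoted function is truncated.

```go
package main

import "fmt"

// toolCallInfo is a hypothetical checked variant of getToolCallInfo.
// It reports ok=false when the event does not have the expected shape
// instead of silently returning empty strings.
func toolCallInfo(event map[string]any) (name, args string, ok bool) {
	tc, ok := event["tool_call"].(map[string]any)
	if !ok {
		return "", "", false
	}
	fn, ok := tc["function"].(map[string]any)
	if !ok {
		return "", "", false
	}
	name, ok = fn["name"].(string)
	if !ok {
		return "", "", false
	}
	// Assumption: arguments are optional, so a missing value is not an error.
	args, _ = fn["arguments"].(string)
	return name, args, true
}

func main() {
	event := map[string]any{
		"tool_call": map[string]any{
			"function": map[string]any{"name": "read_file", "arguments": `{"path":"a.txt"}`},
		},
	}
	if name, args, ok := toolCallInfo(event); ok {
		fmt.Printf("tool %q with arguments: %s\n", name, args)
	}
	if _, _, ok := toolCallInfo(map[string]any{"tool_call": 42}); !ok {
		fmt.Println("malformed tool_call event")
	}
}
```

Returning a boolean changes the function signature, so the lighter alternative named in the recommendation, keeping the signature and documenting that empty strings are intentional, may be preferable if callers do not need to distinguish the two cases.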

dgageot merged commit 89c666d into docker:main Feb 5, 2026
8 checks passed