Skip to content

Produce a session db for the evals#1583

Merged
dgageot merged 1 commit intodocker:mainfrom
dgageot:eval-sessions
Feb 4, 2026
Merged

Produce a session db for the evals#1583
dgageot merged 1 commit intodocker:mainfrom
dgageot:eval-sessions

Conversation

@dgageot
Copy link
Member

@dgageot dgageot commented Feb 4, 2026

Now, when we run cagent eval, it'll produce a full session store with all the sessions.
That makes it super easy to go back to an evaluation session and investigate what went right or wrong.

Signed-off-by: David Gageot <david.gageot@docker.com>
@dgageot dgageot requested a review from a team as a code owner February 4, 2026 12:35
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Found one issue with timestamp handling in the new SessionFromEvents function. All reconstructed messages use time.Now() which assigns the database creation time rather than preserving when events actually occurred. This reduces the utility of the evaluation session database for timing analysis.

The core functionality looks good otherwise - the database creation, session reconstruction, and error handling are all working correctly.

ToolDefinitions: currentToolDefinitions,
CreatedAt: time.Now().Format(time.RFC3339),
Model: currentModel,
Usage: currentUsage,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISSUE: Message timestamps use time.Now() instead of preserving event timing

The SessionFromEvents function reconstructs sessions from historical evaluation events, but uses time.Now() for message timestamps. This means:

  1. All messages get the timestamp of when the database was created, not when events actually occurred
  2. Messages that happened seconds/minutes apart during execution will appear simultaneous
  3. Temporal ordering and latency analysis is lost

This affects three locations:

  • Line 92 (this location): Assistant messages in flushAssistantMessage()
  • Line 161: Tool call response messages
  • Line 201: Error messages

Since runtime events don't contain timestamp information (verified in pkg/runtime/event.go), consider either:

  • Adding timestamps to runtime events, or
  • Using incrementing timestamps based on event sequence to preserve ordering, or
  • Using eval run start time + offset based on position

While this doesn't break functionality, it does reduce the debugging value of the eval session database for investigating timing and latency issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that our runtime events don't have a timing for now. I'll add this to my TODO

@dgageot dgageot merged commit 21cff5d into docker:main Feb 4, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants