@miguelg719 miguelg719 commented Sep 23, 2025

why

To help make sense of eval test cases and results

what changed

Added metadata to eval runs and cleaned up deprecated code

test plan

changeset-bot bot commented Sep 23, 2025

⚠️ No Changeset found

Latest commit: 904bc53

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

@miguelg719 miguelg719 changed the base branch from agent-evals-tracing to main September 23, 2025 00:27
@miguelg719 miguelg719 marked this pull request as ready for review September 23, 2025 00:28

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR enhances the evaluation system with comprehensive error handling and metadata improvements. The changes introduce a structured ErrorType enum for better error categorization, expand the EvalOutput interface with additional metadata fields, and improve task loading with better error reporting. The PR also adds process-level error handlers and enhances resource cleanup with safer stagehand closing logic.
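As a rough illustration of the error categorization described above, the sketch below maps a thrown value to a category and keeps its message and stack trace. The enum members, the `ErrorSummary` shape, and the `classifyError` helper are all hypothetical; the PR's actual `ErrorType` and `EvalOutput` definitions are not shown in this conversation and may differ.

```typescript
// Hypothetical categories; the real ErrorType enum in the PR may use other members.
enum ErrorType {
  TIMEOUT = "timeout",
  NETWORK = "network",
  PARSING = "parsing",
  UNKNOWN = "unknown",
}

// Hypothetical shape for the error details attached to an eval result.
interface ErrorSummary {
  type: ErrorType;
  message: string;
  stack?: string;
}

// Classify any thrown value by inspecting its message (assumed heuristic).
function classifyError(err: unknown): ErrorSummary {
  // Non-Error values (strings, objects) are wrapped so a message always exists.
  const error = err instanceof Error ? err : new Error(String(err));
  const text = error.message.toLowerCase();

  let type: ErrorType = ErrorType.UNKNOWN;
  if (text.indexOf("timeout") !== -1 || text.indexOf("timed out") !== -1) {
    type = ErrorType.TIMEOUT;
  } else if (text.indexOf("network") !== -1 || text.indexOf("econnrefused") !== -1) {
    type = ErrorType.NETWORK;
  } else if (error instanceof SyntaxError || text.indexOf("parse") !== -1) {
    type = ErrorType.PARSING;
  }

  return { type, message: error.message, stack: error.stack };
}
```

Matching on message text is a simplification chosen for brevity; a real implementation could instead check error classes or error codes where the underlying libraries provide them.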

Key improvements:

  • Enhanced error categorization with specific error types (timeout, network, parsing, etc.)
  • Better task loading logic with fallback paths for .js and .ts files
  • Improved metadata tracking with category information in testcases
  • Safer resource cleanup with proper stagehand instance management
  • More comprehensive error reporting with stack traces and error details
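The task loading fallback mentioned in the list above could look roughly like the following sketch. It is not the code from `evals/index.eval.ts`: the `resolveTaskModule` name, the directory layout, and the choice to try `.js` before `.ts` are assumptions. The point is only that a fixed candidate order keeps resolution deterministic when both files exist, and that a failed lookup reports every path it tried.

```typescript
// Predicate that reports whether a path exists; injected so the sketch
// stays independent of any particular file system API.
type Exists = (path: string) => boolean;

// Resolve a task module by trying a fixed list of extensions in order.
// Assumption: ".js" is preferred over ".ts"; the PR may use a different order.
function resolveTaskModule(baseDir: string, taskName: string, exists: Exists): string {
  const extensions = [".js", ".ts"];
  const tried: string[] = [];

  for (let i = 0; i < extensions.length; i++) {
    const candidate = baseDir + "/" + taskName + extensions[i];
    if (exists(candidate)) {
      return candidate; // the first match wins, so the result is deterministic
    }
    tried.push(candidate);
  }

  // Report every attempted path to make a missing task easy to diagnose.
  throw new Error('Task "' + taskName + '" not found. Tried: ' + tried.join(", "));
}
```

In a Node.js environment the predicate would typically be `existsSync` from `node:fs`; passing it in as a parameter also makes the lookup easy to test with an in-memory list of paths.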

Confidence score: 4/5

  • This PR is generally safe to merge with proper testing
  • Score reflects solid improvements to error handling and metadata tracking, with only one minor issue around deterministic file loading that should be addressed
  • evals/index.eval.ts needs attention for the task loading logic

2 files reviewed, 1 comment

@miguelg719 miguelg719 merged commit f89b13e into main Sep 23, 2025
15 checks passed