Eval metadata #1092

miguelg719 · 2025-09-23T00:26:34Z

why

To help make sense of eval test cases and results

what changed

Added metadata to eval runs and cleaned up deprecated code.

test plan

changeset-bot · 2025-09-23T00:26:38Z

⚠️ No Changeset found

Latest commit: 904bc53

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


greptile-apps

Greptile Summary

This PR enhances the evaluation system with comprehensive error handling and metadata improvements. The changes introduce a structured ErrorType enum for better error categorization, expand the EvalOutput interface with additional metadata fields, and improve task loading with better error reporting. The PR also adds process-level error handlers and enhances resource cleanup with safer stagehand closing logic.
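The "safer stagehand closing logic" mentioned above could look roughly like the sketch below. This is not the PR's code: `Closable` and `safeClose` are hypothetical names, and the only assumption is that the instance exposes an async `close()` method.

```typescript
// Hypothetical minimal shape of anything that can be closed (e.g. a browser session).
interface Closable {
  close(): Promise<void>;
}

// Close an instance without letting a cleanup failure mask the eval's own result.
// Returns true if there was nothing to close or the close succeeded, false otherwise.
async function safeClose(
  instance: Closable | null | undefined,
  label = "instance",
): Promise<boolean> {
  if (!instance) return true; // never initialized, nothing to clean up
  try {
    await instance.close();
    return true;
  } catch (err) {
    // Log and swallow: a failed close should not turn a finished eval into a crash.
    console.warn(`Failed to close ${label}:`, err);
    return false;
  }
}
```

Calling something like this from a `finally` block keeps cleanup errors out of the eval's reported outcome.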

Key improvements:

Enhanced error categorization with specific error types (timeout, network, parsing, etc.)
Better task loading logic with fallback paths for .js and .ts files
Improved metadata tracking with category information in testcases
Safer resource cleanup with proper stagehand instance management
More comprehensive error reporting with stack traces and error details
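As a rough illustration of the error categorization described above, here is a minimal sketch. The enum members, the `EvalErrorInfo` shape, and the message-matching rules are all assumptions made for illustration; the actual `ErrorType` values and `EvalOutput` fields in the PR are not shown in this thread.

```typescript
// Hypothetical categories; the summary only names "timeout, network, parsing, etc."
enum ErrorType {
  TIMEOUT = "timeout",
  NETWORK = "network",
  PARSING = "parsing",
  UNKNOWN = "unknown",
}

// Hypothetical structured error attached to an eval result.
interface EvalErrorInfo {
  type: ErrorType;
  message: string;
  stack?: string;
}

// Pick a category from the error's class and message text (a heuristic, not an exact rule).
function categorize(err: Error): ErrorType {
  const msg = err.message.toLowerCase();
  if (msg.includes("timeout") || msg.includes("timed out")) return ErrorType.TIMEOUT;
  if (msg.includes("econnrefused") || msg.includes("network")) return ErrorType.NETWORK;
  if (err instanceof SyntaxError || msg.includes("parse")) return ErrorType.PARSING;
  return ErrorType.UNKNOWN;
}

// Normalize any thrown value into a structured record with a stack trace when available.
function classifyError(err: unknown): EvalErrorInfo {
  const e = err instanceof Error ? err : new Error(String(err));
  return { type: categorize(e), message: e.message, stack: e.stack };
}
```

Keeping the category as a string-valued enum means it serializes cleanly into eval results and can be filtered on later.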

Confidence score: 4/5

This PR is generally safe to merge with proper testing.
Score reflects solid improvements to error handling and metadata tracking, with only one minor issue around deterministic file loading that should be addressed.
evals/index.eval.ts needs attention for the task loading logic.
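One way to make the `.js`/`.ts` fallback deterministic is to try candidates in a fixed, explicit order and report every path that was tried. The sketch below is an assumption about how that could be done, not the code in evals/index.eval.ts: `resolveTaskModule`, the `.js`-before-`.ts` order, and the injected `exists` check are all hypothetical.

```typescript
// Resolve a task file by checking candidates in a fixed order, so the same
// directory contents always yield the same module.
// `exists` is injected (e.g. fs.existsSync in Node) to keep the sketch self-contained.
function resolveTaskModule(
  taskName: string,
  baseDir: string,
  exists: (path: string) => boolean,
): string {
  // Assumed preference: compiled .js first, then the .ts source.
  const candidates = [`${baseDir}/${taskName}.js`, `${baseDir}/${taskName}.ts`];
  const found = candidates.find((candidate) => exists(candidate));
  if (!found) {
    // List every attempted path so a missing task is easy to diagnose.
    throw new Error(`Task "${taskName}" not found. Tried: ${candidates.join(", ")}`);
  }
  return found;
}
```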

2 files reviewed, 1 comment

eval metadata updates

f05c62d

miguelg719 changed the base branch from agent-evals-tracing to main September 23, 2025 00:27

prettier

4a2c144

miguelg719 marked this pull request as ready for review September 23, 2025 00:28

remove unnecessary filter

bb5af20

greptile-apps bot reviewed Sep 23, 2025

runners

904bc53

seanmcguire12 approved these changes Sep 23, 2025

miguelg719 merged commit f89b13e into main Sep 23, 2025
15 checks passed