Feat: Row-level eval details by JeanKaddour · Pull Request #35 · PySpur-Dev/pyspur

JeanKaddour · 2024-12-02T18:08:50Z

This pull request adds row-level evaluation details. It modifies the backend evaluation logic, updates frontend components, and adds a dedicated results page.

Backend Changes:

Evaluation Logic Updates:
- Replaced task_id with example_id in async def evaluate_dataset_batch, since each item being evaluated is a dataset example rather than a task.
- Added full_prompt to the loop in async def evaluate_dataset_batch so that the full prompt is included in the evaluation.
- Updated response storage and logging to key on example_id instead of task_id.
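
For readers without the diff open, the shape of the backend change can be sketched as follows. This is a minimal illustration and not the code in this PR: the function name evaluate_dataset_batch_sketch, the build_prompt and call_model callables, and the dictionary layout are all assumptions made for the example. Only the ideas come from the description above: iterate over examples, carry full_prompt through the loop, and key stored responses by example_id.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def evaluate_dataset_batch_sketch(
    examples: List[Dict[str, Any]],
    build_prompt: Callable[[Dict[str, Any]], str],
    call_model: Callable[[str], Awaitable[str]],
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical stand-in for evaluate_dataset_batch (not the PySpur code)."""
    responses: Dict[str, Dict[str, Any]] = {}

    async def run_one(example: Dict[str, Any]) -> None:
        # Key everything on example_id (previously task_id in the PR's terms).
        example_id = example["example_id"]
        # Keep the full prompt so it can be stored next to the prediction.
        full_prompt = build_prompt(example)
        prediction = await call_model(full_prompt)
        responses[example_id] = {
            "full_prompt": full_prompt,
            "prediction": prediction,
        }

    # Evaluate the whole batch concurrently.
    await asyncio.gather(*(run_one(ex) for ex in examples))
    return responses


async def _fake_model(prompt: str) -> str:
    # Assumed placeholder for a real model call.
    return f"answer to [{prompt}]"


def demo() -> Dict[str, Dict[str, Any]]:
    examples = [
        {"example_id": "ex-0", "problem": "2 + 2"},
        {"example_id": "ex-1", "problem": "3 * 3"},
    ]
    return asyncio.run(
        evaluate_dataset_batch_sketch(
            examples,
            build_prompt=lambda ex: f"Q: {ex['problem']}",
            call_model=_fake_model,
        )
    )
```

Keying the stored responses by example_id is what lets a later step join each prediction back to its dataset row.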

Frontend Changes:

Component Cleanup:
- Removed unused imports from Header.jsx.

Evaluation Page Updates:
- Simplified the handleViewResults function in evals.jsx to navigate to a dedicated results page instead of opening a modal.
- Removed the modal that displayed evaluation results from evals.jsx.

New Results Page:
- Added a new results page (frontend/src/pages/evals/[id].js) that displays evaluation results in a table, with columns for example ID, problem, predicted answer, ground truth, and correctness status.
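
The row-level data the new table needs can be sketched on the backend side like this. It is an illustration under stated assumptions, not the PR's implementation: the EvalRow dataclass, its field names, and the exact-match correctness rule are hypothetical, chosen only to mirror the five columns listed above. The actual scoring logic in PySpur may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class EvalRow:
    """One table row; field names are assumptions mirroring the listed columns."""
    example_id: str
    problem: str
    predicted_answer: str
    ground_truth: str
    is_correct: bool


def build_rows(records: List[Dict[str, Any]]) -> List[EvalRow]:
    """Turn raw per-example records into rows with a correctness flag."""
    rows: List[EvalRow] = []
    for record in records:
        predicted = str(record["predicted_answer"]).strip()
        truth = str(record["ground_truth"]).strip()
        rows.append(
            EvalRow(
                example_id=str(record["example_id"]),
                problem=str(record["problem"]),
                predicted_answer=predicted,
                ground_truth=truth,
                # Assumed rule: exact string match after trimming whitespace.
                is_correct=predicted == truth,
            )
        )
    return rows


def accuracy(rows: List[EvalRow]) -> float:
    """Fraction of correct rows; 0.0 for an empty list."""
    return sum(row.is_correct for row in rows) / len(rows) if rows else 0.0


def rows_as_json_ready(rows: List[EvalRow]) -> List[Dict[str, Any]]:
    """Plain dicts that a results page could fetch and render as a table."""
    return [asdict(row) for row in rows]
```

A frontend table can then render one line per dict and color the status cell from is_correct.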

JeanKaddour added 2 commits December 2, 2024 17:43

feat: initial table page

b950258

feat: add colored status

aa10a83

JeanKaddour merged commit 20b3a12 into main Dec 2, 2024

JeanKaddour deleted the feat/detailed_evals branch December 28, 2024 01:53

JeanKaddour merged 2 commits into main from feat/detailed_evals
