dev #138

Open
wants to merge 157 commits into base: main

Conversation

pelikhan (Member) commented Jun 2, 2025

No description provided.

pelikhan and others added 30 commits June 2, 2025 23:45
* 🎛️ feat: switch evalModel to evalModelSet for test evaluation

Replaces evalModel with evalModelSet, allowing multiple evaluation models.

* ✨ feat: add multi-model evaluation to metrics and compliance

Support evaluating metrics and compliance with multiple models via evalModelSet.

* ✨ Refine evaluation model handling and debug logging

Improved evalModelSet defaults, header levels, and debugging output.

* ✨ Enhance evalModelSet sourcing and logging in promptpex

Now supports sourcing evalModelSet from env var, adds validation, and logging.

* ✨ refactor test metric evaluation and overview model handling

Refined evalModelSet parsing and updated test metric iteration logic.

* ✨ feat: Add combined avg metric across eval models

Compute and store average metric score for all evaluation models used.

* ✨ Enhance promptpex test evaluation and script logic

Added separate eval-only/test-run modes and improved metric evaluations.

* ♻️ Rename evalModelSet to evalModel throughout codebase

Standardizes config and variable naming from evalModelSet to evalModel.
…#141)
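
The "combined avg metric across eval models" commit above can be illustrated with a minimal sketch. The type `MetricScores` and the function `combinedAverage` are hypothetical names chosen for illustration and are not taken from the PromptPex source; the sketch assumes each evaluation model contributes at most one numeric score per metric.

```typescript
// Hypothetical shape: one optional score per evaluation model for a single metric.
type MetricScores = Record<string, number | undefined>;

// Average the scores that are actually present across all evaluation models.
// Returns undefined when no model produced a usable score.
function combinedAverage(scores: MetricScores): number | undefined {
    const values: number[] = [];
    for (const model of Object.keys(scores)) {
        const v = scores[model];
        // Skip models that did not report a score or reported NaN.
        if (typeof v === "number" && !isNaN(v)) values.push(v);
    }
    if (values.length === 0) return undefined;
    let sum = 0;
    for (const v of values) sum += v;
    return sum / values.length;
}
```

Skipping missing scores instead of counting them as zero is a design choice of this sketch; the actual implementation may treat them differently.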

* ✨ Enhance test results saving and eval metrics workflow

Improved control of results file writing and evaluation metrics assignment.

* ✨ Add evals config flag to control evaluation execution

Introduces evals boolean for toggling evaluation of test results.

* ✨: Enable direct context-loading from JSON files

Refactored CLI to load PromptPexContext from JSON, updating file flow.
* ✨ Add scripts and logic for multi-stage sample evaluations

Introduces zx scripts for gen/run/eval sample tests and conditional test executions.

* 🔀 rename: Samples scripts renamed to .zx.mjs extensions

All run-samples-*.mjs scripts updated to .zx.mjs for zx compatibility.

* ♻️ refactor: Rename sample scripts to .zx.mjs extensions

Updated script names in package.json and renamed a sample file for zx compatibility.
Introduces groundtruth model option, result tracking, and output storage.
Extended PromptPexTest and PromptPexTestResult with groundtruth support.
Add lmstudio to settings, expand UI model suggestions, tidy runTests.
✨ Add support for groundtruth model and outputs
pelikhan and others added 30 commits June 17, 2025 17:44
Groundtruth scores are now tracked for tests, with improved debug output.
* ♻️ clean: Auto-hide zero-filled columns, add llama3.3 to tests

Overview report now drops zero/empty cols; llama3.3 added to runtests.

* ✨ Refine promptpex groundtruth checks and report handling

Improve groundtruth test generation logic and overview reporting filters.
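
As an illustration of the "auto-hide zero-filled columns" change described above, the following is a minimal sketch of dropping columns whose values are all zero or empty. The `Row` type and `dropEmptyColumns` function are hypothetical and are not the overview report code from this pull request.

```typescript
// Hypothetical row shape for an overview table.
type Row = Record<string, string | number | undefined>;

// Keep only the columns that have at least one non-zero, non-empty value in some row.
function dropEmptyColumns(rows: Row[]): Row[] {
    // Collect every column name that appears in any row, in first-seen order.
    const seen: Record<string, boolean> = {};
    rows.forEach((row) => Object.keys(row).forEach((k) => (seen[k] = true)));

    const keep = Object.keys(seen).filter((k) =>
        rows.some((row) => {
            const v = row[k];
            return v !== undefined && v !== "" && v !== 0;
        })
    );

    // Rebuild each row with only the retained columns.
    return rows.map((row) => {
        const out: Row = {};
        keep.forEach((k) => (out[k] = row[k]));
        return out;
    });
}
```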
Expanded glossary and updated diagrams to standardize GTM terms.
* feat: add groundtruth option and related parameters for test generation

* feat: add model alias for groundtruth evaluation

* feat: add model_under_test alias and update related logic in prompt generation

* feat: update groundtruth model handling and rename constants for clarity

* Update src/genaisrc/src/testrun.mts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: add implementation plan documentation for PromptPex framework

* docs: enhance implementation plan with validation steps for test generation

* new plan

* docs: update implementation plan phases and add additional features
* ✨ Label tests with unique IDs and propagate testuid

Added unique testuid to each test and test result; updated logic to use it.

* ✨ add testuid to test run output and update indexing logic

Test run data now includes testuid; testuid index starts from 0.

* ✨: Unleash Unique IDs in PromptPex Tests with nanoid

Integrated nanoid for generating unique, consistent test UIDs.

* ✨ Fix testuid template and ensure strict equality in search

Corrected testuid generation format and used strict equality for lookup.

* Update src/genaisrc/src/testevalmetric.mts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
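
The testuid commits above describe labeling each test with a unique ID, indexing from 0, and looking results up with strict equality. A minimal sketch of that idea follows. The interfaces, the `test-<id>-<index>` format, and the injected `makeId` generator are all assumptions made for illustration; the commits state that nanoid is used, but the actual UID template is not shown in this conversation.

```typescript
// Hypothetical minimal shapes; the real PromptPexTest and PromptPexTestResult have more fields.
interface TestCase {
    testuid?: string;
    testinput: string;
}
interface TestResult {
    testuid: string;
    output: string;
}

// Assign a UID to every test. The index starts from 0, as the commit message states.
// makeId stands in for an ID generator such as nanoid; the template is an assumed format.
function assignTestUids(tests: TestCase[], makeId: () => string): void {
    tests.forEach((test, index) => {
        test.testuid = `test-${makeId()}-${index}`;
    });
}

// Find the result for a test using strict equality on testuid.
function findResult(results: TestResult[], testuid: string): TestResult | undefined {
    for (const r of results) {
        if (r.testuid === testuid) return r;
    }
    return undefined;
}
```

Passing the generator as a parameter keeps the sketch free of dependencies and makes the UIDs deterministic in tests.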
* ✨ add groundtruth test output support to promptpex

Introduce groundtruth test results file loading and parsing support.

* ✏️ fix typo in PromptPexContext groundtruth comment

Corrected 'Groudtruth' to 'Groundtruth' in the documentation comment.

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Peli de Halleux <pelikhan@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2 participants