
dev #138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open

wants to merge 132 commits into main

Conversation

pelikhan (Member) commented Jun 2, 2025

No description provided.

pelikhan and others added 30 commits June 2, 2025 23:45
* 🎛️ feat: switch evalModel to evalModelSet for test evaluation

Replaces evalModel with evalModelSet, allowing multiple evaluation models.
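
The diff itself is not shown here, so purely as a rough illustration: an `evalModelSet` option and its normalization might look like the sketch below. Every name in it (`PromptPexOptions`, `resolveEvalModels`) is an assumption, not PromptPex's actual API.

```ts
// Hypothetical sketch: broadening a single evalModel option into a set.
interface PromptPexOptions {
    // before: evalModel?: string
    evalModelSet?: string[]; // e.g. ["openai:gpt-4o", "ollama:phi3"]
}

// Accept either a separator-delimited string or an array and normalize
// it into a clean list of model ids.
function resolveEvalModels(raw: string | string[] | undefined): string[] {
    if (!raw) return [];
    const models = Array.isArray(raw) ? raw : raw.split(/[;,]/);
    return models.map((m) => m.trim()).filter((m) => m.length > 0);
}

console.log(resolveEvalModels("openai:gpt-4o, ollama:phi3"));
// -> ["openai:gpt-4o", "ollama:phi3"]
```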

* ✨ feat: add multi-model evaluation to metrics and compliance

Support evaluating metrics and compliance with multiple models via evalModelSet.

* ✨ Refine evaluation model handling and debug logging

Improved evalModelSet defaults, header levels, and debugging output.

* ✨ Enhance evalModelSet sourcing and logging in promptpex

Now supports sourcing evalModelSet from env var, adds validation, and logging.
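
As a hedged sketch only (the variable name `PROMPTPEX_EVAL_MODELS` and the id format check are assumptions, not taken from this PR), env-var sourcing with validation and logging could look like:

```ts
// Hypothetical sketch of sourcing the model set from an environment
// variable, validating each id, and logging the result.
function evalModelsFromEnv(): string[] {
    const raw = process.env.PROMPTPEX_EVAL_MODELS ?? ""; // assumed name
    const models = raw.split(/[;,]/).map((m) => m.trim()).filter(Boolean);
    for (const m of models)
        if (!/^[\w.-]+:[\w.:-]+$/.test(m))
            throw new Error(`invalid eval model id: ${m}`);
    console.debug(`evalModelSet from env: ${models.join(", ") || "(none)"}`);
    return models;
}
```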

* ✨ refactor test metric evaluation and overview model handling

Refined evalModelSet parsing and updated test metric iteration logic.

* ✨ feat: Add combined avg metric across eval models

Compute and store average metric score for all evaluation models used.
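
A minimal sketch of the combined-average idea, assuming scores are kept per model; the `MetricScores` shape is hypothetical:

```ts
// Hypothetical: one synthetic metric that averages a metric's score
// across every eval model that produced one.
type MetricScores = Record<string /* model id */, number>;

function combinedAverage(scores: MetricScores): number {
    const values = Object.values(scores);
    if (values.length === 0) return NaN;
    return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const compliance: MetricScores = { "gpt-4o": 0.9, phi3: 0.7 };
console.log(combinedAverage(compliance)); // 0.8
```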

* ✨ Enhance promptpex test evaluation and script logic

Added separate eval-only/test-run modes and improved metric evaluations.

* ♻️ Rename evalModelSet to evalModel throughout codebase (#141)

Standardizes config and variable naming from evalModelSet to evalModel.

* ✨ Enhance test results saving and eval metrics workflow

Improved control of results file writing and evaluation metrics assignment.

* ✨ Add evals config flag to control evaluation execution

Introduces evals boolean for toggling evaluation of test results.

* ✨ Enable direct context-loading from JSON files

Refactored CLI to load PromptPexContext from JSON, updating file flow.
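
A sketch of what loading a saved `PromptPexContext` from JSON might involve so a run can resume at a later stage; the fields shown are illustrative guesses, not the real interface:

```ts
import { readFile } from "node:fs/promises";

// Hypothetical context shape; the real PromptPexContext has more fields.
interface PromptPexContext {
    prompt?: string;
    tests?: unknown[];
    [key: string]: unknown;
}

async function loadContextFromJson(path: string): Promise<PromptPexContext> {
    const text = await readFile(path, "utf8");
    const ctx = JSON.parse(text) as PromptPexContext;
    if (typeof ctx !== "object" || ctx === null)
        throw new Error(`${path} does not contain a JSON object`);
    return ctx;
}
```
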
* ✨ Add scripts and logic for multi-stage sample evaluations

Introduces zx scripts for gen/run/eval sample tests and conditional test execution.

* 🔀 rename: Sample scripts renamed to .zx.mjs extensions

All run-samples-*.mjs scripts updated to .zx.mjs for zx compatibility.

* ♻️ refactor: Rename sample scripts to .zx.mjs extensions

Updated script names in package.json and renamed a sample file for zx compatibility.

* ✨ Add support for groundtruth model and outputs

Introduces groundtruth model option, result tracking, and output storage.
Extended PromptPexTest and PromptPexTestResult with groundtruth support.
Added lmstudio to settings, expanded UI model suggestions, tidied runTests.
pelikhan and others added 30 commits June 11, 2025 13:18
* ✨ Enhance groundtruth evaluation with multiple models

Added support for evaluating groundtruth with multiple eval models.

* Update src/genaisrc/src/promptpex.mts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* ✨ Enable multiple eval groundtruth models and results merging

Add support for evaluating with multiple groundtruth models and merging results.
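
A sketch of the merging step under assumed shapes; `GroundtruthEval` and the last-write-wins policy are hypothetical, not taken from the diff:

```ts
// Hypothetical: fold per-model groundtruth evaluations into one record
// keyed by model id.
interface GroundtruthEval {
    model: string;
    score: number;
}

function mergeGroundtruthEvals(evals: GroundtruthEval[]): Record<string, number> {
    const merged: Record<string, number> = {};
    for (const e of evals) merged[e.model] = e.score; // last write per model wins
    return merged;
}
```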

* ✨ Update eval paths and enhance JSON context logging

Refined script paths in package.json and improved debug info for JSON context.

* ✨ Refine metrics reporting and output handling logic

Metric keys now include model names; output directories improved.
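
A sketch of the per-model metric key scheme this commit describes; the separator and sanitization are assumptions:

```ts
// Hypothetical: qualify a metric key with the eval model so scores from
// different models don't collide.
function metricKey(metric: string, model: string): string {
    return `${metric}_${model.replace(/[^\w.-]+/g, "-")}`;
}

console.log(metricKey("compliance", "openai:gpt-4o")); // compliance_openai-gpt-4o
```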

* ✨ Display groundtruth eval results in output table

Show filtered groundtruth eval results in the test output section.

* 🔥 refactor: Removed grounding fields from PromptPexTestResult

Eliminated isGrounded and groundedText fields for streamlined interface.

---------

Co-authored-by: Peli de Halleux <pelikhan@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* ✨ feat: Enable separate groundtruth metric evaluation path

Adds support for filtering and evaluating groundtruth metrics via new prompt.

* ✨ feat: Add groundtruthScore based on evalModels averages

Groundtruth metrics are now computed and averaged per test result.

* ✨ Debug and improve groundtruth metrics computation logic

Log groundtruth metrics, fix average scoring, and enhance debug output.

* ✨ refactor groundtruth evaluation and add retry logic

Extracted groundtruth scoring to a helper and added retries for low scores.
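
A sketch tying this retry helper to the threshold constants introduced in the next commit; the constant names and values, and the `scoreGroundtruth` callback, are assumptions:

```ts
// Hypothetical constants (the real ones live in constants.mts).
const GROUNDTRUTH_SCORE_THRESHOLD = 0.5;
const GROUNDTRUTH_MAX_RETRIES = 3;

// Re-score until the threshold is cleared or retries run out; keep the best.
async function scoreWithRetries(
    scoreGroundtruth: () => Promise<number>
): Promise<number> {
    let best = -Infinity;
    for (let attempt = 0; attempt <= GROUNDTRUTH_MAX_RETRIES; attempt++) {
        best = Math.max(best, await scoreGroundtruth());
        if (best >= GROUNDTRUTH_SCORE_THRESHOLD) break;
    }
    return best;
}
```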

* 🚦 feat: introduce configurable groundtruth thresholds

Added constants for groundtruth thresholds and retries in constants.mts. Updated testrun.mts to use these values, making groundtruth score evaluation and retry handling for tests configurable.

Introduces a detailed overview of PromptPex's test groundtruth flow.

Groundtruth scores are now tracked for tests, with improved debug output.