dev #138

pelikhan · 2025-06-02T23:46:11Z

No description provided.

* 🎛️ feat: switch evalModel to evalModelSet for test evaluation
  Replaces evalModel with evalModelSet, allowing multiple evaluation models.
* ✨ feat: add multi-model evaluation to metrics and compliance
  Support evaluating metrics and compliance with multiple models via evalModelSet.
* ✨ Refine evaluation model handling and debug logging
  Improved evalModelSet defaults, header levels, and debugging output.
* ✨ Enhance evalModelSet sourcing and logging in promptpex
  Now supports sourcing evalModelSet from env var, adds validation, and logging.
* ✨ refactor test metric evaluation and overview model handling
  Refined evalModelSet parsing and updated test metric iteration logic.
* ✨ feat: Add combined avg metric across eval models
  Compute and store average metric score for all evaluation models used.
* ✨ Enhance promptpex test evaluation and script logic
  Added separate eval-only/test-run modes, improved metric evaluations
* ♻️ Rename evalModelSet to evalModel throughout codebase
  Standardizes config and variable naming from evalModelSet to evalModel.
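The "combined avg metric across eval models" item could look roughly like the sketch below. It is not the PR's code: the `ScoresByModel` shape and the `combinedAverage` name are hypothetical, and it assumes each evaluation model yields one numeric score per test.

```typescript
// Hypothetical shape (not from the PR): one score per evaluation model,
// keyed by model name; a model that failed to score is left undefined.
type ScoresByModel = Record<string, number | undefined>;

// Average the scores that are present.
// Returns undefined when no evaluation model produced a usable score.
function combinedAverage(scores: ScoresByModel): number | undefined {
    const values: number[] = [];
    for (const model of Object.keys(scores)) {
        const value = scores[model];
        if (typeof value === "number" && !Number.isNaN(value)) values.push(value);
    }
    if (values.length === 0) return undefined;
    let sum = 0;
    for (const value of values) sum += value;
    return sum / values.length;
}
```

Skipping missing scores instead of counting them as zero keeps one failed evaluation model from dragging the combined value down; whether the PR makes the same choice is not stated in the commit message.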

…#141)

* ✨ Enhance test results saving and eval metrics workflow
  Improved control of results file writing and evaluation metrics assignment.
* ✨ Add evals config flag to control evaluation execution
  Introduces evals boolean for toggling evaluation of test results.
* ✨: Enable direct context-loading from JSON files
  Refactored CLI to load PromptPexContext from JSON, updating file flow.

* ✨ Add scripts and logic for multi-stage sample evaluations
  Introduces zx scripts for gen/run/eval sample tests and conditional test executions.
* 🔀 rename: Samples scripts renamed to .zx.mjs extensions
  All run-samples-*.mjs scripts updated to .zx.mjs for zx compatibility.
* ♻️ refactor: Rename sample scripts to .zx.mjs extensions
  Updated script names in package.json and renamed a sample file for zx compatibility

* ♻️ clean: Auto-hide zero-filled columns, add llama3.3 to tests
  Overview report now drops zero/empty cols; llama3.3 added to runtests.
* ✨ Refine promptpex groundtruth checks and report handling
  Improve groundtruth test generation logic and overview reporting filters.
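As an illustration of dropping zero/empty columns from an overview table, here is a minimal sketch. The `Row` type and `dropEmptyColumns` name are hypothetical and not taken from the PR; the sketch assumes a column counts as empty when every row holds `0`, `""`, or `undefined` for it.

```typescript
// Hypothetical row shape (not from the PR): column name -> cell value.
type Row = Record<string, string | number | undefined>;

// A cell is treated as empty when it is undefined, an empty string, or zero.
function isEmptyCell(value: string | number | undefined): boolean {
    return value === undefined || value === "" || value === 0;
}

// Keep only the columns that have at least one non-empty cell in some row.
function dropEmptyColumns(rows: Row[]): Row[] {
    const keep = new Set<string>();
    for (const row of rows)
        for (const key of Object.keys(row))
            if (!isEmptyCell(row[key])) keep.add(key);

    return rows.map((row) => {
        const filtered: Row = {};
        for (const key of Object.keys(row))
            if (keep.has(key)) filtered[key] = row[key];
        return filtered;
    });
}
```

A column survives if any single row has a value in it, so a zero score in one row is still shown when another row has a nonzero score in the same column.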

* feat: add groundtruth option and related parameters for test generation
* feat: add model alias for groundtruth evaluation
* feat: add model_under_test alias and update related logic in prompt generation
* feat: update groundtruth model handling and rename constants for clarity
* Update src/genaisrc/src/testrun.mts
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* ✨ Label tests with unique IDs and propagate testuid
  Added unique testuid to each test and test result; updated logic to use it.
* ✨ add testuid to test run output and update indexing logic
  Test run data now includes testuid; testuid index starts from 0.
* ✨: Unleash Unique IDs in PromptPex Tests with nanoid
  Integrated nanoid for generating unique, consistent test UIDs.
* ✨ Fix testuid template and ensure strict equality in search
  Corrected testuid generation format and used strict equality for lookup.
* Update src/genaisrc/src/testevalmetric.mts
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* ✨ add groundtruth test output support to promptpex
  Introduce groundtruth test results file loading and parsing support.
* ✏️ fix typo in PromptPexContext groundtruth comment
  Corrected 'Groudtruth' to 'Groundtruth' in the documentation comment.
* Apply suggestions from code review
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Peli de Halleux <pelikhan@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

pelikhan and others added 30 commits June 2, 2025 23:45

model names

e543499

removed pull

e5fad83

getting started on github models support

796b27c

passing test data

0057f5b

✨ Add support for groundtruth model and outputs

eb31c1d

Introduces groundtruth model option, result tracking, and output storage.

upgrade deps

44cf117

migrate to node v22

2152903

wiring up action

bb597eb

add files argument

dc5c4e4

✨ feat: add groundtruth fields to test data pipeline

7d5ab36

Extended PromptPexTest and PromptPexTestResult with groundtruth support.

use promt for noe

f425a16

define action

db6d3ea

Merge remote-tracking branch 'origin/main' into action

3e3d469

fid build

a60970b

fix test

9d4c99d

Merge pull request #147 from microsoft/action

3c4c7c8

Action

fix build

e0aedb2

Merge remote-tracking branch 'origin/dev' into add-ground-truth

3af89be

cleanup

0b4440b

✨ Enhance model suggestions and test config options

1ebed36

Add lmstudio to settings, expand UI model suggestions, tidy runTests.

integrate groundtruth in run test

2d400f7

Merge pull request #146 from microsoft/add-ground-truth

1c18cd0

✨ Add support for groundtruth model and outputs

Merge remote-tracking branch 'origin/dev' into githubmodels

5a1437d

Create genai-issue-labeller

1b8ce71

Merge pull request #150 from microsoft/pelikhan-patch-1

cfe006f

Create genai-issue-labeller

Merge remote-tracking branch 'origin/dev' into githubmodels

4e57915

updated typenames

fa1daad

pelikhan and others added 30 commits June 17, 2025 17:44

fix: update release.sh file permissions to executable

5a4d189

update release

1fdb100

fix: update dependencies for genaiscript and openai to latest versions

f4c481c

chore: bump version to 0.0.11

ea67aa6

✨: Add groundtruthScore to test results and improve logging (#166)

c4efacb

Groundtruth scores are now tracked for tests, with improved debug output.

feat: update groundtruth documentation and bump genaiscript version to 1.142.15

e71173d

feat: add Groundtruth card to documentation for expected output generation

6a00094

support for authors

d186f93

feat: allow 'serve' command in addition to 'configure' for genaiscript execution

c775b1e

feat: add support for 'serve' command in genaiscript execution and update API operation ID sample

f6460cc

better error message on missing run

4cd6471

refactor: remove unused renderEvaluation and renderEvaluationOutcome imports

1f633e2

feat: implement parseStrings function for flexible string parsing

78d9f5c

use .bin path

fb74bcd

feat: add front matter schema for prompty definition and update groundtruth documentation links

72ddf67

refactor: remove npm test step from CI workflow

f014ad8

chore: bump version to 0.0.12

77f3afa

📝 docs: Add and clarify Ground Truth test terminology (#172)

5714aaf

Expanded glossary and updated diagrams to standardize GTM terms.

feat: add GenAI Pull Request Descriptor workflow

89916ad

refactor: update groundtruth handling and improve code clarity

3fe8cd7

refactor: improve code formatting and enhance groundtruth handling

dfe7942

more options work

b252280

fix: update groundtruth assignment in evaluateTestMetric function

109a56a

Planner (#176)

e1f4af3

* feat: add implementation plan documentation for PromptPex framework
* docs: enhance implementation plan with validation steps for test generation
* new plan
* docs: update implementation plan phases and add additional features

refactor: update documentation structure and add implementation plan

7cbe0b3

fix: update implementation plan link in documentation

46d6571
