PDX-470/471/473: feat(mcp): add detail level, diff mode, and completeness score to validation tools by mrdailey99 · Pull Request #168 · ProvarTesting/provardx-cli

mrdailey99 · 2026-05-13T14:58:16Z

Summary

PDX-470: Add detail=summary|standard|full parameter to all four structured-response validation tools (provar_testsuite_validate, provar_testplan_validate, provar_testcase_validate, provar_project_validate). summary returns only key scores and the stop signal — eliminates token explosion on repeat calls.
PDX-471: Add baseline_run_id diff mode to provar_testsuite_validate, provar_testcase_validate, and provar_project_validate. Each validation call saves a violation snapshot keyed by run_id; subsequent calls that pass baseline_run_id return only {added, resolved, unchanged_count, run_id} instead of the full inventory.
PDX-473: Add completeness_score (0–100) and recommended_next_action (stop | inspect_failures | fix_and_revalidate) to all four tools. Agents can use recommended_next_action=stop as an unambiguous signal to end the fix-validate loop.
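
The fix-validate loop these three changes target might look roughly like the sketch below on the agent side. Only the parameter and field names (`detail`, `run_id`, `completeness_score`, `recommended_next_action`) come from this description; `runFixValidateLoop`, the `Validate` callback, `applyFixes`, and the iteration cap are hypothetical stand-ins, not the tools' actual API.

```typescript
// Hypothetical agent-side loop. The callbacks stand in for an MCP tool call
// and for whatever the agent does to fix violations between runs.
type NextAction = 'stop' | 'inspect_failures' | 'fix_and_revalidate';

interface ValidationResult {
  run_id: string;
  completeness_score: number; // 0-100
  recommended_next_action: NextAction;
}

interface ValidateArgs {
  detail: 'summary' | 'standard' | 'full';
}

type Validate = (args: ValidateArgs) => ValidationResult;

function runFixValidateLoop(
  validate: Validate,
  applyFixes: () => void,
  maxIterations = 10, // assumed safety cap, not part of the PR
): ValidationResult {
  // First call: ask for the standard response so the failures can be inspected.
  let result = validate({ detail: 'standard' });
  let iterations = 1;
  while (result.recommended_next_action !== 'stop' && iterations < maxIterations) {
    applyFixes();
    // Repeat calls: request only the scores and the stop signal.
    result = validate({ detail: 'summary' });
    iterations += 1;
  }
  return result;
}
```

The loop ends on `recommended_next_action === 'stop'` rather than on a score threshold, which is the unambiguous signal PDX-473 describes.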

New shared utilities

File	Exports
`src/mcp/utils/detailLevel.ts`	`applyDetailLevel`, `DetailLevel`
`src/mcp/utils/validationScore.ts`	`calcCompletenessScore`, `calcNextAction`
`src/mcp/utils/validationDiff.ts`	`generateRunId`, `saveRun`, `hasAnyRun`, `loadBaselineViolations`, `computeDiff`

Test plan

validationScore.test.ts — 8 tests for calcCompletenessScore and calcNextAction
validationDiff.test.ts — 13 tests for run persistence, cap-at-20 eviction, and diff computation
testSuiteValidate.test.ts — 30 tests (19 existing + 11 new for PDX-470/471/473)
testPlanValidate.test.ts — 28 tests (19 existing + 9 new for PDX-470/473)
All 79 tests pass locally in the isolated worktree

🤖 Generated with Claude Code

github-actions · 2026-05-13T14:58:42Z

Quality Orchestrator

🟢 LOW · 24 / 100 · Touches: utils. All changed files have mapped tests.

🧪 Tests to Run · Running 7 of 47 tests

unit/mcp/projectValidateFromPath.test.ts
unit/mcp/testCaseValidate.test.ts
unit/mcp/testPlanValidate.test.ts
unit/mcp/testSuiteValidate.test.ts
unit/mcp/detailLevel.test.ts
unit/mcp/validationDiff.test.ts
unit/mcp/validationScore.test.ts

▶ Run command

npx vitest run \
  unit/mcp/projectValidateFromPath.test.ts \
  unit/mcp/testCaseValidate.test.ts \
  unit/mcp/testPlanValidate.test.ts \
  unit/mcp/testSuiteValidate.test.ts \
  unit/mcp/detailLevel.test.ts \
  unit/mcp/validationDiff.test.ts \
  unit/mcp/validationScore.test.ts

⚡ quality-orchestrator · /qo stub <file> · qo analyze-local

Copilot

Pull request overview

Adds response shaping, validation scoring, and diff-run persistence utilities for MCP validation tools so agents can request smaller responses, compare runs, and decide whether to continue validation loops.

Changes:

Adds detail response levels and completeness_score / recommended_next_action fields across validation tools.
Adds run snapshot persistence and diff computation utilities, with baseline diff mode wired into suite, testcase, and project validation.
Adds unit tests for score/diff utilities plus suite/plan response changes.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.

Show a summary per file

File	Description
`src/mcp/utils/detailLevel.ts`	Adds shared summary/standard/full response filtering helper.
`src/mcp/utils/validationScore.ts`	Adds completeness score and next-action helpers.
`src/mcp/utils/validationDiff.ts`	Adds run id generation, snapshot persistence, baseline loading, and diff computation.
`src/mcp/tools/testSuiteValidate.ts`	Wires detail level, run ids, diff mode, and next-action scoring into suite validation.
`src/mcp/tools/testPlanValidate.ts`	Wires detail level and next-action scoring into plan validation.
`src/mcp/tools/testCaseValidate.ts`	Wires detail level, run ids, diff mode, and next-action scoring into testcase validation.
`src/mcp/tools/projectValidateFromPath.ts`	Wires detail level, run ids, diff mode, and next-action scoring into project validation.
`src/mcp/tools/descHelper.ts`	Adds an environment-driven description helper.
`test/unit/mcp/validationScore.test.ts`	Adds unit coverage for scoring and next-action helpers.
`test/unit/mcp/validationDiff.test.ts`	Adds unit coverage for run persistence and diff helper behavior.
`test/unit/mcp/testSuiteValidate.test.ts`	Adds suite validation tests for detail, score/action, and baseline diff behavior.
`test/unit/mcp/testPlanValidate.test.ts`	Adds plan validation tests for detail and score/action behavior.

Comments suppressed due to low confidence (3)

src/mcp/tools/testSuiteValidate.ts:172

hasAnyRun(storageDir) is evaluated after saveRun() has already written the current run, so it will be true even on the first validation for this suite. A failing first run will therefore return fix_and_revalidate instead of the intended inspect_failures first-run action.

        const completeness_score = calcCompletenessScore(summary.test_cases_valid, summary.total_test_cases);
        const hasBaseline = hasAnyRun(storageDir);
        const recommended_next_action = calcNextAction(completeness_score, hasBaseline);
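
The ordering problem in this comment can be reproduced with a small in-memory sketch. The `savedRunIds` array and the zero-argument `hasAnyRun`/`saveRun` below are hypothetical stand-ins for the disk-backed helpers, which take a `storageDir`; only the call order is the point.

```typescript
// Hypothetical in-memory stand-ins for the disk-backed helpers in validationDiff.ts.
const savedRunIds: string[] = [];
const hasAnyRun = (): boolean => savedRunIds.length > 0;
const saveRun = (runId: string): void => {
  savedRunIds.push(runId);
};

// Order flagged in the review: persist first, then check.
// The current run is already stored, so this always reports a baseline.
function checkAfterSave(runId: string): boolean {
  saveRun(runId);
  return hasAnyRun();
}

// Order that keeps the first-run signal: check first, then persist.
function checkBeforeSave(runId: string): boolean {
  const hasBaseline = hasAnyRun();
  saveRun(runId);
  return hasBaseline;
}
```

With an empty store, `checkBeforeSave` reports no baseline on the first call and a baseline on the second, which is what `calcNextAction` needs in order to return `inspect_failures` for a failing first run.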

src/mcp/tools/testCaseValidate.ts:190

The current run is persisted before loading baseline_run_id. When the requested baseline is the oldest retained run, this write can trigger eviction at the 20-run cap and make an otherwise valid baseline return BASELINE_NOT_FOUND.

        try {
          saveRun(storageDir, runId, currentViolations);
        } catch (saveErr) {
          log('warn', 'provar_testcase_validate: could not save run for diff', {
            requestId,
            error: (saveErr as Error).message,
          });
        }

        // Diff mode
        if (baseline_run_id !== undefined && baseline_run_id !== '') {
          const baseline = loadBaselineViolations(storageDir, baseline_run_id);

src/mcp/tools/projectValidateFromPath.ts:264

The current project run is saved before baseline_run_id is loaded. If the supplied baseline is the oldest of the 20 retained runs, saving the new run can evict it first, so a valid baseline id fails with BASELINE_NOT_FOUND.

        if (save_results !== false) {
          try {
            saveRun(storageDir, runId, currentViolations);
          } catch (saveErr) {
            log('warn', 'provar_project_validate: could not save run for diff', {
              requestId,
              error: (saveErr as Error).message,
            });
          }
        }

        // Diff mode
        if (baseline_run_id !== undefined && baseline_run_id !== '') {
          const baseline = loadBaselineViolations(storageDir, baseline_run_id);
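
A minimal sketch of the eviction race and the load-before-save order, assuming an in-memory store in place of the real on-disk snapshots. `RunStore`, `diffAgainstBaseline`, and the `Violation` shape are hypothetical; the cap of 20 and the `BASELINE_NOT_FOUND` code come from this PR.

```typescript
// In-memory stand-in for the run store; the real saveRun and
// loadBaselineViolations work against files under a storageDir.
const MAX_RUNS = 20;
type Violation = { rule_id: string; message: string };

class RunStore {
  // Map keeps insertion order, so the first key is the oldest retained run.
  private readonly runs = new Map<string, Violation[]>();

  saveRun(runId: string, violations: Violation[]): void {
    this.runs.set(runId, violations);
    while (this.runs.size > MAX_RUNS) {
      const oldest = this.runs.keys().next().value;
      if (oldest === undefined) break;
      this.runs.delete(oldest);
    }
  }

  loadBaselineViolations(runId: string): Violation[] | null {
    return this.runs.get(runId) ?? null;
  }
}

// Load the baseline first, then persist the current run, so the write cannot
// evict the baseline this call was asked to compare against.
function diffAgainstBaseline(
  store: RunStore,
  baselineRunId: string,
  runId: string,
  current: Violation[],
): { error: 'BASELINE_NOT_FOUND' } | { baselineCount: number; currentCount: number } {
  const baseline = store.loadBaselineViolations(baselineRunId);
  store.saveRun(runId, current);
  if (baseline === null) return { error: 'BASELINE_NOT_FOUND' };
  return { baselineCount: baseline.length, currentCount: current.length };
}
```

If the store already holds 20 runs and the baseline is the oldest, this order still finds it; saving first would evict it and produce the spurious `BASELINE_NOT_FOUND` described above.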

+        const storageDir = suiteStorageDir();
+        const runId = generateRunId(suite_name);
+        const currentViolations = result.violations as unknown as DiffableViolation[];
+
+        try {
+          saveRun(storageDir, runId, currentViolations);


+function suiteStorageDir(): string {
+  return path.join(os.homedir(), '.provardx', 'validation');
+}


+        try {
+          saveRun(storageDir, runId, currentViolations);
+        } catch (saveErr) {
+          log('warn', 'provar_testsuite_validate: could not save run for diff', {
+            requestId,
+            error: (saveErr as Error).message,
+          });
+        }
+
+        // Diff mode
+        if (baseline_run_id !== undefined && baseline_run_id !== '') {
+          const baseline = loadBaselineViolations(storageDir, baseline_run_id);


+        const completeness_score = calcCompletenessScore(baseResult.is_valid ? 1 : 0, 1);
+        const hasBaseline = hasAnyRun(storageDir);
+        const recommended_next_action = calcNextAction(completeness_score, hasBaseline);


-        // No API key configured — run local validation with onboarding message
+        const storageDir = tcStorageDir();
+        const runId = generateRunId(tcRunContext(file_path, source));
+        const currentViolations = baseResult.issues as unknown as DiffableViolation[];


+  const message = String(v['message'] ?? '').slice(0, 120);
+  return `${rule_id}||${applies_to}||${message}`;
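
The sketch below illustrates why truncating the message at 120 characters can make two different violations share a key. The hunk does not show how `rule_id` and `applies_to` are read, so those two lines, the `Record<string, unknown>` shape for `DiffableViolation`, and the optional `maxMessageLength` parameter are assumptions for illustration.

```typescript
// Assumed shape; the real DiffableViolation type is not shown in the diff.
type DiffableViolation = Record<string, unknown>;

function violationKey(v: DiffableViolation, maxMessageLength?: number): string {
  const rule_id = String(v['rule_id'] ?? ''); // assumed field access
  const applies_to = String(v['applies_to'] ?? ''); // assumed field access
  const full = String(v['message'] ?? '');
  const message = maxMessageLength === undefined ? full : full.slice(0, maxMessageLength);
  return `${rule_id}||${applies_to}||${message}`;
}

// Two distinct violations whose messages share their first 120 characters.
const prefix = 'x'.repeat(120);
const a: DiffableViolation = { rule_id: 'R1', applies_to: 'step', message: `${prefix} in step 1` };
const b: DiffableViolation = { rule_id: 'R1', applies_to: 'step', message: `${prefix} in step 2` };

const collideWhenTruncated = violationKey(a, 120) === violationKey(b, 120); // true
const collideWhenFull = violationKey(a) === violationKey(b); // false
```

A diff keyed on the truncated form would count `a` and `b` as the same violation, so replacing one with the other would show up as unchanged.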


+ * Recommend what the agent should do next based on the completeness score and
+ * whether any prior runs exist on disk for this validation context.
+ *
+ * - `stop`              → score is 100 — nothing left to fix
+ * - `inspect_failures`  → first run (no baseline on disk) — review what's failing before trying to fix
+ * - `fix_and_revalidate`→ subsequent run — agent knows the failure set, should fix and re-run
+ */
+export function calcNextAction(score: number, hasBaseline: boolean): NextAction {
+  if (score === 100) return 'stop';
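
The hunk above is cut off after the first branch. A minimal sketch of both scoring helpers, following the three rules in the doc comment: the percentage formula, the rounding, and the treatment of an empty set as complete are assumptions, since the body of `calcCompletenessScore` is not shown anywhere in this PR view.

```typescript
type NextAction = 'stop' | 'inspect_failures' | 'fix_and_revalidate';

// Assumed formula: share of valid items as a rounded percentage.
// Treating total <= 0 as 100 is also an assumption.
function calcCompletenessScore(valid: number, total: number): number {
  if (total <= 0) return 100;
  return Math.round((valid / total) * 100);
}

// Follows the doc comment: 100 means stop, no prior run means inspect,
// any later run means fix and re-run.
function calcNextAction(score: number, hasBaseline: boolean): NextAction {
  if (score === 100) return 'stop';
  return hasBaseline ? 'fix_and_revalidate' : 'inspect_failures';
}
```

The argument order matches the call sites quoted in this review, for example `calcCompletenessScore(baseResult.is_valid ? 1 : 0, 1)` for a single test case.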


+        detail: z
+          .enum(['summary', 'standard', 'full'])
+          .optional()
+          .default('standard')
+          .describe(
+            'Response verbosity. "summary": is_valid, scores, and stop signal only. "standard"/"full": full issues list (default).'
+          ),
+        baseline_run_id: z
+          .string()
+          .optional()
+          .describe(
+            'run_id from a previous call. When provided, returns only issues that are new or resolved since that run: { added, resolved, unchanged_count, run_id }. If not found, returns error BASELINE_NOT_FOUND.'


+        detail: z
+          .enum(['summary', 'standard', 'full'])
+          .optional()
+          .default('standard')
+          .describe(
+            'Response verbosity. "summary": key scores and stop signal only. "standard": slim violation summary (default). "full": full per-suite and per-test-case data (implies include_plan_details:true).'
+          ),
+        baseline_run_id: z
+          .string()
+          .optional()
+          .describe(
+            'run_id from a previous call. When provided, returns only project-level violations that are new or resolved since that run: { added, resolved, unchanged_count, run_id }. If not found, returns error BASELINE_NOT_FOUND.'
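
The three verbosity levels in these schema descriptions could be applied by a filter along the following lines. The real `applyDetailLevel` signature is not shown in this PR view, so the parameter list, the `SUMMARY_KEYS` list, and the `fullOnlyKeys` argument are all hypothetical.

```typescript
type DetailLevel = 'summary' | 'standard' | 'full';

// Assumed set of fields that count as "scores and stop signal".
const SUMMARY_KEYS = ['is_valid', 'completeness_score', 'recommended_next_action', 'run_id'];

function applyDetailLevel(
  result: Record<string, unknown>,
  detail: DetailLevel,
  fullOnlyKeys: string[] = [], // hypothetical: fields returned only at "full"
): Record<string, unknown> {
  if (detail === 'full') return { ...result };
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(result)) {
    const keep = detail === 'summary' ? SUMMARY_KEYS.includes(key) : !fullOnlyKeys.includes(key);
    if (keep) out[key] = result[key];
  }
  return out;
}
```

A whitelist for `summary` keeps the small response stable when new fields are added to the full result, at the cost of having to list every summary field explicitly.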


+
+/**
+ * Returns `compact` when PROVAR_MCP_SCHEMA_MODE=compact, otherwise `standard`.
+ * Reads the env var on each call so tests can set it without resetting module cache.
+ */
+export function desc(standard: string, compact: string): string {
+  return process.env['PROVAR_MCP_SCHEMA_MODE'] === 'compact' ? compact : standard;
+}


RCA: Multiple correctness and code quality issues identified in review of the detail/diff/completeness PDX-470/471/473 implementation.

Fix:
- generateRunId: add random suffix to prevent sub-millisecond collisions
- testSuiteValidate: collect full violation hierarchy (recursive helper)
- All three validate tools: load baseline before saveRun to prevent eviction race; call hasAnyRun before saveRun for first-run heuristic
- testCaseValidate: include best_practices_violations in diff snapshot; extract resolveBaseResult helper to reduce handler complexity to 17
- projectValidateFromPath: omit run_id when save_results=false; extract classifyError helper to reduce handler complexity to 18
- Remove dead code: delete unused descHelper.ts
- Tests: update run_id regex, add 8 TC tests, add detailLevel.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
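
For the generateRunId item, a run id with a random suffix might be built as in the sketch below. The actual id format is not shown in this PR view, so the `<slug>-<epoch-ms>-<suffix>` layout, the slug rules, and the injectable `now` parameter are assumptions; `Math.random` is used only to keep the example dependency-free and is not a statement about what the commit uses.

```typescript
// Hypothetical format: <sanitised-context>-<epoch-ms>-<6-char base36 suffix>.
function generateRunId(context: string, now: number = Date.now()): string {
  const slug =
    context
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '-')
      .replace(/^-+|-+$/g, '') || 'run';
  // The suffix separates ids generated within the same millisecond.
  const suffix = Math.random().toString(36).slice(2, 8).padEnd(6, '0');
  return `${slug}-${now}-${suffix}`;
}
```

Without the suffix, two calls for the same context inside one millisecond would produce the same id, and the second snapshot would overwrite the first.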

…eness_score for PR #168

RCA: CLAUDE.md requires docs updates for every PR that adds or modifies tool parameters; PR #168 added detail, baseline_run_id, run_id, completeness_score, and recommended_next_action to 4 validation tools without updating docs/mcp.md

Fix: Updated provar_testcase_validate, provar_testsuite_validate, provar_testplan_validate, and provar_project_validate docs with new input params and output fields; added BASELINE_NOT_FOUND error code; marked include_plan_details/max_uncovered/max_violations as deprecated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…re to validation tools

RCA: Iterative fix-validate loops re-emit full violation inventories on every call, compounding token cost with no stop signal; agents have no way to know when to stop iterating or which violations changed since the prior run.

Fix: Add detail=summary|standard|full, baseline_run_id diff mode (returns only added/resolved violations), and completeness_score/recommended_next_action to all four validation tools. New utilities: detailLevel.ts, validationScore.ts, validationDiff.ts. 79 unit tests across validationScore, validationDiff, testSuiteValidate, testPlanValidate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@deprecated

RCA: Five correctness issues remained after the Copilot follow-up commit:
- Bug 5: tcStorageDir/suiteStorageDir both wrote to same path allowing cross-tool baseline collisions
- Bug 7: computeDiff Map collapsed duplicate violations
- Bug 8: 120-char truncation caused false-equal keys
- Bug 9: calcNextAction returned stop even when quality/BP violations remained
- Missing AC: include_plan_details and max_* params not marked @deprecated

Fix:
- Namespace storage dirs (testcase/, testsuite/) to prevent cross-tool baseline collisions
- computeDiff now uses multiset (counts per key) so duplicate violations are distinct events
- remove 120-char message truncation
- calcNextAction gains remainingViolationCount param (default 0); stop only fires when score=100 AND count=0
- all three tools pass currentViolations.length
- projectValidateFromPath marks include_plan_details/max_uncovered/max_violations @deprecated
- 2 multiset tests, 2 secondary-check tests, updated TC test, 7 projectValidate tests for run_id, detail=summary, BASELINE_NOT_FOUND, diff round-trip

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
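
The multiset behaviour described for computeDiff can be sketched as follows. This is an illustration under assumptions, not the code from the commit: the `DiffableViolation` shape, the key fields, and the bucket-per-key representation are all hypothetical, and only the `{added, resolved, unchanged_count}` result shape comes from the PR.

```typescript
type DiffableViolation = Record<string, unknown>; // assumed shape

interface DiffResult {
  added: DiffableViolation[];
  resolved: DiffableViolation[];
  unchanged_count: number;
}

// Assumed key: rule, target, and the full (untruncated) message.
function violationKey(v: DiffableViolation): string {
  return `${String(v['rule_id'] ?? '')}||${String(v['applies_to'] ?? '')}||${String(v['message'] ?? '')}`;
}

function computeDiff(baseline: DiffableViolation[], current: DiffableViolation[]): DiffResult {
  // One bucket per key holds every baseline occurrence, so duplicates are
  // counted instead of collapsing into a single map entry.
  const remaining = new Map<string, DiffableViolation[]>();
  for (const v of baseline) {
    const key = violationKey(v);
    const bucket = remaining.get(key);
    if (bucket) bucket.push(v);
    else remaining.set(key, [v]);
  }

  const added: DiffableViolation[] = [];
  let unchanged_count = 0;
  for (const v of current) {
    const bucket = remaining.get(violationKey(v));
    if (bucket && bucket.length > 0) {
      bucket.pop(); // consume one matching baseline occurrence
      unchanged_count += 1;
    } else {
      added.push(v);
    }
  }

  // Baseline occurrences that were never matched are resolved.
  const resolved: DiffableViolation[] = [];
  remaining.forEach((bucket) => resolved.push(...bucket));
  return { added, resolved, unchanged_count };
}
```

With a baseline containing the same violation twice and a current run containing it once, this reports one unchanged and one resolved, where a plain key-to-violation map would report one unchanged and nothing resolved.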

Copilot AI review requested due to automatic review settings May 13, 2026 14:58

Copilot started reviewing on behalf of mrdailey99 May 13, 2026 14:59

Copilot AI reviewed May 13, 2026

mrdailey99 and others added 4 commits May 14, 2026 13:46

mrdailey99 force-pushed the feature/pdx-470-471-473-validation branch from 6ffb10d to 806fc26 Compare May 14, 2026 19:49

mrdailey99 merged commit b26d3a7 into develop May 14, 2026
9 of 24 checks passed

mrdailey99 mentioned this pull request May 15, 2026

PDX-473/471: fix(mcp) — all-level violations in stop decision; read-only diff #180

Merged

9 tasks
