Skip to content

PDX-470/471/473: feat(mcp): add detail level, diff mode, and completeness score to validation tools#168

Merged
mrdailey99 merged 4 commits into
developfrom
feature/pdx-470-471-473-validation
May 14, 2026
Merged

PDX-470/471/473: feat(mcp): add detail level, diff mode, and completeness score to validation tools#168
mrdailey99 merged 4 commits into
developfrom
feature/pdx-470-471-473-validation

Conversation

@mrdailey99
Copy link
Copy Markdown
Collaborator

@mrdailey99 mrdailey99 commented May 13, 2026

Summary

  • PDX-470: Add detail=summary|standard|full parameter to all four structured-response validation tools (provar_testsuite_validate, provar_testplan_validate, provar_testcase_validate, provar_project_validate). summary returns only key scores and the stop signal — eliminates token explosion on repeat calls.
  • PDX-471: Add baseline_run_id diff mode to provar_testsuite_validate and provar_project_validate. Each validation call saves a violation snapshot keyed by run_id; subsequent calls with baseline_run_id return only {added, resolved, unchanged_count} instead of the full inventory.
  • PDX-473: Add completeness_score (0–100) and recommended_next_action (stop | inspect_failures | fix_and_revalidate) to all four tools. Agents can use recommended_next_action=stop as an unambiguous signal to end the fix-validate loop.

New shared utilities

File Exports
src/mcp/utils/detailLevel.ts applyDetailLevel, DetailLevel
src/mcp/utils/validationScore.ts calcCompletenessScore, calcNextAction
src/mcp/utils/validationDiff.ts generateRunId, saveRun, hasAnyRun, loadBaselineViolations, computeDiff

Test plan

  • validationScore.test.ts — 8 tests for calcCompletenessScore and calcNextAction
  • validationDiff.test.ts — 13 tests for run persistence, cap-at-20 eviction, and diff computation
  • testSuiteValidate.test.ts — 30 tests (19 existing + 11 new for PDX-470/471/473)
  • testPlanValidate.test.ts — 28 tests (19 existing + 9 new for PDX-470/473)
  • All 79 tests pass locally in the isolated worktree

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 13, 2026 14:58
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Quality Orchestrator

🟢 LOW · 24 / 100 · Touches: utils. All changed files have mapped tests.


🧪 Tests to Run · Running 7 of 47 tests

  • unit/mcp/projectValidateFromPath.test.ts
  • unit/mcp/testCaseValidate.test.ts
  • unit/mcp/testPlanValidate.test.ts
  • unit/mcp/testSuiteValidate.test.ts
  • unit/mcp/detailLevel.test.ts
  • unit/mcp/validationDiff.test.ts
  • unit/mcp/validationScore.test.ts
▶ Run command
npx vitest run \
  unit/mcp/projectValidateFromPath.test.ts \
  unit/mcp/testCaseValidate.test.ts \
  unit/mcp/testPlanValidate.test.ts \
  unit/mcp/testSuiteValidate.test.ts \
  unit/mcp/detailLevel.test.ts \
  unit/mcp/validationDiff.test.ts \
  unit/mcp/validationScore.test.ts

⚡ quality-orchestrator  ·  /qo stub <file>  ·  qo analyze-local

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds response shaping, validation scoring, and diff-run persistence utilities for MCP validation tools so agents can request smaller responses, compare runs, and decide whether to continue validation loops.

Changes:

  • Adds detail response levels and completeness_score / recommended_next_action fields across validation tools.
  • Adds run snapshot persistence and diff computation utilities, with baseline diff mode wired into suite, testcase, and project validation.
  • Adds unit tests for score/diff utilities plus suite/plan response changes.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
src/mcp/utils/detailLevel.ts Adds shared summary/standard/full response filtering helper.
src/mcp/utils/validationScore.ts Adds completeness score and next-action helpers.
src/mcp/utils/validationDiff.ts Adds run id generation, snapshot persistence, baseline loading, and diff computation.
src/mcp/tools/testSuiteValidate.ts Wires detail level, run ids, diff mode, and next-action scoring into suite validation.
src/mcp/tools/testPlanValidate.ts Wires detail level and next-action scoring into plan validation.
src/mcp/tools/testCaseValidate.ts Wires detail level, run ids, diff mode, and next-action scoring into testcase validation.
src/mcp/tools/projectValidateFromPath.ts Wires detail level, run ids, diff mode, and next-action scoring into project validation.
src/mcp/tools/descHelper.ts Adds an environment-driven description helper.
test/unit/mcp/validationScore.test.ts Adds unit coverage for scoring and next-action helpers.
test/unit/mcp/validationDiff.test.ts Adds unit coverage for run persistence and diff helper behavior.
test/unit/mcp/testSuiteValidate.test.ts Adds suite validation tests for detail, score/action, and baseline diff behavior.
test/unit/mcp/testPlanValidate.test.ts Adds plan validation tests for detail and score/action behavior.
Comments suppressed due to low confidence (3)

src/mcp/tools/testSuiteValidate.ts:172

  • hasAnyRun(storageDir) is evaluated after saveRun() has already written the current run, so it will be true even on the first validation for this suite. A failing first run will therefore return fix_and_revalidate instead of the intended inspect_failures first-run action.
        const completeness_score = calcCompletenessScore(summary.test_cases_valid, summary.total_test_cases);
        const hasBaseline = hasAnyRun(storageDir);
        const recommended_next_action = calcNextAction(completeness_score, hasBaseline);

src/mcp/tools/testCaseValidate.ts:190

  • The current run is persisted before loading baseline_run_id. When the requested baseline is the oldest retained run, this write can trigger eviction at the 20-run cap and make an otherwise valid baseline return BASELINE_NOT_FOUND.
        try {
          saveRun(storageDir, runId, currentViolations);
        } catch (saveErr) {
          log('warn', 'provar_testcase_validate: could not save run for diff', {
            requestId,
            error: (saveErr as Error).message,
          });
        }

        // Diff mode
        if (baseline_run_id !== undefined && baseline_run_id !== '') {
          const baseline = loadBaselineViolations(storageDir, baseline_run_id);

src/mcp/tools/projectValidateFromPath.ts:264

  • The current project run is saved before baseline_run_id is loaded. If the supplied baseline is the oldest of the 20 retained runs, saving the new run can evict it first, so a valid baseline id fails with BASELINE_NOT_FOUND.
        if (save_results !== false) {
          try {
            saveRun(storageDir, runId, currentViolations);
          } catch (saveErr) {
            log('warn', 'provar_project_validate: could not save run for diff', {
              requestId,
              error: (saveErr as Error).message,
            });
          }
        }

        // Diff mode
        if (baseline_run_id !== undefined && baseline_run_id !== '') {
          const baseline = loadBaselineViolations(storageDir, baseline_run_id);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +128 to +133
const storageDir = suiteStorageDir();
const runId = generateRunId(suite_name);
const currentViolations = result.violations as unknown as DiffableViolation[];

try {
saveRun(storageDir, runId, currentViolations);
Comment on lines +67 to +69
function suiteStorageDir(): string {
return path.join(os.homedir(), '.provardx', 'validation');
}
Comment thread src/mcp/tools/testSuiteValidate.ts Outdated
Comment on lines +132 to +143
try {
saveRun(storageDir, runId, currentViolations);
} catch (saveErr) {
log('warn', 'provar_testsuite_validate: could not save run for diff', {
requestId,
error: (saveErr as Error).message,
});
}

// Diff mode
if (baseline_run_id !== undefined && baseline_run_id !== '') {
const baseline = loadBaselineViolations(storageDir, baseline_run_id);
Comment thread src/mcp/tools/testCaseValidate.ts Outdated
Comment on lines +217 to +219
const completeness_score = calcCompletenessScore(baseResult.is_valid ? 1 : 0, 1);
const hasBaseline = hasAnyRun(storageDir);
const recommended_next_action = calcNextAction(completeness_score, hasBaseline);
Comment thread src/mcp/tools/testCaseValidate.ts Outdated
// No API key configured — run local validation with onboarding message
const storageDir = tcStorageDir();
const runId = generateRunId(tcRunContext(file_path, source));
const currentViolations = baseResult.issues as unknown as DiffableViolation[];
Comment thread src/mcp/utils/validationDiff.ts Outdated
Comment on lines +50 to +51
const message = String(v['message'] ?? '').slice(0, 120);
return `${rule_id}||${applies_to}||${message}`;
Comment thread src/mcp/utils/validationScore.ts Outdated
Comment on lines +17 to +25
* Recommend what the agent should do next based on the completeness score and
* whether any prior runs exist on disk for this validation context.
*
* - `stop` → score is 100 — nothing left to fix
* - `inspect_failures` → first run (no baseline on disk) — review what's failing before trying to fix
* - `fix_and_revalidate`→ subsequent run — agent knows the failure set, should fix and re-run
*/
export function calcNextAction(score: number, hasBaseline: boolean): NextAction {
if (score === 100) return 'stop';
Comment thread src/mcp/tools/testCaseValidate.ts Outdated
Comment on lines +89 to +100
detail: z
.enum(['summary', 'standard', 'full'])
.optional()
.default('standard')
.describe(
'Response verbosity. "summary": is_valid, scores, and stop signal only. "standard"/"full": full issues list (default).'
),
baseline_run_id: z
.string()
.optional()
.describe(
'run_id from a previous call. When provided, returns only issues that are new or resolved since that run: { added, resolved, unchanged_count, run_id }. If not found, returns error BASELINE_NOT_FOUND.'
Comment on lines +200 to +211
detail: z
.enum(['summary', 'standard', 'full'])
.optional()
.default('standard')
.describe(
'Response verbosity. "summary": key scores and stop signal only. "standard": slim violation summary (default). "full": full per-suite and per-test-case data (implies include_plan_details:true).'
),
baseline_run_id: z
.string()
.optional()
.describe(
'run_id from a previous call. When provided, returns only project-level violations that are new or resolved since that run: { added, resolved, unchanged_count, run_id }. If not found, returns error BASELINE_NOT_FOUND.'
Comment on lines +7 to +14

/**
* Returns `compact` when PROVAR_MCP_SCHEMA_MODE=compact, otherwise `standard`.
* Reads the env var on each call so tests can set it without resetting module cache.
*/
export function desc(standard: string, compact: string): string {
return process.env['PROVAR_MCP_SCHEMA_MODE'] === 'compact' ? compact : standard;
}
mrdailey99 added a commit that referenced this pull request May 13, 2026
RCA: Multiple correctness and code quality issues identified in review
     of the detail/diff/completeness PDX-470/471/473 implementation.
Fix: - generateRunId: add random suffix to prevent sub-millisecond collisions
     - testSuiteValidate: collect full violation hierarchy (recursive helper)
     - All three validate tools: load baseline before saveRun to prevent
       eviction race; call hasAnyRun before saveRun for first-run heuristic
     - testCaseValidate: include best_practices_violations in diff snapshot;
       extract resolveBaseResult helper to reduce handler complexity to 17
     - projectValidateFromPath: omit run_id when save_results=false;
       extract classifyError helper to reduce handler complexity to 18
     - Remove dead code: delete unused descHelper.ts
     - Tests: update run_id regex, add 8 TC tests, add detailLevel.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mrdailey99 added a commit that referenced this pull request May 13, 2026
…eness_score for PR #168

RCA: CLAUDE.md requires docs updates for every PR that adds or modifies tool parameters; PR #168 added detail, baseline_run_id, run_id, completeness_score, and recommended_next_action to 4 validation tools without updating docs/mcp.md
Fix: Updated provar_testcase_validate, provar_testsuite_validate, provar_testplan_validate, and provar_project_validate docs with new input params and output fields; added BASELINE_NOT_FOUND error code; marked include_plan_details/max_uncovered/max_violations as deprecated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mrdailey99 and others added 4 commits May 14, 2026 13:46
…re to validation tools

RCA: Iterative fix-validate loops re-emit full violation inventories on every call, compounding token cost with no stop signal; agents have no way to know when to stop iterating or which violations changed since the prior run.
Fix: Add detail=summary|standard|full, baseline_run_id diff mode (returns only added/resolved violations), and completeness_score/recommended_next_action to all four validation tools. New utilities: detailLevel.ts, validationScore.ts, validationDiff.ts. 79 unit tests across validationScore, validationDiff, testSuiteValidate, testPlanValidate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RCA: Multiple correctness and code quality issues identified in review
     of the detail/diff/completeness PDX-470/471/473 implementation.
Fix: - generateRunId: add random suffix to prevent sub-millisecond collisions
     - testSuiteValidate: collect full violation hierarchy (recursive helper)
     - All three validate tools: load baseline before saveRun to prevent
       eviction race; call hasAnyRun before saveRun for first-run heuristic
     - testCaseValidate: include best_practices_violations in diff snapshot;
       extract resolveBaseResult helper to reduce handler complexity to 17
     - projectValidateFromPath: omit run_id when save_results=false;
       extract classifyError helper to reduce handler complexity to 18
     - Remove dead code: delete unused descHelper.ts
     - Tests: update run_id regex, add 8 TC tests, add detailLevel.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RCA: Five correctness issues remained after the Copilot follow-up commit: Bug 5 — tcStorageDir/suiteStorageDir both wrote to same path allowing cross-tool baseline collisions; Bug 7 — computeDiff Map collapsed duplicate violations; Bug 8 — 120-char truncation caused false-equal keys; Bug 9 — calcNextAction returned stop even when quality/BP violations remained; Missing AC — include_plan_details and max_* params not marked @deprecated.
Fix: - Namespace storage dirs (testcase/, testsuite/) to prevent cross-tool baseline collisions — computeDiff now uses multiset (counts per key) so duplicate violations are distinct events — remove 120-char message truncation — calcNextAction gains remainingViolationCount param (default 0); stop only fires when score=100 AND count=0 — all three tools pass currentViolations.length — projectValidateFromPath marks include_plan_details/max_uncovered/max_violations @deprecated — 2 multiset tests, 2 secondary-check tests, updated TC test, 7 projectValidate tests for run_id, detail=summary, BASELINE_NOT_FOUND, diff round-trip

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eness_score for PR #168

RCA: CLAUDE.md requires docs updates for every PR that adds or modifies tool parameters; PR #168 added detail, baseline_run_id, run_id, completeness_score, and recommended_next_action to 4 validation tools without updating docs/mcp.md
Fix: Updated provar_testcase_validate, provar_testsuite_validate, provar_testplan_validate, and provar_project_validate docs with new input params and output fields; added BASELINE_NOT_FOUND error code; marked include_plan_details/max_uncovered/max_violations as deprecated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mrdailey99 mrdailey99 force-pushed the feature/pdx-470-471-473-validation branch from 6ffb10d to 806fc26 Compare May 14, 2026 19:49
@mrdailey99 mrdailey99 merged commit b26d3a7 into develop May 14, 2026
9 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants