Skip to content

PDX-482: feat(mcp): harden testcase tool descriptions for single-call contract#174

Merged
mrdailey99 merged 2 commits into
developfrom
fix/PDX-482-harden-tool-descriptions
May 15, 2026
Merged

PDX-482: feat(mcp): harden testcase tool descriptions for single-call contract#174
mrdailey99 merged 2 commits into
developfrom
fix/PDX-482-harden-tool-descriptions

Conversation

@mrdailey99
Copy link
Copy Markdown
Collaborator

Summary

  • Adds a three-line construction contract to the top of provar_testcase_generate's TOOL_DESCRIPTION: pass the FULL step tree in a single call; step_edit is for AMENDING; stop-and-assemble guidance for the common mistake
  • Tightens the steps[] field description to call out the COMPLETE step tree requirement and warn against the multi-call append pattern
  • Mirrors the contract on provar_testcase_step_edit: self-identifies as AMENDMENT-ONLY, rejects construct-from-scratch usage, points agents at provar_testcase_generate for new test cases, spells out the structural defects (dropped scenarios, flat asserts, inconsistent step types) from misuse
  • Adds 6 regression-guard unit tests asserting the canonical phrasing in both tool descriptions
  • Adds scripts/pdx-482-validate.cjs — 13 protocol-surface assertions via JSON-RPC, all PASS

Jira

Why

PDX-479 surfaced that authoring guidance lives in three places — orchestration prompt, tool-guide resource, and tool descriptions. A regression in any one of them can drift the LLM. PDX-481 fixed the prompt + resource but left the tool descriptions silent: the existing steps[] field description was just "Ordered list of test steps" with no anti-pattern protection. If a future upstream prompt re-introduces the multi-call pattern, only the tool description can push back at the call site — and it currently says nothing about it.

This PR closes that gap. The tool description is the strongest, most-cited signal in an MCP client's working context because it surfaces on every tool-discovery cycle. With the contract embedded there, the single-call requirement survives any subsequent prompt/resource drift.

Test plan

  • yarn compile clean
  • node_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts"1127 passing, 0 failing (6 new regression-guard tests)
  • yarn lint clean
  • node scripts/pdx-482-validate.cjs13 / 13 PASS — confirms both tool descriptions carry the canonical phrasing at the MCP protocol surface

Adversarial check

The PDX-482 AC asks for a defensive validation: even if PDX-481's prompts were intentionally reverted in a scratch branch, the in-tool description alone should be sufficient to keep an LLM on the single-call pattern. The 13 protocol-surface assertions in `pdx-482-validate.cjs` are the codified form of that check: they read what an MCP client surfaces to its LLM at the call site, which is where the contract has to fire.

Changes

  • `src/mcp/tools/testCaseGenerate.ts`: TOOL_DESCRIPTION gets the three-line construction contract prepended; `steps[]` field description tightened
  • `src/mcp/tools/testCaseStepTools.ts`: description self-identifies as AMENDMENT-ONLY with construct rejection + structural-defect callout
  • `test/unit/mcp/testCaseGenerate.test.ts`: 3 regression-guard tests for the construction contract on generate
  • `test/unit/mcp/testCaseStepTools.test.ts`: 3 regression-guard tests for the amendment-only contract on step_edit
  • `scripts/pdx-482-validate.cjs` (new): 13-assertion JSON-RPC validation harness

Out of scope

  • Functional changes to the XML emitter, step builder, or validator (none)
  • Runtime guard rejecting step_edit against freshly-generated empty skeletons (would be over-engineering for this story)
  • MCP tool registration plumbing changes

Customer-facing docs

`docs/mcp.md` already carries the construct-vs-amend split from PDX-481. `docs/provar-mcp-public-docs.md` and `docs/university-of-provar-mcp-course.md` are maintained separately by the Provar team and may need a mirror update if they describe tool descriptions verbatim.

🤖 Generated with Claude Code

… contract

RCA: PDX-479 surfaced that authoring guidance lives in three places (prompts, resource, tool descriptions) and a regression in any one of them — like the multi-call author-test flow that shipped in PR #153 — can drift the LLM away from correct test case construction. PDX-481 fixed the prompts + resource but the tool descriptions themselves still carried no construct-vs-amend contract. The steps[] field description was just "Ordered list of test steps" with no anti-pattern protection. If a future upstream prompt re-introduces the multi-call pattern, only the tool description can push back at the call site — and it currently says nothing about it.
Fix: Added a three-line construction contract to the top of testCaseGenerate.ts TOOL_DESCRIPTION (single-call pattern, step_edit is for AMENDING, stop-and-assemble guidance for the common mistake). Tightened the steps[] field description to call out the FULL/COMPLETE step tree in one call and warn against the multi-call append pattern. Mirrored the contract in testCaseStepTools.ts: the step_edit description now self-identifies as AMENDMENT-ONLY, rejects construction usage, points agents at provar_testcase_generate for new test cases, and spells out the structural defects (dropped scenarios, flat asserts, inconsistent step types) from misuse. Added 6 regression-guard unit tests asserting the canonical phrasing in both tool descriptions. Added scripts/pdx-482-validate.cjs (13 protocol-surface assertions, 13/13 PASS) for direct JSON-RPC verification. Full gate green: 1127 mocha tests, lint clean, compile clean.
Copilot AI review requested due to automatic review settings May 15, 2026 14:32
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

Quality Orchestrator

🟢 LOW · 4 / 100 · All changed files have mapped tests.


🧪 Tests to Run · Running 2 of 51 tests

  • unit/mcp/testCaseGenerate.test.ts
  • unit/mcp/testCaseStepTools.test.ts
▶ Run command
npx vitest run \
  unit/mcp/testCaseGenerate.test.ts \
  unit/mcp/testCaseStepTools.test.ts

⚡ quality-orchestrator  ·  /qo stub <file>  ·  qo analyze-local

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens the MCP tool descriptions for provar_testcase_generate and provar_testcase_step_edit with an explicit single-call construction contract vs. amendment-only contract, so that the guidance survives any future drift in upstream prompts/resources. Adds regression-guard unit tests and a JSON-RPC protocol-surface validation script.

Changes:

  • Prepend a three-line construction contract to provar_testcase_generate's TOOL_DESCRIPTION and tighten the steps[] field description.
  • Mark provar_testcase_step_edit as AMENDMENT-ONLY, explicitly rejecting construct-from-scratch usage and listing structural defects.
  • Add 6 unit tests plus a new scripts/pdx-482-validate.cjs JSON-RPC validator.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/mcp/tools/testCaseGenerate.ts Adds construction-contract preamble to TOOL_DESCRIPTION and tightens steps[] field description
src/mcp/tools/testCaseStepTools.ts Adds AMENDMENT-ONLY framing and updates the short description
test/unit/mcp/testCaseGenerate.test.ts 3 regression-guard tests for the construction contract
test/unit/mcp/testCaseStepTools.test.ts 3 regression-guard tests for the amendment-only contract
scripts/pdx-482-validate.cjs New JSON-RPC validation harness with 13 protocol-surface assertions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

RCA: An adversarial review of the original PDX-482 commit identified three critical defects in the construct/amend contract: (1) PROVAR_MCP_SCHEMA_MODE=compact silently swapped the description for a contract-free one-liner — the LLM would never see the contract in compact mode, making it a regression highway; (2) the regex /step_edit[^.]*not for CONSTRUCTING|CONSTRUCTING[^.]*not/i had a false-positive that could pass on hostile rewordings like "constructing...not via generate"; (3) no test asserted the contract appears EARLY in the description, so a future refactor could move it down where LLM attention is lower. Additionally the step_edit test used an OR-clause across the three structural defects so dropping two of them would silently dilute the warning.
Fix: (1) Compact form on provar_testcase_generate now reads "Generate a Provar test case in ONE call with the FULL steps[] tree. Do NOT call with steps=[] then append via provar_testcase_step_edit (step_edit is for AMENDING existing test cases, not for CONSTRUCTING new ones)." — protocol-surface validator now spawns the server twice (standard + PROVAR_MCP_SCHEMA_MODE=compact) and runs 6 contract assertions against the compact form. (2) Replaced the false-positive regex with literal includes('not for CONSTRUCTING one from scratch') in both the unit test and the validator — locked on the canonical phrasing. (3) Added a leading-position assertion in both the unit test and validator: indexOf('Construction pattern') < 200 to prevent silent drift. (4) Tightened the step_edit "structural defects" test from OR-clause to three separate AND-style assertions on "dropped scenarios", "flat asserts", and "inconsistent step types" — dropping any one now fails the test. Gate: 1129 mocha tests, lint clean, validator 20/20 (was 13/13) covering both schema modes.
@mrdailey99
Copy link
Copy Markdown
Collaborator Author

Independent adversarial review + fixes — 47c75e2

Copilot's automated review returned zero comments. Because this is a defense-in-depth feature, I ran an additional adversarial review (independent reviewer, brief: "try to break the contract"). It surfaced three critical defects + one important nit that the regression-guard tests would not have caught.

Critical findings & resolutions

# Finding Resolution
1 PROVAR_MCP_SCHEMA_MODE=compact was a regression highway. The desc() helper returns the compact form when that env var is set; my compact string was the pre-PDX-482 contract-free one-liner "Generate a Provar XML test case skeleton with UUID guids and steps structure." — the entire contract would silently disappear in compact mode. Rewrote the compact form to carry the contract: "Generate a Provar test case in ONE call with the FULL steps[] tree. Do NOT call with steps=[] then append via provar_testcase_step_edit (step_edit is for AMENDING existing test cases, not for CONSTRUCTING new ones)." Validator now spawns the server twice (standard + compact) and runs 6 contract assertions against the compact form.
2 Regex false-positive in the "step_edit not for CONSTRUCTING" assertion. /step_edit[^.]*not for CONSTRUCTING|CONSTRUCTING[^.]*not/i would pass on hostile rewordings like "constructing... is the only way... not via generate". Replaced both the unit test and validator with literal includes('not for CONSTRUCTING one from scratch') — locked on the canonical phrasing so the assertion can only pass when the exact contract phrase is present.
3 No leading-position assertion. A future refactor could move the contract to the bottom of the joined array — every other assertion would still pass, but the LLM would see 4 KB of mechanics before reaching the contract. Added indexOf('Construction pattern') < 200 in both the unit test and the validator. Current position: 0 (literally the first chars).

Important nit & resolution

Finding Resolution
step_edit test for the "structural defects" warning used an OR-clause across "dropped scenarios", "flat asserts", "inconsistent step types". A future cleanup that removes two of the three would silently dilute the warning while the test still passes. Split into three separate assert.ok calls — dropping any one of the three defects now fails the test. The full consequence chain stays in the description.

Gate after fixes

Check Before After
Validator assertions 13 / 13 PASS 20 / 20 PASS (added 6 compact-mode + 1 leading-position)
Mocha tests 1127 passing 1129 passing (+1 leading-position test, +1 compact-mode describe block)
Lint clean clean
Compile clean clean

Findings deliberately deferred to a follow-up

The adversarial review also suggested:

  • Runtime guard — return STEPS_REQUIRED when steps.length === 0 && !dry_run. This is a behaviour change with broader implications (some callers may legitimately want a skeleton-only generation). Out of scope for this hardening story; worth a separate ticket if we decide the contract should be enforced at runtime, not just at the description surface.
  • Title field — adding the contract to the tool title: field. Most MCP clients show titles in tool-picker chips. Considered but skipped because the title field is also constrained by client UIs (Cursor shows full title, Claude Desktop truncates at ~50 chars). The description is the higher-leverage surface.
  • Audit guidePrompts.ts for parallel wording — already aligned by PDX-481 (which I merged into develop earlier today). Re-verified by hand; no divergence.

PR is now updated. Reviewer-facing changes: 47c75e2 on top of ac77154.

@mrdailey99 mrdailey99 merged commit 348af9a into develop May 15, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants