spec: add fixture-driven conformance suite #248

Merged
mickdarling merged 4 commits into develop from feature/conformance-suite
Apr 15, 2026

Conversation

@mickdarling
Contributor

Summary

  • add a fixture-driven conformance runner with test, verify-fixtures, report, and version commands
  • add reference evidence bundles for passing Level 1, passing Level 2, semantic-warning, and failing cases
  • wire the conformance fixtures into npm test, CI, and the conformance/generator docs

Related

Testing

  • npm test
  • npm run docs:dates:check

@mickdarling
Contributor Author

@claude please review this PR for content, quality, gaps, and consistency.

@claude

claude bot commented Apr 15, 2026

Claude finished @mickdarling's task in 3m 17s —— View job


PR Review: spec: add fixture-driven conformance suite

Tasks

  • Read CLAUDE.md for repo-specific guidelines
  • Read and analyze changed files
  • Review conformance runner script
  • Review fixture/evidence files
  • Review CI workflow
  • Review documentation changes
  • Post comprehensive review feedback

Overview

The fixture-driven evidence-bundle approach is well-conceived. Instead of requiring a live adapter for CI, fixtures capture pre-recorded responses and the runner evaluates them deterministically. The four reference cases (passing L1, passing L2, semantic-warn, failure) cover the important paths, and wiring verify-fixtures into npm test and CI gives this real teeth from day one. The two-tier semantic model and the evaluateSemantic tie-breaking logic are solid.

Below are findings organized by severity.


Bugs / Inconsistencies

1. --verbose is dead code · scripts/run-conformance-tests.mjs:779

options.verbose is set when --verbose/-v is passed but is never read anywhere in the evaluation or rendering pipeline. Users will expect verbose output and get none.



2. --timeout documented but not implemented · docs/conformance-testing.md:461

The CLI options table lists --timeout with a default of 30, and exit code 4 is documented for timeouts. Neither exists in parseArgs(). Since the current runner is purely in-process (no network), timeout may not be meaningful yet — but the docs should say so or the option should be removed.



3. Exit code 3 semantics mismatch · scripts/run-conformance-tests.mjs:860 vs docs/conformance-testing.md:469

The spec documents exit code 3 as "Configuration error — invalid adapter path or configuration". The script uses it for unknown command (Unknown command: ${command}). These are different failure modes. The docs table should say "Unknown command or configuration error".


4. JSON report format: required type and testDate field · docs/conformance-testing.md:397-407 vs scripts/run-conformance-tests.mjs:636-647

Two concrete mismatches between the example report in the docs and what summarize() actually produces:

  • Docs show "required": "MUST" (string). Code produces "required": true (boolean).
  • Docs show a "testDate" field. summarize() never writes testDate.

Anyone consuming JSON output programmatically and following the spec example will get different shapes.



5. Test count table is stale · docs/conformance-testing.md:283-288

Section 4.1 declares "Total MUST PASS tests: 9" with breakdown 2+3+2+2. The actual runner produces more:

| Category | Spec says | Runner produces |
| --- | --- | --- |
| Introspection Fidelity | 2 | 4 (operations, types, detail payloads, completeness) |
| Parameter Handling | 3 | 4 (required, unknown, defaults, snake_case) |
| Error Quality | 2 | 3 (schema, leakage, actionable) |
| Round-Trip Integrity | 2 | 2 ✓ |

The table and total count should be updated to match.



Gaps

6. CI workflow doesn't trigger on schema changes · .github/workflows/conformance-tests.yml:7-22

The conformance runner loads schemas/introspection-response.schema.json and schemas/operation-result.schema.json at startup (lines 51-57 of the script). If those schemas change without touching any conformance file, CI won't re-run and a broken validator could silently pass.

Add 'schemas/introspection-response.schema.json' and 'schemas/operation-result.schema.json' to both push.paths and pull_request.paths.

Note: I'm unable to modify .github/workflows/ files — you'll need to make this change directly.


7. No error handling around loadJson · scripts/run-conformance-tests.mjs:24-26

function loadJson(path) {
  return JSON.parse(readFileSync(path, "utf8"));
}

If a fixture is missing or malformed, Node throws a raw error with a stack trace rather than a clean exit message. A try/catch with process.stderr.write and process.exit(3) would give users actionable output.
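A minimal sketch of that shape — the message wording is a suggestion, and using exit code 3 here assumes it keeps its documented "configuration error" meaning:

```javascript
import { readFileSync } from "node:fs";

// Sketch, not the PR's actual code: wrap the read and parse so a missing or
// malformed fixture exits with a clean message instead of a raw stack trace.
function loadJson(path) {
  try {
    return JSON.parse(readFileSync(path, "utf8"));
  } catch (err) {
    process.stderr.write(`Cannot load ${path}: ${err.message}\n`);
    process.exit(3);
  }
}
```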



8. runReportCommand doesn't validate input or exit with a code · scripts/run-conformance-tests.mjs:835-843

Passing a non-report JSON (e.g., a fixture file) to report will silently produce malformed output. The command also never calls process.exit(), unlike test. The docs don't note this asymmetry. At minimum, basic duck-typing (checking for categories, summary) and a final process.exit(0) would be consistent.


9. --tier 2 and --tier both are undifferentiated · scripts/run-conformance-tests.mjs:492-502

evaluateSemantic branches only on tier === "1". Both "2" and "both" execute identical code. A user passing --tier 2 may expect to skip structural pattern matching and run only LLM-style checks — that's not what happens. The docs list all three values without explaining this equivalence.
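If the tiers are later differentiated, one possible shape — function and argument names are illustrative, treating tier 1 as the structural pattern checks and tier 2 as the LLM-style checks, with the combined-tier outcomes matching the four-outcome logic described elsewhere in this PR:

```javascript
// Illustrative only: give --tier 1, --tier 2, and --tier both distinct behavior.
function semanticOutcome(tier, tier1Pass, tier2Pass) {
  if (tier === "1") return tier1Pass ? "PASS" : "FAIL";
  if (tier === "2") return tier2Pass ? "PASS" : "FAIL";
  // "both": a tier-2 failure is fatal; a tier-1-only failure downgrades to WARN
  if (tier1Pass && tier2Pass) return "PASS";
  if (!tier1Pass && tier2Pass) return "WARN";
  return "FAIL";
}
```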


10. export-conformance-evidence.sh reference in generator docs · docs/adapter/generator.md:913

The updated example references a shell script ./generated-adapter/scripts/export-conformance-evidence.sh that doesn't exist and has no spec. This will confuse implementors who follow the example literally. Either add a note that this is a placeholder, or replace with a more realistic example.


Minor / Style

11. Issue reference may be wrong in conformance-testing.md · docs/conformance-testing.md:539

The References section links "GitHub Issue #55 — Conformance test requirements". The PR body closes #10 and #56. If #55 covers distinct remaining work, that's fine — but it's worth a comment confirming this is intentional and not a stale link.

12. reference-failure.json tests multiple failures simultaneously

The failure fixture combines camelCase parameter name (widgetId), completeness mismatch (widget_id/include_deleted not in introspection), and implementation leakage — all in one file. This is a valid smoke test, but if the runner's behavior changes for one failure type, it can mask regressions in others. Consider whether separating failure modes into distinct fixtures would give better diagnostic signal.

13. conformance-testing.md section 6.1 example has "required": "MUST" inconsistency with Markdown render section · docs/conformance-testing.md:707-711

renderMarkdown() writes `Requirement: MUST` as plain text, not as a JSON field. The markdown output is fine, but the JSON output uses true/false. Both are correct for their format, but the two representations should be called out explicitly so implementors don't conflate them.


Summary

The core design — evidence bundles, two-tier semantic evaluation, verify-fixtures gate in CI — is clean and practical. The main items to address before merge:

  1. Fix --verbose (dead code) and --timeout (documented but missing)
  2. Align the JSON report format (required type, testDate) between docs and code
  3. Update section 4.1 test count table to match the actual runner
  4. Add schema file paths to CI trigger paths (requires manual edit to the workflow)
  5. Add loadJson error handling

Contributor Author

@claude please review this PR for content, quality, gaps, and consistency.

@claude

claude bot commented Apr 15, 2026

Claude finished @mickdarling's task in 4m 56s —— View job


Tasks

  • Read CLAUDE.md for repo-specific guidelines
  • Read and analyze changed files
  • Review conformance runner script (scripts/run-conformance-tests.mjs)
  • Review fixture/evidence files
  • Review CI workflow and package.json
  • Review documentation changes
  • Post comprehensive review

PR Review: spec: add fixture-driven conformance suite

The evidence-bundle approach is the right call for a spec repo — static, reviewable, and deterministic. The second commit (45a5c5b) already resolved the issues flagged in the first pass (--verbose dead code, --timeout documentation, exit code 3 semantics, required type alignment, test count table, CI schema paths, loadJson error handling, runReportCommand validation, export-conformance-evidence.mjs note). What remains is smaller.


Bugs

1. Uncaught TypeError when fixture/report path is omitted · scripts/run-conformance-tests.mjs:789,836

runTestCommand and runReportCommand call resolve(positional[0]) before entering loadJson. When no path is given, positional[0] is undefined, and path.resolve(undefined) throws a TypeError [ERR_INVALID_ARG_TYPE] that bypasses loadJson's try/catch — users see a raw stack trace instead of a clean error.

// runTestCommand (line 789) and runReportCommand (line 836)
// resolve() throws before loadJson's try/catch can help
const fixture = loadJson(resolve(fixturePath));  // fixturePath may be undefined

Guard both entry points:

if (!fixturePath) {
  process.stderr.write(`No fixture path provided.\nUsage: node scripts/run-conformance-tests.mjs test <path> [options]\n`);
  process.exit(3);
}



Documentation Gaps

2. Section 4.2 SHOULD PASS table is missing "Level 2 Features" · docs/conformance-testing.md:289-297

evaluateFixture adds a third optional category — evaluateLevel2Capabilities (which checks endpoint modes, field selection, and batch operations) — whenever level >= 2. With required: false, failures produce exit code 2. The section 4.2 table only lists Constraint Documentation and Semantic Evaluation; Level 2 Features is absent.

Category Tests Rationale
Constraint Documentation 2 Discoverable constraints
Level 2 Features 3 Endpoint modes, field selection, batch
Semantic Evaluation Per implementation LLM discoverability



3. Section 5.4 mermaid diagram has a phantom decision branch · docs/conformance-testing.md:360-371

The diagram shows:

B -->|Yes| C{Tier 2 Required?}
C -->|No| E[PASS]

But in evaluateSemantic (lines 500-562), there is no "Tier 2 required?" check — when tier !== "1", both tier1Pass and tier2Pass are always computed and combined. The C -->|No| PASS path doesn't correspond to any code path.

Either simplify the diagram to show the actual four-outcome logic (tier1 ∧ tier2 → PASS, ¬tier1 ∧ tier2 → WARN, tier1 ∧ ¬tier2 → FAIL, ¬tier1 ∧ ¬tier2 → FAIL), or add a note that the diagram describes the full intended design rather than the current in-process fixture runner.



Design Gaps

4. Empty round_trip_tests trivially satisfies a MUST PASS category

evaluateRoundTrip iterates fixture.round_trip_tests and fixture.update_preservation_tests. When both arrays are empty, all individual checks accumulate zero failures and both sub-tests report PASS. This means an implementation that provides no round-trip evidence at all is considered conformant at Level 1 for that category.

reference-failure.json exploits this inadvertently — it has "round_trip_tests": [] and "update_preservation_tests": [], so Round-Trip Integrity PASSes even though the fixture is supposed to demonstrate a comprehensive failure. A note in the fixture or a minimum-evidence check (at least one entry for each, or a SKIP result when empty rather than trivial PASS) would close this gap.
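One way to close the gap, sketched with assumed shapes — the result object (`status`, `detail`) and the per-entry `pass` flag are illustrative, not the runner's actual structures:

```javascript
// Illustrative: report SKIP instead of a trivial PASS when a fixture
// provides no round-trip evidence at all.
function roundTripStatus(fixture) {
  const trips = fixture.round_trip_tests ?? [];
  const updates = fixture.update_preservation_tests ?? [];
  if (trips.length === 0 && updates.length === 0) {
    return { status: "SKIP", detail: "No round-trip evidence provided." };
  }
  const failures = [...trips, ...updates].filter((t) => t.pass === false);
  return failures.length === 0
    ? { status: "PASS" }
    : { status: "FAIL", detail: `${failures.length} round-trip check(s) failed.` };
}
```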


5. No fixture schema — malformed evidence bundles fail silently or cryptically

There is no JSON Schema for the fixture format itself. A missing accepted_parameters key silently defaults to {} (producing WARNs). A misspelled introspection.operations_list key silently skips operations validation. An implementor building their first evidence bundle has no machine-checkable spec to validate against before running the tool.

Consider adding a schemas/conformance-fixture.schema.json and validating fixtures against it at the start of runTestCommand and runVerifyFixtures.
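Until such a schema exists, even a dependency-free duck-type check at the top of runTestCommand would catch the most common authoring mistakes. A sketch — the key list is illustrative, drawn from fields mentioned in this review, not an authoritative fixture spec:

```javascript
// Interim shape check, no JSON Schema library required.
function checkFixtureShape(fixture) {
  const requiredKeys = ["introspection", "accepted_parameters", "round_trip_tests"];
  const missing = requiredKeys.filter((key) => !(key in fixture));
  return missing.length === 0
    ? { ok: true }
    : { ok: false, detail: `Fixture is missing keys: ${missing.join(", ")}` };
}
```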


Minor

6. version command is out of sync with package.json · scripts/run-conformance-tests.mjs:865

process.stdout.write("mcpaql-conformance-fixtures 0.1.0\n");

package.json version is 1.0.0-alpha.1. The separate versioning is defensible (the runner is a sub-artifact), but neither the code nor the docs explain that these are intentionally different. Either read the version from package.json dynamically, or add a comment.
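A hedged sketch of the dynamic option — the helper name and the `../package.json` path relative to scripts/ are assumptions:

```javascript
// Illustrative: build the banner from a parsed package.json object so the
// CLI version cannot drift from the manifest.
function versionBanner(pkg, toolName = "mcpaql-conformance-fixtures") {
  return `${toolName} ${pkg.version}`;
}
// At startup the runner could read the manifest once, e.g.:
//   const pkg = JSON.parse(
//     readFileSync(new URL("../package.json", import.meta.url), "utf8"));
//   process.stdout.write(versionBanner(pkg) + "\n");
```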

7. verify-fixtures prints no totals line

PASS reference-failure.json -> expected 1, got 1
PASS reference-level1.json -> expected 0, got 0
...

A trailing "All N fixtures passed." (or "N/M fixtures failed.") would make the output scannable in CI logs without having to count lines.
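A sketch of such a summary line, assuming the loop collects an array of per-fixture outcomes with a `pass` flag (that shape is illustrative):

```javascript
// Illustrative: one trailing totals line for verify-fixtures output.
function totalsLine(results) {
  const failed = results.filter((r) => !r.pass).length;
  return failed === 0
    ? `All ${results.length} fixtures passed.`
    : `${failed}/${results.length} fixtures failed.`;
}
```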

8. FORBIDDEN_ERROR_PATTERNS includes /at\s+\S+/i

This catches stack-frame lines (at Function.eval), but would also flag legitimate error messages like "available at docs.example.com" or "look at the field name". For in-fixture strings this is low-risk since implementors control the message, but worth documenting the trade-off in a comment so maintainers understand why the pattern is written broadly.
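A sketch of what that comment could look like — the standalone constant name is illustrative; in the script the pattern lives inside FORBIDDEN_ERROR_PATTERNS:

```javascript
// Broad on purpose: catches stack-frame lines ("at Function.eval",
// "at /app/src/db.js:42") but will also flag benign prose such as
// "available at docs.example.com". Acceptable here because fixture
// authors control their own error strings.
const STACK_FRAME_PATTERN = /at\s+\S+/i;
```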


Summary

The second commit addressed the bulk of the initial review. The remaining items break down as:

| Priority | Item |
| --- | --- |
| Bug | Guard runTestCommand/runReportCommand against undefined path before calling resolve() |
| Docs | Add "Level 2 Features" row to section 4.2 SHOULD PASS table |
| Docs | Fix or annotate the section 5.4 mermaid diagram |
| Design | Document/decide behavior when round-trip evidence arrays are empty |
| Design | Consider adding a fixture JSON Schema for self-validation |
| Minor | Sync or explain the version string vs package.json version |
| Minor | Add summary line to verify-fixtures output |

The core design is solid. Items 1–3 are quick fixes; 4–5 can be addressed in a follow-on issue if preferred.

@mickdarling mickdarling force-pushed the feature/conformance-suite branch from 879b86f to 98e5e6d on April 15, 2026 at 21:15
Contributor Author

@claude please review this PR for content, quality, gaps, and consistency.

@claude

claude bot commented Apr 15, 2026

Claude finished @mickdarling's task in 3m 13s —— View job


Tasks

  • Read CLAUDE.md for repo guidelines
  • Read conformance runner script
  • Read documentation changes
  • Read fixture/evidence files
  • Read CI workflow and package.json
  • Post comprehensive third-pass review

PR Review (Pass 3): spec: add fixture-driven conformance suite

The third commit (98e5e6d) resolved most of the second pass's findings. Below is a status audit plus new findings from reading the current file state.


Status of Prior Review Items

All first-pass items: resolved.

From second pass:

| Item | Status |
| --- | --- |
| Guard runTestCommand/runReportCommand against undefined path | ✅ Fixed (scripts/run-conformance-tests.mjs:789-793, 841-845) |
| Add "Level 2 Features" row to section 4.2 SHOULD PASS table | ✅ Fixed |
| Fix/annotate section 5.4 mermaid diagram | ✅ Fixed (diagram now matches actual four-outcome logic) |
| Document/decide behavior for empty round-trip arrays | ❌ Open — still trivially passes (see item 9 below) |
| Consider adding a fixture JSON Schema | ❌ Open — no schemas/conformance-fixture.schema.json |
| Sync or explain version string vs package.json | ❌ Open — "mcpaql-conformance-fixtures 0.1.0" vs 1.0.0-alpha.1 with no comment |
| Add summary line to verify-fixtures output | ❌ Open — still no totals line |
| Comment on /at\s+\S+/i regex trade-off | ❌ Open — no comment added |

New Bugs

1. --category filter doesn't affect exit code · scripts/run-conformance-tests.mjs:796-809

summarize() computes report.exitCode across all categories before the category filter is applied. The filtered output shown to the user reflects one category, but process.exit(report.exitCode) uses the unfiltered result. A user scripting against --category "Introspection Fidelity" will get a misleading exit code if a different category fails.

const report = evaluateFixture(fixture, options.level, options.tier); // exitCode set here
if (options.category) {
  report.categories = report.categories.filter(...); // display-only filter
}
process.exit(report.exitCode); // reflects ALL categories, not the filtered one

The fix is to recompute report.exitCode from the filtered category set, or to document explicitly that --category is display-only and exit code always reflects the full evaluation.
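A sketch of the recompute option, using the report field names discussed in this review — the exact shape of `categories`/`tests` is assumed, and the 1/2 mapping follows the documented exit codes (1 for a required failure, 2 for optional-only failures):

```javascript
// Illustrative: derive an exit code from only the categories being displayed.
function exitCodeFor(categories) {
  let requiredFail = false;
  let optionalFail = false;
  for (const category of categories) {
    for (const test of category.tests) {
      if (test.status !== "FAIL") continue;
      if (test.required) requiredFail = true;
      else optionalFail = true;
    }
  }
  return requiredFail ? 1 : optionalFail ? 2 : 0;
}
```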



2. Exit code 5 is documented but never emitted · docs/conformance-testing.md:469

The exit codes table lists:

5 | Internal error | Unexpected error in test runner

There is no process.exit(5) anywhere in the script. An unhandled exception falls through to Node's default exit code 1 — not 5. Either remove exit code 5 from the table, or add a top-level process.on('uncaughtException', ...) handler that writes to stderr and exits with 5.
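If exit code 5 is kept, a minimal handler sketch (message wording is illustrative):

```javascript
// Illustrative top-level handler so unexpected errors produce the documented
// exit code 5 rather than Node's default exit code 1.
process.on("uncaughtException", (err) => {
  process.stderr.write(`Internal error: ${err.message}\n`);
  process.exit(5);
});
```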



Documentation Gaps

3. Section 2.2 "Test Categories Required" omits "Level 2 Features" · docs/conformance-testing.md:98-101

Section 2.2 lists only two optional test categories for Level 2:

  • Constraint Documentation (SHOULD PASS)
  • Semantic Evaluation (SHOULD PASS)

But the runner always adds evaluateLevel2Capabilities (endpoint modes, field selection, batch) as a third SHOULD category at level ≥ 2 (line 665), and section 4.2 correctly lists it. Section 2.2 is incomplete — implementors reading only that section won't know their evidence bundle needs endpoint_modes, capabilities, and batch_example fields.



4. Endpoint mode naming inconsistency: "Semantic" vs "crude" · docs/conformance-testing.md:90, scripts/run-conformance-tests.mjs:573

Section 2.2 requirements table says:

| Endpoint modes | Semantic endpoint mode and Single mode supported |

But evaluateLevel2Capabilities tests:

endpointModes.includes("crude") && endpointModes.includes("single")

And the failure message reads: "Level 2 evidence must include both crude and single endpoint modes."

All four reference fixtures use "crude" as the value. An implementor writing their first evidence bundle who reads section 2.2 would likely write "semantic" and get a confusing failure. Either update section 2.2 to say "CRUDE endpoint mode and Single mode", or add a note that "crude" is the fixture value for what the spec calls the semantic endpoint mode.



5. Section 6.1 JSON example omits requestedLevel field · docs/conformance-testing.md:381-404

summarize() (line 647) always emits a requestedLevel field alongside conformanceLevel. The example in section 6.1 doesn't include requestedLevel, so anyone building a report consumer who follows the spec example will miss the field. A one-line addition to the example JSON resolves this.


Remaining from Pass 2 (Minor, no change)

6. verify-fixtures prints no totals line — still ends with the last per-fixture line, no "N/M fixtures passed." summary.

7. version string mismatch — "mcpaql-conformance-fixtures 0.1.0" (line 877) vs package.json 1.0.0-alpha.1, with no explanatory comment.

8. /at\s+\S+/i pattern comment missing — would flag "available at docs.example.com", worth a one-line comment noting the intentional trade-off.

9. Empty round_trip_tests/update_preservation_tests trivially PASS — reference-failure.json has [] for both arrays, so Round-Trip Integrity PASSes in a fixture designed to demonstrate failure. A note in the fixture (or a SKIP result when arrays are empty) would improve diagnostic clarity.


Summary

The runner is solid and the two previous passes cleaned up the substantive bugs. What remains:

| Priority | Item |
| --- | --- |
| Bug | --category exit code doesn't reflect filtered output |
| Bug | Exit code 5 documented, never emitted |
| Docs | Section 2.2 missing "Level 2 Features" test category |
| Docs | "Semantic" vs "crude" endpoint mode naming inconsistency |
| Docs | Section 6.1 JSON example missing requestedLevel field |
| Minor | verify-fixtures no totals line (pass 2, still open) |
| Minor | version string / package.json mismatch (pass 2, still open) |
| Minor | /at\s+\S+/i comment missing (pass 2, still open) |
| Design | Empty round-trip arrays trivially PASS (pass 2, still open) |

Items 1–5 are quick fixes. Items 6–9 can be addressed as follow-on cleanup.

@mickdarling mickdarling force-pushed the feature/conformance-suite branch from 93c322a to a1f46e0 on April 15, 2026 at 21:53
@mickdarling mickdarling merged commit c83e1e3 into develop Apr 15, 2026
5 checks passed
@mickdarling mickdarling deleted the feature/conformance-suite branch April 15, 2026 21:53


Development

Successfully merging this pull request may close these issues.

  • spec(conformance): LLM-based semantic evaluation for API discoverability tests
  • spec: Create conformance test suite
