mcp-assert

Test any MCP server in any language. No SDK required. No LLM required.

A single Go binary that starts your MCP server over stdio, calls your tools, and asserts the results. Define assertions in YAML, run them in CI. Works with servers written in Go, TypeScript, Python, Rust, Java — anything that speaks MCP.

Why

Every existing MCP eval framework uses LLM-as-judge: send a prompt, get a response, ask GPT "was this good?" on a 1-5 scale. This makes sense for subjective outputs. It's the wrong approach for deterministic tools.

When read_file is called with a known path, the correct answer is the file's contents. When search_nodes is called after creating an entity, the entity should appear. The tools either return the right results or they don't. No LLM needed. No API costs. No false variance.

mcp-assert tests MCP server tools the way you test code: given this input, assert this output.

Why not just write tests in Go/Python/etc?

You could. The assertion logic is straightforward. What you'd have to build yourself:

  • MCP protocol bootstrapping — stdio transport, JSON-RPC framing, initialize/initialized handshake, tool call request/response lifecycle. This is ~200 lines of boilerplate per test suite, and easy to get wrong.
  • Server-agnostic test runner — your Go tests are coupled to your Go server. mcp-assert tests any server from any language with the same YAML. Switch server.command from npx my-ts-server to python -m my_server and the assertions don't change.
  • Eval-framework features — pass@k/pass^k reliability metrics, baseline regression detection, JUnit XML output, Docker isolation, cross-language matrix mode. These are eval concerns, not unit test concerns. Go's testing package doesn't have opinions about them.

The value isn't in the assertion logic. It's in not writing MCP client boilerplate, having one tool that works across every MCP server regardless of implementation language, and getting CI-grade reporting for free.

Quick Start

go install github.com/blackwell-systems/mcp-assert@latest

Define an assertion for any MCP server — here's one for the TypeScript filesystem server:

# evals/read_file.yaml
name: read_file returns file contents
server:
  command: npx
  args: ["@modelcontextprotocol/server-filesystem", "{{fixture}}"]
assert:
  tool: read_file
  args:
    path: "{{fixture}}/hello.txt"
  expect:
    not_error: true
    contains: ["Hello, world!"]

Run it:

mcp-assert run --suite evals/ --fixture ./fixtures

Works the same for a Go server, a Python server, or anything else that speaks MCP:

# Same assertion format, different server
server:
  command: python
  args: ["-m", "my_mcp_server"]

# Or an entirely different binary
server:
  command: agent-lsp
  args: ["go:gopls"]

Example Suites

mcp-assert ships with example assertions for three MCP servers:

Filesystem server (examples/filesystem/)

Tests the official @modelcontextprotocol/server-filesystem. 5 assertions: read file, list directory, get file info, search files, and a negative test that verifies path traversal is rejected.

npm install -g @modelcontextprotocol/server-filesystem
mcp-assert run --suite examples/filesystem --fixture examples/filesystem/fixtures

Memory server (examples/memory/)

Tests the official @modelcontextprotocol/server-memory. 5 assertions with stateful setup: create entities, add observations, create relations, search nodes, and verify empty search returns nothing.

npm install -g @modelcontextprotocol/server-memory
mcp-assert run --suite examples/memory

agent-lsp (examples/agent-lsp-go/)

Tests agent-lsp with gopls. 7 assertions: hover, definition, references, diagnostics, symbols, completions, and speculative execution.

mcp-assert run --suite examples/agent-lsp-go --fixture /path/to/go/fixtures

Server Override

Override the server config from CLI instead of repeating it in every YAML file:

mcp-assert run --suite evals/ --server "agent-lsp go:gopls" --fixture test/fixtures/go

Cross-Language Matrix

Run the same assertions across multiple language servers:

mcp-assert matrix \
  --suite evals/ \
  --languages go:gopls,typescript:typescript-language-server,python:pyright-langserver
                       hover    definition    references    completions
Go (gopls)             PASS     PASS          PASS          PASS
TypeScript (tsserver)  PASS     PASS          PASS          PASS
Python (pyright)       PASS     PASS          SKIP          PASS

CI Integration

# Fail the build if any assertion regresses
mcp-assert ci --suite evals/ --fail-on-regression

# Set a minimum pass threshold
mcp-assert ci --suite evals/ --threshold 95

# Override server from CLI
mcp-assert ci --suite evals/ --server "my-mcp-server" --threshold 100

GitHub Action:

- name: Assert MCP server correctness
  run: |
    go install github.com/blackwell-systems/mcp-assert@latest
    mcp-assert ci --suite evals/ --threshold 95 --junit results.xml

- name: Upload test results
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: mcp-assert-results
    path: results.xml

Structured Reporting

# JUnit XML for CI test result tabs (GitHub Actions, Jenkins, CircleCI)
mcp-assert run --suite evals/ --junit results.xml

# GitHub Step Summary (auto-detects $GITHUB_STEP_SUMMARY in ci mode)
mcp-assert ci --suite evals/ --markdown summary.md

# shields.io badge endpoint
mcp-assert run --suite evals/ --badge badge.json
# Then use: ![mcp-assert](https://img.shields.io/endpoint?url=<badge-url>)
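The badge file follows the shields.io "endpoint" schema. The exact fields mcp-assert writes are not reproduced here, but an endpoint JSON generally looks like this (label, message, and color values are illustrative):

```json
{
  "schemaVersion": 1,
  "label": "mcp-assert",
  "message": "12/12 passing",
  "color": "brightgreen"
}
```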

How It Differs

| Dimension      | Existing MCP evals                 | mcp-assert                              |
|----------------|------------------------------------|-----------------------------------------|
| Grading        | LLM-as-judge (subjective, costly)  | Deterministic assertions (exact, free)  |
| Speed          | Seconds per test (LLM round-trip)  | Milliseconds per test (no LLM)          |
| CI cost        | API calls on every run             | Zero external dependencies              |
| Reliability    | Not measured                       | pass@k / pass^k per assertion           |
| Regression     | Not supported                      | Baseline comparison, fail on backslide  |
| Docker         | Not supported                      | Per-assertion container isolation       |
| Multi-language | Not supported                      | Same assertions across N language servers |

Assertion Types

| Assertion        | What it checks                                                   |
|------------------|------------------------------------------------------------------|
| `contains`       | Response text contains all specified strings                     |
| `not_contains`   | Response text contains none of the specified strings             |
| `equals`         | Response exactly matches the expected value (whitespace-trimmed) |
| `matches_regex`  | Response matches all specified regex patterns                    |
| `json_path`      | JSON field at `$.dot.path` matches the expected value            |
| `min_results`    | Array result has at least N items                                |
| `max_results`    | Array result has at most N items                                 |
| `not_empty`      | Response is non-empty and not `null`/`[]`/`{}`                   |
| `not_error`      | Tool response has `isError: false`                               |
| `is_error`       | Tool response has `isError: true` (for negative testing)         |
| `file_contains`  | After tool execution, a file on disk contains the expected text  |
| `file_unchanged` | A file on disk was not modified by the tool                      |
| `net_delta`      | Speculative-execution diagnostic delta equals N                  |
| `in_order`       | Substrings appear in the specified order within the response     |
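Multiple checks can be combined under a single expect block; an illustrative fragment (the tool output values are hypothetical):

```yaml
expect:
  not_error: true
  contains: ["func main"]
  matches_regex: ["\\d+ reference(s)?"]
  min_results: 1
  in_order: ["main.go", "util.go"]
```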

Assertion File Format

name: Human-readable description
server:
  command: path/to/mcp-server
  args: ["arg1", "arg2"]
  env:
    KEY: value
setup:
  - tool: setup_tool
    args: { key: value }
  - tool: another_setup_tool
    args: { key: value }
assert:
  tool: tool_under_test
  args: { key: value }
  expect:
    not_error: true
    contains: ["expected", "strings"]
    not_contains: ["unexpected"]
    matches_regex: ["\\d+ items"]
    json_path:
      "$.locations[0].file": "main.go"
    min_results: 3
timeout: 30s

The {{fixture}} placeholder in args is replaced with the --fixture directory at runtime.
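A negative test pairs is_error with the {{fixture}} placeholder. The file below is hypothetical, modeled on the filesystem example earlier — the repo's actual path-traversal test may differ:

```yaml
# evals/path_traversal.yaml (hypothetical)
name: path traversal outside the fixture root is rejected
server:
  command: npx
  args: ["@modelcontextprotocol/server-filesystem", "{{fixture}}"]
assert:
  tool: read_file
  args:
    path: "{{fixture}}/../../../etc/passwd"
  expect:
    is_error: true
timeout: 30s
```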

Docker Isolation

Run each assertion in a fresh Docker container for reproducibility:

mcp-assert run --suite evals/ --docker ghcr.io/blackwell-systems/agent-lsp:go --fixture /workspace

The fixture directory is mounted into the container. Each assertion gets a clean environment — no cross-test contamination, no "works on my machine."

Reliability Metrics

Run multiple trials to measure consistency:

mcp-assert run --suite evals/ --trials 5
PASS  hover returns type info                 690ms
PASS  hover returns type info                 650ms
PASS  hover returns type info                 710ms
FAIL  get_references finds cross-file callers 90001ms
      tool call get_references failed: context deadline exceeded
PASS  get_references finds cross-file callers 27305ms

Reliability:
  Assertion                                     Trials  Passed    pass@k  pass^k
  ------------------------------------------    ------  ------  --------  ------
  hover returns type info                            3       3       YES     YES
  get_references finds cross-file callers            2       1       YES      NO

  pass@k: 2/2 capable, pass^k: 1/2 reliable
  • pass@k (capability): Did the assertion pass at least once? If NO, the tool is broken.
  • pass^k (reliability): Did the assertion pass every time? If NO, the tool is flaky.
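In code terms, the two metrics reduce to simple predicates over a list of trial outcomes. A minimal sketch (not mcp-assert's actual implementation):

```go
package main

import "fmt"

// passAtK reports capability: the assertion passed in at least one trial.
func passAtK(trials []bool) bool {
	for _, passed := range trials {
		if passed {
			return true
		}
	}
	return false
}

// passHatK reports reliability: the assertion passed in every trial.
func passHatK(trials []bool) bool {
	if len(trials) == 0 {
		return false
	}
	for _, passed := range trials {
		if !passed {
			return false
		}
	}
	return true
}

func main() {
	// Mirrors the flaky get_references trials in the output above.
	flaky := []bool{false, true}
	fmt.Printf("pass@k=%v pass^k=%v\n", passAtK(flaky), passHatK(flaky)) // pass@k=true pass^k=false
}
```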

Regression Detection

Save a baseline, then detect regressions on future runs:

# Save current results as baseline
mcp-assert run --suite evals/ --save-baseline baseline.json

# Later: compare against baseline
mcp-assert ci --suite evals/ --baseline baseline.json --fail-on-regression
Regressions detected (1):
  get_references finds cross-file callers: was PASS, now FAIL
error: 1 regression(s) detected

Only flags transitions from PASS to FAIL. Previously-failing tests that still fail are not regressions. New tests that fail are not regressions.
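That comparison rule amounts to a one-line predicate. A sketch of the documented semantics (not the tool's actual code):

```go
package main

import "fmt"

// regressed reports whether an assertion counts as a regression:
// it must exist in the baseline, have passed then, and fail now.
// New tests (inBaseline=false) and still-failing tests never count.
func regressed(inBaseline, passedThen, passesNow bool) bool {
	return inBaseline && passedThen && !passesNow
}

func main() {
	fmt.Println(regressed(true, true, false))  // true: was PASS, now FAIL
	fmt.Println(regressed(true, false, false)) // false: still failing
	fmt.Println(regressed(false, true, false)) // false: new test
}
```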

License

MIT
