mcp-assert

Test any MCP server in any language. No SDK required. No LLM required.

A single Go binary that starts your MCP server over stdio, calls your tools, and asserts the results. Define assertions in YAML, run them in CI. Works with servers written in Go, TypeScript, Python, Rust, Java — anything that speaks MCP.

Why

Every existing MCP eval framework uses LLM-as-judge: send a prompt, get a response, ask GPT "was this good?" on a 1-5 scale. This makes sense for subjective outputs. It's the wrong approach for deterministic tools.

When read_file is called with a known path, the correct answer is the file's contents. When search_nodes is called after creating an entity, the entity should appear. The tools either return the right results or they don't. No LLM needed. No API costs. No false variance.

mcp-assert tests MCP server tools the way you test code: given this input, assert this output.

Why not just write tests in Go/Python/etc?

You could. The assertion logic is straightforward. What you'd have to build yourself:

  • MCP protocol bootstrapping — stdio transport, JSON-RPC framing, initialize/initialized handshake, tool call request/response lifecycle. This is ~200 lines of boilerplate per test suite, and easy to get wrong.
  • Server-agnostic test runner — your Go tests are coupled to your Go server. mcp-assert tests any server from any language with the same YAML. Switch server.command from npx my-ts-server to python -m my_server and the assertions don't change.
  • Eval-framework features — pass@k/pass^k reliability metrics, baseline regression detection, JUnit XML output, Docker isolation, cross-language matrix mode. These are eval concerns, not unit test concerns. Go's testing package doesn't have opinions about them.

The value isn't in the assertion logic. It's in not writing MCP client boilerplate, having one tool that works across every MCP server regardless of implementation language, and getting CI-grade reporting for free.

Quick Start

go install github.com/blackwell-systems/mcp-assert@latest

Define an assertion for any MCP server — here's one for the TypeScript filesystem server:

# evals/read_file.yaml
name: read_file returns file contents
server:
  command: npx
  args: ["@modelcontextprotocol/server-filesystem", "{{fixture}}"]
assert:
  tool: read_file
  args:
    path: "{{fixture}}/hello.txt"
  expect:
    not_error: true
    contains: ["Hello, world!"]

Run it:

mcp-assert run --suite evals/ --fixture ./fixtures

Works the same for a Go server, a Python server, or anything else that speaks MCP:

# Same assertion format, different server
server:
  command: python
  args: ["-m", "my_mcp_server"]

# Or an entirely different binary
server:
  command: agent-lsp
  args: ["go:gopls"]

Example Suites

mcp-assert ships with example assertions for three MCP servers:

Filesystem server (examples/filesystem/)

Tests the official @modelcontextprotocol/server-filesystem. 5 assertions: read file, list directory, get file info, search files, and a negative test that verifies path traversal is rejected.

npm install -g @modelcontextprotocol/server-filesystem
mcp-assert run --suite examples/filesystem --fixture examples/filesystem/fixtures

Memory server (examples/memory/)

Tests the official @modelcontextprotocol/server-memory. 5 assertions with stateful setup: create entities, add observations, create relations, search nodes, and verify empty search returns nothing.

npm install -g @modelcontextprotocol/server-memory
mcp-assert run --suite examples/memory

agent-lsp (examples/agent-lsp-go/)

Tests agent-lsp with gopls. 7 assertions: hover, definition, references, diagnostics, symbols, completions, and speculative execution.

mcp-assert run --suite examples/agent-lsp-go --fixture /path/to/go/fixtures

Server Override

Override the server config from CLI instead of repeating it in every YAML file:

mcp-assert run --suite evals/ --server "agent-lsp go:gopls" --fixture test/fixtures/go

Cross-Language Matrix

Run the same assertions across multiple language servers:

mcp-assert matrix \
  --suite evals/ \
  --languages go:gopls,typescript:typescript-language-server,python:pyright-langserver
                       hover    definition    references    completions
Go (gopls)             PASS     PASS          PASS          PASS
TypeScript (tsserver)  PASS     PASS          PASS          PASS
Python (pyright)       PASS     PASS          SKIP          PASS

CI Integration

# Fail the build if any assertion regresses
mcp-assert ci --suite evals/ --fail-on-regression

# Set a minimum pass threshold
mcp-assert ci --suite evals/ --threshold 95

# Override server from CLI
mcp-assert ci --suite evals/ --server "my-mcp-server" --threshold 100

GitHub Action:

- name: Assert MCP server correctness
  run: |
    go install github.com/blackwell-systems/mcp-assert@latest
    mcp-assert ci --suite evals/ --threshold 95 --junit results.xml

- name: Upload test results
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: mcp-assert-results
    path: results.xml

Structured Reporting

# JUnit XML for CI test result tabs (GitHub Actions, Jenkins, CircleCI)
mcp-assert run --suite evals/ --junit results.xml

# GitHub Step Summary (auto-detects $GITHUB_STEP_SUMMARY in ci mode)
mcp-assert ci --suite evals/ --markdown summary.md

# shields.io badge endpoint
mcp-assert run --suite evals/ --badge badge.json
# Then use: ![mcp-assert](https://img.shields.io/endpoint?url=<badge-url>)
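The badge file follows the shields.io "endpoint" schema. The exact fields mcp-assert writes are not reproduced here, but an endpoint JSON generally looks like this (label, message, and color values are illustrative):

```json
{
  "schemaVersion": 1,
  "label": "mcp-assert",
  "message": "12/12 passing",
  "color": "brightgreen"
}
```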

How It Differs

| Dimension      | Existing MCP evals                 | mcp-assert                              |
|----------------|------------------------------------|-----------------------------------------|
| Grading        | LLM-as-judge (subjective, costly)  | Deterministic assertions (exact, free)  |
| Speed          | Seconds per test (LLM round-trip)  | Milliseconds per test (no LLM)          |
| CI cost        | API calls on every run             | Zero external dependencies              |
| Reliability    | Not measured                       | pass@k / pass^k per assertion           |
| Regression     | Not supported                      | Baseline comparison, fail on backslide  |
| Docker         | Not supported                      | Per-assertion container isolation       |
| Multi-language | Not supported                      | Same assertions across N language servers |

Assertion Types

| Assertion        | What it checks                                                   |
|------------------|------------------------------------------------------------------|
| `contains`       | Response text contains all specified strings                     |
| `not_contains`   | Response text contains none of the specified strings             |
| `equals`         | Response exactly matches the expected value (whitespace-trimmed) |
| `matches_regex`  | Response matches all specified regex patterns                    |
| `json_path`      | JSON field at `$.dot.path` matches the expected value            |
| `min_results`    | Array result has at least N items                                |
| `max_results`    | Array result has at most N items                                 |
| `not_empty`      | Response is non-empty and not `null`/`[]`/`{}`                   |
| `not_error`      | Tool response has `isError: false`                               |
| `is_error`       | Tool response has `isError: true` (for negative testing)         |
| `file_contains`  | After tool execution, a file on disk contains the expected text  |
| `file_unchanged` | A file on disk was not modified by the tool                      |
| `net_delta`      | Speculative-execution diagnostic delta equals N                  |
| `in_order`       | Substrings appear in the specified order within the response     |
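Multiple checks can be combined under a single expect block; an illustrative fragment (the tool output values are hypothetical):

```yaml
expect:
  not_error: true
  contains: ["func main"]
  matches_regex: ["\\d+ reference(s)?"]
  min_results: 1
  in_order: ["main.go", "util.go"]
```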

Assertion File Format

name: Human-readable description
server:
  command: path/to/mcp-server
  args: ["arg1", "arg2"]
  env:
    KEY: value
setup:
  - tool: setup_tool
    args: { key: value }
  - tool: another_setup_tool
    args: { key: value }
assert:
  tool: tool_under_test
  args: { key: value }
  expect:
    not_error: true
    contains: ["expected", "strings"]
    not_contains: ["unexpected"]
    matches_regex: ["\\d+ items"]
    json_path:
      "$.locations[0].file": "main.go"
    min_results: 3
timeout: 30s

The {{fixture}} placeholder in args is replaced with the --fixture directory at runtime.
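A negative test pairs is_error with the {{fixture}} placeholder. The file below is hypothetical, modeled on the filesystem example earlier — the repo's actual path-traversal test may differ:

```yaml
# evals/path_traversal.yaml (hypothetical)
name: path traversal outside the fixture root is rejected
server:
  command: npx
  args: ["@modelcontextprotocol/server-filesystem", "{{fixture}}"]
assert:
  tool: read_file
  args:
    path: "{{fixture}}/../../../etc/passwd"
  expect:
    is_error: true
timeout: 30s
```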

Docker Isolation

Run each assertion in a fresh Docker container for reproducibility:

mcp-assert run --suite evals/ --docker ghcr.io/blackwell-systems/agent-lsp:go --fixture /workspace

The fixture directory is mounted into the container. Each assertion gets a clean environment — no cross-test contamination, no "works on my machine."

Reliability Metrics

Run multiple trials to measure consistency:

mcp-assert run --suite evals/ --trials 5
PASS  hover returns type info                 690ms
PASS  hover returns type info                 650ms
PASS  hover returns type info                 710ms
FAIL  get_references finds cross-file callers 90001ms
      tool call get_references failed: context deadline exceeded
PASS  get_references finds cross-file callers 27305ms

Reliability:
  Assertion                                     Trials  Passed    pass@k  pass^k
  ------------------------------------------    ------  ------  --------  ------
  hover returns type info                            3       3       YES     YES
  get_references finds cross-file callers            2       1       YES      NO

  pass@k: 2/2 capable, pass^k: 1/2 reliable
  • pass@k (capability): Did the assertion pass at least once? If NO, the tool is broken.
  • pass^k (reliability): Did the assertion pass every time? If NO, the tool is flaky.
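In code terms, the two metrics reduce to simple predicates over a list of trial outcomes. A minimal sketch (not mcp-assert's actual implementation):

```go
package main

import "fmt"

// passAtK reports capability: the assertion passed in at least one trial.
func passAtK(trials []bool) bool {
	for _, passed := range trials {
		if passed {
			return true
		}
	}
	return false
}

// passHatK reports reliability: the assertion passed in every trial.
func passHatK(trials []bool) bool {
	if len(trials) == 0 {
		return false
	}
	for _, passed := range trials {
		if !passed {
			return false
		}
	}
	return true
}

func main() {
	// Mirrors the flaky get_references trials in the output above.
	flaky := []bool{false, true}
	fmt.Printf("pass@k=%v pass^k=%v\n", passAtK(flaky), passHatK(flaky)) // pass@k=true pass^k=false
}
```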

Regression Detection

Save a baseline, then detect regressions on future runs:

# Save current results as baseline
mcp-assert run --suite evals/ --save-baseline baseline.json

# Later: compare against baseline
mcp-assert ci --suite evals/ --baseline baseline.json --fail-on-regression
Regressions detected (1):
  get_references finds cross-file callers: was PASS, now FAIL
error: 1 regression(s) detected

Only flags transitions from PASS to FAIL. Previously-failing tests that still fail are not regressions. New tests that fail are not regressions.
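That comparison rule amounts to a one-line predicate. A sketch of the documented semantics (not the tool's actual code):

```go
package main

import "fmt"

// regressed reports whether an assertion counts as a regression:
// it must exist in the baseline, have passed then, and fail now.
// New tests (inBaseline=false) and still-failing tests never count.
func regressed(inBaseline, passedThen, passesNow bool) bool {
	return inBaseline && passedThen && !passesNow
}

func main() {
	fmt.Println(regressed(true, true, false))  // true: was PASS, now FAIL
	fmt.Println(regressed(true, false, false)) // false: still failing
	fmt.Println(regressed(false, true, false)) // false: new test
}
```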

License

MIT
