Test any MCP server in any language. No SDK required. No LLM required.
A single Go binary that starts your MCP server over stdio, calls your tools, and asserts the results. Define assertions in YAML, run them in CI. Works with servers written in Go, TypeScript, Python, Rust, Java — anything that speaks MCP.
Every existing MCP eval framework uses LLM-as-judge: send a prompt, get a response, ask GPT "was this good?" on a 1-5 scale. This makes sense for subjective outputs. It's the wrong approach for deterministic tools.
When read_file is called with a known path, the correct answer is the file's contents. When search_nodes is called after creating an entity, the entity should appear. The tools either return the right results or they don't. No LLM needed. No API costs. No false variance.
mcp-assert tests MCP server tools the way you test code: given this input, assert this output.
Couldn't you just write these as ordinary unit tests? You could — the assertion logic is straightforward. What you'd have to build yourself:
- MCP protocol bootstrapping — stdio transport, JSON-RPC framing, initialize/initialized handshake, tool call request/response lifecycle. This is ~200 lines of boilerplate per test suite, and easy to get wrong.
- Server-agnostic test runner — your Go tests are coupled to your Go server. mcp-assert tests any server from any language with the same YAML. Switch `server.command` from `npx my-ts-server` to `python -m my_server` and the assertions don't change.
- Eval-framework features — pass@k/pass^k reliability metrics, baseline regression detection, JUnit XML output, Docker isolation, cross-language matrix mode. These are eval concerns, not unit test concerns. Go's `testing` package doesn't have opinions about them.
The value isn't in the assertion logic. It's in not writing MCP client boilerplate, having one tool that works across every MCP server regardless of implementation language, and getting CI-grade reporting for free.
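For context, the handshake boilerplate mentioned above looks roughly like this on the wire. This is a hedged sketch of MCP's JSON-RPC framing over stdio (method names follow the MCP spec; the `clientInfo` values and `protocolVersion` shown are illustrative):

```jsonc
// client -> server: initialize request
{"jsonrpc": "2.0", "id": 1, "method": "initialize",
 "params": {"protocolVersion": "2024-11-05", "capabilities": {},
            "clientInfo": {"name": "mcp-assert", "version": "0.0.0"}}}

// server replies with its capabilities; the client then acknowledges:
{"jsonrpc": "2.0", "method": "notifications/initialized"}

// client -> server: the tool call under test
{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
 "params": {"name": "read_file", "arguments": {"path": "hello.txt"}}}
```

Only after this exchange can a single assertion run; this is the per-suite boilerplate mcp-assert absorbs.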
```
go install github.com/blackwell-systems/mcp-assert@latest
```

Define an assertion for any MCP server — here's one for the TypeScript filesystem server:
```yaml
# evals/read_file.yaml
name: read_file returns file contents
server:
  command: npx
  args: ["@modelcontextprotocol/server-filesystem", "{{fixture}}"]
assert:
  tool: read_file
  args:
    path: "{{fixture}}/hello.txt"
  expect:
    not_error: true
    contains: ["Hello, world!"]
```

Run it:
```
mcp-assert run --suite evals/ --fixture ./fixtures
```

Works the same for a Go server, a Python server, or anything else that speaks MCP:
```yaml
# Same assertion format, different server
server:
  command: python
  args: ["-m", "my_mcp_server"]
```

```yaml
server:
  command: agent-lsp
  args: ["go:gopls"]
```

mcp-assert ships with example assertions for three MCP servers:
Tests the official @modelcontextprotocol/server-filesystem. 5 assertions: read file, list directory, get file info, search files, and a negative test that verifies path traversal is rejected.
```
npm install -g @modelcontextprotocol/server-filesystem
mcp-assert run --suite examples/filesystem --fixture examples/filesystem/fixtures
```

Tests the official @modelcontextprotocol/server-memory. 5 assertions with stateful setup: create entities, add observations, create relations, search nodes, and verify empty search returns nothing.
```
npm install -g @modelcontextprotocol/server-memory
mcp-assert run --suite examples/memory
```

Tests agent-lsp with gopls. 7 assertions: hover, definition, references, diagnostics, symbols, completions, and speculative execution.
```
mcp-assert run --suite examples/agent-lsp-go --fixture /path/to/go/fixtures
```

Override the server config from the CLI instead of repeating it in every YAML file:
```
mcp-assert run --suite evals/ --server "agent-lsp go:gopls" --fixture test/fixtures/go
```

Run the same assertions across multiple language servers:
```
mcp-assert matrix \
  --suite evals/ \
  --languages go:gopls,typescript:typescript-language-server,python:pyright-langserver
```

```
                        hover  definition  references  completions
Go (gopls)              PASS   PASS        PASS        PASS
TypeScript (tsserver)   PASS   PASS        PASS        PASS
Python (pyright)        PASS   PASS        SKIP        PASS
```
```
# Fail the build if any assertion regresses
mcp-assert ci --suite evals/ --fail-on-regression

# Set a minimum pass threshold
mcp-assert ci --suite evals/ --threshold 95

# Override server from CLI
mcp-assert ci --suite evals/ --server "my-mcp-server" --threshold 100
```

GitHub Action:
```yaml
- name: Assert MCP server correctness
  run: |
    go install github.com/blackwell-systems/mcp-assert@latest
    mcp-assert ci --suite evals/ --threshold 95 --junit results.xml
- name: Upload test results
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: mcp-assert-results
    path: results.xml
```

```
# JUnit XML for CI test result tabs (GitHub Actions, Jenkins, CircleCI)
mcp-assert run --suite evals/ --junit results.xml

# GitHub Step Summary (auto-detects $GITHUB_STEP_SUMMARY in ci mode)
mcp-assert ci --suite evals/ --markdown summary.md

# shields.io badge endpoint
mcp-assert run --suite evals/ --badge badge.json
# Then point a shields.io endpoint badge at the generated badge.json
```

| Dimension | Existing MCP evals | mcp-assert |
|---|---|---|
| Grading | LLM-as-judge (subjective, costly) | Deterministic assertions (exact, free) |
| Speed | Seconds per test (LLM round-trip) | Milliseconds per test (no LLM) |
| CI cost | API calls on every run | Zero external dependencies |
| Reliability | Not measured | pass@k / pass^k per assertion |
| Regression | Not supported | Baseline comparison, fail on backslide |
| Docker | Not supported | Per-assertion container isolation |
| Multi-language | Not supported | Same assertion across N language servers |
| Assertion | What it checks |
|---|---|
| `contains` | Response text contains all specified strings |
| `not_contains` | Response text does not contain any of the specified strings |
| `equals` | Response exactly matches expected value (whitespace-trimmed) |
| `matches_regex` | Response matches all specified regex patterns |
| `json_path` | JSON field at `$.dot.path` matches expected value |
| `min_results` | Array result has at least N items |
| `max_results` | Array result has at most N items |
| `not_empty` | Response is non-empty and not null/[]/{} |
| `not_error` | Tool response has `isError: false` |
| `is_error` | Tool response has `isError: true` (for negative testing) |
| `file_contains` | After tool execution, file on disk contains expected text |
| `file_unchanged` | File on disk was not modified by the tool |
| `net_delta` | Speculative execution diagnostic delta equals N |
| `in_order` | Substrings appear in the specified order within the response |
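For example, `is_error` pairs naturally with negative testing, like the path-traversal check in the filesystem example suite. A hedged sketch (the traversal path is illustrative, and the assertion deliberately checks only `is_error` since the exact rejection message varies by server):

```yaml
name: path traversal is rejected
server:
  command: npx
  args: ["@modelcontextprotocol/server-filesystem", "{{fixture}}"]
assert:
  tool: read_file
  args:
    path: "{{fixture}}/../../../etc/passwd"
  expect:
    is_error: true
```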
```yaml
name: Human-readable description
server:
  command: path/to/mcp-server
  args: ["arg1", "arg2"]
  env:
    KEY: value
setup:
  - tool: setup_tool
    args: { key: value }
  - tool: another_setup_tool
    args: { key: value }
assert:
  tool: tool_under_test
  args: { key: value }
  expect:
    not_error: true
    contains: ["expected", "strings"]
    not_contains: ["unexpected"]
    matches_regex: ["\\d+ items"]
    json_path:
      "$.locations[0].file": "main.go"
    min_results: 3
  timeout: 30s
```

The `{{fixture}}` placeholder in `args` is replaced with the `--fixture` directory at runtime.
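The `setup` block is what makes stateful flows like the memory-server examples possible: seed state first, then assert against it. A sketch, assuming the memory server's published tool names and argument shapes (`create_entities`, `search_nodes`) — verify against your server's `tools/list` output:

```yaml
name: search finds a previously created entity
server:
  command: npx
  args: ["@modelcontextprotocol/server-memory"]
setup:
  - tool: create_entities
    args:
      entities:
        - name: "Ada Lovelace"
          entityType: person
          observations: ["wrote the first program"]
assert:
  tool: search_nodes
  args: { query: "Ada" }
  expect:
    not_error: true
    contains: ["Ada Lovelace"]
```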
Run each assertion in a fresh Docker container for reproducibility:
```
mcp-assert run --suite evals/ --docker ghcr.io/blackwell-systems/agent-lsp:go --fixture /workspace
```

The fixture directory is mounted into the container. Each assertion gets a clean environment — no cross-test contamination, no "works on my machine."
Run multiple trials to measure consistency:
```
mcp-assert run --suite evals/ --trials 5
```

```
PASS  hover returns type info                    690ms
PASS  hover returns type info                    650ms
PASS  hover returns type info                    710ms
FAIL  get_references finds cross-file callers  90001ms
      tool call get_references failed: context deadline exceeded
PASS  get_references finds cross-file callers  27305ms

Reliability:
  Assertion                                  Trials  Passed  pass@k  pass^k
  -----------------------------------------  ------  ------  ------  ------
  hover returns type info                         3       3     YES     YES
  get_references finds cross-file callers         2       1     YES      NO

  pass@k: 2/2 capable, pass^k: 1/2 reliable
```
- pass@k (capability): Did the assertion pass at least once? If NO, the tool is broken.
- pass^k (reliability): Did the assertion pass every time? If NO, the tool is flaky.
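The two metrics reduce to an any/all over per-trial outcomes. A minimal sketch in Python (function names are mine for illustration, not part of mcp-assert):

```python
def pass_at_k(trials: list[bool]) -> bool:
    """Capability (pass@k): the assertion passed at least once."""
    return any(trials)

def pass_hat_k(trials: list[bool]) -> bool:
    """Reliability (pass^k): the assertion passed on every trial."""
    return bool(trials) and all(trials)

hover = [True, True, True]   # 3/3 trials passed
refs  = [False, True]        # 1/2 trials passed (first hit a timeout)

print(pass_at_k(hover), pass_hat_k(hover))  # True True  -> capable and reliable
print(pass_at_k(refs),  pass_hat_k(refs))   # True False -> capable but flaky
```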
Save a baseline, then detect regressions on future runs:
```
# Save current results as baseline
mcp-assert run --suite evals/ --save-baseline baseline.json

# Later: compare against baseline
mcp-assert ci --suite evals/ --baseline baseline.json --fail-on-regression
```

```
Regressions detected (1):
  get_references finds cross-file callers: was PASS, now FAIL
error: 1 regression(s) detected
```
Only flags transitions from PASS to FAIL. Previously-failing tests that still fail are not regressions. New tests that fail are not regressions.
MIT