The test-gaming detector for the agentic era.
Your AI assistant just made the tests pass. The question is whether it tested anything. Pragma reads what got written, and refuses the patterns that look like work but verify nothing.
A Claude Code plugin. A small CLI. Three tiers of defense, each one catching a different kind of cheat.
- Three Tiers, Layered
- What It Catches — Python
- What It Catches — Vitest (TypeScript / JavaScript)
- What It Catches — Jest (TypeScript / JavaScript)
- Install
- Tier 3 — Bring Your Own LLM
- Use Without Claude Code
- Why
- Security
- Contributing
- License
- **Tier 1 — AST classifier.** Fast, deterministic, ~10ms. Catches the obvious stuff: `assert True`, `mock.patch` on the function under test, `pytest.skip` smuggled into a body, `vi.spyOn(...).mockReturnValue`. Always on.
- **Tier 2 — coverage-of-target gate.** Runs the test under coverage instrumentation, then asks: did the production code's lines actually execute? If the answer is no, the test isn't a test. Opt in with `--with-coverage`.
- **Tier 3 — the LLM judge.** A small model reads the test alongside the production code and decides whether the test verifies behavior or just confidently asserts on its own mocks. Powered by DeepSeek by default; any OpenAI-compatible endpoint works. Opt in with `--with-llm`.
Each tier catches what the previous one misses. Combined, they reach patterns AST alone cannot — orphan tests, monkeypatched fakes, inline shadow classes, `vi.mock` on default exports, the lot.
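To make the layering concrete, here is a minimal hypothetical sketch (every name in it is invented) of the inline-shadow-class pattern: the assertion is real, but it only exercises a fake defined in the test file, so whatever the static rules conclude, the coverage-of-target gate can settle it by observing that the production module's lines never executed.

```python
# tests/test_pricing.py: hypothetical "inline shadow class" test.
# The production module `pricing` is never imported, so its lines never run.

class Pricing:  # local fake standing in for the real class
    def discount(self, total: float) -> float:
        return total * 0.9

def test_discount_applies_ten_percent():
    # A real-looking assertion, but it only verifies the fake defined above.
    # Under coverage instrumentation, the real pricing code records zero hits.
    assert Pricing().discount(100) == 90.0
```

The tables below enumerate the verdicts each classifier currently emits, starting with Python.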
| Verdict | Pattern | Blocked? |
|---|---|---|
| `python.tautological` | `assert True` / `assert 1 == 1` / `assert x == x` | yes |
| `python.mocked-away` | `mock.patch("auth.login.login")` inside a test of `auth.login.login` | yes |
| `python.monkeypatched` | `monkeypatch.setattr` targets the function under test | yes |
| `python.module_attr_reassignment` | `import pricing; pricing.discount = stub` | yes |
| `python.module_shimmed` | `sys.modules["X"] = types.ModuleType("X")` swap | yes |
| `python.swallowed` | `try: <call>; except: pass` swallows the call under test | yes |
| `python.skipped` | `pytest.skip(...)` / `xfail` smuggled at top of body | yes |
| `python.xfail_gaming` | `@pytest.mark.xfail(strict=True)` lets the stub ship green | yes |
| `python.mismatched` | name says `test_*_rejects_*` etc. but body has no `pytest.raises` | yes |
| `python.conditional` | every assertion lives inside an `if`/`for`/`while` branch | yes |
| `python.orphan_test` | `test_X.py` never imports `X`; redefines a fake locally | yes |
| `python.stub_error_match` | every `pytest.raises(...)` is `NotImplementedError`, `Exception`, or `match="not implemented"`, no other assert validates real value | yes |
| `python.no_success_assertion` | file-level: imports a target but no test calls it and asserts on a real return value | yes |
| `python.target_not_covered` | tier 2: test ran but the target's lines had zero hits | yes |
| `python.semantic_gaming` | tier 3: the LLM judge says the test verifies nothing | warn |
| `python.empty_body` | test body has no assertion and no `pytest.raises` | warn |
| `python.parametrize_thin` | `@parametrize` with 0 or 1 case values | warn |
| `python.weak` | `assert x is not None` when an exact value was expected | warn |
| `python.verified` | calls the production target, asserts on return / raised exception | pass |
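To ground two of the rows above, here is a hedged illustration; the `auth.login` module, its function signature, and both test bodies are invented. The first test patches the very function it claims to test, the `python.mocked-away` shape; the second calls the real target and asserts on its behaviour, which is what `python.verified` describes.

```python
# Hypothetical test file: module layout and names are assumptions for illustration only.
from unittest import mock

import pytest

from auth import login  # production target under test: auth.login.login

def test_login_succeeds():
    # python.mocked-away: the function under test is patched away,
    # so the assertion only checks the mock's canned return value.
    with mock.patch("auth.login.login", return_value=True):
        assert login.login("alice", "hunter2") is True

def test_login_rejects_bad_password():
    # python.verified: calls the real target and asserts on the raised exception.
    with pytest.raises(ValueError):
        login.login("alice", "wrong-password")
```

The same shapes have Vitest and Jest counterparts, listed next.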
| Verdict | Pattern | Blocked? |
|---|---|---|
| `vitest.tautological` | `expect(true).toBe(true)` / `expect(x).toBe(x)` | yes |
| `vitest.mocked-away` | `vi.mock("./module")` or `vi.spyOn(...).mockReturnValue(...)` on the target | yes |
| `vitest.swallowed` | `try { call(); } catch (_) {}` swallows the call | yes |
| `vitest.skipped` | `it.skip(...)` / `xit(...)` / `it.todo(...)` | yes |
| `vitest.mismatched` | name says `*_throws_*` but no `expect(...).toThrow*()` | yes |
| `vitest.stub_error_match` | every `.toThrow(...)` is stub-shaped — stub-phrase string/regex, bare `.toThrow()`, or bare `Error` class — and no other `expect(value)...` validates real behavior | yes |
| `vitest.no_success_assertion` | file-level: imports a target but no test calls it and asserts on a real return value | yes |
| `vitest.conditional` | every `expect()` lives inside an `if`/`for`/`while` | yes |
| `vitest.orphan_mock` | `const m = vi.fn().mockReturnValue(L); expect(m()).toEqual(L)` | yes |
| `vitest.target_not_covered` | tier 2: test ran but the target's lines had zero V8 hits | yes |
| `vitest.semantic_gaming` | tier 3: the LLM judge says the test verifies nothing | warn |
| `vitest.empty_body` | test callback has no `expect()` | warn |
| `vitest.verified` | calls the production target, asserts on return / thrown error | pass |
Jest support uses the same rule chain as Vitest, plus one Jest-only verdict for the `test.failing` shape that Vitest doesn't have. Substitute `jest.` for the `vitest.` prefix in the table above (the `vi.mock` patterns become `jest.mock`, `vi.fn` becomes `jest.fn`, etc.). Plus:

| Verdict | Pattern | Blocked? |
|---|---|---|
| `jest.test_failing_gaming` | `test.failing("name", () => { throw ... })` / `it.failing(...)` — pins a stub's throw, the runner's xfail-strict equivalent | yes |
`expected: success | reject` is inferred from the test name. The production target (`module.symbol`) is inferred from the imports. Zero config to start.
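As a rough sketch of that inference (the file, module, function names, and the exact naming heuristic below are assumptions for illustration): the import tells Pragma which production symbol is the target, and a reject-style word in the test name flips the expectation from success to reject, so the classifier knows whether to look for a `pytest.raises` or an assertion on a real return value.

```python
# tests/test_checkout.py: illustrative only, names invented.
import pytest

from shop.checkout import apply_coupon  # inferred target: shop.checkout.apply_coupon

def test_apply_coupon_rejects_expired_code():
    # "rejects" in the name -> expected: reject, so a pytest.raises is what counts here.
    with pytest.raises(ValueError):
        apply_coupon("EXPIRED-2023", total=100)

def test_apply_coupon_discounts_total():
    # plain name -> expected: success, so a call plus an assertion on the result counts.
    assert apply_coupon("SAVE10", total=100) == 90
```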
```
pipx install pragma
```

In Claude Code:

```
/plugin install pragma@joncik91/pragma
```

That's it. The plugin's `PreToolUse` hook scans every `Write` of a file matching `test_*.py` / `*/tests/*.py` / `*.test.ts` etc.; `PostToolUse` re-scans on disk to catch `Edit` cases. Tier 1 always runs. Tier 2 is on by default in the hook (set `PRAGMA_COVERAGE_DEFAULT_OFF=1` to disable). Tier 3 is opt-in via `PRAGMA_HOOK_WITH_LLM=1`.
Tier 3 is provider-agnostic. Set an API key for any OpenAI-compatible endpoint. DeepSeek is the default — fast, cheap, caches on its own.
```
export PRAGMA_LLM_API_KEY=sk-...   # DeepSeek by default
export PRAGMA_HOOK_WITH_LLM=1      # turn it on in the plugin
```

Want a different provider? Override the URL and model:

```
export PRAGMA_LLM_BASE_URL=https://api.openai.com/v1
export PRAGMA_LLM_MODEL=gpt-4o-mini
```

Local models work too — point at Ollama, LM Studio, vLLM, anything that speaks `/v1/chat/completions`.
The CLI works on its own.
```
# tier 1 only — fast, deterministic
pragma verify tests path/to/test_login.py

# tier 1 + tier 2 — slower, but catches "imported but never called"
pragma verify tests path/to/test_login.py --with-coverage

# all three tiers — catches semantic gaming AST and coverage can't see
pragma verify tests path/to/test_login.py --with-coverage --with-llm
```

Exit 1 + JSON when blocking. `--human` for one-line-per-test output.
To wire the AST classifier into pre-commit:
```
pragma init-precommit
```

Drops a `.pre-commit-config.yaml` calling `pragma verify tests` on staged test files. See docs/PRECOMMIT.md for the manual snippet.
Ask an AI assistant to make the tests pass and it will. Sometimes
by writing real code. Sometimes by writing `assert True`, mocking the
function under test, or redefining the production class right there
in the test file. Coverage is green, CI is green, nothing is
verified.
Static rules alone turn into a game of whack-a-mole — every new evasion pattern needs a new rule. Pragma plays a different game: tier 1 catches the obvious shapes, tier 2 demands the production code actually run, tier 3 reads both files and asks whether the test verifies behavior.
Tier 3 is the only tier that touches the network. The LLM judge sends
the test source plus the production source to whichever endpoint
`PRAGMA_LLM_BASE_URL` points at. Two implications:
- Treat the test+production payload as you would any other code review send-off. If your codebase is proprietary, point Tier 3 at a self-hosted endpoint (Ollama, LM Studio, vLLM, an internal OpenAI-compatible gateway) rather than a third-party SaaS.
- Tier 3 is opt-in for exactly this reason — the default plugin wiring runs Tier 1 and Tier 2 only. Set `PRAGMA_HOOK_WITH_LLM=1` when you've decided the endpoint is acceptable for your code.
Tiers 1 and 2 stay fully local: AST parsing happens in-process, coverage instrumentation runs in the local interpreter. Nothing leaves the machine unless Tier 3 is on.
PRs welcome. Useful directions:
- More languages. The verdict-table shape generalises — Go's `_test.go`, Rust's `#[test]`, Ruby's RSpec all have analogous gaming patterns.
- Tier 1 verdicts for patterns the AST classifier doesn't yet catch (open an issue with a real-world test file showing the shape).
- Replay corpora — anonymised test files where a tier got it wrong, so the classifier can be tuned.
See docs/PRECOMMIT.md for the existing pre-commit
integration if you want to wire Tier 1 into another project's hooks.
MIT © Joncik91. See LICENSE.