Test your AI agents like you test your code.
Documentation | Examples | Contributing
Attest is an open-source testing framework purpose-built for AI agents and LLM-powered systems. It treats deterministic assertions as first-class citizens alongside probabilistic evaluation — because 70% of your agent's testable surface is deterministic.
Status: v0.6.0 — 8 assertion layers, 11 adapters (Python) + 7 adapters (TypeScript incl. LangChain.js), continuous eval with drift detection, dynamic thresholds, plugin system, result history (SQLite), CLI (Python + TypeScript), budget tracking, engine timeouts, concurrent evaluation, schema caching, branded types, CJS/ESM dual output, pytest + vitest integrations (incl. simulation mode),
queryDrift+generateUserMessageSDK methods, 4 built-in simulation personas, comprehensive test coverage.
The current ecosystem defaults to LLM-as-judge for everything. This creates cost explosion, flaky tests, and slow suites. Attest provides a graduated 8-layer assertion model that reaches for the cheapest valid assertion first:
Layer 1: Schema Validation — free, instant, deterministic
Layer 2: Constraint Checks — free, instant, deterministic
Layer 3: Trace/Behavioral Checks — free, instant, deterministic
Layer 4: Content Pattern Matching — free, instant, near-deterministic
Layer 5: Embedding Similarity — ~$0.001, ~100ms, near-deterministic
Layer 6: LLM-as-Judge — ~$0.01+, ~1-3s, non-deterministic
Layer 7: Simulation — free (mocked), variable
Layer 8: Multi-Agent — free, instant, deterministic
from attest import agent, expect
@agent("refund-agent")
def refund_agent(builder, order_id):
builder.add_tool_call("lookup_order", args={"order_id": order_id},
result={"status": "delivered", "amount": 89.99})
builder.add_tool_call("process_refund", args={"order_id": order_id},
result={"refund_id": "RFD-001"})
builder.set_metadata(cost_usd=0.005, total_tokens=150, latency_ms=1200)
return {"message": "Your refund of $89.99 has been processed.",
"structured": {"refund_id": "RFD-001"}}
def test_refund(attest):
result = refund_agent(order_id="ORD-123456")
chain = (
expect(result)
# Layer 1 — schema (free, instant)
.output_matches_schema({"type": "object", "required": ["refund_id"]})
# Layer 2 — constraints (free, instant)
.cost_under(0.05)
# Layer 3 — trace (free, instant)
.tools_called_in_order(["lookup_order", "process_refund"])
# Layer 4 — content (free, instant)
.output_contains("refund")
.output_not_contains("sorry")
# Layer 6 — LLM judge (only when needed)
.passes_judge("Is the response helpful and accurate?")
)
attest.evaluate(chain)Attest follows a protocol-first, engine-centric architecture inspired by LSP and Playwright:
- Core engine (Go) — single binary, handles all evaluation, assertion, and simulation logic
- Language SDKs (Python, TypeScript, Go) — thin, idiomatic wrappers that communicate with the engine via JSON-RPC 2.0 over stdio
- Protocol — SDK spawns engine as subprocess; capability negotiation decouples SDK and engine release cycles
SDK (Python/TS/Go) ──stdin/stdout──▶ Engine Process (Go)
├── Trace Processor
├── 8-Layer Assertion Pipeline
├── Result History (SQLite)
├── Drift Detection (σ-based)
├── Simulation Runtime
└── Report Generator
- 8-layer assertion pipeline — graduated from free/deterministic to paid/probabilistic
- 11 Python adapters — OpenAI, Anthropic, Gemini, Ollama, LangChain, Google ADK, LlamaIndex, CrewAI, OTel, Manual, Simulation
- 7 TypeScript adapters — OpenAI, Anthropic, Gemini, Ollama, LangChain.js, OTel, Manual
- Soft failure budgets — scores between 0.5–0.8 warn without blocking CI
- Cost as a test metric — assert on token usage, API cost, and latency
- Budget enforcement — configurable max cost per evaluation batch via
ATTEST_BUDGET_MAX_COST - Python & TypeScript SDKs —
attest-ai(PyPI) +@attest-ai/core/@attest-ai/vitest(npm) - TypeScript CLI —
npx @attest-ai/core init,validate,cache stats/clear - CJS/ESM dual output — TypeScript SDK supports both
require()andimport, tree-shakeable - Branded types —
TraceId,AssertionId,AgentIdprevent accidental string assignment (TypeScript) - Discriminated union specs — type-safe assertion specs per layer (TypeScript)
- Local ONNX embeddings — optional all-MiniLM-L6-v2 provider, zero API cost for Layer 5
- Judge meta-evaluation — 3x judge runs with median scoring and variance detection
- CI-ready — composite GitHub Action, tiered testing workflow, adversarial hardening
- Simulation runtime — simulated users with personas, mock tools, fault injection, multi-turn orchestration
- Multi-agent testing — hierarchical trace trees, cross-agent assertions, delegation tracking, temporal assertions, aggregate metrics
- Continuous eval & drift detection — sample production traces, σ-based drift alerts to webhooks/Slack (Python + TypeScript)
- Plugin system — extend with custom assertions via
attest.pluginsentry points (Python + TypeScript) - Result history — SQLite-backed with configurable retention, automatic pruning
- Engine performance — schema compiler cache, deferred LRU writes, batch SQL eviction,
segmentio/encoding(2.4x JSON speedup) - Configurable timeouts — engine read timeout, concurrent request support, judge cache sizing
- CLI —
attest initscaffolding andattest validateconfig checking (Python + TypeScript) - Single binary engine — no runtime dependencies, cross-platform
attest/
├── proto/ # Protocol specification (JSON-RPC 2.0)
├── engine/ # Core engine (Go)
│ ├── cmd/ # CLI entrypoint
│ ├── internal/ # Assertion pipeline, trace model, simulation
│ └── pkg/ # Public Go packages
├── sdks/
│ ├── python/ # Python SDK (PyPI: attest-ai)
│ ├── typescript/ # TypeScript SDK (npm: @attest-ai/core)
│ └── go/ # Go SDK
├── .github/
│ └── actions/ # Reusable composite actions (setup-attest)
├── docs/ # Documentation
├── examples/ # Standalone example projects (incl. CI workflows)
└── scripts/ # Build and development scripts
git clone https://github.com/attest-framework/attest.git
cd attest
# Build engine + install Python SDK
make dev-setup
# Run all tests
make test
# Verify engine
./bin/attest-engine version
# attest-engine 0.4.0cd sdks/typescript
pnpm install
pnpm build
pnpm testInstall in your project:
pnpm add @attest-ai/core @attest-ai/vitest| Phase | Version | Status | Description |
|---|---|---|---|
| 0 | — | Complete | Repository scaffolding, toolchain, protocol spec |
| 1 | v0.1 | Complete | Go engine (Layers 1–4), Python SDK, pytest plugin, 4 LLM adapters |
| 2 | v0.2 | Complete | Layers 5–6 (embeddings, LLM-as-judge), soft failures, CI integration |
| 3 | v0.3 | Complete | Simulation runtime, multi-agent testing, TypeScript SDK, framework adapters, temporal assertions |
| 4 | v0.4 | Complete | Continuous eval, drift detection, plugin system, CrewAI adapter, result history, CLI |
| 5 | v0.5.0 | Complete | The Alignment Release — engine/SDK version unification, schema cache, segmentio/encoding, full TypeScript parity, test coverage |
| 6 | v0.6.0 | Complete | SDK parity with engine — queryDrift, generateUserMessage, dynamic thresholds, vitest simulation mode, 4 personas, sampleRate |
| 7 | v0.7 | Planned | Go SDK, Attest Cloud MVP, benchmark registry |
See CONTRIBUTING.md for development setup, workflow, and code standards.
