Local-first regression intelligence for AI builders.
DriftCheck helps catch behavior drift when prompts, tools, RAG flows, SDKs, or models change. It runs locally first, writes JSON and markdown reports, and publishes a hosted proof card only when you choose.
```
npx @a2zai-ai/driftcheck init
npx @a2zai-ai/driftcheck check
```

During local package development:
```
npm install
npm run smoke
```

DriftCheck creates:

- `.driftcheck/checks/*.yml`: starter packs
- `.driftcheck/runs/latest.json`: the latest run results
- `driftcheck-report.md`: a markdown report
- Tool-Calling Reliability: schema-valid tool arguments, safe fallback behavior, and hallucinated tools.
- RAG Faithfulness: grounded answers, citations, missing-context refusal, and source scope.
- Model Migration: quality, cost, latency, and safety drift when moving between models.
To also write a live model comparison pack:
```
npx @a2zai-ai/driftcheck init --live
```

To run individual packs:

```
npx @a2zai-ai/driftcheck check --pack tool-calling
npx @a2zai-ai/driftcheck check --pack rag-faithfulness
npx @a2zai-ai/driftcheck check --pack model-migration
```

For packs with a live execution block, you can override the baseline and candidate models without editing YAML:
```
OPENAI_API_KEY="sk-..." npx @a2zai-ai/driftcheck check \
  --pack model-migration \
  --baseline-model gpt-4o-mini \
  --candidate-model gpt-4.1-mini
```

The same values can be set with environment variables:
```
DRIFTCHECK_BASELINE_MODEL=gpt-4o-mini \
DRIFTCHECK_CANDIDATE_MODEL=gpt-4.1-mini \
OPENAI_API_KEY="sk-..." \
npx @a2zai-ai/driftcheck check --pack model-migration
```

Static packs still run without API keys. Model overrides only affect packs that define `execution.provider`.
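As a rough mental model of the overrides above, resolution can be sketched as a simple fallback chain. The precedence order is an assumption here (CLI flag over environment variable over pack YAML) and is not DriftCheck's documented behavior:

```python
import os

def resolve_model(cli_value, env_var, yaml_value):
    """Pick a model name by assumed precedence: CLI flag, then env var, then pack YAML.

    Illustrative only; DriftCheck's actual resolution order is internal.
    """
    if cli_value:
        return cli_value
    return os.environ.get(env_var) or yaml_value

# Example: no CLI flag, env var set, hypothetical YAML default "gpt-4o".
os.environ["DRIFTCHECK_BASELINE_MODEL"] = "gpt-4o-mini"
print(resolve_model(None, "DRIFTCHECK_BASELINE_MODEL", "gpt-4o"))  # gpt-4o-mini
```

If none of the three sources is set, the function returns the YAML value unchanged (possibly `None`), which mirrors how static packs simply keep whatever their YAML declares.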
Use compare when you want to check a model migration without editing YAML first:
```
OPENAI_API_KEY="sk-..." npx @a2zai-ai/driftcheck compare \
  --baseline-model gpt-4o-mini \
  --candidate-model gpt-4.1-mini
```

This runs the built-in Live Model Compare pack and writes the same `.driftcheck/runs/latest.json` and `driftcheck-report.md` outputs.
```
npx @a2zai-ai/driftcheck summary --run .driftcheck/runs/latest.json
```

The GitHub Action writes this summary to the workflow run automatically, so PR authors can see the overall score, dimension scores, model pair, and cases needing review without opening artifacts.
Publishing is explicit. Reports stay local unless you run publish.
```
DRIFTCHECK_TOKEN="paste-token-here" npx @a2zai-ai/driftcheck publish --run .driftcheck/runs/latest.json --public
```

The hosted proof layer currently lives at A2ZAI:
```
DRIFTCHECK_API_URL="https://www.a2zai.ai" npx @a2zai-ai/driftcheck publish --run .driftcheck/runs/latest.json --public
```

Packs live in `.driftcheck/checks/*.yml`.
```yaml
id: tool-calling
name: Tool-Calling Reliability
category: tool-calling
description: Catch schema drift, hallucinated tool calls, and weak fallback behavior before agent changes ship.
cases:
  - name: Valid tool arguments
    dimension: quality
    weight: 3
    threshold: 80
    baselineOutput: "call_tool({ user: 'acct_123', action: 'refund_review' })"
    candidateOutput: "call_tool({ userId: 'acct_123', action: 'refund_review' })"
    expectedContains:
      - userId
      - action
    forbiddenContains:
      - malformed
      - undefined
```

Supported categories:
- `tool-calling`
- `rag-faithfulness`
- `model-migration`
Supported score dimensions:
- `quality`
- `safety`
- `latency`
- `cost`
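DriftCheck's exact scoring formula is internal. As an illustration only, the `weight` field in the pack YAML suggests a weight-averaged pack score over 0-100 case scores, which could look like:

```python
# Illustrative only: DriftCheck's real scoring formula is internal.
# Each case contributes a 0-100 score with a weight; a heavier case
# (e.g. weight: 3) pulls the average harder than a weight-1 case.

def weighted_score(cases):
    """cases: list of (score, weight) pairs; returns a 0-100 weighted average."""
    total_weight = sum(w for _, w in cases)
    if total_weight == 0:
        return 0.0
    return sum(s * w for s, w in cases) / total_weight

# Two cases: a weight-3 case scoring 90 and a weight-1 case scoring 50.
print(weighted_score([(90, 3), (50, 1)]))  # 80.0
```

Under this sketch, a per-case `threshold: 80` would flag the weight-1 case for review even though the aggregate clears 80.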
Static outputs work without API keys. To compare live OpenAI model responses, add an execution block and set OPENAI_API_KEY.
```yaml
execution:
  provider: openai
  baselineModel: gpt-4o-mini
  candidateModel: gpt-4.1-mini
  temperature: 0
  maxTokens: 140
```

After this repo is public, use:
```yaml
name: DriftCheck
on:
  pull_request:
jobs:
  driftcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: a2zai-ai/driftcheck@v0
        with:
          fail-threshold: 70
          baseline-model: gpt-4o-mini
          candidate-model: gpt-4.1-mini
```

DriftCheck is local-first:
- Pack files stay in your repo.
- Reports are written locally.
- Publish is opt-in.
- Known secret patterns are redacted from generated reports before publish.
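The specific secret patterns DriftCheck redacts are not listed here. As a sketch of the idea, a redaction pass might scrub common key shapes from report text before anything leaves the machine; the pattern list below is hypothetical, not DriftCheck's actual rule set:

```python
import re

# Hypothetical patterns for illustration; DriftCheck's actual
# redaction rules are internal to the tool.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),             # OpenAI-style API keys
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]{8,}"),  # bearer tokens
]

def redact(text: str) -> str:
    """Replace anything matching a known secret pattern with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("called with OPENAI_API_KEY=sk-abc123def456"))
# called with OPENAI_API_KEY=[REDACTED]
```

Running redaction on the report text (rather than trying to keep secrets out of model outputs) means even a prompt that echoes a key back cannot leak it through a published proof card.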
- npm package publication as `@a2zai-ai/driftcheck`
- standalone `a2zai-ai/driftcheck` public repo
- richer GitHub Action summaries
- more starter packs for agents, support bots, coding workflows, and RAG apps