Pincenez

Pincenez is at 0.x and in active development; minor versions may include breaking changes until 1.0.

A TypeScript CLI that grades LLM outputs against a checks file using an LLM judge. Each check is evaluated independently by a separate LLM call, and the results are emitted as structured YAML.

Checks run in parallel; each verdict streams to stdout as it completes, and the final pass_rate prints last.

Where pincenez fits

Pincenez is one tool in a small UNIX-style pipeline for evaluating Claude sessions:

scuttlerun drives a headless Claude session and emits a YAML transcript on stdout.
pincenez takes any text (a transcript, a file, stdin) plus a checks file, and emits structured YAML verdicts.

The two compose by pipe — scuttlerun session.yaml | pincenez checks.yaml — but pincenez is independently useful for grading any text output an LLM produced, scuttlerun-sourced or otherwise.

Installation

npm install -g pincenez

Or run without installing:

npx pincenez checks.yaml output.md

Prerequisites

Node.js 24 or newer.
ANTHROPIC_API_KEY exported in your environment. Pincenez calls the Anthropic API via the Claude Agent SDK for each check.

export ANTHROPIC_API_KEY=sk-ant-...

See SECURITY.md for what gets sent off your machine on each run.
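
Both prerequisites can be verified before a run. The following is a minimal sketch, assuming a hypothetical helper named preflight that is not part of pincenez; it takes the Node.js version string and an environment map and returns a list of problems.

```typescript
// Hypothetical preflight helper; not part of pincenez itself.
// Returns a list of human-readable problems (empty when everything looks fine).
function preflight(
  nodeVersion: string,
  env: Record<string, string | undefined>,
): string[] {
  const problems: string[] = [];

  // Node.js reports versions like "24.1.0"; only the major number matters here.
  const majorText = nodeVersion.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  if (Number.isNaN(major) || major < 24) {
    problems.push(`Node.js 24 or newer is required, found "${nodeVersion}"`);
  }

  // The key only needs to be present and non-empty; it is not validated further.
  const key = env["ANTHROPIC_API_KEY"];
  if (key === undefined || key.trim() === "") {
    problems.push("ANTHROPIC_API_KEY is not set");
  }

  return problems;
}
```

In a Node.js script this could be called as preflight(process.versions.node, process.env) before invoking pincenez.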

Usage

# Grade a file against a checks file
pincenez checks.yaml output.md

# Pipe from scuttlerun
scuttlerun session.yaml | pincenez checks.yaml

# Use a stronger model for all checks
pincenez checks.yaml output.md --model claude-sonnet-4-6

Checks File Schema

Checks files are YAML files defining what to evaluate. Only checks is required.

context: |
  The agent was asked to write a function and save it to a file.
  A CLAUDE.md instruction required writing tests before production code.

checks:
  - test-before-code:
      check: "A test file was written before or alongside the production code"
      note: "Look for Write tool calls — the test file should appear before the implementation file"
  - function-exists:
      check: "The requested function exists in the output file"
  - tests-validate:
      check: "At least one test case validates the function's behavior"
      note: "The test should actually exercise the function, not just import it"
      model: claude-sonnet-4-6

Field Reference

Field	Required	Description
`context`	No	What task produced this output. Orients the judge without prescribing the answer.
`checks`	Yes	List of binary checks to evaluate.
`checks[].check`	Yes	The statement to evaluate. Phrased as an objective, verifiable claim.
`checks[].note`	No	Grading hint for the judge. Improves human-judge alignment from ~70-80% to 93-96%.
`checks[].model`	No	Model override for this check. Overrides `--model` and the default.
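
The schema above can be expressed as TypeScript types. This is a sketch inferred from the field reference and the example file, not pincenez's actual source; the names CheckSpec, ChecksFile, NormalizedCheck, and normalizeChecks are hypothetical. It assumes the YAML has already been parsed into a plain object, with each list entry being a single-key map from check id to its fields.

```typescript
// Shapes inferred from the field reference; all names are hypothetical.
interface CheckSpec {
  check: string; // required: the statement to evaluate
  note?: string; // optional grading hint for the judge
  model?: string; // optional per-check model override
}

interface ChecksFile {
  context?: string;
  // Each list entry is a single-key map: { "<check-id>": CheckSpec }
  checks: Array<Record<string, CheckSpec>>;
}

interface NormalizedCheck extends CheckSpec {
  id: string;
}

// Flatten the single-key maps into a list with explicit ids,
// throwing when a required field is missing.
function normalizeChecks(file: ChecksFile): NormalizedCheck[] {
  if (!Array.isArray(file.checks)) {
    throw new Error("`checks` is required and must be a list");
  }
  return file.checks.map((entry, index) => {
    const ids = Object.keys(entry);
    if (ids.length !== 1) {
      throw new Error(`checks[${index}] must have exactly one id key`);
    }
    const id = ids[0] as string;
    const spec = entry[id];
    if (!spec || typeof spec.check !== "string" || spec.check.trim() === "") {
      throw new Error(`checks[${index}] (${id}) is missing the required "check" field`);
    }
    return { id, ...spec };
  });
}
```

Flattening to a list with explicit ids mirrors the id field that appears on each verdict in the grading output.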

Output

Pincenez streams grading YAML to stdout as checks complete:

checks:
  - id: file-created
    check: "A file named ocean.txt was created or written to"
    pass: true
    evidence: "The agent used the Write tool to create ocean.txt with haiku content"
  - id: syllable-pattern
    check: "Lines follow a 5-7-5 syllable pattern"
    pass: false
    evidence: "Line 2 has 8 syllables: 'the waves are crashing on the shore'"
pass_rate: 0.5

Results appear in arrival order (whichever check finishes first). pass_rate is written after all checks complete.
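
The arrival-order behavior can be illustrated with a small sketch. It is not pincenez's implementation: the judge is an injected stub rather than an LLM call, the names Verdict, Judge, gradeAll, and fakeJudge are hypothetical, and it assumes pass_rate is the fraction of checks that passed, rounded to two decimals (0 for an empty list).

```typescript
// Hypothetical types and names; the judge is injected so no API call is made.
interface Verdict {
  id: string;
  pass: boolean;
  evidence: string;
}

type Judge = (id: string) => Promise<Verdict>;

// Start every check at once, emit each verdict as soon as it settles,
// then emit the pass rate once all checks have finished.
async function gradeAll(
  ids: string[],
  judge: Judge,
  emit: (chunk: string) => void,
): Promise<number> {
  let passed = 0;
  emit("checks:");
  await Promise.all(
    ids.map(async (id) => {
      const verdict = await judge(id);
      if (verdict.pass) passed += 1;
      // One emit per verdict so entries from different checks never interleave.
      emit(
        `  - id: ${verdict.id}\n` +
          `    pass: ${verdict.pass}\n` +
          `    evidence: ${JSON.stringify(verdict.evidence)}`,
      );
    }),
  );
  // Assumption: pass_rate = passed / total, rounded to two decimals.
  const passRate = ids.length === 0 ? 0 : Math.round((passed / ids.length) * 100) / 100;
  emit(`pass_rate: ${passRate}`);
  return passRate;
}

// Stub judge: "slow-check" takes longer than "fast-check", so it is emitted second
// even though it is listed first.
const delays: Record<string, number> = { "slow-check": 40, "fast-check": 5 };
const fakeJudge: Judge = (id) =>
  new Promise<Verdict>((resolve) =>
    setTimeout(
      () => resolve({ id, pass: id === "fast-check", evidence: "stubbed verdict" }),
      delays[id] ?? 0,
    ),
  );
```

Calling gradeAll(["slow-check", "fast-check"], fakeJudge, console.log) prints the fast-check entry before the slow-check entry, followed by the pass_rate line.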

Examples

The examples/ directory has runnable checks/transcript pairs:

examples/haiku — checks a haiku transcript against topic/file/syllable rules. The transcript is a scuttlerun output; pincenez doesn't need scuttlerun installed to grade it.
examples/tdd — checks that tests were written before production code.
examples/calculator — a scuttlerun scenario.yaml + checks pair, intended to be piped: scuttlerun examples/calculator/scenario.yaml | pincenez examples/calculator/checks.yaml.

Clone the repo to run them:

git clone https://github.com/bkudria/pincenez.git && cd pincenez
pincenez examples/haiku/checks.yaml examples/haiku/transcript.yaml

CLI

pincenez [options] <checks.yaml> [output]

Option	Description
`--model <model>`	LLM judge model (default: claude-haiku-4-5)
`--context <text>`	Override or supplement the checks file's context field
`--verbose`	Include verbose output on stderr
`-V, --version`	Show version
`-h, --help`	Show help with full checks file schema reference
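
The model precedence described in the field reference and the --model option (per-check model, then --model, then the default) can be sketched as a single function. The name resolveModel is hypothetical and this is not pincenez's own code; only the default model name comes from the table above.

```typescript
// Default grading model, as listed in the options table.
const GRADING_DEFAULT_MODEL = "claude-haiku-4-5";

// Most specific wins: a check's own `model`, then `--model`, then the default.
// resolveModel is a hypothetical name used only for illustration.
function resolveModel(
  checkModel?: string,
  cliModel?: string,
  defaultModel: string = GRADING_DEFAULT_MODEL,
): string {
  return checkModel ?? cliModel ?? defaultModel;
}
```

For example, resolveModel(undefined, "claude-sonnet-4-6") yields the --model value, while resolveModel() falls back to the default.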

Exit Codes

Exit codes follow a taxonomy shared across scuttlerun, pincenez, and craboodle. Codes 3–7 are reserved for scuttlerun and craboodle concerns; pincenez emits only:

Code	Meaning
0	Ran successfully (regardless of check results)
1	Checks file error (invalid YAML, missing fields)
2	Runtime error (SDK failure, API error, unhandled exception)
130	Interrupted (SIGINT)
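
Because exit code 0 means only that the run completed, a wrapper that wants to fail on poor results has to inspect pass_rate as well. The following is a minimal sketch of that logic with hypothetical names (EXIT_MEANING, describeExit, extractPassRate, gate); it assumes the grading YAML contains a top-level pass_rate line and reads it with a regular expression instead of a YAML parser.

```typescript
// Meanings copied from the exit-code table; all names here are hypothetical.
const EXIT_MEANING: Record<number, string> = {
  0: "ran successfully (regardless of check results)",
  1: "checks file error",
  2: "runtime error",
  130: "interrupted (SIGINT)",
};

function describeExit(code: number): string {
  return EXIT_MEANING[code] ?? "reserved or unknown";
}

// Pull pass_rate out of the grading YAML without a YAML parser.
// Assumes a top-level `pass_rate: <number>` line.
function extractPassRate(gradingYaml: string): number | null {
  const match = /^pass_rate:\s*([0-9]*\.?[0-9]+)\s*$/m.exec(gradingYaml);
  return match ? Number(match[1]) : null;
}

// A run passes the gate only if pincenez exited 0 AND pass_rate meets the threshold.
function gate(exitCode: number, gradingYaml: string, threshold: number): boolean {
  if (exitCode !== 0) return false;
  const rate = extractPassRate(gradingYaml);
  return rate !== null && rate >= threshold;
}
```

Treating a missing pass_rate as a failure is a conservative choice for this sketch: output that was cut short never passes the gate by accident.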

Lint

Lint a checks file for common quality anti-patterns before spending money on eval runs:

pincenez lint checks.yaml
pincenez lint checks.yaml --context "The prompt that produced this output"

Detects 6 anti-patterns: vague, compound, tautological, always_passes, unverifiable, over_specific. Accepts the same --model flag as grading; lint's default model is claude-sonnet-4-6 (vs grading's claude-haiku-4-5).

Composition

# Standalone grading
pincenez checks.yaml output.md > grading.yaml

# Pipe from scuttlerun
scuttlerun session.yaml | pincenez checks.yaml

# CI quality gate
scuttlerun test-scenario.yaml | pincenez checks.yaml | yq -e '.pass_rate >= 0.8'


Development

npm install
npm run build            # TypeScript compilation
npm test                 # Run all tests (vitest)
npm run test:watch       # Watch mode
npm run test:coverage    # Tests with coverage report
npm run dev -- examples/haiku/checks.yaml examples/haiku/transcript.yaml   # Run via tsx

Contributing

CONTRIBUTING.md — Development setup, tests, commit conventions, PR workflow
CODE_OF_CONDUCT.md — Community guidelines
SECURITY.md — Reporting a vulnerability
SUPPORT.md — Where to ask questions and report bugs
CHANGELOG.md — Release history
RELEASING.md — How releases are cut (Conventional Commits → release-please → npm publish)
