xvivo

Skill test harness with autonomous prompt optimization.

xvivo tests Claude skill files (.md prompts and Python scripts) the way pytest tests code. It offers two modes: fast pass/fail testing, and an autonomous optimization loop that iteratively improves skill quality.

Quick start

# Install
npm install

# Run tests
xvivo run --dir ./skills

# Optimize a skill section
xvivo optimize --skill fishbone-diagram --section cause-identification --spec fishbone-diagram.optimize.yaml

# Train ML evaluators (optional)
npm run prepare:ml
xvivo train all --skill fishbone-diagram

# Check model health
xvivo models check --skill fishbone-diagram

Two modes

`xvivo run` — fast pass/fail

Discovers .test.yaml specs, runs them against skill files, and reports results. Each test uses one of two execution modes:

Agent-driven: sends the skill prompt and test input to the Claude Agent SDK, then evaluates the output
Direct: runs a Python script with fixture inputs, then evaluates stdout and output files
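
The two execution modes can be modeled as a discriminated union, so the runner can pick a code path from the spec alone. This is only a sketch: the type and field names (`TestCase`, `mode`, `skillPath`, `scriptPath`, `fixtures`) are assumptions made for illustration and are not taken from xvivo's actual spec schema.

```typescript
// Hypothetical shapes for the two execution modes. Field names are
// illustrative assumptions, not xvivo's real spec format.
type AgentTest = { mode: "agent"; name: string; skillPath: string; input: string };
type DirectTest = { mode: "direct"; name: string; scriptPath: string; fixtures: string[] };
type TestCase = AgentTest | DirectTest;

// Describe what a runner would do for a test, based only on its mode.
function describePlan(test: TestCase): string {
  switch (test.mode) {
    case "agent":
      return `send ${test.skillPath} plus the test input to the agent, then evaluate the reply`;
    case "direct":
      return `run ${test.scriptPath} with ${test.fixtures.length} fixture(s), then evaluate stdout and output files`;
  }
}
```

Because `mode` is a literal tag, the compiler narrows `test` inside each branch, so a direct test cannot accidentally read agent-only fields.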

Designed for CI/CD. Exits 0 on all pass, 1 on any failure.
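
The discovery rule and the exit-code contract above are small enough to sketch directly. The helper names (`discoverSpecs`, `exitCodeFor`) and the `TestResult` shape are hypothetical, and returning 0 for an empty result set is a choice made by this sketch, not documented xvivo behavior.

```typescript
// Keep only files that follow the .test.yaml naming convention.
function discoverSpecs(files: string[]): string[] {
  return files.filter((f) => f.endsWith(".test.yaml"));
}

// Hypothetical result shape; xvivo's real result type may differ.
type TestResult = { name: string; passed: boolean };

// 0 when every test passed, 1 when any test failed.
// Note: an empty list yields 0 here, which is an assumption of this sketch.
function exitCodeFor(results: TestResult[]): 0 | 1 {
  return results.every((r) => r.passed) ? 0 : 1;
}
```

In a CI job, the returned value would be handed to the process exit so that a single failing test fails the pipeline.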

`xvivo optimize` — Karpathy loop

Implements the autoresearch pattern for skill improvement:

1. Score the skill section against 25 boolean criteria to establish a baseline
2. An agent applies one mutation (add a constraint, tighten language, add an example, etc.)
3. Commit the change with git
4. Re-score against the same 25 criteria
5. If the score improved, keep the commit; if it is the same or worse, git revert
6. Repeat from step 2

Runs indefinitely or until a stopping condition is met (target score, plateau, cost limit).
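
The keep-or-revert loop can be sketched as a pure function with scoring and mutation injected. Everything named here (`optimize`, `StopConfig`, `plateauLimit`, `maxIterations`) is hypothetical. In particular, `maxIterations` stands in for the cost limit, and the git commit and revert steps are reduced to keeping or discarding a candidate string.

```typescript
// All names are hypothetical. Scoring and mutation are injected so the loop
// stays independent of the agent and of git.
type StopConfig = { targetScore: number; plateauLimit: number; maxIterations: number };
type LoopResult = { text: string; score: number; iterations: number; kept: number };

function optimize(
  initial: string,
  score: (text: string) => number, // e.g. number of criteria passed, 0..25
  mutate: (text: string, iteration: number) => string,
  stop: StopConfig,
): LoopResult {
  let best = initial;
  let bestScore = score(best); // baseline
  let sinceImprovement = 0;
  let kept = 0;
  let iterations = 0;

  while (
    bestScore < stop.targetScore &&
    sinceImprovement < stop.plateauLimit &&
    iterations < stop.maxIterations
  ) {
    const candidate = mutate(best, iterations);
    const candidateScore = score(candidate);
    iterations += 1;
    if (candidateScore > bestScore) {
      // Improved: keep the change (the commit stays).
      best = candidate;
      bestScore = candidateScore;
      sinceImprovement = 0;
      kept += 1;
    } else {
      // Same or worse: discard the candidate (the equivalent of git revert).
      sinceImprovement += 1;
    }
  }
  return { text: best, score: bestScore, iterations, kept };
}
```

Requiring a strict improvement means the accepted score never decreases, and the plateau counter gives a natural stopping point once mutations stop helping.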

Three evaluator classes

| Class | Engine | Speed | Deterministic | Cost |
|-------|--------|-------|---------------|------|
| A | TypeScript (contains, regex, schema) | <1ms | Yes | Free |
| B | Claude API (LLM-as-judge) | 1-5s | No | API credits |
| C | PyTorch (embeddings, classifiers) | 10-100ms | Yes (once trained) | Free (local) |

Class C evaluators are optional — the harness degrades gracefully to Class B if PyTorch is not installed.
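
To make the classes concrete, here is a sketch of two Class A style checks and of the fallback from Class C to Class B. The function names (`contains`, `matches`, `semanticClass`) and the `Verdict` shape are assumptions for illustration, schema validation is omitted, and xvivo's real evaluator interface may look different.

```typescript
// Hypothetical result shape shared by the checks below.
type Verdict = { pass: boolean; detail: string };

// Class A style checks: pure functions that give the same answer every time.
function contains(output: string, needle: string): Verdict {
  const pass = output.includes(needle);
  return { pass, detail: pass ? `found "${needle}"` : `missing "${needle}"` };
}

function matches(output: string, pattern: string): Verdict {
  // A fresh RegExp per call avoids stateful lastIndex surprises.
  const pass = new RegExp(pattern).test(output);
  return { pass, detail: `/${pattern}/ ${pass ? "matched" : "did not match"}` };
}

// Pick the engine for checks that need semantic judgment:
// Class C when PyTorch is available, otherwise Class B.
function semanticClass(pytorchAvailable: boolean): "B" | "C" {
  return pytorchAvailable ? "C" : "B";
}
```

Running the cheap deterministic checks first and reserving Class B or C for what they cannot express keeps most of a test run fast and free.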

Project structure

src/
  cli/             Ink TUI components and Pastel CLI commands
  runner/          Test discovery, orchestration, result collection
  evaluators/      Class A, B, C evaluator implementations
  optimizer/       Karpathy loop: mutation, keep/revert, logging
  sandbox/         Temp directory lifecycle for script execution
  parsers/         Markdown section parser, YAML spec loader
  types/           Zod schemas and TypeScript type definitions
scripts/ml/        Python ML evaluator scripts
models/            Trained model checkpoints (gitignored except metadata)
tests/             Unit and integration tests
docs/              Requirements and design documents
.claude/           References for Claude Code development sessions

Requirements

Node.js 20+
Python 3.10+ (for skill scripts and ML evaluators)
ANTHROPIC_API_KEY environment variable (for Class B evaluators and agent-driven tests)
PyTorch + sentence-transformers (optional, for Class C evaluators)
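
A preflight check for the two hard requirements that can be verified from inside Node might look like the sketch below. `missingRequirements` is a hypothetical helper, not part of xvivo. It takes the version string and environment as arguments so it stays a pure function, and it does not check Python or PyTorch.

```typescript
// Hypothetical preflight helper: returns a list of unmet requirements.
function missingRequirements(
  nodeVersion: string,
  env: Record<string, string | undefined>,
): string[] {
  const problems: string[] = [];
  // Accept both "20.11.0" and "v20.11.0".
  const major = Number.parseInt(nodeVersion.replace(/^v/, "").split(".")[0] ?? "", 10);
  if (Number.isNaN(major) || major < 20) {
    problems.push(`Node.js 20+ required, found ${nodeVersion}`);
  }
  if (!env["ANTHROPIC_API_KEY"]) {
    problems.push("ANTHROPIC_API_KEY is not set (needed for Class B evaluators and agent-driven tests)");
  }
  return problems;
}
```

In a Node process this would typically be called as `missingRequirements(process.versions.node, process.env)`, with any returned messages printed before a run starts.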

Documentation

ML Integration Requirements — full spec for the ML evaluation pipeline
.claude/references/ — best practices per technology for development
