xp

Autonomous experiment daemon. Point an LLM at any benchmark and it optimizes the metric in a loop.

Install

bun run build   # compiles binary to bin/xp + symlinks to ~/.bun/bin/

Usage

# Start an experiment
xp start optimize-fft \
  --metric latency --unit ms --direction min \
  --benchmark "./bench.sh" \
  --objective "reduce FFT latency" \
  --provider claude

# Monitor
xp status            # current state
xp logs              # daemon output
xp logs -f           # tail daemon output
xp results           # all trial results
xp results --last 5  # last 5 trials

# Steer the agent mid-run
xp steer "try SIMD intrinsics instead of auto-vectorization"

# Stop
xp stop
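The `--benchmark` command can be any executable that prints METRIC lines (see Benchmark Contract). As an illustration only, here is a hypothetical `bench.ts` that times a placeholder workload and prints its median latency. The file name, the workload, and the run count are assumptions, not part of xp.

```typescript
// bench.ts: hypothetical benchmark following xp's METRIC output contract.
// Could be passed as: --benchmark "bun run bench.ts"

// Placeholder workload; swap in the code you want the agent to optimize.
function workload(n: number): number {
  let acc = 0;
  for (let i = 0; i < n; i++) acc += Math.sqrt(i);
  return acc;
}

// Run `fn` `runs` times and return the median wall-clock time in ms.
// The median is less sensitive to outliers (GC pauses, scheduling) than the mean.
function medianMs(fn: () => unknown, runs = 21): number {
  const samples: number[] = [];
  for (let r = 0; r < runs; r++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Format one metric as a contract line: METRIC name=value
function metricLine(name: string, value: number): string {
  return `METRIC ${name}=${value}`;
}

const latency = medianMs(() => workload(1_000_000));
console.log(metricLine("latency", Number(latency.toFixed(3))));
```

Using the median of several runs keeps a single slow run from being mistaken for a regression, which matters when the daemon keeps or reverts changes based on one number.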

Commands

Command	Description
`start <name>`	Initialize and start an experiment
`stop`	Stop the daemon
`status`	Show experiment state (`--json`)
`logs`	View daemon log (`-f` to follow)
`results`	Show trial results (`--last N`, `--json`)
`steer <guidance>`	Send guidance to the running experiment

`start` Flags

Flag	Description	Default
`--metric`	Metric name to optimize	required
`--unit`	Metric unit	`""`
`--direction`	`min` or `max`	required
`--benchmark`	Shell command that emits `METRIC name=value`	required
`--objective`	What the agent should optimize	required
`--provider`	`claude` or `codex`	`claude`
`--max-iterations`	Budget cap	`50`
`--max-failures`	Max consecutive failures	`5`
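The `--direction` flag decides which way a score counts as better, and `--max-iterations` and `--max-failures` bound the run. A minimal sketch of those two checks, using the documented defaults; `isImprovement` and `shouldStop` are hypothetical names, and treating a tie as "no improvement" is an assumption rather than documented xp behavior.

```typescript
type Direction = "min" | "max";

// True if `candidate` beats `best` under the given direction.
// Assumption: a tie is not an improvement, so an equal score would be reverted.
function isImprovement(candidate: number, best: number, direction: Direction): boolean {
  return direction === "min" ? candidate < best : candidate > best;
}

// Hypothetical budget check mirroring --max-iterations and --max-failures.
function shouldStop(
  iterationsRun: number,
  consecutiveFailures: number,
  maxIterations = 50, // --max-iterations default
  maxFailures = 5, // --max-failures default
): boolean {
  return iterationsRun >= maxIterations || consecutiveFailures >= maxFailures;
}
```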

Benchmark Contract

The benchmark command must print metrics to stdout in this format:

METRIC latency=42.5
METRIC throughput=1200

One METRIC name=value per line. The --metric flag selects which one to optimize.
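A minimal sketch of how this output could be parsed. It assumes metric names consist of letters, digits, `_`, `.` or `-`, that non-METRIC lines (progress output, warnings) are ignored, and that a repeated name keeps its last value. `parseMetrics` and `selectMetric` are illustrative helpers, not xp's internal API.

```typescript
// Matches one contract line, e.g. "METRIC latency=42.5".
const METRIC_LINE = /^METRIC\s+([\w.-]+)=(\S+)$/;

// Parse every METRIC line from benchmark stdout into a name -> value map.
function parseMetrics(stdout: string): Map<string, number> {
  const metrics = new Map<string, number>();
  for (const line of stdout.split(/\r?\n/)) {
    const match = METRIC_LINE.exec(line.trim());
    if (!match) continue; // not a METRIC line
    const value = Number(match[2]);
    if (Number.isFinite(value)) metrics.set(match[1], value); // skip non-numeric values
  }
  return metrics;
}

// Pick the metric named by --metric, failing loudly if the benchmark omitted it.
function selectMetric(stdout: string, name: string): number {
  const value = parseMetrics(stdout).get(name);
  if (value === undefined) throw new Error(`benchmark did not emit METRIC ${name}=...`);
  return value;
}
```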

How It Works

Baseline: runs the benchmark on the current code to establish a starting point
Loop: invokes the LLM agent with context (objective, best score, dead ends, user guidance); the agent makes changes in a git worktree, the benchmark runs, and the result is kept or reverted
Persistence: all events logged to append-only JSONL, crash-safe with two-phase decisions
Worktree isolation: experiments run in .xp/worktree/ on an xp/<name> branch — your working directory stays clean
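The baseline/loop/persistence steps above can be sketched roughly as follows. This is a hypothetical reconstruction, not xp's source: the event shapes, the callback names (`runAgent`, `runBenchmark`, `keep`, `revert`), and the reading of "two-phase decisions" as a `decide` record written before the git operation plus an `applied` record written after it are all assumptions. Counting only agent or benchmark errors toward `--max-failures` is likewise assumed.

```typescript
type Direction = "min" | "max";

// Hypothetical event shapes; xp's real log schema may differ.
type XpEvent =
  | { type: "baseline"; score: number }
  | { type: "decide"; iteration: number; score: number | null; keep: boolean }
  | { type: "applied"; iteration: number };

interface LoopOptions {
  direction: Direction;
  maxIterations: number;
  maxFailures: number;
  append: (line: string) => void; // hypothetical sink, e.g. appendFileSync to a .jsonl file
  runBenchmark: () => Promise<number>; // hypothetical: runs --benchmark, returns the selected metric
  runAgent: (best: number) => Promise<void>; // hypothetical: agent edits the worktree
  keep: () => Promise<void>; // hypothetical: e.g. commit on the xp/<name> branch
  revert: () => Promise<void>; // hypothetical: e.g. discard worktree changes
}

function isBetter(candidate: number, best: number, direction: Direction): boolean {
  return direction === "min" ? candidate < best : candidate > best;
}

async function runLoop(opts: LoopOptions): Promise<number> {
  // Append-only JSONL: one JSON object per line, earlier lines never rewritten.
  const log = (event: XpEvent) => opts.append(JSON.stringify(event) + "\n");

  let best = await opts.runBenchmark(); // baseline on the unmodified code
  log({ type: "baseline", score: best });

  let failures = 0; // consecutive agent/benchmark errors
  for (let i = 1; i <= opts.maxIterations && failures < opts.maxFailures; i++) {
    let score: number | null = null;
    try {
      await opts.runAgent(best);
      score = await opts.runBenchmark();
      failures = 0;
    } catch {
      failures++; // assumption: only errors count toward --max-failures
    }
    const keep = score !== null && isBetter(score, best, opts.direction);

    // Phase 1: persist the decision before touching git.
    log({ type: "decide", iteration: i, score, keep });
    if (keep && score !== null) {
      await opts.keep();
      best = score;
    } else {
      await opts.revert();
    }
    // Phase 2: mark it applied. After a crash, a "decide" with no matching
    // "applied" says which keep/revert still has to be (re)done.
    log({ type: "applied", iteration: i });
  }
  return best;
}
```

Injecting `append` and the git callbacks keeps the loop testable without a real repository; a daemon would wire them to a JSONL file and to commit/reset operations in the worktree.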

Development

bun run dev -- --help   # run from source
bun run gate            # typecheck + lint + fmt + test + build
bun test                # tests only
