
Performance Baselines

Simon B. Stirling edited this page Mar 2, 2026 · 6 revisions

Performance Baseline and Regression Gates (v1)

I use this document as the frozen performance-baseline contract for the current bootstrap toolchain.

Contract version: perfbase.v1

What I Freeze in v1

I freeze deterministic throughput floors (ops/sec) for representative CLI workloads in the pinned Linux x86-64 bootstrap environment.

The gated workloads are:

  • verify on tests/valid_min.l0
  • build on tests/valid_min.l0
  • run on valid_min image (2-arg arithmetic kernel)
  • run on valid_sysv_abi_sum6_lowered image (6-arg SysV kernel)
  • build-elf on tests/valid_sysv_abi_sum6_lowered.l0
  • mapcat, schemacat, tracecat, tracejoin on deterministic trace artifacts
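The gated metric for each workload is throughput in ops/sec. As a minimal sketch (not the actual harness), a floor measurement for one workload could time a fixed number of CLI invocations; the `./l0` binary path and the iteration count here are assumptions:

```shell
#!/bin/sh
# Sketch: measure ops/sec for one gated workload by timing N invocations.
# "./l0" and ITERS=200 are assumptions, not taken from the real harness.
ITERS=200
start=$(date +%s%N)                 # nanoseconds since epoch (GNU date)
i=0
while [ "$i" -lt "$ITERS" ]; do
  ./l0 verify tests/valid_min.l0 >/dev/null 2>&1
  i=$((i + 1))
done
end=$(date +%s%N)
elapsed_ns=$((end - start))
[ "$elapsed_ns" -gt 0 ] || elapsed_ns=1   # guard against zero division
# ops/sec = iterations * 1e9 / elapsed nanoseconds (integer arithmetic)
ops=$((ITERS * 1000000000 / elapsed_ns))
echo "verify.valid_min: ${ops} ops/s"
```

Integer arithmetic is deliberate: the floors below are coarse enough that sub-op/s precision adds nothing.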

Enforcement

I enforce this contract in:

  • tests/performance_gates.sh

That harness is integrated into tests/run.sh and therefore enforced by make test.

I also run this automatically in GitHub Actions:

  • .github/workflows/performance.yml on nightly schedule and on version tags (v*).
  • CI uses a blocking smoke gate (tests/ci_smoke_bench.sh) plus a non-blocking full-gate observation run (tests/performance_gates.sh) because hosted runner throughput is less stable than my pinned local baseline environment.
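The blocking/non-blocking split can be sketched as two wrappers; `true` and `false` stand in for the real scripts (tests/ci_smoke_bench.sh, tests/performance_gates.sh), and these helper names are hypothetical:

```shell
#!/bin/sh
# Sketch of the CI gating shape: the smoke gate blocks, the full gate
# only observes. The commands below are placeholders for the real scripts.
run_blocking() { "$@"; }   # propagate failure: under set -e this fails the job
run_observed() { "$@" || echo "non-blocking failure observed: $*"; }

run_blocking true          # stands in for tests/ci_smoke_bench.sh
run_observed false         # stands in for tests/performance_gates.sh
```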

Pinned Throughput Floors in v1

  • verify.valid_min >= 1800 ops/s
  • build.valid_min >= 1300 ops/s
  • run.add >= 2400 ops/s
  • run.sum6 >= 2200 ops/s
  • build-elf.sum6 >= 1100 ops/s
  • mapcat.trace_map >= 2500 ops/s
  • schemacat.trace_schema >= 2500 ops/s
  • tracecat.trace_bin >= 2500 ops/s
  • tracejoin.trace_bin+map >= 2300 ops/s
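A generic floor check over these pinned values could look like the following sketch; the function name and the sample measured value (2105) are illustrative, not from tests/performance_gates.sh:

```shell
#!/bin/sh
# Sketch: fail the gate when a measured throughput is below its pinned floor.
# check_floor NAME MEASURED FLOOR
check_floor() {
  name=$1; measured=$2; floor=$3
  if [ "$measured" -lt "$floor" ]; then
    echo "FAIL ${name}: ${measured} ops/s < floor ${floor}" >&2
    return 1
  fi
  echo "PASS ${name}: ${measured} ops/s >= floor ${floor}"
}

# Example with an invented measurement against the v1 floor:
check_floor verify.valid_min 2105 1800
```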

CI Runner Profile

I support a CI profile for hosted runners by setting:

  • L0_PERF_PROFILE=ci

This profile uses conservative floors intended to absorb the run-to-run variance of shared GitHub-hosted runners. My local/pinned production profile remains the default when this variable is not set.
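Profile selection could be sketched as a simple branch; the pinned floor below matches this contract, while the `ci` value is an assumed placeholder, not the real CI floor:

```shell
#!/bin/sh
# Sketch: select a floor based on L0_PERF_PROFILE.
# The "ci" floor (900) is a hypothetical placeholder; 1800 is the pinned
# verify.valid_min floor from this contract.
if [ "${L0_PERF_PROFILE:-}" = "ci" ]; then
  FLOOR_VERIFY_VALID_MIN=900     # hypothetical conservative CI floor
else
  FLOOR_VERIFY_VALID_MIN=1800    # pinned default floor (perfbase.v1)
fi
echo "verify.valid_min floor: ${FLOOR_VERIFY_VALID_MIN} ops/s"
```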

Out of Scope in v1

  • Cross-machine or cross-CPU performance comparability guarantees.
  • Full benchmarking methodology for release marketing claims.
  • Auto-tuning of thresholds; v1 thresholds are pinned and explicitly versioned.

Reproducibility Notes for Comparison Reports

For apples-to-apples comparison snapshots, I now capture reproducibility metadata and confidence hints in:

  • docs/PERFORMANCE_COMPARISON_APPLES_TO_APPLES.md
  • docs/PERFORMANCE_COMPARISON_APPLES_TO_APPLES.json

I can control this benchmark behavior with:

  • L0_A2A_PIN_CPU (optional CPU affinity, e.g. 0)
  • L0_A2A_BUILD_ITERS
  • L0_A2A_BUILD_SAMPLES
  • L0_A2A_RUNTIME_ITERS
  • L0_A2A_RUNTIME_SAMPLES
  • L0_A2A_WARMUP_RUNS
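A usage sketch for these knobs: only the variable names come from this page; the values are examples, and the actual benchmark entry point is not specified here, so it is left as a comment:

```shell
#!/bin/sh
# Sketch: pin the A2A benchmark to CPU 0 and raise sample counts.
# Values are illustrative; the real driver invocation is not shown here.
export L0_A2A_PIN_CPU=0
export L0_A2A_BUILD_ITERS=100
export L0_A2A_BUILD_SAMPLES=10
export L0_A2A_RUNTIME_ITERS=500
export L0_A2A_RUNTIME_SAMPLES=30
export L0_A2A_WARMUP_RUNS=3
echo "A2A config: cpu=${L0_A2A_PIN_CPU} runtime_samples=${L0_A2A_RUNTIME_SAMPLES} warmup=${L0_A2A_WARMUP_RUNS}"
# ...then invoke the benchmark driver with this environment.
```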

The report includes:

  • CPU/model/topology metadata
  • sample counts and warmup configuration
  • median throughput values
  • CI95 on runtime sample means
  • outlier-trimming controls (L0_A2A_TRIM_COUNT) applied to the CI95 computation when enough samples are present
  • stability warning threshold control (L0_A2A_RUNTIME_CI95_PCT_WARN)
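The median and CI95 statistics the report lists can be sketched with sort/awk; the five sample values below are invented, and this is a normal-approximation half-width (1.96 · sd/√n), which may differ from the report's exact method:

```shell
#!/bin/sh
# Sketch: median and a CI95 half-width over runtime samples.
# Sample values are invented; CI95 here is the normal approximation
# 1.96 * sd / sqrt(n), one plausible reading of "CI95 on sample means".
samples="2400 2450 2380 2500 2420"
median=$(printf '%s\n' $samples | sort -n |
  awk '{a[NR]=$1} END{print (NR%2 ? a[(NR+1)/2] : (a[NR/2]+a[NR/2+1])/2)}')
ci95=$(printf '%s\n' $samples |
  awk '{s+=$1; ss+=$1*$1; n++}
       END{m=s/n; sd=sqrt((ss-n*m*m)/(n-1)); printf "%.1f", 1.96*sd/sqrt(n)}')
echo "median=${median} ops/s, CI95=+/-${ci95}"
```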
