Optimum trial targeting for expensive black-box evaluations improves sample efficiency and operational reliability while reducing wasted trials.
Looptimum is a file-backed loop for optimum parameter targeting when each
trial is costly (time, compute, money, or operational risk).
You provide a parameter space and objective schema; Looptimum suggests the
next trial, records decisions, and resumes cleanly after interruptions.
Current stable release: v0.3.5.
For expensive black-box objectives, Looptimum starts with bounded exploration
and then shifts to surrogate-guided suggestion ranking to reduce wasted trials.
Its key differentiator is operational: a file-backed, resumable workflow that
keeps state and decision trace local, which fits restricted and client-controlled
environments. The usage model stays simple (suggest -> evaluate -> ingest,
with optional locked batches);
see docs/how-it-works.md for algorithm behavior and
tuning consequences.
For a spec-style contract summary, use
docs/quick-reference.md.
- Private contact: contact@looptimum.com
- Start here: PILOT.md, intake.md, docs/pilot-checklist.md
- Best initial fit: bounded parameter spaces, one scalar objective or explicit scalarization rule, and expensive evaluations in client-controlled environments
- Scope and delivery are tailored to the project; contact for scope
- "We're wasting time on parameter sweeps and manual tuning."
- "Each run is expensive, so we need fewer total experiments."
- "We can run evaluations, but we do not want to build optimization infra."
- "Runs sometimes fail; we need resumable state and traceability."
- "We have lots of knobs and no reliable way to tune them."
Looptimum replaces ad hoc sweep loops with a small, explicit workflow:
- Define parameter bounds, objective schema, and optional constraints.
- `suggest` one trial by default, or allocate a locked batch with `--count N`.
- Run that trial in your environment.
- `ingest` the result and repeat.
Instead of broad grid/random sweeps, Looptimum uses prior observations to choose what to test next.
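The workflow above can be sketched as a single in-process loop. This is an illustrative simulation only, not Looptimum's actual API: `suggest_params`, the quadratic objective, and the random suggester are all stand-ins (the real loop uses surrogate-guided ranking and file-backed state).

```python
import random

def suggest_params(history, bounds):
    # Stand-in suggester: uniform random within bounds. The real loop
    # ranks candidates with a surrogate model after initial exploration.
    return {name: random.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

def evaluate(params):
    # Stand-in "expensive" evaluator: a simple quadratic loss.
    loss = (params["x1"] - 0.3) ** 2 + (params["x2"] - 0.7) ** 2
    return {"status": "ok", "objective": loss}

bounds = {"x1": (0.0, 1.0), "x2": (0.0, 1.0)}
history = []
for trial_id in range(1, 6):
    params = suggest_params(history, bounds)   # suggest
    result = evaluate(params)                  # evaluate in your environment
    history.append({"trial_id": trial_id, "params": params, **result})  # ingest

# Best-so-far tracking over successful trials only.
best = min((h for h in history if h["status"] == "ok"),
           key=lambda h: h["objective"])
```

The point of the sketch is the contract shape, not the suggester: swap `evaluate` for your real job and the loop structure stays the same.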
Every core claim in this README has an auditable source:
- Contract semantics and payload/state definitions: docs/quick-reference.md
- Optimizer behavior, backend differences, and failure modes: docs/how-it-works.md
- Compatibility and breaking-change policy: docs/stability-guarantees.md
- Recovery and interruption handling: docs/recovery-playbook.md
- CI operational policy for persistence/parallelism/robust best: docs/ci-knob-tuning.md
- Benchmark evidence and reproducibility artifacts: benchmarks/README.md, benchmarks/summary.json, benchmarks/case_study.md
| Component | Typical Location | Responsibility |
|---|---|---|
| Looptimum controller | Local machine, CI runner, or client host | suggest, ingest, status, lifecycle + ops commands, local state |
| Evaluator | Your runtime (script, cluster job, lab workflow, API) | Execute one trial from suggested params |
| State and logs | Local files under template state/ | Resume, audit trail, best-so-far tracking |
| Local service preview | Same host or nearby dev box | Preview-only FastAPI wrapper over registered campaign roots; metadata registry only |
Preview note:
- The optional Service API preview under service/ is explicitly preview-scoped, keeps campaign roots file-backed and authoritative, and is not part of the stable v0.3.x compatibility surface; see docs/service-api-preview.md and docs/dashboard-preview.md.
- Optional preview auth/RBAC/SSO guidance for that local service stack is in docs/auth-preview.md.
- Optional preview multi-controller coordination for that local service stack is in docs/coordination-preview.md.
- Example packs: docs/examples/service_api_preview/README.md, docs/examples/dashboard_preview/README.md, docs/examples/auth_preview/README.md, and docs/examples/coordination_preview/README.md.
- Data/ETL pipelines: batch size, parallelism, retry/backoff, memory limits.
- Infra/performance tuning: concurrency, cache TTLs, connection pools, thread counts.
- Search/recommendation knobs: threshold and weighting calibration.
- Pricing/growth experiments: eligibility thresholds, ramp controls, and guardrail tradeoffs.
- Build and compile tuning: optimization flags, link-time settings, and benchmark-driven runtime tradeoffs.
- ML training loops: learning rate, batch size, regularization, early-stop settings.
- Large-model workflow tuning: training recipe knobs, evaluation-policy settings, and runtime controls for long-running jobs.
- Simulation and engineering workflows: solver tolerances, mesh controls, calibration settings.
- Operations/process tuning: throughput vs. quality/cost tradeoffs.
For many small-to-moderate parameter spaces, teams can find competitive configurations in fewer runs than naive sweeps (problem dependent).
From repo root:
```shell
python3 templates/bo_client_demo/run_bo.py demo \
  --project-root templates/bo_client_demo \
  --steps 5

python3 templates/bo_client_demo/run_bo.py status \
  --project-root templates/bo_client_demo
```

Real captured status output (from templates/bo_client_demo on March 3, 2026):
```json
{
  "observations": 3,
  "pending": 0,
  "next_trial_id": 4,
  "best": {
    "trial_id": 2,
    "objective_name": "loss",
    "objective_value": 0.03128341826910849,
    "updated_at": 1772392830.7282188
  }
}
```

Key fields: `observations`, `pending`, `next_trial_id`, `best`.
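A status payload of this shape can drive simple automation. A minimal sketch, assuming only the field names shown in the sample above (values abbreviated here):

```python
import json

status_text = """
{
  "observations": 3,
  "pending": 0,
  "next_trial_id": 4,
  "best": {"trial_id": 2, "objective_name": "loss", "objective_value": 0.0313}
}
"""
status = json.loads(status_text)

# Gate follow-up automation on loop progress and the tracked best.
assert status["pending"] == 0, "wait for in-flight trials before suggesting more"
print(f"best {status['best']['objective_name']}: "
      f"{status['best']['objective_value']:.4f}")
```

Parsing the JSON directly like this keeps outer automation decoupled from the CLI's human-readable output.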
Quickstart note:
- The default template files and commands above use canonical JSON contract paths and run without compatibility/deprecation warnings on a clean copy.
For full command sets and resume behavior, see quickstart/README.md.
For an opinionated mainstream scenario, see
quickstart/etl-pipeline-knob-tuning.md.
For interruption triage and recovery actions, see
docs/recovery-playbook.md.
For the local FastAPI wrapper over the same file-backed runtime, see
docs/service-api-preview.md.
For the read-only operator shell mounted from that preview service, see
docs/dashboard-preview.md.
For optional preview auth and role separation on that same service stack, see
docs/auth-preview.md.
For optional preview multi-controller coordination on that same service stack,
see docs/coordination-preview.md.
For the dedicated tiny end-to-end objective walkthrough, see
examples/toy_objectives/03_tiny_quadratic_loop/README.md.
Evidence artifacts for optimization-credibility checks are published in
benchmarks/:
- Benchmark runner script: benchmarks/run_trial_efficiency_benchmark.py
- Committed compact summary (golden): benchmarks/summary.json
- Generated compact case study (derived from summary): benchmarks/case_study.md
Canonical Phase 8 protocol in this repository:
- Objective: tiny_quadratic
- Baseline: random search
- Metric: best objective at fixed budget
- Reproducibility: 10 seeds with median + IQR reporting
Re-run canonical evidence locally:
```shell
python3 benchmarks/run_trial_efficiency_benchmark.py \
  --objective tiny_quadratic \
  --budget 20 \
  --seeds 17,29,41,53,67,79,97,113,131,149 \
  --write-summary benchmarks/summary.json \
  --write-case-study benchmarks/case_study.md
```

Drop this into client_harness_template/objective.py to get started quickly:
```python
def evaluate(params):
    x1 = float(params["x1"])
    x2 = float(params["x2"])
    loss = (x1 - 0.3) ** 2 + (x2 - 0.7) ** 2
    return {"status": "ok", "objective": loss}
```

Use this when your evaluator can return a scalar directly.
For fuller failure handling (`failed`/`timeout` + `terminal_reason` + `penalty_objective`), use the expanded stub in docs/integration-guide.md#copy-paste-evaluator-stub-fuller-version.
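As a rough illustration of those non-`ok` fields (this is not the repo's stub: `run_trial`, the timeout threshold, and the penalty value are all hypothetical stand-ins):

```python
def evaluate_with_failure_handling(params, run_trial, timeout_s=3600.0):
    """Sketch: wrap a trial runner and map failures onto the result fields
    described in this README. `run_trial` is a hypothetical callable
    returning (ok, objective, elapsed_s)."""
    try:
        ok, objective, elapsed_s = run_trial(params)
    except Exception as exc:
        return {"status": "failed", "objective": None,
                "terminal_reason": f"exception:{type(exc).__name__}",
                "penalty_objective": 1e6}  # reporting only, never ranked
    if elapsed_s > timeout_s:
        return {"status": "timeout", "objective": None,
                "terminal_reason": "wall_clock_exceeded",
                "penalty_objective": 1e6}
    if not ok:
        return {"status": "failed", "objective": None,
                "terminal_reason": "evaluator_reported_failure",
                "penalty_objective": 1e6}
    return {"status": "ok", "objective": objective}

# Demo with dummy runners.
ok_result = evaluate_with_failure_handling({}, lambda p: (True, 0.5, 10.0))
bad_result = evaluate_with_failure_handling({}, lambda p: (False, None, 10.0))
```

The key invariant mirrored here is that non-`ok` outcomes carry `None` objectives plus a `terminal_reason`, keeping failed runs auditable without polluting ranking.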
- Each evaluation is expensive enough that sample efficiency matters.
- Your evaluator runs as external jobs and you want a thin outer loop above training/evaluation infrastructure.
- You can define one scalar objective or an explicit scalarization / lexicographic rule for multiple objectives.
- You have a bounded parameter set (commonly small-to-moderate dimensional).
- You want resumable, file-backed operation in local/offline/restricted environments.
- You prefer a small integration contract over building custom BO orchestration.
- Objective evaluation is cheap and simple random/grid search is sufficient.
- Reliable gradients are available and gradient-based methods are a better fit.
- Search space is extremely high-dimensional without useful structure.
- You cannot define a scalar objective or acceptable scalarization rule.
- Parameter space definition (`float`, `int`, `bool`, and `categorical` in public templates; numeric params can also declare `scale`, and params may use `when` for conditional activation).
- Objective schema (required `primary_objective`, optional `secondary_objectives`, optional `scalarization` policy).
- Trial budget and seed/config settings.
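To illustrate how a scalarization policy can collapse multiple objectives into one ranking value, here is a hedged weighted-sum sketch. The field names mirror this section, but the exact weight layout is an assumption for illustration, not Looptimum's schema:

```python
def scalarize(objectives, schema):
    """Sketch: collapse a multi-objective result to one scalar via weighted
    sum. The weights dict layout here is assumed, not canonical."""
    primary = schema["primary_objective"]["name"]
    weights = schema.get("scalarization", {}).get("weights", {})
    score = objectives[primary]  # primary always contributes with weight 1
    for sec in schema.get("secondary_objectives", []):
        score += weights.get(sec["name"], 0.0) * objectives[sec["name"]]
    return score

schema = {
    "primary_objective": {"name": "loss"},
    "secondary_objectives": [{"name": "latency_ms"}],
    "scalarization": {"weights": {"latency_ms": 0.01}},
}
score = scalarize({"loss": 0.2, "latency_ms": 30.0}, schema)  # 0.2 + 0.01*30
```

A lexicographic policy would instead compare the primary objective first and only break ties on secondaries; either way, raw objective vectors should be preserved alongside the scalar.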
Count 1 keeps the historical single-suggestion payload. Count > 1 emits a bundle JSON object by default with:

- `schema_version`
- `count`
- `suggestions` (array of canonical suggestion payloads)

Use `--jsonl` to emit one canonical suggestion JSON object per line for worker handoff.

Each suggestion includes:

- `schema_version` (semver string, emitted by runtime)
- `trial_id`
- `params`
- `suggested_at`
- `lease_token` (only when `worker_leases.enabled` is true)
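The bundle-vs-JSONL distinction can be seen in a small sketch (payload values are illustrative; the field names follow this section):

```python
import json

# A count>1 bundle: one wrapper object containing canonical suggestions.
bundle = {
    "schema_version": "0.3.0",  # illustrative value
    "count": 2,
    "suggestions": [
        {"schema_version": "0.3.0", "trial_id": 4,
         "params": {"x1": 0.31}, "suggested_at": 1772392830.0},
        {"schema_version": "0.3.0", "trial_id": 5,
         "params": {"x1": 0.62}, "suggested_at": 1772392831.0},
    ],
}

# JSONL handoff: one canonical suggestion object per line, one line per worker.
jsonl_lines = [json.dumps(s) for s in bundle["suggestions"]]
parsed_back = [json.loads(line) for line in jsonl_lines]
```

JSONL trades the single-document wrapper for line-at-a-time streaming, which is why it suits fan-out to independent workers.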
The `ingest` payload requires:

- `trial_id` (must match a pending trial)
- `params` (must match suggested params exactly)
- `objectives`:
  - `status: ok` -> all configured objective values must be numeric and finite
  - non-`ok` status -> all configured objective values must be `null`
- `status`: one of `ok`, `failed`, `killed`, `timeout`
- `schema_version` (semver string, optional in schema and emitted by harness/runtime flows)
- `terminal_reason` (short string for non-`ok` outcomes; recommended)
- `penalty_objective` (number, only for non-`ok` statuses; reporting/compatibility only)
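A hedged validator sketch for the ok / non-`ok` objective rules above (`check_ingest_payload` is a hypothetical helper for illustration, not part of Looptimum):

```python
import math

def check_ingest_payload(payload, configured_objectives):
    """Sketch: validate the ok / non-ok objective rules from this section."""
    errors = []
    if payload["status"] not in {"ok", "failed", "killed", "timeout"}:
        errors.append(f"unknown status: {payload['status']}")
    values = payload.get("objectives", {})
    for name in configured_objectives:
        v = values.get(name)
        if payload["status"] == "ok":
            # ok -> every configured objective must be numeric and finite
            if not isinstance(v, (int, float)) or not math.isfinite(v):
                errors.append(f"{name}: must be numeric and finite when status=ok")
        elif v is not None:
            # non-ok -> every configured objective must be null
            errors.append(f"{name}: must be null for non-ok status")
    return errors

ok_errors = check_ingest_payload(
    {"trial_id": 4, "status": "ok", "objectives": {"loss": 0.21}}, ["loss"])
bad_errors = check_ingest_payload(
    {"trial_id": 5, "status": "failed", "objectives": {"loss": 0.21}}, ["loss"])
```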
`status` reports: `schema_version`, `observations`, `pending`, `next_trial_id`, `best`, `stale_pending`, `observations_by_status`, `paths`.
Best ranking rule:
- `best` is computed only from `status: "ok"` observations.
- Single-objective campaigns rank by the primary objective value.
- Multi-objective campaigns rank by the configured scalarization or lexicographic policy while preserving raw objective vectors in status, manifests, and reports.
- `penalty_objective` is never used to rank `best`.
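The ranking rules above can be expressed as a short sketch (`pick_best` is illustrative, not Looptimum's implementation):

```python
def pick_best(observations, objective_name="loss"):
    """Sketch of the ranking rules above: only status 'ok' observations
    count, and penalty_objective never participates in ranking."""
    ok_obs = [o for o in observations if o["status"] == "ok"]
    if not ok_obs:
        return None
    return min(ok_obs, key=lambda o: o["objectives"][objective_name])

observations = [
    {"trial_id": 1, "status": "ok", "objectives": {"loss": 0.20}},
    {"trial_id": 2, "status": "ok", "objectives": {"loss": 0.03}},
    {"trial_id": 3, "status": "failed", "objectives": {"loss": None},
     "penalty_objective": 0.0},  # excluded despite the tempting low penalty
]
best = pick_best(observations)
```

Trial 3 is excluded even though its `penalty_objective` is lower than every real loss, which is exactly why penalties are kept out of ranking.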
- `state/bo_state.json`: source of truth for observations/pending/best and required `schema_version`.
- `state/observations.csv`: flattened observation export.
- `state/acquisition_log.jsonl`: append-only decision trace.
- `state/event_log.jsonl`: append-only lifecycle/operations trace, including governance override/violation events.
- `state/trials/trial_<id>/manifest.json`: per-trial audit manifest.
- `state/report.json` and `state/report.md`: explicit report outputs from `report`, including objective-config and Pareto summaries for multi-objective campaigns.
- Canonical statuses are `ok`, `failed`, `killed`, and `timeout`.
- For non-`ok` outcomes with no reason provided, ingest synthesizes `terminal_reason` as `status=<status>`.
- `v0.2.x` state without `schema_version` (or with `0.2.x`) upgrades in-memory to `0.3.0` and persists on the next mutating command.
- Earlier `v0.3.x` state versions load transparently in `v0.3.x`.
- No breaking changes within the `v0.3.x` line for CLI command names/required flags, ingest required fields/status vocabulary, and core state-file compatibility.
- Breaking changes are allowed only on `0.x` major-line increments (for example `0.3 -> 0.4`) and require explicit compatibility notes.
- Current patch tag in this line: `v0.3.5` (see CHANGELOG.md).
- Full policy: docs/stability-guarantees.md.
- Identical replay of an already ingested trial: explicit no-op success.
- Conflicting replay for an already ingested trial: rejected with field-level diff details.
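The replay semantics can be sketched as a payload comparison (`classify_replay` and the outcome labels are hypothetical; Looptimum's actual diff format may differ):

```python
def classify_replay(existing, incoming):
    """Sketch of the replay semantics above: identical payload -> explicit
    no-op success; conflicting payload -> rejection with field-level diffs."""
    diffs = {
        field: {"existing": existing.get(field), "incoming": incoming.get(field)}
        for field in set(existing) | set(incoming)
        if existing.get(field) != incoming.get(field)
    }
    if not diffs:
        return {"outcome": "noop_success", "diffs": {}}
    return {"outcome": "rejected_conflict", "diffs": diffs}

stored = {"trial_id": 2, "status": "ok", "objectives": {"loss": 0.03}}
replay_same = classify_replay(stored, dict(stored))
replay_conflict = classify_replay(stored, {**stored, "objectives": {"loss": 0.05}})
```

Field-level diffs on conflict make the rejection actionable: the operator sees exactly which value changed rather than a bare error.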
- `cancel --trial-id <id>`: operator-cancel a pending trial (recorded as a terminal `killed` observation with reason).
- `retire --trial-id <id>` or `retire --stale`: retire pending trials manually or by age policy.
- `heartbeat --trial-id <id>`: update liveness metadata for long-running pending trials.
- `import-observations --input-file <path> [--import-mode strict|permissive]`: seed terminal observations from canonical JSONL or flat CSV rows.
- `export-observations --output-file <path>`: export canonical JSONL or flat CSV observations from authoritative state.
- `report`: generate `state/report.json` + `state/report.md`.
- `reset [--yes] [--no-archive]`: reset campaign runtime artifacts; archiving is enabled by default.
- `list-archives`: inspect reset archives and surface manifest/legacy integrity status.
- `restore --archive-id <id> [--yes]`: restore archived runtime artifacts back into place.
- `prune-archives [--keep-last N] [--older-than-seconds S] [--yes]`: remove older reset archives with explicit retention criteria.
- `health [--strict]`: read-only runtime health snapshot with validate-aligned errors/warnings, lock visibility, and governance findings.
- `metrics`: read-only runtime metrics snapshot with counts, pending-age buckets, suggest latency, and governance summaries.
- `validate [--strict]`: sanity-check config/state; warnings are non-fatal unless `--strict`.
- `doctor [--json]`: print environment/backend/state diagnostics.
Lease note:
- When `worker_leases.enabled` is true, `suggest` emits `lease_token` and workers must echo it on `heartbeat` and `ingest`.
- `max_pending_trials`, when configured, rejects the whole requested batch before any pending state is created.
- Permissive warm-start imports write machine-readable reports under `state/import_reports/` and preserve `source_trial_id` as provenance only.
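Worker-side lease echoing can be sketched as follows (the message shapes are illustrative assumptions, not the canonical payloads):

```python
def build_worker_messages(suggestion):
    """Sketch: when worker leases are enabled, a worker copies lease_token
    from the suggestion onto both its heartbeat and its ingest message."""
    token = suggestion.get("lease_token")
    heartbeat = {"trial_id": suggestion["trial_id"]}
    ingest = {"trial_id": suggestion["trial_id"], "params": suggestion["params"]}
    if token is not None:
        heartbeat["lease_token"] = token
        ingest["lease_token"] = token
    return heartbeat, ingest

hb, ing = build_worker_messages(
    {"trial_id": 7, "params": {"x1": 0.4}, "lease_token": "abc123"})
```

Echoing the token on every follow-up message lets the controller tie liveness and results back to the specific lease it handed out.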
Governance note:
- `bo_config.json` can set `governance.allowed_statuses` plus warn-first `retention.archives.*` / `retention.logs.*` limits.
- Looptimum does not auto-prune archives or rotate logs when those limits are exceeded; operators must act explicitly.
- Mutating commands append `governance_override_used` when the runtime itself creates a disallowed terminal status, and `governance_violations_detected` when current state/log/archive footprints breach configured policy.
| Template | Intended use | Default backend | Optional backend | CLI/lifecycle parity |
|---|---|---|---|---|
| `templates/bo_client_demo` | Fastest onboarding and contract validation | `rbf_proxy` | none | full parity (suggest, ingest, import-observations, export-observations, status, demo, cancel, retire, heartbeat, report, reset, list-archives, restore, prune-archives, validate, doctor) |
| `templates/bo_client` | Recommended baseline for most integrations | `rbf_proxy` | `gp` (config-selected) | full parity |
| `templates/bo_client_full` | Same public contract with optional feature-flag GP path | `rbf_proxy` | `botorch_gp` (`--enable-botorch-gp` / config flag) | full parity |
All template variants use the same canonical JSON contract file conventions and
the same state/log artifact model under state/.
The examples/ folder shows integration patterns, not benchmark leaderboards.
- `examples/toy-objectives/01_python_function/`: in-process evaluator pattern
- `examples/toy-objectives/02_subprocess_cli/`: subprocess/CLI wrapper pattern
- `examples/toy_objectives/03_tiny_quadratic_loop/`: dedicated tiny end-to-end objective (suggest -> evaluate -> ingest -> status, typically under one minute)
- `docs/examples/multi_objective/`: generated multi-objective report/state pack with weighted-sum and lexicographic objective-schema examples
- `docs/examples/batch_async/`: batch bundle, JSONL handoff, lease-token, and pending-state example pack
- `docs/examples/starterkit/`: webhook config/payload examples, rendered Airflow/Slurm assets, queue-worker plan output, and tracker payload examples
- `docs/examples/warm_start/README.md`: permissive import report, JSONL/CSV export, and manifest/state example pack for warm-start workflows
Run the tiny end-to-end objective from repo root:
```shell
python3 examples/toy_objectives/03_tiny_quadratic_loop/run_tiny_loop.py --steps 6
```

- ETL throughput tuning: optimize `batch_size`, worker count, and retry policy; score = `cost_per_gb + latency_penalty`.
- API/service tuning: optimize concurrency limits, cache TTL, and timeout knobs; score = `p95_latency + error_rate_penalty`.
- Search/ranking calibration: optimize blending weights and threshold gates; score = `-relevance_metric + latency_penalty`.
- Simulation meshing (specialized): optimize mesh density/refinement controls; score = `runtime + instability_penalty`.
- Assay/process protocol (specialized): optimize concentration/time/temperature; score = `-yield + failure_penalty`.
- OpenFOAM-style workflow (specialized): optimize meshing/solver controls; score = `wall_clock_time + nonconvergence_penalty`.
Expanded gallery with equal mainstream/specialized coverage is in
docs/use-cases.md.
- docs/examples/decision_trace/golden_acquisition_log.jsonl
- docs/examples/decision_trace/golden_acquisition_log.md
- docs/examples/decision_trace/cli_transcript.md
- Self-serve: use templates directly in your environment.
- Assisted integration: wire your evaluator with the starter harness.
- Managed execution support: run a pilot loop with clear deliverables.
- Optional on-prem/offline support: operate entirely in client-controlled infrastructure.
If you are evaluating fit for a pilot, start with PILOT.md,
intake.md, or contact
contact@looptimum.com.
For first-impression and adoption feedback, use the GitHub Issues template at
.github/ISSUE_TEMPLATE/first-impressions.yml (Issues are the primary
feedback source of truth).
- docs/how-it-works.md
- docs/integration-guide.md
- docs/integration-starter-kit.md
- docs/aws-batch-integration.md
- docs/operational-semantics.md
- docs/recovery-playbook.md
- docs/ci-knob-tuning.md
- docs/stability-guarantees.md
- docs/type-safety.md
- docs/feedback-loop.md
- docs/search-space.md
- docs/constraints.md
- docs/decision-trace.md
- docs/pilot-checklist.md
- docs/faq.md
- docs/security-data-handling.md
- docs/use-cases.md
- client_harness_template/README_INTEGRATION.md
- quickstart/README.md
- reports/phase8_release_readiness.md
- reports/v0.2.0_release_execution_checklist.md
Install test dependencies:
```shell
python3 -m pip install -r requirements-dev.txt
```

For local work on the optional AWS Batch executor path, also install:

```shell
python3 -m pip install ".[aws]"
```

Run repo test suites:

```shell
python3 -m pytest -q templates client_harness_template/tests service/tests
```

Optional GP backend validation for bo_client:

```shell
RUN_GP_TESTS=1 python3 -m pytest -q \
  templates/bo_client/tests/test_suggest.py::test_suggest_works_with_gp_backend
```

For machine parsing of suggest output, use:

```shell
python3 templates/bo_client_demo/run_bo.py suggest \
  --project-root templates/bo_client_demo \
  --json-only
```

For worker fan-out, use line-delimited output:

```shell
python3 templates/bo_client_demo/run_bo.py suggest \
  --project-root templates/bo_client_demo \
  --count 3 \
  --jsonl
```

Bundle JSON, JSONL handoff, max_pending_trials, and lease-token examples are captured in docs/examples/batch_async/README.md.
Starter webhook, scheduler, and tracker adapter examples are captured in
docs/examples/starterkit/README.md.