Looptimum


Optimum trial targeting for expensive black-box evaluations: better sample efficiency and operational reliability, with fewer wasted trials.

Looptimum is a file-backed loop for optimum parameter targeting when each trial is costly (time, compute, money, or operational risk). You provide a parameter space and objective schema; Looptimum suggests the next trial, records decisions, and resumes cleanly after interruptions. Current stable release: v0.3.5.

For expensive black-box objectives, Looptimum starts with bounded exploration and then shifts to surrogate-guided suggestion ranking to reduce wasted trials. Its key differentiator is operational: a file-backed, resumable workflow that keeps state and the decision trace local, which fits restricted and client-controlled environments.

The usage model stays simple (suggest -> evaluate -> ingest, with optional locked batches); see docs/how-it-works.md for algorithm behavior and tuning consequences. For a spec-style contract summary, use docs/quick-reference.md.

Evaluating Fit For A Pilot?

  • Private contact: contact@looptimum.com
  • Start here: PILOT.md, intake.md, docs/pilot-checklist.md
  • Best initial fit: bounded parameter spaces, one scalar objective or explicit scalarization rule, and expensive evaluations in client-controlled environments
  • Scope and delivery are tailored to each project; contact us to discuss

Common Triggers

  • "We're wasting time on parameter sweeps and manual tuning."
  • "Each run is expensive, so we need fewer total experiments."
  • "We can run evaluations, but we do not want to build optimization infra."
  • "Runs sometimes fail; we need resumable state and traceability."
  • "We have lots of knobs and no reliable way to tune them."

What Looptimum Does

Looptimum replaces ad hoc sweep loops with a small, explicit workflow:

  1. Define parameter bounds, objective schema, and optional constraints.
  2. suggest one trial by default, or allocate a locked batch with --count N.
  3. Run that trial in your environment.
  4. ingest the result and repeat.

Instead of broad grid/random sweeps, Looptimum uses prior observations to choose what to test next.
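The four steps above can be compressed into a single driver loop. This is a conceptual sketch only: `suggest_trial` and `ingest_result` are hypothetical stand-ins for the real suggest/ingest commands, and the evaluator is the toy quadratic used elsewhere in this README.

```python
# Conceptual sketch of the suggest -> evaluate -> ingest loop.
# suggest_trial and ingest_result are hypothetical stand-ins, not the real API.

def suggest_trial(trial_id):
    # Stand-in for a suggest call: returns one suggestion payload.
    return {"trial_id": trial_id, "params": {"x1": 0.1 * trial_id, "x2": 0.5}}

def evaluate(params):
    # Your evaluator: one expensive trial in your environment (toy quadratic here).
    loss = (params["x1"] - 0.3) ** 2 + (params["x2"] - 0.7) ** 2
    return {"status": "ok", "objective": loss}

def ingest_result(trial_id, params, result, history):
    # Stand-in for an ingest call: records the terminal observation.
    history.append({"trial_id": trial_id, "params": params, **result})

history = []
for trial_id in range(1, 6):
    suggestion = suggest_trial(trial_id)
    result = evaluate(suggestion["params"])
    ingest_result(suggestion["trial_id"], suggestion["params"], result, history)

# Best-so-far over ok observations (minimize).
best = min((h for h in history if h["status"] == "ok"), key=lambda h: h["objective"])
```

In the real workflow each loop body runs in your environment and state lives in files, so the loop survives interruption between any two steps.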

Trust Anchors

Every core claim in this README has an auditable source.

What Runs Where

  • Looptimum controller (local machine, CI runner, or client host): suggest, ingest, status, lifecycle + ops commands, local state.
  • Evaluator (your runtime: script, cluster job, lab workflow, API): executes one trial from suggested params.
  • State and logs (local files under template state/): resume, audit trail, best-so-far tracking.
  • Local service preview (same host or nearby dev box): preview-only FastAPI wrapper over registered campaign roots; metadata registry only.


Common Use Cases

  • Data/ETL pipelines: batch size, parallelism, retry/backoff, memory limits.
  • Infra/performance tuning: concurrency, cache TTLs, connection pools, thread counts.
  • Search/recommendation knobs: threshold and weighting calibration.
  • Pricing/growth experiments: eligibility thresholds, ramp controls, and guardrail tradeoffs.
  • Build and compile tuning: optimization flags, link-time settings, and benchmark-driven runtime tradeoffs.
  • ML training loops: learning rate, batch size, regularization, early-stop settings.
  • Large-model workflow tuning: training recipe knobs, evaluation-policy settings, and runtime controls for long-running jobs.
  • Simulation and engineering workflows: solver tolerances, mesh controls, calibration settings.
  • Operations/process tuning: throughput vs. quality/cost tradeoffs.

For many small-to-moderate parameter spaces, teams can find competitive configurations in fewer runs than naive sweeps (problem dependent).

Quickstart (2 Minutes)

From repo root:

python3 templates/bo_client_demo/run_bo.py demo \
  --project-root templates/bo_client_demo \
  --steps 5
python3 templates/bo_client_demo/run_bo.py status \
  --project-root templates/bo_client_demo

Real captured status output (from templates/bo_client_demo on March 3, 2026):

{
  "observations": 3,
  "pending": 0,
  "next_trial_id": 4,
  "best": {
    "trial_id": 2,
    "objective_name": "loss",
    "objective_value": 0.03128341826910849,
    "updated_at": 1772392830.7282188
  }
}

Key fields:

  • observations: terminal results ingested so far
  • pending: suggested trials still awaiting a result
  • next_trial_id: the id the next suggest call will assign
  • best: best-so-far ok observation (trial id, objective name/value, update time)
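A driver script can parse the status payload directly. The stop rule below is an assumed policy for illustration, not Looptimum behavior; the sample values mirror the captured output above.

```python
import json

# A status payload like the one captured above (sample values).
status_json = '''{
  "observations": 3,
  "pending": 0,
  "next_trial_id": 4,
  "best": {"trial_id": 2, "objective_name": "loss", "objective_value": 0.0313}
}'''

status = json.loads(status_json)

# One possible stop rule for an outer driver (assumed policy):
# stop once the trial budget is spent and nothing is still pending.
budget = 20
done = status["observations"] >= budget and status["pending"] == 0
```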

Quickstart note:

  • The default template files and commands above use canonical JSON contract paths and run without compatibility/deprecation warnings on a clean copy.

Deeper pointers:

  • Full command sets and resume behavior: quickstart/README.md
  • Opinionated mainstream scenario: quickstart/etl-pipeline-knob-tuning.md
  • Interruption triage and recovery actions: docs/recovery-playbook.md
  • Local FastAPI wrapper over the same file-backed runtime: docs/service-api-preview.md
  • Read-only operator shell mounted from that preview service: docs/dashboard-preview.md
  • Optional preview auth and role separation on the same service stack: docs/auth-preview.md
  • Optional preview multi-controller coordination on the same service stack: docs/coordination-preview.md
  • Dedicated tiny end-to-end objective walkthrough: examples/toy_objectives/03_tiny_quadratic_loop/README.md

Evidence

Evidence artifacts for optimization-credibility checks are published in benchmarks/:

  • benchmark runner script: benchmarks/run_trial_efficiency_benchmark.py
  • committed compact summary (golden): benchmarks/summary.json
  • generated compact case study (derived from summary): benchmarks/case_study.md

Canonical Phase 8 protocol in this repository:

  • objective: tiny_quadratic
  • baseline: random search
  • metric: best objective at fixed budget
  • reproducibility: 10 seeds with median + IQR reporting

Re-run canonical evidence locally:

python3 benchmarks/run_trial_efficiency_benchmark.py \
  --objective tiny_quadratic \
  --budget 20 \
  --seeds 17,29,41,53,67,79,97,113,131,149 \
  --write-summary benchmarks/summary.json \
  --write-case-study benchmarks/case_study.md
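The 10-seed median + IQR reporting in the protocol can be reproduced with the standard library. The per-seed values below are illustrative placeholders, not real benchmark output.

```python
from statistics import median, quantiles

# Hypothetical per-seed "best objective at fixed budget" values for one method
# (illustrative numbers only; real values come from benchmarks/summary.json).
best_at_budget = [0.031, 0.045, 0.028, 0.052, 0.039, 0.033, 0.047, 0.036, 0.041, 0.030]

med = median(best_at_budget)          # robust central tendency across seeds
q1, _, q3 = quantiles(best_at_budget, n=4)  # quartile cut points
iqr = q3 - q1                         # spread across seeds
```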

Copy/Paste Evaluator Stub (Minimal)

Drop this into client_harness_template/objective.py to get started quickly:

def evaluate(params):
    # Toy quadratic objective with its minimum at x1=0.3, x2=0.7.
    x1 = float(params["x1"])
    x2 = float(params["x2"])
    loss = (x1 - 0.3) ** 2 + (x2 - 0.7) ** 2
    # "ok" status with a finite scalar objective is the minimal happy path.
    return {"status": "ok", "objective": loss}

Use this when your evaluator can return a scalar directly. For fuller failure handling (failed/timeout + terminal_reason + penalty_objective), use the expanded stub in docs/integration-guide.md#copy-paste-evaluator-stub-fuller-version.
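As a bridge toward that fuller stub, here is a hedged sketch of exception handling around the minimal evaluator. The field names (status, terminal_reason, penalty_objective) follow the ingest contract in this README, but the wrapper logic and penalty value are assumptions, not the documented expanded stub.

```python
def evaluate(params):
    # Minimal happy-path evaluator (same toy quadratic as the stub above).
    x1 = float(params["x1"])
    x2 = float(params["x2"])
    loss = (x1 - 0.3) ** 2 + (x2 - 0.7) ** 2
    return {"status": "ok", "objective": loss}

def evaluate_with_failure_handling(params):
    # Assumed wrapper: convert any exception into a non-ok terminal result.
    try:
        return evaluate(params)
    except Exception as exc:
        # Non-ok outcome: objective must be null, with a short reason and an
        # optional reporting-only penalty value (never used for ranking).
        return {
            "status": "failed",
            "objective": None,
            "terminal_reason": f"exception:{type(exc).__name__}",
            "penalty_objective": 1.0e6,
        }
```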

When To Use Looptimum

  • Each evaluation is expensive enough that sample efficiency matters.
  • Your evaluator runs as external jobs and you want a thin outer loop above training/evaluation infrastructure.
  • You can define one scalar objective or an explicit scalarization / lexicographic rule for multiple objectives.
  • You have a bounded parameter set (commonly small-to-moderate dimensional).
  • You want resumable, file-backed operation in local/offline/restricted environments.
  • You prefer a small integration contract over building custom BO orchestration.

When Not To Use Looptimum

  • Objective evaluation is cheap and simple random/grid search is sufficient.
  • Reliable gradients are available and gradient-based methods are a better fit.
  • Search space is extremely high-dimensional without useful structure.
  • You cannot define a scalar objective or acceptable scalarization rule.

Contract (Current)

Inputs

  • Parameter space definition (float, int, bool, and categorical in public templates; numeric params can also declare scale, and params may use when for conditional activation).
  • Objective schema (required primary_objective, optional secondary_objectives, optional scalarization policy).
  • Trial budget and seed/config settings.

suggest Output

With --count 1 (the default), suggest emits the historical single-suggestion payload. With --count greater than 1, it emits a bundle JSON object by default:

  • schema_version
  • count
  • suggestions (array of canonical suggestion payloads)

Use --jsonl to emit one canonical suggestion JSON object per line for worker handoff.

Each suggestion includes:

  • schema_version (semver string, emitted by runtime)
  • trial_id
  • params
  • suggested_at
  • lease_token (only when worker_leases.enabled is true)
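A worker-side consumer of the --jsonl stream only needs line-by-line JSON parsing. The payload values below are illustrative, not captured output.

```python
import json

# What a --jsonl handoff stream might look like: one canonical suggestion
# object per line (illustrative schema_version/timestamp values).
jsonl_output = """\
{"schema_version": "0.3.0", "trial_id": 4, "params": {"x1": 0.2, "x2": 0.6}, "suggested_at": 1772392830.0}
{"schema_version": "0.3.0", "trial_id": 5, "params": {"x1": 0.4, "x2": 0.8}, "suggested_at": 1772392831.0}
"""

# Each worker takes one line; trial_id ties the eventual result back to ingest.
suggestions = [json.loads(line) for line in jsonl_output.splitlines() if line.strip()]
trial_ids = [s["trial_id"] for s in suggestions]
```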

ingest Required Fields

  • trial_id (must match a pending trial)
  • params (must match suggested params exactly)
  • objectives:
    • status: ok -> all configured objective values must be numeric and finite
    • non-ok status -> all configured objective values must be null
  • status: ok, failed, killed, timeout

ingest Optional Fields

  • schema_version (semver string, optional in schema and emitted by harness/runtime flows)
  • terminal_reason (short string for non-ok outcomes; recommended)
  • penalty_objective (number, only for non-ok statuses; reporting/compatibility only)
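The ok / non-ok objective rule above can be stated as a small check. This is an illustrative sketch, not Looptimum's actual validation code.

```python
import math

def objectives_valid(status, objectives):
    # ok -> every configured objective value must be numeric and finite;
    # failed / killed / timeout -> every configured objective value must be null.
    values = objectives.values()
    if status == "ok":
        return all(isinstance(v, (int, float)) and math.isfinite(v) for v in values)
    return all(v is None for v in values)

objectives_valid("ok", {"loss": 0.031})         # valid: finite numeric
objectives_valid("failed", {"loss": None})      # valid: non-ok with null
objectives_valid("ok", {"loss": float("nan")})  # invalid: not finite
```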

status Headline Fields

  • schema_version
  • observations
  • pending
  • next_trial_id
  • best
  • stale_pending
  • observations_by_status
  • paths

Best ranking rule:

  • best is computed only from status: "ok" observations.
  • Single-objective campaigns rank by the primary objective value.
  • Multi-objective campaigns rank by the configured scalarization or lexicographic policy while preserving raw objective vectors in status, manifests, and reports.
  • penalty_objective is never used to rank best.
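For a single-objective minimize campaign, the rule above reduces to a filter plus a min. The data and code are illustrative, not the runtime implementation.

```python
# Only status "ok" observations are eligible for best; penalty_objective
# never participates in ranking (illustrative observations).
observations = [
    {"trial_id": 1, "status": "ok", "objective": 0.09},
    {"trial_id": 2, "status": "ok", "objective": 0.031},
    {"trial_id": 3, "status": "failed", "objective": None, "penalty_objective": 1e6},
]

eligible = [o for o in observations if o["status"] == "ok"]
best = min(eligible, key=lambda o: o["objective"])
```

Note that trial 3's large penalty_objective is ignored entirely: it is reporting-only and can never displace an ok observation as best.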

Local State Files

  • state/bo_state.json: source of truth for observations/pending/best and required schema_version.
  • state/observations.csv: flattened observation export.
  • state/acquisition_log.jsonl: append-only decision trace.
  • state/event_log.jsonl: append-only lifecycle/operations trace, including governance override/violation events.
  • state/trials/trial_<id>/manifest.json: per-trial audit manifest.
  • state/report.json and state/report.md: explicit report outputs from report, including objective-config and Pareto summaries for multi-objective campaigns.

Compatibility Notes

  • Canonical statuses are ok, failed, killed, and timeout.
  • For non-ok outcomes with no reason provided, ingest synthesizes terminal_reason as status=<status>.
  • v0.2.x state without schema_version (or with 0.2.x) upgrades in-memory to 0.3.0 and persists on next mutating command.
  • Earlier v0.3.x state versions load transparently in v0.3.x.

Stability Promise (v0.3.x)

  • No breaking changes within the v0.3.x line for CLI command names/required flags, ingest required fields/status vocabulary, and core state-file compatibility.
  • Breaking changes are allowed only on 0.x major-line increments (for example 0.3 -> 0.4) and require explicit compatibility notes.
  • Current patch tag in this line: v0.3.5 (see CHANGELOG.md).
  • Full policy: docs/stability-guarantees.md.

Duplicate Ingest Behavior

  • Identical replay of an already ingested trial: explicit no-op success.
  • Conflicting replay for an already ingested trial: rejected with field-level diff details.
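One way to picture that behavior (illustrative logic, not the actual implementation):

```python
def replay_outcome(stored, replayed):
    # Field-level diff between the stored observation and the replayed payload.
    diff = {k: (stored.get(k), replayed.get(k))
            for k in set(stored) | set(replayed)
            if stored.get(k) != replayed.get(k)}
    if not diff:
        return {"result": "noop"}  # identical replay: explicit no-op success
    return {"result": "rejected", "diff": diff}  # conflicting replay

stored = {"trial_id": 2, "status": "ok", "objective": 0.031}
same = replay_outcome(stored, {"trial_id": 2, "status": "ok", "objective": 0.031})
conflict = replay_outcome(stored, {"trial_id": 2, "status": "ok", "objective": 0.05})
```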

Runtime Ops Commands

  • cancel --trial-id <id>: operator-cancel a pending trial (recorded as terminal killed observation with reason).
  • retire --trial-id <id> or retire --stale: retire pending trials manually or by age policy.
  • heartbeat --trial-id <id>: update liveness metadata for long-running pending trials.
  • import-observations --input-file <path> [--import-mode strict|permissive]: seed terminal observations from canonical JSONL or flat CSV rows.
  • export-observations --output-file <path>: export canonical JSONL or flat CSV observations from authoritative state.
  • report: generate state/report.json + state/report.md.
  • reset [--yes] [--no-archive]: reset campaign runtime artifacts; archive is enabled by default.
  • list-archives: inspect reset archives and surface manifest/legacy integrity status.
  • restore --archive-id <id> [--yes]: restore archived runtime artifacts back into place.
  • prune-archives [--keep-last N] [--older-than-seconds S] [--yes]: remove older reset archives with explicit retention criteria.
  • health [--strict]: read-only runtime health snapshot with validate-aligned errors/warnings, lock visibility, and governance findings.
  • metrics: read-only runtime metrics snapshot with counts, pending-age buckets, suggest latency, and governance summaries.
  • validate [--strict]: sanity-check config/state; warnings are non-fatal unless --strict.
  • doctor [--json]: print environment/backend/state diagnostics.

Lease and batching notes:

  • when worker_leases.enabled is true, suggest emits lease_token and workers must echo it on heartbeat and ingest
  • max_pending_trials, when configured, rejects the whole requested batch before any pending state is created
  • permissive warm-start imports write machine-readable reports under state/import_reports/ and preserve source_trial_id as provenance only

Governance note:

  • bo_config.json can set governance.allowed_statuses plus warn-first retention.archives.* / retention.logs.* limits.
  • Looptimum does not auto-prune archives or rotate logs when those limits are exceeded; operators must act explicitly.
  • Mutating commands append governance_override_used when the runtime itself creates a disallowed terminal status, and governance_violations_detected when current state/log/archive footprints breach configured policy.

Templates (Choose Your Starting Level)

Template Matrix (Feature Parity + Intended Use)

  • templates/bo_client_demo: fastest onboarding and contract validation. Default backend: rbf_proxy; optional backend: none. Full CLI/lifecycle parity (suggest, ingest, import-observations, export-observations, status, demo, cancel, retire, heartbeat, report, reset, list-archives, restore, prune-archives, validate, doctor).
  • templates/bo_client: recommended baseline for most integrations. Default backend: rbf_proxy; optional backend: gp (config-selected). Full parity.
  • templates/bo_client_full: same public contract with an optional feature-flag GP path. Default backend: rbf_proxy; optional backend: botorch_gp (--enable-botorch-gp or config flag). Full parity.

All template variants use the same canonical JSON contract file conventions and the same state/log artifact model under state/.

Examples and Case Studies

The examples/ folder shows integration patterns, not benchmark leaderboards.

  • examples/toy_objectives/01_python_function/: in-process evaluator pattern
  • examples/toy_objectives/02_subprocess_cli/: subprocess/CLI wrapper pattern
  • examples/toy_objectives/03_tiny_quadratic_loop/: dedicated tiny end-to-end objective (suggest -> evaluate -> ingest -> status, typically under one minute)
  • docs/examples/multi_objective/: generated multi-objective report/state pack with weighted-sum and lexicographic objective-schema examples
  • docs/examples/batch_async/: batch bundle, JSONL handoff, lease-token, and pending-state example pack
  • docs/examples/starterkit/: webhook config/payload examples, rendered Airflow/Slurm assets, queue-worker plan output, and tracker payload examples
  • docs/examples/warm_start/README.md: permissive import report, JSONL/CSV export, and manifest/state example pack for warm-start workflows

Run the tiny end-to-end objective from repo root:

python3 examples/toy_objectives/03_tiny_quadratic_loop/run_tiny_loop.py --steps 6

Case-Study Gallery (Mainstream-First)

  • ETL throughput tuning: optimize batch_size, worker count, and retry policy; score = cost_per_gb + latency_penalty.
  • API/service tuning: optimize concurrency limits, cache TTL, and timeout knobs; score = p95_latency + error_rate_penalty.
  • Search/ranking calibration: optimize blending weights and threshold gates; score = -relevance_metric + latency_penalty.
  • Simulation meshing (specialized): optimize mesh density/refinement controls; score = runtime + instability_penalty.
  • Assay/process protocol (specialized): optimize concentration/time/temperature; score = -yield + failure_penalty.
  • OpenFOAM-style workflow (specialized): optimize meshing/solver controls; score = wall_clock_time + nonconvergence_penalty.

Expanded gallery with equal mainstream/specialized coverage is in docs/use-cases.md.

Decision-Trace and CLI Transcript Assets

  • docs/examples/decision_trace/golden_acquisition_log.jsonl
  • docs/examples/decision_trace/golden_acquisition_log.md
  • docs/examples/decision_trace/cli_transcript.md

Pilot and Service Options

  • Self-serve: use templates directly in your environment.
  • Assisted integration: wire your evaluator with the starter harness.
  • Managed execution support: run a pilot loop with clear deliverables.
  • Optional on-prem/offline support: operate entirely in client-controlled infrastructure.

If you are evaluating fit for a pilot, start with PILOT.md, intake.md, or contact contact@looptimum.com. For first-impression and adoption feedback, use the GitHub Issues template at .github/ISSUE_TEMPLATE/first-impressions.yml (Issues are the primary feedback source of truth).

Deeper Docs

  • docs/how-it-works.md
  • docs/integration-guide.md
  • docs/integration-starter-kit.md
  • docs/aws-batch-integration.md
  • docs/operational-semantics.md
  • docs/recovery-playbook.md
  • docs/ci-knob-tuning.md
  • docs/stability-guarantees.md
  • docs/type-safety.md
  • docs/feedback-loop.md
  • docs/search-space.md
  • docs/constraints.md
  • docs/decision-trace.md
  • docs/pilot-checklist.md
  • docs/faq.md
  • docs/security-data-handling.md
  • docs/use-cases.md
  • client_harness_template/README_INTEGRATION.md
  • quickstart/README.md
  • reports/phase8_release_readiness.md
  • reports/v0.2.0_release_execution_checklist.md

Testing

Install test dependencies:

python3 -m pip install -r requirements-dev.txt

For local work on the optional AWS Batch executor path, also install:

python3 -m pip install ".[aws]"

Run repo test suites:

python3 -m pytest -q templates client_harness_template/tests service/tests

Optional GP backend validation for bo_client:

RUN_GP_TESTS=1 python3 -m pytest -q \
  templates/bo_client/tests/test_suggest.py::test_suggest_works_with_gp_backend

Automation Note (Machine-Readable Suggest)

For machine parsing of suggest output, use:

python3 templates/bo_client_demo/run_bo.py suggest \
  --project-root templates/bo_client_demo \
  --json-only

For worker fan-out, use line-delimited output:

python3 templates/bo_client_demo/run_bo.py suggest \
  --project-root templates/bo_client_demo \
  --count 3 \
  --jsonl

Bundle JSON, JSONL handoff, max_pending_trials, and lease-token examples are captured in docs/examples/batch_async/README.md. Starter webhook, scheduler, and tracker adapter examples are captured in docs/examples/starterkit/README.md.