feat: define Autoresearch metric-experiment preset #184

@devkade

Summary

Define Autoresearch as a metric-experiment preset in the RunContract Harness.

Parent roadmap: #114
Track: C - Preset evolution
Related: #118, #167, #172

Problem

Autoresearch is the bounded experiment/optimization preset, but the contract it presents to RunContract is not explicit. The harness needs a clear preset boundary covering metric selection, benchmark execution, experiment ledger evidence, stopping rules, and anti-Goodhart safeguards.

Scope

  • Define the Autoresearch metric-experiment preset contract.
  • Specify required inputs: goal, benchmark or metric target, constraints, evidence standard, stop conditions, and acceptable trade-offs.
  • Specify required artifacts, such as the contract, benchmark/checks, experiment ledger, ideas, decision report, and verification report.
  • Define how objective/evaluation signals can inform Autoresearch without becoming hidden hard-blocking authority.
  • Identify follow-up implementation slices if existing Autoresearch behavior needs alignment.
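
As a starting point for discussion, the required inputs and artifacts listed above could be captured in a small schema. This is only a sketch under stated assumptions: `AutoresearchInputs`, `REQUIRED_ARTIFACTS`, `missing_inputs`, and `missing_artifacts` are hypothetical names, not existing harness APIs, and the field types are guesses.

```python
from dataclasses import dataclass, fields

# Hypothetical artifact names derived from the scope list above;
# the real harness may name or group them differently.
REQUIRED_ARTIFACTS = (
    "contract",
    "benchmark",
    "experiment_ledger",
    "ideas",
    "decision_report",
    "verification_report",
)


@dataclass(frozen=True)
class AutoresearchInputs:
    """Assumed shape of the preset's required inputs (illustrative only)."""
    goal: str
    metric_target: str                 # benchmark or metric target
    constraints: tuple = ()
    evidence_standard: str = ""
    stop_conditions: tuple = ()
    acceptable_tradeoffs: tuple = ()


def missing_inputs(inputs: AutoresearchInputs) -> list:
    """Return the names of inputs that are empty, in declaration order."""
    return [f.name for f in fields(inputs) if not getattr(inputs, f.name)]


def missing_artifacts(present) -> list:
    """Return required artifacts that are not in `present`, in canonical order."""
    have = set(present)
    return [name for name in REQUIRED_ARTIFACTS if name not in have]
```

A caller could run `missing_inputs` before a run starts and `missing_artifacts` before a run is reported, so that gaps surface as explicit lists rather than as silent defaults.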

Non-goals

  • No neural/RL learning implementation.
  • No unbounded autonomous research loop.
  • No score-only completion authority.
  • No runtime plugin/module retirement behavior.
  • No command rename or storage-root migration.

Acceptance criteria

  • The Autoresearch preset contract is documented, or implemented with tests.
  • Metric, benchmark, guardrail, and stop-condition expectations are explicit.
  • Experiment ledger evidence requirements are defined.
  • Anti-Goodhart safeguards reference the harness evaluator policy.
  • Objective/evaluation signals remain advisory unless a separate issue authorizes stronger gates.
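
One way to read the criteria above is that stop conditions depend on ledger evidence and guardrails, while objective/evaluation signals are reported but never decide the outcome. The sketch below illustrates that reading; `LedgerEntry`, `StopDecision`, and `evaluate_stop` are hypothetical names, and the "higher is better" metric and experiment budget are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """One experiment ledger row (assumed fields, illustrative only)."""
    experiment_id: str
    metric_value: float   # primary metric; higher is better in this sketch
    guardrails_ok: bool   # True when every guardrail check passed
    evidence_ref: str     # pointer to benchmark output or logs


@dataclass(frozen=True)
class StopDecision:
    stop: bool
    reason: str
    advisories: tuple     # evaluator signals, reported but not enforced


def evaluate_stop(ledger, max_experiments, target, advisory_signals=()):
    """Decide whether the bounded loop should stop.

    Stopping is not completion: a score alone never marks the run done.
    Advisory signals are passed through unchanged and never alter `stop`.
    """
    advisories = tuple(advisory_signals)
    # Only entries with passing guardrails and attached evidence count.
    valid = [e for e in ledger if e.guardrails_ok and e.evidence_ref]
    if any(e.metric_value >= target for e in valid):
        return StopDecision(True, "target_met_with_evidence", advisories)
    if len(ledger) >= max_experiments:
        return StopDecision(True, "experiment_budget_exhausted", advisories)
    return StopDecision(False, "continue", advisories)
```

In this sketch a high score from an entry that failed a guardrail, or that lacks evidence, does not end the loop, which is one simple anti-Goodhart safeguard. The actual safeguards are expected to follow the harness evaluator policy.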

Verification
