# Design: define full-lifecycle preset catalog and graph-execution runtime components

Parent: #167
Related: #168, #194, #196, #198

## Purpose

Define the runtime meaning of the default full-lifecycle phase preset catalog and the component boundary for `graph-execution`.

#167 owns the roadmap structure. #198 owns concrete schema details. This issue owns preset semantics and execution component boundaries.

## Default full-lifecycle preset catalog

```text
Run
  -> standard-intake
  -> objective-approval
  -> policy-selection
  -> graph-execution
  -> objective-evaluation
  -> gated-integration
  -> record-and-calibrate
  -> evidence-sealed-close
```

Preset meanings:

- `standard-intake`: collect goal, requirements, constraints, success criteria, and ambiguity; produce draft RunObjective and intake evidence.
- `objective-approval`: turn draft RunObjective into approved RunObjective with acceptance criteria, guardrails, stop conditions, and repair thresholds.
- `policy-selection`: compare execution strategy candidates and record PolicySelection for worker count, scheduler mode, isolation, verification depth, repair budget, and related dimensions.
- `graph-execution`: use approved RunObjective plus PolicySelection to create and run a concrete TaskGraph through readiness, claim, lease, worker, and evidence gates.
- `objective-evaluation`: judge whether execution outputs satisfy RunObjective and decide whether the run continues, retries, requires repair, needs a human decision, or aborts.
- `gated-integration`: turn execution results into an integration candidate across worktree/git/GitHub/substrate concerns, including conflict, dry-run, cleanup, and retention checks.
- `record-and-calibrate`: record execution, evaluation, and integration outcomes into the reward ledger; produce policy calibration candidates without automatic policy mutation.
- `evidence-sealed-close`: seal required evidence, artifacts, reward ledger records, cleanup/retention state, and replay/audit readiness before closing the run.

## Evaluation / integration / learning / close semantics

`objective-evaluation` is a decision phase, not a pass/fail helper.

Outputs:

- `EvaluationResult`: success, partial_success, failure, or uncertain against approved RunObjective.
- `EvidenceAssessment`: evidence sufficiency, freshness, missing proof, and trust.
- `RepairDecision`: continue, retry, repair_required, human_decision_required, or abort.
- `ObjectiveDelta`: revision proposal when the RunObjective appears wrong or incomplete.

Rule: `objective-evaluation` must not mutate RunObjective directly. It may create an ObjectiveDelta (a revision proposal); any change to RunObjective must go back through `objective-approval`.

`gated-integration` may start only after evaluation allows it:

```text
Allowed:
  success
  partial_success + continue
  human_decision_required + explicit approval

Blocked:
  failure + repair_required
  uncertain + missing/stale evidence
  abort
```
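The allowed/blocked table above can be sketched as a single predicate. This is a minimal illustration, not the schema from #198: the `EvaluationOutcome` shape, the `humanApproved` flag, and the function name `mayStartIntegration` are assumptions. Combinations the table does not list are treated as blocked, which is also an assumption.

```ts
// Value sets copied from the EvaluationResult and RepairDecision outputs above.
type EvaluationResult = "success" | "partial_success" | "failure" | "uncertain";
type RepairDecision =
  | "continue"
  | "retry"
  | "repair_required"
  | "human_decision_required"
  | "abort";

// Hypothetical summary of what objective-evaluation hands to the gate.
interface EvaluationOutcome {
  result: EvaluationResult;
  repair: RepairDecision;
  humanApproved: boolean; // assumed record of an explicit human approval
}

// Returns true only for the combinations listed under "Allowed".
function mayStartIntegration(o: EvaluationOutcome): boolean {
  if (o.repair === "abort") return false;
  if (o.repair === "human_decision_required") return o.humanApproved;
  if (o.result === "success") return true;
  if (o.result === "partial_success") return o.repair === "continue";
  // failure and uncertain fall through; unlisted combinations are blocked.
  return false;
}
```

Defaulting unlisted combinations to blocked keeps the gate conservative until the table is completed, for example for `retry`.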

`gated-integration` produces the final integration outcome for learning, not just a merge result.

Outputs:

- `IntegrationCandidate`: diff, artifact, branch, worktree, or external ref to integrate.
- `IntegrationCheckResult`: merge/dry-run/conflict/smoke/substrate state.
- `CleanupRetentionPlan`: retain/cleanup decision for workers, worktrees, tmux, and artifacts.
- `IntegrationDecision`: integrate_ready, conflict, repair_required, human_decision_required, or abort.

`record-and-calibrate` records learning signals without changing policy directly.

Outputs:

- `RewardRecord`: RunObjective result plus integration outcome, cost, risk, and evidence quality.
- `PredictionDelta`: prediction-vs-actual delta from policy-selection estimates.
- `PolicyHintUpdate`: advisory hint for future policy-selection.

Rule: policy updates are not applied here. Future runs consume hints through policy-selection.

`evidence-sealed-close` is run sealing, not a plain stop.

Outputs:

- `ClosedRunRecord`
- `FinalReport`
- `RetentionState`

Close gate checks:

- required evidence is current/fresh;
- required artifacts are stored or referenceable;
- reward ledger record exists;
- cleanup/retention state is explicit;
- no unresolved blocker remains;
- replay/audit events are sealed.

If any close gate check fails, the run stays in `blocked_close` or `human_decision_required`; an unsealed run is not complete.
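A minimal sketch of the close gate, assuming each check has already been reduced to a boolean or a list by the Verifier and other runtime components. The `CloseGateInput` shape, the check names, and the `closed` status are hypothetical. Choosing between `blocked_close` and `human_decision_required` is left out; the sketch always reports `blocked_close`.

```ts
// Hypothetical pre-computed inputs, one per close gate check listed above.
interface CloseGateInput {
  evidenceFresh: boolean;
  artifactsReferenceable: boolean;
  rewardRecordExists: boolean;
  retentionStateExplicit: boolean;
  unresolvedBlockers: string[];
  auditEventsSealed: boolean;
}

type CloseOutcome =
  | { status: "closed" }
  | { status: "blocked_close"; failedChecks: string[] };

// Evaluates every check and reports all failures, not just the first one.
function evaluateCloseGate(input: CloseGateInput): CloseOutcome {
  const checks: Array<[string, boolean]> = [
    ["evidence_fresh", input.evidenceFresh],
    ["artifacts_referenceable", input.artifactsReferenceable],
    ["reward_record_exists", input.rewardRecordExists],
    ["retention_state_explicit", input.retentionStateExplicit],
    ["no_unresolved_blocker", input.unresolvedBlockers.length === 0],
    ["audit_events_sealed", input.auditEventsSealed],
  ];
  const failedChecks = checks.filter(([, ok]) => !ok).map(([name]) => name);
  return failedChecks.length === 0
    ? { status: "closed" }
    : { status: "blocked_close", failedChecks };
}
```

Reporting every failed check at once lets a blocked run show the full list of missing seals in one pass.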

## TaskGraph decision

TaskGraph is the default execution representation for every Execute phase, not only for team or parallel runs.

```text
TaskGraph = runtime primitive
Team / parallel = execution policy
```

Single-agent runs may have a single-node or linear TaskGraph. Team/parallel runs use the same TaskGraph model with richer scheduling, assignment, claim, lease, and recovery behavior.

## PolicySelection / graph-execution boundary

`policy-selection` chooses the execution strategy. It may produce candidate graph sketches for simulation, but it does not create the concrete runtime TaskGraph.

`graph-execution` creates the concrete TaskGraph from approved RunObjective plus selected PolicySelection:

- assign concrete runtime task ids;
- resolve dependencies;
- compute readiness;
- attach required inputs and evidence expectations;
- dispatch ready tasks through the selected scheduler/worker policy.
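The first two steps (assigning runtime task ids and resolving dependencies) might look like the sketch below. All shapes here are hypothetical stand-ins for the #198 schemas, and the input is assumed to be a name-based task list derived from the approved RunObjective and PolicySelection. Cycle detection, inputs, and evidence expectations are omitted.

```ts
// Hypothetical shapes; the real schemas belong to #198.
interface SketchTask { name: string; dependsOn: string[] }              // name-based
interface RuntimeTask { id: string; name: string; dependsOn: string[] } // id-based
interface TaskGraph { runId: string; tasks: RuntimeTask[] }

// Assigns a concrete runtime id to each task and rewrites dependencies
// from names to ids. Throws on duplicate names or unknown dependencies.
function concretize(runId: string, sketch: SketchTask[]): TaskGraph {
  const idByName = new Map<string, string>();
  sketch.forEach((t, i) => {
    if (idByName.has(t.name)) throw new Error(`duplicate task name: ${t.name}`);
    idByName.set(t.name, `${runId}:task-${i + 1}`);
  });
  const tasks = sketch.map((t) => ({
    id: idByName.get(t.name) as string,
    name: t.name,
    dependsOn: t.dependsOn.map((dep) => {
      const id = idByName.get(dep);
      if (id === undefined) throw new Error(`unknown dependency: ${dep}`);
      return id;
    }),
  }));
  return { runId, tasks };
}

// The same function serves a single-node run and a small DAG.
const singleNode = concretize("run-1", [{ name: "implement", dependsOn: [] }]);
const dag = concretize("run-2", [
  { name: "api", dependsOn: [] },
  { name: "ui", dependsOn: [] },
  { name: "integrate", dependsOn: ["api", "ui"] },
]);
```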

## Execute phase components

```text
graph-execution
  -> Decomposer
  -> Scheduler
  -> WorkerRuntime
  -> Verifier
```

Responsibilities:

- `Decomposer`: creates the concrete TaskGraph from approved RunObjective and PolicySelection.
- `Scheduler`: computes readiness from dependencies, blockers, claims, leases, and policy limits.
- `WorkerRuntime`: dispatches ready tasks to the chosen agent/substrate and manages claim, lease, heartbeat, and worker state.
- `Verifier`: validates task evidence. It is not limited to the Execute phase; it is a runtime-wide evidence component.
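As an illustration of the Scheduler responsibility, readiness can be computed as a pure function over task state. The `SchedTask` fields, the `maxWorkers` policy limit, and the rule that an expired lease frees a task for re-dispatch are assumptions of this sketch, not decisions made in this issue.

```ts
// Hypothetical scheduler view of a task.
interface SchedTask {
  id: string;
  dependsOn: string[];
  done: boolean;
  blocked: boolean;        // an open blocker is attached to the task
  leaseExpiresAt?: number; // epoch ms; set while a worker holds the claim
}

// Returns ids of tasks that may be dispatched now, in input order,
// capped by the remaining worker capacity.
function readyTasks(tasks: SchedTask[], maxWorkers: number, now: number): string[] {
  const doneIds = new Set(tasks.filter((t) => t.done).map((t) => t.id));
  const holdsLease = (t: SchedTask): boolean =>
    !t.done && t.leaseExpiresAt !== undefined && t.leaseExpiresAt > now;
  const active = tasks.filter(holdsLease).length;
  const capacity = Math.max(0, maxWorkers - active);
  return tasks
    .filter(
      (t) =>
        !t.done &&
        !t.blocked &&
        !holdsLease(t) &&
        t.dependsOn.every((d) => doneIds.has(d)),
    )
    .slice(0, capacity)
    .map((t) => t.id);
}
```

Keeping readiness free of side effects means the same function can serve a single-node graph and a team-parallel DAG; only `maxWorkers` and the lease policy differ.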

## GateEngine / Verifier split

```text
Verifier = validates evidence
GateEngine = decides whether a transition is allowed
```

`GateEngine` evaluates GateSpec across hard invariants, phase preset requirements, and RunObjective requirements.

`Verifier` validates EvidenceSpec, artifact/evidence freshness, and human approval evidence.

This separation matters because evidence may be valid while a transition is still blocked, or evidence may be invalid even when an agent claims completion.
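The split can be sketched as two functions with different questions. The shapes below (`Evidence`, `EvidenceSpec`, the `reason` strings) are hypothetical simplifications; the real GateEngine evaluates GateSpec and returns a TransitionDecision.

```ts
// Hypothetical, simplified evidence shapes.
interface Evidence { id: string; collectedAt: number } // epoch ms
interface EvidenceSpec { requiredIds: string[]; maxAgeMs: number }

// Verifier: answers only "is the evidence valid?"
function verifyEvidence(
  spec: EvidenceSpec,
  evidence: Evidence[],
  now: number,
): { valid: boolean; problems: string[] } {
  const problems: string[] = [];
  for (const id of spec.requiredIds) {
    const found = evidence.find((e) => e.id === id);
    if (found === undefined) problems.push(`missing:${id}`);
    else if (now - found.collectedAt > spec.maxAgeMs) problems.push(`stale:${id}`);
  }
  return { valid: problems.length === 0, problems };
}

// GateEngine: answers only "may the transition happen?"
function decideTransition(
  evidenceValid: boolean,
  openBlockers: string[],
): { allowed: boolean; reason?: string } {
  if (!evidenceValid) return { allowed: false, reason: "invalid_evidence" };
  if (openBlockers.length > 0) return { allowed: false, reason: "open_blocker" };
  return { allowed: true };
}
```

With this shape, valid evidence plus an open blocker still yields `allowed: false`, and stale evidence blocks the transition regardless of what the agent reports.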

## Thin uniform PhaseEngine contract

Each lifecycle phase may have its own engine, but every engine must use the same thin contract. Phase engines produce phase outputs; they do not own transition authority.

Conceptual contract:

```ts
interface PhaseEngine {
  phase: RunPhase;
  execute(ctx: PhaseContext): Promise<PhaseResult>;
}
```

PhaseEngine responsibilities:

- read PhaseContext;
- produce outputs, evidence refs, blockers, and proposed state patch;
- keep phase-specific logic small and local.

Shared runtime responsibilities:

- `RunOrchestrator`: phase order, phase_started / phase_completed / phase_blocked events, and phase advancement.
- `RunStateStore`: persisted RunState snapshots and state patch application.
- `EventStore`: append-only runtime events.
- `Verifier`: evidence validation.
- `GateEngine`: transition decision.

Non-authority rules:

- PhaseEngine must not mutate RunState directly.
- PhaseEngine must not verify its own completion.
- PhaseEngine must not bypass GateEngine.
- PhaseEngine must not own event persistence.

This keeps the eight-phase lifecycle explicit without turning each phase into a heavyweight subsystem.
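To make the contract concrete, here is a hedged sketch of one engine that follows the non-authority rules. The `PhaseContext` and `PhaseResult` fields are assumptions (the source does not define them), and `buildIntakeResult` is a hypothetical helper. The engine reads a read-only context and returns a proposed patch; it never writes state, appends events, or decides its own completion.

```ts
type RunPhase =
  | "standard-intake" | "objective-approval" | "policy-selection"
  | "graph-execution" | "objective-evaluation" | "gated-integration"
  | "record-and-calibrate" | "evidence-sealed-close";

// Assumed shapes; only the PhaseEngine interface itself comes from the contract.
interface PhaseContext {
  readonly runId: string;
  readonly state: Readonly<Record<string, unknown>>;
}
interface PhaseResult {
  outputs: Record<string, unknown>;
  evidenceRefs: string[];
  blockers: string[];
  proposedPatch: Record<string, unknown>; // applied later by the runtime, not here
}
interface PhaseEngine {
  phase: RunPhase;
  execute(ctx: PhaseContext): Promise<PhaseResult>;
}

// Pure helper: keeps the phase-specific logic small and local.
function buildIntakeResult(ctx: PhaseContext): PhaseResult {
  const goal = ctx.state["goal"];
  if (typeof goal !== "string" || goal.trim() === "") {
    return { outputs: {}, evidenceRefs: [], blockers: ["goal_missing"], proposedPatch: {} };
  }
  const draft = { goal: goal.trim(), status: "draft" };
  return {
    outputs: { draftRunObjective: draft },
    evidenceRefs: [`intake:${ctx.runId}`],
    blockers: [],
    proposedPatch: { draftRunObjective: draft },
  };
}

const standardIntake: PhaseEngine = {
  phase: "standard-intake",
  execute: async (ctx) => buildIntakeResult(ctx),
};
```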

## Shared runtime core

MVP shared runtime components:

- `RunOrchestrator`: phase order, event emission, PhaseEngine invocation, Verifier/GateEngine coordination, and phase advancement.
- `RunStateStore`: operational source of truth for RunState snapshots, state patches, version checks, and optimistic concurrency.
- `EventStore`: append-only audit/replay support for runtime events.
- `GateEngine`: evaluates HardInvariantGate, PhasePresetGate, and RunObjectiveGate; returns TransitionDecision.
- `Verifier`: validates EvidenceSpec, freshness, artifact refs, and human approval evidence.
- `PhaseRegistry`: maps phase -> PhaseEngine and loads the preset catalog.
- `SideEffectRunner`: executes allowlisted external actions after durable transition intent is committed.

`RecoveryManager` is not an MVP engine. In MVP, GateEngine returns allowed recovery branches in TransitionDecision; post-MVP RecoveryManager may execute those branches.

```ts
type TransitionDecision = {
  allowed: boolean;
  onFail?: "deny" | "block" | "repair_required" | "human_decision_required";
  allowedBranches?: BranchRef[];
  blocker?: Blocker;
};
```

## State, events, and side effects

MVP state model:

```text
RunState snapshot = operational source of truth
EventStore = append-only audit / replay support
```

The runtime is not fully event-sourced in MVP. Event-sourced replay can become stronger later after snapshots, events, and recovery behavior are proven.

Transition commit is exposed as one runtime operation:

```ts
commitTransition(patch, event)
```

`commitTransition` owns:

- RunState version check;
- state patch application;
- transition event append;
- partial-commit prevention as far as the local substrate allows.
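A minimal in-memory sketch of these four responsibilities, assuming optimistic concurrency through an `expectedVersion` field on the patch. `RunState`, `StatePatch`, `TransitionEvent`, and `InMemoryRuntime` are hypothetical; a persistent substrate would need a transaction or an equivalent mechanism to prevent partial commits.

```ts
// Hypothetical shapes for illustration only.
interface RunState { version: number; phase: string; data: Record<string, unknown> }
interface StatePatch { expectedVersion: number; phase?: string; data?: Record<string, unknown> }
interface TransitionEvent { type: string; runId: string }

class InMemoryRuntime {
  private state: RunState;
  private readonly events: TransitionEvent[] = [];

  constructor(initial: RunState) {
    this.state = initial;
  }

  // Version check, patch application, and event append as one operation.
  commitTransition(patch: StatePatch, event: TransitionEvent): RunState {
    if (patch.expectedVersion !== this.state.version) {
      throw new Error(
        `version conflict: expected ${patch.expectedVersion}, found ${this.state.version}`,
      );
    }
    // Build the next snapshot before touching anything, so a failure
    // above leaves both the state and the event log unchanged.
    const next: RunState = {
      version: this.state.version + 1,
      phase: patch.phase ?? this.state.phase,
      data: { ...this.state.data, ...(patch.data ?? {}) },
    };
    this.events.push(event);
    this.state = next;
    return next;
  }

  snapshot(): RunState {
    return this.state;
  }

  eventLog(): readonly TransitionEvent[] {
    return this.events;
  }
}
```

A caller holding a stale version gets an error and neither the snapshot nor the event log changes.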

Side effect order:

```text
1. PhaseEngine.execute()
2. Verifier validates evidence
3. GateEngine returns TransitionDecision
4. commitTransition(state patch, transition event)
5. SideEffectRunner executes external action, if any
6. EventStore appends side-effect result event
7. RunStateStore records side-effect result / recovery state
```

Rule:

```text
Durable transition intent before side effect.
```

## SideEffectRunner

PhaseEngine must not perform external actions directly. It returns SideEffectRequest values; `RunOrchestrator` commits the transition intent first, and only then does `SideEffectRunner` act.

Allowlisted MVP side effect kinds:

```ts
type SideEffectKind =
  | "launch_worker"
  | "send_worker_input"
  | "create_worktree"
  | "write_artifact"
  | "run_command"
  | "post_external_update";
```

Common request shape:

```ts
type SideEffectRequest = {
  id: string;
  kind: SideEffectKind;
  phase: RunPhase;
  reason: string;
  idempotencyKey: string;
  requiresApproval?: boolean;
};
```

Rules:

- SideEffectRunner only executes allowlisted requests.
- Every side effect is linked to phase, reason, and idempotency key.
- High-risk or destructive requests must set `requiresApproval: true`.
- Side-effect results are recorded as events and reflected back into RunState.
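The rules above can be sketched as a small runner. This is an illustration under assumptions: `RunPhase` is reduced to a string placeholder, the `perform` and `isApproved` callbacks and the `SideEffectResult` shape are hypothetical, and the caller is assumed to have already committed the transition intent. Duplicate idempotency keys are skipped instead of re-executed.

```ts
type RunPhase = string; // placeholder for this sketch only
type SideEffectKind =
  | "launch_worker" | "send_worker_input" | "create_worktree"
  | "write_artifact" | "run_command" | "post_external_update";
type SideEffectRequest = {
  id: string; kind: SideEffectKind; phase: RunPhase;
  reason: string; idempotencyKey: string; requiresApproval?: boolean;
};
// Hypothetical result shape; the caller records it as an event.
type SideEffectResult = {
  requestId: string;
  status: "executed" | "skipped_duplicate" | "rejected";
  detail?: string;
};

// Runtime allowlist: checked even if a request bypassed the type system.
const ALLOWLIST: ReadonlySet<string> = new Set<string>([
  "launch_worker", "send_worker_input", "create_worktree",
  "write_artifact", "run_command", "post_external_update",
]);

class SideEffectRunner {
  private readonly seenKeys = new Set<string>();
  private readonly perform: (req: SideEffectRequest) => void;
  private readonly isApproved: (req: SideEffectRequest) => boolean;

  constructor(
    perform: (req: SideEffectRequest) => void,
    isApproved: (req: SideEffectRequest) => boolean,
  ) {
    this.perform = perform;
    this.isApproved = isApproved;
  }

  run(req: SideEffectRequest): SideEffectResult {
    if (!ALLOWLIST.has(req.kind)) {
      return { requestId: req.id, status: "rejected", detail: "not_allowlisted" };
    }
    if (req.requiresApproval === true && !this.isApproved(req)) {
      return { requestId: req.id, status: "rejected", detail: "approval_missing" };
    }
    if (this.seenKeys.has(req.idempotencyKey)) {
      return { requestId: req.id, status: "skipped_duplicate" };
    }
    this.seenKeys.add(req.idempotencyKey);
    this.perform(req);
    return { requestId: req.id, status: "executed" };
  }
}
```

Each returned `SideEffectResult` would then be appended to the EventStore and reflected into RunState by the caller.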

## Acceptance criteria

- [ ] Define the default preset catalog as implementation-ready preset definitions using #198 schemas.
- [ ] Define the shared PhaseEngine contract and shared runtime authority boundaries.
- [ ] Define shared runtime core interfaces for RunOrchestrator, RunStateStore, EventStore, GateEngine, Verifier, PhaseRegistry, and SideEffectRunner.
- [ ] Define `commitTransition(patch, event)` semantics and side-effect ordering.
- [ ] Define SideEffectRequest allowlist, idempotency requirements, and result-event behavior.
- [ ] Define graph-execution component interfaces for Decomposer, Scheduler, WorkerRuntime, Verifier, and GateEngine.
- [ ] Define objective-evaluation, gated-integration, record-and-calibrate, and evidence-sealed-close as implementation-ready preset definitions using #198 schemas.
- [ ] Add examples for sequential single-agent, linear multi-step, DAG-parallel, and team-parallel runs.
- [ ] Define the boundary between policy graph sketches and concrete runtime TaskGraph creation.
- [ ] Link graph-execution decisions to #194 and #196 runtime phase implementation.

