Design: define full-lifecycle preset catalog and graph-execution runtime components #199

@devkade

Parent: #167
Related: #168, #194, #196, #198

Purpose

Define the runtime meaning of the default full-lifecycle phase preset catalog and the component boundary for graph-execution.

#167 owns the roadmap structure. #198 owns concrete schema details. This issue owns preset semantics and execution component boundaries.

Default full-lifecycle preset catalog

Run
  -> standard-intake
  -> objective-approval
  -> policy-selection
  -> graph-execution
  -> objective-evaluation
  -> gated-integration
  -> record-and-calibrate
  -> evidence-sealed-close
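
As a minimal sketch, the ordered catalog above could be held as a constant tuple. The names `DEFAULT_PRESET_CATALOG`, `PresetId`, and `nextPreset` are hypothetical and not part of this issue; the concrete schema belongs to #198.

```typescript
// Preset ids in lifecycle order, copied from the catalog above.
const DEFAULT_PRESET_CATALOG = [
  "standard-intake",
  "objective-approval",
  "policy-selection",
  "graph-execution",
  "objective-evaluation",
  "gated-integration",
  "record-and-calibrate",
  "evidence-sealed-close",
] as const;

type PresetId = (typeof DEFAULT_PRESET_CATALOG)[number];

// Hypothetical helper: the preset that follows `current`, or null after the last one.
function nextPreset(current: PresetId): PresetId | null {
  const index = DEFAULT_PRESET_CATALOG.indexOf(current);
  return DEFAULT_PRESET_CATALOG[index + 1] ?? null;
}
```

This only encodes the order. Whether the run may actually advance is a gate decision and is not modeled here.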

Preset meanings:

  • standard-intake: collect goal, requirements, constraints, success criteria, and ambiguity; produce draft RunObjective and intake evidence.
  • objective-approval: turn draft RunObjective into approved RunObjective with acceptance criteria, guardrails, stop conditions, and repair thresholds.
  • policy-selection: compare execution strategy candidates and record PolicySelection for worker count, scheduler mode, isolation, verification depth, repair budget, and related dimensions.
  • graph-execution: use approved RunObjective plus PolicySelection to create and run a concrete TaskGraph through readiness, claim, lease, worker, and evidence gates.
  • objective-evaluation: judge whether execution outputs satisfy RunObjective and decide pass, repair_required, human_decision_required, or abort.
  • gated-integration: turn execution results into an integration candidate across worktree/git/GitHub/substrate concerns, including conflict, dry-run, cleanup, and retention checks.
  • record-and-calibrate: record execution, evaluation, and integration outcomes into the reward ledger; produce policy calibration candidates without automatic policy mutation.
  • evidence-sealed-close: seal required evidence, artifacts, reward ledger records, cleanup/retention state, and replay/audit readiness before closing the run.

Evaluation / integration / learning / close semantics

objective-evaluation is a decision phase, not a pass/fail helper.

Outputs:

  • EvaluationResult: success, partial_success, failure, or uncertain against approved RunObjective.
  • EvidenceAssessment: evidence sufficiency, freshness, missing proof, and trust.
  • RepairDecision: continue, retry, repair_required, human_decision_required, or abort.
  • ObjectiveDelta: revision proposal when the RunObjective appears wrong or incomplete.

Rule: objective-evaluation must not mutate RunObjective directly. It may create an ObjectiveDelta / revision proposal; changing RunObjective must go back through objective approval.

gated-integration may start only after evaluation allows it:

Allowed:
  success
  partial_success + continue
  human_decision_required + explicit approval

Blocked:
  failure + repair_required
  uncertain + missing/stale evidence
  abort
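
The allowed/blocked table above can be sketched as a single predicate. This is an illustration under assumptions: the `EvaluationOutcome` shape and the `humanApproved` flag are hypothetical, the precedence between overlapping rows (for example `success` combined with `abort`) is a guess, and any combination the table does not list is treated as blocked.

```typescript
type EvaluationResult = "success" | "partial_success" | "failure" | "uncertain";
type RepairDecision =
  | "continue"
  | "retry"
  | "repair_required"
  | "human_decision_required"
  | "abort";

// Hypothetical summary of what objective-evaluation hands to the gate.
interface EvaluationOutcome {
  result: EvaluationResult;
  repair: RepairDecision;
  humanApproved?: boolean; // assumed flag for "explicit approval"
}

function mayStartIntegration(o: EvaluationOutcome): boolean {
  if (o.repair === "abort") return false; // Blocked: abort
  if (o.repair === "human_decision_required") {
    return o.humanApproved === true; // Allowed only with explicit approval
  }
  if (o.result === "success") return true; // Allowed: success
  if (o.result === "partial_success") return o.repair === "continue";
  return false; // failure, uncertain, and unlisted combinations fail closed
}
```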

gated-integration produces the final integration outcome for learning, not just a merge result.

Outputs:

  • IntegrationCandidate: diff, artifact, branch, worktree, or external ref to integrate.
  • IntegrationCheckResult: merge/dry-run/conflict/smoke/substrate state.
  • CleanupRetentionPlan: retain/cleanup decision for workers, worktrees, tmux, and artifacts.
  • IntegrationDecision: integrate_ready, conflict, repair_required, human_decision_required, or abort.

record-and-calibrate records learning signals without changing policy directly.

Outputs:

  • RewardRecord: RunObjective result plus integration outcome, cost, risk, and evidence quality.
  • PredictionDelta: prediction-vs-actual delta from policy-selection estimates.
  • PolicyHintUpdate: advisory hint for future policy-selection.

Rule: policy updates are not applied here. Future runs consume hints through policy-selection.

evidence-sealed-close is run sealing, not a plain stop.

Outputs:

  • ClosedRunRecord
  • FinalReport
  • RetentionState

Close gate checks:

  • required evidence is current/fresh;
  • required artifacts are stored or referenceable;
  • reward ledger record exists;
  • cleanup/retention state is explicit;
  • no unresolved blocker remains;
  • replay/audit events are sealed.

A failed close gate leaves the run in blocked_close or human_decision_required; an unsealed run is not complete.
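
A minimal sketch of the close gate checks, assuming each check has already been reduced to a boolean or a list. `CloseGateInput`, `CloseGateResult`, and `checkCloseGate` are hypothetical names. The sketch always reports `blocked_close` on failure and leaves the choice between `blocked_close` and `human_decision_required` out of scope.

```typescript
// Hypothetical input: one field per close gate check listed above.
interface CloseGateInput {
  evidenceFresh: boolean;
  artifactsReferenceable: boolean;
  rewardRecordExists: boolean;
  retentionStateExplicit: boolean;
  unresolvedBlockers: string[];
  auditEventsSealed: boolean;
}

type CloseGateResult =
  | { status: "sealed" }
  | { status: "blocked_close"; failedChecks: string[] };

function checkCloseGate(input: CloseGateInput): CloseGateResult {
  const failedChecks: string[] = [];
  if (!input.evidenceFresh) failedChecks.push("required evidence is not fresh");
  if (!input.artifactsReferenceable) failedChecks.push("required artifacts are not stored or referenceable");
  if (!input.rewardRecordExists) failedChecks.push("reward ledger record is missing");
  if (!input.retentionStateExplicit) failedChecks.push("cleanup/retention state is not explicit");
  if (input.unresolvedBlockers.length > 0) failedChecks.push("unresolved blockers remain");
  if (!input.auditEventsSealed) failedChecks.push("replay/audit events are not sealed");
  // The run is sealed only when every check passes.
  return failedChecks.length === 0
    ? { status: "sealed" }
    : { status: "blocked_close", failedChecks };
}
```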

TaskGraph decision

TaskGraph is the default execution representation for every Execute phase, not only for team or parallel runs.

TaskGraph = runtime primitive
Team / parallel = execution policy

Single-agent runs may have a single-node or linear TaskGraph. Team/parallel runs use the same TaskGraph model with richer scheduling, assignment, claim, lease, and recovery behavior.

PolicySelection / graph-execution boundary

policy-selection chooses the execution strategy. It may produce candidate graph sketches for simulation, but it does not create the concrete runtime TaskGraph.

graph-execution creates the concrete TaskGraph from approved RunObjective plus selected PolicySelection:

  • assign concrete runtime task ids;
  • resolve dependencies;
  • compute readiness;
  • attach required inputs and evidence expectations;
  • dispatch ready tasks through the selected scheduler/worker policy.
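
The "compute readiness" step can be sketched over a flat node list. The `TaskNode` shape and `readyTasks` are hypothetical, and the sketch considers dependencies only; blockers, claims, leases, and policy limits are left out.

```typescript
type TaskStatus = "pending" | "running" | "done";

// Hypothetical minimal node; the concrete TaskGraph schema belongs to #198.
interface TaskNode {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
}

// A task is ready when it is still pending and every dependency is done.
function readyTasks(graph: TaskNode[]): string[] {
  const done = new Set(graph.filter((t) => t.status === "done").map((t) => t.id));
  return graph
    .filter((t) => t.status === "pending" && t.dependsOn.every((d) => done.has(d)))
    .map((t) => t.id);
}
```

A single-agent run fits the same function: a graph with one pending node and no dependencies yields that node as the only ready task.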

Execute phase components

graph-execution
  -> Decomposer
  -> Scheduler
  -> WorkerRuntime
  -> Verifier

Responsibilities:

  • Decomposer: creates the concrete TaskGraph from approved RunObjective and PolicySelection.
  • Scheduler: computes readiness from dependencies, blockers, claims, leases, and policy limits.
  • WorkerRuntime: dispatches ready tasks to the chosen agent/substrate and manages claim, lease, heartbeat, and worker state.
  • Verifier: validates task evidence. It is not limited to the Execute phase; it is a runtime-wide evidence component.

GateEngine / Verifier split

Verifier = validates evidence
GateEngine = decides whether a transition is allowed

GateEngine evaluates GateSpec across hard invariants, phase preset requirements, and RunObjective requirements.

Verifier validates EvidenceSpec, artifact/evidence freshness, and human approval evidence.

This separation matters because evidence may be valid while a transition is still blocked, or evidence may be invalid even when an agent claims completion.
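
The two cases in the previous sentence can be sketched with two separate functions. All shapes and names here (`Evidence`, `VerifierResult`, `GateContext`, `verify`, `transitionAllowed`) are hypothetical, and the real GateSpec and EvidenceSpec are richer than this.

```typescript
// Hypothetical evidence record and verifier verdict.
interface Evidence {
  ref: string;
  producedAtMs: number;
  contentHashOk: boolean;
}

interface VerifierResult {
  valid: boolean;
  reasons: string[];
}

// Verifier: judges the evidence itself, nothing about transitions.
function verify(evidence: Evidence, nowMs: number, maxAgeMs: number): VerifierResult {
  const reasons: string[] = [];
  if (!evidence.contentHashOk) reasons.push("content hash mismatch");
  if (nowMs - evidence.producedAtMs > maxAgeMs) reasons.push("evidence is stale");
  return { valid: reasons.length === 0, reasons };
}

interface GateContext {
  verifier: VerifierResult;
  openBlockers: number;
  agentClaimsDone: boolean;
}

// GateEngine: decides the transition. The agent's own completion claim is
// deliberately ignored; only verified evidence and blockers count.
function transitionAllowed(ctx: GateContext): boolean {
  return ctx.verifier.valid && ctx.openBlockers === 0;
}
```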

Thin uniform PhaseEngine contract

Each lifecycle phase may have its own engine, but every engine must use the same thin contract. Phase engines produce phase outputs; they do not own transition authority.

Conceptual contract:

interface PhaseEngine {
  phase: RunPhase;
  execute(ctx: PhaseContext): Promise<PhaseResult>;
}

PhaseEngine responsibilities:

  • read PhaseContext;
  • produce outputs, evidence refs, blockers, and proposed state patch;
  • keep phase-specific logic small and local.

Shared runtime responsibilities:

  • RunOrchestrator: phase order, phase_started / phase_completed / phase_blocked events, and phase advancement.
  • RunStateStore: persisted RunState snapshots and state patch application.
  • EventStore: append-only runtime events.
  • Verifier: evidence validation.
  • GateEngine: transition decision.

Non-authority rules:

  • PhaseEngine must not mutate RunState directly.
  • PhaseEngine must not verify its own completion.
  • PhaseEngine must not bypass GateEngine.
  • PhaseEngine must not own event persistence.

This keeps the eight-phase lifecycle explicit without turning each phase into a heavyweight subsystem.
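
A sketch of one engine written against the thin contract, assuming placeholder shapes for `RunPhase`, `PhaseContext`, and `PhaseResult` (the issue does not define them here; the field names below are hypothetical). The engine only reads its context and returns outputs, evidence refs, blockers, and a proposed patch; it never writes state or events.

```typescript
// Placeholder types; the concrete schema belongs to #198.
type RunPhase = "standard-intake" | "graph-execution";

interface PhaseContext {
  runId: string;
  inputs: Record<string, unknown>;
}

interface PhaseResult {
  outputs: Record<string, unknown>;
  evidenceRefs: string[];
  blockers: string[];
  proposedPatch: Record<string, unknown>;
}

interface PhaseEngine {
  phase: RunPhase;
  execute(ctx: PhaseContext): Promise<PhaseResult>;
}

// Pure phase logic: small, local, and free of side effects.
function draftObjective(ctx: PhaseContext): PhaseResult {
  const goal = ctx.inputs["goal"];
  if (typeof goal !== "string" || goal.length === 0) {
    return { outputs: {}, evidenceRefs: [], blockers: ["missing goal"], proposedPatch: {} };
  }
  const draftRunObjective = { goal };
  return {
    outputs: { draftRunObjective },
    evidenceRefs: [`intake:${ctx.runId}`],
    blockers: [],
    proposedPatch: { draftRunObjective }, // proposed only; applied elsewhere
  };
}

const intakeEngine: PhaseEngine = {
  phase: "standard-intake",
  execute: (ctx) => Promise.resolve(draftObjective(ctx)),
};
```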

Shared runtime core

MVP shared runtime components:

  • RunOrchestrator: phase order, event emission, PhaseEngine invocation, Verifier/GateEngine coordination, and phase advancement.
  • RunStateStore: operational source of truth for RunState snapshots, state patches, version checks, and optimistic concurrency.
  • EventStore: append-only audit/replay support for runtime events.
  • GateEngine: evaluates HardInvariantGate, PhasePresetGate, and RunObjectiveGate; returns TransitionDecision.
  • Verifier: validates EvidenceSpec, freshness, artifact refs, and human approval evidence.
  • PhaseRegistry: maps phase -> PhaseEngine and loads the preset catalog.
  • SideEffectRunner: executes allowlisted external actions after durable transition intent is committed.

RecoveryManager is not an MVP engine. In MVP, GateEngine returns allowed recovery branches in TransitionDecision; post-MVP RecoveryManager may execute those branches.

type TransitionDecision = {
  allowed: boolean;
  onFail?: "deny" | "block" | "repair_required" | "human_decision_required";
  allowedBranches?: BranchRef[];
  blocker?: Blocker;
};
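
A sketch of how a GateEngine might fold the three gate layers into one `TransitionDecision`. `BranchRef`, `Blocker`, `GateCheck`, and `decide` are hypothetical; in particular, reporting the first failing gate in evaluation order is an assumption.

```typescript
type BranchRef = string; // hypothetical: the issue does not define BranchRef
type Blocker = { gate: string; message: string }; // hypothetical shape

type TransitionDecision = {
  allowed: boolean;
  onFail?: "deny" | "block" | "repair_required" | "human_decision_required";
  allowedBranches?: BranchRef[];
  blocker?: Blocker;
};

// Hypothetical result of evaluating one gate layer.
interface GateCheck {
  gate: "HardInvariantGate" | "PhasePresetGate" | "RunObjectiveGate";
  passed: boolean;
  message: string;
  onFail: NonNullable<TransitionDecision["onFail"]>;
  recoveryBranches?: BranchRef[];
}

// Checks are evaluated in the order given; the first failure decides.
function decide(checks: GateCheck[]): TransitionDecision {
  for (const check of checks) {
    if (!check.passed) {
      return {
        allowed: false,
        onFail: check.onFail,
        allowedBranches: check.recoveryBranches ?? [],
        blocker: { gate: check.gate, message: check.message },
      };
    }
  }
  return { allowed: true };
}
```

In MVP the returned `allowedBranches` are only reported; nothing in this sketch executes them.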

State, events, and side effects

MVP state model:

RunState snapshot = operational source of truth
EventStore = append-only audit / replay support

The runtime is not fully event-sourced in MVP. Event-sourced replay can become stronger later after snapshots, events, and recovery behavior are proven.

Transition commit is exposed as one runtime operation:

commitTransition(patch, event)

commitTransition owns:

  • RunState version check;
  • state patch application;
  • transition event append;
  • partial-commit prevention as far as the local substrate allows.
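
The four responsibilities above can be sketched with an in-memory store. This is an illustration only: `InMemoryRuntime`, `StatePatch.expectedVersion`, and the other shapes are hypothetical, and a real substrate would need a durable write to get comparable partial-commit protection.

```typescript
interface RunState {
  version: number;
  phase: string;
  data: Record<string, unknown>;
}

interface StatePatch {
  expectedVersion: number; // assumed carrier for the version check
  phase?: string;
  data?: Record<string, unknown>;
}

interface TransitionEvent {
  type: string;
  runId: string;
}

class InMemoryRuntime {
  private state: RunState;
  private events: TransitionEvent[] = [];

  constructor(initial: RunState) {
    this.state = initial;
  }

  commitTransition(patch: StatePatch, event: TransitionEvent): RunState {
    // 1. Version check (optimistic concurrency): reject stale writers.
    if (patch.expectedVersion !== this.state.version) {
      throw new Error(`version conflict: expected ${patch.expectedVersion}, found ${this.state.version}`);
    }
    // 2. Build the next snapshot without touching current state.
    const next: RunState = {
      version: this.state.version + 1,
      phase: patch.phase ?? this.state.phase,
      data: { ...this.state.data, ...(patch.data ?? {}) },
    };
    // 3. Publish event and snapshot together, only after both are built.
    this.events = [...this.events, event];
    this.state = next;
    return next;
  }

  snapshot(): RunState {
    return this.state;
  }

  eventCount(): number {
    return this.events.length;
  }
}
```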

Side effect order:

1. PhaseEngine.execute()
2. Verifier validates evidence
3. GateEngine returns TransitionDecision
4. commitTransition(state patch, transition event)
5. SideEffectRunner executes external action, if any
6. EventStore appends side-effect result event
7. RunStateStore records side-effect result / recovery state

Rule: durable transition intent must be committed before any side effect runs.
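
The seven-step order can be sketched as a function that records a trace of what ran. Everything here is hypothetical (`StepDeps`, `runStep`, the trace labels), and the sketch is synchronous for brevity even though `PhaseEngine.execute` returns a Promise. Returning early on a denied transition is also an assumption.

```typescript
// Hypothetical stand-ins for PhaseEngine, Verifier, and GateEngine.
interface StepDeps {
  execute: () => { patch: string; sideEffect?: string };
  verify: () => boolean;
  gate: (evidenceOk: boolean) => boolean;
}

// Runs one phase step and returns the ordered trace of actions taken.
function runStep(deps: StepDeps): string[] {
  const trace: string[] = [];
  const result = deps.execute(); // 1. PhaseEngine.execute()
  trace.push("execute");
  const evidenceOk = deps.verify(); // 2. Verifier validates evidence
  trace.push("verify");
  const allowed = deps.gate(evidenceOk); // 3. GateEngine returns a decision
  trace.push("gate");
  if (!allowed) return trace; // denied: no commit, no side effect
  trace.push(`commit:${result.patch}`); // 4. commitTransition
  if (result.sideEffect !== undefined) {
    trace.push(`side_effect:${result.sideEffect}`); // 5. SideEffectRunner
    trace.push("append_side_effect_result"); // 6. EventStore
    trace.push("record_side_effect_state"); // 7. RunStateStore
  }
  return trace;
}
```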

SideEffectRunner

PhaseEngine must not perform external actions directly. It returns SideEffectRequest values; RunOrchestrator commits transition intent first, then SideEffectRunner executes the requests.

Allowlisted MVP side effect kinds:

type SideEffectKind =
  | "launch_worker"
  | "send_worker_input"
  | "create_worktree"
  | "write_artifact"
  | "run_command"
  | "post_external_update";

Common request shape:

type SideEffectRequest = {
  id: string;
  kind: SideEffectKind;
  phase: RunPhase;
  reason: string;
  idempotencyKey: string;
  requiresApproval?: boolean;
};

Rules:

  • SideEffectRunner only executes allowlisted requests.
  • Every side effect is linked to phase, reason, and idempotency key.
  • High-risk/destructive requests must use requiresApproval.
  • Side-effect results are recorded as events and reflected back into RunState.
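
A sketch of a runner that enforces the first three rules. The `SideEffectKind` and `SideEffectRequest` types are copied from above; `RunPhase = string`, the outcome labels, the approved-id set, and skipping repeated idempotency keys are assumptions made for the example. Recording results as events and reflecting them into RunState is left out.

```typescript
type RunPhase = string; // placeholder; the real type is defined elsewhere

type SideEffectKind =
  | "launch_worker"
  | "send_worker_input"
  | "create_worktree"
  | "write_artifact"
  | "run_command"
  | "post_external_update";

type SideEffectRequest = {
  id: string;
  kind: SideEffectKind;
  phase: RunPhase;
  reason: string;
  idempotencyKey: string;
  requiresApproval?: boolean;
};

type SideEffectOutcome =
  | "executed"
  | "duplicate"
  | "rejected_not_allowlisted"
  | "rejected_needs_approval";

// Runtime allowlist: requests may arrive from persisted, untyped data.
const ALLOWLIST = new Set<string>([
  "launch_worker", "send_worker_input", "create_worktree",
  "write_artifact", "run_command", "post_external_update",
]);

class SideEffectRunner {
  private readonly seenKeys = new Set<string>();
  private readonly approvedIds: Set<string>;

  constructor(approvedIds: string[]) {
    this.approvedIds = new Set(approvedIds);
  }

  run(req: SideEffectRequest, action: () => void): SideEffectOutcome {
    if (!ALLOWLIST.has(req.kind)) return "rejected_not_allowlisted";
    if (req.requiresApproval === true && !this.approvedIds.has(req.id)) {
      return "rejected_needs_approval";
    }
    // Assumed behavior: a repeated idempotency key is skipped, not re-run.
    if (this.seenKeys.has(req.idempotencyKey)) return "duplicate";
    this.seenKeys.add(req.idempotencyKey);
    action();
    return "executed";
  }
}
```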

Acceptance criteria
