standard-intake: collect goal, requirements, constraints, success criteria, and ambiguity; produce draft RunObjective and intake evidence.
objective-approval: turn draft RunObjective into approved RunObjective with acceptance criteria, guardrails, stop conditions, and repair thresholds.
policy-selection: compare execution strategy candidates and record PolicySelection for worker count, scheduler mode, isolation, verification depth, repair budget, and related dimensions.
graph-execution: use approved RunObjective plus PolicySelection to create and run a concrete TaskGraph through readiness, claim, lease, worker, and evidence gates.
objective-evaluation: judge whether execution outputs satisfy RunObjective and decide pass, repair_required, human_decision_required, or abort.
gated-integration: turn execution results into an integration candidate across worktree/git/GitHub/substrate concerns, including conflict, dry-run, cleanup, and retention checks.
record-and-calibrate: record execution, evaluation, and integration outcomes into the reward ledger; produce policy calibration candidates without automatic policy mutation.
evidence-sealed-close: seal required evidence, artifacts, reward ledger records, cleanup/retention state, and replay/audit readiness before closing the run.
Evaluation / integration / learning / close semantics
objective-evaluation is a decision phase, not a pass/fail helper.
Outputs:
EvaluationResult: success, partial_success, failure, or uncertain against approved RunObjective.
EvidenceAssessment: evidence sufficiency, freshness, missing proof, and trust.
RepairDecision: continue, retry, repair_required, human_decision_required, or abort.
ObjectiveDelta: revision proposal when the RunObjective appears wrong or incomplete.
Rule: objective-evaluation must not mutate RunObjective directly. It may create an ObjectiveDelta / revision proposal; changing RunObjective must go back through objective approval.
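The non-mutation rule can be pictured with a small sketch. The field names (`version`, `acceptanceCriteria`, `proposedCriteria`) and the mapping from unmet criteria to a result are assumptions made for illustration; the issue does not define them.

```typescript
// Hypothetical shapes; only the enum values come from the outputs listed above.
type EvaluationResult = "success" | "partial_success" | "failure" | "uncertain";
type RepairDecision =
  | "continue" | "retry" | "repair_required" | "human_decision_required" | "abort";

interface RunObjective {
  readonly id: string;
  readonly version: number;
  readonly acceptanceCriteria: readonly string[];
}

interface ObjectiveDelta {
  objectiveId: string;
  baseVersion: number;
  proposedCriteria: string[];
  reason: string;
}

interface EvaluationOutput {
  result: EvaluationResult;
  repair: RepairDecision;
  delta?: ObjectiveDelta;
}

// Evaluation reads the approved objective and may return a proposal; it never writes to it.
function evaluateObjective(
  objective: RunObjective,
  satisfied: ReadonlySet<string>,
  suspectedMissingCriterion?: string,
): EvaluationOutput {
  const unmet = objective.acceptanceCriteria.filter((c) => !satisfied.has(c));
  // Assumed mapping, for illustration only.
  const result: EvaluationResult =
    unmet.length === 0 ? "success"
    : unmet.length < objective.acceptanceCriteria.length ? "partial_success"
    : "failure";
  const repair: RepairDecision = result === "success" ? "continue" : "repair_required";
  const output: EvaluationOutput = { result, repair };
  if (suspectedMissingCriterion !== undefined) {
    // The proposal is a separate value; applying it is objective-approval's job.
    output.delta = {
      objectiveId: objective.id,
      baseVersion: objective.version,
      proposedCriteria: [...objective.acceptanceCriteria, suspectedMissingCriterion],
      reason: "objective appears incomplete",
    };
  }
  return output;
}
```

The caller would route a returned `delta` back through objective approval rather than applying it.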
gated-integration may start only after evaluation allows it:
CleanupRetentionPlan: retain/cleanup decision for workers, worktrees, tmux, and artifacts.
IntegrationDecision: integrate_ready, conflict, repair_required, human_decision_required, or abort.
record-and-calibrate records learning signals without changing policy directly.
Outputs:
RewardRecord: RunObjective result plus integration outcome, cost, risk, and evidence quality.
PredictionDelta: prediction-vs-actual delta from policy-selection estimates.
PolicyHintUpdate: advisory hint for future policy-selection.
Rule: policy updates are not applied here. Future runs consume hints through policy-selection.
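A PredictionDelta could be computed as a plain actual-minus-predicted difference. In the sketch below the metric names (`workerCount`, `durationMinutes`, `repairAttempts`) and the single shared `tolerance` are hypothetical; the issue does not say which estimates policy-selection records.

```typescript
// Hypothetical estimate dimensions, chosen only to make the arithmetic concrete.
interface Estimates {
  workerCount: number;
  durationMinutes: number;
  repairAttempts: number;
}

type PredictionDelta = { [K in keyof Estimates]: number };

interface PolicyHintUpdate {
  dimension: keyof Estimates;
  note: string;
  advisory: true; // hints are never applied here
}

// Positive values mean the actual run exceeded the estimate.
function predictionDelta(predicted: Estimates, actual: Estimates): PredictionDelta {
  return {
    workerCount: actual.workerCount - predicted.workerCount,
    durationMinutes: actual.durationMinutes - predicted.durationMinutes,
    repairAttempts: actual.repairAttempts - predicted.repairAttempts,
  };
}

// Produces advisory hints only; no policy object is touched.
function hintsFrom(delta: PredictionDelta, tolerance: number): PolicyHintUpdate[] {
  const hints: PolicyHintUpdate[] = [];
  for (const dimension of Object.keys(delta) as (keyof Estimates)[]) {
    const d = delta[dimension];
    if (Math.abs(d) > tolerance) {
      const direction = d > 0 ? "exceeded" : "undershot";
      hints.push({ dimension, note: `actual ${direction} estimate by ${Math.abs(d)}`, advisory: true });
    }
  }
  return hints;
}
```

A future policy-selection phase would read these hints as one input among several; nothing in this sketch changes a policy.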
evidence-sealed-close is run sealing, not a plain stop.
Outputs:
ClosedRunRecord
FinalReport
RetentionState
Close gate checks:
required evidence is current/fresh;
required artifacts are stored or referenceable;
reward ledger record exists;
cleanup/retention state is explicit;
no unresolved blocker remains;
replay/audit events are sealed.
Failure leaves the run in blocked_close or human_decision_required; an unsealed run is not complete.
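The close gate can be read as a conjunction of the six checks above. A minimal sketch, assuming a hypothetical `CloseGateInput` record with one field per check; it returns only `blocked_close` on failure, since the issue does not say when `human_decision_required` applies instead.

```typescript
// Hypothetical input: one field per close gate check listed above.
interface CloseGateInput {
  evidenceFresh: boolean;
  artifactsStored: boolean;
  rewardRecordExists: boolean;
  retentionStateExplicit: boolean;
  unresolvedBlockers: number;
  auditEventsSealed: boolean;
}

type CloseOutcome =
  | { status: "closed" }
  | { status: "blocked_close"; failed: string[] };

// The run closes only when every check passes; otherwise the failed checks are reported.
function checkCloseGate(input: CloseGateInput): CloseOutcome {
  const checks: [string, boolean][] = [
    ["evidence is current/fresh", input.evidenceFresh],
    ["artifacts are stored or referenceable", input.artifactsStored],
    ["reward ledger record exists", input.rewardRecordExists],
    ["cleanup/retention state is explicit", input.retentionStateExplicit],
    ["no unresolved blocker remains", input.unresolvedBlockers === 0],
    ["replay/audit events are sealed", input.auditEventsSealed],
  ];
  const failed = checks.filter(([, ok]) => !ok).map(([name]) => name);
  return failed.length === 0 ? { status: "closed" } : { status: "blocked_close", failed };
}
```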
TaskGraph decision
TaskGraph is the default execution representation for every Execute phase, not only for team or parallel runs.
TaskGraph = runtime primitive
Team / parallel = execution policy
Single-agent runs may have a single-node or linear TaskGraph. Team/parallel runs use the same TaskGraph model with richer scheduling, assignment, claim, lease, and recovery behavior.
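To illustrate that one model covers both cases, here is a sketch with a hypothetical `TaskGraph` shape (`nodes` with `dependsOn` ids). The concrete schema belongs to #198, so these names are placeholders.

```typescript
// Hypothetical minimal graph shape; not the schema from #198.
interface TaskNode {
  id: string;
  dependsOn: string[];
}

interface TaskGraph {
  nodes: TaskNode[];
}

// A single-agent run in its simplest form: one node, no dependencies.
function singleNodeGraph(id: string): TaskGraph {
  return { nodes: [{ id, dependsOn: [] }] };
}

// A linear run: each node depends on the one before it.
function linearGraph(ids: string[]): TaskGraph {
  return {
    nodes: ids.map((id, i) => ({ id, dependsOn: i === 0 ? [] : ids.slice(i - 1, i) })),
  };
}

// Nodes with no dependencies can start immediately, whatever the execution policy.
function initiallyReady(graph: TaskGraph): string[] {
  return graph.nodes.filter((n) => n.dependsOn.length === 0).map((n) => n.id);
}

// A team/parallel run uses the same type; only the shape of the graph differs.
const fanOut: TaskGraph = {
  nodes: [
    { id: "plan", dependsOn: [] },
    { id: "impl-a", dependsOn: ["plan"] },
    { id: "impl-b", dependsOn: ["plan"] },
    { id: "merge", dependsOn: ["impl-a", "impl-b"] },
  ],
};
```

Whether `impl-a` and `impl-b` actually run concurrently is then a policy question, not a property of the graph type.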
PolicySelection / graph-execution boundary
policy-selection chooses the execution strategy. It may produce candidate graph sketches for simulation, but it does not create the concrete runtime TaskGraph.
graph-execution creates the concrete TaskGraph from approved RunObjective plus selected PolicySelection:
assign concrete runtime task ids;
resolve dependencies;
compute readiness;
attach required inputs and evidence expectations;
dispatch ready tasks through the selected scheduler/worker policy.
Decomposer: creates the concrete TaskGraph from approved RunObjective and PolicySelection.
Scheduler: computes readiness from dependencies, blockers, claims, leases, and policy limits.
WorkerRuntime: dispatches ready tasks to the chosen agent/substrate and manages claim, lease, heartbeat, and worker state.
Verifier: validates task evidence, but is not limited to execute; it is a runtime-wide evidence component.
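The Scheduler's readiness computation might look like the sketch below. The `Task` fields, the treatment of an expired lease as reclaimable, and the `maxWorkers` limit are assumptions standing in for whatever PolicySelection actually records.

```typescript
// Hypothetical task record; status values and lease handling are illustrative.
type TaskStatus = "pending" | "claimed" | "done";

interface Task {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
  blocked?: boolean;
  leaseExpiresAt?: number; // epoch milliseconds, set while claimed
}

// A task is ready when its dependencies are done, it is not blocked,
// and it is either unclaimed or held under an expired lease.
// The policy limit caps how many ready tasks are returned for dispatch.
function readyTasks(tasks: Task[], now: number, maxWorkers: number): string[] {
  const done = new Set(tasks.filter((t) => t.status === "done").map((t) => t.id));
  const activeClaims = tasks.filter(
    (t) => t.status === "claimed" && (t.leaseExpiresAt ?? 0) > now,
  ).length;
  const capacity = Math.max(0, maxWorkers - activeClaims);
  const ready = tasks.filter((t) => {
    const leaseExpired = t.status === "claimed" && (t.leaseExpiresAt ?? 0) <= now;
    const claimable = t.status === "pending" || leaseExpired;
    return claimable && !t.blocked && t.dependsOn.every((d) => done.has(d));
  });
  return ready.slice(0, capacity).map((t) => t.id);
}
```

WorkerRuntime would then claim and lease each returned id before dispatch; that step is left out here.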
GateEngine / Verifier split
Verifier = validates evidence
GateEngine = decides whether a transition is allowed
GateEngine evaluates GateSpec across hard invariants, phase preset requirements, and RunObjective requirements.
Verifier validates EvidenceSpec, artifact/evidence freshness, and human approval evidence.
This separation matters because evidence may be valid while a transition is still blocked, or evidence may be invalid even when an agent claims completion.
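Both cases in the sentence above can be shown with two small functions. This is a sketch under assumed shapes: `Evidence`, `maxAgeMs`, and the reduced gate inputs (`hardInvariantsHold`, `openBlockers`) are hypothetical stand-ins for EvidenceSpec and GateSpec.

```typescript
// Hypothetical, reduced stand-ins for EvidenceSpec / GateSpec inputs.
interface Evidence {
  kind: string;
  producedAt: number; // epoch milliseconds
}

interface VerifierResult {
  valid: boolean;
  reason?: string;
}

interface TransitionDecision {
  allowed: boolean;
  reasons: string[];
}

// Verifier: answers only "is this evidence valid?", regardless of what an agent claims.
function verifyEvidence(evidence: Evidence | undefined, now: number, maxAgeMs: number): VerifierResult {
  if (!evidence) return { valid: false, reason: "missing evidence" };
  if (now - evidence.producedAt > maxAgeMs) return { valid: false, reason: "stale evidence" };
  return { valid: true };
}

// GateEngine: answers "may the run transition?", using the Verifier result as one input.
function decideTransition(
  verification: VerifierResult,
  hardInvariantsHold: boolean,
  openBlockers: number,
): TransitionDecision {
  const reasons: string[] = [];
  if (!verification.valid) reasons.push(verification.reason ?? "invalid evidence");
  if (!hardInvariantsHold) reasons.push("hard invariant violated");
  if (openBlockers > 0) reasons.push("open blockers");
  return { allowed: reasons.length === 0, reasons };
}
```

Valid evidence with an open blocker yields `allowed: false`; a completion claim with no evidence fails at the Verifier before the gate is consulted.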
Thin uniform PhaseEngine contract
Each lifecycle phase may have its own engine, but every engine must use the same thin contract. Phase engines produce phase outputs; they do not own transition authority.
RunStateStore: operational source of truth for RunState snapshots, state patches, version checks, and optimistic concurrency.
EventStore: append-only audit/replay support for runtime events.
GateEngine: evaluates HardInvariantGate, PhasePresetGate, and RunObjectiveGate; returns TransitionDecision.
Verifier: validates EvidenceSpec, freshness, artifact refs, and human approval evidence.
PhaseRegistry: maps phase -> PhaseEngine and loads the preset catalog.
SideEffectRunner: executes allowlisted external actions after durable transition intent is committed.
RecoveryManager is not an MVP engine. In MVP, GateEngine returns allowed recovery branches in TransitionDecision; post-MVP RecoveryManager may execute those branches.
RunState snapshot = operational source of truth
EventStore = append-only audit / replay support
The runtime is not fully event-sourced in MVP. Event-sourced replay can become stronger later after snapshots, events, and recovery behavior are proven.
Transition commit is exposed as one runtime operation:
commitTransition(patch, event)
commitTransition owns:
RunState version check;
state patch application;
transition event append;
partial-commit prevention as far as the local substrate allows.
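An in-memory sketch of the four responsibilities above, assuming hypothetical `RunState`, `StatePatch`, and `TransitionEvent` shapes. A real store would need its substrate's own atomicity guarantees; here the next state and event list are built first and swapped in together so a rejected commit leaves nothing behind.

```typescript
// Hypothetical shapes; the real RunState schema is owned by #198.
interface RunState {
  version: number;
  phase: string;
}

interface StatePatch {
  expectedVersion: number; // optimistic concurrency token
  phase: string;
}

interface TransitionEvent {
  type: string;
  from: string;
  to: string;
}

class InMemoryRuntime {
  private state: RunState;
  private events: TransitionEvent[] = [];

  constructor(initial: RunState) {
    this.state = { ...initial };
  }

  // Version check, patch application, and event append happen as one operation.
  commitTransition(patch: StatePatch, event: TransitionEvent): RunState {
    if (patch.expectedVersion !== this.state.version) {
      throw new Error(`stale patch: expected ${patch.expectedVersion}, found ${this.state.version}`);
    }
    // Prepare both results before assigning either, so there is no half-applied commit.
    const next: RunState = { version: this.state.version + 1, phase: patch.phase };
    const nextEvents = [...this.events, event];
    this.state = next;
    this.events = nextEvents;
    return { ...next };
  }

  snapshot(): RunState {
    return { ...this.state };
  }

  eventCount(): number {
    return this.events.length;
  }
}
```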
Side effect order:
1. PhaseEngine.execute()
2. Verifier validates evidence
3. GateEngine returns TransitionDecision
4. commitTransition(state patch, transition event)
5. SideEffectRunner executes external action, if any
6. EventStore appends side-effect result event
7. RunStateStore records side-effect result / recovery state
Rule:
Durable transition intent before side effect.
SideEffectRunner
PhaseEngine must not perform external actions directly. It returns SideEffectRequest values; Orchestrator commits transition intent first, then SideEffectRunner acts.
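The ordering can be traced with a small driver that logs each step. Everything here is a hypothetical stand-in: the callbacks replace the real components, and the `kind` string is a placeholder rather than one of the allowlisted kinds.

```typescript
// Hypothetical request shape; the issue's common request shape is not reproduced here.
interface SideEffectRequest {
  kind: string;
  payload: string;
}

interface PhaseOutput {
  outputs: string[];
  sideEffects: SideEffectRequest[]; // requested by the engine, never performed by it
}

// Runs one phase step and returns the order in which components were invoked.
function runPhaseStep(
  execute: () => PhaseOutput,
  verify: (output: PhaseOutput) => boolean,
  gateAllows: (evidenceValid: boolean) => boolean,
  runSideEffect: (request: SideEffectRequest) => string,
): string[] {
  const log: string[] = [];
  const output = execute();
  log.push("PhaseEngine.execute");
  const evidenceValid = verify(output);
  log.push("Verifier.validate");
  const allowed = gateAllows(evidenceValid);
  log.push("GateEngine.decide");
  if (!allowed) return log; // no commit, so no side effect may run
  log.push("commitTransition"); // durable intent is recorded first
  for (const request of output.sideEffects) {
    const result = runSideEffect(request);
    log.push(`SideEffectRunner.run:${request.kind}`);
    log.push(`EventStore.append:${result}`);
    log.push("RunStateStore.recordResult");
  }
  return log;
}
```

If the gate blocks the transition, the returned log stops at `GateEngine.decide` and the side-effect callback is never invoked.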
Design: define full-lifecycle preset catalog and graph-execution runtime components
Parent: #167
Related: #168, #194, #196, #198
Purpose
Define the runtime meaning of the default full-lifecycle phase preset catalog and the component boundary for graph-execution.
#167 owns the roadmap structure. #198 owns concrete schema details. This issue owns preset semantics and execution component boundaries.
Default full-lifecycle preset catalog
Preset meanings:
standard-intake: collect goal, requirements, constraints, success criteria, and ambiguity; produce draft RunObjective and intake evidence.
objective-approval: turn draft RunObjective into approved RunObjective with acceptance criteria, guardrails, stop conditions, and repair thresholds.
policy-selection: compare execution strategy candidates and record PolicySelection for worker count, scheduler mode, isolation, verification depth, repair budget, and related dimensions.
graph-execution: use approved RunObjective plus PolicySelection to create and run a concrete TaskGraph through readiness, claim, lease, worker, and evidence gates.
objective-evaluation: judge whether execution outputs satisfy RunObjective and decide pass, repair_required, human_decision_required, or abort.
gated-integration: turn execution results into an integration candidate across worktree/git/GitHub/substrate concerns, including conflict, dry-run, cleanup, and retention checks.
record-and-calibrate: record execution, evaluation, and integration outcomes into the reward ledger; produce policy calibration candidates without automatic policy mutation.
evidence-sealed-close: seal required evidence, artifacts, reward ledger records, cleanup/retention state, and replay/audit readiness before closing the run.
Evaluation / integration / learning / close semantics
objective-evaluation is a decision phase, not a pass/fail helper.
Outputs:
EvaluationResult: success, partial_success, failure, or uncertain against approved RunObjective.
EvidenceAssessment: evidence sufficiency, freshness, missing proof, and trust.
RepairDecision: continue, retry, repair_required, human_decision_required, or abort.
ObjectiveDelta: revision proposal when the RunObjective appears wrong or incomplete.
Rule: objective-evaluation must not mutate RunObjective directly. It may create an ObjectiveDelta / revision proposal; changing RunObjective must go back through objective approval.
gated-integration may start only after evaluation allows it:
gated-integration produces the final integration outcome for learning, not just a merge result.
Outputs:
IntegrationCandidate: diff, artifact, branch, worktree, or external ref to integrate.
IntegrationCheckResult: merge/dry-run/conflict/smoke/substrate state.
CleanupRetentionPlan: retain/cleanup decision for workers, worktrees, tmux, and artifacts.
IntegrationDecision: integrate_ready, conflict, repair_required, human_decision_required, or abort.
record-and-calibrate records learning signals without changing policy directly.
Outputs:
RewardRecord: RunObjective result plus integration outcome, cost, risk, and evidence quality.
PredictionDelta: prediction-vs-actual delta from policy-selection estimates.
PolicyHintUpdate: advisory hint for future policy-selection.
Rule: policy updates are not applied here. Future runs consume hints through policy-selection.
evidence-sealed-close is run sealing, not a plain stop.
Outputs:
ClosedRunRecord
FinalReport
RetentionState
Close gate checks:
Failure leaves the run in blocked_close or human_decision_required; an unsealed run is not complete.
TaskGraph decision
TaskGraph is the default execution representation for every Execute phase, not only for team or parallel runs.
Single-agent runs may have a single-node or linear TaskGraph. Team/parallel runs use the same TaskGraph model with richer scheduling, assignment, claim, lease, and recovery behavior.
PolicySelection / graph-execution boundary
policy-selection chooses the execution strategy. It may produce candidate graph sketches for simulation, but it does not create the concrete runtime TaskGraph.
graph-execution creates the concrete TaskGraph from approved RunObjective plus selected PolicySelection:
Execute phase components
Responsibilities:
Decomposer: creates the concrete TaskGraph from approved RunObjective and PolicySelection.
Scheduler: computes readiness from dependencies, blockers, claims, leases, and policy limits.
WorkerRuntime: dispatches ready tasks to the chosen agent/substrate and manages claim, lease, heartbeat, and worker state.
Verifier: validates task evidence, but is not limited to execute; it is a runtime-wide evidence component.
GateEngine / Verifier split
GateEngine evaluates GateSpec across hard invariants, phase preset requirements, and RunObjective requirements.
Verifier validates EvidenceSpec, artifact/evidence freshness, and human approval evidence.
This separation matters because evidence may be valid while a transition is still blocked, or evidence may be invalid even when an agent claims completion.
Thin uniform PhaseEngine contract
Each lifecycle phase may have its own engine, but every engine must use the same thin contract. Phase engines produce phase outputs; they do not own transition authority.
Conceptual contract:
PhaseEngine responsibilities:
Shared runtime responsibilities:
RunOrchestrator: phase order, phase_started / phase_completed / phase_blocked events, and phase advancement.
RunStateStore: persisted RunState snapshots and state patch application.
EventStore: append-only runtime events.
Verifier: evidence validation.
GateEngine: transition decision.
Non-authority rules:
This keeps the eight-phase lifecycle explicit without turning each phase into a heavyweight subsystem.
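One way to picture a thin contract is sketched below. This is a hypothetical illustration and not the issue's conceptual contract: the `PhaseContext` and `PhaseResult` fields, the synchronous `execute`, and the phase name `"intake"` are all assumptions. The point it shows is that an engine returns outputs and evidence references but has no way to declare a transition.

```typescript
// Hypothetical contract; names and fields are illustrative only.
interface PhaseContext {
  runId: string;
  phase: string;
}

interface PhaseResult {
  outputs: Record<string, unknown>;
  evidenceRefs: string[];
  // Deliberately no "nextPhase" or "transition" field: engines hold no transition authority.
}

interface PhaseEngine {
  readonly phase: string;
  execute(ctx: PhaseContext): PhaseResult;
}

// Maps phase -> PhaseEngine, as described for PhaseRegistry.
class PhaseRegistry {
  private engines = new Map<string, PhaseEngine>();

  register(engine: PhaseEngine): void {
    this.engines.set(engine.phase, engine);
  }

  resolve(phase: string): PhaseEngine {
    const engine = this.engines.get(phase);
    if (!engine) throw new Error(`no engine registered for phase "${phase}"`);
    return engine;
  }
}

// A trivial engine: it produces outputs and names its evidence, nothing more.
const intakeEngine: PhaseEngine = {
  phase: "intake",
  execute: (ctx) => ({
    outputs: { draftObjective: `draft for ${ctx.runId}` },
    evidenceRefs: ["intake-notes"],
  }),
};
```

Every engine exposing the same small surface is what lets the orchestrator, Verifier, and GateEngine stay shared across all eight phases.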
Shared runtime core
MVP shared runtime components:
RunOrchestrator: phase order, event emission, PhaseEngine invocation, Verifier/GateEngine coordination, and phase advancement.
RunStateStore: operational source of truth for RunState snapshots, state patches, version checks, and optimistic concurrency.
EventStore: append-only audit/replay support for runtime events.
GateEngine: evaluates HardInvariantGate, PhasePresetGate, and RunObjectiveGate; returns TransitionDecision.
Verifier: validates EvidenceSpec, freshness, artifact refs, and human approval evidence.
PhaseRegistry: maps phase -> PhaseEngine and loads the preset catalog.
SideEffectRunner: executes allowlisted external actions after durable transition intent is committed.
RecoveryManager is not an MVP engine. In MVP, GateEngine returns allowed recovery branches in TransitionDecision; post-MVP RecoveryManager may execute those branches.
State, events, and side effects
MVP state model:
The runtime is not fully event-sourced in MVP. Event-sourced replay can become stronger later after snapshots, events, and recovery behavior are proven.
Transition commit is exposed as one runtime operation:
commitTransition owns:
Side effect order:
Rule:
SideEffectRunner
PhaseEngine must not perform external actions directly. It returns SideEffectRequest values; Orchestrator commits transition intent first, then SideEffectRunner acts.
Allowlisted MVP side effect kinds:
Common request shape:
Rules:
requiresApproval.
Acceptance criteria
commitTransition(patch, event) semantics and side-effect ordering.