Reference implementations of the Agentic Engineering Framework (AEF) — a framework for building the "agentic layer" that turns ad-hoc AI coding sessions into a repeatable engineering process.
📖 Read the guide · 🗂️ Pick a sample · 🛠️ Customize the layer
AEF is a framework, not a tool. It defines the patterns: prompt templates, quality gates, feedback loops, workflow config. This repo contains working samples that show one way to build those patterns on top of specific agent runtimes. Pick a sample that matches where your team is today, then fork and adapt it.
The AEF model in one paragraph. An AEF-shaped workflow has three layers. The agentic layer is what your team writes and versions: prompt templates, quality gates, tool permissions, workflow config. The workflow engine reads that layer and executes it: phase orchestration, state management, self-healing loops, escalation. The codebase is where agents do their work: reading, writing, testing, shipping code. The quality comes from the templates and gates you write, not from reviewing every output.
What a run looks like. A typical run cycles through phases: Plan → Build → Test → Review → Document → Deploy. When Test fails, a self-healing loop patches and retries before escalating. When Review finds blockers, a patch loop re-implements and re-tests. Each phase produces structured artifacts that the next phase consumes. State persists across the run so nothing starts from scratch.
The full AEF framework defines two additional phases: Intent (upstream of Plan, normalizes the request) and Monitor (downstream of Deploy, closes the feedback loop). None of the samples in this repo materialize Intent as a dedicated phase yet, and Monitor is only partially covered (e.g. the sprint-runner/claude/ live status server). Treat the 6-phase cycle above as the shape the samples implement, not the full framework surface.
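The six-phase cycle with its self-healing loop can be sketched as a short Python loop. All names here (`run_phase`, `MAX_HEAL_RETRIES`) are invented for illustration; none of the samples expose this exact API.

```python
# Illustrative sketch of the phase cycle described above, not code from any sample.
PHASES = ["plan", "build", "test", "review", "document", "deploy"]
MAX_HEAL_RETRIES = 2

def run_phase(phase, artifacts):
    # Stand-in for invoking an agent runtime; returns (gate_passed, artifact).
    return True, f"{phase}-artifact"

def run_pipeline():
    artifacts = {}  # state persists across the run; each phase consumes what came before
    for phase in PHASES:
        passed, artifact = run_phase(phase, artifacts)
        retries = 0
        while not passed and retries < MAX_HEAL_RETRIES:
            # Self-healing: feed the failure back and retry before escalating
            passed, artifact = run_phase(phase, artifacts)
            retries += 1
        if not passed:
            raise RuntimeError(f"escalate: {phase} gate failed after healing")
        artifacts[phase] = artifact  # structured artifact for the next phase
    return artifacts
```

The point of the sketch is the shape: every phase is gated, every gate failure gets a bounded retry budget, and only exhausted budgets escalate to a human.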
| Sample | Autonomy | Config style | Runner | Best for |
|---|---|---|---|---|
| `human-in-loop` | L1 Assisted | Markdown checklists | Any agent | Learning the AEF patterns by hand |
| `automated-kiro` | L1–L4 capable | Declarative YAML + engine | Kiro CLI | Running AEF autonomously, configurable per team |
| `sprint-runner` | L3 Autonomous | Imperative Python + per-sprint state | Kiro CLI or Claude Code CLI | Installing AEF into existing repos; sprint-batch delivery with crash-resumable runs |
AEF defines four autonomy levels. L1 Assisted: human runs every phase, inspects every output. L2 Supervised: engine runs phases, human approves at checkpoints. L3 Autonomous: engine runs end-to-end, human reviews the resulting PR. L4 ASE: engine merges the PR. You move up the ladder as your gates and templates prove themselves, not by flipping a switch.
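One way to picture the ladder is as a table of who holds each responsibility at each level. The mapping below is an illustration of the paragraph above, not code from any sample:

```python
# Illustrative-only mapping of the four AEF autonomy levels to who holds
# each responsibility; the level names come from the framework, the rest is a sketch.
AUTONOMY_LEVELS = {
    "L1": {"runs_phases": "human",  "approves": "human",  "merges": "human"},
    "L2": {"runs_phases": "engine", "approves": "human",  "merges": "human"},
    "L3": {"runs_phases": "engine", "approves": "engine", "merges": "human"},
    "L4": {"runs_phases": "engine", "approves": "engine", "merges": "engine"},
}

def human_touchpoints(level):
    """Count how many responsibilities still sit with a human at this level."""
    return sum(1 for who in AUTONOMY_LEVELS[level].values() if who == "human")
```

Moving up the ladder means removing one human touchpoint at a time, which is why it is earned through proven gates rather than toggled.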
| Pattern | Kiro | Claude Code |
|---|---|---|
| Human-in-loop | `human-in-loop/kiro/` | `human-in-loop/claude/` |
| Sprint runner | `sprint-runner/kiro/` | `sprint-runner/claude/` |
| Automated | `automated-kiro/` | — |
| Primitive agents | `agent-samples/` (tech-agnostic) | `agent-samples/` (tech-agnostic) |
```
Have you run an AEF-shaped workflow before?
├── no  → human-in-loop (learn the shape by hand)
└── yes
    └── Are you working in an existing repo and want sprint-batch delivery?
        ├── yes → sprint-runner (installer + crash-resumable orchestrator)
        └── no  → automated-kiro (declarative workflow.yaml engine, L1–L4 via config)
```
A sample Vite landing page paired with three markdown gate checklists. You invoke agents manually (`/agent plan`, `/agent build`, …), read their outputs, and evaluate the corresponding gate checklist before moving on. There is no orchestration code; you are the engine.
Quick start (Kiro):

```shell
cd human-in-loop/kiro
mkdir -p .kiro/agents
cp ../../agent-samples/agents/*.json .kiro/agents/
# Run each phase in order, evaluating the gate checklists between them
```

Agent configs for the Kiro variant live in `agent-samples/`: six generic configs (plan, build, test, review, document, deploy) with matching prompt templates. The Claude variant ships its own `prompts/` and `agents/` folders; copy them into `.claude/commands/` and `.claude/agents/` instead.
See human-in-loop/README.md for the pattern overview and per-tech READMEs for the phase-by-phase flow.
A sample FastAPI app paired with a complete agentic layer and workflow engine. The engine runs all phases in sequence, invokes Kiro CLI per phase, evaluates YAML-defined gates, runs self-healing loops on failure, and produces structured escalation reports when healing exhausts.
Quick start:

```shell
cd automated-kiro/agentic-layer
uv sync
uv run run.py --local --spec "Add a /users endpoint" --issue-type feature
```

Other triggers:

```shell
uv run run.py --api      # FastAPI server (default port 8002)
uv run run.py --webhook  # GitHub webhook receiver
uv run run.py --cron     # Poller for open GitHub issues
```

See automated-kiro/README.md for the full configuration reference.
An agent-driven SDLC pipeline shaped around installing into an existing git repository and batching multiple sprint briefs per run. The runner processes each sprint through 10 stages (branch → research → plan → build → test → e2e → review → document → commit → publish) with crash-resumable state and self-healing patch loops on every gate. No target app ships inside the sample; you install into an external git repo you already own.
Quick start (Kiro):

```shell
python sprint-runner/kiro/agentic-layer/tools/install.py <target-project>
cd <target-project>
python tools/sprint_runner.py --list-sprints
python tools/sprint_runner.py --sprint 03
```

Quick start (Claude Code):

```shell
cp -r sprint-runner/claude/.claude <target-project>/
cp -r sprint-runner/claude/tools <target-project>/
cd <target-project>
python tools/sprint_runner.py --list-sprints
python tools/sprint_runner.py --sprint 03
```

See sprint-runner/README.md for the pattern overview and per-tech READMEs for the full stage reference and installer/runner CLI details.
Every sample includes an agentic-layer/ directory. This is the artifact your team maintains.
- Prompts (`prompts/*.md`) — one template per phase. Defines role, context, constraints, output format. Versioned with your code. Diff-friendly. Reviewable in PRs.
- Gates (`gates/*.yaml` or `gates/*.md`) — pass/fail criteria per phase. Test coverage thresholds, review severity rules, deploy-readiness checks. A failed gate triggers healing loops before escalating.
- Tool permissions (`tools/*.yaml`) — which tools each phase can access. Planners get read-only; builders get full access. Enforces least-privilege at the phase boundary.
- Workflow config (`workflow.yaml`) — phase ordering, loop retry limits, escalation rules. One file reshapes the entire pipeline.
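To make the gate idea concrete, here is a minimal sketch of how an engine might evaluate gate criteria. The criteria shape (`metric`, `op`, `threshold`) is invented for illustration; each sample defines its own gate schema.

```python
# Hypothetical gate, shaped like a parsed gates/*.yaml file. The keys are
# illustrative assumptions, not any sample's real schema.
gate = {
    "phase": "test",
    "criteria": [
        {"metric": "coverage", "op": ">=", "threshold": 0.80},
        {"metric": "failed_tests", "op": "==", "threshold": 0},
    ],
}

def evaluate_gate(gate, results):
    """Return (passed, failing_metrics) for a phase's measured results."""
    ops = {">=": lambda a, b: a >= b, "==": lambda a, b: a == b}
    failures = [
        c["metric"]
        for c in gate["criteria"]
        if not ops[c["op"]](results[c["metric"]], c["threshold"])
    ]
    # An empty failure list means the gate passes; otherwise the engine
    # would run a healing loop before escalating.
    return len(failures) == 0, failures
```

Because the criteria live in data rather than code, tightening a gate is a one-line diff that every future run inherits.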
When output quality drops, fix the layer, not the generated code. Patching the output treats the symptom. Improving the template treats the cause.
All three samples produce structured artifacts per phase and chain them forward. All enforce phase-separated tool permissions. All support self-healing: manually in human-in-loop, automatically in automated-kiro and sprint-runner. They differ in where the engine lives and how the agentic layer is configured.
The rows below trace AEF concepts (engine, phase ordering, artifact chaining, healing loops, gates, escalation, disposition), not the framework's eight phases. Phase coverage is a separate question: no sample currently implements Intent, and Monitor is only partially covered by sprint-runner/claude/'s live status server. Treat the table as a concept-level map, not a phase-by-phase audit.
| AEF concept | human-in-loop | automated-kiro | sprint-runner |
|---|---|---|---|
| Workflow engine | Human reads README and runs phases | `engine/runner.py` (`PipelineRunner` class) | `tools/sprint_runner.py` (`SprintExecutor` class) |
| Phase ordering | Documented prose | `workflow.yaml` phases list | Hard-coded `STEP_ORDER` in `SprintExecutor.run()` |
| Artifact chaining | Human copy-paste between conversations | `${plan_artifact}` → `${build_artifact}` substitution in templates | File-path handoff via per-sprint `state.json` payloads |
| Test retry loop | Human pastes failures into builder agent | `run_healing_loop()` with configurable max retries | `test_patch_max` cycle cap with regenerated patch specs |
| Review patch loop | Human pastes findings into builder agent | `run_healing_loop()` with `strategy: patch` | `review_patch_max` cycle cap; generates `specs/patch/patch-sprint-<id>-review-<n>.md` |
| Gate evaluation | Human checks markdown checkboxes | `GateEvaluator` evaluates YAML criteria | Structured JSON status checks per step (test/e2e/review result files) |
| Escalation | Continuous — human is always present | `handle_escalation()` writes forensic markdown | `MERGE_CONFLICT.md` + `merge-outcome.json` in the run dir |
| Disposition | Human decides after each phase | Per-gate config, varies by autonomy level | Cycle caps in `pipeline.yaml` |
The two Kiro-based samples cover distinct operating models. Use this table to pick between them:
| Dimension | automated-kiro | sprint-runner |
|---|---|---|
| Orchestration style | Declarative `workflow.yaml` interpreted by an engine | Imperative Python (`STEP_ORDER` in `SprintExecutor.run()`) |
| Unit of work | A single GitHub issue per run | A sprint backlog (one or many sprints per run) |
| Target location | Bundled FastAPI demo app inside the sample | External git repo; no bundled app |
| Python deps | `uv` + pydantic + pyyaml + fastapi + uvicorn | Stdlib only — no `pip install` |
| Triggers | CLI + API server + GitHub webhook + cron poller | CLI only |
| GitHub integration | `gh` CLI for PR creation; webhook receives issues | Git subprocess only; no `gh`, no webhooks |
| State model | Manifest-based via `engine/manifest.py` | Atomic `state.json` + `RUNNING.lock` heartbeat; crash-resumable via `--resume` / `--resume-from` |
| Install model | Copy `agentic-layer/` into your project | `python install.py <target>` scaffolds `.kiro/` + `tools/` automatically |
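The crash-resumable state model in the right column can be sketched with stdlib-only Python, matching sprint-runner's no-dependency constraint. The `completed_steps` field and helper names below are assumptions for illustration, not the sample's actual schema:

```python
# Sketch of atomic, resumable run state; field names are illustrative.
import json
import os
import tempfile

STEP_ORDER = ["branch", "research", "plan", "build", "test",
              "e2e", "review", "document", "commit", "publish"]

def save_state(path, state):
    # Write-then-rename so a crash mid-write never leaves a corrupt state.json
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def resume_point(path):
    # On restart, skip every step already recorded as done
    if not os.path.exists(path):
        return STEP_ORDER[0]
    with open(path) as f:
        done = set(json.load(f).get("completed_steps", []))
    for step in STEP_ORDER:
        if step not in done:
            return step
    return None  # run already finished
```

The atomic rename is what makes `--resume` safe: the state file on disk is always either the previous complete snapshot or the new one, never a partial write.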
Common + per-sample prerequisites
Common:

- Node.js 18+ — for the `human-in-loop` sample app
- Kiro CLI on PATH, or `KIRO_CLI_PATH` env var — for `automated-kiro`

automated-kiro:

- Python 3.11+ and `uv` package manager
- `gh` CLI with a valid token, if you use the Deploy phase's PR creation

sprint-runner (Kiro variant):

- Python 3.11+ (stdlib only — no `pip install` needed)
- `kiro-cli` on `PATH`
- An existing git repository to install into
- Optional: `npx` for Playwright MCP used by the test/e2e/review browser role

sprint-runner (Claude Code variant):

- Python 3.11+ (stdlib only — no `pip install` needed)
- `claude` CLI on `PATH`
- An existing git repository to install into
Directories produced at runtime (add to .gitignore)
| Directory | Contents |
|---|---|
| `agent_runs_log/` | Execution logs, prompt audit trails, and manifests per run |
| `specs/` | Implementation plans generated by the Plan phase |
| `ai_docs/` | Documentation generated by the Document phase and KPI reports |
| `.developer/sprint-runs/` | Per-run `state.json`, step logs, patch specs, and merge outcomes (sprint-runner target projects) |
Each sample's agentic-layer/ is designed to be forked and modified.
- Prompts — edit the markdown templates to change how each phase behaves
- Agents — add or modify persona files to create specialized roles
- Gates — define new quality gates with custom criteria and healing strategies
- Workflow — edit `workflow.yaml` to add or remove phases, change gate bindings, or adjust retry limits
The most team-specific prompt surface is the `## Plan Format` section inside each planning prompt, because every team formats implementation plans differently. The samples ship an opinionated default with an invariants contract (what the rest of the pipeline depends on) and a safe-to-change list (what you can reshape). See the relevant sample's README for prompt-level customization guidance; sprint-runner/README.md covers this in depth.
The code is output. The agentic layer is the product. Improve a template and every future run benefits. Tighten a gate and every future build is held to the new standard. That's where the leverage is.
See CONTRIBUTING.md for contribution guidelines and the Code of Conduct.
This library is licensed under the MIT-0 License. See LICENSE.

