How to run a software project when some of your contributors are AI agents - and one of them just panic-refactored your auth middleware at 2am while a different one was halfway through the same task.
Thirteen short docs. Markdown + git. No SaaS, no signup, no vendor lock-in. An AI agent picks up the operating contract in seconds via templates/. A human grasps the core concept in 5 minutes via the CHEATSHEET. Full reading is a focused day. Use forever (or until you find something better).
By Miklós Polgár (polgarmiklos@gmail.com) - CC BY 4.0. Fork it, ship it, charge for it, teach it. Just keep the credit.
- Four planning layers - strategy → pillars → epics → items. Each answers a different question.
- Three discipline overlays - working principles, Definition of Done, lessons-learned memory. They bind every change.
- File-based locks with TTL so two AI agents (or two humans) don't both grab the same item.
- Fix-test loop for the actual UI because "tests pass" doesn't mean "the page renders."
- Cross-AI validation + user testing as the final gates.
- Autonomous goal-oriented development cycles - the paste-and-adapt `AUTONOMOUS_LOOP.md` prompt drives multi-hour unattended runs toward named milestones, with tiered autonomy on authoritative artifacts (cosmetic changes are auto-patched with cross-AI diff-verify; substantive changes are maintainer-authored).
- Milestone-driven deep-eval every Nth loop iteration - 0–10 rubric per area; unsolvable issues get handled/postponed/marked, never forced.
- Plan before non-trivial work. Use your tool's plan mode.
- Battle-tested in one production project + self-applied (see self-development/). Currently v1.17.3.
- Quick reference: CHEATSHEET.md. Worked example: examples/.
Four planning layers cascade downward; three disciplines bind every change at every layer; three operational supports make the daily work navigable.
flowchart TB
classDef planning fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
classDef discipline fill:#fef3c7,stroke:#b45309,color:#78350f
classDef support fill:#dcfce7,stroke:#15803d,color:#14532d
subgraph PLANNING [" PLANNING - cascades downward "]
direction TB
S["<b>Strategy</b><br/><i>why · phases · outcomes</i>"]
P["<b>Pillars</b><br/><i>long-term capability goals · evergreen</i>"]
E["<b>Epics</b><br/><i>3–12 week containers</i>"]
I["<b>Items</b><br/><i>1–2 weeks for humans · daily for AI</i>"]
S --> P --> E --> I
end
subgraph DISCIPLINE [" DISCIPLINES - bind every change "]
direction LR
W["<b>Working Principles</b><br/><i>think · simple · surgical · goal-driven</i>"]
D["<b>Definition of Done</b><br/><i>6 gates · Status:done requires Test:pass</i>"]
M["<b>Memory</b><br/><i>instruction file + memory directory</i>"]
end
subgraph SUPPORT [" OPERATIONAL SUPPORTS "]
direction LR
L["<b>Locks</b><br/><i>TTL · humans + AI<br/>same protocol</i>"]
G["<b>Git Workflow</b><br/><i>branch protection<br/>no force-push</i>"]
T["<b>Fix-Test Loop</b><br/><i>actual UI · cross-AI<br/>user testing</i>"]
end
PLANNING -.->|bound by| DISCIPLINE
DISCIPLINE -.->|enforced via| SUPPORT
PLANNING -.->|moves via| SUPPORT
class S,P,E,I planning
class W,D,M discipline
class L,G,T support
For the file layout and how the cascade physically lives on disk, see How the work cascades below.
The concrete payoffs after a week or two of adoption:
- Work stops drifting. Items trace from today's commit back to a phase in the strategy. "Why are we doing this?" is answerable without re-arguing it every quarter.
- The backlog tells the truth. `Status: done` requires `Test: pass` - no partial credit. Trust in the backlog comes back.
- Parallel agents stop colliding. File-based locks with TTL handle the coordination that chat-based "I've got this" can't.
- The same mistake doesn't come back. Memory entries make recurring fixes a one-time cost.
- AI agents stay on task. Four working principles forbid the speculative-abstraction / scope-creep / "let me also clean this up" pattern that LLMs default to.
- Verification catches what tests miss. White pages, broken dark mode, bypassed auth gates, missing imports - caught at the actual-UI gate, not in production.
- AI-as-yes-man becomes visible. A copy-paste counter-prompt ("what's wrong with this plan?") makes you challenge before approving.
- The cheating agent gets caught. When one model writes both the implementation and the tests that validate it, the green suite hides bugs. A different model auditing catches them.
- Portable across tools. Switching Claude Code → Cursor → Codex doesn't mean re-learning your process - only swapping the project-instruction filename.
- No vendor lock-in. Markdown and git. If your AI tool is gone in 18 months, your methodology isn't.
Most projects accumulate the same failure modes once they last more than a few weeks. AI-assisted projects accumulate them twice as fast - the effective contributor count doubles and the new contributors don't sleep.
| The problem | How this set closes it |
|---|---|
| Direction drifts; every quarter re-litigates "what are we building?" | Strategy docs - versioned phases with exit criteria. |
| "Done" means whatever the contributor decides. | Definition of Done - six binary gates; hard rule Status: done requires Test: pass. |
| "Tests pass" but the page is white, dark mode broken, auth bypassed. | Actual-UI fix-test loop with required dimensions. |
| Lessons evaporate; same mistake every six months. | Two-layer memory - instruction file + memory directory. |
| Parallel contributors collide; two agents grab the same item silently. | File-based locks with TTL - humans and agents, same protocol. |
| AI agents wander off-task - speculation, scope creep, "while I'm here" refactors. | Working principles - distilled from real LLM failure modes. |
| AI agrees with you and you're both wrong. | Challenge before consenting. |
| AI writes broken code AND broken tests that validate it. | Cheating agent anti-pattern + cross-AI validation. |
| Humans become strangers in their own codebase. | Human roles - supervisory layer, four anti-patterns. |
| The trunk breaks; force-push, destructive command, day gone. | Git workflow rules - branch protection, AI never deploys, never destructive. |
| Work doesn't compound across sessions, contributors, or tools - each new session re-derives the context. | Plans, items, and memory all persist in files. Drop items today; another agent picks them up next week. The backlog is the queue. |
| Picking the next item turns into "whichever feels interesting"; cheap high-value work gets skipped. | ROI-based prioritization - Priority: + Effort: fields make "highest-impact-per-effort" the default picking rule. Deviation is explicit, not silent. |
| Human-blocked work freezes agents indefinitely; AI sits on a lock for credentials it'll never get. | HUMAN_NEEDED.md - dedicated file tracks blocked items so agents release the lock and move on; humans see pending delegations in one place. |
| Long autonomous runs (overnight, weekend, milestone push) drift without a structure to ratchet against. | AUTONOMOUS_LOOP.md - loop prompt that picks the highest-impact ready item, executes through the DoD, archives, repeats. Stops at milestone, when no ready items remain, or on user check-in. |
| Done items pile up and become unsearchable; deferred ideas get lost. | ARCHIVE.md keeps every done item grep-able forever. FUTURE.md keeps deferred ideas alive without cluttering active work. Both are standard files in every epic folder. |
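The ROI-based picking rule in the table above can be sketched in a few lines. This is an illustration under stated assumptions, not the methodology's own tooling: the source names the `Priority:` (P0–P3) and `Effort:` (XS–XL) fields but not how they are weighted, so the numeric weights, the intermediate effort sizes (S, M, L), and the `Item` class below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed weights: the methodology names the fields, not the numbers.
PRIORITY_VALUE = {"P0": 8, "P1": 4, "P2": 2, "P3": 1}
EFFORT_COST = {"XS": 1, "S": 2, "M": 3, "L": 5, "XL": 8}


@dataclass
class Item:
    """Hypothetical in-memory view of one BL-XXXX backlog item."""
    id: str
    priority: str
    effort: str
    status: str


def roi(item: Item) -> float:
    """Impact per unit of effort, using the assumed weights above."""
    return PRIORITY_VALUE[item.priority] / EFFORT_COST[item.effort]


def pick_next(items: List[Item]) -> Optional[Item]:
    """Return the ready item with the highest ROI, or None if nothing is ready."""
    ready = [item for item in items if item.status == "ready"]
    if not ready:
        return None
    return max(ready, key=roi)


backlog = [
    Item("BL-0001", "P1", "XL", "ready"),        # 4 / 8 = 0.5
    Item("BL-0002", "P2", "XS", "ready"),        # 2 / 1 = 2.0
    Item("BL-0003", "P0", "S", "in-progress"),   # skipped: not ready
]
print(pick_next(backlog).id)
```

With these weights the cheap P2 item outranks the expensive P1 item, which is the "cheap high-value work gets skipped" case the rule is meant to prevent. Picking anything else would be the explicit deviation the table mentions.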
ai-development-methodology/
├── README.md                  # this file
├── CHANGELOG.md               # version history (self-applies the methodology)
├── CHEATSHEET.md              # one-page quick reference (NEW v1.17.3)
├── LICENSE                    # CC BY 4.0
├── STATUS.md                  # maintenance posture
├── methodology/               # the 13 methodology docs (00–12; doc 12 NEW v1.17.3)
├── templates/
│   ├── CLAUDE.md              # project-instruction file (Claude Code)
│   ├── AGENTS.md              # vendor-neutral version (extra plan/tool/safety sections)
│   ├── AGENT_KICKOFF.md       # planning-mode prompt for new projects
│   ├── AUTONOMOUS_LOOP.md     # prompt for long autonomous dev sessions (extended v1.17.3 with periodic deep-eval)
│   └── PROJECT_STRUCTURE.md   # recommended folder layout + naming conventions
├── examples/                  # NEW v1.17.3 - fictional `tinker` project showing methodology applied end-to-end
│   ├── README.md              # 3-row comparison: methodology/ vs self-development/ vs examples/
│   └── example-project/
│       ├── README.md
│       ├── strategy/00_master_plan.md
│       ├── pillars/P1_capture.md, P2_retrieval.md
│       └── backlog/TEST_BACKLOG.md + EPICS.md + epics/E01-cli-foundations/ (charter + BACKLOG + ARCHIVE + FUTURE + TEST)
└── self-development/          # the methodology applied to its own development
    ├── AUTONOMOUS_LOOP.md     # Step 4 - adapted loop config (operational cycle)
    ├── brief/                 # Step 0 outputs - vision, audience, competitive landscape, etc.
    ├── strategy/              # Step 1 - master plan (vision + 4 phases + pillar roadmap)
    ├── pillars/               # Step 1 - 9 capability-layer pillars (P1..P9)
    ├── backlog/               # Step 2 - 5 epic charters; Step 3 - items inside active epics
    ├── evaluations/           # semi-annual self-eval reports (first pass 2026-05-25)
    └── loop-notes/            # loop-detected methodology insights for maintainer review
~13,000+ lines across 60+ files at v1.17.3. Longest doc ~1,000 lines. Each doc is self-contained - read in any order.
From "your brief" (the upstream work the methodology doesn't do) all the way down to a single line in a BACKLOG.md file - and where each artifact lives on disk.
flowchart TB
BRIEF["<b>Your brief</b> - Step 0, BEFORE the methodology kicks in<br/><i>what · who · success metrics · competitors · business viability · tech stack · 5–10 capability layers</i>"]
BRIEF ==>|"answers become strategy docs"| STRAT
STRAT["<b>docs/strategy/</b> (the WHY)<br/>00_master_plan.md - vision · phases · outcomes<br/>+ supporting docs: 01_market · 02_differentiation · ... 10_roadmap<br/><i>versioned snapshots; never overwritten</i>"]
STRAT ==>|"strategy defines which capabilities matter"| PIL
PIL["<b>docs/pillars/</b> (the LONG-TERM GOALS)<br/>P1_<area>.md · P2_<area>.md · ... PN_<area>.md<br/><i>5–10 evergreen capability goals, sequentially dependent</i>"]
PIL ==>|"each pillar advanced by epics"| EPICS
PIL -.->|"design exploration first"| PLAN
PLAN["<b>docs/planning/</b> <i>(optional)</i><br/>pre-epic design work · becomes the charter when ready"]
PLAN -.-> EPICS
EPICS["<b>backlog/epics/E<NN>-<slug>/</b> (3–12 week delivery containers)<br/>├── <b>README.md</b> - charter: primary pillar, binary exit criteria, out-of-scope<br/>├── <b>BACKLOG.md</b> - active items<br/>├── <b>ARCHIVE.md</b> - done items<br/>├── <b>FUTURE.md</b> - deferred / out-of-scope-but-noted<br/>└── <b>TEST.md</b> - acceptance + regression scenarios<br/><br/>At the backlog root:<br/>· <b>EPICS.md</b> - cross-epic rollup<br/>· <b>TEST_BACKLOG.md</b> - cross-epic manual-QA queue (optional)<br/>· <b>HUMAN_NEEDED.md</b> - items blocked on human agency"]
EPICS ==>|"items live inside each epic's BACKLOG.md"| ITEMS
ITEMS["<b>Items - BL-XXXX format</b> (sized to the contributor: 1–2 weeks for humans · daily for AI)<br/><br/>Summary table at top - one line per item: <code>ID | Title | Priority | Effort | Status</code><br/><br/>Each item's detailed block has frontmatter fields:<br/>· <b>Pillar:</b> P3 · <b>Priority:</b> P0–P3 · <b>Effort:</b> XS–XL<br/>· <b>Status:</b> backlog → ready → in-progress → under-review → to-be-tested → done<br/>· <b>Test:</b> not-tested → pass · <b>Lock:</b> <holder>@<TTL-expiry><br/>+ body: goal · plan · verification step per substep<br/><br/><i>No separate ticket types - features, bug fixes, tasks, and user stories all use the same BL-XXXX shape.</i>"]
Step 0 is foundational. The brief (product, target user, market, viability, tech stack, capability layers) is your work, not the methodology's. The methodology records and operationalizes those decisions; it does not invent them. Skipping this produces a velocity illusion: confidently shipping the wrong product. See the "Step 0" callout in How to use it for the long version.
One ticket type, used flexibly. Items can be feature-shaped, bugfix-shaped, task-shaped, or user-story-shaped (Given/When/Then), but they all use the same BL-XXXX frontmatter and live in the same BACKLOG.md. No separate Jira-style ticket-type taxonomy.
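The `Lock:` field shown in the diagram (`<holder>@<TTL-expiry>`) lends itself to a small expiry check. The sketch below is a reading of that field shape under assumptions, not the protocol from doc 05: it assumes the expiry is an ISO 8601 timestamp with an explicit UTC offset, and the holder names are made up. It covers only the "is this lock still live?" question; how a claim is made atomic between contributors is left to the lock doc.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_lock(value: str) -> Optional[Tuple[str, datetime]]:
    """Parse '<holder>@<expiry>'; an empty field means the item is unlocked."""
    value = value.strip()
    if not value:
        return None
    # Split on the last '@' so holders such as e-mail addresses still parse.
    holder, _, expiry = value.rpartition("@")
    if not holder:
        raise ValueError(f"malformed lock field: {value!r}")
    # Assumption: the expiry is ISO 8601 with an offset, e.g. 2026-01-10T12:00:00+00:00.
    return holder, datetime.fromisoformat(expiry)


def can_claim(lock_value: str, claimant: str, now: datetime) -> bool:
    """True if the item is unlocked, already held by the claimant, or the TTL has expired."""
    lock = parse_lock(lock_value)
    if lock is None:
        return True
    holder, expires_at = lock
    return holder == claimant or now >= expires_at


now = datetime(2026, 1, 10, 11, 0, tzinfo=timezone.utc)
print(can_claim("agent-a@2026-01-10T12:00:00+00:00", "agent-b", now))  # lock still live
print(can_claim("agent-a@2026-01-10T10:00:00+00:00", "agent-b", now))  # TTL expired
```

The TTL is what keeps a crashed or abandoned session from holding an item forever: once the expiry passes, any contributor, human or agent, may take over.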
- Solo developers using AI coding agents who want process that survives past week 3.
- Small teams mixing humans and AI agents, tired of "who's working on what?" being a question.
- Indie hackers and startup founders who need real process without enterprise overhead.
- Engineering leaders fitting AI agents into existing workflows.
Not for: large enterprises with existing process frameworks. This won't replace SAFe.
Hand the methodology to your AI agent in planning mode before you write any code. By the time you start implementing, the structure is in place.
Step 0 - Have a brief. This methodology executes on goals; it doesn't define them. Before anything else, write defensible answers to: what / who / problem / success metric / competition / business viability / tech stack / 5–10 capability layers (those become your pillars). Use Lean Canvas, JTBD, Five Forces - whatever fits. The discipline of having written, defensible answers is the point, not the format. Skipping this produces a velocity illusion: confidently shipping the wrong product.
Step 1 - Set up the repo.
mkdir my-new-project && cd my-new-project
git init -b main
git clone --depth 1 https://github.com/Korner83/ai-development-methodology.git _src
mkdir -p docs && cp -r _src/methodology docs/methodology
cp _src/templates/CLAUDE.md ./CLAUDE.md # or AGENTS.md
rm -rf _src
git add docs/methodology CLAUDE.md && git commit -m "docs: import ai-development-methodology"
Step 2 - Have your AI agent produce the planning skeleton. Point it at docs/methodology/ (start with 00_README.md), share your brief from Step 0, and ask it to produce: strategy master plan → 5–8 pillars → first epic charter → 3–5 backlog items. Use plan mode; review each artifact before the next. Full copy-paste prompt at templates/AGENT_KICKOFF.md.
Step 3 - Day-to-day. The project-instruction file (CLAUDE.md/AGENTS.md) loads automatically on every AI session - no pasting. Your job is to steer when the AI drifts. Four phrases worth memorizing:
- "Do you have any questions before you start?" - surfaces silent assumptions.
- "What's wrong with this plan? What's the strongest case against it?" - counters AI agreement bias.
- "Use plan mode and show me the plan before executing." - when the AI is about to wing it.
- "Stop. Split this item - you're growing scope." - mid-task creep.
For long-running autonomous milestone work, use templates/AUTONOMOUS_LOOP.md.
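The control flow that the loop prompt describes (pick the highest-impact ready item, execute through the DoD, archive, repeat, with a deep-eval every Nth iteration) can be pictured roughly as below. `AUTONOMOUS_LOOP.md` is a prompt, not a program, so this is only a mental model: every callable is a hypothetical stand-in for work the agent does, and the default `deep_eval_every=5` is an arbitrary placeholder for "N".

```python
from typing import Callable, List, Optional


def run_loop(
    pick_ready_item: Callable[[], Optional[str]],
    execute_through_dod: Callable[[str], None],
    archive: Callable[[str], None],
    milestone_reached: Callable[[], bool],
    user_checked_in: Callable[[], bool],
    deep_eval: Callable[[], None],
    deep_eval_every: int = 5,  # assumed value; the source only says "every Nth"
) -> str:
    """Run until one of the three stop conditions holds; return the reason."""
    iteration = 0
    while True:
        if milestone_reached():
            return "milestone"
        if user_checked_in():
            return "user check-in"
        item = pick_ready_item()
        if item is None:
            return "no ready items"
        execute_through_dod(item)
        archive(item)
        iteration += 1
        if iteration % deep_eval_every == 0:
            deep_eval()


# Toy run: three ready items, no milestone, deep-eval every second iteration.
queue = ["BL-0001", "BL-0002", "BL-0003"]
done: List[str] = []
evals: List[int] = []
reason = run_loop(
    pick_ready_item=lambda: queue.pop(0) if queue else None,
    execute_through_dod=lambda item: None,
    archive=done.append,
    milestone_reached=lambda: False,
    user_checked_in=lambda: False,
    deep_eval=lambda: evals.append(len(done)),
    deep_eval_every=2,
)
print(reason, done, evals)
```

Checking the stop conditions before picking the next item means the loop never starts new work after the milestone is reached or the user has checked in.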
Each doc stands alone:
- Backlog chaotic? → 03 Epics + 04 Items.
- Items shipping half-broken? → 07 DoD + 10 Testing.
- AI agents colliding? → 05 Locks.
- Same mistakes repeating? → 08 Memory.
- Code over-engineered? → 06 Working principles.
- Strategy drifting? → 01 Strategy.
- Deploys unsafe? → 09 Git workflow.
Full adoption order in methodology/00_README.md.
The methodology is tool-agnostic. Only the project-instruction filename differs:
| Tool | Filename | Template |
|---|---|---|
| Claude Code (Anthropic) | CLAUDE.md | templates/CLAUDE.md |
| OpenAI Codex CLI | AGENTS.md | templates/AGENTS.md |
| Google Antigravity | AGENTS.md | templates/AGENTS.md |
| Cursor | .cursor/rules/ or .cursorrules | adapt AGENTS.md |
| Aider | CONVENTIONS.md | adapt AGENTS.md |
| Continue.dev | .continue/context.md | adapt AGENTS.md |
| Anything else | whatever .md it reads | either |
AGENTS.md is the superset - includes plan-mode discipline, tool-install guidance, and an operational-safety rule on destructive commands that Claude Code's harness covers implicitly. Symlink CLAUDE.md → AGENTS.md if you use both.
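If you use both files, the symlink can be created once and committed. The helper below is a minimal sketch, assuming a POSIX-style filesystem where symlinks are available without extra privileges; the function name is hypothetical, and the demo runs in a temporary directory rather than a real repo.

```python
import tempfile
from pathlib import Path


def link_instruction_file(repo: Path) -> Path:
    """Create CLAUDE.md as a relative symlink to AGENTS.md inside `repo`."""
    source = repo / "AGENTS.md"
    link = repo / "CLAUDE.md"
    if not source.exists():
        raise FileNotFoundError(source)
    if link.is_symlink() or link.exists():
        raise FileExistsError(link)
    # A relative target keeps the link valid after the repo is cloned or moved.
    link.symlink_to(source.name)
    return link


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "AGENTS.md").write_text("# Project instructions\n")
    link = link_instruction_file(repo)
    print(link.is_symlink(), link.read_text() == (repo / "AGENTS.md").read_text())
```

Both tools then read the same content, so the instructions cannot drift apart between the two filenames.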
- Projects where humans and AI agents collaborate as peers. The file-based lock + tier matrix + per-item DoD all assume contributors will arrive at AI velocity; the practices are designed for that.
- Adopters who want markdown + git as the substrate. No SaaS, no signup, no vendor lock-in, no monthly cost. Everything lives in the repo where the code lives.
- Solo maintainers + small teams. Scales down to one human + one AI agent without ceremony; scales up to a small team + multiple agents via the lock + WIP cap.
- Long-running projects where direction matters. The four-layer planning cascade (strategy → pillars → epics → items) prevents the silent drift that "let's just keep shipping features" produces over months.
- AI-assisted projects that want a defense against the "cheating agent" anti-pattern. Most testing approaches assume tests-pass = done; this one explicitly addresses the case where the same agent writes both broken code AND the broken tests that validate it.
- Projects shipping toward declared milestones. The periodic deep-eval (doc 12) catches the aggregate problems that per-item DoD can't see - compounded UX debt, cross-cutting perf regressions, security drift.
- Teams willing to write things down. Strategy docs, pillar files, epic charters, items, memory entries - everything is markdown text. The methodology rewards teams whose culture is "if it isn't written, it doesn't exist."
- Teams that want a hosted PM tool with permissions, dashboards, and a web UI. This isn't that. If you want one, use one.
- Projects where ceremony is the value. This methodology removes ceremony where it can. If your team's process culture depends on ritualized standups + sprint demos + retros, this doesn't replace those.
- Regulated-industry projects without further adaptation. The default scoring rubric in doc 12 has no Compliance area; the lock protocol has no audit trail beyond git. You can add these, but they're not built in.
- Single-shot scripts or throwaway prototypes. The overhead doesn't pay off until a project has > 1 month of work ahead of it.
- Replacing institutional knowledge that already works. If your team has implicit conventions that produce good outcomes, don't replace them with this methodology's explicit ones just because the explicit version is shinier. Layer over what works; don't bulldoze.
- Verbal-only teams. If your culture is "we discussed it in chat last week," this methodology won't fit until that culture shifts. Adoption is hard because the methodology assumes writing is the default.
The methodology commits to a few patterns that are worth naming explicitly:
- The "cheating agent" anti-pattern is named + defended. Tests pass ≠ done. Cross-AI validation + the actual-UI fix-test loop + cross-AI diff-verification together make it hard for the implementing session to silently ship a self-validated bug.
- File-based locks with TTL - same protocol for humans and AI agents. No tier system where humans coordinate one way and agents another. Same `Lock:` field, same TTL, same release discipline. (Doc 05.)
- Challenge-before-consenting as a named pattern with a copy-paste prompt - defends against AI's agreement bias when the maintainer is approving a non-trivial plan. (Doc 06.)
- Four-layer planning hierarchy (strategy → pillars → epics → items) keeps work laddered to long-term direction rather than shipped as disconnected features. (Docs 01–04.)
- DoD coupled to the item frontmatter itself. `Status: done` requires `Test: pass` (or narrow exceptions with body-documented reasons). "Done" is mechanically auditable, not maintainer-judgment-dependent. (Docs 04 + 07.)
- Tier matrix for autonomous loops on authoritative content. Cosmetic + surgical patches are loop-eligible with cross-AI diff-verification; substantive changes stay human-authored. Compounding without sacrificing safety. (`AUTONOMOUS_LOOP.md`.)
- Periodic deep-eval every Nth loop. Catches aggregate quality drift (UX debt, perf regression, security drift) on a 0–10 rubric per area, with `handle / postpone / mark` discipline for unsolvable issues. (Doc 12.)
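"Mechanically auditable" can be made concrete with a short check over a backlog file. The sketch below rests on assumptions about the on-disk format that the source does not spell out: each item starts with a `## BL-XXXX` heading and carries plain `Status:` and `Test:` lines. It also does not model the narrow body-documented exceptions, so anything it flags still needs a human look.

```python
import re
from typing import Dict, List

# Assumed layout: "## BL-0001 Title" headings followed by "Field: value" lines.
ITEM_HEADING = re.compile(r"^## (BL-\d+)", re.MULTILINE)
FIELD = re.compile(r"^(Status|Test):\s*(\S+)", re.MULTILINE)


def audit_done_items(backlog_text: str) -> List[str]:
    """Return IDs of items marked Status: done without Test: pass."""
    violations: List[str] = []
    # split() with one capture group yields [preamble, id1, body1, id2, body2, ...]
    parts = ITEM_HEADING.split(backlog_text)
    for item_id, body in zip(parts[1::2], parts[2::2]):
        fields: Dict[str, str] = dict(FIELD.findall(body))
        if fields.get("Status") == "done" and fields.get("Test") != "pass":
            violations.append(item_id)
    return violations


sample = """\
## BL-0001 Add login form
Status: done
Test: pass

## BL-0002 Dark mode toggle
Status: done
Test: not-tested

## BL-0003 Export to CSV
Status: in-progress
Test: not-tested
"""
print(audit_done_items(sample))
```

A check like this could run in CI or as a pre-commit step, so a `done` item without a passing test is caught before it reaches the trunk.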
Markdown and git. CC BY 4.0 - use it anywhere (private, commercial, open-source), fork it, modify it, redistribute it, charge for derivatives, ship it inside a paid product. Only obligation is attribution.
Not endorsed by, partnered with, or affiliated with any AI tool vendor (Anthropic, OpenAI, Google, Cursor, Aider, Continue.dev). The project-instruction file each tool reads (CLAUDE.md, AGENTS.md, .cursorrules, .continue/context.md) is the vendor-supported mechanism for project context - using it is the intended path, not a workaround. The methodology's safety rules (no agent prod-deploys, no force-push, no hook bypass) align with vendor AUPs rather than fighting them.
Not legal advice - if you're under regulated-industry, data-residency, or classified-work constraints, confirm fit with your legal team.
If you use or adapt this, please include credit:
AI Development Methodology by Miklós Polgár, licensed CC BY 4.0. https://github.com/Korner83/ai-development-methodology
For modified versions, indicate that you've made changes. Attribution is the only obligation the license imposes - use it commercially, in client work, in books, in courses, anywhere, as long as the credit travels with it.
Battle-tested in one production project. Currently v1.17.3 - see CHANGELOG.md and STATUS.md. Maintenance is lean - PRs welcome, no SLA. CC BY 4.0 means fork freely if you want a more actively-maintained version.
Direct contact: polgarmiklos@gmail.com.
CC BY 4.0 - Creative Commons Attribution 4.0 International. Copyright © 2026 Miklós Polgár. Share, adapt, commercial use OK; just credit.