A structured guide for getting high-fidelity code output from LLMs. Designed to eliminate the most common failure modes: wrong architecture, wrong aesthetics, wrong scope.
Do not read this document end-to-end for every project.
- Start with Section 1: Project Routing — it takes 2 minutes and tells you exactly which sections apply.
- Always read Section 2: Universal Pre-Production — it applies to everything. This now includes AI tool configuration, documentation persistence, and version control cadence.
- Read your project-type section (Section 3) — only the one that matches.
- If your project has a web UI, read Section 4: UI/UX Design Process — depth varies by how design-critical your project is.
- If Project Routing flagged your project as Medium or Complex, read Section 5: Scope Estimation & Staging.
- If your project is Complex, read Section 6: PRD Checklist.
- Read Section 7: Reference Material Guide — applies to everything, but depth varies by project.
- Read Section 8: Prompting Session Tactics — this is your in-session playbook.
- Bookmark Section 9: Course Correction — read it when things go wrong mid-session.
- Bookmark Section 10: Anti-Patterns — skim before your first prompt, reference when debugging failures.
Answer these questions before anything else. They determine your path through this document.
| If your project is primarily... | Read Section 3 subsection... |
|---|---|
| A web app, dashboard, landing page, SaaS product | 3A: Web Apps & SaaS |
| A game (any engine, any platform) | 3B: Games |
| A visual/immersive experience (VR, AR, installations, generative art) | 3C: Visual & Immersive |
| A CLI tool, automation script, or system utility | 3D: CLI Tools & Automation |
| A data pipeline, API, or backend service | 3E: Data Pipelines & Backend |
| A Python application or script (ML, data science, desktop app, tooling) | 3F: Python Applications & Scripts |
| A portfolio, marketing site, blog, or design-focused static site | 3G: Website Design & Static Sites |
If your project spans multiple categories, read all relevant subsections and note where requirements conflict — those conflicts are where LLMs fail hardest.
Use this rough heuristic:
Simple (single prompt, < 300 lines of output expected):
- One file or a handful of files
- No persistent state or database
- No authentication
- No multi-step user flows
- Examples: a utility script, a single React component, a landing page, a single-page portfolio, a CLI tool
Medium (3-8 staged prompts, 300-2000 lines):
- Multiple files/modules with dependencies between them
- Some state management
- 2-5 distinct user flows or features
- One integration (API, database, auth)
- Examples: a CRUD app, a simple game, a multi-page marketing site with blog, a multi-page site with forms, a data pipeline with multiple stages
Complex (requires PRD, 10+ staged prompts or iterative sessions):
- Multiple interacting systems
- Complex state management (multiplayer, real-time, undo/redo)
- 5+ distinct user flows
- Multiple integrations
- Performance-sensitive
- Examples: a multiplayer game, a SaaS product, an agency site with CMS, animations, and multi-language, a VR experience with physics, a distributed backend
Write down your complexity level. You'll reference it throughout this document.
| Complexity | Required Sections |
|---|---|
| Simple | 2, 3 (your type), 4 (if web UI), 7, 8 |
| Medium | 2, 3 (your type), 4 (if web UI), 5, 7, 8, 9 |
| Complex | 2, 3 (your type), 4 (if web UI), 5, 6, 7, 8, 9 |
Every project, every time. No exceptions.
This section covers what you must define before writing a single prompt, regardless of project type or complexity.
Before you think about tech stack or features, write one sentence:
"When this is done, a user will be able to ___."
If you can't fill that blank in one sentence, your scope is too vague. The LLM will fill the ambiguity with generic assumptions, and you'll get generic output.
For yourself: Write this sentence. For the LLM: This sentence goes at the top of your first prompt.
Declare every technology explicitly. LLMs default to what's statistically most common in their training data — which may not be what you want.
Checklist:
- Language and version — "Python 3.12", not "Python". "TypeScript 5.4", not "TypeScript".
- Framework and version — "Next.js 16 App Router", not "React". "Godot 4.5 GDScript", not "Godot".
- Key libraries — List them. If you have preferences (e.g., "use Zustand not Redux", "use Tailwind not CSS modules"), state them.
- Runtime/environment — "Node 22", "Browser only", "Raspberry Pi 5 ARM64", "Quest 3S".
- Database — If any. Include whether it's local (SQLite) or remote (PostgreSQL on Supabase).
- Package manager — npm, pnpm, yarn, pip, uv, cargo. State it.
- Build system — Vite, Webpack, esbuild, pyproject.toml. State it if non-default.
These are the things the LLM must not violate. They go in every prompt (or in a system prompt/project context if your tool supports it).
- Platform constraints — Mobile-only? Desktop-only? Must work offline? Specific OS?
- Performance constraints — "Must render at 72fps on Quest 3S", "Must handle 10k concurrent connections", "Must process 1M rows in < 30 seconds".
- Size constraints — "Bundle must be under 200KB", "Docker image under 500MB".
- Compatibility constraints — "Must work in Safari 16+", "Must support Python 3.10+".
- Legal/licensing constraints — "No GPL dependencies", "Must use MIT-compatible libraries only".
- Security constraints — "No eval()", "All user input sanitized", "CORS restricted to specific origins".
If you're building on top of existing code, the LLM needs to know:
- Existing file structure — Paste your directory tree. Use
tree -L 3 --dirsfirstor equivalent. - Key existing files — Paste the content of files the LLM will need to interface with. Don't paste everything — paste the interfaces, types, schemas, and entry points.
- Conventions already in use — Naming conventions, file organization patterns, error handling patterns. If your existing code uses a specific pattern, state it or the LLM will introduce a conflicting one.
- State of the codebase — "This is a fresh project", "This is a working prototype with tech debt", "This is production code, be conservative".
How will you know the output is correct? Define this before prompting.
- Functional requirements — What specific behaviors must work?
- A test case or walkthrough — "When I click X, Y should happen, and Z should update." Even informal, this gives the LLM a concrete target.
- What "good enough" looks like — If you're prototyping, say so. "Working prototype, rough edges acceptable, no error handling needed yet." This prevents the LLM from gold-plating.
- What failure looks like — "If the state desyncs, the whole thing is broken." This focuses the LLM on critical correctness paths.
You've defined your stack, constraints, and what exists. Now lock the structural decisions. These are the 20% of choices that shape everything else — once prompting begins, they don't change without a full restart (Section 9.2).
- Database schema and relationships — Tables, fields, and how entities relate. The LLM builds queries and models around this; changing it mid-project cascades everywhere.
- Authentication approach — Provider (Supabase Auth, NextAuth, Firebase, custom JWT) and model (roles, permissions, protected routes). Don't let the LLM pick this implicitly.
- API style and URL structure — REST with /api/resource, GraphQL, tRPC, RPC. Pick one. If the LLM mixes styles, integration becomes unpredictable.
- State management pattern — Where state lives (URL, local component, global store, server) and how it flows. This is the #1 source of architectural regret in web apps.
- Styling approach — Tailwind, CSS modules, styled-components, plain CSS. Mixing approaches produces unmaintainable output.
- File and folder structure — Where components, routes, utilities, and types live. Paste your intended structure or let the first prompt's skeleton establish it — then enforce it.
Put these locked decisions in your rules file (Section 2.7) and first prompt. If you let the AI decide these, it will make reasonable but hard-to-reverse choices that may not match your preferences.
Most AI coding tools support persistent context that's automatically included with every prompt. Setting this up is the highest-leverage step you can take — it eliminates re-pasting constraints, stack, and conventions every time.
If your tool supports project-level rules files (CLAUDE.md, .cursorrules, .cursor/rules/, AGENTS.md, .github/copilot-instructions.md, or equivalent), create one at the root of your project.
- Tech stack declaration — Paste your Section 2.2 tech stack verbatim. This is the single most impactful rule.
- Coding conventions — One rule per line. "Use pathlib, not os.path." "All components in src/components/, named PascalCase." "Errors return {ok: false, error: string}, never throw."
- Constraints — Paste your Section 2.3 constraints as rules, not suggestions.
- Negative rules — What the AI must NOT do. "Do not add dependencies without asking." "Do not refactor existing code when adding features." "Do not use default exports." These prevent the most common LLM drift.
- Keep it under 50 lines — Rules files that exceed the LLM's attention limits get partially ignored. If you need more detail, use hierarchical rules (directory-level files) or separate reference documents if your tool supports them.
What goes where:
| Content | Rules File | Prompt | PRD |
|---|---|---|---|
| Tech stack, versions | ✓ | First prompt only | ✓ |
| Coding conventions | ✓ | No | No |
| Non-negotiable constraints | ✓ | Repeat critical ones | ✓ |
| Feature requirements | No | ✓ (one per prompt) | ✓ |
| Design tokens | ✓ (or separate file) | Reference, don't repeat | Optional |
| Architecture decisions | ✓ (brief) | When relevant | ✓ (full) |
For Medium and Complex projects, maintain a set of persistent context documents that you update as the project evolves. This prevents context drift across sessions — the AI starts each session with accurate knowledge of what exists and what's been decided.
One structure that works (adapt to your workflow):
- project-brief.md — One-paragraph summary + core output statement + tech stack. The file you paste at the start of every new session.
- architecture.md — System structure, data model, component boundaries. Updated when architecture decisions change. Paste relevant sections into prompts that touch structure.
- progress.md — What's done, what's in progress, what's next. Updated after each session. This is your handoff document between sessions.
- decisions.md — Architecture and design decisions with brief rationale. "Chose Zustand over Redux: single-store is sufficient, less boilerplate." Prevents the AI from revisiting settled questions.
Version control is your undo button for vibe coding. Without it, a bad prompt can destroy working code with no way back.
- Commit after every verified prompt output. If the next prompt breaks things, you can revert to the last working state. This is the single most important safety practice.
- Branch before experimental prompts. If you're trying a risky approach or major refactor, create a branch first. If it fails, switch back.
- Write descriptive commit messages. "Add user registration form with email validation" not "update". When you need to revert, you need to know what each commit contains.
- Never merge raw AI output to your main branch without review. AI-generated code is a draft. Read it, test it, then commit it.
Read only the subsection that matches your project type from Section 1.
- User flows — Map every distinct flow the user goes through. "Sign up → verify email → onboarding wizard → dashboard" is a flow. Each flow is a discrete unit the LLM should build and test independently.
- Page/route inventory — List every page/route. For each, note: what data it needs, what actions are possible, what other pages it links to.
- State management architecture — Decide this upfront. Where does state live (URL params, React state, global store, server)? What state is shared between components? What state persists across sessions?
- Authentication model — If any. Provider (Supabase Auth, NextAuth, custom)? Roles/permissions? Protected routes?
- API surface — If you have a backend: list endpoints, methods, request/response shapes. Even rough shapes. Without this, the LLM invents an API surface that won't match your backend.
- Responsive behavior — "Desktop only", "Mobile-first", "Responsive with these breakpoints". If you don't specify, you'll get inconsistent responsive behavior across components.
- Design system/tokens — Colors, typography, spacing, border radius. If you have a Figma file or existing theme, provide the token values directly. If not, at minimum declare: primary color, background color, font family, whether dark/light/both.
- Reference sites — Provide 2-3 URLs of sites whose visual style you want. Be specific: "I want the card layout from X, the color palette from Y, the typography from Z."
- Component library — If using one (shadcn/ui, Radix, MUI), state it. If not, state "custom components, no library."
- Animation expectations — "No animations", "Subtle transitions on navigation", "Heavy motion design with scroll-triggered animations". LLMs either add zero animation or go overboard — there's no middle ground without guidance.
- Content density — "Dense dashboard with lots of data", "Spacious marketing site with large whitespace". This single descriptor changes layout decisions dramatically.
- Engine and version — Explicit. "Godot 4.5 GDScript", "Unreal Engine 5.7 Blueprints + C++", "browser-based with Phaser 3".
- Core game loop — Describe the 10-second loop. What does the player do repeatedly? "Steer projectile → hit targets → score points → projectile resets." This is the single most important thing to get right; everything else is decoration.
- Input scheme — Keyboard? Gamepad? Touch? Mouse? VR controllers? Gaze-based? List every input and what it does.
- Entity inventory — List every game object: player, enemies, projectiles, pickups, obstacles, UI elements. For each: what properties it has, how it behaves, what it interacts with.
- Physics/collision model — What collides with what? What happens on collision? Is physics simulated or faked? Rigidbody or kinematic?
- Game state machine — Map the states: Main Menu → Playing → Paused → Game Over → Score Screen. What transitions between them? What persists across states?
- Difficulty/progression — How does the game get harder? What parameters change over time?
- Score/reward system — What earns points? What's the feedback loop? What gets saved?
- Art style — "Pixel art 16x16", "Low-poly 3D", "Vector flat design", "Hand-drawn sketch". Provide reference images.
- Audio expectations — "No audio", "Placeholder beeps", "Specify sound events but don't implement". This prevents the LLM from ignoring audio entirely or spending tokens on audio code you'll replace.
- Resolution/viewport — "1920x1080 fixed", "Responsive to browser window", "Quest 3S native resolution". Include target framerate.
- Platform and rendering target — "Quest 3S standalone (mobile GPU)", "Desktop WebGL", "LED wall 4K output", "Projection mapping". This determines every technical decision downstream.
- Interaction model — "Passive viewing", "Gaze-based interaction", "Hand tracking", "Controller-based", "Touchscreen kiosk". Define every interaction point.
- Scene inventory — List every scene/environment. For each: what's in it, what the user can do, how they transition to the next.
- Shader/material requirements — If you need custom shaders, describe the visual effect in detail. "Translucent material that reveals geometry behind it on gaze" is actionable. "Cool shader" is not.
- Performance budget — Draw call limit, triangle count, texture memory. Immersive experiences on mobile GPUs fail on performance first.
- Deployment model — "Kiosk mode, auto-restart on crash", "WebXR served from local network", "Sideloaded APK". This affects architecture (error recovery, loading, state persistence).
- Mood board or reference images — 3-5 images minimum. Describe what you're taking from each: "The color palette from this image", "The spatial density from this scene", "The material quality from this render."
- Lighting model — "Baked lighting only", "Single directional light + ambient", "Full dynamic lighting". On mobile GPUs, this is a performance-defining decision.
- Post-processing — "None (mobile)", "Bloom + color grading only", "Full post-processing stack". List specific effects.
- Interface specification — List every command, flag, and argument. For each: name, type, default value, description. Example:
command: convert args: input_file (required, path), output_file (optional, path, default: stdout) flags: --format (string, default: "json"), --verbose (bool, default: false) - Input/output formats — What goes in, what comes out. JSON, CSV, plain text, binary? If structured, provide a sample of the actual data.
- Error behavior — What happens on bad input? Missing files? Network failures? "Exit with code 1 and a human-readable error message" vs. "Retry 3 times with exponential backoff."
- Environment assumptions — What OS? What's installed? What permissions are needed? "Runs on Ubuntu 22.04 with Docker available" vs. "Must work on Windows, Mac, and Linux with no dependencies beyond Python 3.10+."
- Idempotency — Can it be run twice safely? This matters for automation scripts that might get re-triggered.
- Data flow diagram — Even ASCII art. "Source A → Transform X → Store in DB → Expose via API → Consumer Z". Every hop in the pipeline is a potential failure point the LLM needs to know about.
- Schema definitions — Actual schemas. Table definitions, JSON schemas, protobuf definitions. Paste them. Don't describe them in prose.
- Volume and throughput — "100 records per day" and "1M records per hour" require completely different architectures. State your numbers.
- Failure and retry semantics — "At-least-once delivery", "Exactly-once processing", "Best-effort, drop on failure". This changes everything.
- Observability requirements — "Structured JSON logging to stdout", "Prometheus metrics on /metrics", "No observability needed, this is a script."
- Authentication and authorization — How do services authenticate to each other? How do external consumers authenticate?
- Deployment target — "Docker Compose on a single VPS", "Kubernetes", "AWS Lambda", "Runs as a systemd service". This affects how you structure config, secrets, and lifecycle.
- Script vs. application — Is this a run-once script, a long-running process, a library/package, or a full application with UI? This determines structure.
- Dependency management — pip + requirements.txt, Poetry, uv, conda? If you have existing dependency files, paste them.
- Type hinting expectations — "Full type hints, strict mypy compatible", "Type hints on public interfaces only", "No type hints, this is a quick script."
- Entry point and invocation — "Run as
python main.py", "Installable CLI via pyproject.toml entry_points", "Imported as a library", "Runs as a FastAPI server." - Data handling — If processing data: input format, output format, sample data. If using pandas/numpy/polars, state which. LLMs default to pandas even when polars or pure Python would be better.
- Concurrency model — "Single-threaded is fine", "Use asyncio", "Use multiprocessing for CPU-bound work", "Use threading for I/O-bound work". If unspecified, LLMs write synchronous code even when async would be dramatically better.
- Configuration — "Hardcoded constants are fine", "Environment variables", "Config file (TOML/YAML)", "CLI arguments via argparse/click/typer".
- Logging — "Print statements are fine", "Use stdlib logging with level=INFO", "Use structlog with JSON output".
- Testing expectations — "No tests needed", "pytest with basic coverage of core functions", "Full test suite with fixtures and mocks."
os.path instead of pathlib, bare except: clauses, not using context managers for file/resource handling, mutable default arguments. If you care about code quality, state "follow modern Python idioms, use pathlib, type hints, context managers."
For portfolios, marketing sites, blogs, agency sites, documentation sites, and any project where visual design quality is the primary deliverable — not app functionality.
- Content inventory — List every page and its sections. For each section: what content goes there (headline, body copy, images, testimonials, CTAs), and whether that content is final or placeholder. LLMs generate lorem ipsum layouts that don't survive real content.
- Site map & navigation structure — Page hierarchy, header nav items, footer nav, mobile nav pattern (hamburger, bottom tabs, slide-out). How deep does the hierarchy go? Is there a search?
- Visual hierarchy per page — What's above the fold? What's the primary CTA? What order do sections appear? What draws the eye first? Describe this per page, not just globally.
- Content-to-layout ratio — "Image-heavy portfolio with minimal text", "Long-form text-heavy blog", "Mixed media marketing site with video heroes". This single descriptor changes layout architecture.
- CMS / static generation — Static HTML, Astro, Hugo, Eleventy, WordPress, headless CMS (Sanity, Contentful, Strapi)? If static: how will content be updated? If CMS: who updates it and how technical are they?
- SEO requirements — Meta titles and descriptions per page, Open Graph images, structured data (JSON-LD), XML sitemap, canonical URLs, robots.txt. If you don't specify, the LLM will skip all of it.
- Asset pipeline — Image optimization strategy (WebP/AVIF with fallbacks?), font loading (self-hosted vs Google Fonts vs Adobe Fonts), SVG icon system (inline, sprite, or icon font), favicon set.
- Performance budget — Core Web Vitals targets: LCP under 2.5s, CLS under 0.1, INP under 200ms. Total page weight budget. These constraints prevent the LLM from producing a visually impressive but slow-loading site.
- Typography system — Heading scale (h1 through h6 sizes), body text size, font pairings (heading font + body font), line height per context (headings: 1.1-1.3, body: 1.5-1.75), maximum line length (45-75 characters).
- Color palette — Primary, secondary, accent, neutrals (at least 5 shades from near-white to near-black), and semantic colors (success, warning, error, info). Provide exact hex or HSL values. "Blue and white" produces a different site every time you prompt.
- Spacing system — Base unit and scale. Example: 4px base → 4, 8, 12, 16, 24, 32, 48, 64, 96. Every margin, padding, and gap should come from this scale.
- Layout grid — Column count (12-column is standard), gutter width, max content width (e.g., 1200px), breakpoints with specific pixel values.
- Image & media treatment — Aspect ratios for hero images, thumbnails, and cards. Border radius, shadow treatment, hover effects. Whether images are full-bleed, contained, or masked.
- Micro-interactions — Hover states for links, buttons, and cards. Scroll-triggered animations (fade-in, parallax, sticky headers). Page transitions. Specify what moves, how fast (in ms), and what easing function. "Subtle animations" means nothing — "fade-in on scroll, 400ms ease-out, triggered at 20% viewport intersection" is actionable.
Read this if your project has any web-based user interface. This section covers the design system decisions that determine whether your output looks designed or default. Depth varies: for a CLI tool, skip this entirely. For a marketing site or SaaS dashboard, this is critical.
- Font family selection — Name the exact fonts. "Inter for body, Fraunces for headings" not "a modern sans-serif". If using Google Fonts, state it. If self-hosting, state it. The loading strategy affects performance.
- Type scale — Pick a ratio and generate your scale. Common ratios: Minor Third (1.200), Major Third (1.250), Perfect Fourth (1.333). Example at 1.250 with 16px base: 16, 20, 25, 31, 39, 49px. Every heading and text size should come from the scale.
- Heading weights — Which weight for each heading level? h1 at 700, h2 at 600, h3 at 500? Or all the same weight with size doing the differentiation?
- Body line-height — 1.5 to 1.75 for body text. 1.1 to 1.3 for headings. State both.
- Maximum line length — 45-75 characters for body text. This means setting
max-widthon text containers, not just hoping the layout handles it.
- Define as tokens, not prose — Provide colors as a structured list of CSS custom properties or design tokens.
--color-primary-500: #2563EBis unambiguous. "A nice blue" is not. - Full palette with shades — For each color: at minimum a light, base, and dark variant. For neutrals: a full scale from near-white to near-black (50, 100, 200, ... 900, 950).
- Light/dark mode — If supporting both: define token values for each mode. If only one: state which and don't let the LLM add a theme toggle you didn't ask for.
- Contrast ratios — WCAG AA minimum: 4.5:1 for normal text, 3:1 for large text. Verify your color pairings meet this. LLMs regularly produce low-contrast text, especially for secondary/muted colors.
- Semantic mapping — Success (green), warning (amber/yellow), error (red), info (blue). These appear in forms, alerts, badges, and status indicators. Define the exact hues.
- Base spacing unit — Pick a base (4px or 8px are standard) and derive all spacing from it. Every margin, padding, and gap should be a multiple of this base.
- Spacing scale — Define the steps: 4, 8, 12, 16, 24, 32, 48, 64, 96, 128. Provide this as a reference. No value should exist outside this scale.
- Grid system — Column count, gutter width, and max content width. Example: "12-column grid, 24px gutters, 1200px max-width, centered."
- Section spacing rhythm — How much vertical space between major page sections? Between components within a section? Define both. Example: "80px between sections, 32px between components within a section."
- Component internal padding — Cards, buttons, inputs, modals — each needs consistent internal spacing. Define it once: "Cards: 24px padding. Buttons: 12px 24px. Inputs: 12px 16px."
- Breakpoints — Define exact pixel values: mobile (< 640px), tablet (640-1024px), desktop (> 1024px). Or whatever your project needs. Don't rely on the LLM's defaults.
- What changes at each breakpoint — For each breakpoint: what layout shifts occur? Does the nav collapse to a hamburger? Do grid columns reduce? Do font sizes decrease? Specify per breakpoint.
- Mobile-first vs desktop-first — State your approach. "Write mobile styles first, add complexity at larger breakpoints" or "Design for desktop, adapt down." This changes how media queries are written.
- Touch target sizes — Minimum 44x44px for interactive elements on touch devices. If you don't specify, LLMs produce tiny links and buttons that are frustrating on mobile.
- Border radius — Pick one value (or a small set: small 4px, medium 8px, large 16px, full/pill) and use it everywhere. LLMs mix 4px, 6px, 8px, 12px randomly.
- Shadow system — Define elevation levels: subtle (cards at rest), medium (dropdowns, popovers), heavy (modals). Provide the actual
box-shadowvalues. - Icon style — Outline or filled? Consistent stroke width? What size (16px, 20px, 24px)? Which icon set (Lucide, Heroicons, Phosphor)? Mixing icon styles is immediately noticeable.
- Button hierarchy — Primary (filled), secondary (outlined), ghost/text. What colors, padding, and border radius for each? What are the hover, active, disabled, and loading states?
- Form element styling — Inputs, selects, checkboxes, radio buttons, textareas. These must look like they belong to the same design system. Define border color, focus ring color, error state, and placeholder color.
- Color contrast — All text meets WCAG AA (4.5:1 normal, 3:1 large). Test your palette pairings before prompting, not after.
- Focus indicators — Visible focus rings on all interactive elements. Specify the style: "2px solid blue outline with 2px offset" not "default browser focus."
- Alt text strategy — Decorative images get
alt="". Informative images get descriptive alt text. State this so the LLM doesn't leavealtattributes empty or fill them with filenames. - Heading hierarchy — One
h1per page. Headings don't skip levels (no h2 → h4). This is semantic structure, not just visual styling. - Landmark regions —
<header>,<nav>,<main>,<footer>,<aside>. Every page should use these. LLMs often produce<div>soup instead. - Keyboard navigation — All interactive elements reachable via Tab. Logical tab order. Escape closes modals. Enter/Space activates buttons. Test this explicitly.
alt attributes, use <div> with onClick instead of <button>, and break heading hierarchy for visual reasons. If accessibility matters, state it as a constraint in every prompt, not just the first one.
Read this if your project is Medium or Complex.
Use this formula as a rough guide:
Distinct features or modules × 1.5 = approximate prompt count
The 1.5 multiplier accounts for: integration between modules, bug fixes, and refinement passes.
If your number exceeds 15, you almost certainly need a PRD (Section 6) and should consider whether this is a project better built with an IDE + LLM-assist rather than pure vibe coding.
Break your project into prompts using these principles:
Each prompt should produce a unit that can be tested independently. If you can't verify the output of a prompt in isolation, the scope is wrong.
Dependencies flow forward, never backward. Prompt 3 can depend on the output of Prompt 2, but if Prompt 3 requires changing what Prompt 2 produced, your decomposition is wrong.
The first prompt is always the skeleton. It should produce:
- Project structure (files, folders)
- Core data types / schemas / interfaces
- Entry point that runs (even if it does nothing useful)
- The single most critical feature — the one everything else hangs off
Subsequent prompts add features one at a time. Each prompt:
- References what already exists (paste the current state or relevant files)
- Adds exactly one feature or module
- Specifies how it integrates with existing code
- Defines how to verify it works
Use vertical slices, not horizontal layers. Each prompt should cut through the full stack for one feature — database schema, server logic, and UI — rather than building all database tables first, then all API routes, then all UI pages. Vertical slices are independently testable and surface integration problems immediately instead of at the end.
Wrong (horizontal layers):
Prompt 1: All database tables
Prompt 2: All API routes
Prompt 3: All UI pages
Prompt 4: Wire everything together (this is where it breaks)
Right (vertical slices):
Prompt 1: Skeleton + task creation (DB table + API endpoint + form UI)
Prompt 2: Task listing (query + API endpoint + list UI)
Prompt 3: Task editing (update logic + API endpoint + edit UI)
Each prompt produces something you can run and test end-to-end.
For each prompt in your sequence, fill this out before prompting:
Prompt N: [name]
Depends on: [which previous prompts]
Adds: [what new capability]
Modifies: [which existing files, if any]
Verify by: [how you'll test this works]
Risk: [what's most likely to go wrong]
This takes 5 minutes and saves hours of debugging tangled prompt chains.
Restart the session/conversation if:
- The LLM's architecture is fundamentally wrong (wrong state management pattern, wrong data model)
- You're on prompt 5+ and every prompt requires fixing the previous prompt's output
- The LLM starts "forgetting" earlier context and introducing contradictions
Continue the session if:
- Issues are localized (a single function is wrong, a style is off, an edge case is missing)
- The architecture is sound but details need refinement
- You're adding new features to a working base
Read this if your project is Complex (Section 1.2).
A PRD for vibe coding is not the same as a corporate PRD. It's a structured context document that gives the LLM (and you) a complete picture of what's being built. It lives alongside your prompt — you paste relevant sections into each prompt.
Build your PRD with these sections. Each section has a specific purpose for the LLM.
A dense paragraph covering: what it is, who it's for, what problem it solves, and what "done" looks like. This goes at the top of every prompt.
Only include if your app has meaningfully different user types. For each persona: what they do, what they need, what they don't care about. Keep each to 2-3 sentences. Skip this for tools, scripts, and single-user projects.
Explicitly list what you are NOT building. This prevents scope creep in both your thinking and the LLM's output.
- "No mobile app — web only, but responsive."
- "No real-time collaboration — single-user editing only."
- "No payment processing in v1."
- "No admin panel — manage data directly in the database for now."
The LLM will suggest or start building features adjacent to what you described. Non-goals give it a hard boundary.
A flat list of every feature. For each:
- Name — short label
- Priority — Must-have, Should-have, Could-have (nice-to-have), Won't-have (explicitly out of scope this version)
- Complexity — Simple, Medium, Complex
- Dependencies — which other features must exist first
- Acceptance criteria — 1-3 concrete statements of what "working" means. For complex interactions, use structured format: "Given [state], When [action], Then [result]." Example: "Given a logged-in user, When they click Delete on a task, Then the task is removed from the list and a confirmation toast appears."
This list becomes your prompt decomposition plan (Section 5).
Every entity, its fields, its relationships. Use actual schema notation your tech stack uses (Prisma schema, SQLAlchemy models, TypeScript interfaces, GDScript class definitions). Not prose. Not ER diagrams described in words.
Paste this into any prompt that touches data.
A brief description of how the system is structured:
- What runs where (client, server, edge, device)
- How components communicate (REST, WebSocket, signals, direct function calls)
- What state lives where
Include a simple ASCII diagram if the architecture has more than 2 components.
For every endpoint or interface between components:
- Method and path (or signal name, or function signature)
- Request shape with types
- Response shape with types
- Error cases
Performance, security, accessibility, scalability requirements that apply globally. These go in every prompt as constraints.
Things you haven't decided yet. Write them down. When prompting, either resolve them before the relevant prompt or explicitly tell the LLM to make a decision and document it: "I haven't decided X yet. Pick the simpler option and note what you chose."
- Too long — If your PRD is over 3000 words, the LLM won't attend to all of it equally. Front-load the most important information.
- Prose-heavy — Schemas, contracts, and inventories should be structured data, not paragraphs. The LLM parses structured formats more reliably.
- Aspirational features — Don't include features you "might" want. Every feature in the PRD will influence architecture decisions. Include only what you're building now.
- No priorities — Without priority labels, the LLM treats everything as equally important and may gold-plate a Nice-to-have while half-implementing a Must-have.
- No non-goals — Without explicit non-goals, the LLM builds adjacent features you didn't ask for. It sees "task management app" and adds notifications, sharing, and analytics because those are common in its training data.
Read this for every project. How you provide references determines whether the LLM builds what you imagined or what it hallucinated.
- Best: Screenshot or image file provided directly to the LLM (if it supports vision). Annotate the screenshot: "I want the layout from this, not the colors."
- Good: URL of a live site + specific description of what to take from it. "Look at stripe.com/docs — I want the same sidebar navigation pattern, the same code block styling, but with my color palette."
- Acceptable: Detailed verbal description with named reference points. "Card-based grid layout like Notion's gallery view. Cards have a thumbnail on top, title below, 2 lines of metadata, subtle drop shadow on hover."
- Bad: "Make it look modern and clean." This produces generic output.
- Best: Paste the actual code you want to match or extend. If it's an API you're integrating with, paste the docs or response shape.
- Good: Link to a specific GitHub file or documentation page + what to take from it.
- Acceptable: Name a well-known pattern: "Use the repository pattern for data access", "Structure it like a Next.js App Router project."
- Bad: "Use best practices." The LLM's "best practices" are an average of everything in its training data.
- Best: Provide a concrete walkthrough: "User clicks the card → card expands to full screen with a 300ms spring animation → content fades in after expansion completes → clicking outside collapses it."
- Good: Name a reference interaction: "Expand behavior should feel like iOS App Store's Today tab card expansion."
- Acceptable: Describe the feel: "Snappy, no perceptible lag, transitions under 200ms, no layout shift."
- Bad: "Make it feel smooth." This means nothing concrete.
- Simple project: 1-2 references for overall direction are enough.
- Medium project: 2-3 references for aesthetic + 1-2 for architecture/patterns.
- Complex project: Compile a reference document with categorized references (visual, behavioral, architectural). Paste relevant subsets into each prompt.
If your references conflict (e.g., site A's layout but site B's spacing, and they're incompatible), you must resolve the conflict before prompting. The LLM will not ask you to resolve it — it'll silently pick one or produce a broken hybrid.
Your in-session playbook.
Your first prompt should follow this structure. Reorder sections by importance for your project, but include all of them.
[1-2 sentence summary of what you're building]
Tech stack: [from Section 2.2]
Constraints: [from Section 2.3]
What exists already: [from Section 2.4, or "Fresh project, no existing code"]
[Your project-type-specific details from Section 3]
[Design tokens from Section 4, if applicable]
[References from Section 7]
Build: [The specific deliverable for this prompt — the skeleton or first feature]
When done, I should be able to: [verification step from Section 2.5]
After the first prompt, every subsequent prompt should:
[Paste or reference the current state of the project — file tree, key files, or
"the code from your previous response"]
Current status: [what works, what doesn't]
Next: [exactly one feature or change]
This should: [concrete behavior description]
Don't change: [list anything that's currently working that you don't want touched]
[Any new references relevant to this specific addition]
The "Don't change" line is critical. Without it, LLMs refactor working code to accommodate new features, introducing regressions.
LLMs have finite context windows. As your project grows across prompts:
What to always include:
- Project summary (1-2 sentences, never omit)
- Tech stack and constraints (paste verbatim every time)
- File tree of current state
- The specific files being modified (paste full content)
What to include selectively:
- Files adjacent to what's being modified (interfaces, types, shared utilities)
- Relevant sections of the PRD
What to omit:
- Files unrelated to the current prompt
- Previous conversation history (the LLM doesn't need to re-read your discussion about the previous feature)
- Completed, stable modules that aren't being touched
Session management:
- Start fresh sessions for new features. Don't carry a 20-message conversation into a new feature area. Start a new session, paste the current project state, and describe the next task from scratch.
- Restart after 2 failed corrections. If the LLM fails to fix something twice, the context is polluted with failed approaches. Start a fresh session with only the current code and a clear problem statement — not the conversation history.
- Separate investigation from implementation. If you need to explore an approach, debug a complex issue, or evaluate options, do it in a throwaway session. Bring only the conclusion back to your main session.
- Clear context between unrelated tasks. If your tool supports it (e.g.,
/clearin Claude Code), clear the conversation when switching to an unrelated task within the same project.
If the LLM asks a clarifying question, that's fine — answer it. But if the LLM makes a choice without asking and you don't like it:
Don't say: "That's wrong, fix it." Say: "Change X to Y. Specifically: [describe the exact change with concrete details]. Keep everything else the same."
Vague corrections produce vague fixes. The LLM will change something, but not necessarily the right thing.
Ways to reduce token waste and improve output quality:
- Front-load critical constraints. The LLM attends most strongly to the beginning and end of your prompt. Put non-negotiable constraints at the top.
- Use structured formats for structured information. Paste schemas as code, not as prose descriptions of schemas.
- Don't repeat yourself across sections. If your tech stack is declared at the top, don't mention it again inline.
- Ask for specific outputs. "Output only the modified files" is better than letting the LLM decide what to include.
- Specify what you don't want. "No comments in the code", "Don't include a README", "Don't add error handling yet — I'll request that separately."
After every prompt's output, before the next prompt:
- Run it. Does it start? Does it crash?
- Test the new feature. Does the specific thing this prompt was supposed to add actually work?
- Regression check. Do the previously working features still work?
- Note exact issues. Not "it's broken" — write down what you clicked, what happened, and what should have happened instead.
Feed these exact observations into the next prompt. "When I click Submit with an empty name field, the app crashes with TypeError: Cannot read property 'length' of undefined at line 42 of Form.jsx" gives the LLM everything it needs. "The form doesn't work" gives it nothing.
Include verification criteria in the prompt itself. Don't just tell the AI what to build — tell it how to verify its own work:
Build: User registration with email validation.
Verify your output by confirming:
- Submitting an empty email shows "Email required"
- Submitting "notanemail" shows "Invalid email format"
- Submitting a valid email creates a user and redirects to /dashboard
- Submitting a duplicate email shows "Account already exists"
This gives the AI a self-check loop. It catches obvious errors before you have to. For tools that can run code (Claude Code, Cursor Agent, Replit), the AI can actually execute these checks.
For complex features where you have a rough idea but haven't fully specified the behavior, have the AI interview you before it builds anything:
I want to add [feature]. Before writing any code, interview me about this
feature. Ask me questions about the expected behavior, edge cases, error
handling, and UI requirements. Ask one question at a time. When you have
enough information, summarize the agreed requirements and wait for my
confirmation before implementing.
This produces better specifications than writing them alone — the AI asks questions you wouldn't think to answer. After the interview, confirm the summary, then either proceed in the same session or start a fresh session with the requirements as your prompt. The second approach keeps the implementation context clean.
Bookmark this section. Read it when things go wrong.
When the output isn't what you wanted, identify which category the problem falls into:
| Problem | Symptom | Fix Strategy |
|---|---|---|
| Wrong architecture | The fundamental structure is wrong. State management is a mess, data flows incorrectly, components are split wrong. | Restart (9.2) |
| Wrong feature behavior | The architecture is fine but a feature doesn't behave correctly. | Targeted correction (9.3) |
| Wrong aesthetics | Functionality works but it looks/feels wrong. | Aesthetic patch (9.4) |
| Accumulating bugs | Each fix introduces new bugs, nothing is stable. | Stabilization pass (9.5) |
| LLM confusion | The LLM contradicts itself, forgets context, or produces nonsensical output. | Context reset (9.6) |
When architecture is fundamentally wrong:
- Do not try to fix it incrementally. An LLM working within a broken architecture will produce increasingly broken code trying to work around foundational issues.
- Identify what went wrong. Usually: ambiguous state management spec, missing data model, or underspecified component boundaries.
- Rewrite your first prompt with the missing specification added.
- Start a new session/conversation. The old conversation's context is contaminated with bad patterns the LLM will repeat.
- Salvage what worked. If individual components from the failed attempt are good, paste them into the new session as "existing code to integrate."
When a specific feature is wrong:
The [feature name] doesn't work correctly.
Current behavior: [exactly what happens]
Expected behavior: [exactly what should happen]
Steps to reproduce: [click/action sequence]
The relevant code is in [file name]:
[paste the relevant section]
Fix only this behavior. Don't restructure or refactor anything else.
Key phrases that prevent LLM over-correction:
- "Fix only this behavior."
- "Don't restructure or refactor anything else."
- "Keep the existing function signatures."
- "Don't change any file except [X]."
When it works but looks wrong:
The functionality is correct — don't change any behavior or logic.
Visual issues:
1. [Specific issue: "The card padding is too tight, increase to 24px"]
2. [Specific issue: "The header font should be 600 weight, currently looks like 400"]
3. [Specific issue: "Add 8px gap between the icon and label"]
Only modify styles/CSS/classes. Don't touch component structure or logic.
Be pixel-specific. "Too much padding" is ambiguous. "Reduce padding from 32px to 16px" is concrete.
When bugs are accumulating:
- Stop adding features.
- List every known bug with reproduction steps.
- Prompt the LLM to fix bugs only:
Do not add any new features or make any improvements.
Fix these bugs only:
1. [Bug with reproduction steps]
2. [Bug with reproduction steps]
3. [Bug with reproduction steps]
After each fix, explain what you changed and why. If fixing one bug would
require changing the approach to another, flag it instead of doing it.
- Verify all bugs are fixed before resuming feature work.
When the LLM is confused, contradicting itself, or producing nonsensical output:
- Start a new conversation/session.
- Paste the current state of your project (file tree + key files), not the conversation history.
- Describe the project fresh as if the LLM has never seen it.
- State what's working and what the next task is.
This is often necessary after 10-15 exchanges in a single conversation. LLM attention degrades over long conversations.
Things that reliably break LLM code generation. Skim before your first prompt, reference when debugging failures.
| Anti-Pattern | Why It Fails | What To Do Instead |
|---|---|---|
| "Build me a full app" in one prompt | LLM produces shallow implementation of everything, deep implementation of nothing. Output is too large to verify. | Decompose per Section 5. One feature per prompt. |
| Describing behavior in prose instead of steps | "The dashboard should be interactive" gives zero actionable information. | "Clicking a chart bar filters the table below to show only rows from that category." |
| Not specifying error handling | LLM ignores errors entirely or adds generic try/catch that swallows everything. | Specify per-operation: "If the API call fails, show a toast with the error message and keep the form data intact." |
| "Make it better" / "Improve this" | The LLM doesn't know your quality bar. It will make arbitrary changes. | Specify exactly what to change: "Increase contrast ratio of body text to 7:1. Add loading skeleton to the card grid. Fix the jittery scroll on the sidebar." |
| Pasting entire codebase | Floods the context window. LLM attends weakly to most of it and may produce changes to files you didn't want touched. | Paste only files being modified + their interfaces. |
| Asking for multiple unrelated changes | LLM gets confused about scope, makes partial changes to each. | One change per prompt. Batch only if changes are to the same file and closely related. |
| "Use best practices" / "Make it production-ready" | Means something different in every context. LLM picks an arbitrary subset. | Name specific practices: "Add input validation with zod schemas", "Use connection pooling with a pool size of 10", "Add structured logging with correlation IDs." |
| Not declaring what exists | LLM creates duplicate utilities, conflicting type definitions, or reimplements what's already there. | Always paste file tree + relevant interfaces at the top of each prompt. |
| Skipping security review | LLM produces code that works but has injection vulnerabilities, exposed secrets, or missing auth checks. You won't notice until production. | After features stabilize, do a dedicated security pass: "Review this code as a security engineer. Check for: injection, auth bypass, exposed secrets, CORS misconfiguration, and unsafe defaults. List issues only, don't refactor." |
These are patterns LLMs reliably produce that you should watch for and correct:
- Silent state mutations. LLMs create functions that modify objects in place without making it obvious. This causes bugs that are hard to trace. Watch for objects and arrays being modified instead of copied.
- Implicit type coercion. Particularly in JavaScript/TypeScript. LLMs write loose comparisons, concatenate strings and numbers, or pass wrong types through function boundaries.
- Orphaned event listeners and subscriptions. LLMs add event listeners in setup/mount but forget cleanup in teardown/unmount. This causes memory leaks and ghost behavior.
- Hardcoded magic values. LLMs embed constants directly in logic instead of extracting named constants. This makes tuning impossible.
- Shallow error handling.
catch(e) { console.log(e) }everywhere. The error is logged but nothing recovers, retries, or informs the user. - God components/functions. LLMs write monolithic components that handle state, rendering, and side effects all in one. This makes targeted modification in later prompts nearly impossible.
- Incorrect async patterns. LLMs produce race conditions, unhandled promise rejections, and missing
awaitkeywords. If your project uses async code, verify every async boundary. - Overengineering on simple projects. For a small script, the LLM may generate a full class hierarchy with abstract base classes when a flat function would suffice. Match complexity to project scope in your prompt.
- Dependency sprawl. LLMs add dependencies you didn't ask for. Review every import. If you see a library you didn't specify, question whether it's necessary.
- Opaque code you can't debug. The LLM produces a working function but you can't explain what it does. If you can't explain it, you can't debug it when it breaks in production. Ask the AI to explain the logic. If the explanation doesn't make sense, ask for a simpler implementation you can follow.
- Not reading the output. Skipping ahead to the next prompt without verifying the current output means errors compound. Always run and test before prompting again.
- Sunk cost continuation. Continuing to patch a broken architecture because you've already invested 8 prompts. See Section 9.2 — restart is often faster.
- Assuming the LLM remembers. Even within a single conversation, after 10+ exchanges, the LLM's effective attention on early context is degraded. Re-state critical constraints.
- Anthropomorphizing errors. "The LLM doesn't understand my vision" — it never did. It's pattern-matching on your words. If the output is wrong, your words were ambiguous. Revise the specification, not your expectations of the LLM.
Cut this out and keep it next to your prompt window.
BEFORE FIRST PROMPT:
□ One-sentence "done" statement
□ Tech stack with versions
□ Constraints and non-negotiables
□ Architecture decisions locked (DB, auth, API style, state mgmt)
□ Rules file created if tool supports it (Section 2.7)
□ What already exists (or "fresh project")
□ Project-type-specific prep (Section 3)
□ Design tokens defined if web UI (Section 4):
□ Typography: fonts, type scale, line heights
□ Colors: full palette as hex/HSL tokens
□ Spacing: base unit + scale
□ Layout: grid, breakpoints, max-width
□ Consistency: border radius, shadows, buttons
□ Accessibility: contrast ratios, focus styles
□ References gathered (visual, code, behavioral)
□ Complexity assessed, prompts staged if Medium/Complex
□ PRD written if Complex (include Non-Goals)
EVERY PROMPT:
□ Project summary (1-2 sentences)
□ Tech stack + constraints
□ Design tokens (if web UI)
□ Current file tree
□ Files being modified (full content)
□ Exactly one task (vertical slice, not horizontal layer)
□ Verification criteria (how AI can self-check)
□ "Don't change" list
AFTER EVERY PROMPT:
□ Run it
□ Test new feature
□ Regression check
□ Note exact issues for next prompt
□ Commit if verified
BETWEEN SESSIONS:
□ Update progress docs (progress.md, decisions.md)
□ Start fresh session — paste current state, not history
□ Review rules file — prune what's outdated, add what's missing
WHEN THINGS GO WRONG:
□ Identify: architecture, behavior, aesthetic, or accumulation?
□ Architecture wrong → restart with better spec
□ Behavior wrong → targeted fix with reproduction steps
□ Aesthetic wrong → visual-only patch with pixel values
□ Bugs accumulating → freeze features, fix all bugs first
□ LLM confused → new session, paste current state fresh
□ 2 failed corrections → stop, fresh session, restate problem
□ After stabilizing → security review pass