Dan Ratner edited this page Jan 8, 2026 · 5 revisions

Make Things With Maestro

Maestro is a tool for building production quality apps with AI-based speed and efficiency. More precisely, it's a highly-opinionated multi-agent orchestration tool for app development that emulates the workflow of high-functioning human development teams using lots of AI agents.

With Maestro, you describe what you want in an interview chat (or upload a markdown file with a description). Maestro turns that into requirements and then an engineering plan, runs parallel implementation work in isolated containers, enforces reviews and tests, and keeps a durable record of what happened so the next change doesn’t start from zero.

If you’ve ever used an AI coding assistant and thought:

  • “This is fast… but it’s drifting.”
  • “It’s rewriting code that was already fine.”
  • “It made a big change without checking tests.”
  • “It lost the thread halfway through and started improvising.”
  • "It keeps repeating the same mistake..."
  • "How do I get my agents to coordinate without trampling each other?"
  • "How do I ensure consistency with a big crew of agents?"
  • "How do I stop the agent from cutting corners and skipping steps?"

…you already understand some of the problems Maestro is trying to solve. The trick is not getting code out of an LLM. The trick is getting reliable software development out of many LLMs, over and over, on a real codebase, without turning you into the full-time supervisor.

That’s what Maestro is for.


The big idea: structure AI like a high-performing dev team

Maestro is built around a simple (and very practical) observation:

LLMs are trained on human work. They tend to reproduce human work patterns—good and bad. If you want consistent results, you’ll get farther by structuring the work like the best engineering teams do, not by asking one “super developer” agent to do everything.

The “lone wolf” failure mode

Most CLI-style agent tools feel like pairing with a single very fast developer. That can be great for quick prototypes. But for production work, the “lone wolf” pattern has predictable failure modes:

  • No separation of duties. The same agent that writes code also “reviews” it. Human teams don’t do this for a reason.
  • Inconsistent process. It might run tests today and skip them tomorrow. It might follow the project’s patterns in one file and ignore them in the next.
  • Context drift. A single agent carries requirements, design, implementation details, and review concerns in one rolling conversation. As context grows, earlier constraints turn into vibes.
  • Coordination problems. Even when an agent is capable, it can’t reliably coordinate parallel work without a workflow spine.

Maestro’s answer is: don’t fight those dynamics with more prompting. Change the shape of the work.

Maestro emulates the parts of a real team that matter

Your team

Maestro splits responsibility across roles:

  • A PM (Product Manager) to gather and refine requirements with you.
  • An Architect to turn requirements into stories, coordinate work, answer questions, and perform reviews/merges.
  • Multiple Coders that implement stories in parallel, each in its own isolated environment.
  • One or more dedicated Hotfix coders for urgent “bypass the queue” changes.

This is not a metaphor layer or a cute naming scheme. It’s a workflow boundary. The boundaries do real work.

And importantly: you can always add coders. Maestro can run multiple coders in parallel (the default is 3, and it has been tested with 10). Idle agents use virtually no resources, so there's no hard limit: you could spin up a thousand if you wanted to. The practical limit on the number of agents comes from (a) how easily the requirements break into stories that can be developed in parallel and (b) the load on the singleton Architect, which reviews all the work. For most workloads, 3-10 coders is a good balance.


Let AI be AI, and let software be software

Another core idea behind Maestro is that AI and software are good at different things:

  • AI is great at reasoning, translating intent into code, writing text, exploring options, and filling in implementation details.
  • Software is great at state, rigor, repeatability, orchestration, and enforcing constraints.

Many agent tools are effectively “a UI into the LLM.” Maestro is not. Maestro is an orchestrator: it uses deterministic workflow, state machines, persistence, containers, and a dispatcher to turn probabilistic model output into repeatable engineering work.

This is why Maestro can confidently say things like:

  • A coder must submit a plan and get it approved (unless the work is an express hotfix).
  • Work happens in containers and is confined there.
  • Reviews happen through explicit approval types, including plan, code review, completion, and even budget review.
  • If external services go down, agents can suspend and later resume without losing state.
  • Automated tests and linting must pass before code review.
  • If Maestro restarts, it resumes because work is persisted in SQLite.

Those are software guarantees, not prompt wishes.


What Maestro is (and what it’s not)

Maestro is

  • A single-user tool that gives you “a whole team” worth of parallelism and discipline.
  • A web UI-first experience. You run Maestro, open the dashboard, and work through the PM chat and system panels, not the CLI.
  • An orchestrator that turns your requirements into stories, dispatches them, and enforces a workflow: spec → stories → planning → coding → review → merge.
  • A system that tracks and displays the things you actually care about: stories, tool use, messages, token/cost/time, test results, and more.
  • A system with built-in “institutional memory” via a documentation/knowledge graph, so the team doesn’t relearn the same rules every story.

Maestro is not

  • A toy for rapid prompt-driven prototyping.
  • A “watch me type code” assistant that expects you to guide every keystroke.
  • A replacement for engineering judgment. Maestro reduces repetitive overhead and catches a lot of problems early, but you still own what ships.

Who should use Maestro?

You should try Maestro if…

  • You want to build software with AI without accepting “it’ll be messy.”
  • You like the speed of AI agents but want outcomes that feel more like a real engineering process: plans, tests, reviews, merges, documentation.
  • You want parallel implementation across multiple work streams, with coordination and dependency management.
  • You want something you can run locally today, but that is conceptually aligned with “run unattended / run in the cloud.”

You might not want Maestro if…

  • You want hands-on, interactive prototyping and you enjoy steering and reviewing in real time.
  • You want maximum configurability over workflow. Maestro is intentionally opinionated in the service of reliability.

The Web UI: one place to run the whole system

Maestro is meant to be used from the dashboard (not a CLI chat loop). The UI is where you:

  • Start a PM interview or upload requirements
  • Preview what will be submitted for development
  • Launch demos for user acceptance testing
  • Watch agents move through states
  • Inspect logs and metrics
  • Cancel/restart runs when needed

Maestro dashboard


The simple picture (with the important missing parts)

Here’s the high-level flow, with the pieces that matter in practice: requirements, dispatcher, hotfix, and the documentation database (lightweight RAG).

┌──────────────────────────────────────────────────────────────────┐
│                               You                                │
│                    (requirements + feedback)                     │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│                           PM (Web UI)                            │
│  - Interview or upload requirements                              │
│  - Produces a requirements doc you can preview                   │
│  - Submits to Architect for approval                             │
└─────────────────────────────────┬────────────────────────────────┘
                                  │ Submit requirements
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│                      Architect (singleton)                       │
│  - Reviews requirements                                          │
│  - Generates stories + dependencies                              │
│  - Dispatches work, answers questions, reviews code, merges      │
│                                                                  │
│  Uses documentation database / knowledge graph:                  │
│   - .maestro/knowledge.dot (source-of-truth graph)               │
│   - SQLite FTS index + “knowledge packs”                         │
└──────────────┬────────────────────────────────┬──────────────────┘
               │                                │
               ▼                                ▼
     ┌────────────────────┐   ┌───────────────────────────────────┐
     │     Dispatcher     │   │  Hotfix lane (bypass dispatcher)  │
     │  (stories + deps)  │   │    dedicated hotfix-001 coder     │
     └─────────┬──────────┘   └───────────────────────────────────┘
               │
       Story ready? assign to coder
               │
               ▼
┌──────────────────────────────────────────────────────────────────┐
│                            Coders (N)                            │
│  - Plan → Coding → Testing → Request review → PR → Merge         │
│  - Each coder runs in its own Docker container and git clone     │
│  - Coders terminate after finishing (low idle overhead)          │
└──────────────────────────────────────────────────────────────────┘

A few details worth calling out:

  • The Architect is a singleton by design (coordination + review).
  • Stories are assigned via a dispatcher (not just a queue) as dependencies unlock and coders become available.
  • Hotfix requests can bypass the normal development flow and route to a dedicated hotfix-001 coder.
  • The documentation database isn’t a bolt-on. It’s an explicit part of how the system maintains consistency.

The cast: what each role does (in more detail)

Agent status

PM (Product Manager): requirements in, clarity out

The PM is the user-facing interface. It gathers requirements through an interview, generates a requirements/spec document, lets you preview it, then submits it to the Architect for review.

Requirements

A key detail: the PM has an explicit state machine, including a preview step so you can see what’s about to be submitted.

In practice, this is where Maestro starts to feel different than “chat with a model”:

  • The PM can ask clarification questions.
  • It can read your existing codebase when helpful.
  • It can stop and wait for you.
  • It can generate a structured requirements document and route it through a real approval workflow.

Architect: the coordination and quality spine

The Architect is the singleton that makes Maestro behave like an actual team instead of a pile of concurrent code generators.

Stories

Architect responsibilities include:

  • Reviewing PM submissions (spec review)
  • Generating stories + dependencies
  • Dispatching ready work to coders
  • Handling coder questions
  • Running code review cycles and merging approved work

The Architect explicitly processes multiple request types, including spec review, plan approval, iterative code review, completion approval, and budget review.

That “budget review” detail matters: it’s one of the knobs for managing Architect load. You can reduce Architect interrupts (and increase parallel coder throughput) by changing how often coders need budget increases and intermediate reviews—without abandoning the core discipline of separation of duties.

Coders: parallel implementation with guardrails

Coders implement stories. They run in Docker containers, with their own git clone, and work on one story at a time. Maestro supports multiple coders in parallel (default 3), and coders terminate completely after finishing.

That design has two benefits:

  1. Parallelism when you have independent stories.
  2. Low idle overhead because coders don’t sit around burning containers when there’s no work.

Hotfix: urgent work without breaking the team flow

Hotfix mode exists because real engineering teams have urgent work. Maestro models that.

Hotfix mode:

  • Routes urgent work to a dedicated hotfix-001 coder
  • Skips the planning phase for simple fixes
  • Lets the main development dispatcher keep moving

This is deliberately analogous to “live site” vs “feature team” rotation in real orgs.


The unit of work: stories, dependencies, and dispatch

Maestro doesn’t ask coders to “build the feature.” It asks them to implement stories.

A story is a discrete chunk of work with:

  • A clear task description
  • Acceptance criteria
  • (Often) dependencies on other stories
  • A lifecycle from assignment → plan → code → tests → review → merge

The Architect generates stories from requirements and loads them into the system, then transitions into dispatching so stories can be assigned as coders become available.
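As a sketch of the shape described above, a story might look like the following in Go. The field and state names here are hypothetical illustrations, not Maestro's actual schema:

```go
package main

import "fmt"

// StoryState models the lifecycle described above:
// assignment → plan → code → tests → review → merge.
type StoryState int

const (
	StateAssigned StoryState = iota
	StatePlanning
	StateCoding
	StateTesting
	StateReview
	StateMerged
)

// Story is a hypothetical record for one unit of work.
type Story struct {
	ID                 string
	Description        string
	AcceptanceCriteria []string
	DependsOn          []string // IDs of stories that must merge first
	State              StoryState
}

// Ready reports whether a story can be dispatched: all of its
// dependencies must already be merged.
func (s Story) Ready(merged map[string]bool) bool {
	for _, dep := range s.DependsOn {
		if !merged[dep] {
			return false
		}
	}
	return true
}

func main() {
	merged := map[string]bool{"story-001": true}
	s := Story{ID: "story-002", DependsOn: []string{"story-001"}}
	fmt.Println(s.Ready(merged)) // true: its only dependency has merged
}
```

The point of the shape is the `DependsOn` field: it is what lets the dispatcher reason about which work is safe to run in parallel.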

Why stories matter (beyond project management)

Stories solve a few AI-specific problems:

  • They bound context. A coder can focus on one coherent unit instead of an entire product vision.
  • They make reviews meaningful. The Architect can compare code against explicit acceptance criteria.
  • They enable parallelism. Dependencies make it safe to run multiple coders without stepping on each other.

The dispatcher: concurrency without chaos

The Architect includes a DISPATCHING phase that loads stories, checks dependencies, and assigns ready stories to coder agents.

This is where Maestro differs sharply from “run N agents and hope.” In Maestro:

  • Dependencies constrain what can run.
  • Story completion happens after successful merge, not just “LGTM.”
  • Merge conflicts route back to coders for resolution.

That last point sounds mundane, but it’s the difference between a demo and an engineering system.
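The dispatch rule above can be sketched in Go. This is an illustrative model under stated assumptions (the type and function names are invented, and Maestro's real dispatcher also manages coder lifecycle and merges):

```go
package main

import "fmt"

// story is a minimal stand-in: an ID plus the IDs it depends on.
type story struct {
	id        string
	dependsOn []string
}

// dispatch performs one pass: assign each story whose dependencies have
// all merged to an idle coder. Blocked stories wait for a later pass.
func dispatch(stories []story, merged map[string]bool, idleCoders []string) map[string]string {
	assigned := map[string]string{}
	for _, s := range stories {
		if len(idleCoders) == 0 {
			break // no capacity left this pass
		}
		ready := true
		for _, d := range s.dependsOn {
			if !merged[d] {
				ready = false
				break
			}
		}
		if ready {
			assigned[s.id] = idleCoders[0]
			idleCoders = idleCoders[1:]
		}
	}
	return assigned
}

func main() {
	stories := []story{
		{id: "s1"},                            // no deps: ready now
		{id: "s2", dependsOn: []string{"s1"}}, // blocked until s1 merges
	}
	got := dispatch(stories, map[string]bool{}, []string{"coder-001", "coder-002"})
	fmt.Println(got) // map[s1:coder-001] — s2 waits for s1 to merge
}
```

Note that completion is keyed off the `merged` set, not off a review verdict: that mirrors the point above that a story is done after a successful merge, not after "LGTM."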


Reviews are the feature: plans, code review, completion, merge

Maestro’s philosophy is that correctness and maintainability beat raw output volume. That philosophy shows up as explicit approval types and review flow.

The Architect’s request handling includes:

  • ApprovalTypePlan: coder plan approval
  • ApprovalTypeCode: iterative code review
  • ApprovalTypeCompletion: story completion approval
  • ApprovalTypeBudgetReview: budget increase request
  • ApprovalTypeSpec: PM requirements review + story generation
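The approval types above can be sketched as a Go enum. The `ApprovalType*` names come from the docs; the routing descriptions are paraphrases of the listed responsibilities, not Maestro's actual code:

```go
package main

import "fmt"

// ApprovalType enumerates the request kinds the Architect handles.
type ApprovalType int

const (
	ApprovalTypePlan ApprovalType = iota
	ApprovalTypeCode
	ApprovalTypeCompletion
	ApprovalTypeBudgetReview
	ApprovalTypeSpec
)

// route describes, in words, how a hypothetical Architect loop might
// branch on each request type.
func route(t ApprovalType) string {
	switch t {
	case ApprovalTypePlan:
		return "review coder plan before any code is written"
	case ApprovalTypeCode:
		return "iterative code review against acceptance criteria"
	case ApprovalTypeCompletion:
		return "final check, then merge and mark story complete"
	case ApprovalTypeBudgetReview:
		return "decide whether to grant a budget increase"
	case ApprovalTypeSpec:
		return "review PM requirements and generate stories"
	default:
		return "unknown request type"
	}
}

func main() {
	fmt.Println(route(ApprovalTypePlan))
}
```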

Why plan approval exists

Plan approval is a cheap way to catch expensive mistakes early:

  • Wrong approach
  • Misunderstood requirement
  • Missed dependency
  • “This will touch the wrong subsystem”

If you’ve ever watched an agent confidently implement the wrong thing for 20 minutes, you already know why this matters.

Why merge is owned by the Architect

Maestro explicitly keeps the Architect from writing code and makes it responsible for merges. The docs call this out plainly: the Architect exists to enforce discipline and prevent “coders reviewing their own work.”

That separation is one of the strongest reliability multipliers in the whole system.


Heterogeneous models: use the right brain for the job

One of Maestro’s most important (and most pragmatic) ideas is: don’t use the same model for everything.

Even a great model has consistent blind spots. If you use the same model to implement, reason, and review, you often get the same mistake three times.

Maestro is designed so you can:

  • Use a long-context, high-reasoning model (and higher temperature) for the Architect (planning, coordination, review).
  • Use a faster, coding-optimized model (and lower temperature) for Coders (implementation).
  • Use different providers entirely, if you want.

This can be both a quality improvement and a cost optimization, because you’re not paying “reasoning model prices” for the act of writing boilerplate or plumbing code.

Maestro explicitly supports configuration for different models in different modes (including airplane mode with Ollama), e.g. separate coder_model, architect_model, and pm_model.
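As a hypothetical sketch, such a configuration might look like the following. Only the three key names (`coder_model`, `architect_model`, `pm_model`) are documented; the file format and the placeholder values are assumptions:

```toml
# Hypothetical sketch — key names from the docs, values illustrative.
coder_model     = "some-fast-coding-model"        # lower temperature, implementation
architect_model = "some-long-context-reasoner"    # higher temperature, planning/review
pm_model        = "some-conversational-model"     # requirements interviews
```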

And Maestro supports “bring your own agent” as well: you can embed Claude Code as a coder implementation while Maestro handles orchestration and signaling.


Containers and isolation: trust comes from blast radius control

Coders always run in Docker containers; this is non-optional.

Isolation gives you:

  • Reproducibility
  • Controlled environments
  • A straightforward recovery path when something goes sideways (“rebuild container” is not a disaster)

Maestro also takes security posture seriously for a local single-user tool: agents run as a non-privileged user with hardening like read-only root filesystem and no-new-privileges.
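In Docker Compose terms, that hardening posture looks roughly like this. This is an illustrative fragment, not Maestro's actual container configuration; the service name, image name, and UID are invented:

```yaml
# Illustrative hardening sketch — not Maestro's real config.
services:
  coder:
    image: maestro-coder        # hypothetical image name
    user: "1000:1000"           # run as a non-privileged user
    read_only: true             # read-only root filesystem
    security_opt:
      - no-new-privileges:true  # block privilege escalation
    tmpfs:
      - /tmp                    # writable scratch despite read-only root
```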


Documentation as institutional memory: the knowledge graph + lightweight RAG

Maestro includes a knowledge graph system that captures architectural patterns, design decisions, and conventions. It lives in .maestro/knowledge.dot and is used to build “knowledge packs” for coders during planning.

Concretely:

  • The graph is stored in your repo as DOT.
  • Maestro indexes it (SQLite FTS).
  • When a coder starts a story, Maestro extracts key terms and retrieves a focused pack of related patterns.
  • The Architect reviews whether the implementation follows the patterns and validates updates.

This is one of the main ways Maestro avoids the “every story is day one” problem.

And it’s not fragile: if knowledge retrieval fails, Maestro logs and continues (knowledge is helpful, not required).
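A minimal sketch of that retrieval step, assuming a simple tokenize-and-filter approach (the function names and stopword list are invented; Maestro's actual term extraction may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// A tiny illustrative stopword list; a real one would be larger.
var stopwords = map[string]bool{
	"the": true, "a": true, "an": true, "to": true, "for": true,
	"and": true, "with": true, "in": true, "of": true, "add": true,
}

// keyTerms lowercases, tokenizes, and drops stopwords and short words.
func keyTerms(description string) []string {
	var terms []string
	for _, w := range strings.Fields(strings.ToLower(description)) {
		w = strings.Trim(w, ".,:;()\"'")
		if len(w) > 2 && !stopwords[w] {
			terms = append(terms, w)
		}
	}
	return terms
}

// ftsQuery ORs the terms together for a broad SQLite FTS5 MATCH query.
func ftsQuery(terms []string) string {
	return strings.Join(terms, " OR ")
}

func main() {
	terms := keyTerms("Add retry logic to the webhook dispatcher")
	fmt.Println(ftsQuery(terms)) // retry OR logic OR webhook OR dispatcher
}
```

The resulting string would feed a `MATCH` clause against the FTS index; the rows that come back form the focused "knowledge pack" the coder sees.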


Modes: development, demo, hotfix, maintenance, airplane

Maestro isn’t just “code generation.” Real software work includes validation, maintenance, urgent fixes, and sometimes offline constraints. Maestro models those explicitly.

Development mode

The canonical loop is:

  1. PM gathers requirements and generates a detailed spec
  2. Architect breaks it into stories
  3. Stories are dispatched to coders
  4. Coders plan, code, test
  5. Architect reviews PRs, requests changes if needed
  6. Approved PRs merge

Demo mode (UAT without leaving the dashboard)

Demo mode

Demo mode runs your application so you can interact with it. It builds inside the dev container, starts the app, provides a URL, shows logs, and detects when the demo is outdated. (It also includes Docker Compose support for required services like databases.)

There are explicit controls (start/stop/restart/rebuild), and the docs even call out common pitfalls, like needing to bind to 0.0.0.0 instead of 127.0.0.1 so the app is reachable from outside the container.

Hotfix mode

Hotfix runs on a dedicated coder and does not block the main queue.

Maintenance mode

Maintenance runs between specs to manage technical debt: deleting merged branches, cleaning stale artifacts, syncing knowledge, scanning TODOs, checking doc links, etc.

Airplane mode (offline)

Airplane mode replaces GitHub and external LLM APIs with local equivalents: Gitea + Ollama, and provides a --sync path back to GitHub.

This is also a nice proof that Maestro is built around workflow infrastructure, not around one specific model API.


What happens when things go wrong? (Because they will.)

Maestro assumes real-world failure: API timeouts, flaky networks, tool errors, restarts.

Two examples of “software doing software things”:

1) Suspend and resume on service outages

Agents can enter a SUSPEND state when retries are exhausted due to external service unavailability (LLM APIs, network failures). In suspend:

  • State is preserved (no data loss)
  • The orchestrator polls APIs and restores agents when everything is healthy
  • There’s a timeout path to ERROR for a full recycle

2) Persistence and crash recovery

If Maestro crashes, stories, states, tool use, messages, and progress are persisted in SQLite and resume on restart.

That’s the difference between “agent session” and “engineering system.”


Language agnostic by design (and ready for “packs” later)

Maestro’s core loop is language-agnostic because it primarily relies on:

  • Dockerized execution environments
  • A predictable set of build/test/lint/run targets (commonly driven via a Makefile)
  • Repository-level workflows (branches, PRs, merges)

Even in airplane mode, the key requirements are Docker and Ollama; the workflow spine stays the same.

Over time, Maestro can add language/platform “packs” that include optimizations (templates, conventions, toolchains, best practices). But the base system doesn’t need to be rewritten for every stack.


Getting started

The quickstart is intentionally boring:

  1. Install Maestro (Homebrew or release)
  2. Export your model keys (and GitHub token)
  3. Run maestro
  4. Open the web UI and start an interview or upload requirements

From there, Maestro guides you through the workflow.


Why it’s worth trying

Maestro isn’t trying to be the flashiest AI demo. It’s trying to be the tool you can come back to tomorrow, next week, and next month—on the same repo—without everything falling apart.

If you’re the kind of builder who wants:

  • speed and engineering discipline
  • parallel work and coordination
  • autonomy and safety
  • AI productivity without turning you into a full-time reviewer

…then Maestro is designed for you.


Appendix A: a few concrete examples of “team structure beats lone wolf”

These aren’t hypotheticals; they’re the kinds of situations Maestro’s workflow is built to handle.

  1. Two stories touch the same area of code. Without coordination: merge conflicts, duplicated work, drift. With Maestro: dependency graph + dispatch, plus Architect-controlled merges.

  2. A coder is unsure how a pattern works. Without workflow: it guesses, and you discover it later. With Maestro: coder asks Architect; Architect answers with repo/knowledge context.

  3. An urgent production issue appears mid-flight. Without structure: you interrupt everything. With Maestro: hotfix lane + dedicated coder + Architect validation.

  4. A project has conventions that must stay consistent. Without memory: each story re-learns them. With Maestro: knowledge consistency via .maestro/knowledge.dot + packs.


Appendix B: a few best practices

Some choices that will make Maestro work better (and other AI agents, too):

  1. Use a compiled language when possible. Whether it's Go, C++, Rust, or Java, compilers take a lot of grunt work off the model. This makes development, debugging, and refactors cheaper and easier. Who wants to deal with syntax errors at runtime?

  2. Use heterogeneous models from multiple providers. Different models have different training and optimization. Empirically, having a truly distinct model do code review catches 90% more errors than just using the same one with a different prompt.
