In one line: a multi-agent planning pipeline that tames a stochastic 8B local model into behaving deterministically enough to trust — running entirely on a 16GB MacBook at zero API cost. A founder's rough idea → structured artifacts (PRD + architecture doc).
One orchestrator routes work to specialist subagents. A human approves each step. Every model call
runs on a local Ollama model (qwen3:8b, deepseek-r1:8b) — no API, no cost,
nothing leaves the machine.
It's built on LangGraph StateGraph, and the whole
design follows one rule I kept relearning the hard way: prompt is suggestion, graph is law.
With a commercial API (GPT-4, Claude), multi-agent systems "just work" — but that's the model doing the heavy lifting, and there's no engineering story in it. This project starts from the opposite constraint:
- An 8B local model breaks instructions probabilistically. Tell the persona "don't call yourself the PM" and it does anyway; tell it "don't call the save tool on turn 1" and it calls it anyway; it also leaks thinking tokens into its replies.
- So instead of trusting the model, I clamp it. A separate critic model, a save-validation gate, response post-processing, state isolation — wrapping each probabilistic component in a deterministic shell is the core of this repo.
📌 Current scope: Phase 0 (idea → PRD → architecture) works end-to-end.
agents/ contains persona designs for Phases 1–6 (build / QA / deploy), but only Phase 0 is wired in code; the rest is roadmap (see below).
flowchart TD
START([START]) --> ORC[orchestrator<br/>qwen3:8b]
ORC -->|tool_call| ROUTE{route}
ROUTE -->|call subagent| BRIDGE[bridge<br/>tool_call → brief]
ROUTE -->|reset_project| GATE[approval_gate<br/>HITL approval]
ROUTE -->|none| END([END])
BRIDGE -->|goto| PD[product_discovery<br/>subgraph]
BRIDGE -->|goto| SA[system_architect<br/>subgraph]
subgraph phase0 [Phase 0 subgraphs · internal conversation loop]
PD
SA
end
PD --> REVIEW[review]
SA --> REVIEW
GATE --> BRIDGE
REVIEW --> ORC
PD -.->|save| PRD[(prd.md)]
SA -.->|save| ARCH[(architecture.md)]
- orchestrator: takes the conversation and decides which subagent to delegate to, via a tool-call. Runs qwen3:8b.
- bridge: converts the orchestrator's tool-call into a brief (HumanMessage) for the subagent and routes with Command(goto=...). It uses LangGraph's subgraph-as-node mechanism directly, so a subagent's interrupt propagates to the parent automatically and the resume value flows back in automatically.
- phase-0 subgraphs: product_discovery (→ PRD) and system_architect (→ architecture doc). Each is a conversational subgraph cycling through an internal model → save → check_done → wait_for_user loop.
- review / approval_gate: returns to the orchestrator after reviewing an artifact; risky actions require human approval.
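
The bridge step can be pictured with a small, library-free sketch. The tool names, the `Route` dataclass, and the brief format below are assumptions made for illustration; in the repo the routing is done with LangGraph's Command(goto=...) and the brief is a HumanMessage.

```python
from dataclasses import dataclass

# Hypothetical mapping from orchestrator tool names to subgraph node names.
TOOL_TO_NODE = {
    "call_product_discovery": "product_discovery",
    "call_system_architect": "system_architect",
}


@dataclass(frozen=True)
class Route:
    """Stand-in for LangGraph's Command(goto=..., update=...)."""
    goto: str
    brief: str


def bridge(tool_call: dict) -> Route:
    """Turn an orchestrator tool call into a brief plus a routing target."""
    name = tool_call["name"]
    if name not in TOOL_TO_NODE:
        # Unknown tools are rejected in code rather than left to the model.
        raise ValueError(f"unknown tool call: {name!r}")
    query = tool_call.get("args", {}).get("query", "").strip()
    return Route(goto=TOOL_TO_NODE[name], brief=f"Task: {query}")


route = bridge({
    "name": "call_product_discovery",
    "args": {"query": " A habit tracker for runners "},
})
print(route.goto)   # product_discovery
print(route.brief)  # Task: A habit tracker for runners
```

The point of the sketch is that the routing target comes from a fixed table, not from free-form model output, so a malformed tool call fails loudly instead of silently reaching the wrong subagent.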
Key source: src/agent.py (graph assembly), src/libs/subgraph.py (conversational subgraph builder), src/subagents/planners/ (phase-0 agents).
Each item is backed by code/traces; the design write-ups live in a separate blog series, LangGraph Multi-Agent series.
A subagent's state can leak two ways, and only one is worth fully closing. Outbound — the
subagent's internal turns piling up in the parent thread — is solved: subagents run on a separate
SubagentState, and a finalize step uses RemoveMessage to strip those internal turns, leaving
the parent only a short summary. Inbound — the subagent's LLM still receiving the parent's
messages — is left in on purpose, compensated by a structured briefing packet; closing it fully
would have meant giving up LangGraph's native interrupt propagation (I tried, and resume broke).
The original "planner introduces itself as the PM" symptom was fixed separately — model swap +
persona hardening — not by isolation.
→ src/subagents/state.py · libs/subgraph.py:201
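
The outbound half of the isolation can be modeled without LangGraph. In this minimal sketch, `Remove` stands in for RemoveMessage and `apply_updates` is a simplified stand-in for a message reducer; the message ids and the summary text are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: str
    role: str
    content: str


@dataclass(frozen=True)
class Remove:
    """Stand-in for RemoveMessage: deletes the message with this id."""
    id: str


def apply_updates(thread, updates):
    """Simplified reducer: Remove deletes by id, Message is appended."""
    removed = {u.id for u in updates if isinstance(u, Remove)}
    kept = [m for m in thread if m.id not in removed]
    return kept + [u for u in updates if isinstance(u, Message)]


def finalize(internal_ids, summary):
    """Emit removals for the subagent's internal turns, then one short summary."""
    return [Remove(i) for i in internal_ids] + [Message("summary-1", "ai", summary)]


thread = [
    Message("u1", "human", "I want an app for runners"),
    Message("s1", "ai", "Who is the target user?"),
    Message("s2", "human", "Amateur marathoners"),
]
thread = apply_updates(thread, finalize(["s1", "s2"], "PRD saved to prd.md"))
print([m.id for m in thread])  # ['u1', 'summary-1']
```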
The checkpointer wasn't passed down to the subgraph, so the user's reply vanished from messages
and the conversation reset to turn 1. I injected the checkpointer consistently down to the subgraph
and made FastAPI (async) and langgraph dev share the same sqlite file.
→ src/agent.py:170
The dev middleware blocks synchronous I/O inside handlers, so the SqliteSaver connection failed.
I worked around it by opening the sqlite connection at module load time rather than inside
graph(), keeping it off the event loop. → src/agent.py:150
When the planner (temp=0.5, divergent) made the check_done YES/NO call, it misjudged. I injected
a separate temp=0 critic instance (same model file) and stripped thinking tokens, making the
completion check deterministic.
→ product_discovery/__init__.py
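
The completion check described above can be sketched as follows. The prompt wording, the `fake_critic` stub, and the rule that anything other than a clear YES counts as NO are assumptions for illustration; in the repo the critic is a temp=0 model instance.

```python
import re
from typing import Callable

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def parse_verdict(raw: str) -> bool:
    """Map a critic reply to a bool; anything that is not a clear YES is NO."""
    visible = THINK_RE.sub("", raw).strip().upper()
    return visible.startswith("YES")


def check_done(critic: Callable[[str], str], transcript: str) -> bool:
    """Ask the critic a closed question and parse the answer in code."""
    prompt = f"Is the artifact complete? Answer YES or NO.\n\n{transcript}"
    return parse_verdict(critic(prompt))


def fake_critic(prompt: str) -> str:
    # Stub standing in for the temp=0 critic; the reasoning block mentions YES
    # but the actual verdict is NO.
    return "<think>The user said YES earlier, but sections are missing.</think>\nNO"


print(check_done(fake_critic, "..."))  # False
```

Defaulting to NO on ambiguous output is a conservative choice: a false NO costs one more conversation turn, while a false YES ends the loop with an incomplete artifact.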
- _validate_prd: checks required sections exist and no placeholders remain, blocking incomplete artifacts from being saved. → product_discovery/tools.py:34
- Response post-processing: strips <think> blocks, 🛑 [턴 종료] ("turn end") markers, empty code fences, and greeting prefixes after turn 2, all via regex. → src/libs/subgraph.py
- _sanitize_query: normalizes the orchestrator's hallucinated honorifics (e.g. "대표님!", roughly "Mr./Ms. CEO!") into a noun phrase. → src/agent.py:17
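
A minimal sketch of the save-validation gate and the response post-processing, assuming hypothetical section names and placeholder patterns (the repo's actual lists are not shown here). The post-processing covers three of the four cleanups; greeting-prefix removal is left out.

```python
import re

# Hypothetical required sections and placeholder patterns.
REQUIRED_SECTIONS = ("## Problem", "## Target Users", "## Core Features")
PLACEHOLDER_RE = re.compile(r"\bTBD\b|\bTODO\b", re.IGNORECASE)


def validate_prd(text: str) -> list[str]:
    """Return a list of problems; an empty list means the PRD may be saved."""
    errors = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in text]
    if PLACEHOLDER_RE.search(text):
        errors.append("placeholder text remains")
    return errors


FENCE = "`" * 3  # built at runtime to keep literal fences out of this example
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
EMPTY_FENCE_RE = re.compile(FENCE + r"[a-zA-Z]*\s*" + FENCE)
TURN_END = "🛑 [턴 종료]"


def postprocess(reply: str) -> str:
    """Strip thinking blocks, turn-end markers, and empty code fences."""
    reply = THINK_RE.sub("", reply)
    reply = reply.replace(TURN_END, "")
    reply = EMPTY_FENCE_RE.sub("", reply)
    return reply.strip()


draft = "## Problem\nRunners lose motivation.\n## Target Users\nTBD\n"
print(validate_prd(draft))
# ['missing section: ## Core Features', 'placeholder text remains']

raw = "<think>plan</think>Here is the draft.\n" + FENCE + "\n" + FENCE + "\n" + TURN_END
print(postprocess(raw))  # Here is the draft.
```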
Since the model ignored "don't save on turn 1", I made the save tool conditionally bound so it's
physically impossible to call. → libs/subgraph.py (model_with_save)
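
Conditional binding can be illustrated without any framework. The tool names and the `dispatch` helper are hypothetical; the repo presumably achieves the same effect by choosing between a model with the save tool bound (model_with_save) and one without it.

```python
from typing import Callable


def ask_question(text: str) -> str:
    return f"asked: {text}"


def save_prd(text: str) -> str:
    return "saved"


def tools_for_turn(turn: int) -> dict[str, Callable[[str], str]]:
    """Expose the save tool only from turn 2 onward."""
    tools: dict[str, Callable[[str], str]] = {"ask_question": ask_question}
    if turn >= 2:
        tools["save_prd"] = save_prd
    return tools


def dispatch(turn: int, tool_name: str, arg: str) -> str:
    """Run a tool only if it is bound for this turn."""
    tools = tools_for_turn(turn)
    if tool_name not in tools:
        return f"rejected: {tool_name} is not bound on turn {turn}"
    return tools[tool_name](arg)


print(dispatch(1, "save_prd", "draft"))  # rejected: save_prd is not bound on turn 1
print(dispatch(2, "save_prd", "draft"))  # saved
```

Because the tool is absent from the turn-1 tool set, no prompt wording is involved in the guarantee: the model cannot call what it was never offered.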
A decision record of how gemma4:e4b (4B) failed at following Korean negative-instruction lists —
with LangSmith trace evidence — and the move to qwen3:8b. → docs/plan/model-use.md
| Layer | Tech |
|---|---|
| Orchestration | LangGraph (StateGraph, subgraph-as-node, interrupt/Command) |
| LLM runtime | Ollama (local) — qwen3:8b (orchestrator/planner), deepseek-r1:8b (critic candidate) |
| Serving | FastAPI (ASGI) + langgraph dev |
| State | SqliteSaver checkpointer (async/sync file sharing) |
| Observability | LangSmith tracing |
| Packaging | uv, Docker / docker-compose |
# 1. Pull local models (Ollama required)
ollama pull qwen3:8b
ollama pull deepseek-r1:8b
# 2. Environment variables
cp .env.example .env # fill in LANGSMITH_API_KEY, MODEL_BASE_URL, etc.
# 3. Dev server
uv sync
uv run uvicorn src.main:app --reload
# or LangGraph Studio: uv run langgraph dev
# 4. (optional) containers
docker-compose up --build

This repo deliberately narrows scope to Phase 0 as a "finished product" to gain depth. Given the code-generation ceiling of an 8B local model, stretching all the way to Phase 1–6 (actual code generation) would produce an "ambitious but non-working demo".
| Scope | Status |
|---|---|
| Phase 0 — Product Discovery (idea → PRD) | ✅ wired in code |
| Phase 0 — System Architect (PRD → architecture) | ✅ wired in code |
| Orchestrator routing · HITL approval gate · state isolation | ✅ wired in code |
| Phase 1–6 (build/QA/security/deploy agents) | 📐 persona designs only (agents/) · roadmap |
| Go gateway (SSE streaming BFF) · streaming web UI | 🚧 planned |
See it actually run → docs/samples/ holds a real, unedited PRD generated
end-to-end by the product_discovery agent (with the model's rough edges left in, documented honestly).
Pair-programmed with Claude Code. The architecture, the decisions, and the trade-offs documented here are mine; much of the implementation was AI-assisted.
All rights reserved. This repository is public for portfolio/demonstration purposes only — you may read the source to evaluate the work, but no license is granted to reuse, copy, modify, or redistribute it. See LICENSE.