YourAiWorkforce — Local 8B LLM Multi-Agent Orchestration

In one line: a multi-agent planning pipeline that tames a stochastic 8B local model into behaving deterministically enough to trust, running entirely on a 16GB MacBook at zero API cost. It turns a founder's rough idea into structured artifacts (a PRD and an architecture doc).

One orchestrator routes work to specialist subagents. A human approves each step. Every model call runs on a local Ollama model (qwen3:8b, deepseek-r1:8b) — no API, no cost, nothing leaves the machine.

It's built on LangGraph StateGraph, and the whole design follows one rule I kept relearning the hard way: prompt is suggestion, graph is law.

Why this project

With a commercial API (GPT-4, Claude), multi-agent systems "just work" — but that's the model doing the heavy lifting, and there's no engineering story in it. This project starts from the opposite constraint:

An 8B local model breaks instructions probabilistically. Tell the persona "don't call yourself the PM" and it does; tell it "don't call the save tool on turn 1" and it does; it leaks thinking tokens into its replies.
So instead of trusting the model, I clamp it. A separate critic model, a save-validation gate, response post-processing, state isolation — wrapping each probabilistic component in a deterministic shell is the core of this repo.

📌 Current scope: Phase 0 (idea → PRD → architecture) works end-to-end. agents/ contains persona designs for Phases 1–6 (build / QA / deploy), but only Phase 0 is wired in code; the rest is roadmap (see "Implementation status vs roadmap" below).

Architecture

flowchart TD
    START([START]) --> ORC[orchestrator<br/>qwen3:8b]
    ORC -->|tool_call| ROUTE{route}
    ROUTE -->|call subagent| BRIDGE[bridge<br/>tool_call → brief]
    ROUTE -->|reset_project| GATE[approval_gate<br/>HITL approval]
    ROUTE -->|none| END([END])

    BRIDGE -->|goto| PD[product_discovery<br/>subgraph]
    BRIDGE -->|goto| SA[system_architect<br/>subgraph]

    subgraph phase0 [Phase 0 subgraphs · internal conversation loop]
        PD
        SA
    end

    PD --> REVIEW[review]
    SA --> REVIEW
    GATE --> BRIDGE
    REVIEW --> ORC

    PD -.->|save| PRD[(prd.md)]
    SA -.->|save| ARCH[(architecture.md)]

orchestrator: takes the conversation and decides which subagent to delegate to, via a tool-call. Runs qwen3:8b.
bridge: converts the orchestrator's tool-call into a brief (HumanMessage) for the subagent and routes with Command(goto=...). It uses LangGraph's subgraph-as-node mechanism directly, so a subagent's interrupt propagates to the parent automatically and the resume value flows back in automatically.
phase-0 subgraphs: product_discovery (→ PRD) and system_architect (→ architecture doc). Each is a conversational subgraph cycling through an internal model → save → check_done → wait_for_user loop.
review / approval_gate: review hands control back to the orchestrator once a subagent has produced its artifact; approval_gate pauses risky actions (reset_project in the diagram above) until a human approves them.
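
The routing and bridging steps described above can be sketched in plain Python. This is a stdlib-only illustration under stated assumptions, not the code in src/agent.py: the dict shape of the tool call, the `query` argument, and the `Brief` type are hypothetical, and the real graph uses a HumanMessage and Command(goto=...) instead. Only the node names and reset_project come from the diagram.

```python
from dataclasses import dataclass
from typing import Optional

SUBAGENTS = {"product_discovery", "system_architect"}  # node names from the diagram
END = "__end__"  # stand-in for the graph's END sentinel


@dataclass
class Brief:
    """Hypothetical stand-in for the HumanMessage handed to a subagent."""
    target: str
    content: str


def route(tool_call: Optional[dict]) -> str:
    """Pick the next node from the orchestrator's (optional) tool call."""
    if tool_call is None:
        return END                 # no tool call: the turn is over
    if tool_call["name"] == "reset_project":
        return "approval_gate"     # risky action: a human approves first
    if tool_call["name"] in SUBAGENTS:
        return "bridge"            # delegation: build a brief for the subagent
    return END                     # unknown tool name: fail closed


def bridge(tool_call: dict) -> Brief:
    """Turn a delegation tool call into a brief addressed to one subagent."""
    query = tool_call.get("args", {}).get("query", "")
    return Brief(target=tool_call["name"], content=f"Brief: {query}")
```

The point of the sketch is that the branch taken depends only on the tool call's name, so the model can phrase its request however it likes without changing which node runs next.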

Key source: src/agent.py (graph assembly), src/libs/subgraph.py (conversational subgraph builder), src/subagents/planners/ (phase-0 agents).

Hard problems solved

Each item is backed by code and traces; the design write-ups live in a separate blog series, the LangGraph Multi-Agent series.

1. State isolation — two directions, one solved

A subagent's state can leak two ways, and only one is worth fully closing. Outbound — the subagent's internal turns piling up in the parent thread — is solved: subagents run on a separate SubagentState, and a finalize step uses RemoveMessage to strip those internal turns, leaving the parent only a short summary. Inbound — the subagent's LLM still receiving the parent's messages — is left in on purpose, compensated by a structured briefing packet; closing it fully would have meant giving up LangGraph's native interrupt propagation (I tried, and resume broke). The original "planner introduces itself as the PM" symptom was fixed separately — model swap + persona hardening — not by isolation. → src/subagents/state.py · libs/subgraph.py:201
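
A minimal sketch of the outbound fix, assuming a toy message model rather than the project's SubagentState: `Msg`, `Remove`, `finalize`, and `apply_updates` are hypothetical names, with `Remove` standing in for the RemoveMessage marker and `apply_updates` for the message reducer that honors it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """Minimal stand-in for a chat message with a stable id."""
    id: str
    role: str
    content: str


@dataclass(frozen=True)
class Remove:
    """Stand-in for a removal marker such as RemoveMessage."""
    id: str


def finalize(parent_ids: set[str], messages: list[Msg], summary: str) -> list:
    """Emit a removal for every subagent-internal turn, then one summary."""
    removals = [Remove(m.id) for m in messages if m.id not in parent_ids]
    return removals + [Msg(id="summary", role="ai", content=summary)]


def apply_updates(messages: list[Msg], updates: list) -> list[Msg]:
    """Toy reducer: drop removed ids, then append the new messages."""
    removed = {u.id for u in updates if isinstance(u, Remove)}
    kept = [m for m in messages if m.id not in removed]
    return kept + [u for u in updates if isinstance(u, Msg)]
```

After `apply_updates`, the parent thread holds its own earlier messages plus a single summary, which is the outbound guarantee described above. The inbound direction is untouched by this sketch.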

2. Subgraph resume restarted from scratch every time

The checkpointer wasn't passed down to the subgraph, so the user's reply vanished from messages and the conversation reset to turn 1. I injected the checkpointer consistently down to the subgraph and made FastAPI (async) and langgraph dev share the same sqlite file. → src/agent.py:170
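
The failure mode can be reproduced with a toy model. The sketch below is not LangGraph's checkpointer API: `TinyCheckpointer` and `run_turn` are hypothetical, and the only thing they share with the real setup is that history lives in sqlite keyed by a thread id.

```python
import json
import sqlite3


class TinyCheckpointer:
    """Hypothetical checkpoint store: one JSON message list per thread."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ckpt "
            "(thread_id TEXT PRIMARY KEY, messages TEXT)"
        )

    def load(self, thread_id: str) -> list:
        row = self.conn.execute(
            "SELECT messages FROM ckpt WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else []

    def save(self, thread_id: str, messages: list) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO ckpt VALUES (?, ?)",
            (thread_id, json.dumps(messages)),
        )
        self.conn.commit()


def run_turn(thread_id: str, user_reply: str, checkpointer=None) -> int:
    """Run one subgraph turn and return the turn number it believes it is on."""
    # Without a checkpointer the history is always empty: the bug described above.
    history = checkpointer.load(thread_id) if checkpointer else []
    history.append(user_reply)
    if checkpointer:
        checkpointer.save(thread_id, history)
    return len(history)
```

Calling `run_turn` twice with the same checkpointer advances to turn 2; calling it twice without one stays on turn 1, which matches the "reset to turn 1" symptom.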

3. `langgraph dev`'s sync-I/O block (blockbuster)

The dev middleware blocks synchronous I/O inside handlers, so the SqliteSaver connection failed. I worked around it by opening the sqlite connection at module load time rather than inside graph(), keeping it off the event loop. → src/agent.py:150

4. Making a stochastic 8B deterministic — isolating the done-check

When the planner (temp=0.5, divergent) made the check_done YES/NO call itself, it misjudged. I injected a separate temp=0 critic instance (same model file) and stripped its thinking tokens, making the completion check deterministic. → product_discovery/__init__.py
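
The parsing half of that check might look like the sketch below. It covers only the reply handling, not the temp=0 model call, and `parse_done_verdict` is a hypothetical name. Treating anything other than a leading YES as "not done" is an assumption chosen so that an ambiguous reply never ends the conversation early.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def parse_done_verdict(raw: str) -> bool:
    """Map a critic reply to True only for an unambiguous leading YES."""
    # Drop reasoning first, so a YES or NO inside <think> cannot leak in.
    visible = THINK_BLOCK.sub("", raw).strip().upper()
    first_word = re.match(r"[A-Z]+", visible)
    return first_word is not None and first_word.group(0) == "YES"
```

For example, a reply of `<think>maybe NO?</think>` followed by `YES` is read as done, while a reply whose only YES sits inside the think block is not.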

5. Save-validation gate & response post-processing

_validate_prd: checks required sections exist and no placeholders remain, blocking incomplete artifacts from being saved. → product_discovery/tools.py:34
Response post-processing: strips <think> blocks, 🛑 [턴 종료] ("turn end") markers, empty code fences, and greeting prefixes after turn 2, all via regex. → src/libs/subgraph.py
_sanitize_query: normalizes the orchestrator's hallucinated honorifics (e.g. "대표님!", a Korean honorific address for a CEO) into a noun phrase. → src/agent.py:17
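
The gate and the post-processing above could be sketched as follows. This is not the code in product_discovery/tools.py or src/libs/subgraph.py: the required section names, the placeholder pattern, the English greeting pattern, and the `greeting_from_turn` cutoff are all assumptions made for illustration.

```python
import re

REQUIRED_SECTIONS = ("Problem", "Users", "Requirements")  # hypothetical names
PLACEHOLDER = re.compile(r"\b(?:TBD|TODO)\b|\{\{.*?\}\}")  # hypothetical pattern
HEADING = re.compile(r"^#+[ \t]+(.+)$", re.MULTILINE)

FENCE = "`" * 3
THINK = re.compile(r"<think>.*?</think>", re.DOTALL)
TURN_END = re.compile(r"🛑\s*\[턴 종료\]")
EMPTY_FENCE = re.compile(FENCE + r"[a-zA-Z]*\s*" + FENCE)
GREETING = re.compile(r"^(?:hello|hi)\b[!,.]?\s*", re.IGNORECASE)  # assumption


def validate_prd(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the PRD may be saved."""
    problems = []
    headings = {m.group(1).strip().lower() for m in HEADING.finditer(markdown)}
    for section in REQUIRED_SECTIONS:
        if section.lower() not in headings:
            problems.append(f"missing section: {section}")
    if PLACEHOLDER.search(markdown):
        problems.append("placeholder text remains")
    return problems


def clean_response(text: str, turn: int, greeting_from_turn: int = 2) -> str:
    """Strip think blocks, turn-end markers, empty fences, and late greetings."""
    text = THINK.sub("", text)
    text = TURN_END.sub("", text)
    text = EMPTY_FENCE.sub("", text).strip()
    if turn >= greeting_from_turn:  # the cutoff is an assumption, see lead-in
        text = GREETING.sub("", text)
    return text.strip()
```

Returning a list of problems instead of a boolean lets the caller feed the exact failure back to the model as the tool result, so the next attempt knows what to fix.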

6. Hiding the save tool (dynamic tool binding)

Since the model ignored "don't save on turn 1", I made the save tool conditionally bound so it's physically impossible to call. → libs/subgraph.py (model_with_save)
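
A minimal sketch of conditional binding, assuming the turn number is available when the model node runs. `ask_question`, `save_prd`, `tools_for_turn`, and the `first_save_turn` cutoff are hypothetical; the project's own version is the model_with_save logic referenced above.

```python
from typing import Callable


def ask_question(text: str) -> str:
    """Hypothetical conversational tool that is always available."""
    return f"asked: {text}"


def save_prd(markdown: str) -> str:
    """Hypothetical save tool, hidden until the conversation has progressed."""
    return "saved"


def tools_for_turn(turn: int, first_save_turn: int = 2) -> list[Callable]:
    """Return the tools to bind for this turn; save is absent before the cutoff."""
    tools: list[Callable] = [ask_question]
    if turn >= first_save_turn:
        tools.append(save_prd)
    return tools
```

With a LangChain chat model, the returned list would typically be passed to `bind_tools` on each turn. Because the save tool is missing from the schema on turn 1, the model has nothing to call, which is a stronger guarantee than any prompt instruction.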

7. Model-selection log

A decision record of how gemma4:e4b (4B) failed at following Korean negative-instruction lists — with LangSmith trace evidence — and the move to qwen3:8b. → docs/plan/model-use.md

Tech stack

| Layer | Tech |
| --- | --- |
| Orchestration | LangGraph (`StateGraph`, subgraph-as-node, `interrupt`/`Command`) |
| LLM runtime | Ollama (local): `qwen3:8b` (orchestrator/planner), `deepseek-r1:8b` (critic candidate) |
| Serving | FastAPI (ASGI) + `langgraph dev` |
| State | SqliteSaver checkpointer (async/sync file sharing) |
| Observability | LangSmith tracing |
| Packaging | uv, Docker / docker-compose |

Running it

# 1. Pull local models (Ollama required)
ollama pull qwen3:8b
ollama pull deepseek-r1:8b

# 2. Environment variables
cp .env.example .env    # fill in LANGSMITH_API_KEY, MODEL_BASE_URL, etc.

# 3. Dev server
uv sync
uv run uvicorn src.main:app --reload
# or LangGraph Studio: uv run langgraph dev

# 4. (optional) containers
docker-compose up --build

Implementation status vs roadmap

This repo deliberately narrows scope to Phase 0 as a "finished product" to gain depth. Given the code-generation ceiling of an 8B local model, stretching all the way to Phase 1–6 (actual code generation) would produce an "ambitious but non-working demo".

| Scope | Status |
| --- | --- |
| Phase 0: Product Discovery (idea → PRD) | ✅ wired in code |
| Phase 0: System Architect (PRD → architecture) | ✅ wired in code |
| Orchestrator routing · HITL approval gate · state isolation | ✅ wired in code |
| Phase 1–6 (build/QA/security/deploy agents) | 📐 persona designs only (agents/) · roadmap |
| Go gateway (SSE streaming BFF) · streaming web UI | 🚧 planned |

See it actually run → docs/samples/ holds a real, unedited PRD generated end-to-end by the product_discovery agent (with the model's rough edges left in, documented honestly).

How this was built

Pair-programmed with Claude Code. The architecture, the decisions, and the trade-offs documented here are mine; much of the implementation was AI-assisted.

License

All rights reserved. This repository is public for portfolio/demonstration purposes only — you may read the source to evaluate the work, but no license is granted to reuse, copy, modify, or redistribute it. See LICENSE.

