One prompt. One app.
Describe what you want to build. A swarm of AI agents plans, codes, tests, and ships it.
A few examples of experimental projects built:
- Spec tree — structured requirements with milestone progress tracking
- Multi-agent orchestration — 14 specialized roles with isolated workspaces and automated merge resolution
- Spec-driven pipeline — structured workflow from specification through bootstrap to development, with checkpoints gating each transition
- Workspace isolation — each agent works on an isolated copy; a tiered merge pipeline (diff3, tree-sitter validation, LLM-assisted resolution) handles conflicts
- Continuous auditing — spec-vs-code conformance checks, code quality reviews, infrastructure issue detection, and spec gap analysis run alongside development
- Testing pipeline — test scripts, test hooks for browser-based visual testing, environment management, and structured test reports
- Visual testing — agents launch, screenshot, and analyze the applications they build via Gemini vision
- Innovation system — agents can propose ideas and improvements that feed back into the spec tree
- Real-time dashboard — monitor agent activity and file writes, chat with the team leader, track merges, review specs/plans/tasks, and inspect agent communications
- Desktop app — native macOS, Linux, and Windows builds via Tauri
- This project is a token burner! It comes with a large protocol layer that spends tons of tokens to keep things structured. Experimentation was done with a $50/month Minimax 2.5 Max coding plan, which is (most of the time) enough to keep the engine running 24/7.
- Don't try this with a per-million-token API key — your bank account won't forgive you.
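The tiered merge pipeline can be read as an escalation chain: the cheapest resolver runs first, and only inputs it cannot handle fall through to costlier tiers. Below is a minimal TypeScript sketch of that control flow under stated assumptions. The names (`Tier`, `runMergePipeline`, `trivialThreeWay`, `llmTier`) are hypothetical. The first tier applies only the whole-file three-way rule, not hunk-level diff3. A bracket-balance check stands in for tree-sitter validation. The LLM tier is a placeholder that does not call a model.

```typescript
// Hypothetical shapes for a tiered merge; not CraftSwarm's actual API.
interface MergeInput {
  base: string; // common ancestor
  ours: string; // version on main
  theirs: string; // version from the executor's isolated workspace
}

type MergeOutcome =
  | { ok: true; merged: string; tier: string }
  | { ok: false; reason: string };

// A tier returns merged text, or null when it cannot resolve the input.
type Tier = { name: string; resolve: (input: MergeInput) => string | null };

// Stand-in for diff3: whole-file three-way rule only (no hunk-level merging).
const trivialThreeWay: Tier = {
  name: "three-way",
  resolve: ({ base, ours, theirs }) => {
    if (ours === theirs) return ours;
    if (ours === base) return theirs; // only the executor changed the file
    if (theirs === base) return ours; // only main changed the file
    return null; // both sides changed: escalate
  },
};

// Stand-in for tree-sitter validation: only checks that brackets balance.
function looksSyntacticallyValid(source: string): boolean {
  const openerFor: Record<string, string> = { ")": "(", "]": "[", "}": "{" };
  const stack: string[] = [];
  for (const ch of source) {
    if (ch === "(" || ch === "[" || ch === "{") stack.push(ch);
    else if (ch in openerFor && stack.pop() !== openerFor[ch]) return false;
  }
  return stack.length === 0;
}

// Run tiers in order; accept the first result that passes validation.
function runMergePipeline(input: MergeInput, tiers: Tier[]): MergeOutcome {
  for (const tier of tiers) {
    const merged = tier.resolve(input);
    if (merged !== null && looksSyntacticallyValid(merged)) {
      return { ok: true, merged, tier: tier.name };
    }
  }
  return { ok: false, reason: "no tier produced a valid merge" };
}

// Hypothetical LLM tier: a real one would prompt a model with all three versions.
const llmTier: Tier = {
  name: "llm",
  resolve: ({ ours, theirs }) => `${ours}\n${theirs}`, // placeholder strategy only
};

// Both sides changed, so the three-way tier declines and the LLM tier is tried.
const result = runMergePipeline(
  { base: "a()", ours: "a(); b()", theirs: "a(); c()" },
  [trivialThreeWay, llmTier],
);
console.log(result);
```

Ordering tiers by cost keeps the LLM out of the path for the common case where only one side touched a file.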
```text
┌─────────────────────────────────────────────────────┐
│                 Dashboard (Next.js)                 │
│           Real-time UI / WebSocket / REST           │
└──────────────────────────┬──────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────┐
│                  Backend (Fastify)                  │
│                                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │ Coordinator │  │ Agents      │  │ Services    │  │
│  │ (lifecycle) │  │ (LLM loops  │  │ (browser,   │  │
│  │             │  │  + tools)   │  │  files,     │  │
│  │             │  │             │  │  embeddings)│  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │             SQLite + Drizzle ORM              │  │
│  │    (projects, agents, specs, plans, tasks)    │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
```
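The "LLM loops + tools" box describes the usual agent pattern: the model either answers or requests a tool, the tool output is appended to the conversation, and the loop repeats until a final answer or a step budget is reached. A minimal synchronous sketch, assuming hypothetical `Message`, `ModelReply`, and `runAgent` shapes; a real runtime would call the LLM asynchronously and use the provider's tool-calling format.

```typescript
// Hypothetical message and tool shapes; not CraftSwarm's actual types.
type Message = { role: "user" | "assistant" | "tool"; content: string };

type ModelReply =
  | { kind: "final"; text: string }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> };

type Model = (history: Message[]) => ModelReply;
type Tool = (args: Record<string, unknown>) => string;

// Ask the model, run any requested tool, feed the result back,
// and stop on a final answer or when the step budget runs out.
function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  task: string,
  maxSteps = 10,
): { answer: string | null; history: Message[] } {
  const history: Message[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(history);
    if (reply.kind === "final") {
      history.push({ role: "assistant", content: reply.text });
      return { answer: reply.text, history };
    }
    history.push({ role: "assistant", content: `call ${reply.tool}` });
    const tool = tools[reply.tool];
    // Unknown tools are reported back to the model instead of crashing the loop.
    const output = tool ? tool(reply.args) : `error: unknown tool ${reply.tool}`;
    history.push({ role: "tool", content: output });
  }
  return { answer: null, history }; // budget exhausted without a final answer
}

// Scripted stand-in model: requests one file read, then answers.
const scripted: Model = (history) =>
  history.some((m) => m.role === "tool")
    ? { kind: "final", text: "done" }
    : { kind: "tool_call", tool: "read_file", args: { path: "README.md" } };

const { answer } = runAgent(
  scripted,
  { read_file: (a) => `contents of ${String(a.path)}` },
  "Summarize the README",
);
console.log(answer);
```

The step budget matters in practice: without it, a model that keeps requesting tools would loop indefinitely and keep spending tokens.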
| Package | Description |
|---|---|
| `backend` | Fastify API server, orchestration engine, agent runtime, Drizzle/SQLite storage |
| `dashboard` | Next.js management UI with real-time monitoring |
| `web-portal` | Public-facing landing page |
| `tauri` | Desktop app wrapper (macOS, Linux, Windows) |
| `shared` | Shared TypeScript types across packages |
| Role | Purpose |
|---|---|
| Team Leader | Coordinates agents, dispatches work, communicates with the human |
| Spec Manager | Breaks down requirements into a structured spec tree |
| Planner | Creates implementation plans from specs |
| Tech Lead | Reviews code, manages merges from executors to main |
| Executor | Implements code in an isolated workspace |
| Tester | Runs and writes tests, validates implementations |
| Code Tester | Focused on code-level test coverage and test execution |
| Code Quality | Reviews code for standards, patterns, and issues |
| Spec Auditor | Validates that implementations match their specs |
| DevOps | Manages environments, builds, and deployments |
| Release Manager | Handles release preparation and packaging |
| Innovation Manager | Proposes improvements and new approaches |
| Test Hooks Manager | Manages browser test hooks for visual testing |
| Test Scripts Manager | Manages and maintains test script suites |
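Since the `shared` package carries TypeScript types used across packages, one natural way to keep the fourteen roles consistent between backend and dashboard is a string-literal union derived from a single list. This is a sketch only: `AGENT_ROLES`, `AgentRole`, `isAgentRole`, and the kebab-case identifiers are hypothetical and may not match the codebase.

```typescript
// Hypothetical role identifiers derived from the role table above.
const AGENT_ROLES = [
  "team-leader",
  "spec-manager",
  "planner",
  "tech-lead",
  "executor",
  "tester",
  "code-tester",
  "code-quality",
  "spec-auditor",
  "devops",
  "release-manager",
  "innovation-manager",
  "test-hooks-manager",
  "test-scripts-manager",
] as const;

// Union type derived from the array, so the list and the type cannot drift apart.
type AgentRole = (typeof AGENT_ROLES)[number];

// Runtime guard, e.g. for role strings arriving over REST or WebSocket.
function isAgentRole(value: string): value is AgentRole {
  return (AGENT_ROLES as readonly string[]).indexOf(value) !== -1;
}

// Per the table, the Tech Lead manages merges from executors to main.
const mergeOwner: AgentRole = "tech-lead";
console.log(isAgentRole("executor"), mergeOwner);
```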
- Node.js >= 20
- pnpm >= 10
- An LLM API key (OpenAI, Anthropic, OpenRouter, or compatible)
```bash
git clone https://github.com/BenjaminPiette/craftswarm.git
cd craftswarm
pnpm install
```

Before starting the engine, open the dashboard settings and configure your API keys:
| Provider | Required | Purpose |
|---|---|---|
| LLM (OpenAI-compatible) | Yes | Agent reasoning — base URL, API key, and model (works with OpenAI, Anthropic, OpenRouter, or any compatible API) |
| Embedding | Yes | Semantic search for code, specs, and expertise skills |
| Gemini | Optional | Vision — agents use it to analyze screenshots of what they build |
| Gemini Search | Optional | Grounded web search for agents that need external information |
The dashboard validates each provider on save and shows connection status.
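The README does not say how providers are validated. For an OpenAI-compatible LLM endpoint, one plausible check is to call `GET {baseUrl}/models` with a bearer token and confirm that the configured model is listed. In this sketch, `validateLlmProvider`, `LlmProviderConfig`, and `FetchLike` are hypothetical names, and the fetch function is injected so the check can be tested without network access.

```typescript
// Hypothetical config shape mirroring the LLM fields: base URL, API key, model.
interface LlmProviderConfig {
  baseUrl: string; // e.g. "https://api.openai.com/v1"
  apiKey: string;
  model: string;
}

// Minimal subset of fetch that the check needs; Node 20's global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type ValidationResult = { ok: true } | { ok: false; error: string };

// Probe the OpenAI-style model list endpoint and check that the model is present.
async function validateLlmProvider(
  cfg: LlmProviderConfig,
  fetchImpl: FetchLike,
): Promise<ValidationResult> {
  const url = `${cfg.baseUrl.replace(/\/+$/, "")}/models`; // tolerate trailing slashes
  try {
    const res = await fetchImpl(url, {
      method: "GET",
      headers: { Authorization: `Bearer ${cfg.apiKey}` },
    });
    if (!res.ok) return { ok: false, error: `HTTP ${res.status} from ${url}` };
    // OpenAI-style list responses look like { data: [{ id: "..." }, ...] }.
    const body = (await res.json()) as { data?: Array<{ id?: string }> };
    const ids = (body.data ?? []).map((m) => m.id);
    return ids.indexOf(cfg.model) !== -1
      ? { ok: true }
      : { ok: false, error: `model "${cfg.model}" not listed by provider` };
  } catch (err) {
    return { ok: false, error: `network error: ${String(err)}` };
  }
}
```

In Node.js 20 the built-in `fetch` can be passed as `fetchImpl`. Some compatible providers do not list every usable model, so a looser check (for example, accepting any successful response) may be more appropriate for them.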
```bash
# Browser mode (backend + dashboard + web portal)
pnpm web:dev

# Desktop app (Tauri)
pnpm tauri:dev
```

Open the dashboard, create a project, write a spec, and start the engine. The agents take it from there.
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, Tailwind CSS 4, Zustand, React Query |
| Backend | Fastify 5, TypeScript 5.9 |
| Database | SQLite (better-sqlite3), Drizzle ORM |
| Vector search | sqlite-vec (embeddings) |
| LLM providers | OpenAI, Anthropic Claude, Google Gemini, OpenRouter |
| Browser automation | Playwright |
| Desktop | Tauri |
| Monorepo | pnpm workspaces, Turborepo |
| Testing | Vitest |
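Semantic search over code, specs, and expertise skills comes down to nearest-neighbor lookup on embedding vectors. The sketch below shows that idea with brute-force cosine similarity in plain TypeScript. It is not sqlite-vec's API (sqlite-vec performs the search inside SQLite), and `Doc`, `topK`, and the toy vectors are hypothetical.

```typescript
// Hypothetical document record: an id plus its embedding vector.
interface Doc {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force k-nearest neighbors: score every document, keep the k best.
function topK(query: number[], docs: Doc[], k: number): Array<{ id: string; score: number }> {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional vectors; real embedding models produce far more dimensions.
const docs: Doc[] = [
  { id: "spec:login", embedding: [0.9, 0.1, 0.0] },
  { id: "spec:billing", embedding: [0.0, 0.2, 0.9] },
  { id: "code:auth.ts", embedding: [0.8, 0.3, 0.1] },
];
console.log(topK([1, 0.2, 0], docs, 2));
```

Brute force is linear in the number of documents, which is acceptable for a single project's specs and code; a dedicated vector extension keeps the data and the search in the same SQLite database.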
```bash
pnpm turbo typecheck   # type-check all packages
pnpm build             # build all packages
pnpm lint              # lint all packages
pnpm test              # run tests
```

Database migrations are managed with Drizzle Kit:

```bash
cd backend
pnpm db:generate   # generate a migration from schema changes
pnpm db:studio     # interactive database browser
```

Migrations run automatically on server startup.
CraftSwarm is an experimental project that was actively developed for about 3 weeks before slowing down, as the path to a fully functional, production-ready app is still long. Known gaps:
- Browser testing / feedback loop — visual testing works, but the launch, screenshot, analyze, fix cycle can still get stuck or loop
- App deployment — agents build and test locally, but there is no deployment management yet
- Milestones — the milestone/checkpoint system exists but is incomplete and not fully wired end-to-end
- Spec prioritization — agents don't always pick the most impactful spec items to implement first
- Manual video recordings — the assisted testing flow with video recording and analysis is partially implemented