The design-token discipline layer for AI-generated codebases.
v0 generates. Cursor edits. Atelier enforces.
Open any AI-generated React component. Count the raw color refs:
```tsx
// Generated by v0 / Cursor / Claude Code last week
<button className="bg-zinc-900 text-zinc-50 hover:bg-zinc-800 border border-zinc-200">
  Save
</button>
```

Now open the project's tailwind.config.ts. There's a perfectly good bg-foreground token sitting right there. The agent didn't use it.
Multiply by 50 components. Different shades of zinc on different pages. Theme switching is impossible. The "design system" is an artifact of the original setup that nothing actually conforms to.
This isn't an agent bug. It's a missing contract. AI tools generate against the densest pattern in their training data — Tailwind's raw palette — unless something stops them.
Atelier is the lint layer that stops them. It builds on @google/design.md (the Google Labs spec for machine-checkable design tokens) and adds three things the spec doesn't ship:
- A precedence rule. When tokens are resolvable from multiple sources (your local DESIGN.md, a build-category default, the raw Tailwind palette), Atelier defines what wins. Project intent always survives a regeneration pass.
- An empirically derived 8-role vocabulary. Every production DESIGN.md we surveyed used different names for the same eight roles. Atelier defines the canonical set so a linter can check role coverage regardless of internal naming.
- A build-category atlas. Seven categories (SaaS dashboard, marketing landing, trading analytics, conversational UI, internal ops, multi-LLM synthesis, marketplace listing) with sensible default DNA. New projects inherit a real starting palette instead of raw Tailwind.
The same machinery ships as a CLI, a project audit, a TypeScript classifier, and an MCP server agents can call directly to lint their own output.
+149.98% relative lift in semantic-token conformance over raw-palette-only repos. Three-arm benchmark, 24 repos, pre-registered methodology, two-classifier sensitivity check.
| Arm | Conformance (v2-broad) | n (kept / sampled) |
|---|---|---|
| Repo with DESIGN.md | 78.5% | 3/4 |
| shadcn-style default vocab | 61.7% | 10/10 |
| Raw Tailwind palette only | 31.4% | 9/10 |
The DESIGN.md arm beats shadcn-default by +16.87pp absolute and beats the raw-palette arm by +149.98% relative (78.5% vs. 31.4%). The pre-registered primary gate required ≥+15pp absolute over shadcn-default. Verdict: PASS.
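Both headline numbers are plain arithmetic on the arm means, but against different control arms: the absolute gap is measured against shadcn-default, the relative lift against the raw-palette arm. A minimal sketch using the rounded means from the table (so it lands on 16.80pp and +150.00% rather than 16.87pp and +149.98%, which presumably come from the unrounded per-repo means); armMeans, absoluteGapPp and relativeLiftPct are illustrative names, not part of any Atelier package.

```typescript
// Arm means (v2-broad), rounded as shown in the table.
const armMeans = { designMd: 78.5, shadcnDefault: 61.7, rawPalette: 31.4 };

// Absolute gap in percentage points.
function absoluteGapPp(treatment: number, control: number): number {
  return treatment - control;
}

// Relative lift in percent: (treatment - control) / control * 100.
function relativeLiftPct(treatment: number, control: number): number {
  return ((treatment - control) / control) * 100;
}

const gapVsShadcn = absoluteGapPp(armMeans.designMd, armMeans.shadcnDefault);
const liftVsRaw = relativeLiftPct(armMeans.designMd, armMeans.rawPalette);

// Pre-registered primary gate: DESIGN.md >= shadcn-default + 15pp absolute.
const gatePassed = gapVsShadcn >= 15;

console.log(gapVsShadcn.toFixed(2), liftVsRaw.toFixed(2), gatePassed);
```

The same +16.8pp gap expressed relative to shadcn-default would only be about +27%, which is why the two figures should not be read as describing the same comparison.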
The strict-vs-broad sensitivity check agrees on the verdict. Methodology spec was committed to git before the runner saw it. Caveats and per-repo numbers are in benchmarks/results/2026-05-07-phase-1-v2.md. The corpus is single-developer; external validation requires fork-and-rerun. A generative arm-vs-arm study (causation, not correlation) is on the v0.2.0+ roadmap.
```bash
npm install -g @atelier-oss/cli

# In any project root
atelier init                  # writes a starter DESIGN.md
atelier lint DESIGN.md        # validates the token contract
atelier classify .            # scores token-vs-raw conformance
atelier atlas fingerprint .   # detects build category (saas-dashboard, marketing-landing, ...)
atelier audit                 # six-section project health check
```

First useful signal in under 30 seconds: atelier classify . returns a single number per file, the share of color, spacing, and typography references that resolve to declared tokens rather than raw values. Run it before and after your next AI-generated PR.
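A per-file score of this kind can be sketched in a few lines. This is not the @atelier-oss/classify implementation: it is a minimal illustration that handles color utilities only, assumes a hypothetical set of declared token names, and leaves any class that is neither a declared token nor a Tailwind v3 palette shade unscored.

```typescript
// Verdict for a single utility class.
type Verdict = "token" | "raw" | "unscored";

// Utility prefixes that take a color argument (a subset, for illustration).
const COLOR_PREFIXES = ["bg", "text", "border", "ring", "fill", "stroke"];

// Tailwind v3 default color families with a numeric shade, e.g. zinc-900.
// white/black and arbitrary values like bg-[#111] are omitted for brevity.
const RAW_PALETTE =
  /^(slate|gray|zinc|neutral|stone|red|orange|amber|yellow|lime|green|emerald|teal|cyan|sky|blue|indigo|violet|purple|fuchsia|pink|rose)-\d{2,3}$/;

function classifyClass(cls: string, tokens: Set<string>): Verdict {
  // Strip variant prefixes such as "hover:" or "dark:md:".
  const base = cls.split(":").pop() ?? cls;
  const dash = base.indexOf("-");
  if (dash <= 0) return "unscored"; // e.g. "border", "flex", "-mt-2"
  const prefix = base.slice(0, dash);
  // Drop an opacity modifier: "foreground/90" -> "foreground".
  const value = base.slice(dash + 1).split("/")[0];
  if (!COLOR_PREFIXES.includes(prefix)) return "unscored";
  if (tokens.has(value)) return "token";
  if (RAW_PALETTE.test(value)) return "raw";
  return "unscored"; // e.g. "text-lg", "border-2"
}

// Share of scored color classes that resolve to declared tokens,
// or null when nothing in the string was scorable.
function conformance(className: string, tokens: Set<string>): number | null {
  let token = 0;
  let raw = 0;
  for (const cls of className.split(/\s+/).filter(Boolean)) {
    const verdict = classifyClass(cls, tokens);
    if (verdict === "token") token++;
    else if (verdict === "raw") raw++;
  }
  return token + raw === 0 ? null : token / (token + raw);
}

const declared = new Set(["background", "foreground", "primary", "border"]);
console.log(conformance("bg-zinc-900 text-zinc-50 hover:bg-zinc-800 border border-zinc-200", declared)); // 0
console.log(conformance("bg-foreground text-background hover:bg-foreground/90 border border-border", declared)); // 1
```

The v0 button from the top of this README scores 0 because every color class is a raw zinc shade; the token-based rewrite scores 1, assuming those four tokens are declared.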
```text
┌──────────────────────────────────────────────────────────┐
│ Explicit → the project's own DESIGN.md                   │
│          ↓                                               │
│ Atlas    → build-category default DNA (saas-dashboard,   │
│            marketing-landing, ...)                       │
│          ↓                                               │
│ Palette  → raw framework fallback (Tailwind zinc-*,      │
│            blue-*, ...)                                  │
└──────────────────────────────────────────────────────────┘

Higher source ALWAYS wins.
Atlas MUST NOT silently shadow explicit.
Palette is last-resort; lint warns on use.
```
This encodes early-return semantics for codegen pipelines. A regenerate pass that starts with palette refs gets warned. A new file that reaches for bg-zinc-900 when bg-foreground is declared gets warned. Your project's intent — whatever's in your DESIGN.md — is uncontestable.
Findings:
- ATELIER_PRECEDENCE_VIOLATION (warning): an atlas default shadowed an explicit token.
- ATELIER_MISSING_ROLE (info): a canonical role from the 8-role set has no satisfying token.
Neither is an error by default. Partial coverage is valid; full coverage is recommended.
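One way to picture the precedence rule and its lint signals is a layered merge. A minimal sketch, assuming each source has already been parsed into a flat role-to-value map; resolveTokens, precedenceViolations and paletteFallbacks are hypothetical names that mirror the rules above, not Atelier's API.

```typescript
type TokenMap = Record<string, string>;

const has = (map: TokenMap, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(map, key);

// Object spread lets later sources overwrite earlier ones, so the
// highest-precedence source (the explicit DESIGN.md) is spread last.
function resolveTokens(explicit: TokenMap, atlas: TokenMap, palette: TokenMap): TokenMap {
  return { ...palette, ...atlas, ...explicit };
}

// Mirrors ATELIER_PRECEDENCE_VIOLATION: an explicit token whose resolved value
// differs from what DESIGN.md declares was shadowed by a lower source.
function precedenceViolations(resolved: TokenMap, explicit: TokenMap): string[] {
  return Object.keys(explicit).filter((role) => resolved[role] !== explicit[role]);
}

// Mirrors "palette is last-resort; lint warns on use":
// roles that only the raw palette could satisfy.
function paletteFallbacks(explicit: TokenMap, atlas: TokenMap, palette: TokenMap): string[] {
  return Object.keys(palette).filter((role) => !has(explicit, role) && !has(atlas, role));
}

const explicit: TokenMap = { primary: "#7c3aed" };
const atlas: TokenMap = { primary: "#2563eb", background: "#ffffff" };
const palette: TokenMap = { primary: "blue-600", border: "zinc-200" };

const good = resolveTokens(explicit, atlas, palette);
const buggy: TokenMap = { ...explicit, ...atlas }; // wrong order: atlas shadows explicit

console.log(good.primary, precedenceViolations(good, explicit), precedenceViolations(buggy, explicit));
console.log(paletteFallbacks(explicit, atlas, palette));
```

The correct merge keeps the explicit primary and reports no violation; the reversed merge silently swaps in the atlas blue, which is exactly what the warning exists to catch.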
| Role | Purpose |
|---|---|
| background | Default page surface |
| foreground | Default text on background |
| primary | Brand action (CTA, links, focus targets) |
| primary-foreground | Legible text on primary |
| accent | Secondary emphasis (badges, highlights) |
| muted | Low-contrast supporting tone (timestamps, captions) |
| border | Surface separator color |
| ring | Focus indicator color |
A DESIGN.md may use any internal naming (c-bg, bg, background are all valid). Map your names to roles via the optional aliases block:
```yaml
aliases:
  background: c-bg
  foreground: c-text
  primary: c-accent
```

Coverage is a recommendation, never an error. Full vocab in spec/DESIGN.md.spec.md.
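Role coverage with aliases reduces to one lookup per canonical role. A minimal sketch, assuming the aliases block maps canonical role to internal token name as in the YAML above, and assuming a role without an alias is satisfied only by a token carrying the canonical name itself; missingRoles is a hypothetical helper.

```typescript
const CANONICAL_ROLES = [
  "background", "foreground", "primary", "primary-foreground",
  "accent", "muted", "border", "ring",
] as const;

type Role = (typeof CANONICAL_ROLES)[number];
type Aliases = Partial<Record<Role, string>>;

// Canonical roles with no satisfying token. These correspond to
// ATELIER_MISSING_ROLE findings at info severity, never errors.
function missingRoles(declared: Set<string>, aliases: Aliases = {}): Role[] {
  return CANONICAL_ROLES.filter((role) => {
    const name = aliases[role] ?? role; // assumed: unaliased roles use their canonical name
    return !declared.has(name);
  });
}

const declared = new Set(["c-bg", "c-text", "c-accent", "border", "ring"]);
const aliases: Aliases = { background: "c-bg", foreground: "c-text", primary: "c-accent" };

console.log(missingRoles(declared, aliases));
```

Here c-accent is claimed by primary through the alias, so accent itself still reports as uncovered, together with primary-foreground and muted.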
Atelier ships an MCP stdio server so any MCP-capable agent (Claude Code, Cursor, Cline, etc.) can call the linter, classifier, and audit directly:
Exposed tools:
- atelier_lint: lint a DESIGN.md against the spec + extensions
- atelier_classify: score a file or directory for token conformance
- atelier_audit: run the six-section project audit
- atelier_atlas_fingerprint: detect build category from a repo path
The intended workflow: an agent generates a component, calls atelier_classify on the diff, and rewrites any raw palette references before opening the PR. Closes the loop without a human in it.
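Under the hood, each of those agent calls is a JSON-RPC 2.0 tools/call request written to the server's stdin, one JSON message per line. A minimal sketch of that message, assuming a hypothetical path argument for atelier_classify (the real input schema is whatever the server advertises via tools/list); a real client must first complete the MCP initialize handshake, which an MCP SDK client normally handles for you.

```typescript
// Shape of a JSON-RPC 2.0 request invoking an MCP tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// MCP's stdio transport carries one JSON message per line, so the serialized
// message must not contain raw newlines (JSON.stringify without indentation
// emits none and escapes any inside string values).
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// "path" is a hypothetical argument name; check the tools/list input schema.
const line = frame(toolCall(1, "atelier_classify", { path: "src/components/SaveButton.tsx" }));

// A real client would write this to the spawned server's stdin after the
// initialize handshake, e.g. child.stdin.write(line) in Node.
console.log(line);
```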
| Package | Role |
|---|---|
| @atelier-oss/cli | The everyday binary. atelier init / lint / classify / atlas / audit. |
| @atelier-oss/lint | Wraps @google/design.md@0.1.1 (Apache-2.0) and adds the precedence rule + 8 roles. |
| @atelier-oss/classify | Token-vs-raw scorer. The engine behind the +149.98% number. |
| @atelier-oss/atlas | Fingerprints a repo, returns build category and default DNA. |
| @atelier-oss/audit | Six-section health check: token usage, contrast, motion, a11y, design coverage, responsive. |
| @atelier-oss/mcp-server | MCP stdio server exposing the above as agent-callable tools. |
Each package ships independently via changesets. All published with sigstore provenance attestations.
| Tool | Lints DESIGN.md | Precedence rule | Role vocab | Build-category atlas | Agent / MCP support | Empirical lift number |
|---|---|---|---|---|---|---|
| Atelier | yes | yes | yes (8 roles) | yes (7 categories) | yes (MCP server) | +149.98% relative |
| @google/design.md | yes | no | no | no | no | n/a |
| eslint-plugin-tailwindcss | no | no | no | no | no | n/a |
| style-dictionary | no | no | no | no | no | n/a |
| Hand-rolled tailwind.config.ts | no | no | no | no | no | n/a |
| shadcn/ui defaults | no | no | partial (~6 roles, undocumented) | n/a (defaults only) | no | n/a |
Atelier is additive to @google/design.md. The precedence rule and the role vocabulary are both proposed upstream as a PR (google-labs-code/design.md#76). If the spec adopts them, this layer becomes a no-op compatibility shim.
Three-arm observational benchmark, pre-registered before the runner expanded:
- Arm A: repos with a committed DESIGN.md (n=4, 3 kept after dropouts)
- Arm B: repos using shadcn-style default vocabulary (Radix UI + shadcn token names) without DESIGN.md (n=10)
- Arm C: repos using raw Tailwind palette refs only (n=10, 9 kept)
Each repo gets walked, every Tailwind class extracted, and each class scored against the project's declared token registry. Token-conforming references count as 1; raw palette references count as 0. Per-repo conformance is the mean. Arm conformance is the mean of repo means.
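The aggregation is a macro average: every repo contributes one number regardless of how many classes it contains. A minimal sketch in TypeScript (the actual runner is the Python module under benchmarks/; the function names here are illustrative), with a pooled mean alongside for contrast.

```typescript
// 1 = token-conforming reference, 0 = raw palette reference.
type RepoScores = number[];

// Repos with no scorable references need a policy of their own;
// this sketch simply refuses to average an empty list.
function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("mean of an empty list is undefined");
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Per-repo conformance is the mean of that repo's reference scores.
const repoConformance = (scores: RepoScores): number => mean(scores);

// Arm conformance is the unweighted mean of repo means.
function armConformance(repos: RepoScores[]): number {
  return mean(repos.map(repoConformance));
}

// For contrast: pooling every reference lets large repos dominate.
function pooledConformance(repos: RepoScores[]): number {
  return mean(([] as number[]).concat(...repos));
}

const arm: RepoScores[] = [
  [1, 1, 1, 0], // repo 1: 3 of 4 references conform, mean 0.75
  [1, 0],       // repo 2: 1 of 2 references conform, mean 0.5
];

console.log(armConformance(arm), pooledConformance(arm));
```

The macro average gives 0.625 while pooling gives about 0.667; the macro form keeps one very large repo from deciding an arm's score on its own.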
Two classifier modes for sensitivity:
- v2-strict: only DESIGN.md-declared tokens count as registry. Hard test of explicit-spec adoption.
- v2-broad (primary gate): DESIGN.md + tailwind.config + CSS variables count as registry. Reflects shipped reality.
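The only thing that changes between the two modes is which sources feed the registry. A minimal sketch, with hypothetical field names for the three sources and a simple regex for CSS custom properties; how a class that misses the registry is scored in strict mode is defined by the spec, not by this snippet.

```typescript
type Mode = "v2-strict" | "v2-broad";

interface RegistrySources {
  designMd: string[];       // token names declared in DESIGN.md
  tailwindConfig: string[]; // token names from tailwind.config (e.g. theme colors)
  cssVariables: string[];   // custom-property names from global CSS
}

// Collect custom-property names such as "--primary" from a CSS string.
function cssVariableNames(css: string): string[] {
  const names: string[] = [];
  const re = /--([\w-]+)\s*:/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(css)) !== null) names.push(m[1]);
  return names;
}

// v2-strict: DESIGN.md only. v2-broad: DESIGN.md + tailwind.config + CSS variables.
function registry(mode: Mode, src: RegistrySources): Set<string> {
  const names = mode === "v2-strict"
    ? src.designMd
    : src.designMd.concat(src.tailwindConfig, src.cssVariables);
  return new Set(names);
}

const sources: RegistrySources = {
  designMd: ["background", "foreground"],
  tailwindConfig: ["brand"],
  cssVariables: cssVariableNames(":root { --primary: 222 47% 11%; --ring: 215 20% 65%; }"),
};

for (const mode of ["v2-strict", "v2-broad"] as Mode[]) {
  const reg = registry(mode, sources);
  console.log(mode, reg.has("foreground"), reg.has("brand"), reg.has("primary"));
}
```

A repo that declares its tokens only in tailwind.config or CSS variables looks conforming under v2-broad but not under v2-strict, which is why agreement between the two modes is a meaningful sensitivity check.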
Both modes agree on the verdict. The pre-registered primary gate (DESIGN.md ≥ shadcn-default + 15pp absolute) was crossed at +16.87pp.
Caveats kept honest:
- Observational, not causal. The next milestone is a generative study (same prompt, with vs. without DESIGN.md in context).
- Single-developer corpus. External validation requires fork-and-rerun.
- The first MVB (4+4 repos, two-arm) reported +105% relative lift but conflated shadcn-style and raw-palette in the control. The three-arm split above is the corrected number.
Re-run anytime:
```bash
python3 -m benchmarks.runner         # corpus walk + scoring
python3 -m benchmarks.parity_check   # 60/60 oracle for the TS port of the scorer
```

Spec at benchmarks/spec-v2.md. Result at benchmarks/results/2026-05-07-phase-1-v2.md.
v0.1.0 (shipped 2026-05-08) — Tailwind v3 support, six packages on npm with sigstore provenance, three-arm benchmark cleared, upstream proposal PR open as draft.
v0.2.0 — Tailwind v4 support: @theme blocks as a registry source, oklch() palette in atlas, mixed-mode detection. Plan: docs/v0.2.0-plan.md.
v0.3.0 — Generative benchmark arm (causation, not correlation). Same prompt run with and without DESIGN.md in context, conformance measured at generation time. ~$50-100 in API.
v1.0.0 — Pending upstream maintainer signal on PR #76. If the precedence rule and role vocab land in @google/design.md, the wrapper collapses and v1.0 ships as a thin convenience layer. If they don't, v1.0 is the long-lived fork.
- 6 packages live on npm: @atelier-oss/cli, /lint, /classify, /atlas, /audit, /mcp-server. All v0.1.0, all signed.
- 138 TypeScript tests + 65/65 parity oracle + 39 Python validators. CI green on every push.
- Upstream PR open as draft at google-labs-code/design.md#76. Awaiting maintainer signal post-CLA.
The project is small and most decisions are still open. Things that would help right now:
- Run the benchmark on your own repos. Fork, point the runner at your codebase, open an issue with the result. External validation is the single biggest open question.
- Try the MCP server with your agent of choice. Open an issue with the workflow that worked (or didn't).
- Submit DESIGN.md examples. The 8-role vocab was derived from 4 production files. Wider sampling sharpens the canonical set.
- Question the role list. If your DESIGN.md needs a role outside the 8, that's a spec proposal worth making.
PRs welcome. Issues are the right place to start for anything bigger than a typo. Conventional Commits, please.
MIT for Atelier code. Apache-2.0 for the bundled @google/design.md dependency (preserved verbatim, full NOTICE at repo root). The wrapper layer is additive; if the upstream package adopts the extensions, this fork collapses.
- Google Labs for the DESIGN.md spec.
- The four anonymous DESIGN.md authors whose naming patterns shaped the canonical role list.
If Atelier saves you a debugging hour, star the repo. It's the cheapest way to tell us the lift number isn't an artifact of the corpus.