Atelier

The design-token discipline layer for AI-generated codebases.

v0 generates. Cursor edits. Atelier enforces.



The problem

Open any AI-generated React component. Count the raw color refs:

// Generated by v0 / Cursor / Claude Code last week
<button className="bg-zinc-900 text-zinc-50 hover:bg-zinc-800 border border-zinc-200">
  Save
</button>

Now open the project's tailwind.config.ts. There's a perfectly good bg-foreground token sitting right there. The agent didn't use it.

Multiply by 50 components. Different shades of zinc on different pages. Theme switching is impossible. The "design system" is an artifact of the original setup that nothing actually conforms to.

This isn't an agent bug. It's a missing contract. AI tools generate against the densest pattern in their training data — Tailwind's raw palette — unless something stops them.

What Atelier does

Atelier is the lint layer that stops them. It builds on @google/design.md (the Google Labs spec for machine-checkable design tokens) and adds three things the spec doesn't ship:

  1. A precedence rule. When tokens are resolvable from multiple sources — your local DESIGN.md, a build-category default, the raw Tailwind palette — Atelier defines what wins. Project intent always survives a regeneration pass.
  2. An empirically derived 8-role vocabulary. Every production DESIGN.md we surveyed used different names for the same eight roles. Atelier defines the canonical set so a linter can check role coverage regardless of internal naming.
  3. A build-category atlas. Seven categories (SaaS dashboard, marketing landing, trading analytics, conversational UI, internal ops, multi-LLM synthesis, marketplace listing) with sensible default DNA. New projects inherit a real starting palette instead of raw Tailwind.

The same machinery ships as a CLI, a project audit, a TypeScript classifier, and an MCP server agents can call directly to lint their own output.

Does it work?

+149.98% relative lift in semantic-token conformance over a raw-Tailwind baseline. Three-arm benchmark, 24 repos, pre-registered methodology, two-classifier sensitivity check.

| Arm | Conformance (v2-broad) | n |
| --- | --- | --- |
| Repo with DESIGN.md | 78.5% | 3/4 |
| shadcn-style default vocab | 61.7% | 10/10 |
| Raw Tailwind palette only | 31.4% | 9/10 |

The DESIGN.md arm beats shadcn-default by +16.87pp absolute, and the raw-palette arm by +149.98% relative. The pre-registered primary gate required ≥+15pp absolute over shadcn-default. Verdict: PASS.
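
As a quick sanity check, the headline figures fall straight out of the arm means: the absolute gate is measured against the shadcn-default arm, while the relative lift is measured against the raw-palette arm.

```typescript
// Arm means rounded to one decimal; exact per-repo numbers live in
// benchmarks/results/2026-05-07-phase-1-v2.md.
const designMdArm = 78.5;   // repos with a committed DESIGN.md, %
const shadcnArm = 61.7;     // shadcn-style default vocab, %
const rawPaletteArm = 31.4; // raw Tailwind palette only, %

// Absolute gap, in percentage points, versus the shadcn-default arm.
const absoluteGapPp = designMdArm - shadcnArm;                      // ~ +16.8pp

// Relative lift versus the raw-palette arm.
const relativeLift = (designMdArm - rawPaletteArm) / rawPaletteArm; // ~ 1.5, i.e. ~+150%
```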

The strict-vs-broad sensitivity check agrees on the verdict. Methodology spec was committed to git before the runner saw it. Caveats and per-repo numbers are in benchmarks/results/2026-05-07-phase-1-v2.md. The corpus is single-developer; external validation requires fork-and-rerun. A generative arm-vs-arm study (causation, not correlation) is on the v0.2.0+ roadmap.

Quickstart

npm install -g @atelier-oss/cli

# In any project root
atelier init                 # writes a starter DESIGN.md
atelier lint DESIGN.md       # validates the token contract
atelier classify .           # scores token-vs-raw conformance
atelier atlas fingerprint .  # detects build category (saas-dashboard, marketing-landing, ...)
atelier audit                # six-section project health check

First useful signal in under 30 seconds: atelier classify . returns a single number per file — the share of color/spacing/typography references that resolve to declared tokens vs raw values. Run it before and after your next AI-generated PR.

How precedence works

┌──────────────────────────────────────────────────────────┐
│  Explicit  →  the project's own DESIGN.md                │
│       ↓                                                  │
│  Atlas     →  build-category default DNA (saas-dashboard,│
│               marketing-landing, ...)                    │
│       ↓                                                  │
│  Palette   →  raw framework fallback (Tailwind zinc-*,   │
│               blue-*, ...)                               │
└──────────────────────────────────────────────────────────┘
        Higher source ALWAYS wins.
        Atlas MUST NOT silently shadow explicit.
        Palette is last-resort; lint warns on use.

This encodes early-return semantics for codegen pipelines. A regenerate pass that starts with palette refs gets warned. A new file that reaches for bg-zinc-900 when bg-foreground is declared gets warned. Your project's intent — whatever's in your DESIGN.md — is uncontestable.
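
The resolution order is plain early-return logic. A minimal sketch, assuming a toy registry shape (the names and types here are illustrative, not the @atelier-oss/lint API):

```typescript
// Hypothetical sketch of precedence resolution; names are illustrative.
type Source = "explicit" | "atlas" | "palette";

interface Registry {
  explicit: Map<string, string>; // tokens declared in the project's DESIGN.md
  atlas: Map<string, string>;    // build-category default DNA
  palette: Map<string, string>;  // raw framework fallback (lint warns on use)
}

// Early-return semantics: the highest source that resolves the role wins,
// so atlas defaults can never silently shadow explicit project intent.
function resolveToken(
  role: string,
  reg: Registry,
): { value: string; source: Source } | null {
  if (reg.explicit.has(role)) return { value: reg.explicit.get(role)!, source: "explicit" };
  if (reg.atlas.has(role)) return { value: reg.atlas.get(role)!, source: "atlas" };
  if (reg.palette.has(role)) return { value: reg.palette.get(role)!, source: "palette" };
  return null;
}
```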

Findings:

  • ATELIER_PRECEDENCE_VIOLATION (warning) — atlas default shadowed an explicit token.
  • ATELIER_MISSING_ROLE (info) — a canonical role from the 8-role set has no satisfying token.

Neither is an error by default. Partial coverage is valid; full coverage is recommended.

The 8 canonical roles

| Role | Purpose |
| --- | --- |
| background | Default page surface |
| foreground | Default text on background |
| primary | Brand action (CTA, links, focus targets) |
| primary-foreground | Legible text on primary |
| accent | Secondary emphasis (badges, highlights) |
| muted | Low-contrast supporting tone (timestamps, captions) |
| border | Surface separator color |
| ring | Focus indicator color |

A DESIGN.md may use any internal naming (c-bg, bg, background are all valid). Map your names to roles via the optional aliases block:

aliases:
  background: c-bg
  foreground: c-text
  primary: c-accent

Coverage is a recommendation, never an error. Full vocab in spec/DESIGN.md.spec.md.
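
Mechanically, coverage checking just resolves each canonical role through the aliases map before looking it up in the project's declared tokens. A minimal sketch with hypothetical names (not the shipped linter's API):

```typescript
// Illustrative coverage check; not the @atelier-oss/lint implementation.
const CANONICAL_ROLES = [
  "background", "foreground", "primary", "primary-foreground",
  "accent", "muted", "border", "ring",
];

// A role is covered if the project declares a token under either the
// canonical name or the name it mapped in the aliases block.
function missingRoles(
  declared: Set<string>,
  aliases: Record<string, string> = {},
): string[] {
  return CANONICAL_ROLES.filter((role) => !declared.has(aliases[role] ?? role));
}
```

With the aliases block above, a project declaring `c-bg`, `c-text`, and `c-accent` covers background, foreground, and primary under its own naming; any roles it leaves out surface as ATELIER_MISSING_ROLE info findings, never errors.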

For AI coding agents (MCP server)

Atelier ships an MCP stdio server so any MCP-capable agent (Claude Code, Cursor, Cline, etc.) can call the linter, classifier, and audit directly:

// .mcp.json
{
  "mcpServers": {
    "atelier": {
      "command": "npx",
      "args": ["-y", "@atelier-oss/mcp-server"]
    }
  }
}

Exposed tools:

  • atelier_lint — lint a DESIGN.md against the spec + extensions
  • atelier_classify — score a file or directory for token conformance
  • atelier_audit — run the six-section project audit
  • atelier_atlas_fingerprint — detect build category from a repo path

The intended workflow: an agent generates a component, calls atelier_classify on the diff, and rewrites any raw palette references before opening the PR. Closes the loop without a human in it.

Packages

| Package | Role |
| --- | --- |
| @atelier-oss/cli | The everyday binary. atelier init / lint / classify / atlas / audit. |
| @atelier-oss/lint | Wraps @google/design.md@0.1.1 (Apache-2.0) and adds the precedence rule + 8 roles. |
| @atelier-oss/classify | Token-vs-raw scorer. The engine behind the +149.98% number. |
| @atelier-oss/atlas | Fingerprints a repo, returns build category and default DNA. |
| @atelier-oss/audit | Six-section health check: token usage, contrast, motion, a11y, design coverage, responsive. |
| @atelier-oss/mcp-server | MCP stdio server exposing the above as agent-callable tools. |

Each package ships independently via changesets. All published with sigstore provenance attestations.

How Atelier compares

| Tool | Lints DESIGN.md | Precedence rule | Role vocab | Build-category atlas | Agent / MCP support | Empirical lift number |
| --- | --- | --- | --- | --- | --- | --- |
| Atelier | yes | yes | yes (8 roles) | yes (7 categories) | yes (MCP server) | +149.98% relative |
| @google/design.md | yes | no | no | no | no | n/a |
| eslint-plugin-tailwindcss | no | no | no | no | no | n/a |
| style-dictionary | no | no | no | no | no | n/a |
| Hand-rolled tailwind.config.ts | no | no | no | no | no | n/a |
| shadcn/ui defaults | no | no | partial (~6 roles, undocumented) | n/a (defaults only) | no | n/a |

Atelier is additive to @google/design.md. The precedence rule and role vocabulary are proposed upstream as a PR (google-labs-code/design.md#76). If the spec adopts them, this layer becomes a no-op compatibility shim.

Methodology — where the +149.98% comes from

Three-arm observational benchmark, pre-registered before the runner expanded:

  • Arm A — repos with a committed DESIGN.md (n=4, 3 kept after dropouts)
  • Arm B — repos using shadcn-style default vocabulary (Radix UI + shadcn token names) without DESIGN.md (n=10)
  • Arm C — repos using raw Tailwind palette refs only (n=10, 9 kept)

Each repo gets walked, every Tailwind class extracted, and each class scored against the project's declared token registry. Token-conforming references count as 1; raw palette references count as 0. Per-repo conformance is the mean. Arm conformance is the mean of repo means.
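
The scoring rule can be sketched in a few lines. This is an illustration of the arithmetic only, not the shipped classifier; the raw-palette regex is a deliberately narrow stand-in covering a few common Tailwind color families.

```typescript
// Stand-in for raw palette detection (the real classifier's extraction
// and registry resolution are more involved).
const RAW_PALETTE = /^(?:bg|text|border|ring)-(?:zinc|slate|gray|neutral|blue|red|green)-\d{2,3}$/;

// Token-conforming references score 1, raw palette references score 0;
// classes outside the registry and the palette (layout utilities etc.)
// are out of scope for the conformance number.
function repoConformance(classes: string[], registry: Set<string>): number {
  const inScope = classes.filter((c) => registry.has(c) || RAW_PALETTE.test(c));
  if (inScope.length === 0) return 1; // no scoreable refs: vacuously conformant
  const conforming = inScope.filter((c) => registry.has(c)).length;
  return conforming / inScope.length;
}

// Arm conformance is the mean of repo means, not a pooled average,
// so large repos cannot dominate the arm number.
function armConformance(repoMeans: number[]): number {
  return repoMeans.reduce((a, b) => a + b, 0) / repoMeans.length;
}
```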

Two classifier modes for sensitivity:

  • v2-strict — only DESIGN.md-declared tokens count as registry. Hard test of explicit-spec adoption.
  • v2-broad (primary gate) — DESIGN.md + tailwind.config + CSS variables count as registry. Reflects shipped reality.

Both modes agree on the verdict. The pre-registered primary gate (DESIGN.md ≥ shadcn-default + 15pp absolute) was crossed at +16.87pp.

Caveats kept honest:

  • Observational, not causal. The next milestone is a generative study (same prompt, with vs. without DESIGN.md in context).
  • Single-developer corpus. External validation requires fork-and-rerun.
  • The first MVB (4+4 repos, two-arm) reported +105% relative lift but conflated shadcn-style and raw-palette in the control. The three-arm split above is the corrected number.

Re-run anytime:

python3 -m benchmarks.runner       # corpus walk + scoring
python3 -m benchmarks.parity_check # 60/60 oracle for the TS port of the scorer

Spec at benchmarks/spec-v2.md. Result at benchmarks/results/2026-05-07-phase-1-v2.md.

Roadmap

v0.1.0 (shipped 2026-05-08) — Tailwind v3 support, six packages on npm with sigstore provenance, three-arm benchmark cleared, upstream proposal PR open as draft.

v0.2.0 — Tailwind v4 support: @theme blocks as a registry source, oklch() palette in atlas, mixed-mode detection. Plan: docs/v0.2.0-plan.md.

v0.3.0 — Generative benchmark arm (causation, not correlation). Same prompt run with and without DESIGN.md in context, conformance measured at generation time. ~$50-100 in API spend.

v1.0.0 — Pending upstream maintainer signal on PR #76. If the precedence rule and role vocab land in @google/design.md, the wrapper collapses and v1.0 ships as a thin convenience layer. If they don't, v1.0 is the long-lived fork.

Contributing

The project is small and most decisions are still open. Things that would help right now:

  1. Run the benchmark on your own repos. Fork, point the runner at your codebase, open an issue with the result. External validation is the single biggest open question.
  2. Try the MCP server with your agent of choice. Open an issue with the workflow that worked (or didn't).
  3. Submit DESIGN.md examples. The 8-role vocab was derived from 4 production files. Wider sampling sharpens the canonical set.
  4. Question the role list. If your DESIGN.md needs a role outside the 8, that's a spec proposal worth making.

PRs welcome. Issues are the right place to start for anything bigger than a typo. Conventional Commits, please.

License

MIT for Atelier code. Apache-2.0 for the bundled @google/design.md dependency (preserved verbatim, full NOTICE at repo root). The wrapper layer is additive; if the upstream package adopts the extensions, this fork collapses.

Credits

  • Google Labs for the DESIGN.md spec.
  • The four anonymous DESIGN.md authors whose naming patterns shaped the canonical role list.

If Atelier saves you a debugging hour, star the repo. It's the cheapest way to tell us the lift number isn't an artifact of the corpus.
