RFC: Token Schema Structure and Validation System
Status: Draft - Implementation Complete
Author: Garth Braithwaite
DACI: [To be assigned]
Implementation: PR #644
Related: DNA-1485, RFC #624, RFC #625, RFC #626
Executive Summary
This RFC proposes a comprehensive schema structure for all Spectrum design tokens, transforming hyphen-delimited token names into structured JSON objects with full validation capabilities. This provides the foundation for advanced tooling including token recommendations, automated documentation, and cross-platform transformation.
Problem: Current token structure uses hyphen-delimited names with implicit meaning, no validation of naming conventions, and limited ability to query or analyze tokens systematically. This makes it difficult to build tooling, enforce governance, or provide semantic guidance.
Solution: Implement structured token format with JSON Schema validation, controlled vocabularies (enums), semantic analysis capabilities, and perfect round-trip conversion. Complete implementation provided in PR #644.
Results: All 2,338 tokens across 8 files successfully parsed and validated with 100% regeneration rate and 82% schema validation coverage.
Background & Context
Origin
Current Token Format
```json
{
  "text-to-visual-50": {
    "$schema": "https://opensource.adobe.com/spectrum-design-data/schemas/token-types/dimension.json",
    "value": "4px",
    "uuid": "f1bc4c85-c0dc-44bf-a156-54707f3626e9"
  }
}
```

Limitations:
Design Data System Vision
From the August 2025 onsite presentation:
Proposal
Structured Token Format
Transform token names into structured objects with full semantic information:
```json
{
  "id": "f1bc4c85-c0dc-44bf-a156-54707f3626e9",
  "$schema": "https://opensource.adobe.com/spectrum-design-data/schemas/token-types/dimension.json",
  "value": "4px",
  "name": {
    "original": "text-to-visual-50",
    "structure": {
      "category": "spacing",
      "property": "spacing",
      "spaceBetween": { "from": "text", "to": "visual" },
      "index": "50"
    },
    "semanticComplexity": 1
  },
  "validation": { "isValid": true, "errors": [] }
}
```

Token Categories
Nine primary token categories identified across all tokens:
- Spacing (`text-to-visual-50`)
- Component properties (`button-height-100`)
- Corner radius (`corner-radius-100`)
- Semantic aliases (`accent-color-100` → `{blue-800}`)
- Base colors (`blue-800`)
- Transparent colors (`blue-800`, `transparent-blue-800`)
- Gradients (`gradient-stop-1-red`)
- Typography (`bold-font-weight`, `sans-font-family`)

Schema Architecture
Base Schema Hierarchy
Enum Schemas (Controlled Vocabularies)
12 enum schemas define allowed values for token name parts:
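For illustration, an enum schema constrains one part of a token name to a controlled vocabulary. The sketch below is hypothetical (the actual definitions live under `packages/structured-tokens/schemas/enums/` in PR #644, and its values are illustrative):

```json
{
  "title": "Token state",
  "description": "Allowed interaction states in token names (illustrative subset)",
  "type": "string",
  "enum": ["default", "hover", "down", "focus", "disabled"]
}
```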
Total controlled vocabulary: 800+ values ensuring consistency
Semantic Complexity Metric
Measures how much semantic context a token provides (0-3+):
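The metric is described as a count over the semantic fields present in `name.structure`. A runnable version of that sketch follows; the exact weighting in PR #644 may differ (for example, which fields count for spacing tokens):

```javascript
// Semantic fields that can appear in a parsed name.structure.
const SEMANTIC_FIELDS = [
  "component", "property", "anatomyPart", "spaceBetween",
  "referencedToken", "options", "state", "calculation", "platform",
];

// Count how many semantic fields a structure provides.
function semanticComplexity(structure) {
  return SEMANTIC_FIELDS.filter((field) => structure[field] !== undefined).length;
}

// gray-100 carries no semantic context:
console.log(semanticComplexity({ category: "color-base", color: "gray", index: "100" })); // 0
```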
Examples:
- `gray-100`: complexity 0 (base palette, no semantic context)
- `background-color-default`: complexity 1 (semantic alias with property)
- `button-background-color-default`: complexity 2 (component + property + alias)
- `button-control-background-color-hover`: complexity 3+ (component + anatomy + property + state)

Use Case: Token recommendation systems can suggest more semantically specific tokens:
- Using `blue-800`? Consider `accent-color-100` (more semantic)
- Or `button-background-color-default` (most specific for this use case)

Validation Strategy
Schema-Driven Validation
Validation Levels
Current Validation Results
Anonymous Token Array Structure
Tokens stored as array of objects (not keyed by name):
Why:
Before (keyed by name):
```json
{
  "blue-800": {
    "value": "#1473E6",
    "uuid": "..."
  }
}
```

After (anonymous array):
```json
[
  {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "value": "#1473E6",
    "name": {
      "original": "blue-800",
      "structure": {
        "category": "color-base",
        "color": "blue",
        "index": "800"
      },
      "semanticComplexity": 0
    }
  }
]
```

Round-Trip Verification
Critical Requirement: Structured format must perfectly regenerate original token names.
Implementation:
Results: 100% match rate (2,338/2,338 tokens) - zero data loss
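A minimal sketch of the round-trip check. `regenerateName` here is a simplified stand-in for the real template-driven regenerator in `tools/token-name-parser/`, covering just two categories:

```javascript
// Regenerate a hyphenated name from a token's parsed structure (illustrative subset).
function regenerateName(token) {
  const s = token.name.structure;
  if (s.category === "color-base") return [s.color, s.index].join("-");
  if (s.category === "spacing" && s.spaceBetween) {
    return [s.spaceBetween.from, "to", s.spaceBetween.to, ...(s.options ?? []), s.index]
      .filter(Boolean)
      .join("-");
  }
  throw new Error(`No template for category: ${s.category}`);
}

// Every structured token must regenerate its original name exactly.
function verifyRoundTrip(tokens) {
  const mismatches = tokens.filter((t) => regenerateName(t) !== t.name.original);
  return { total: tokens.length, mismatches };
}

const sample = [
  { name: { original: "blue-800", structure: { category: "color-base", color: "blue", index: "800" } } },
  { name: { original: "text-to-visual-50", structure: { category: "spacing", spaceBetween: { from: "text", to: "visual" }, index: "50" } } },
];
console.log(verifyRoundTrip(sample).mismatches.length); // 0 mismatches
```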
Example Template (spacing-token.hbs):
Implementation
Complete Implementation: PR #644
Package 1: `packages/structured-tokens/`
Package 2: `tools/token-name-parser/`
Parser Capabilities
Pattern Detection:
Example Parsing:
Input: `checkbox-control-size-small`

```json
{
  "category": "component-property",
  "component": "checkbox",
  "anatomyPart": "control",
  "property": "size",
  "options": ["small"]
}
```

Input: `text-to-visual-compact-medium`

```json
{
  "category": "spacing",
  "property": "spacing",
  "spaceBetween": { "from": "text", "to": "visual" },
  "options": ["compact", "medium"]
}
```

Input: `accent-color-100` (references `{blue-800}`)

```json
{
  "category": "semantic-alias",
  "property": "accent-color-100",
  "referencedToken": "blue-800",
  "notes": "Semantic alias providing contextual naming"
}
```

Usage Examples
Query Tokens by Category
Find High-Complexity Tokens
Track Token References
Validate Token Names
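The four queries above can be sketched against a small illustrative sample of the structured array (the sample tokens below are made up for demonstration):

```javascript
// A tiny illustrative sample of the structured token array.
const tokens = [
  { name: { original: "spacing-100", structure: { category: "spacing" }, semanticComplexity: 1 },
    validation: { isValid: true, errors: [] } },
  { name: { original: "accent-color-100",
      structure: { category: "semantic-alias", referencedToken: "blue-800" },
      semanticComplexity: 2 },
    validation: { isValid: true, errors: [] } },
  { name: { original: "button-height-350", structure: { category: "component-property" },
      semanticComplexity: 1 },
    validation: { isValid: false, errors: ["index 350 not in allowed sizes"] } },
];

// Query tokens by category (e.g. all spacing tokens).
const spacingTokens = tokens.filter((t) => t.name.structure.category === "spacing");

// Find high-complexity tokens (strong semantic context for recommendations).
const semanticTokens = tokens.filter((t) => t.name.semanticComplexity >= 2);

// Track token references (all semantic aliases of blue-800).
const aliases = tokens.filter(
  (t) => t.name.structure.category === "semantic-alias" &&
         t.name.structure.referencedToken === "blue-800"
);

// Validate token names (tokens that don't match schemas and need attention).
const invalid = tokens.filter((t) => !t.validation.isValid);
```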
Benefits & Use Cases
1. Token Recommendation Systems
Enabled by semantic complexity metric and reference tracking
Use Case: IDE plugin suggests semantic alternatives
2. Automated Documentation Generation
Enabled by structured data and queryable format
Use Case: Generate token catalog by category
3. Design System Governance
Enabled by schema validation and controlled vocabularies
Use Case: CI/CD validation of token PRs
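For example, a validation run could report:

```text
$ pnpm validate-tokens
✓ All token names follow conventions
✗ Error: "button-height-350" - index 350 not in allowed sizes
✗ Error: "unknow-component-size" - component not in allowed list
```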
4. Cross-Platform Token Transformation
Enabled by structured format and perfect round-trip
Use Case: Transform tokens for different platforms
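A sketch of how one structured name could fan out to platform-specific conventions; the formatting rules below are illustrative, not the actual transformer:

```javascript
// Name parts derived from a structured token (component + property + state).
const parts = ["button", "background", "default"];

const toWeb = (p) => `--spectrum-${p.join("-")}`;                             // CSS custom property
const toIOS = (p) => p.map((w) => w[0].toUpperCase() + w.slice(1)).join("");  // PascalCase
const toAndroid = (p) => p.join("_");                                         // snake_case

console.log(toWeb(parts));     // --spectrum-button-background-default
console.log(toIOS(parts));     // ButtonBackgroundDefault
console.log(toAndroid(parts)); // button_background_default
// All from the same structured source.
```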
5. Token Migration & Deprecation
Enabled by reference tracking and semantic analysis
Use Case: Identify tokens to migrate
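The migration flow can be sketched with a minimal `findReferences` stand-in over the structured array (the helper and sample data are illustrative, not the PR #644 implementation):

```javascript
// Illustrative sample of the structured token array.
const tokens = [
  { name: { original: "accent-color-100",
      structure: { category: "semantic-alias", referencedToken: "blue-800" } } },
  { name: { original: "focus-color",
      structure: { category: "semantic-alias", referencedToken: "blue-800" } } },
  { name: { original: "gray-100", structure: { category: "color-base" } } },
];

// Find all tokens whose structure references a given token.
const findReferences = (target) =>
  tokens.filter((t) => t.name.structure.referencedToken === target);

// Find all tokens referencing deprecated blue-800.
const affectedTokens = findReferences("blue-800");

// Plan migration path blue-800 -> blue-900: update each alias's reference.
const migrated = affectedTokens.map((t) => ({
  ...t,
  name: { ...t.name, structure: { ...t.name.structure, referencedToken: "blue-900" } },
}));
```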
6. Foundation for Future RFCs
Directly enables other proposed RFCs
Alternatives Considered
Alternative 1: Keep Hyphenated Names Only
Pros: No change, existing tooling works
Cons: Can't build advanced tooling, no governance, limited querying
Decision: Rejected - doesn't meet future needs
Alternative 2: Use DTCG Format Directly
Pros: Standard format, external tool support
Cons: Doesn't capture Spectrum-specific semantics (anatomy, space-between), loses semantic complexity
Decision: Considered for future (RFC #627 proposes DTCG as additional output)
Alternative 3: Object with Names as Keys
Pros: Familiar structure, easy lookup by name
Cons: Can't have duplicate names across themes, harder round-trip
Decision: Rejected - anonymous array provides more flexibility
Alternative 4: AI/LLM-Based Parsing
Pros: Could handle more edge cases
Cons: Non-deterministic, harder to validate, slower
Decision: Rejected - rule-based parsing with schemas is more reliable
Migration & Adoption
Phase 1: Non-Breaking Addition (Complete in PR #644)
- No changes to existing token files in `packages/tokens/src/`
- Structured tokens added in `packages/structured-tokens/`
- Parser tooling added in `tools/token-name-parser/`

Phase 2: Tooling Integration (Next)
Phase 3: Authoring Workflow (Future)
Phase 4: Platform Transformation (Future)
No breaking changes to existing token consumers.
Success Metrics
Achieved in PR #644:
Future Success Metrics:
Known Limitations & Future Work
455 Special Tokens (19.5%)
Tokens that regenerate correctly but need additional schemas:
Categories:
- `component-xs-regular` bundles multiple font properties
- `drop-shadow-emphasized` has complex structure
- `swatch-border-opacity` uses direct opacity values
- `button-minimum-width-multiplier` is calculation-based
- `android-elevation` needs a platform schema

Future Schemas Needed:
- `typography-composite-token.json`
- `drop-shadow-composite-token.json`
- `multiplier-token.json`

Impact: These tokens work correctly (100% regeneration) but show as "special" in validation reports.
Edge Cases
- Multi-word component names (`focus-indicator`, `side-label-character-count`)
- Compound option values (`compact-extra-large`)

Performance Considerations
Open Questions
Related Work & References
GitHub Discussions
Jira Tickets
Implementation
Documentation (in PR #644)
- `FINAL_PROJECT_SUMMARY.md` - Complete project overview
- `ICONS_RESULTS.md` - Icons parsing results (100% validation)
- `TYPOGRAPHY_RESULTS.md` - Typography parsing results (95.2% validation)
- `LAYOUT_COMPONENT_RESULTS.md` - Layout component results (70.3% validation)
- `COLOR_FINAL_RESULTS.md` - All color files summary
- `SEMANTIC_COMPLEXITY.md` - Semantic complexity metric documentation
- `ROUND_TRIP_VERIFICATION.md` - Round-trip conversion verification

Decision Points
For Approval
For Discussion
Next Steps
Immediate (Post-Approval)
Short-term (1-2 months)
Medium-term (3-6 months)
Long-term (6-12 months)
Appendix
Appendix A: Complete Token Category Definitions
See full documentation in PR #644:
- `packages/structured-tokens/schemas/` - All schema definitions
- `packages/structured-tokens/schemas/enums/` - All enum definitions
- `tools/token-name-parser/templates/` - Regeneration templates

Appendix B: Validation Reports
Complete validation reports available in PR #644:
- `tools/token-name-parser/output/[filename]-validation-report.json`

Appendix C: Parser Implementation
Full parser source:
- `tools/token-name-parser/src/parser.js` - Token name parsing logic (838 lines)
- `tools/token-name-parser/src/validator.js` - Schema validation (242 lines)
- `tools/token-name-parser/src/name-regenerator.js` - Name regeneration (98 lines)

Appendix D: Test Coverage
All tests passing:
- `tools/token-name-parser/test/parser.test.js`
- `tools/token-name-parser/test/name-regenerator.test.js`
- `tools/token-name-parser/test/name-comparator.test.js`
- `tools/token-name-parser/test/semantic-complexity.test.js`

Feedback & Discussion
Please provide feedback on:
This RFC is open for discussion and feedback before moving to approval.