Skip to content

Universal fingerprinting + LLM-powered design review#31

Merged
nahiyankhan merged 21 commits intomainfrom
refactor/universal-fingerprinting
Apr 15, 2026
Merged

Universal fingerprinting + LLM-powered design review#31
nahiyankhan merged 21 commits intomainfrom
refactor/universal-fingerprinting

Conversation

@nahiyankhan
Copy link
Copy Markdown
Collaborator

Summary

  • Universal fingerprinting: Profile any design system from GitHub, npm, URLs, or local paths. LLM-powered (Claude Agent SDK) with filesystem exploration, producing 49-dimensional embeddings across palette, spacing, typography, and surfaces.
  • ghost review: LLM-powered design drift detection on changed files. Sends a design fingerprint + source code to Claude, which reviews for visual language violations and returns structured issues with fix suggestions. The fingerprint IS the rule set — no hardcoded rules.
  • Fleet observability: Compare N design systems, compute pairwise distances, detect clusters and outliers.
  • Evolution tracking: Acknowledge, adopt, or intentionally diverge from a parent design system with per-dimension stances.
  • GitHub Action: Scaffold for automated PR review (action/).
  • AI skill: /ghost-review for Claude Code, Cursor, Codex.
  • MCP server: Added review_files tool alongside existing component registry tools.

Key changes

  • packages/ghost-core/src/review/ — review pipeline (prompts, file collection, LLM orchestration)
  • packages/ghost-core/src/agents/ — fingerprint agent (Claude Agent SDK + filesystem tools)
  • packages/ghost-core/src/fingerprint/ — 49-dim embedding, OKLCH color science, semantic role detection
  • packages/ghost-core/src/stages/ — deterministic stages (extract, compare, comply)
  • packages/ghost-core/src/evolution/ — sync manifests, fleet comparison, temporal analysis
  • packages/ghost-cli/src/bin.ts — 12 CLI commands including ghost review
  • skills/ghost-review/SKILL.md — AI agent skill
  • action/ — GitHub Action scaffold

Test plan

  • pnpm build — TypeScript compiles cleanly
  • pnpm test — 10 test suites, 97 tests pass
  • ghost review <file> --fingerprint <fp.json> — produces structured review via Claude
  • ghost review on ghost-ui components — zero false positives on clean components, catches real drift (e.g. gap-1.5 = 6px not in spacing scale)
  • ghost profile . — generates fingerprint for any project

🤖 Generated with Claude Code

nahiyankhan and others added 21 commits April 13, 2026 08:51
…ing dims

- Expand semantic role mapping from 9 to 30+ tokens (shadcn, brand, prefix fallback)
- Add 15 new CSS token category prefixes (--space-*, --gap-*, --line-height-*, etc.)
- Compute lineHeightPattern from actual token values instead of hardcoding "normal"
- Fill all 15 reserved zero-slots in 64-dim embedding with real signal
- Expand hardcoded color regex to catch oklch, oklab, lch, lab, color() functions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntic roles, evolution hardening, AI layer

Overhaul fingerprint detection accuracy and extensibility across 5 areas:

- Color parsing: add HSL, oklch%, color-mix(), named colors, system colors;
  continuous saturation/contrast scoring; soft chroma boundary (sigmoid)
- Normalization: log-scaled component/spacing counts, raised radius/base-unit
  caps, sqrt tokenization, centralized NORM constants, borderTokenCount
- Semantic roles: 4-layer inference engine (exact, pattern, keyword, value-based)
  supporting shadcn, MUI, Chakra, and custom naming conventions
- Font matching: fuzzy comparison with normalization, Levenshtein distance,
  and font category fallback (50+ fonts mapped)
- Evolution: diverging stance re-evaluation with reconverging detection,
  per-dimension tolerance, adaptive K clustering (elbow method + K-means++),
  configurable stability threshold
- AI layer: structural analysis and fingerprint validation wired into
  profileWithAnalysis(), .env.local loading in CLI, Tailwind config resolver
- CSS resolver: value-based fallback categorization for non-standard tokens

98 tests passing (up from 24). All changes are backward compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace shadcn-coupled designSystems[] config with a universal Target system
that can fingerprint any design system from any source (local path, URL, npm
package, GitHub repo). No per-framework adapters — one generic extraction
pipeline with AI as the universal fallback.

Key changes:
- Target-based architecture replacing DesignSystemConfig/ParentSource
- Zero-config mode: ghost profile . works without ghost.config.ts
- Universal extraction: file walker → format detector → token normalizer
- Source materializers for npm (npm pack), GitHub (clone), URLs (fetch)
- Format detection: CSS custom props, Tailwind, Style Dictionary, W3C tokens
- 5 stateful agent loops + Director orchestrator (extraction, fingerprint,
  comparison, discovery, compliance)
- Discovery agent with curated catalog + live npm/GitHub search
- Compliance agent with rule engine, drift checks, SARIF output for CI
- New CLI commands: ghost profile <target>, ghost comply, ghost discover
- New reporters for compliance (CLI/JSON/SARIF) and discovery (CLI/JSON)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Profile 12 public design systems via GitHub clone: Primer, Radix, Carbon,
PatternFly, Elastic, Fluent, Ant Design, Mantine, Paste, daisyUI, Zendesk
Garden, HashiCorp Helios. Fleet comparison test runs 3 systems head-to-head.

Also fixes SCSS parse errors in the fingerprint agent and normalizer —
PostCSS chokes on // comments in SCSS files, now caught gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The deterministic extraction pipeline (format-detector → normalizer → per-format
parsers) produced sparse fingerprints — only CSS custom properties were reliably
parsed. Systems storing tokens in TS/JS objects, SCSS variables, or custom JSON
formats got near-empty results (3-5 of 64 embedding dimensions).

New pipeline: walk files → smart sample → LLM interpret → validate → embed.

- New extractors/sampler.ts: priority-scored file selection for LLM context
  (theme/token files score 10, CSS with vars score 8, components score 3)
- ExtractionAgent simplified: materialize + walk + sample, no parsing
- FingerprintAgent rewritten: LLM-first with validation self-healing loop
- Enhanced prompt handles CSS vars, SCSS, Tailwind, JS themes, JSON tokens
- LLM providers accept SampledMaterial instead of ExtractedMaterial
- profileTarget() now requires ANTHROPIC_API_KEY or OPENAI_API_KEY
- Deleted: format-detector.ts, normalizer.ts, summarize.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace brittle string-guessing heuristic with explicit prefix syntax.
Ambiguous inputs (e.g. "primer/css", "antd") now throw with a helpful
suggestion to use a prefix like "github:primer/css" or "npm:antd".

Unambiguous patterns still work without prefixes:
  ./relative, /absolute, @scope/pkg, https://...

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add native iOS detection and analysis across the pipeline: file walker
recognizes Swift/xcassets/xcconfig types, sampler scores SwiftUI theming
patterns and reads Package.swift, LLM prompts include SwiftUI-specific
extraction guidance, embeddings map iOS methodologies to platform-neutral
dimensions, comparisons normalize methodologies cross-platform, and
descriptions use pt units for iOS targets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flatten deterministic pipeline stages into plain async functions
(src/stages/), extract design signals deterministically before LLM
(src/signals/), add tool use to FingerprintAgent (search_files,
run_extractor, list_files), and publish SKILL.md files for the
agent ecosystem.

- ExtractionAgent, ComparisonAgent, ComplianceAgent → deprecated shims
  delegating to extract(), compare(), comply() stage functions
- Director uses stage functions + real agents (Fingerprint, Discovery)
- Signal extractors parse CSS, JSON tokens, Swift, components before LLM
- Signal-aware prompt: LLM validates/completes vs parsing from scratch
- LLM providers gain chat() method for multi-turn tool use
- 4 agent skills: ghost-fingerprint, ghost-drift-check, ghost-compare,
  ghost-discover (agentskills.io format)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CSS signal extractor now parses embedded CSS from shadcn registry
  JSON files (files[].content where type is registry:*)
- High-priority files (score >= 8) get 20K char budget vs 3K default,
  preventing truncation of theme/token files
- Registry style files scored at 10 (same as theme files)
- Validation allows 999/9999 border-radius as standard pill values

Discovered during live profiling of ghost-ui: styles-main.json was
truncated at 3K chars, breaking JSON parse and losing all 324 tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Director creates a new FingerprintAgent per profile() call to avoid
  shared instance state when profiling in parallel (e.g., compare)
- FingerprintAgent uses interpret() with signals instead of chat()
  tool-use loop for now — the tool-use path needs more debugging for
  production reliability
- Add fallback to interpret() if tool-use loop exhausts iterations

Tested: ghost-ui vs shadcn/ui comparison works end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three improvements to comparison accuracy:

1. Recompute oklch tuples from color value strings using deterministic
   math after LLM returns fingerprint. Don't ask the LLM to do color
   space conversion — it approximates. parseColorToOklch() is exact.

2. Compare dominant colors by semantic role name first (primary↔primary),
   fall back to positional for unmatched. Fixes false divergence when
   two systems have the same roles but in different array order.

3. Remove architecture from visual distance calculation (weight → 0).
   Methodology, tokenization, naming pattern are implementation detail,
   not visual language. Still computed and reported, just excluded from
   the overall distance metric. Weights redistributed to visual dims:
   palette=0.35, spacing=0.25, typography=0.25, surfaces=0.15.

ghost-ui vs shadcn/ui: 0.524 → 0.487 (more honest visual distance)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The signals/ module (extractSignals, DeterministicSignals, buildSignalAwarePrompt)
was a pre-parsing step that fed structured hints to the LLM. With the agentic
pipeline and smart sampler in place, the LLM handles raw file interpretation
better without intermediary heuristics. Also moves @anthropic-ai/sdk to root
devDependencies and adds tsx.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…loration

The old pipeline sampled 56 files, stuffed them into a prompt with format-
specific instructions (including an Apple system color lookup table), and
asked the LLM to parse everything in one shot. This caused hallucination:
goose2 (a direct ghost-ui consumer) was fingerprinted with Apple system
colors that don't exist in its source, producing 43% drift against its
own parent.

Now uses @anthropic-ai/claude-agent-sdk — the LLM gets Read/Glob/Grep
tools and explores the codebase itself. It finds token definition files,
reads the actual values, follows variable references. Result: ghost-ui
vs goose2 drops from 43% to 11% (honest signal, not hallucination).

Changes:
- Add fingerprint-agent.ts using Claude Agent SDK query()
- Remove platform detection (detectPlatform, Platform type, iOS prompt)
- Remove architecture from fingerprint (type, schema, embedding, compare)
- Strip format guide and Apple color table from LLM prompt
- Embedding reduced to 49 visual-only dimensions (was 64)
- Comparison uses 4 dimensions: palette, spacing, typography, surfaces
- profileTarget() now uses agent SDK directly, bypasses sampler pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ghost-ui now ships its own machine-readable identity as fingerprint.json —
consumers can compare against it directly without re-profiling the source.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Larger design systems need more exploration turns to produce complete fingerprints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README: fix Quick Start with universal target examples, add missing CLI
commands (diff, comply, discover), add Ghost MCP section, update project
structure with agents/stages, fix Configuration to match GhostConfig type.

Skills: fix ghost-compare/fleet to use file paths not targets, remove
nonexistent ghost drift command and --limit flag, add --ai/--verbose flags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces `ghost review` — sends a design fingerprint + source files to
Claude, which reviews code for visual language drift (palette, spacing,
typography, surfaces) and returns structured issues with fix suggestions.

The fingerprint IS the rule set. The LLM IS the reviewer. No hardcoded
rules — different project, different fingerprint, different review.

New:
- review/pipeline.ts — orchestrator: resolve fingerprint → collect files → LLM review
- review/prompts.ts — fingerprint-as-spec prompt for Claude
- review/file-collector.ts — git diff parsing for changed files + line numbers
- reporters/review-cli.ts, review-json.ts, github-pr.ts — output formatters
- ghost review CLI command (zero-config with .ghost-fingerprint.json)
- GitHub Action scaffold (action/)
- AI skill for Claude Code / Cursor (skills/ghost-review/)
- MCP review_files tool
- ghost-ui canonical fingerprint (.ghost-fingerprint.json)

Removed:
- test/live/profile-systems.test.ts (required API key, non-deterministic)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update @biomejs/biome to 2.4.12 (resolves platform binary issue on CI)
- Fix biome lint errors in review files (formatting, imports, non-null assertions)
- Remove --deep flag from CLI and action (review is always LLM-powered)
- Clean up action.yml inputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Apply biome format fixes (import ordering, whitespace, line length)
- Replace non-null assertions with optional chaining or null coalescing
- Remove unused imports
- Add type guard filter for nullable array in embedding.ts
- 2 remaining warnings (noAssignInExpressions in url.ts — intentional pattern)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ceptions

- Update biome.json schema to match CLI version 2.4.12
- Extract reviewCommand to review-command.ts (bin.ts was over 500 line limit)
- Add narrowly scoped exceptions for types.ts (700) and bin.ts (520)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nahiyankhan nahiyankhan merged commit 34d7d89 into main Apr 15, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant