diff --git a/.claude/commands/prime.md b/.claude/commands/prime.md
index f2eecaaed2..b497f04fc8 100644
--- a/.claude/commands/prime.md
+++ b/.claude/commands/prime.md
@@ -10,9 +10,9 @@ git ls-files
 
 @README.md
 @pyproject.toml
-@docs/vision.md
-@docs/workflow.md
-@docs/architecture/repository.md
+@docs/concepts/vision.md
+@docs/workflows/overview.md
+@docs/reference/repository.md
 
 ## Read and Execute
 
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 233ff9de26..4bd0247407 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -38,7 +38,7 @@ This file provides guidance to GitHub Copilot when working with code in this rep
 **Supported Libraries** (9 total):
 - matplotlib, seaborn, plotly, bokeh, altair, plotnine, pygal, highcharts, lets-plot
 
-**Core Principle**: Community proposes plot ideas via GitHub Issues → AI generates code → Multi-LLM quality checks → Deployed.
+**Core Principle**: Community proposes plot ideas via GitHub Issues → AI generates code → AI quality review → Deployed.
 
 ## Development Setup
 
@@ -135,8 +135,8 @@ Examples: `scatter-basic`, `scatter-color-mapped`, `bar-grouped-horizontal`, `he
 ### PR Labels (set by workflows)
 
 - **`approved`** - Human approved specification for merge
-- **`ai-approved`** - AI quality check passed (score >= 90)
-- **`ai-rejected`** - AI quality check failed (score < 90)
+- **`ai-approved`** - AI quality check passed (score >= 90, or >= 50 after 3 attempts)
+- **`ai-rejected`** - AI quality check failed (score < 90); triggers the repair loop
 - **`quality:XX`** - Quality score (e.g., `quality:92`)
 
 **Specification Lifecycle:**
@@ -146,8 +146,8 @@ Examples: `scatter-basic`, `scatter-color-mapped`, `bar-grouped-horizontal`, `he
 
 **Implementation PR Lifecycle:**
 ```
-[open] → impl-review → ai-approved → impl-merge → impl:{library}:done
-                     → ai-rejected → impl-repair (×3)
+[open] → impl-review → ai-approved (≥90) → impl-merge → impl:{library}:done
+                     → ai-rejected (<90) → impl-repair (×3) → ai-approved (≥50) or failed (<50)
 ```
 
 ## Code Standards
@@ -205,8 +205,7 @@ plt.savefig('plot.png', dpi=300, bbox_inches='tight')
 
 ### Anti-Patterns to Avoid
 
-- No `preview.png` files in repository (use GCS)
-- No `quality_report.json` files (use GitHub Issues)
+- No `preview.png` files in repository (stored in GCS)
 - No hardcoded API keys (use environment variables)
 
 ## Tech Stack
diff --git a/.github/workflows/spec-create.yml b/.github/workflows/spec-create.yml
index ee6746d685..bfdd4aac51 100644
--- a/.github/workflows/spec-create.yml
+++ b/.github/workflows/spec-create.yml
@@ -117,7 +117,7 @@ jobs:
             6. **Create specification files:**
                - Read template: `prompts/templates/specification.md`
                - Read metadata template: `prompts/templates/specification.yaml`
-               - Read tagging guide: `docs/concepts/tagging-system.md`
+               - Read tagging guide: `docs/reference/tagging-system.md`
                - Create directory: `plots/{specification-id}/`
                - Create: `plots/{specification-id}/specification.md` (follow template structure)
                - Create: `plots/{specification-id}/specification.yaml` with:
@@ -213,7 +213,7 @@ jobs:
             6. **Create specification files:**
                - Read template: `prompts/templates/specification.md`
                - Read metadata template: `prompts/templates/specification.yaml`
-               - Read tagging guide: `docs/concepts/tagging-system.md`
+               - Read tagging guide: `docs/reference/tagging-system.md`
                - Create directory: `plots/{specification-id}/`
                - Create: `plots/{specification-id}/specification.md` (follow template structure)
                - Create: `plots/{specification-id}/specification.yaml` with:
diff --git a/CLAUDE.md b/CLAUDE.md
index aa6fd50966..5fc7559c08 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -109,7 +109,7 @@ done
 - **highcharts** - Interactive web charts, stock charts (requires license for commercial use)
 - **lets-plot** - ggplot2 grammar of graphics by JetBrains, interactive
 
-**Core Principle**: Community proposes plot ideas via GitHub Issues → AI generates code → Multi-LLM quality checks → Deployed.
+**Core Principle**: Community proposes plot ideas via GitHub Issues → AI generates code → AI quality review → Deployed.
 
 ## Essential Commands
 
@@ -261,7 +261,7 @@ Example: `plots/scatter-basic/` contains everything for the basic scatter plot.
 
 1. **Repository Pattern**: Data access layer in `core/repositories/`
 2. **Async Everything**: FastAPI + SQLAlchemy async + asyncpg
-3. **Clean Repo**: Only production code in git. Quality reports → GitHub Issues. Preview images → GCS.
+3. **Clean Repo**: Only production code in git. Quality reports → `metadata/{library}.yaml`. Preview images → GCS.
 4. **Issue-Based Workflow**: GitHub Issues as state machine for plot lifecycle
 
 ### Metadata System
@@ -399,8 +399,8 @@ gs://pyplots-images/
 - **Plotting**: matplotlib, seaborn, plotly, bokeh, altair, plotnine, pygal, highcharts, lets-plot
 - **Package Manager**: uv (fast Python installer)
 - **Infrastructure**: Google Cloud Run, Cloud SQL, Cloud Storage
-- **Automation**: GitHub Actions (code workflows) + n8n Cloud (external services)
-- **AI**: Claude (code generation), Vertex AI (multi-LLM quality checks)
+- **Automation**: GitHub Actions
+- **AI**: Claude (code generation + quality review)
 
 ## Code Standards
 
@@ -475,7 +475,6 @@ uv run python -c "from core.database import is_db_configured; print(is_db_config
 - Implementation code (full Python source)
 - Implementation metadata (library, variant, quality score, generation info from metadata/*.yaml)
 - GCS URLs for preview images
-- Social media promotion queue
 
 **What's in Repository** (source of truth):
 - Everything in `plots/{specification-id}/`:
@@ -486,7 +485,6 @@ uv run python -c "from core.database import is_db_configured; print(is_db_config
 
 **What's NOT Stored in DB**:
 - Preview images (in GCS)
-- Detailed quality reports (in GitHub Issues, summary in metadata)
 
 **Migrations**: Managed with Alembic
 ```bash
@@ -511,8 +509,7 @@ The `prompts/` directory contains AI agent prompts for code generation, quality
 | `plot-generator.md` | Base rules for all plot implementations |
 | `library/*.md` | Library-specific rules (9 files) |
 | `quality-criteria.md` | Definition of code/visual quality |
-| `quality-evaluator.md` | Multi-LLM evaluation prompt |
-| `auto-tagger.md` | Automatic tagging across 5 dimensions |
+| `quality-evaluator.md` | AI quality evaluation prompt |
 | `spec-validator.md` | Validates plot request issues |
 | `spec-id-generator.md` | Assigns unique spec IDs |
 
@@ -918,12 +915,12 @@ pytest --pdb       # Debug on failure
 
 ## Key Documentation Files
 
-- **docs/development.md**: Development setup, testing, deployment
-- **docs/workflow.md**: Automation flows (Discovery → Deployment → Social)
-- **docs/specs-guide.md**: How to write plot specifications
-- **docs/architecture/repository.md**: Directory structure
-- **docs/architecture/api.md**: API endpoints reference
-- **docs/architecture/database.md**: Database schema
+- **docs/contributing.md**: How to add/improve specs and implementations
+- **docs/workflows/overview.md**: Automation flows and label system
+- **docs/concepts/vision.md**: Product vision
+- **docs/reference/repository.md**: Directory structure
+- **docs/reference/api.md**: API endpoints reference
+- **docs/reference/database.md**: Database schema
 - **prompts/README.md**: AI agent prompt system
 
 ## Project Philosophy
@@ -932,5 +929,5 @@ pytest --pdb       # Debug on failure
 - **Spec improvements over code fixes**: If a plot has issues, improve the spec, not the code
 - **Your data first**: Examples work with real user data, not fake data
 - **Community-driven**: Anyone can propose plots via GitHub Issues
-- **Multi-LLM quality**: Claude + Gemini + GPT ensure quality (score ≥90 required)
-- **Full transparency**: All feedback documented in GitHub Issues, not hidden in repo files
+- **AI quality review**: Claude evaluates quality (≥90 merges immediately, <90 enters the repair loop, ≥50 required after 3 attempts)
+- **Full transparency**: All quality feedback stored in repository (`metadata/{library}.yaml`)
diff --git a/README.md b/README.md
index 3b5d92a271..80515b6617 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@
 maintains plotting examples. Browse hundreds of plots across all major Python libraries - matplotlib, seaborn, plotly,
 bokeh, altair, plotnine, pygal, highcharts, and lets-plot.
 
-**Community-driven, AI-maintained** - Propose plot ideas via GitHub Issues, AI generates the code, multi-LLM quality
+**Community-driven, AI-maintained** - Propose plot ideas via GitHub Issues, AI generates the code, automated quality
 checks ensure excellence. Zero manual coding required.
 
 ---
@@ -29,37 +29,11 @@ checks ensure excellence. Zero manual coding required.
 - **Compare libraries** - View matplotlib, seaborn, plotly side-by-side for the same plot
 - **Always current** - AI agents continuously update examples with latest library versions
 - **Natural language search** - Find plots by asking "show correlation between variables"
-- **Multi-LLM quality checks** - Claude + Gemini + GPT ensure every plot meets quality standards
+- **AI quality review** - Claude evaluates every plot against quality standards (score ≥ 90 merges immediately; ≥ 50 is the minimum after repair attempts)
 - **Open source** - Community proposes ideas via Issues, AI generates the code
 
 ---
 
-## Quick Start
-
-```bash
-# Clone repository
-git clone https://github.com/MarkusNeusinger/pyplots.git
-cd pyplots
-
-# Install dependencies with uv (fast!)
-curl -LsSf https://astral.sh/uv/install.sh | sh
-uv sync --all-extras
-
-# Database setup (optional - API works without DB in limited mode)
-cp .env.example .env
-# Edit .env with your DATABASE_URL
-
-# Run migrations
-uv run alembic upgrade head
-
-# Start backend
-uv run uvicorn api.main:app --reload
-
-# Visit http://localhost:8000/docs
-```
-
----
-
 ## Architecture
 
 **Specification-first design**: Every plot starts as a Markdown spec (library-agnostic), then AI generates
@@ -81,9 +55,9 @@ plots/scatter-basic/
 
 **Issue-based workflow**: GitHub Issues as state machine for plot lifecycle. Status tracked via live-updating table (no sub-issues). Each library generates in parallel, creating PRs to a feature branch.
 
-**AI quality review**: Claude evaluates generated plots (score ≥ 90 required). Automatic feedback loops (max 3 attempts per library). Quality scores flow via PR labels → per-library metadata files.
+**AI quality review**: Claude evaluates generated plots. Score ≥ 90 → immediate merge. Score < 90 → repair loop (max 3 attempts). After 3 attempts: ≥ 50 → merge, < 50 → failed.
 
-See [docs/architecture/](docs/architecture/) for details.
+See [docs/reference/](docs/reference/) for details.
 
 ---
 
@@ -97,9 +71,9 @@ See [docs/architecture/](docs/architecture/) for details.
 
 **Infrastructure**: Google Cloud Run • Cloud SQL • Cloud Storage
 
-**Automation**: GitHub Actions • n8n Cloud Pro
+**Automation**: GitHub Actions
 
-**AI**: Claude (Code Max) • Vertex AI (Multi-LLM)
+**AI**: Claude (code generation + quality review)
 
 ---
 
@@ -115,31 +89,28 @@ Most plotting libraries are fully open source. Note these exceptions:
 
 ```
 pyplots/
-├── plots/              # Plot-centric directories (spec + metadata + implementations)
-│   └── {spec-id}/
-│       ├── specification.md
-│       ├── specification.yaml
-│       ├── metadata/
-│       └── implementations/
+├── plots/              # Plot specs + metadata + implementations
 ├── prompts/            # AI agent prompts
-├── core/               # Shared business logic
 ├── api/                # FastAPI backend
-├── app/                # React frontend (Vite + MUI)
-├── tests/              # Test suite (pytest)
-└── docs/               # Documentation
+├── app/                # React frontend
+├── core/               # Shared business logic
+├── automation/         # Workflow scripts (sync, labels)
+├── tests/              # Test suite (unit, integration, e2e)
+├── alembic/            # Database migrations
+├── docs/               # Documentation
+└── .github/workflows/  # GitHub Actions
 ```
 
-**For detailed structure and file organization**, see [Repository Structure](docs/architecture/repository.md)
+**For details**, see [Repository Structure](docs/reference/repository.md).
 
 ---
 
 ## Documentation
 
-- **[Vision](docs/vision.md)** - Product vision and mission
-- **[Workflow](docs/workflow.md)** - Automation flows (Discovery → Deployment → Social Media)
-- **[Development](docs/development.md)** - Local setup, testing, deployment
-- **[Specs Guide](docs/specs-guide.md)** - How to write plot specifications
-- **[Architecture](docs/architecture/)** - API, database, repository structure
+- **[Vision](docs/concepts/vision.md)** - Product vision and mission
+- **[Contributing](docs/contributing.md)** - How to add/improve specs and implementations
+- **[Workflows](docs/workflows/overview.md)** - Automation flows and label system
+- **[Reference](docs/reference/)** - API, database, repository structure
 
 ---
 
@@ -160,32 +131,19 @@ We welcome contributions! **All code is AI-generated** - you propose ideas, AI i
 2. AI generates spec, creates feature branch
 3. Maintainer reviews and adds `approved` label
 4. 9 library implementations generate in parallel (tracked via live status table)
-5. AI quality review per library (score ≥ 90 required)
+5. AI quality review per library (≥ 90 merges immediately, < 90 enters the repair loop, ≥ 50 is the final threshold after 3 attempts)
 6. Auto-merge to feature branch, then to main
 
 **Important**: Don't submit code directly! If a plot has quality issues, it means the spec needs improvement, not the
 code.
 
-See [development.md](docs/development.md) for details.
+See [contributing.md](docs/contributing.md) for details.
 
 ---
 
 ## Development
 
-```bash
-# Install dependencies (uv is a fast Python package installer)
-uv sync --all-extras
-
-# Run tests
-uv run pytest
-
-# Start backend
-uv run uvicorn api.main:app --reload
-```
-
-**For detailed development setup, testing, and code quality tools**, see [Development Guide](docs/development.md)
-
-**Python versions**: 3.10+ | **Coverage target**: 90%+
+See **[Development Guide](docs/development.md)** for local setup instructions.
 
 ---
 
diff --git a/docs/architecture/api.md b/docs/architecture/api.md
deleted file mode 100644
index 7967af4ccd..0000000000
--- a/docs/architecture/api.md
+++ /dev/null
@@ -1,786 +0,0 @@
-# 🔌 API Specification
-
-## Overview
-
-The pyplots API is a **FastAPI-based REST API** that serves as the central data access layer for all components: frontend, n8n workflows, and GitHub Actions.
-
-**Base URL**: `https://api.pyplots.ai`
-
-**Key Principle**: All database access goes through the API - no direct database connections from frontend or automation tools.
-
----
-
-## Authentication
-
-### Public Endpoints
-
-No authentication required:
-- Browse plots
-- View specs
-- Search
-
-### Authenticated Endpoints
-
-API key required (header):
-```http
-Authorization: Bearer {api_key}
-```
-
-Used for:
-- User data upload
-- Plot generation with custom data
-- Internal automation endpoints
-
----
-
-## Core Endpoints
-
-### 1. Specs
-
-#### GET `/specs`
-
-**Purpose**: List all plot specifications
-
-**Query Parameters**:
-- `tags` (optional): Comma-separated tags to filter by
-- `search` (optional): Search in title and description
-- `limit` (optional): Number of results (default: 50, max: 100)
-- `offset` (optional): Pagination offset
-
-**Response**:
-```json
-{
-  "specs": [
-    {
-      "id": "scatter-basic-001",
-      "title": "Basic 2D Scatter Plot",
-      "description": "Create a simple scatter plot...",
-      "tags": ["correlation", "bivariate", "basic"],
-      "implementation_count": 3,
-      "best_quality_score": 92.0,
-      "created_at": "2025-01-15T10:00:00Z"
-    }
-  ],
-  "total": 42,
-  "limit": 50,
-  "offset": 0
-}
-```
-
-**Example**:
-```bash
-GET /specs?tags=correlation,finance&limit=10
-```
-
----
-
-#### GET `/specs/{spec_id}`
-
-**Purpose**: Get detailed information about a specific spec
-
-**Response**:
-```json
-{
-  "id": "scatter-basic-001",
-  "title": "Basic 2D Scatter Plot",
-  "description": "Create a simple scatter plot...",
-  "data_requirements": [
-    {
-      "name": "x",
-      "type": "numeric",
-      "description": "X-axis values"
-    },
-    {
-      "name": "y",
-      "type": "numeric",
-      "description": "Y-axis values"
-    }
-  ],
-  "optional_params": [
-    {
-      "name": "color",
-      "type": "string|column",
-      "default": null,
-      "description": "Point color or column for mapping"
-    },
-    {
-      "name": "alpha",
-      "type": "float",
-      "default": 0.8,
-      "description": "Transparency (0-1)"
-    }
-  ],
-  "tags": ["correlation", "bivariate", "basic"],
-  "use_cases": [
-    "Correlation analysis",
-    "Outlier detection"
-  ],
-  "implementations": [
-    {
-      "library": "matplotlib",
-      "variant": "default",
-      "quality_score": 92.0,
-      "preview_url": "https://storage.googleapis.com/...",
-      "python_version": "3.10+"
-    },
-    {
-      "library": "seaborn",
-      "variant": "default",
-      "quality_score": 90.0,
-      "preview_url": "https://storage.googleapis.com/...",
-      "python_version": "3.10+"
-    }
-  ],
-  "created_at": "2025-01-15T10:00:00Z",
-  "updated_at": "2025-01-16T14:30:00Z"
-}
-```
-
----
-
-#### GET `/specs/{spec_id}/markdown`
-
-**Purpose**: Get the original spec as Markdown
-
-**Response**:
-```markdown
-# scatter-basic-001: Basic 2D Scatter Plot
-
-## Description
-
-Create a simple scatter plot showing the relationship...
-
-## Data Requirements
-
-- **x**: Numeric values for x-axis
-- **y**: Numeric values for y-axis
-
-...
-```
-
-**Content-Type**: `text/markdown`
-
----
-
-### 2. Implementations
-
-#### GET `/specs/{spec_id}/implementations`
-
-**Purpose**: Get all implementations for a spec
-
-**Query Parameters**:
-- `library` (optional): Filter by library (matplotlib, seaborn, etc.)
-- `variant` (optional): Filter by variant (default, ggplot_style, etc.)
-
-**Response**:
-```json
-{
-  "spec_id": "scatter-basic-001",
-  "implementations": [
-    {
-      "id": "550e8400-e29b-41d4-a716-446655440000",
-      "library": "matplotlib",
-      "library_name": "Matplotlib",
-      "plot_function": "scatter",
-      "variant": "default",
-      "quality_score": 92.0,
-      "preview_url": "https://storage.googleapis.com/...",
-      "python_version": "3.10+",
-      "tested": true,
-      "created_at": "2025-01-15T10:30:00Z"
-    }
-  ]
-}
-```
-
----
-
-#### GET `/specs/{spec_id}/implementations/{library}/{variant}/code`
-
-**Purpose**: Get the implementation code
-
-**Response**:
-```python
-import matplotlib.pyplot as plt
-import pandas as pd
-
-
-def create_plot(data: pd.DataFrame, x: str, y: str, **kwargs):
-    """
-    Implementation for scatter-basic-001 using matplotlib
-
-    Args:
-        data: Input DataFrame
-        x: Column name for x-axis
-        y: Column name for y-axis
-        **kwargs: Additional parameters (color, size, alpha, etc.)
-
-    Returns:
-        matplotlib Figure object
-    """
-    fig, ax = plt.subplots(figsize=(10, 6))
-
-    ax.scatter(data[x], data[y], **kwargs)
-    ax.set_xlabel(x)
-    ax.set_ylabel(y)
-    ax.grid(True, alpha=0.3)
-
-    return fig
-```
-
-**Content-Type**: `text/x-python`
-
----
-
-### 3. Plot Generation
-
-#### POST `/plots/generate`
-
-**Purpose**: Generate plot with user's data
-
-**Authentication**: Required (API key)
-
-**Request**:
-```json
-{
-  "spec_id": "scatter-basic-001",
-  "library": "matplotlib",
-  "variant": "default",
-  "data": {
-    "x": [1, 2, 3, 4, 5],
-    "y": [2, 4, 6, 8, 10]
-  },
-  "params": {
-    "color": "blue",
-    "alpha": 0.8,
-    "title": "My Scatter Plot"
-  }
-}
-```
-
-**Alternative (CSV upload)**:
-```http
-POST /plots/generate
-Content-Type: multipart/form-data
-
-spec_id=scatter-basic-001
-library=matplotlib
-variant=default
-x=column1
-y=column2
-file={csv_file}
-```
-
-**Response**:
-```json
-{
-  "image_url": "https://storage.googleapis.com/pyplots-images/generated/{session_id}/{plot_id}.png",
-  "code": "import matplotlib.pyplot as plt\nimport pandas as pd\n\n...",
-  "expires_at": "2025-01-19T10:00:00Z"
-}
-```
-
-**Notes**:
-- Image auto-deleted after 24 hours
-- No user data stored permanently
-- Maximum data size: 10 MB
-
----
-
-### 4. Libraries
-
-#### GET `/libraries`
-
-**Purpose**: List all supported plotting libraries
-
-**Response**:
-```json
-{
-  "libraries": [
-    {
-      "id": "matplotlib",
-      "name": "Matplotlib",
-      "version": "3.8.0",
-      "documentation_url": "https://matplotlib.org",
-      "implementation_count": 42,
-      "active": true
-    },
-    {
-      "id": "seaborn",
-      "name": "Seaborn",
-      "version": "0.13.0",
-      "documentation_url": "https://seaborn.pydata.org",
-      "implementation_count": 38,
-      "active": true
-    }
-  ]
-}
-```
-
----
-
-### 5. Search & Discovery
-
-#### GET `/search`
-
-**Purpose**: Full-text search across specs
-
-**Query Parameters**:
-- `q`: Search query
-- `tags` (optional): Filter by tags
-- `libraries` (optional): Filter by available libraries
-- `limit`: Results limit (default: 20)
-
-**Response**:
-```json
-{
-  "results": [
-    {
-      "spec_id": "scatter-basic-001",
-      "title": "Basic 2D Scatter Plot",
-      "description": "Create a simple scatter plot...",
-      "relevance_score": 0.95,
-      "tags": ["correlation", "bivariate"],
-      "preview_url": "https://storage.googleapis.com/..."
-    }
-  ],
-  "total": 5,
-  "query": "correlation analysis"
-}
-```
-
----
-
-#### GET `/tags`
-
-**Purpose**: Get all available tags with counts
-
-**Response**:
-```json
-{
-  "tags": [
-    {
-      "tag": "correlation",
-      "count": 15,
-      "confidence": 1.0
-    },
-    {
-      "tag": "finance",
-      "count": 8,
-      "confidence": 0.95
-    }
-  ]
-}
-```
-
----
-
-#### GET `/similar/{spec_id}`
-
-**Purpose**: Find similar plots (based on tags and description)
-
-**Response**:
-```json
-{
-  "spec_id": "scatter-basic-001",
-  "similar": [
-    {
-      "spec_id": "scatter-advanced-005",
-      "similarity_score": 0.85,
-      "title": "Advanced Scatter Plot with Regression",
-      "preview_url": "https://storage.googleapis.com/..."
-    }
-  ]
-}
-```
-
----
-
-## Internal/Automation Endpoints
-
-### 6. Deployment Management
-
-#### POST `/internal/sync-from-repo`
-
-**Purpose**: Sync metadata from repository to database
-
-**Authentication**: Service account only
-
-**Request**:
-```json
-{
-  "trigger": "deployment"
-}
-```
-
-**Response**:
-```json
-{
-  "synced": {
-    "specs": 5,
-    "implementations": 15
-  },
-  "errors": []
-}
-```
-
-**Usage**: Called by GitHub Actions after deployment
-
----
-
-#### POST `/internal/specs/{spec_id}/deployed`
-
-**Purpose**: Mark spec as deployed and add to promotion queue
-
-**Authentication**: Service account only
-
-**Request**:
-```json
-{
-  "quality_score": 92.0,
-  "preview_url": "https://storage.googleapis.com/..."
-}
-```
-
-**Response**:
-```json
-{
-  "status": "deployed",
-  "added_to_promotion_queue": true
-}
-```
-
----
-
-### 7. Promotion Queue
-
-#### GET `/internal/promotion-queue`
-
-**Purpose**: Get next item from promotion queue
-
-**Authentication**: Service account (n8n)
-
-**Query Parameters**:
-- `limit`: Number of items (default: 1)
-- `platform`: Filter by platform (twitter, linkedin, etc.)
-
-**Response**:
-```json
-{
-  "items": [
-    {
-      "id": "660e8400-e29b-41d4-a716-446655440000",
-      "spec_id": "scatter-basic-001",
-      "title": "Basic 2D Scatter Plot",
-      "quality_score": 92.0,
-      "preview_url": "https://storage.googleapis.com/...",
-      "platform": "twitter",
-      "priority": "high"
-    }
-  ],
-  "daily_count": 1,
-  "limit_reached": false
-}
-```
-
----
-
-#### POST `/internal/promotion-queue/{id}/mark-posted`
-
-**Purpose**: Mark promotion as posted
-
-**Authentication**: Service account (n8n)
-
-**Request**:
-```json
-{
-  "platform": "twitter",
-  "post_url": "https://twitter.com/pyplots/status/123456789"
-}
-```
-
-**Response**:
-```json
-{
-  "status": "posted",
-  "posted_at": "2025-01-18T15:00:00Z"
-}
-```
-
----
-
-## Error Responses
-
-### Standard Error Format
-
-```json
-{
-  "error": {
-    "code": "VALIDATION_ERROR",
-    "message": "Invalid spec_id format",
-    "details": {
-      "field": "spec_id",
-      "expected": "Format: {type}-{variant}-{number}"
-    }
-  }
-}
-```
-
-### Error Codes
-
-| Code | HTTP Status | Description |
-|------|-------------|-------------|
-| `VALIDATION_ERROR` | 400 | Invalid request parameters |
-| `NOT_FOUND` | 404 | Resource not found |
-| `UNAUTHORIZED` | 401 | Missing or invalid API key |
-| `RATE_LIMIT_EXCEEDED` | 429 | Too many requests |
-| `SERVER_ERROR` | 500 | Internal server error |
-| `DATA_TOO_LARGE` | 413 | Uploaded data exceeds 10 MB |
-| `GENERATION_FAILED` | 500 | Plot generation failed |
-
----
-
-## Rate Limiting
-
-### Public Endpoints
-
-- 100 requests per minute per IP
-- 1000 requests per hour per IP
-
-### Authenticated Endpoints
-
-- 1000 requests per minute per API key
-- 10000 requests per hour per API key
-
-### Headers
-
-Response includes rate limit headers:
-```http
-X-RateLimit-Limit: 100
-X-RateLimit-Remaining: 95
-X-RateLimit-Reset: 1705680000
-```
-
----
-
-## CORS Configuration
-
-### Allowed Origins
-
-```python
-# Development
-CORS_ORIGINS = [
-    "http://localhost:3000",
-    "http://127.0.0.1:3000"
-]
-
-# Production
-CORS_ORIGINS = [
-    "https://pyplots.ai",
-    "https://www.pyplots.ai"
-]
-```
-
-### Allowed Methods
-
-```
-GET, POST, OPTIONS
-```
-
----
-
-## Caching
-
-### Response Caching
-
-Public endpoints cached with appropriate headers:
-
-```http
-Cache-Control: public, max-age=3600
-ETag: "abc123"
-```
-
-### Cache Invalidation
-
-- Specs: Invalidate on deployment
-- Implementations: Invalidate on update
-- Libraries: Invalidate on version change
-
----
-
-## Request/Response Examples
-
-### Browse Plots with Filtering
-
-```bash
-curl "https://api.pyplots.ai/specs?tags=correlation,finance&limit=5"
-```
-
-### Get Specific Spec
-
-```bash
-curl "https://api.pyplots.ai/specs/scatter-basic-001"
-```
-
-### Get Implementation Code
-
-```bash
-curl "https://api.pyplots.ai/specs/scatter-basic-001/implementations/matplotlib/default/code"
-```
-
-### Generate Plot with User Data
-
-```bash
-curl -X POST "https://api.pyplots.ai/plots/generate" \
-  -H "Authorization: Bearer YOUR_API_KEY" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "spec_id": "scatter-basic-001",
-    "library": "matplotlib",
-    "data": {
-      "x": [1, 2, 3, 4, 5],
-      "y": [2, 4, 6, 8, 10]
-    },
-    "params": {
-      "color": "blue",
-      "title": "My Data"
-    }
-  }'
-```
-
----
-
-## Client SDKs
-
-### Python Client
-
-```python
-from pyplots import Client
-
-client = Client(api_key="YOUR_API_KEY")
-
-# Browse specs
-specs = client.specs.list(tags=["correlation"])
-
-# Get spec details
-spec = client.specs.get("scatter-basic-001")
-
-# Generate plot
-plot = client.plots.generate(
-    spec_id="scatter-basic-001",
-    library="matplotlib",
-    data={"x": [1, 2, 3], "y": [2, 4, 6]}
-)
-
-# Download image
-plot.save("output.png")
-
-# Get code
-print(plot.code)
-```
-
-### JavaScript Client
-
-```javascript
-import { PyplotsClient } from '@pyplots/client';
-
-const client = new PyplotsClient({ apiKey: 'YOUR_API_KEY' });
-
-// Browse specs
-const specs = await client.specs.list({ tags: ['correlation'] });
-
-// Get spec details
-const spec = await client.specs.get('scatter-basic-001');
-
-// Generate plot
-const plot = await client.plots.generate({
-  specId: 'scatter-basic-001',
-  library: 'matplotlib',
-  data: { x: [1, 2, 3], y: [2, 4, 6] }
-});
-
-// Get image URL
-console.log(plot.imageUrl);
-```
-
----
-
-## API Versioning
-
-### Current Version
-
-API version: `v1` (implicit in URLs)
-
-### Future Versioning
-
-When breaking changes needed:
-- `/v2/specs` (new version)
-- `/specs` (alias to latest)
-- Old versions deprecated with 6-month notice
-
----
-
-## Health & Status
-
-### GET `/health`
-
-**Purpose**: Health check
-
-**Response**:
-```json
-{
-  "status": "healthy",
-  "version": "1.0.0",
-  "database": "connected",
-  "storage": "accessible"
-}
-```
-
-### GET `/status`
-
-**Purpose**: System status
-
-**Response**:
-```json
-{
-  "api": "operational",
-  "database": "operational",
-  "storage": "operational",
-  "stats": {
-    "total_specs": 42,
-    "total_implementations": 126,
-    "active_libraries": 3
-  }
-}
-```
-
----
-
-## Security
-
-### Input Validation
-
-- All inputs validated with Pydantic
-- SQL injection prevention (SQLAlchemy ORM)
-- File upload size limits (10 MB)
-- Allowed file types: CSV, Excel, JSON
-
-### Sandboxed Execution
-
-Plot generation runs in sandboxed environment:
-- Import whitelist (pandas, numpy, matplotlib, etc.)
-- Time limit: 30 seconds
-- Memory limit: 512 MB
-- No file system access
-
-### Data Privacy
-
-- User data never stored permanently
-- Generated plots deleted after 24 hours
-- No tracking of data content
-- Anonymous session IDs only
-
----
-
-*For database schema, see [database.md](./database.md)*
-*For automation workflows, see `.github/workflows/`*
diff --git a/docs/concepts/ab-testing-rules.md b/docs/concepts/ab-testing-rules.md
deleted file mode 100644
index bd867ff901..0000000000
--- a/docs/concepts/ab-testing-rules.md
+++ /dev/null
@@ -1,798 +0,0 @@
-# 🧪 A/B Testing Rules: Comparison Strategies
-
-## Overview
-
-When you create a new version of generation or evaluation rules, you need to **scientifically prove** it's better than the current version before deploying it. This document explores different strategies for A/B testing rule versions.
-
-## The Core Challenge
-
-**Problem**: You have two rule versions and need to answer:
-- Is the new version objectively better?
-- For which metrics? (quality score, generation time, pass rate)
-- By how much? (statistical significance)
-- For all plot types or just some?
-
-**Requirements**:
-- Compare same specs with both rule versions
-- Objective metrics (not subjective)
-- Statistical validity (enough samples)
-- Visual comparison (side-by-side images)
-- Cost-conscious (minimize AI API calls)
-
----
-
-## Approach 1: Parallel Generation
-
-### Concept
-
-Generate plots **simultaneously** with both rule versions and compare results.
-
-```
-Spec: scatter-basic-001
-    │
-    ├─→ Generate with v1.0.0 → plot_v1.png + metrics_v1
-    │
-    └─→ Generate with v2.0.0 → plot_v2.png + metrics_v2
-
-Compare: metrics_v1 vs metrics_v2
-```
-
-### Workflow
-
-```mermaid
-graph LR
-    A[Test Specs] --> B[Generate with v1.0.0]
-    A --> C[Generate with v2.0.0]
-    B --> D[Collect Metrics]
-    C --> D
-    D --> E[Statistical Comparison]
-    E --> F[Report with Visuals]
-```
-
-### Implementation
-
-```bash
-# Command-line tool
-python automation/testing/ab_parallel.py \
-  --baseline v1.0.0 \
-  --candidate v2.0.0 \
-  --specs scatter-basic-001,heatmap-corr-002,bar-grouped-004 \
-  --runs 10 \
-  --output comparison-report.html
-```
-
-### Pros
-
-✅ **Fair comparison**: Both versions tested under identical conditions
-✅ **No bias**: Same timestamp, same LLM state, same randomness
-✅ **Fast results**: Get answers quickly
-✅ **Easy to automate**: Can run in CI/CD
-
-### Cons
-
-❌ **Expensive**: Doubles AI API costs (generate everything twice)
-❌ **Requires both implementations**: Need v1 and v2 automation code
-❌ **Resource intensive**: Doubles compute and storage
-
-### Best For
-
-- Final validation before deploying new rules
-- Small test sets (5-10 specs)
-- Critical decisions (major version bumps)
-- When budget allows doubling AI costs
-
----
-
-## Approach 2: Historical Comparison
-
-### Concept
-
-Generate plots with **new** version only, compare metrics against **historical** results from the old version.
-
-```
-Spec: scatter-basic-001
-    │
-    └─→ Generate with v2.0.0 → plot_v2.png + metrics_v2
-
-Database: metrics_v1 (from past generations with v1.0.0)
-
-Compare: metrics_v2 vs historical metrics_v1
-```
-
-### Workflow
-
-```mermaid
-graph LR
-    A[Test Specs] --> B[Generate with v2.0.0]
-    B --> C[Collect Metrics]
-    D[Database: v1.0.0 Results] --> E[Historical Metrics]
-    C --> F[Compare vs History]
-    E --> F
-    F --> G[Report]
-```
-
-### Implementation
-
-```bash
-# Generate with new version
-python automation/testing/ab_historical.py \
-  --candidate v2.0.0 \
-  --baseline-from-db v1.0.0 \
-  --specs scatter-basic-001,heatmap-corr-002 \
-  --output comparison-report.html
-```
-
-### Database Query
-
-```sql
--- Get historical metrics for v1.0.0
-SELECT
-  spec_id,
-  AVG(quality_score) as avg_score,
-  AVG(generation_time_seconds) as avg_time,
-  COUNT(*) as sample_size
-FROM implementations
-WHERE generation_ruleset_version = 'v1.0.0'
-  AND spec_id IN ('scatter-basic-001', 'heatmap-corr-002')
-GROUP BY spec_id;
-```
-
-### Pros
-
-✅ **Cost-effective**: Only generate with new version
-✅ **Fast**: No need to regenerate with old version
-✅ **Scalable**: Can compare against large historical dataset
-✅ **Continuous**: Always comparing against production baseline
-
-### Cons
-
-❌ **Timing bias**: Old results from different time (different LLM version?)
-❌ **Context drift**: Libraries may have updated between v1 and v2
-❌ **Sample variance**: Historical data may be noisy
-❌ **No visual comparison**: Can't show side-by-side images (old images may not exist)
-
-### Best For
-
-- Quick preliminary checks
-- Large-scale comparisons (100+ specs)
-- Continuous monitoring
-- When budget is tight
-- Minor version updates (low risk)
-
----
-
-## Approach 3: Staged Rollout
-
-### Concept
-
-Deploy new version to a **small percentage** of plots first, monitor performance, gradually increase.
-
-```
-Day 1: 10% of new plots use v2.0.0, 90% use v1.0.0
-       Monitor metrics for 24 hours
-
-Day 2: If good → 25% use v2.0.0
-       If bad → Rollback to 0%
-
-Day 3: 50% → 75% → 100%
-```
-
-### Workflow
-
-```mermaid
-graph TD
-    A[Deploy v2.0.0] --> B[10% Traffic]
-    B --> C{Metrics OK?}
-    C -->|Yes| D[Increase to 25%]
-    C -->|No| E[Rollback to 0%]
-    D --> F{Metrics OK?}
-    F -->|Yes| G[Increase to 50%]
-    F -->|No| E
-    G --> H[Eventually 100%]
-```
-
-### Implementation
-
-```python
-# automation/rollout/canary.py
-import hashlib
-
-class CanaryRollout:
-    def select_rule_version(self, spec_id: str) -> str:
-        """
-        Returns which rule version to use for this generation
-
-        Uses consistent hashing to ensure:
-        - Same spec always gets same version (during rollout)
-        - Percentage split is accurate
-        """
-        rollout_percentage = get_current_rollout_percentage()
-
-        # Hash spec_id to get consistent assignment
-        hash_value = int(hashlib.md5(spec_id.encode()).hexdigest(), 16)
-        bucket = hash_value % 100
-
-        if bucket < rollout_percentage:
-            return "v2.0.0"  # New version
-        else:
-            return "v1.0.0"  # Current stable version
-```
-
-### Monitoring
-
-```bash
-# Real-time monitoring dashboard
-python automation/rollout/monitor.py \
-  --new-version v2.0.0 \
-  --baseline v1.0.0 \
-  --metrics quality_score,generation_time,pass_rate \
-  --window 24h \
-  --auto-rollback-threshold -5%
-```
-
-### Pros
-
-✅ **Safe**: Limits blast radius if new version has issues
-✅ **Real production data**: Testing with actual usage patterns
-✅ **Gradual**: Can abort anytime
-✅ **Continuous feedback**: Real-time metrics
-✅ **Cost-effective**: Not duplicating work
-
-### Cons
-
-❌ **Slow**: Takes days to fully roll out
-❌ **Requires monitoring**: Someone needs to watch metrics
-❌ **Mixed state**: System has two versions running simultaneously
-❌ **Rollback complexity**: Need to invalidate cached results
-
-### Best For
-
-- Major version changes (high risk)
-- Production deployments
-- When you have time (not urgent)
-- When you have monitoring infrastructure
-- Large-scale systems with many users
-
----
-
-## Approach 4: Hybrid (Recommended)
-
-### Concept
-
-Combine multiple approaches for **balance of speed, cost, and confidence**.
-
-```
-Phase 1: Historical Comparison (Quick & Cheap)
-         ↓ If promising
-Phase 2: Parallel Generation on Small Set (5 specs)
-         ↓ If good
-Phase 3: Staged Rollout (10% → 50% → 100%)
-```
-
-### Workflow
-
-```mermaid
-graph TD
-    A[New Rule v2.0.0] --> B[Phase 1: Historical]
-    B --> C{Promising?}
-    C -->|No| Z[Reject v2.0.0]
-    C -->|Yes| D[Phase 2: Parallel 5 specs]
-    D --> E{Quality OK?}
-    E -->|No| Z
-    E -->|Yes| F[Phase 3: Canary 10%]
-    F --> G{Monitor 24h}
-    G -->|Issues| H[Rollback]
-    G -->|Good| I[Increase to 50%]
-    I --> J[Monitor 24h]
-    J -->|Good| K[Deploy 100%]
-```
-
-### Decision Tree
-
-```
-┌─────────────────────────────────────────────────┐
-│ Phase 1: Historical Comparison                  │
-│ Cost: Low | Speed: Fast | Confidence: Medium    │
-│                                                  │
-│ Question: Is new version likely better?         │
-│ Criteria: avg(score_v2) > avg(score_v1) + 2%   │
-└─────────────────────────────────────────────────┘
-              │
-              ├─ NO: STOP (reject v2.0.0)
-              │
-              └─ YES: Continue
-                     ↓
-┌─────────────────────────────────────────────────┐
-│ Phase 2: Parallel on Small Set                  │
-│ Cost: Medium | Speed: Medium | Confidence: High │
-│                                                  │
-│ Question: Is quality consistently better?       │
-│ Criteria: - No regressions on critical metrics  │
-│           - Visual quality equal or better      │
-│           - Statistical significance (p < 0.05) │
-└─────────────────────────────────────────────────┘
-              │
-              ├─ NO: STOP (need more refinement)
-              │
-              └─ YES: Deploy with canary
-                     ↓
-┌─────────────────────────────────────────────────┐
-│ Phase 3: Staged Rollout                         │
-│ Cost: Low | Speed: Slow | Confidence: Very High │
-│                                                  │
-│ 10% → 24h monitor → 50% → 24h → 100%           │
-│                                                  │
-│ Auto-rollback if: - Quality drops > 5%          │
-│                   - Failure rate > 10%          │
-│                   - Generation time > 2x        │
-└─────────────────────────────────────────────────┘
-```
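-
-The Phase 3 auto-rollback criteria can be expressed as a small check. The following is a minimal sketch, not part of the existing automation: `WindowMetrics` and `rollback_reasons` are hypothetical names, and "quality drops > 5%" is assumed to mean a relative drop against the baseline score.
-
-```python
-from dataclasses import dataclass
-
-
-@dataclass
-class WindowMetrics:
-    """Aggregated metrics for one rule version over a monitoring window."""
-    quality_score: float      # mean quality score, 0-100
-    failure_rate: float       # fraction of failed generations, 0-1
-    generation_time_s: float  # median generation time in seconds
-
-
-def rollback_reasons(baseline: WindowMetrics, candidate: WindowMetrics) -> list[str]:
-    """Return the rollback criteria the candidate violates (empty list = keep going)."""
-    reasons = []
-    if candidate.quality_score < baseline.quality_score * 0.95:
-        reasons.append("quality dropped > 5%")
-    if candidate.failure_rate > 0.10:
-        reasons.append("failure rate > 10%")
-    if candidate.generation_time_s > 2 * baseline.generation_time_s:
-        reasons.append("generation time > 2x")
-    return reasons
-
-
-# Made-up numbers for illustration only
-baseline = WindowMetrics(quality_score=90.0, failure_rate=0.05, generation_time_s=12.0)
-candidate = WindowMetrics(quality_score=84.0, failure_rate=0.04, generation_time_s=13.0)
-print(rollback_reasons(baseline, candidate))  # ['quality dropped > 5%']
-```
-
-Returning the list of violated criteria, rather than a bare boolean, lets the monitor log why a rollback was triggered.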
-
-### Implementation
-
-```bash
-# Automated multi-phase testing
-python automation/testing/ab_hybrid.py \
-  --baseline v1.0.0 \
-  --candidate v2.0.0 \
-  --test-specs standard_test_set.txt \
-  --auto-progress \
-  --output hybrid-test-report.html
-
-# Output:
-# ✓ Phase 1 (Historical): +3.2% quality improvement → PASS
-# ✓ Phase 2 (Parallel):   4/5 specs improved → PASS
-# → Triggering Phase 3 (Canary 10%)
-# → Will auto-increase after 24h if metrics stable
-```
-
-### Pros
-
-✅ **Balanced cost**: Expensive tests only if cheap tests pass
-✅ **Fast feedback**: Know quickly if worth pursuing
-✅ **High confidence**: Multiple validation layers
-✅ **Safe**: Gradual rollout limits risk
-✅ **Efficient**: Don't waste resources on bad versions
-
-### Cons
-
-❌ **Complex**: More moving parts
-❌ **Longer total time**: Three phases take longer than one
-❌ **Requires automation**: Manual process would be tedious
-
-### Best For
-
-- **Most scenarios** (recommended default)
-- Production systems
-- When you want confidence without excessive cost
-- Continuous improvement workflow
-
----
-
-## Metrics to Compare
-
-### 1. Quality Score
-```python
-{
-  "metric": "quality_score",
-  "v1_mean": 87.3,
-  "v2_mean": 91.2,
-  "improvement": "+3.9 points (+4.5%)",
-  "p_value": 0.003,  # Statistically significant
-  "verdict": "BETTER"
-}
-```
-
-### 2. Pass Rate
-```python
-{
-  "metric": "pass_rate",
-  "v1": 0.87,  # 87% passed
-  "v2": 0.94,  # 94% passed
-  "improvement": "+7 percentage points",
-  "verdict": "BETTER"
-}
-```
-
-### 3. Generation Time
-```python
-{
-  "metric": "generation_time_seconds",
-  "v1_p50": 12.3,
-  "v2_p50": 15.1,
-  "change": "+22.8%",
-  "verdict": "WORSE (slower)"
-}
-```
-
-### 4. Attempt Distribution
-```python
-{
-  "metric": "attempts_to_pass",
-  "v1": {"1": 0.60, "2": 0.27, "3": 0.13},  # 60% pass on first try
-  "v2": {"1": 0.75, "2": 0.20, "3": 0.05},  # 75% pass on first try
-  "verdict": "BETTER (fewer retries)"
-}
-```
-
-### 5. LLM Agreement (Multi-LLM only)
-```python
-{
-  "metric": "llm_agreement",
-  "v1": 0.78,  # 78% agreement between Claude/Gemini/GPT
-  "v2": 0.89,  # 89% agreement
-  "verdict": "BETTER (more consistent criteria)"
-}
-```
-
----
-
-## Comparison Report Format
-
-### HTML Report Structure
-
-```html
-<!DOCTYPE html>
-<html>
-<head>
-    <title>Rule A/B Test: v1.0.0 vs v2.0.0</title>
-</head>
-<body>
-    <h1>A/B Test Results</h1>
-
-    <!-- Executive Summary -->
-    <section>
-        <h2>Summary</h2>
-        <table>
-            <tr>
-                <th>Metric</th>
-                <th>v1.0.0</th>
-                <th>v2.0.0</th>
-                <th>Change</th>
-                <th>Verdict</th>
-            </tr>
-            <tr>
-                <td>Quality Score</td>
-                <td>87.3</td>
-                <td>91.2</td>
-                <td class="positive">+3.9 pts</td>
-                <td class="winner">✓ BETTER</td>
-            </tr>
-            <!-- More metrics... -->
-        </table>
-    </section>
-
-    <!-- Per-Spec Comparison -->
-    <section>
-        <h2>Per-Spec Results</h2>
-        <div class="spec">
-            <h3>scatter-basic-001</h3>
-            <div class="side-by-side">
-                <div>
-                    <h4>v1.0.0 (score: 88)</h4>
-                    <img src="scatter-basic-001_v1.png" />
-                </div>
-                <div>
-                    <h4>v2.0.0 (score: 93)</h4>
-                    <img src="scatter-basic-001_v2.png" />
-                </div>
-            </div>
-            <p class="analysis">
-                Improvement: Font size increased, grid more subtle,
-                colorblind-safe palette applied.
-            </p>
-        </div>
-        <!-- More specs... -->
-    </section>
-
-    <!-- Statistical Analysis -->
-    <section>
-        <h2>Statistical Significance</h2>
-        <p>T-test: p-value = 0.003 (p &lt; 0.05, significant)</p>
-        <p>Effect size: Cohen's d = 0.72 (medium effect)</p>
-        <p>Sample size: 10 specs × 5 runs = 50 samples per version</p>
-    </section>
-
-    <!-- Recommendation -->
-    <section>
-        <h2>Recommendation</h2>
-        <div class="recommendation pass">
-            ✓ DEPLOY v2.0.0
-
-            Rationale:
-            - Significant quality improvement (+3.9 points, p=0.003)
-            - Better pass rate (+7 percentage points)
-            - No critical regressions
-            - Visual quality consistently better
-
-            Suggested rollout: Canary 10% → 50% → 100%
-        </div>
-    </section>
-</body>
-</html>
-```
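-
-The effect size and test statistic shown in the report's "Statistical Significance" section could be computed along these lines. This is a sketch using only the standard library; the function names and the score lists are illustrative assumptions, not existing project code.
-
-```python
-from math import sqrt
-from statistics import mean, variance
-
-
-def cohens_d(baseline: list[float], candidate: list[float]) -> float:
-    """Standardized mean difference using the pooled sample standard deviation."""
-    n1, n2 = len(baseline), len(candidate)
-    pooled_var = ((n1 - 1) * variance(baseline) + (n2 - 1) * variance(candidate)) / (n1 + n2 - 2)
-    return (mean(candidate) - mean(baseline)) / sqrt(pooled_var)
-
-
-def welch_t(baseline: list[float], candidate: list[float]) -> float:
-    """Welch's t statistic (does not assume equal variances)."""
-    se = sqrt(variance(baseline) / len(baseline) + variance(candidate) / len(candidate))
-    return (mean(candidate) - mean(baseline)) / se
-
-
-v1_scores = [85.0, 88.0, 86.0, 90.0, 87.0]  # made-up illustration data
-v2_scores = [90.0, 92.0, 89.0, 93.0, 91.0]
-print(f"d = {cohens_d(v1_scores, v2_scores):.2f}, t = {welch_t(v1_scores, v2_scores):.2f}")
-```
-
-Turning the t statistic into a p-value requires the t-distribution, which the standard library does not provide; SciPy's `ttest_ind(..., equal_var=False)` is one option.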
-
----
-
-## Sample Sizes & Statistical Power
-
-### How Many Specs to Test?
-
-**Rule of Thumb**:
-- **Quick check**: 3-5 specs (low confidence, good enough for draft)
-- **Standard test**: 10-15 specs (medium confidence, good for minor versions)
-- **Rigorous test**: 20-30 specs (high confidence, required for major versions)
-
-### How Many Runs per Spec?
-
-```python
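-# Statistical power calculation (normal approximation, two-sided two-sample test)
-import math
-from statistics import NormalDist
-
-def required_sample_size(
-    expected_improvement: float = 0.05,  # 5% improvement
-    std_dev: float = 0.10,               # assumed run-to-run spread (not measured)
-    significance_level: float = 0.05,    # p < 0.05
-    power: float = 0.80                   # 80% power
-) -> int:
-    """
-    Returns: Number of samples needed per version
-
-    Example:
-    - To detect 5% improvement when scores vary by ~10% (effect size d = 0.5)
-    - With 95% confidence (p < 0.05)
-    - And 80% chance of detecting if it exists
-    → Need ~63 samples per version (a t-test needs ~64)
-    → For 10 specs: 6-7 runs per spec per version
-    """
-    effect_size = expected_improvement / std_dev
-    z_alpha = NormalDist().inv_cdf(1 - significance_level / 2)
-    z_power = NormalDist().inv_cdf(power)
-    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)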
-```
-
-**Practical Guide**:
-- **Budget unlimited**: 10 runs per spec per version
-- **Budget medium**: 5 runs per spec per version
-- **Budget tight**: 3 runs per spec per version
-- **Quick check**: 1 run per spec per version (not statistically valid, just a sanity check)
-
----
-
-## Cost Estimation
-
-### Parallel Generation (Approach 1)
-
-```python
-# Assumptions
-specs = 10
-runs = 5
-cost_per_generation = 0.10  # USD per generation (Claude API)
-
-# Cost calculation
-total_generations = specs * runs * 2  # ×2 for both versions
-total_cost = total_generations * cost_per_generation
-
-# = 10 × 5 × 2 × $0.10 = $10.00
-```
-
-### Historical Comparison (Approach 2)
-
-```python
-# Only generate with new version (specs, runs, cost_per_generation as in Approach 1)
-total_generations = specs * runs
-total_cost = total_generations * cost_per_generation
-
-# = 10 × 5 × $0.10 = $5.00  (50% cheaper)
-```
-
-### Hybrid Approach (Approach 4)
-
-```python
-# Phase 1: Historical (free, uses existing data)
-phase1_cost = 0.00
-
-# Phase 2: Parallel on 5 specs, 5 runs each, both versions, $0.10 per generation
-phase2_cost = 5 * 5 * 2 * 0.10  # = $5.00
-
-# Phase 3: Canary (spreads over time, no extra cost)
-phase3_cost = 0.00
-
-total_cost = phase1_cost + phase2_cost + phase3_cost
-# Total: $5.00  (same as Approach 2, but higher confidence)
-```
-
----
-
-## Automation Scripts (Conceptual)
-
-### Quick Start
-
-```bash
-# Install dependencies
-pip install pyplots-testing
-
-# Run standard A/B test (hybrid approach)
-pyplots-ab-test \
-  --baseline v1.0.0 \
-  --candidate v2.0.0 \
-  --output report.html
-
-# Output:
-# ✓ Phase 1: Historical check PASSED
-# ✓ Phase 2: Parallel test PASSED
-# → Starting Phase 3: Canary rollout
-```
-
-### Custom Test
-
-```python
-# automation/testing/custom_ab_test.py
-from pyplots.testing import ABTest
-
-# Configure test
-test = ABTest(
-    baseline_version="v1.0.0",
-    candidate_version="v2.0.0",
-    approach="hybrid"
-)
-
-# Add test specs
-test.add_specs([
-    "scatter-basic-001",
-    "heatmap-corr-002",
-    "bar-grouped-004"
-])
-
-# Configure metrics
-test.track_metrics([
-    "quality_score",
-    "pass_rate",
-    "generation_time",
-    "attempts_to_pass"
-])
-
-# Run test
-results = test.run()
-
-# Generate report
-test.generate_report(
-    output="comparison-report.html",
-    include_visuals=True
-)
-
-# Decision
-if results.recommend_deployment():
-    print("✓ Deploy v2.0.0")
-    test.trigger_canary_rollout()
-else:
-    print("✗ Keep v1.0.0")
-    print(f"Reason: {results.rejection_reason}")
-```
-
----
-
-## Decision Framework
-
-### Should I Deploy the New Version?
-
-```
-Deploy v2.0.0 if ALL of:
-├─ Quality score improved OR stayed same
-├─ Pass rate improved OR stayed within -2%
-├─ No critical regressions (must-have features still work)
-├─ Statistical significance (p < 0.05) OR large effect size
-└─ Visual inspection looks good (side-by-side comparison)
-
-DON'T deploy if ANY of:
-├─ Quality score dropped > 3%
-├─ Pass rate dropped > 5%
-├─ Critical features broken
-├─ Generation time increased > 50% (unless quality gain is huge)
-└─ Visual quality clearly worse
-```
-
-### Borderline Cases
-
-```
-If results are mixed (some metrics better, some worse):
-1. Weight metrics by importance:
-   - Quality score: 40%
-   - Pass rate: 30%
-   - Visual quality: 20%
-   - Generation time: 10%
-
-2. Calculate weighted score
-
-3. If weighted score > current + 5%:
-   → Deploy
-   Otherwise:
-   → Refine and test again
-```
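-
-The weighting rule above can be sketched in a few lines. This is an illustration under stated assumptions, not existing project code: each metric is assumed to be normalized to 0-1 with higher meaning better (so generation time must be inverted beforehand), "current + 5%" is read as a relative margin, and the names `WEIGHTS`, `weighted_score`, and `should_deploy` are hypothetical.
-
-```python
-# Weights from the borderline-case rule above
-WEIGHTS = {
-    "quality_score": 0.40,
-    "pass_rate": 0.30,
-    "visual_quality": 0.20,
-    "generation_time": 0.10,
-}
-
-
-def weighted_score(metrics: dict[str, float]) -> float:
-    """Combine normalized metrics (0-1, higher is better) into one score."""
-    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())
-
-
-def should_deploy(current: dict[str, float], candidate: dict[str, float]) -> bool:
-    """Deploy only if the candidate beats the current score by more than 5%."""
-    return weighted_score(candidate) > weighted_score(current) * 1.05
-
-
-# Made-up numbers for illustration only
-current = {"quality_score": 0.87, "pass_rate": 0.87, "visual_quality": 0.80, "generation_time": 0.70}
-candidate = {"quality_score": 0.91, "pass_rate": 0.94, "visual_quality": 0.85, "generation_time": 0.55}
-print(should_deploy(current, candidate))
-```
-
-With these sample values the candidate improves quality and pass rate but is slower, and its weighted gain stays below the 5% margin, so the rule says to refine and test again.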
-
----
-
-## Future Enhancements
-
-### Automatic A/B Testing in CI/CD
-
-```yaml
-# .github/workflows/test-new-rules.yml
-on:
-  pull_request:
-    paths:
-      - 'rules/**'
-
-jobs:
-  ab-test:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Detect rule changes
-        id: detect
-        run: |
-          # Extract version numbers
-          OLD_VERSION=$(...)
-          NEW_VERSION=$(...)
-
-      - name: Run A/B test
-        run: |
-          pyplots-ab-test \
-            --baseline $OLD_VERSION \
-            --candidate $NEW_VERSION \
-            --auto
-
-      - name: Post report to PR
-        uses: actions/github-script@v6
-        with:
-          script: |
-            // Comment with HTML report
-```
-
-### Machine Learning for Rule Optimization
-
-```python
-# Future: Learn which rules produce best results
-from pyplots.ml import RuleOptimizer
-
-optimizer = RuleOptimizer()
-
-# Learn from historical data
-optimizer.train(
-    generations=all_generations_from_database,
-    target_metric="quality_score"
-)
-
-# Suggest rule improvements
-suggestions = optimizer.suggest_improvements(
-    current_version="v2.0.0"
-)
-
-# Output:
-# Suggestion 1: Increase grid alpha to 0.35 (predicted +2% quality)
-# Suggestion 2: Add minimum 11pt font size (predicted +1.5% quality)
-```
-
----
-
-## Summary
-
-### Quick Reference Table
-
-| Approach | Cost | Speed | Confidence | Best For |
-|----------|------|-------|------------|----------|
-| **Parallel** | High | Fast | High | Critical decisions, final validation |
-| **Historical** | Low | Very Fast | Medium | Quick checks, large scale |
-| **Staged** | Low | Slow | Very High | Major changes, production |
-| **Hybrid** | Medium | Medium | Very High | **Most scenarios (recommended)** |
-
-### Recommendation
-
-**For most use cases, use the Hybrid approach**:
-1. Quick historical check (5 min, $0)
-2. If promising → Parallel test on 5 specs (1 hour, $5)
-3. If good → Canary rollout 10% → 50% → 100% (2-3 days, $0 extra)
-
-This balances cost, speed, and confidence while minimizing risk.
-
----
-
-## Related Documentation
-
-- [Rule Versioning System](../architecture/rule-versioning.md)
-- [Claude Skill for Plot Generation](./claude-skill-plot-generation.md)
-- [Automation Workflows](../architecture/automation-workflows.md)
-
----
-
-*"Test scientifically. Deploy confidently."*
diff --git a/docs/concepts/claude-skill-plot-generation.md b/docs/concepts/claude-skill-plot-generation.md
deleted file mode 100644
index 32763a74da..0000000000
--- a/docs/concepts/claude-skill-plot-generation.md
+++ /dev/null
@@ -1,966 +0,0 @@
-# 🎨 Claude Skill: Plot Generation
-
-## Overview
-
-A **Claude Skill** is a specialized, reusable capability that can be invoked by Claude Code or other AI systems. This document proposes a comprehensive skill for automated plot generation that:
-
-- Reads versioned rule files (Markdown)
-- Generates implementation code from specs
-- Performs self-review and optimization
-- Handles multi-attempt feedback loops
-- Integrates with the pyplots rule versioning system
-
-## Why a Claude Skill?
-
-### Problems with Ad-Hoc Prompting
-
-❌ **Inconsistent**: Every generation uses slightly different prompts
-❌ **Not reusable**: Have to explain the full process each time
-❌ **Hard to improve**: Prompt changes lost in chat history
-❌ **No versioning**: Can't track what prompts generated which plots
-❌ **Manual orchestration**: Human has to manage the feedback loop
-
-### Benefits of a Skill
-
-✅ **Consistent**: Same process every time
-✅ **Reusable**: Call the skill, get a plot
-✅ **Versionable**: Skill linked to rule versions
-✅ **Automated**: Handles feedback loops internally
-✅ **Testable**: Can A/B test different skill versions
-✅ **Scalable**: Easy to invoke from automation (GitHub Actions, n8n)
-
----
-
-## Skill Architecture
-
-### High-Level Flow
-
-```
-Input: Spec Markdown + Target Library + Rule Version
-   ↓
-┌──────────────────────────────────────────┐
-│  Claude Skill: Plot Generation v1.0.0    │
-│                                           │
-│  1. Load Rules (from rules/{version}/)   │
-│  2. Generate Code                         │
-│  3. Self-Review                           │
-│  4. Optimize if needed (max 3 attempts)  │
-│  5. Return Result                         │
-└──────────────────────────────────────────┘
-   ↓
-Output: Python Code + Metadata + Feedback
-```
-
-### Skill Interface
-
-```python
-# Conceptual API
-from claude_skills import PlotGenerationSkill
-
-skill = PlotGenerationSkill(
-    rule_version="v1.0.0",  # Which rules to use
-    max_attempts=3           # Maximum optimization loops
-)
-
-result = skill.generate(
-    spec_markdown="specs/scatter-basic-001.md",
-    library="matplotlib",
-    variant="default"
-)
-
-# result.success → True/False
-# result.code → Generated Python code
-# result.quality_score → Self-review score
-# result.attempt_count → How many tries it took
-# result.feedback → Improvement suggestions
-```
-
----
-
-## Skill Inputs
-
-### Required Inputs
-
-```python
-{
-  "spec_markdown": "# scatter-basic-001: Basic 2D Scatter Plot\n\n...",
-  "library": "matplotlib",  # or "seaborn", "plotly", etc.
-  "variant": "default",     # or "ggplot_style", "dark_mode", etc.
-}
-```
-
-### Optional Inputs
-
-```python
-{
-  "rule_version": "v1.0.0",     # Default: latest active version
-  "max_attempts": 3,             # Default: 3
-  "strict_mode": False,          # If True, fail if any criterion not met
-  "custom_criteria": [],         # Additional quality checks
-  "python_version": "3.12",      # Target Python version
-  "style_constraints": {         # Additional styling rules
-    "color_palette": "colorblind_safe",
-    "figure_size": (12, 8)
-  }
-}
-```
-
----
-
-## Skill Outputs
-
-### Success Case
-
-```python
-{
-  "success": True,
-  "code": "import matplotlib.pyplot as plt\nimport pandas as pd\n\n...",
-  "quality_score": 92,
-  "attempt_count": 2,
-  "criteria_met": [
-    "axes_labeled",
-    "grid_visible",
-    "colorblind_safe"
-  ],
-  "criteria_failed": [],
-  "feedback": {
-    "attempt_1": {
-      "score": 78,
-      "issues": ["X-axis labels overlapping", "Grid too prominent"],
-      "improvements": "Rotate labels, reduce grid alpha"
-    },
-    "attempt_2": {
-      "score": 92,
-      "issues": [],
-      "improvements": "All criteria met, code optimized"
-    }
-  },
-  "metadata": {
-    "rule_version": "v1.0.0",
-    "generation_time_seconds": 15.3,
-    "library": "matplotlib",
-    "variant": "default"
-  }
-}
-```
-
-### Failure Case
-
-```python
-{
-  "success": False,
-  "code": None,
-  "quality_score": 71,  # Below threshold after 3 attempts
-  "attempt_count": 3,
-  "criteria_met": ["axes_labeled"],
-  "criteria_failed": ["grid_visible", "colorblind_safe"],
-  "feedback": {
-    "attempt_1": {...},
-    "attempt_2": {...},
-    "attempt_3": {
-      "score": 71,
-      "issues": [
-        "Colorblind safety check still failing",
-        "Unable to find suitable palette that works with data"
-      ],
-      "recommendations": [
-        "Consider using different visualization type",
-        "May need manual refinement"
-      ]
-    }
-  },
-  "error": "Failed to meet quality threshold after 3 attempts"
-}
-```
-
----
-
-## Internal Workflow
-
-### Phase 1: Load Rules
-
-```python
-def load_rules(version: str) -> Rules:
-    """
-    Load generation rules from rules/generation/{version}/
-
-    Returns:
-    - code_generation_rules: How to generate code
-    - quality_criteria: What makes a good plot
-    - self_review_checklist: How to self-evaluate
-    """
-    base_path = f"rules/generation/{version}/"
-
-    rules = Rules(
-        generation=load_markdown(base_path + "code-generation-rules.md"),
-        quality=load_markdown(base_path + "quality-criteria.md"),
-        self_review=load_markdown(base_path + "self-review-checklist.md"),
-        metadata=load_yaml(base_path + "metadata.yaml")
-    )
-
-    return rules
-```
-
-### Phase 2: Generate Initial Code
-
-```python
-def generate_initial_code(
-    spec: str,
-    library: str,
-    rules: Rules
-) -> str:
-    """
-    Generate first version of code based on spec and rules
-
-    Process:
-    1. Parse spec to extract requirements
-    2. Follow generation rules for code structure
-    3. Apply library-specific patterns
-    4. Generate complete, executable code
-    """
-    prompt = f"""
-You are generating a plot implementation.
-
-# Spec
-{spec}
-
-# Target Library
-{library}
-
-# Generation Rules
-{rules.generation}
-
-# Task
-Generate complete Python code that:
-1. Implements the spec requirements
-2. Follows all generation rules
-3. Is ready to execute
-
-Return only the Python code, no explanations.
-"""
-
-    code = call_claude(prompt)
-    return code
-```
-
-### Phase 3: Self-Review
-
-```python
-def self_review(
-    code: str,
-    spec: str,
-    rules: Rules
-) -> SelfReviewResult:
-    """
-    Evaluate generated code against quality criteria
-
-    Returns:
-    - score: 0-100
-    - issues: List of problems found
-    - suggestions: How to improve
-    """
-    # Execute code to generate plot image
-    image_bytes = execute_and_render(code)
-
-    prompt = f"""
-You are reviewing a generated plot implementation.
-
-# Spec
-{spec}
-
-# Generated Code
-~~~python
-{code}
-~~~
-
-# Quality Criteria (from rules)
-{rules.quality}
-
-# Self-Review Checklist
-{rules.self_review}
-
-# Task
-1. Inspect the attached plot image and review the code logic
-2. Check against each quality criterion
-3. Provide a score (0-100) and detailed feedback
-
-Return JSON:
-{{
-  "score": 0-100,
-  "criteria_met": ["id1", "id2"],
-  "criteria_failed": ["id3"],
-  "issues": ["Issue 1", "Issue 2"],
-  "suggestions": ["Suggestion 1", "Suggestion 2"]
-}}
-"""
-
-    result = call_claude_with_image(prompt, image_bytes)
-    return parse_json(result)
-```
-
-### Phase 4: Optimization Loop
-
-```python
-def optimize_code(
-    code: str,
-    review_result: SelfReviewResult,
-    rules: Rules
-) -> str:
-    """
-    Improve code based on self-review feedback
-
-    Process:
-    1. Identify specific issues
-    2. Generate targeted fixes
-    3. Apply fixes to code
-    4. Return improved version
-    """
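-    # Build the bullet lists first: backslashes inside f-string expressions
-    # are a SyntaxError before Python 3.12
-    issues_text = "\n".join(f"- {issue}" for issue in review_result.issues)
-    suggestions_text = "\n".join(f"- {sug}" for sug in review_result.suggestions)
-
-    prompt = f"""
-You are optimizing plot code based on review feedback.
-
-# Current Code
-~~~python
-{code}
-~~~
-
-# Review Feedback
-Score: {review_result.score}/100
-
-Issues:
-{issues_text}
-
-Suggestions:
-{suggestions_text}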
-
-# Quality Criteria (still need to meet)
-{format_failed_criteria(review_result.criteria_failed, rules.quality)}
-
-# Task
-Generate improved code that addresses all issues.
-Focus specifically on the failed criteria.
-
-Return only the improved Python code, no explanations.
-"""
-
-    improved_code = call_claude(prompt)
-    return improved_code
-```
-
-### Phase 5: Multi-Attempt Loop
-
-```python
-def generate_with_feedback_loop(
-    spec: str,
-    library: str,
-    rules: Rules,
-    max_attempts: int = 3,
-    pass_threshold: int = 90
-) -> GenerationResult:
-    """
-    Main generation loop with self-correction
-
-    Returns after:
-    - Score >= threshold (success)
-    - max_attempts reached (failure)
-    """
-    feedback_history = []
-
-    # Attempt 1: Initial generation
-    code = generate_initial_code(spec, library, rules)
-    review = self_review(code, spec, rules)
-    feedback_history.append(review)
-
-    attempt = 1
-
-    # Attempts 2-3: Optimization loop
-    while review.score < pass_threshold and attempt < max_attempts:
-        attempt += 1
-
-        code = optimize_code(code, review, rules)
-        review = self_review(code, spec, rules)
-        feedback_history.append(review)
-
-    # Final result
-    success = review.score >= pass_threshold
-
-    return GenerationResult(
-        success=success,
-        code=code if success else None,
-        quality_score=review.score,
-        attempt_count=attempt,
-        criteria_met=review.criteria_met,
-        criteria_failed=review.criteria_failed,
-        feedback=feedback_history
-    )
-```
-
----
-
-## Skill Definition (Claude Code Format)
-
-```yaml
-# skills/plot-generation/skill.yaml
-name: plot-generation
-version: 1.0.0
-description: Generate plot implementations from specifications with automated quality feedback
-
-inputs:
-  spec_markdown:
-    type: string
-    required: true
-    description: Plot specification in Markdown format
-
-  library:
-    type: string
-    required: true
-    enum: [matplotlib, seaborn, plotly, bokeh, altair]
-
-  variant:
-    type: string
-    required: false
-    default: "default"
-
-  rule_version:
-    type: string
-    required: false
-    default: "latest"
-    description: Which rule version to use (e.g., "v1.0.0")
-
-  max_attempts:
-    type: integer
-    required: false
-    default: 3
-    min: 1
-    max: 5
-
-capabilities:
-  - read_files: true        # Read spec and rule files
-  - execute_code: true      # Execute generated code to render plots
-  - vision: true            # Analyze generated plot images
-  - iterative: true         # Multi-attempt optimization loop
-
-outputs:
-  success:
-    type: boolean
-    description: Whether generation succeeded
-
-  code:
-    type: string
-    description: Generated Python code (null if failed)
-
-  quality_score:
-    type: integer
-    description: Final quality score (0-100)
-
-  attempt_count:
-    type: integer
-    description: Number of attempts needed
-
-  feedback:
-    type: object
-    description: Detailed feedback from all attempts
-
-workflow:
-  - step: load_rules
-    action: Read rule files from rules/generation/{rule_version}/
-
-  - step: generate
-    action: Create initial code based on spec and rules
-
-  - step: review
-    action: Self-evaluate code against quality criteria
-    loop:
-      max_iterations: ${max_attempts}
-      continue_if: quality_score < 90
-      next_step: optimize
-
-  - step: optimize
-    action: Improve code based on feedback
-    next_step: review
-
-  - step: finalize
-    action: Return result with code and metadata
-```
-
----
-
-## Invocation Examples
-
-### Example 1: Basic Invocation
-
-```bash
-# From command line (hypothetical)
-claude-skill plot-generation \
-  --spec specs/scatter-basic-001.md \
-  --library matplotlib \
-  --variant default \
-  --output generated-plot.py
-```
-
-### Example 2: From Python
-
-```python
-# core/generators/claude_generator.py
-from pathlib import Path
-
-from claude_skills import invoke_skill
-
-result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "spec_markdown": Path("specs/scatter-basic-001.md").read_text(),
-        "library": "matplotlib",
-        "variant": "default",
-        "rule_version": "v1.0.0"
-    }
-)
-
-if result.success:
-    # Save generated code
-    Path("plots/matplotlib/scatter/scatter-basic-001/default.py").write_text(result.code)
-
-    # Record metadata
-    save_metadata(
-        spec_id="scatter-basic-001",
-        quality_score=result.quality_score,
-        attempt_count=result.attempt_count,
-        rule_version="v1.0.0"
-    )
-else:
-    # Log failure
-    log_failure(
-        spec_id="scatter-basic-001",
-        reason=result.error,
-        feedback=result.feedback
-    )
-```
-
-### Example 3: From GitHub Actions
-
-```yaml
-# .github/workflows/generate-plot.yml
-- name: Generate plot implementation
-  id: generate
-  run: |
-    claude-skill plot-generation \
-      --spec ${{ env.SPEC_FILE }} \
-      --library ${{ matrix.library }} \
-      --rule-version v1.0.0 \
-      --output generated.py \
-      --json-output result.json
-
-- name: Check if successful
-  run: |
-    SUCCESS=$(jq -r '.success' result.json)
-    SCORE=$(jq -r '.quality_score' result.json)
-
-    if [ "$SUCCESS" = "true" ]; then
-      echo "✓ Generation successful (score: $SCORE)"
-    else
-      echo "✗ Generation failed"
-      exit 1
-    fi
-```
-
-### Example 4: A/B Testing with Different Rule Versions
-
-```python
-# automation/testing/ab_with_skills.py
-from claude_skills import invoke_skill
-
-def compare_rule_versions(spec_id: str, versions: list[str]):
-    """
-    Generate plot with multiple rule versions and compare
-    """
-    results = {}
-
-    for version in versions:
-        result = invoke_skill(
-            skill="plot-generation",
-            inputs={
-                "spec_markdown": load_spec(spec_id),
-                "library": "matplotlib",
-                "rule_version": version
-            }
-        )
-
-        results[version] = {
-            "success": result.success,
-            "quality_score": result.quality_score,
-            "attempt_count": result.attempt_count,
-            "code": result.code
-        }
-
-    # Compare results
-    return generate_comparison_report(results)
-
-# Usage
-report = compare_rule_versions(
-    spec_id="scatter-basic-001",
-    versions=["v1.0.0", "v2.0.0"]
-)
-```
-
----
-
-## Advanced Features
-
-### Feature 1: Custom Quality Criteria
-
-```python
-# Add project-specific criteria
-result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "spec_markdown": spec,
-        "library": "matplotlib",
-        "custom_criteria": [
-            {
-                "id": "brand_colors",
-                "requirement": "Use company brand colors only",
-                "colors": ["#FF6B6B", "#4ECDC4", "#45B7D1"],
-                "weight": 1.0
-            },
-            {
-                "id": "max_figure_width",
-                "requirement": "Figure width must not exceed 10 inches",
-                "max_width": 10,
-                "weight": 0.5
-            }
-        ]
-    }
-)
-```
-
-### Feature 2: Style Templates
-
-```python
-# Use predefined style templates
-result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "spec_markdown": spec,
-        "library": "matplotlib",
-        "style_template": "academic_paper",  # or "presentation", "web", etc.
-        "style_overrides": {
-            "font_family": "Arial",
-            "font_size": 12
-        }
-    }
-)
-```
-
-### Feature 3: Multi-Library Generation
-
-```python
-# Generate for all suitable libraries in one call
-result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "spec_markdown": spec,
-        "library": "all",  # Special value: generate for all suitable libraries
-        "rule_version": "v1.0.0"
-    }
-)
-
-# Result contains implementations for multiple libraries
-# result.implementations = {
-#     "matplotlib": {...},
-#     "seaborn": {...},
-#     "plotly": {...}
-# }
-```
-
-### Feature 4: Incremental Refinement
-
-```python
-# Start with a draft, refine iteratively
-draft_result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "spec_markdown": spec,
-        "library": "matplotlib",
-        "strict_mode": False,  # Allow lower quality for draft
-        "max_attempts": 1       # Quick draft
-    }
-)
-
-# Review and refine
-final_result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "spec_markdown": spec,
-        "library": "matplotlib",
-        "initial_code": draft_result.code,  # Start from draft
-        "feedback": "Improve colorblind safety and font sizes",
-        "strict_mode": True,
-        "max_attempts": 3
-    }
-)
-```
-
----
-
-## Integration with Rule Versioning
-
-### Linking Skills to Rules
-
-```yaml
-# skills/plot-generation/versions.yaml
-skill_versions:
-  - version: "1.0.0"
-    compatible_rule_versions:
-      generation: ["v1.0.0", "v1.1.0"]
-      evaluation: ["v1.0.0"]
-    status: "active"
-
-  - version: "1.1.0"
-    compatible_rule_versions:
-      generation: ["v2.0.0", "v2.1.0"]
-      evaluation: ["v2.0.0"]
-    status: "active"
-```
-
-### Automatic Rule Selection
-
-```python
-# Skill automatically selects appropriate rules
-result = invoke_skill(
-    skill="plot-generation",
-    skill_version="1.1.0",  # Skill version
-    inputs={
-        "spec_markdown": spec,
-        "library": "matplotlib",
-        # rule_version not specified → use latest compatible
-    }
-)
-
-# Skill uses:
-# - Skill logic version 1.1.0
-# - Latest compatible generation rules (v2.1.0)
-# - Latest compatible evaluation rules (v2.0.0)
-```
-
----
-
-## Performance Optimization
-
-### Caching
-
-```python
-# Cache generated code to avoid regeneration
-result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "spec_markdown": spec,
-        "library": "matplotlib",
-        "rule_version": "v1.0.0"
-    },
-    cache_key=f"{spec_hash}:{library}:v1.0.0"
-)
-
-# If cache hit: return cached result (instant)
-# If cache miss: generate and cache (15-30 seconds)
-```
-
-### Parallel Generation
-
-```python
-# Generate for multiple libraries in parallel
-from concurrent.futures import ThreadPoolExecutor
-
-libraries = ["matplotlib", "seaborn", "plotly"]
-
-with ThreadPoolExecutor(max_workers=3) as executor:
-    futures = [
-        executor.submit(
-            invoke_skill,
-            skill="plot-generation",
-            inputs={"spec_markdown": spec, "library": lib}
-        )
-        for lib in libraries
-    ]
-
-    results = {lib: future.result() for lib, future in zip(libraries, futures)}
-```
-
----
-
-## Error Handling
-
-### Graceful Degradation
-
-```python
-try:
-    result = invoke_skill(
-        skill="plot-generation",
-        inputs={...},
-        timeout_seconds=60  # Don't wait forever
-    )
-
-    if result.success:
-        # Use generated code
-        save_code(result.code)
-    else:
-        # Fall back to template or manual generation
-        log_failure(result.feedback)
-        use_fallback_template()
-
-except TimeoutError:
-    # Skill took too long
-    log_error("Generation timeout")
-    use_fallback_template()
-
-except SkillError as e:
-    # Skill crashed or invalid input
-    log_error(f"Skill error: {e}")
-    use_fallback_template()
-```
-
-### Retry Logic
-
-```python
-# Retry with exponential backoff
-def generate_with_retry(spec, library, max_retries=3):
-    for attempt in range(max_retries):
-        try:
-            result = invoke_skill(...)
-
-            if result.success:
-                return result
-            elif result.quality_score > 75:
-                # Close enough, acceptable
-                return result
-            else:
-                # Try again with more attempts
-                continue
-
-        except Exception as e:
-            if attempt < max_retries - 1:
-                time.sleep(2 ** attempt)  # 1s, 2s, 4s
-                continue
-            else:
-                raise
-```
-
----
-
-## Monitoring & Telemetry
-
-### Metrics to Track
-
-```python
-{
-  "skill_invocations": {
-    "total": 1234,
-    "successful": 1180,
-    "failed": 54,
-    "success_rate": 0.956
-  },
-
-  "performance": {
-    "avg_generation_time_seconds": 18.3,
-    "p50": 15.2,
-    "p95": 35.7,
-    "p99": 48.1
-  },
-
-  "quality": {
-    "avg_quality_score": 89.2,
-    "avg_attempts_to_pass": 1.7,
-    "first_attempt_success_rate": 0.68
-  },
-
-  "by_rule_version": {
-    "v1.0.0": {
-      "invocations": 523,
-      "avg_quality_score": 87.1
-    },
-    "v2.0.0": {
-      "invocations": 711,
-      "avg_quality_score": 91.3
-    }
-  }
-}
-```
-
----
-
-## Future Enhancements
-
-### 1. Learning from Feedback
-
-```python
-# Skill learns which strategies work best
-skill.train(
-    successful_generations=database.get_successful_generations(),
-    failed_generations=database.get_failed_generations()
-)
-
-# Improves:
-# - Which libraries work best for which plot types
-# - Common pitfalls to avoid
-# - Optimization strategies
-```
-
-### 2. Multi-Modal Input
-
-```python
-# Generate plot from image + description
-result = invoke_skill(
-    skill="plot-generation",
-    inputs={
-        "reference_image": "path/to/example.png",  # What they want
-        "description": "Like this but for time series data",
-        "library": "matplotlib"
-    }
-)
-```
-
-### 3. Interactive Refinement
-
-```python
-# User provides feedback, skill refines
-result1 = invoke_skill(...)  # First version
-
-user_feedback = "The legend is too large and covers data"
-
-result2 = invoke_skill(
-    inputs={
-        "initial_code": result1.code,
-        "user_feedback": user_feedback,
-        "refine_only": ["legend"]  # Only change legend
-    }
-)
-```
-
----
-
-## Summary
-
-### Skill Benefits
-
-✅ **Consistent quality** through versioned rules
-✅ **Automated feedback loops** reduce manual work
-✅ **Testable** via A/B testing of rule versions
-✅ **Scalable** from CLI to full automation
-✅ **Auditable** - know exactly what generated each plot
-
-### Next Steps
-
-1. **Define initial ruleset** (v1.0.0-draft)
-2. **Prototype skill logic** (Python script)
-3. **Test with 5-10 specs** manually
-4. **Refine based on results**
-5. **Formalize as Claude Skill** (if system supports)
-6. **Integrate with automation** (GitHub Actions, n8n)
-
----
-
-## Related Documentation
-
-- [Rule Versioning System](../architecture/rule-versioning.md)
-- [A/B Testing Rules](./ab-testing-rules.md)
-- [Automation Workflows](../architecture/automation-workflows.md)
-
----
-
-*"A skill is a reusable unit of AI capability. Make it good, make it versioned, make it testable."*
diff --git a/docs/vision.md b/docs/concepts/vision.md
similarity index 100%
rename from docs/vision.md
rename to docs/concepts/vision.md
diff --git a/docs/contributing.md b/docs/contributing.md
new file mode 100644
index 0000000000..7531fcb781
--- /dev/null
+++ b/docs/contributing.md
@@ -0,0 +1,81 @@
+# Contributing to pyplots
+
+## Overview
+
+pyplots is a specification-driven platform where **AI generates all plot implementations**. As a contributor, your main focus is on **specifications** (what to visualize) rather than code (how to implement).
+
+---
+
+## How to Propose a New Plot Type
+
+1. **Create a GitHub Issue** with a descriptive title (e.g., "Radar Chart with Multiple Series")
+   - Do NOT include spec-id in the title
+2. **Add the `spec-request` label**
+3. **Wait for automation**:
+   - `spec-create.yml` analyzes your request
+   - Assigns a unique spec-id
+   - Creates a PR with `specification.md` and `specification.yaml`
+4. **Review the generated spec** (PR comments)
+5. **Maintainer adds `approved` label** to the Issue (not the PR)
+6. **Spec merges to main** with `spec-ready` label
+
+---
+
+## How to Improve an Existing Spec
+
+1. **Create a GitHub Issue** referencing the spec to update
+2. **Add the `spec-update` label**
+3. **Wait for `spec-update.yml`** to create a PR with changes
+4. **Maintainer reviews and adds `approved` label**
+
+---
+
+## How to Trigger Implementation Generation
+
+After a spec has the `spec-ready` label:
+
+**Single Library:**
+- Add `generate:{library}` label to the issue (e.g., `generate:matplotlib`)
+
+**All Libraries:**
+```bash
+gh workflow run bulk-generate.yml -f specification_id=<spec-id> -f library=all
+```
+
+---
+
+## What NOT to Do
+
+| Don't | Why |
+|-------|-----|
+| Manually create `plots/` directories | Let `spec-create.yml` handle it |
+| Write `specification.md` files directly | Let AI generate from your Issue |
+| Include `[spec-id]` in issue titles | Spec-id is auto-assigned |
+| Add `approved` label to PRs | Add it to Issues instead |
+| Run `gh pr merge` on implementation PRs | Let `impl-merge.yml` handle it |
+| Create `metadata/*.yaml` manually | Created automatically on merge |
+
+---
+
+## Why This Workflow?
+
+Manual intervention causes:
+- Missing quality scores in metadata
+- Missing preview images in GCS
+- Issues staying open when complete
+- Broken database sync
+
+**Trust the automation.** It handles code generation, quality review, repair attempts, image promotion, and database sync.
+
+---
+
+## Labels Reference
+
+See [workflows/overview.md](./workflows/overview.md) for the complete label system.
+
+---
+
+## Questions?
+
+- Check existing [Issues](https://github.com/MarkusNeusinger/pyplots/issues) for similar requests
+- Review the [workflows overview](./workflows/overview.md) for automation details
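The two ways of triggering generation described in `docs/contributing.md` above (a `generate:{library}` label for one library, `bulk-generate.yml` for all) can be sketched as a small command builder. This is an illustrative sketch only: `trigger_command` and its `issue_number` parameter are hypothetical and not part of the repository, and it assumes the label is added to the spec's issue with `gh issue edit --add-label`.

```python
import shlex
from typing import List


def trigger_command(issue_number: int, spec_id: str, library: str = "all") -> List[str]:
    """Build the gh command that requests generation for one library or for all.

    Hypothetical helper: it only assembles the argument list and runs nothing.
    """
    if library == "all":
        # Bulk path: the workflow dispatch shown in the contributing guide.
        return [
            "gh", "workflow", "run", "bulk-generate.yml",
            "-f", f"specification_id={spec_id}",
            "-f", "library=all",
        ]
    # Single-library path: add the generate:{library} label to the issue.
    return ["gh", "issue", "edit", str(issue_number), "--add-label", f"generate:{library}"]


if __name__ == "__main__":
    # Print the commands instead of executing them (dry run).
    print(shlex.join(trigger_command(42, "scatter-basic", "matplotlib")))
    print(shlex.join(trigger_command(42, "scatter-basic")))
```

Returning an argument list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without quoting concerns, should someone choose to execute it.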
diff --git a/docs/development.md b/docs/development.md
index 232beff2b8..b2ec96af7c 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -1,436 +1,209 @@
-# 🛠️ Development Guide
+# Development Guide
 
-## Getting Started
-
-### Prerequisites
-
-**Required**:
-- Python 3.10+ (3.12 recommended)
-- [uv](https://github.com/astral-sh/uv) package manager
-- Git
-- PostgreSQL 15+ (for local development)
-
-**Optional**:
-- Docker (for containerized database)
-- Node.js 20+ (for frontend development)
+Guide for setting up a local development environment.
 
 ---
 
-## Local Setup
-
-### 1. Clone Repository
-
-```bash
-git clone https://github.com/your-username/pyplots.git
-cd pyplots
-```
+## Prerequisites
 
-### 2. Install Dependencies
+- **Python 3.10+**
+- **Node.js 18+** and yarn
+- **PostgreSQL** (or access to Cloud SQL)
+- **uv** - Fast Python package manager
 
 ```bash
-# Install uv if not already installed
+# Install uv
 curl -LsSf https://astral.sh/uv/install.sh | sh
-
-# Sync dependencies
-uv sync --all-extras
 ```
 
-This installs:
-- Core dependencies
-- Development tools (pytest, ruff, mypy)
-- All plotting libraries (matplotlib, seaborn, plotly)
-
-### 3. Database Setup
+---
 
-**Option A: Local PostgreSQL**
+## Backend Setup
 
 ```bash
-# Create database
-createdb pyplots
+# Clone and install
+git clone https://github.com/MarkusNeusinger/pyplots.git
+cd pyplots
+uv sync --all-extras
 
-# Set environment variables
+# Database configuration
 cp .env.example .env
-# Edit .env and set DATABASE_URL
-```
-
-**Option B: Docker**
-
-```bash
-docker run -d \
-  --name pyplots-postgres \
-  -e POSTGRES_DB=pyplots \
-  -e POSTGRES_USER=pyplots \
-  -e POSTGRES_PASSWORD=dev_password \
-  -p 5432:5432 \
-  postgres:15
-```
+# Edit .env with your DATABASE_URL:
+# DATABASE_URL=postgresql+asyncpg://user:pass@host:5432/pyplots
 
-### 4. Run Migrations
-
-```bash
+# Run migrations
 uv run alembic upgrade head
-```
 
-### 5. Start Backend
-
-```bash
-uv run uvicorn api.main:app --reload --port 8000
+# Start API server
+uv run uvicorn api.main:app --reload
+# → http://localhost:8000/docs
 ```
 
-API available at: `http://localhost:8000`
-Docs available at: `http://localhost:8000/docs`
+---
 
-### 6. Start Frontend (Optional)
+## Frontend Setup
 
 ```bash
 cd app
-npm install
-npm run dev
+yarn install
+yarn dev
+# → http://localhost:3000
 ```
 
-Frontend available at: `http://localhost:3000`
+For production build:
+```bash
+yarn build
+```
 
 ---
 
-## Environment Variables
-
-Create `.env` file in project root (see `.env.example`):
+## Running Tests
 
 ```bash
-# Database (Cloud SQL via public IP for local development)
-DATABASE_URL=postgresql+asyncpg://user:password@CLOUD_SQL_PUBLIC_IP:5432/pyplots
+# All tests
+uv run pytest
 
-# Google Cloud Storage
-GCS_BUCKET=pyplots-images
-GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
+# Only unit tests (fast, no DB needed)
+uv run pytest tests/unit
 
-# Environment
-ENVIRONMENT=development
+# Only integration tests (SQLite in-memory)
+uv run pytest tests/integration
 
-# API Keys (optional, for AI generation workflows)
-# ANTHROPIC_API_KEY=sk-ant-...
-```
+# Only E2E tests (requires DATABASE_URL)
+uv run pytest tests/e2e
 
-**Production**: In Cloud Run, `DATABASE_URL` is injected from Secret Manager and uses a Unix socket connection to Cloud SQL.
+# With coverage report
+uv run pytest --cov=. --cov-report=html
+```
 
-**Never commit `.env`!** (Already in `.gitignore`)
+**Coverage target**: 90%+
 
 ---
 
-## Development Workflow
-
-### Running Tests
-
-```bash
-# Run all tests
-uv run pytest
-
-# Run with coverage
-uv run pytest --cov=. --cov-report=html
-
-# Run specific test file
-uv run pytest tests/unit/api/test_routers.py
-
-# Run specific test
-uv run pytest tests/unit/api/test_routers.py::test_get_specs
-```
+## Code Quality
 
-### Code Formatting
+Both linting and formatting must pass for CI.
 
 ```bash
-# Check formatting
+# Check linting
 uv run ruff check .
 
-# Auto-fix issues
+# Auto-fix linting issues
 uv run ruff check . --fix
 
-# Format code
-uv run ruff format .
-```
-
-### Type Checking (Optional)
-
-```bash
-# Install mypy first
-uv sync --extra typecheck
+# Check formatting
+uv run ruff format --check .
 
-# Then run type checking
-uv run mypy .
+# Auto-format
+uv run ruff format .
 ```
 
-**Note**: Type checking is optional. Ruff already catches most issues.
-
-### Pre-commit Hook (Recommended)
-
+**Always run before committing:**
 ```bash
-# Install pre-commit
-uv pip install pre-commit
-
-# Install git hooks
-pre-commit install
-
-# Now formatting runs automatically on git commit
+uv run ruff check . && uv run ruff format .
 ```
 
 ---
 
-## Code Standards
-
-See [CLAUDE.md](../CLAUDE.md) for detailed code standards including:
-- Python style guide (PEP 8, Ruff)
-- Type hints requirements
-- Docstring format (Google style)
-- Import ordering
-
----
-
-## Testing
-
-**Coverage Target**: 90%+
-
-**Test Structure**: Mirror source structure (`plots/.../default.py` → `tests/unit/plots/.../test_*.py`)
-
-**Test Naming**: `test_{what_it_does}`
-
-**Fixtures**: Use pytest fixtures in `tests/conftest.py` for reusable test data
-
-See [CLAUDE.md](../CLAUDE.md) for testing standards.
-
----
-
-## Writing Plot Implementations
-
-See [CLAUDE.md](../CLAUDE.md) for:
-- Implementation file template
-- Best practices (validation, defaults, error handling)
-- Anti-patterns to avoid
-
----
-
-## Contributing
-
-### Proposing New Plots
-
-**Option 1: GitHub Issue (Recommended)**
-
-1. Create issue using spec template
-2. Fill in description, applications, data requirements
-3. Add label `plot-idea`
-4. Wait for review and approval
-5. AI generates implementations automatically
+## Database
 
-**Option 2: Pull Request (Advanced)**
+### Local PostgreSQL
 
-1. Create spec directory: `plots/{spec-id}/` with spec.md
-2. Implement for at least one library
-3. Add tests
-4. Create PR with previews
-5. Wait for quality check and review
-
-### Contribution Guidelines
-
-**Before Submitting**:
-- [ ] Code passes all tests (`pytest`)
-- [ ] Code is formatted (`ruff format`)
-- [ ] Type hints are present (`mypy`)
-- [ ] Coverage is >90% for new code
-- [ ] Docstrings are complete
-- [ ] Preview image looks good
-
-**PR Description Template**:
-
-```markdown
-## Description
-
-Implements scatter-basic-001 for matplotlib
-
-## Checklist
-
-- [x] Spec file created/updated
-- [x] Implementation code written
-- [x] Tests added (coverage: 95%)
-- [x] Preview generated
-- [ ] Quality check passed (waiting for CI)
-
-## Preview
-
-![Preview](link-to-preview.png)
-
-## Related Issue
-
-Closes #123
+Set `DATABASE_URL` in `.env`:
 ```
-
----
-
-## Project Structure
-
-See [CLAUDE.md](../CLAUDE.md) for:
-- Directory structure
-- Implementation file naming (`plots/{spec-id}/implementations/{library}.py`)
-- Test file naming (`tests/unit/plots/test_{spec_id}.py`)
-
----
-
-## Common Tasks
-
-### Add a New Library
-1. Update database (add to `libraries` table)
-2. Create directory structure (`mkdir -p plots/{library}/scatter`)
-3. Implement existing specs
-4. Add tests
-
-### Update an Existing Implementation
-1. Create GitHub issue referencing original
-2. Update implementation file in `plots/{spec-id}/implementations/{library}.py`
-3. Run tests: `uv run pytest tests/unit/plots/test_{spec_id}.py`
-4. Create PR → Quality check runs automatically
-
----
-
-## Debugging Tips
-
-### Database Connection Issues
-
-```bash
-# Test connection
-psql -U pyplots -d pyplots -h localhost
-
-# Check migrations
-uv run alembic current
-uv run alembic history
+DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/pyplots
 ```
 
-### Import Errors
+### Cloud SQL (development)
 
-```bash
-# Verify package installation
-uv pip list
-
-# Reinstall
-uv sync --reinstall
+For Cloud SQL access, your IP must be in the authorized networks. Set:
 ```
-
-### Plot Generation Errors
-
-```python
-# Run implementation standalone
-python plots/scatter-basic/implementations/matplotlib.py
-
-# Add debug prints
-print(f"Data shape: {data.shape}")
-print(f"Columns: {data.columns.tolist()}")
+DATABASE_URL=postgresql+asyncpg://user:pass@CLOUD_SQL_PUBLIC_IP:5432/pyplots
 ```
 
-### Test Failures
+### Migrations
 
 ```bash
-# Verbose output
-pytest -v
+# Apply all migrations
+uv run alembic upgrade head
 
-# Show print statements
-pytest -s
+# Create new migration
+uv run alembic revision --autogenerate -m "description"
 
-# Drop into debugger on failure
-pytest --pdb
+# Check current version
+uv run alembic current
 ```
 
 ---
 
-## FAQ
-
-### Q: How do I add a completely new plot type?
-
-**A**: Create GitHub issue with spec → AI generates code → Review and merge
-
-### Q: What if I want to use a different plotting style?
-
-**A**: Create style variant (e.g., `ggplot_style.py`, `dark_style.py`)
-
-### Q: How do I test plot generation locally?
-
-**A**: Run implementation file directly: `python plots/scatter-basic/implementations/matplotlib.py`
-
-### Q: Do I need to implement for all libraries?
-
-**A**: No! Start with one library. Others can be added later.
+## Environment Variables
 
-### Q: How do I handle Python version differences?
+Copy `.env.example` and configure:
 
-**A**: Only create version-specific files if absolutely necessary (e.g., syntax changes). Prefer single `default.py` that works across 3.10-3.13.
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `DATABASE_URL` | Yes | PostgreSQL connection string |
+| `GCS_BUCKET` | No | GCS bucket for images (default: pyplots-images) |
+| `GOOGLE_APPLICATION_CREDENTIALS` | No | Path to service account JSON |
+| `ENVIRONMENT` | No | `development` or `production` |
 
 ---
 
-## Working with Rules
-
-The project includes versioned rules for AI code generation and quality evaluation.
-
-**Location**: `rules/` directory
-
-**Key Files**:
-- `rules/versions.yaml` - Version configuration
-- `rules/generation/v*/` - Code generation rules (Markdown)
-- `rules/README.md` - Rule system documentation
+## Project Structure
 
-**Rule States**: draft → active → deprecated → archived
+```
+pyplots/
+├── api/                # FastAPI backend
+├── app/                # React frontend
+├── core/               # Shared business logic
+├── plots/              # Plot specifications and implementations
+├── prompts/            # AI agent prompts
+├── tests/              # Test suite
+│   ├── unit/           # Fast, mocked tests
+│   ├── integration/    # SQLite in-memory
+│   └── e2e/            # Real PostgreSQL
+└── docs/               # Documentation
+```
 
-**See Also**:
-- [A/B Testing Strategies](./concepts/ab-testing-rules.md)
-- [Rules README](../rules/README.md)
+See [Repository Structure](reference/repository.md) for details.
 
 ---
 
-## Deployment
-
-pyplots runs on **Google Cloud Platform** (europe-west4 region):
-
-| Service | Component | Purpose |
-|---------|-----------|---------|
-| **Cloud Run** | `pyplots-backend` | FastAPI API (serverless, auto-scaling) |
-| **Cloud Run** | `pyplots-frontend` | React SPA via nginx |
-| **Cloud SQL** | PostgreSQL 18 | Database (Unix socket in production) |
-| **Cloud Storage** | `pyplots-images` | Preview images (GCS bucket) |
-| **Secret Manager** | `DATABASE_URL` | Secure credential storage |
-| **Cloud Build** | Triggers | Auto-deploy on push to main |
-
-### Automatic Deployment (Recommended)
-
-Push to `main` triggers Cloud Build automatically:
-- `api/**`, `core/**`, `pyproject.toml` changes → Backend redeploy
-- `app/**` changes → Frontend redeploy
-
-### Manual Deployment
+## Useful Commands
 
 ```bash
-# Backend
-gcloud builds submit --config=api/cloudbuild.yaml --project=YOUR_PROJECT_ID
+# Run single test file
+uv run pytest tests/unit/api/test_routers.py
 
-# Frontend
-gcloud builds submit --config=app/cloudbuild.yaml --project=YOUR_PROJECT_ID
-```
+# Run single test
+uv run pytest tests/unit/api/test_routers.py::test_get_specs -v
 
-### Key Files
+# Debug test failures
+uv run pytest --pdb
 
-- `api/cloudbuild.yaml` - Backend build config (Cloud SQL + Secrets)
-- `api/Dockerfile` - Python 3.13 + uv + uvicorn
-- `app/cloudbuild.yaml` - Frontend build config
-- `app/Dockerfile` - Multi-stage: Node build → nginx serve
+# Check database connection
+uv run python -c "from core.database import is_db_configured; print(is_db_configured())"
+```
 
 ---
 
-## Resources
+## Troubleshooting
 
-**Documentation**:
-- [FastAPI Docs](https://fastapi.tiangolo.com/)
-- [SQLAlchemy Docs](https://docs.sqlalchemy.org/)
-- [Pytest Docs](https://docs.pytest.org/)
-- [Matplotlib Docs](https://matplotlib.org/stable/contents.html)
+### Import errors
+```bash
+uv sync --reinstall
+```
 
-**Tools**:
-- [uv Package Manager](https://github.com/astral-sh/uv)
-- [Ruff Linter/Formatter](https://github.com/astral-sh/ruff)
-- [Alembic Migrations](https://alembic.sqlalchemy.org/)
+### Database connection issues
+```bash
+# Test connection
+psql -U pyplots -d pyplots -h localhost
 
----
+# Check migrations
+uv run alembic current
+```
 
-*For architecture details, see [architecture/](./architecture/)*
+### Test failures
+- Unit/integration tests should work without `DATABASE_URL`
+- E2E tests are skipped if `DATABASE_URL` is not set
+- Run `uv run pytest tests/unit -v` to isolate issues
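The `DATABASE_URL` format documented in `docs/development.md` above can be sanity-checked before running the E2E tests. The following is a minimal sketch using only the standard library. `check_database_url` is a hypothetical helper, not something in the repository, and it assumes the `postgresql+asyncpg` scheme shown in the guide is the only one expected.

```python
import os
from typing import List, Optional
from urllib.parse import urlsplit


def check_database_url(url: Optional[str]) -> List[str]:
    """Return a list of problems found in a DATABASE_URL value (empty if none)."""
    if not url:
        return ["DATABASE_URL is not set (E2E tests would be skipped)"]

    problems: List[str] = []
    parts = urlsplit(url)

    # Assumption: the async driver scheme from the guide is required.
    if parts.scheme != "postgresql+asyncpg":
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    # The database name is the URL path without its leading slash.
    if not parts.path.strip("/"):
        problems.append("missing database name")
    return problems


if __name__ == "__main__":
    for problem in check_database_url(os.environ.get("DATABASE_URL")):
        print(problem)
```

The check is purely syntactic and does not open a connection. The `psql` command in the troubleshooting section remains the way to confirm the server is reachable.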
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000000..0cfe1a23cf
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,85 @@
+# Documentation
+
+Welcome to the pyplots documentation. Start here to find what you're looking for.
+
+---
+
+## Quick Links
+
+| I want to... | Go to |
+|--------------|-------|
+| Understand the project | [Vision](concepts/vision.md) |
+| Contribute plot ideas | [Contributing](contributing.md) |
+| Set up local development | [Development Guide](development.md) |
+| See how automation works | [Workflows](workflows/overview.md) |
+| Look up API endpoints | [API Reference](reference/api.md) |
+| Understand the database | [Database Schema](reference/database.md) |
+| Explore repository structure | [Repository Structure](reference/repository.md) |
+
+---
+
+## Documentation Structure
+
+```
+docs/
+├── index.md              # You are here
+├── contributing.md       # How to contribute
+├── development.md        # Local development setup
+├── concepts/             # Philosophy and design
+│   └── vision.md         # Product vision and mission
+├── workflows/            # Process documentation
+│   └── overview.md       # GitHub Actions automation
+├── reference/            # Technical details
+│   ├── api.md            # REST API endpoints
+│   ├── database.md       # PostgreSQL schema
+│   ├── repository.md     # Directory structure
+│   ├── tagging-system.md # Tag taxonomy reference
+│   ├── plausible.md      # Analytics integration
+│   └── seo.md            # SEO configuration
+└── plot-types-catalog.md # Future plot ideas
+```
+
+---
+
+## Concepts
+
+High-level understanding of why things work the way they do.
+
+- **[Vision](concepts/vision.md)** - Product mission, the problem we solve, and how we're different
+
+---
+
+## Workflows
+
+How the automation pipeline works.
+
+- **[Overview](workflows/overview.md)** - Specification and implementation pipelines, label system
+
+---
+
+## Reference
+
+Technical details for development and integration.
+
+- **[API](reference/api.md)** - REST endpoints, request/response formats
+- **[Database](reference/database.md)** - PostgreSQL schema and models
+- **[Repository](reference/repository.md)** - Directory structure and file organization
+- **[Tagging System](reference/tagging-system.md)** - Tag taxonomy (used by spec-create workflow)
+- **[Plausible](reference/plausible.md)** - Analytics integration
+- **[SEO](reference/seo.md)** - Search engine optimization setup
+
+---
+
+## Contributing
+
+- **[Contributing Guide](contributing.md)** - How to propose plot ideas and improve specs
+- **[Development Guide](development.md)** - Local setup, testing, code quality
+
+---
+
+## Other Resources
+
+- **[README](../README.md)** - Project overview and quick start
+- **[CLAUDE.md](../CLAUDE.md)** - AI assistant instructions (for Claude Code)
+- **[copilot-instructions.md](../.github/copilot-instructions.md)** - AI assistant instructions (for GitHub Copilot)
+- **[prompts/](../prompts/)** - AI agent prompts for code generation
diff --git a/docs/plot-types-catalog.md b/docs/plot-types-catalog.md
index 1c0ce542df..f001561b3d 100644
--- a/docs/plot-types-catalog.md
+++ b/docs/plot-types-catalog.md
@@ -820,7 +820,7 @@ A comprehensive catalog of plot types for the pyplots platform. Each plot is imp
 **Description:** A quiver plot displays vector fields using arrows positioned at grid points. Each arrow represents a vector at that location, with direction indicating the vector's angle and length proportional to its magnitude.
 
 ### streamline-basic 📋
-**Description:** Strömungslinien eines Vektorfelds als glatte Kurven.
+**Description:** Streamlines of a vector field as smooth curves.
 
 ### stem-basic ✅
 **Description:** A stem plot displays data points as markers connected to a baseline by vertical lines (stems).
@@ -871,7 +871,7 @@ A comprehensive catalog of plot types for the pyplots platform. Each plot is imp
 
 ## 30. Printable & Fun
 
-Druckbare Vorlagen und spielerische Visualisierungen.
+Printable templates and playful visualizations.
 
 ### Puzzles & Games
 
diff --git a/docs/reference/api.md b/docs/reference/api.md
new file mode 100644
index 0000000000..a1bb7d1062
--- /dev/null
+++ b/docs/reference/api.md
@@ -0,0 +1,416 @@
+# 🔌 API Reference
+
+## Overview
+
+The pyplots API is a **FastAPI-based REST API** serving plot data to the frontend.
+
+**Base URL**: `https://api.pyplots.ai`
+
+**Key Principle**: The database is derived from the repository via `sync-postgres.yml`, and the API reads from PostgreSQL.
+
+---
+
+## Core Endpoints
+
+### Specs
+
+#### GET `/specs`
+
+**Purpose**: List all specs with at least one implementation
+
+**Response**:
+```json
+[
+  {
+    "id": "scatter-basic",
+    "title": "Basic Scatter Plot",
+    "description": "A fundamental scatter plot showing...",
+    "tags": {
+      "plot_type": ["scatter"],
+      "domain": ["statistics"],
+      "features": ["basic", "2d"],
+      "data_type": ["numeric"]
+    },
+    "library_count": 9
+  }
+]
+```
+
+---
+
+#### GET `/specs/{spec_id}`
+
+**Purpose**: Get detailed spec with all implementations
+
+**Response**:
+```json
+{
+  "id": "scatter-basic",
+  "title": "Basic Scatter Plot",
+  "description": "A fundamental scatter plot...",
+  "applications": ["Show correlation", "Compare distributions"],
+  "data": ["x: numeric values", "y: numeric values"],
+  "notes": ["Use alpha for overlapping points"],
+  "tags": {
+    "plot_type": ["scatter"],
+    "domain": ["statistics"],
+    "features": ["basic"],
+    "data_type": ["numeric"]
+  },
+  "issue": 42,
+  "suggested": "CoolContributor",
+  "created": "2025-01-10T08:00:00Z",
+  "updated": "2025-01-15T10:30:00Z",
+  "implementations": [
+    {
+      "library_id": "matplotlib",
+      "library_name": "Matplotlib",
+      "preview_url": "https://storage.googleapis.com/pyplots-images/plots/scatter-basic/matplotlib/plot.png",
+      "preview_thumb": "https://storage.googleapis.com/pyplots-images/plots/scatter-basic/matplotlib/plot_thumb.png",
+      "preview_html": null,
+      "quality_score": 92.0,
+      "code": "import matplotlib.pyplot as plt...",
+      "generated_at": "2025-01-15T10:30:00Z",
+      "generated_by": "claude-opus-4-5-20251101",
+      "python_version": "3.13",
+      "library_version": "3.10.0",
+      "review_strengths": ["Clean code structure"],
+      "review_weaknesses": ["Grid could be more subtle"],
+      "review_image_description": "The plot shows...",
+      "review_criteria_checklist": {...},
+      "review_verdict": "APPROVED"
+    }
+  ]
+}
+```
+
+---
+
+#### GET `/specs/{spec_id}/images`
+
+**Purpose**: Get preview images for a spec across all libraries
+
+**Response**:
+```json
+{
+  "spec_id": "scatter-basic",
+  "images": [
+    {
+      "library": "matplotlib",
+      "url": "https://storage.googleapis.com/.../plot.png",
+      "thumb": "https://storage.googleapis.com/.../plot_thumb.png",
+      "html": null
+    }
+  ]
+}
+```
+
+---
+
+### Libraries
+
+#### GET `/libraries`
+
+**Purpose**: List supported plotting libraries
+
+**Response**:
+```json
+{
+  "libraries": [
+    {
+      "id": "matplotlib",
+      "name": "Matplotlib",
+      "version": "3.10.0",
+      "documentation_url": "https://matplotlib.org",
+      "description": "The classic standard..."
+    }
+  ]
+}
+```
+
+---
+
+#### GET `/libraries/{library_id}/images`
+
+**Purpose**: Get all plot images for a library across all specs
+
+**Response**:
+```json
+{
+  "library": "matplotlib",
+  "images": [
+    {
+      "spec_id": "scatter-basic",
+      "library": "matplotlib",
+      "url": "https://storage.googleapis.com/.../plot.png",
+      "thumb": "https://storage.googleapis.com/.../plot_thumb.png",
+      "html": null,
+      "code": "import matplotlib.pyplot as plt..."
+    }
+  ]
+}
+```
+
+---
+
+### Plots Filter
+
+#### GET `/plots/filter`
+
+**Purpose**: Filter plots with faceted counts for all filter categories
+
+**Query Parameters** (combinable):
+- `lib` - Library filter (matplotlib, seaborn, etc.)
+- `spec` - Spec ID filter
+- `plot` - Plot type tag filter
+- `data` - Data type tag filter
+- `dom` - Domain tag filter
+- `feat` - Features tag filter
+
+**Filter Logic**:
+- Comma-separated values: OR (`lib=matplotlib,seaborn`)
+- Same parameter repeated: AND (`lib=matplotlib&lib=seaborn`)
+- Different categories: AND (`lib=matplotlib&plot=scatter`)
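+
+The filter logic above can be sketched as follows. This is a minimal illustration, not the API's actual implementation: `parse_filter_groups` and `matches` are hypothetical helper names, and the item shape (a dict of tag sets keyed by parameter name) is an assumption made for the example.
+
+```python
+from urllib.parse import parse_qsl
+
+# Parameter names from the list above; the helpers below are hypothetical.
+FILTER_KEYS = {"lib", "spec", "plot", "data", "dom", "feat"}
+
+
+def parse_filter_groups(query: str) -> list[tuple[str, set[str]]]:
+    """Turn a query string into AND-ed groups of OR-ed values."""
+    groups = []
+    for key, raw in parse_qsl(query):
+        if key not in FILTER_KEYS:
+            continue  # ignore parameters that are not filters
+        # Comma-separated values inside one parameter are alternatives (OR).
+        values = {v.strip() for v in raw.split(",") if v.strip()}
+        if values:
+            groups.append((key, values))
+    return groups
+
+
+def matches(item: dict[str, set[str]], groups: list[tuple[str, set[str]]]) -> bool:
+    """An item matches when it shares a value with every group (AND)."""
+    return all(item.get(key, set()) & values for key, values in groups)
+```
+
+Each occurrence of a parameter becomes its own group, so `lib=matplotlib&lib=seaborn` yields two groups that must both match, while `lib=matplotlib,seaborn` yields one group that either library satisfies.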
+
+**Response**:
+```json
+{
+  "total": 42,
+  "images": [
+    {
+      "spec_id": "scatter-basic",
+      "library": "matplotlib",
+      "quality": 92,
+      "url": "https://storage.googleapis.com/.../plot.png",
+      "thumb": "https://storage.googleapis.com/.../plot_thumb.png",
+      "html": null
+    }
+  ],
+  "counts": {
+    "lib": {"matplotlib": 5, "seaborn": 3},
+    "spec": {"scatter-basic": 2},
+    "plot": {"scatter": 10},
+    "data": {"numeric": 15},
+    "dom": {"statistics": 8},
+    "feat": {"basic": 12}
+  },
+  "globalCounts": {...},
+  "orCounts": [...]
+}
+```
+
+---
+
+### Stats
+
+#### GET `/stats`
+
+**Purpose**: Platform statistics
+
+**Response**:
+```json
+{
+  "specs": 42,
+  "plots": 378,
+  "libraries": 9
+}
+```
+
+---
+
+### Download
+
+#### GET `/download/{spec_id}/{library}`
+
+**Purpose**: Download plot image (proxy to avoid CORS)
+
+**Response**: PNG image file with `Content-Disposition: attachment`
+
+---
+
+### Health
+
+#### GET `/`
+
+**Purpose**: Root endpoint
+
+**Response**:
+```json
+{
+  "message": "Welcome to pyplots API",
+  "version": "0.2.0",
+  "docs": "/docs",
+  "health": "/health"
+}
+```
+
+---
+
+#### GET `/health`
+
+**Purpose**: Health check for Cloud Run
+
+**Response**:
+```json
+{
+  "status": "healthy",
+  "service": "pyplots-api",
+  "version": "0.2.0"
+}
+```
+
+---
+
+## SEO Endpoints
+
+### GET `/sitemap.xml`
+
+**Purpose**: Dynamic XML sitemap for search engines
+
+Includes: root, catalog, all specs with implementations, all implementation pages.
+
+---
+
+### GET `/seo-proxy/`
+
+**Purpose**: Bot-optimized home page with og:tags
+
+Used by nginx to serve correct meta tags to social media bots.
+
+---
+
+### GET `/seo-proxy/catalog`
+
+**Purpose**: Bot-optimized catalog page
+
+---
+
+### GET `/seo-proxy/{spec_id}`
+
+**Purpose**: Bot-optimized spec overview page with collage og:image
+
+---
+
+### GET `/seo-proxy/{spec_id}/{library}`
+
+**Purpose**: Bot-optimized implementation page with branded og:image
+
+---
+
+## OG Image Endpoints
+
+All endpoints are under the `/og/` prefix.
+
+### GET `/og/home.png`
+
+**Purpose**: OG image for home page (with tracking)
+
+---
+
+### GET `/og/catalog.png`
+
+**Purpose**: OG image for catalog page
+
+---
+
+### GET `/og/{spec_id}.png`
+
+**Purpose**: Collage OG image for spec overview (2x3 grid of top implementations)
+
+---
+
+### GET `/og/{spec_id}/{library}.png`
+
+**Purpose**: Branded OG image for implementation (1200x630 with pyplots.ai header)
+
+---
+
+## Proxy Endpoints
+
+### GET `/proxy/html`
+
+**Purpose**: Proxy HTML from GCS and inject a size-reporting script
+
+**Query Parameters**:
+- `url` - GCS URL (must be from `pyplots-images` bucket)
+- `origin` - Target origin for postMessage (optional)
+
+Used to load interactive plots (plotly, bokeh, altair) in iframes with dynamic sizing.
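+
+The bucket restriction on `url` could be checked along these lines. This is a sketch under assumptions: the host and path layout are inferred from the example URLs in this document, and `is_allowed_gcs_url` is a hypothetical name, not the endpoint's real validation code.
+
+```python
+from urllib.parse import urlparse
+
+# Assumed values: bucket name from the parameter description above,
+# host taken from the example URLs in this document.
+ALLOWED_HOST = "storage.googleapis.com"
+ALLOWED_BUCKET = "pyplots-images"
+
+
+def is_allowed_gcs_url(url: str) -> bool:
+    """Return True only for https URLs that point inside the allowed bucket."""
+    parsed = urlparse(url)
+    if parsed.scheme != "https" or parsed.hostname != ALLOWED_HOST:
+        return False
+    # Path looks like /<bucket>/<object...>; require a non-empty object name.
+    parts = parsed.path.lstrip("/").split("/", 1)
+    return len(parts) == 2 and parts[0] == ALLOWED_BUCKET and bool(parts[1])
+```
+
+Comparing the parsed hostname exactly, rather than checking a string prefix, rejects lookalike hosts such as `storage.googleapis.com.example.org`.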
+
+---
+
+## Error Responses
+
+### Standard Error Format
+
+```json
+{
+  "detail": "Spec not found"
+}
+```
+
+### HTTP Status Codes
+
+| Status | Description |
+|--------|-------------|
+| 200 | Success |
+| 400 | Bad request (invalid parameters) |
+| 404 | Resource not found |
+| 502 | External service error (GCS) |
+| 503 | Database not available |
+
+---
+
+## Caching
+
+### In-Memory Cache
+
+The API uses in-memory caching with per-endpoint TTLs:
+- Stats: 5 min
+- Specs list: 2 min
+- Individual specs: 2 min
+- Filter results: 30 sec
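+
+A TTL cache of this kind can be sketched in a few lines. The class below is illustrative only: `TTLCache`, its injectable `clock`, and the key names in `TTL_SECONDS` are assumptions for the example, not the API's actual cache.
+
+```python
+import time
+from typing import Any, Callable, Optional
+
+# TTLs in seconds, taken from the list above; the key names are illustrative.
+TTL_SECONDS = {"stats": 300, "specs_list": 120, "spec": 120, "filter": 30}
+
+
+class TTLCache:
+    """Tiny in-memory cache whose entries expire after a per-entry TTL."""
+
+    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
+        self._clock = clock  # injectable so expiry can be tested without sleeping
+        self._store: dict[str, tuple[float, Any]] = {}
+
+    def get(self, key: str) -> Optional[Any]:
+        entry = self._store.get(key)
+        if entry is None:
+            return None
+        expires_at, value = entry
+        if self._clock() >= expires_at:
+            del self._store[key]  # drop the stale entry
+            return None
+        return value
+
+    def set(self, key: str, value: Any, ttl: float) -> None:
+        self._store[key] = (self._clock() + ttl, value)
+```
+
+A handler would call `get` first and fall back to the database on a miss, then store the result with the TTL for its category, for example `cache.set("stats", payload, TTL_SECONDS["stats"])`.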
+
+### HTTP Cache Headers
+
+```http
+Cache-Control: public, max-age=120, stale-while-revalidate=600
+```
+
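+Applied to the following endpoints (the header above shows the 2 min case; `max-age` follows the duration listed for each endpoint):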
+- `/libraries` - 5 min
+- `/stats` - 5 min
+- `/specs` - 2 min
+- `/specs/{spec_id}` - 2 min
+- `/plots/filter` - 30 sec
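+
+One way to derive the header value per route is shown below. This is a sketch: the `MAX_AGE` table mirrors the list above, `cache_control` is a hypothetical helper, and reusing `stale-while-revalidate=600` for every route is an assumption based on the single example header.
+
+```python
+# max-age values in seconds, taken from the list above.
+MAX_AGE = {
+    "/libraries": 300,
+    "/stats": 300,
+    "/specs": 120,
+    "/specs/{spec_id}": 120,
+    "/plots/filter": 30,
+}
+
+
+def cache_control(route: str, stale_while_revalidate: int = 600) -> str:
+    """Build a Cache-Control value for a known route template."""
+    max_age = MAX_AGE[route]  # raises KeyError for routes without a policy
+    return f"public, max-age={max_age}, stale-while-revalidate={stale_while_revalidate}"
+```
+
+For `/specs`, this reproduces the example header shown above.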
+
+---
+
+## CORS Configuration
+
+**Allowed Origins**:
+- `https://pyplots.ai`
+- `http://localhost:*` (development)
+
+**Allowed Methods**: All
+
+---
+
+## GZip Compression
+
+Responses larger than 500 bytes are compressed with GZip.
+
+Example: a `/plots/filter` response shrinks from 301 KB to ~40 KB when compressed.
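+
+The size threshold can be illustrated with the standard library. This is a sketch of the rule only; `maybe_compress` is a hypothetical helper and does not reflect how the API's middleware is wired up.
+
+```python
+import gzip
+
+MINIMUM_SIZE = 500  # bytes, from the threshold stated above
+
+
+def maybe_compress(body: bytes) -> tuple[bytes, bool]:
+    """Compress bodies larger than the threshold; return (payload, was_compressed)."""
+    if len(body) <= MINIMUM_SIZE:
+        return body, False  # small bodies are sent as-is
+    return gzip.compress(body), True
+```
+
+Repetitive JSON, such as a long list of image entries with similar URLs, tends to compress well, which is consistent with the size reduction quoted above.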
+
+---
+
+## OpenAPI Documentation
+
+Interactive API documentation is available at:
+- **Swagger UI**: `https://api.pyplots.ai/docs`
+- **ReDoc**: `https://api.pyplots.ai/redoc`
+- **OpenAPI JSON**: `https://api.pyplots.ai/openapi.json`
+
+---
+
+*For database schema, see [database.md](./database.md)*
diff --git a/docs/architecture/database.md b/docs/reference/database.md
similarity index 71%
rename from docs/architecture/database.md
rename to docs/reference/database.md
index 465cfc63a5..5f6ddb30c8 100644
--- a/docs/architecture/database.md
+++ b/docs/reference/database.md
@@ -2,9 +2,9 @@
 
 ## Overview
 
-pyplots uses **PostgreSQL** (Cloud SQL) to store metadata about plots, specs, and implementations. The database stores **references and metadata only** - not code or images.
+pyplots uses **PostgreSQL** (Cloud SQL) as the primary data store for the website. The database contains **all data needed to serve the frontend** - specs, implementations (including full code), and metadata.
 
-**Key Principle**: Lightweight metadata store, not a code repository.
+**Key Principle**: The repository is the source of truth; the database is derived from it via `sync-postgres.yml`.
 
 ---
 
@@ -12,33 +12,45 @@ pyplots uses **PostgreSQL** (Cloud SQL) to store metadata about plots, specs, an
 
 | Database | Status | Use Case | When to Consider |
 |----------|--------|----------|------------------|
-| **PostgreSQL** | ✅ **Current** | All data: specs, implementations, tags, quality scores, promotion queue | Start here - handles everything |
+| **PostgreSQL** | ✅ **Current** | All data: specs, implementations, tags, quality scores | Start here - handles everything |
 | **Google Cloud Storage** | ✅ **Current** | Preview images, user-generated plots | Already implemented |
-| **GitHub** | ✅ **Current** | Code, specs, quality reports (as Issue comments) | Already implemented |
-| **Firestore** | 📋 **Future Optimization** | Multi-dimensional tag queries (5-level hierarchy) | IF tag search becomes performance bottleneck with >10,000 specs |
+| **GitHub** | ✅ **Current** | Code, specs, workflow state (via labels) | Already implemented |
 
-**Current Approach**: All data in PostgreSQL + GCS + GitHub. This is sufficient for MVP and beyond.
-
-**Future Optimization**: See [Firestore for Advanced Tagging](#future-optimization-firestore-for-advanced-tagging) section at the end of this document.
+**Current Approach**: All data in PostgreSQL + GCS + GitHub.
 
 ---
 
-## What's Stored vs. What's Not
+## What's Stored Where
+
+### ✅ Stored in Database (PostgreSQL)
+
+**Specs:**
+- Full spec content (title, description, applications, data, notes)
+- Tags (JSONB with plot_type, domain, features, data_type)
+- Metadata (created, updated, issue, suggested)
+
+**Implementations:**
+- Full Python source code (`impls.code`)
+- GCS URLs for preview images
+- Quality scores and review feedback
+- Generation metadata (model, workflow run, versions)
+
+**Other:**
+- Library information (name, version, docs URL)
+
+### ✅ Stored in GCS (Google Cloud Storage)
 
-### ✅ Stored in Database
+- Preview images (PNG, thumbnails)
+- Interactive HTML plots (plotly, bokeh, altair, etc.)
 
-- Spec metadata (title, description, tags)
-- Implementation metadata (library, variant, quality score)
-- GCS URLs (preview images)
-- Promotion queue (social media posts)
-- Library information
-- Usage analytics (optional)
+### ✅ Stored in GitHub
 
-### ❌ NOT Stored in Database
+- Source of truth for all code and specs (`plots/` directory)
+- Quality reports (as Issue comments)
+- Workflow state (via labels on Issues/PRs)
+
+### ❌ NOT Stored Anywhere Permanently
 
-- Plot code (stored in GitHub repository)
-- Preview images (stored in Google Cloud Storage)
-- Quality reports (stored in GitHub Issues as comments)
 - User uploaded data (processed in-memory only)
 
 ---
@@ -90,7 +102,8 @@ CREATE TABLE libraries
     id                VARCHAR PRIMARY KEY,      -- "matplotlib", "seaborn", "plotly"
     name              VARCHAR NOT NULL,         -- "Matplotlib"
     version           VARCHAR,                  -- "3.9.0"
-    documentation_url VARCHAR                   -- "https://matplotlib.org"
+    documentation_url VARCHAR,                  -- "https://matplotlib.org"
+    description       TEXT                      -- Short library description
 );
 
 -- Library-specific implementations
@@ -127,6 +140,11 @@ CREATE TABLE impls
     review_strengths  VARCHAR[],                -- What's good about this implementation
     review_weaknesses VARCHAR[],                -- What needs improvement
 
+    -- Extended review data (issue #2845)
+    review_image_description TEXT,              -- AI's visual description of the plot
+    review_criteria_checklist JSONB,            -- Detailed scoring breakdown
+    review_verdict  VARCHAR(20),                -- "APPROVED" or "REJECTED"
+
     updated_at      TIMESTAMP DEFAULT NOW(),
 
     UNIQUE (spec_id, library_id)
@@ -137,7 +155,7 @@ CREATE INDEX idx_impls_spec ON impls (spec_id);
 CREATE INDEX idx_impls_library ON impls (library_id);
 ```
 
-**Note**: The `tags` and `promotion_queue` tables are planned but not yet implemented.
+**Note**: Tags are stored as JSONB in the `specs` table (not a separate table).
 
 ---
 
@@ -275,7 +293,7 @@ Use **Alembic** for schema migrations:
 
 ```bash
 # Create new migration
-alembic revision -m "add promotion queue table"
+alembic revision -m "add new column"
 
 # Apply migrations
 alembic upgrade head
@@ -454,97 +472,6 @@ await session.execute(f"SELECT * FROM specs WHERE id = '{spec_id}'")
 
 ---
 
-## Future Optimization: Firestore for Advanced Tagging
-
-**Status**: 📋 **Planned** (not currently implemented)
-
-**Current State**: Tags are stored as JSONB in the `specs.tags` column with structured categories (plot_type, domain, features, audience, data_type). This is sufficient for MVP and early growth.
-
-**Future Consideration**: As the platform scales beyond 10,000+ specs with complex multi-dimensional search requirements, consider adding Firestore for advanced tag functionality.
-
----
-
-### Why Firestore Could Help (Future)
-
-**Problem it solves**:
-- Multi-dimensional tag queries (5-level hierarchy: Library → Plot Type → Data Type → Domain → Features)
-- Filtering across multiple dimensions simultaneously (e.g., "matplotlib + timeseries + finance + beginner")
-- Real-time search index updates
-- Automatic scaling for high-volume tag searches
-
-**When to implement**:
-- PostgreSQL tag queries become slow (>500ms for common searches)
-- Need for complex tag hierarchy beyond simple array
-- User feedback requests advanced filtering
-- Catalog grows beyond 10,000 specs
-
----
-
-### Proposed Architecture (When Implemented)
-
-**Data Split**:
-- **PostgreSQL**: Spec metadata, implementation records, quality scores, promotion queue (no change)
-- **Firestore**: Multi-dimensional tags, search keywords, similarity clusters
-
-**Example Document Structure**:
-```javascript
-{
-  "plot_id": "matplotlib-scatter-basic-001-default",
-  "spec_id": "scatter-basic-001",
-  "tags": {
-    "library": "matplotlib",
-    "plot_type": "scatter",
-    "data_type": "tabular",
-    "domain": "data-science",
-    "features": {"complexity": "beginner", "interactivity": "static"}
-  },
-  "search_keywords": ["scatter", "matplotlib", "basic", "2d"],
-  "confidence_scores": {"overall": 0.89}
-}
-```
-
-**Query Example**:
-```javascript
-// Find all beginner matplotlib plots for data-science
-db.collection('plot_tags')
-  .where('tags.library', '==', 'matplotlib')
-  .where('tags.domain', '==', 'data-science')
-  .where('tags.features.complexity', '==', 'beginner')
-  .get();
-```
-
----
-
-### Implementation Checklist (When Ready)
-
-- [ ] Confirm PostgreSQL performance is actually bottleneck
-- [ ] Design detailed Firestore schema (based on actual usage patterns)
-- [ ] Create composite indices for common query combinations
-- [ ] Implement sync mechanism (PostgreSQL → Firestore)
-- [ ] Add consistency checks (daily verification)
-- [ ] Monitor costs (estimated <$1/month for 10K docs)
-- [ ] Migrate existing tags from PostgreSQL to Firestore
-- [ ] Update API to query Firestore for tag searches
-- [ ] Keep PostgreSQL tags as backup/audit trail
-
----
-
-### Cost Estimate (For Future Reference)
-
-**Storage**: 10,000 documents × 3 KB = ~30 MB → <$0.50/month
-**Reads**: 1M reads/month → ~$0.36/month
-**Writes**: 100K writes/month → ~$0.18/month
-**Total**: <$1/month
-
----
-
-**See Also**:
-- **Tag Taxonomy**: `docs/concepts/tagging-system.md`
-- **Tagging Rules**: `rules/generation/v1.0.0-draft/tagging-rules.md`
-- **Auto-Tagging Workflow**: `docs/workflow.md` (Flow 4.5)
-
----
-
 ## Monitoring
 
 ### Key Metrics
diff --git a/docs/architecture/plausible.md b/docs/reference/plausible.md
similarity index 100%
rename from docs/architecture/plausible.md
rename to docs/reference/plausible.md
diff --git a/docs/architecture/repository.md b/docs/reference/repository.md
similarity index 86%
rename from docs/architecture/repository.md
rename to docs/reference/repository.md
index 5137fd5510..60a264d967 100644
--- a/docs/architecture/repository.md
+++ b/docs/reference/repository.md
@@ -18,7 +18,7 @@ plots/{specification-id}/
     └── ...
 ```
 
-**Key Principle**: The repository contains **only production code and final specs**. Quality reports and workflow state are managed in GitHub Issues. Preview images are stored in GCS.
+**Key Principle**: The repository is the **source of truth** for all code, specs, and quality data. Preview images are stored in GCS. The database is derived from the repository via `sync-postgres.yml`.
 
 **Key Benefit**: Per-library metadata files eliminate merge conflicts when multiple implementations are generated in parallel.
 
@@ -64,23 +64,32 @@ pyplots/
 ├── prompts/                           # AI agent prompts
 │   ├── plot-generator.md              # Base rules for code generation
 │   ├── quality-criteria.md            # Quality evaluation criteria
-│   ├── quality-evaluator.md           # Multi-LLM evaluation prompt
-│   ├── auto-tagger.md                 # Automatic tagging
+│   ├── quality-evaluator.md           # AI quality evaluation prompt
 │   ├── spec-validator.md              # Validates plot requests
 │   ├── spec-id-generator.md           # Assigns spec IDs
-│   └── library/                       # Library-specific rules
-│       ├── matplotlib.md
-│       ├── seaborn.md
+│   ├── default-style-guide.md         # Default visual style rules
+│   ├── library/                       # Library-specific rules (9 files)
+│   │   ├── matplotlib.md
+│   │   ├── seaborn.md
+│   │   └── ...
+│   ├── templates/                     # Templates for new specs
+│   │   ├── specification.md
+│   │   └── specification.yaml
+│   └── workflow-prompts/              # Workflow-specific prompts
 │       └── ...
 │
 ├── core/                              # Shared business logic
 │   ├── __init__.py
 │   ├── config.py                      # Configuration (.env-based)
+│   ├── constants.py                   # Library metadata, constants
+│   ├── images.py                      # Image processing utilities
+│   ├── utils.py                       # General utilities
 │   ├── database/                      # Database layer
 │   │   ├── __init__.py
 │   │   ├── connection.py              # Async connection management
 │   │   ├── models.py                  # SQLAlchemy ORM models
-│   │   └── repositories.py            # Repository pattern
+│   │   ├── repositories.py            # Repository pattern
+│   │   └── types.py                   # Custom SQLAlchemy types
 │   └── generators/                    # Reusable code generators
 │       └── plot_generator.py          # Plot code generation utilities
 │
@@ -105,11 +114,13 @@ pyplots/
 │       └── workflow_cli.py            # CLI for workflows
 │
 ├── tests/                             # Test suite
-│   └── unit/
-│       ├── api/
-│       ├── core/
-│       ├── prompts/
-│       └── workflows/
+│   ├── conftest.py                    # Shared fixtures
+│   ├── unit/                          # Fast, mocked tests
+│   │   ├── api/
+│   │   ├── core/
+│   │   └── ...
+│   ├── integration/                   # SQLite in-memory tests
+│   └── e2e/                           # Real PostgreSQL tests
 │
 ├── .github/
 │   └── workflows/                     # GitHub Actions CI/CD
@@ -136,10 +147,11 @@ pyplots/
 │   └── upgrade_specs*.py              # Spec upgrade utilities
 │
 ├── docs/                              # Documentation
-│   ├── architecture/
-│   ├── workflow.md
-│   ├── specs-guide.md
-│   └── development.md
+│   ├── concepts/
+│   ├── reference/
+│   ├── workflows/
+│   ├── contributing.md
+│   └── index.md
 │
 ├── pyproject.toml                     # Python project config (uv)
 ├── uv.lock                            # Dependency lock file
@@ -171,12 +183,12 @@ plots/{specification-id}/
 ```
 
 **Characteristics**:
-- ✅ Self-contained (spec + metadata + code together)
+- ✅ Self-contained (spec + metadata + code + quality reports together)
 - ✅ Easy to navigate (one folder = one plot type)
 - ✅ Synced to PostgreSQL via `sync-postgres.yml`
 - ✅ No merge conflicts (per-library metadata files)
+- ✅ Quality reports in `metadata/{library}.yaml` (review section)
 - ❌ NO preview images (stored in GCS)
-- ❌ NO quality reports (stored in GitHub Issues)
 
 **Example**: `plots/scatter-basic/` contains everything for the basic scatter plot.
 
@@ -391,14 +403,17 @@ plt.savefig('plot.png', dpi=300)
 **Purpose**: AI agent prompts for code generation and quality evaluation
 
 **Subdirectories**:
-- `templates/` - Templates for new specs (`spec.md`, `metadata.yaml`)
+- `library/` - Library-specific rules (9 files: matplotlib, seaborn, plotly, etc.)
+- `templates/` - Templates for new specs (`specification.md`, `specification.yaml`)
+- `workflow-prompts/` - Workflow-specific prompt templates
 
 **Files**:
 - `plot-generator.md` - Base rules for all implementations
 - `quality-criteria.md` - Definition of quality
-- `quality-evaluator.md` - Multi-LLM evaluation
-- `auto-tagger.md` - Automatic tagging
-- `library/*.md` - Library-specific rules (9 files)
+- `quality-evaluator.md` - AI quality evaluation
+- `spec-validator.md` - Validates plot requests
+- `spec-id-generator.md` - Assigns spec IDs
+- `default-style-guide.md` - Default visual style rules
 
 ---
 
@@ -508,15 +523,13 @@ Always named by library: `{library}.py`
 - **Where**: Google Cloud Storage (`gs://pyplots-images/plots/...`)
 - **Why**: Binary files bloat git history
 
-### ❌ Quality Reports
-- **Where**: GitHub Issues (as bot comments)
-- **Why**: Keeps repo clean, increases transparency
-
 ### ❌ Secrets
 - **Where**: Environment variables, Cloud Secret Manager
 - **Why**: Security
 - **Note**: `.env.example` shows required variables without values
 
+**Note**: Quality reports ARE stored in the repository in `metadata/{library}.yaml` (the `review:` section with strengths, weaknesses, criteria_checklist, verdict).
+
 ---
 
 ## Database Sync
@@ -529,9 +542,10 @@ The `sync-postgres.yml` workflow syncs `plots/` to PostgreSQL on push to main:
 - Implementation code (full Python source)
 - Implementation metadata (quality score, generation info from metadata/*.yaml)
 - Preview URLs from per-library metadata files
+- Quality review data (strengths, weaknesses, criteria_checklist, verdict)
 
 **Source of Truth**: The `plots/` directory is authoritative. Database is derived.
 
 ---
 
-*For implementation details, see [specs-guide.md](../specs-guide.md) and [development.md](../development.md)*
+*For contribution guidelines, see [contributing.md](../contributing.md)*
diff --git a/docs/architecture/seo.md b/docs/reference/seo.md
similarity index 100%
rename from docs/architecture/seo.md
rename to docs/reference/seo.md
diff --git a/docs/concepts/tagging-system.md b/docs/reference/tagging-system.md
similarity index 100%
rename from docs/concepts/tagging-system.md
rename to docs/reference/tagging-system.md
diff --git a/docs/specs-guide.md b/docs/specs-guide.md
deleted file mode 100644
index 36f0d0b77e..0000000000
--- a/docs/specs-guide.md
+++ /dev/null
@@ -1,121 +0,0 @@
-# Plot Specification Guide
-
-## Overview
-
-Plot specifications are **library-agnostic descriptions** of what a plot should show. They live in `plots/{spec-id}/spec.md`.
-
-**Key Principle**: A spec describes **WHAT** to visualize, not **HOW** to implement it.
-
----
-
-## File Location
-
-Each spec lives in its own directory:
-```
-plots/{spec-id}/
-├── spec.md              ← Specification file
-├── metadata.yaml        ← Tags, generation info
-└── implementations/     ← Library code
-```
-
----
-
-## Spec Format
-
-```markdown
-# {spec-id}: {Title}
-
-## Description
-
-{2-4 sentences: What does this plot show? When should you use it?}
-
-## Applications
-
-- {Realistic scenario 1 with domain context}
-- {Realistic scenario 2}
-- {Realistic scenario 3}
-
-## Data
-
-- `{column}` ({type}) - {what it represents}
-- `{column}` ({type}) - {what it represents}
-- Size: {recommended data size}
-- Example: {dataset reference or description}
-
-## Notes
-
-- {Optional implementation hints or special requirements}
-```
-
----
-
-## Sections
-
-### Title
-Format: `# {spec-id}: {Human-Readable Title}`
-
-Example: `# scatter-basic: Basic Scatter Plot`
-
-### Description
-2-4 sentences (prose text) explaining:
-- What the plot visualizes
-- When to use it
-- What makes it useful
-
-### Applications
-3-4 realistic scenarios with domain context (finance, science, marketing, etc.)
-
-### Data
-Simple list format:
-- Required columns with types (numeric, categorical, datetime)
-- Recommended data size
-- Example dataset reference
-
-### Notes (Optional)
-Implementation hints, visual preferences, or special requirements.
-
----
-
-## Spec ID Naming
-
-Format: `{type}-{variant}` or `{type}-{variant}-{modifier}`
-
-Examples:
-- `scatter-basic` - Simple scatter plot
-- `scatter-color-groups` - Scatter with color-coded groups
-- `bar-grouped-horizontal` - Horizontal grouped bars
-
-Rules:
-- Lowercase only
-- Hyphens as separators
-- Descriptive names (no numbers needed)
-
----
-
-## Workflow
-
-1. **User creates GitHub Issue** with plot idea
-2. **Bot assigns spec ID** and validates request
-3. **Maintainer adds `approved` label**
-4. **AI generates spec file** in `plots/{spec-id}/spec.md`
-5. **AI generates implementations** for all 9 libraries
-6. **Quality check** runs automatically
-7. **Auto-merge** if quality passes
-
----
-
-## Writing Good Specs
-
-### DO
-- Be specific about data requirements
-- Use realistic applications with domain context
-- Keep description concise (2-4 sentences)
-
-### DON'T
-- Include library-specific details
-- Add quality criteria (handled by central prompts)
-- Over-specify styling (AI decides based on style guide)
-
----
-
-*See `prompts/templates/spec.md` for the full template.*
diff --git a/docs/workflow.md b/docs/workflow.md
deleted file mode 100644
index 3e6b9543ef..0000000000
--- a/docs/workflow.md
+++ /dev/null
@@ -1,562 +0,0 @@
-# 🔄 pyplots Automation Workflow
-
-## Overview
-
-pyplots is a **community-driven, AI-powered platform** that automatically discovers, generates, tests, and maintains Python plotting examples. This document describes the high-level automation architecture that makes this possible.
-
-### Philosophy
-
-- **Start Simple, Scale Intelligently**: Begin with basics (Twitter, matplotlib), expand based on learnings
-- **Cost-Conscious Design**: Leverage existing subscriptions and smart resource allocation
-- **Quality Over Quantity**: Multi-LLM validation ensures only excellent examples go live
-- **Community-Driven**: Ideas from the data science community, curated by AI, approved by humans
-- **Always Current**: Event-based maintenance keeps examples updated with latest libraries and LLMs
-
-### Key Principles
-
-1. **Images in GCS, Code in GitHub**: Plot PNGs stored in Google Cloud Storage, source code version-controlled
-2. **Multi-Version Support**: All plots tested across Python 3.11+ (3.11, 3.12, 3.13, 3.13 primary)
-3. **Hybrid Automation**: AI handles routine tasks, humans approve critical decisions
-4. **Standard Datasets**: Use well-known datasets (pandas iris, seaborn tips, kaggle) for realistic previews
-5. **Event-Based Optimization**: Update plots when LLM/library versions change, not on fixed schedules
-
----
-
-## System Architecture
-
-```mermaid
-graph TB
-    subgraph "Discovery & Input"
-        SM[Social Media<br/>Twitter, Reddit, GitHub, ArXiv]
-        GI[GitHub Issues<br/>Community Ideas]
-    end
-
-    subgraph "Orchestration Layer"
-        N8N[n8n Cloud<br/>Workflow Engine]
-    end
-
-    subgraph "AI Processing"
-        CCM[Claude Code Max<br/>Primary AI]
-        VTX[Vertex AI<br/>Multi-LLM Critical Decisions]
-    end
-
-    subgraph "Testing & Generation"
-        GHA[GitHub Actions<br/>Multi-Version Tests + Preview Gen]
-        DS[Standard Datasets<br/>pandas, seaborn, kaggle]
-    end
-
-    subgraph "Storage & Deployment"
-        GH[GitHub Repository<br/>Code Storage]
-        GCS[Google Cloud Storage<br/>Image Hosting]
-        SQL[Cloud SQL<br/>Metadata]
-        CR[Cloud Run<br/>Web Platform]
-    end
-
-    SM --> N8N
-    GI --> N8N
-    N8N --> CCM
-    CCM --> GHA
-    GHA --> DS
-    GHA --> GCS
-    GHA --> GH
-    CCM --> VTX
-    VTX --> SQL
-    GCS --> CR
-    SQL --> CR
-    GH --> CR
-```
-
-### Component Responsibilities
-
-| Component | Purpose | Usage Notes |
-|-----------|---------|-------------|
-| **GitHub Actions** | Code generation, testing, preview gen, quality checks, deployment | See `.github/workflows/` for implementation |
-| **n8n Cloud Pro** | Social media monitoring, posting, issue triage, maintenance scheduling | External service integration |
-| **Claude Code Max** | Code generation, routine evaluation, post content | Primary AI workload |
-| **Vertex AI (Multi-LLM)** | Critical quality decisions | Multi-LLM consensus for complex plots |
-| **Google Cloud Storage** | PNG hosting with lifecycle management | Preview images + generated plots |
-| **Cloud SQL (PostgreSQL)** | Metadata, tags, quality scores, promotion queue | All structured data |
-| **X (Twitter) API** | Social media posting | Max 2 posts/day |
-
-**Workflow files**: See `.github/workflows/` for all automation implementations (ci-*, bot-*, gen-*, util-*).
-
----
-
-## Core Automation Flows
-
-### Flow 1: Discovery & Ideation
-n8n monitors social media daily → AI extracts plot ideas → Creates GitHub issues with draft specs → Human reviews and approves
-
-### Flow 2: Specification Creation (with Approval Gate)
-
-User adds `spec-request` label to issue → **`spec-create.yml`** runs:
-
-1. Creates branch: `specification/{specification-id}`
-2. Claude generates: `plots/{specification-id}/specification.md` + `specification.yaml`
-3. Creates PR: `specification/{specification-id}` → `main`
-4. Posts analysis comment, waits for approval
-
-```
-Issue + [spec-request] label
-       ↓
-spec-create.yml
-  ├─ Creates branch: specification/scatter-basic
-  ├─ Creates: plots/scatter-basic/specification.md
-  ├─ Creates: plots/scatter-basic/specification.yaml
-  └─ Creates PR → main (waits for approval)
-       ↓
-Maintainer adds [approved] label
-       ↓
-spec-create.yml (merge job)
-  ├─ Merges PR to main
-  ├─ Adds [spec-ready] label
-  └─ sync-postgres.yml triggers
-```
-
-**Specification is now in main, ready for implementations.**
-
-### Flow 3: Implementation Generation
-
-Implementations run **independently** - each library gets its own workflow:
-
-**Triggers:**
-- `generate:{library}` label on issue (e.g., `generate:matplotlib`)
-- `workflow_dispatch` for manual triggering
-- `bulk-generate.yml` for batch operations
-
-**Process per library:**
-1. **`impl-generate.yml`** creates branch: `implementation/{specification-id}/{library}`
-2. Claude generates code, tests, uploads preview to GCS staging
-3. Creates PR: `implementation/{specification-id}/{library}` → `main`
-4. Triggers `impl-review.yml`
-
-```
-Issue + [generate:matplotlib] label  OR  workflow_dispatch
-       ↓
-impl-generate.yml
-  ├─ Creates branch: implementation/scatter-basic/matplotlib
-  ├─ Generates: plots/scatter-basic/implementations/matplotlib.py
-  ├─ Uploads to GCS staging
-  └─ Creates PR → main, triggers impl-review.yml
-```
-
-**Key benefit**: Each library runs independently - no single point of failure!
-
-### Flow 4: Multi-Version Testing
-PR created → `ci-plottest.yml` runs tests across Python 3.11+ → Reports results
-
-### Flow 5: AI Review
-PR created → **`impl-review.yml`** runs:
-
-1. Downloads plot images from GCS staging
-2. Claude evaluates: Spec ↔ Code ↔ Preview
-3. Posts review comment with score
-4. Adds labels: `quality:XX`, `ai-approved` OR `ai-rejected`
-
-```
-impl-review.yml
-  ├─ Score ≥90 → [ai-approved] → triggers impl-merge.yml
-  └─ Score <90 → [ai-rejected] → triggers impl-repair.yml
-```
-
-### Flow 6: Repair Loop (max 3 attempts)
-PR labeled `ai-rejected` → **`impl-repair.yml`** triggers:
-
-1. Reads AI feedback from PR comments
-2. Claude fixes the implementation
-3. Re-uploads to GCS staging
-4. Re-triggers `impl-review.yml`
-5. After 3 failures: `not-feasible` label
-
-**Note**: Each library repairs independently - matplotlib can be on attempt 3 while seaborn already merged!
-
-### Flow 7: Auto-Merge
-
-PR labeled `ai-approved` → **`impl-merge.yml`** triggers:
-
-1. Squash-merges PR to main
-2. Creates `metadata/{library}.yaml` with quality score and generation info
-3. Promotes GCS images: staging → production
-4. Updates issue labels: `impl:{library}:done`
-5. `sync-postgres.yml` triggers automatically
-
-```
-impl-merge.yml
-  ├─ Squash merge PR → main
-  ├─ Creates: plots/scatter-basic/metadata/matplotlib.yaml
-  ├─ Promotes GCS: staging → production
-  └─ sync-postgres.yml triggers (database updated)
-```
-
-### Flow 8: Deployment & Maintenance
-Merged to main → Deploy to Cloud Run → Publicly visible on website → Event-based maintenance (LLM/library updates) → A/B test improvements
-
-### Flow 9: Social Media Promotion
-Deployed plot → Added to promotion queue (prioritized by quality score) → n8n posts 2x/day at 10 AM & 3 PM CET → Claude generates content → Posts to X with preview image
-
----
-
-## Decoupled Architecture
-
-The new architecture separates specification and implementation processes:
-
-**Benefits:**
-- **No single point of failure** - Each library runs independently
-- **Specifications can land in main without implementations**
-- **Partial implementations OK** - 6/9 done = fine
-- **No merge conflicts** - Per-library metadata files
-- **Flexible triggers** - Labels for single, dispatch for bulk
-- **PostgreSQL synced on every merge to main**
-
-### Implementation Lifecycle
-
-```mermaid
-graph LR
-    A[Issue + generate:matplotlib] --> B[impl-generate.yml]
-    B --> C[PR created]
-    C --> D[impl-review.yml]
-    D -->|Score ≥90| E[ai-approved]
-    D -->|Score <90| F[ai-rejected]
-    F -->|Attempt <3| G[impl-repair.yml]
-    G --> D
-    F -->|Attempt =3| H[not-feasible]
-    E --> I[impl-merge.yml]
-    I --> J[merged to main]
-    J --> K[impl:matplotlib:done]
-```
-
-### Label System
-
-**Specification Labels:**
-| Label | Meaning |
-|-------|---------|
-| `spec-request` | New specification request |
-| `spec-update` | Update existing specification |
-| `spec-ready` | Specification merged to main |
-
-**Implementation Labels:**
-| Label | Meaning |
-|-------|---------|
-| `generate:{library}` | Trigger generation (e.g., `generate:matplotlib`) |
-| `impl:{library}:pending` | Generation in progress |
-| `impl:{library}:done` | Implementation merged to main |
-| `impl:{library}:failed` | Max retries exhausted |
-
-**PR Labels:**
-| Label | Meaning |
-|-------|---------|
-| `ai-approved` | Passed review (score ≥90, or ≥50 after 3 attempts) |
-| `ai-rejected` | Failed review (score <90), triggers repair loop |
-| `ai-attempt-1/2/3` | Retry counter |
-| `quality-poor` | Score <50, needs fundamental fixes |
-| `quality:XX` | Quality score (e.g., `quality:92`) |
-
-**Quality Workflow:**
-- **≥ 90**: ai-approved, merged immediately
-- **< 90**: ai-rejected, repair loop (up to 3 attempts)
-- **After 3 attempts**: ≥ 50 → merge, < 50 → close PR and regenerate
-
-### Bulk Operations
-
-Use `bulk-generate.yml` for batch operations:
-
-```bash
-# All libraries for one spec:
-workflow_dispatch: specification_id=scatter-basic, library=all
-
-# One library across all specs:
-workflow_dispatch: specification_id=all, library=matplotlib
-```
-
-**Concurrency**: Max 3 parallel implementation workflows globally.
-
----
-
-## Flow Integration
-
-```mermaid
-graph TD
-    A[Flow 1: Discovery] -->|GitHub Issue| B{Human Review}
-    B -->|Add spec-request| C[Flow 2: spec-create.yml]
-    B -->|Rejected| Z[End]
-
-    C -->|Creates PR| C1[Specification PR]
-    C1 -->|Maintainer adds approved| C2[Merge to main]
-    C2 -->|spec-ready label| D[Ready for implementations]
-
-    D -->|Add generate:lib label| E[Flow 3: impl-generate.yml]
-    E -->|Creates PR| F[Implementation PR]
-    F --> G{Flow 5: impl-review.yml}
-
-    G -->|Score ≥90| H[ai-approved]
-    G -->|Score <90| I[ai-rejected]
-
-    I --> J{Attempts < 3?}
-    J -->|Yes| K[Flow 6: impl-repair.yml]
-    K --> G
-    J -->|No| L[not-feasible]
-
-    H --> M[Flow 7: impl-merge.yml]
-    M -->|Merge to main| M1[Creates metadata/lib.yaml]
-    M1 --> M2[sync-postgres.yml]
-    M2 --> N[🌐 Publicly Visible]
-
-    L --> L1[impl:lib:failed label]
-
-    N --> P[Flow 9: Promotion Queue]
-    P --> Q{Daily Limit?}
-    Q -->|< 2 posts| R[Post to X]
-    Q -->|Limit| S[Wait]
-    R --> Z
-    S -.->|Next day| Q
-
-    style A fill:#e1f5ff
-    style C fill:#ffe4b5
-    style E fill:#fff4e1
-    style G fill:#f0e1ff
-    style M fill:#98FB98
-    style N fill:#90EE90
-    style L fill:#FF6B6B
-    style P fill:#E6E6FA
-    style R fill:#98FB98
-```
-
----
-
-## Decision Framework
-
-### AI Decides Automatically
-
-✅ **Similar plots** (high semantic similarity to existing specs)
-✅ **Routine quality checks** (standard visualizations)
-✅ **Tag generation** (categorization and clustering)
-✅ **Version compatibility** detection (which Python versions supported)
-✅ **Standard optimizations** (code formatting, best practices)
-
-### Human Approval Required
-
-⚠️ **New plot types** (low similarity to existing specs)
-⚠️ **Complex visualizations** (3D, animations, interactive)
-⚠️ **Multi-LLM disagreement** (no majority consensus)
-⚠️ **Breaking changes** (major spec modifications)
-
-### Approval Mechanism
-
-Via **GitHub Issue Labels**:
-- `approved` → Proceed to code generation
-- `rejected` → Close issue
-- `needs-revision` → Request changes from proposer
-
----
-
-## Resource Management
-
-### Leveraging Existing Subscriptions
-
-| Resource | Subscription | Usage | Monthly Cost |
-|----------|-------------|-------|--------------|
-| **GitHub Pro** | ✅ Active | Actions (testing + preview gen) | Included |
-| **n8n Cloud Pro** | ✅ Active | Workflow orchestration | Included (subscribed) |
-| **Claude Code Max** | ✅ Active | Primary AI workload | Included |
-| **Google Cloud** | Pay-as-you-go | GCS, Cloud SQL, Cloud Run | Variable |
-| **Vertex AI** | Pay-per-use | Multi-LLM critical decisions only | Minimal |
-
-### Cost Optimization Strategies
-
-1. **Smart AI Usage**:
-   - Claude Code Max for routine work (already subscribed)
-   - Vertex AI multi-LLM only for critical decisions
-   - Avoid redundant evaluations
-
-2. **Efficient Storage**:
-   - Path structure: `plots/{spec-id}/{library}/plot.png`
-   - Thumbnails: `plot_thumb.png` (600px width) for gallery views
-   - Images never in git repository
-   - Only latest version stored (no version history)
-
-3. **Smart Scheduling**:
-   - Event-based maintenance (not daily scheduled)
-   - Batch processing when possible
-   - GitHub Actions matrix for parallel testing
-
-4. **Data Efficiency**:
-   - Standard datasets (no AI generation needed)
-   - Small CSVs in repo acceptable
-   - Reuse datasets across similar plots
-
----
-
-## Data & Testing Strategy
-
-### Sample Data for Previews
-
-**Critical Principle**: All plot code must be **100% standalone and deterministic**
-
-**Data Embedding Strategy**:
-
-1. **Small datasets** - Hardcoded dict/list directly in code (recommended)
-2. **Standard datasets** - Use `sns.load_dataset('iris')` or similar (always produces same data)
-3. **AI-generated data** - AI generates once with fixed seed, then hardcoded
-4. **Seeded random** - Use `np.random.seed(42)` for reproducibility
-
-**Why This Matters**:
-- Same code must produce same image every single time
-- Quality reviewers must see the exact image that will be deployed
-- Users must see the exact image shown in previews
-- No surprises, no randomness, complete reproducibility
-
-**Code Requirements**:
-- ✅ Self-contained (no external file loading)
-- ✅ Deterministic (same output every run)
-- ✅ Includes explanation text as docstring
-- ✅ Sample data embedded directly in code
-- ❌ No CSV file loading
-- ❌ No random data without fixed seed
-- ❌ No external API calls
-
-### Multi-Version Testing
-
-**Python Versions Supported**: 3.11+ (tested on 3.11, 3.12, 3.13, 3.13)
-
-**Primary Version**: Python 3.13 (required to pass, generates plot images)
-
-**Testing Infrastructure**: GitHub Actions matrix tests all Python versions in parallel. See `ci-plottest.yml`.
-
-**Test Triggers**:
-- On Pull Request creation
-- Before Quality Assurance flow
-- Not on every commit (saves resources)
-
-**Version Compatibility Documentation**:
-- Code optimized for Python 3.13 (newest)
-- Older versions (3.11-3.13) run as compatibility tests
-- Failures in older versions don't block the PR
-
-**Test Requirements**:
-- Python 3.13 tests must pass (primary)
-- Plot images only generated with Python 3.13
-- Older version failures logged but don't block merge
-
----
-
-## Phased Rollout
-
-### Phase 1: MVP (Current Focus)
-
-**Scope**:
-- 🎯 **Monitoring**: Twitter only
-- 📊 **Libraries**: All 8 supported (matplotlib, seaborn, plotly, bokeh, altair, plotnine, pygal, highcharts)
-- 🐍 **Python**: 3.13+ (primary), tested on 3.11-3.13
-- ✋ **Approval**: Manual for all new plots
-- ✅ **Quality**: Basic Claude evaluation
-- 📱 **Promotion**: X (Twitter) posting with 2/day limit
-
-**Supported Libraries**:
-| Library | Strength |
-|---------|----------|
-| matplotlib | The classic standard, maximum flexibility |
-| seaborn | Statistical visualizations, beautiful defaults |
-| plotly | Interactive web plots, dashboards, 3D |
-| bokeh | Interactive, streaming data, large datasets |
-| altair | Declarative/Vega-Lite, elegant exploration |
-| plotnine | ggplot2 syntax for R users |
-| pygal | Minimalistic SVG charts |
-| highcharts | Interactive web charts, stock charts |
-| lets-plot | ggplot2 grammar of graphics by JetBrains |
-
-**Goal**: Prove automation pipeline works end-to-end with all libraries
-
----
-
-### Phase 2: Expansion
-
-**Add**:
-- 🎯 **Monitoring**: + Reddit (r/dataisbeautiful, r/Python)
-- 🎯 **Monitoring**: + GitHub Trending/Discussions
-- 🤖 **Approval**: Hybrid (auto for similar, manual for new)
-- ✅ **Quality**: Multi-LLM for critical decisions
-- 📱 **Promotion**: + LinkedIn posts for professional audience
-
-**Goal**: Scale content production and improve automation
-
----
-
-### Phase 3: Full Automation
-
-**Add**:
-- 🎯 **Monitoring**: + ArXiv papers (academic visualizations)
-- 📊 **Libraries**: + specialized libraries as needed
-- 🤖 **Approval**: Intelligent auto-approval (high confidence)
-- 🔄 **Maintenance**: Proactive optimization suggestions
-- 🌐 **Community**: Public spec submissions via issues
-- 📱 **Promotion**: + Reddit posts (r/dataisbeautiful, r/Python), cross-platform coordination
-
-**Goal**: Comprehensive, self-maintaining plot library
-
----
-
-## Rule Versioning & Testing
-
-**NEW**: The system now includes versioned rules for code generation and quality evaluation.
-
-**Location**: `rules/` directory
-
-**Key Features**:
-- 📋 **Versioned Rules**: Generation rules and quality criteria stored as Markdown (vX.Y.Z)
-- 🧪 **A/B Testing**: Compare rule versions before deploying
-- 📊 **Audit Trail**: Know which rule version generated each plot
-- 🔄 **Rollback**: Instant rollback to previous rules if issues arise
-- 📈 **Scientific Improvement**: Prove new rules are better with data
-
-**Current Status** (Documentation Phase):
-- ✅ Rule templates created (rules/templates/)
-- ✅ Initial draft rules (rules/generation/v1.0.0-draft/)
-- ⏳ Automation not yet implemented
-- ⏳ A/B testing framework planned
-
-**Integration with Workflow**:
-- When automation is implemented, all code generation will use specific rule versions
-- Quality evaluation will reference versioned criteria
-- Rule improvements will be A/B tested before deployment
-
-**See Also**:
-- [A/B Testing Strategies](docs/concepts/ab-testing-rules.md)
-- [Claude Skill Concept](docs/concepts/claude-skill-plot-generation.md)
-
----
-
-## Summary
-
-This workflow ensures:
-
-✅ **Decoupled Architecture**:
-   - Specification and implementation processes run independently
-   - No single point of failure
-   - Specifications can land in main without implementations
-   - Partial implementations OK (6/9 done = fine)
-   - Per-library metadata files (no merge conflicts!)
-
-✅ **Flexible Triggers**:
-   - Labels (`generate:matplotlib`) for single implementations
-   - `workflow_dispatch` for manual control
-   - `bulk-generate.yml` for batch operations
-   - Max 3 parallel implementations globally
-
-✅ **Multi-Layer Quality Control**:
-   - AI review with vision (code + image evaluation)
-   - Self-repair loop (max 3 attempts per library)
-   - Quality scores tracked in metadata
-   - Feedback-driven optimization on rejection
-
-✅ **PostgreSQL Synced on Every Merge**:
-   - `sync-postgres.yml` triggers on push to main
-   - Database always reflects repository state
-
-✅ **Only High-Quality Plots on Website**: Failed attempts never publicly visible
-✅ **Automated Marketing**: Queue-based social media promotion with smart rate limiting (max 2 posts/day)
-✅ **Cost-Conscious** design leveraging existing subscriptions
-✅ **Smart Storage** with GCS staging/production flow
-✅ **Deterministic & Reproducible**: Same code = same image every time
-✅ **Community-Driven** with AI curation and human oversight
-
-The system is designed to **scale from MVP to full automation** while maintaining the highest quality standards, controlling costs, and automatically promoting the best content to the community.
diff --git a/docs/workflows/overview.md b/docs/workflows/overview.md
new file mode 100644
index 0000000000..517c78b9ed
--- /dev/null
+++ b/docs/workflows/overview.md
@@ -0,0 +1,150 @@
+# Workflow Overview
+
+## How pyplots Automation Works
+
+pyplots uses GitHub Actions to automate the entire plot lifecycle, from specification creation through implementation generation and quality review to deployment.
+
+---
+
+## The Two Main Pipelines
+
+### 1. Specification Pipeline
+
+```
+Issue + [spec-request] label
+       |
+       v
+spec-create.yml
+  |-- Creates branch: specification/{spec-id}
+  |-- Generates: specification.md + specification.yaml
+  |-- Creates PR --> main
+  |-- Posts analysis comment
+       |
+       v (maintainer adds [approved] label to Issue)
+       |
+spec-create.yml (merge job)
+  |-- Merges PR to main
+  |-- Adds [spec-ready] label
+  |-- Triggers sync-postgres.yml
+```
+
+### 2. Implementation Pipeline
+
+```
+Issue + [generate:{library}] label  OR  workflow_dispatch
+       |
+       v
+impl-generate.yml
+  |-- Creates branch: implementation/{spec-id}/{library}
+  |-- AI generates code
+  |-- Creates metadata/{library}.yaml (initial)
+  |-- Tests execution
+  |-- Uploads preview to GCS staging
+  |-- Creates PR --> main
+       |
+       v
+impl-review.yml
+  |-- AI evaluates code + image
+  |-- Posts review comment with score
+  |-- Updates metadata/{library}.yaml (quality_score, review feedback)
+  |-- Adds [quality:XX] label
+       |
+       |-- Score >= 90 --> [ai-approved] --> impl-merge.yml
+       |                                        |-- Squash merge
+       |                                        |-- Promotes GCS: staging --> production
+       |                                        |-- Triggers sync-postgres.yml
+       |
+       |-- Score < 90 --> [ai-rejected] --> impl-repair.yml (max 3 attempts)
+                                               |-- Reads AI feedback
+                                               |-- Fixes implementation
+                                               |-- Re-triggers impl-review.yml
+```
+
+---
+
+## Label System
+
+### Specification Labels (on Issues)
+
+| Label | Meaning | Set By |
+|-------|---------|--------|
+| `spec-request` | New specification request | User |
+| `spec-update` | Update existing specification | User |
+| `spec-ready` | Specification merged, ready for implementations | Workflow |
+
+### Implementation Labels (on Issues)
+
+| Label | Meaning | Set By |
+|-------|---------|--------|
+| `generate:{library}` | Trigger generation for library | User |
+| `impl:{library}:pending` | Generation in progress | Workflow |
+| `impl:{library}:done` | Implementation merged to main | Workflow |
+| `impl:{library}:failed` | Max retries exhausted | Workflow |
+
+### PR Labels (on Pull Requests)
+
+| Label | Meaning | Set By |
+|-------|---------|--------|
+| `ai-approved` | Quality check passed (score >= 90, or >= 50 after 3 attempts) | Workflow |
+| `ai-rejected` | Quality check failed (score < 90), triggers repair loop | Workflow |
+| `ai-attempt-1/2/3` | Retry counter | Workflow |
+| `quality:XX` | Quality score (e.g., quality:92) | Workflow |
+| `quality-poor` | Score < 50, needs fundamental fixes | Workflow |
+
+### Approval Labels
+
+| Label | Meaning | Set By |
+|-------|---------|--------|
+| `approved` | Human approved specification | Maintainer |
+| `rejected` | Human rejected specification | Maintainer |
+
+---
+
+## Quality Workflow
+
+- **Score >= 90**: Immediately approved and merged
+- **Score < 90**: Repair loop (up to 3 attempts)
+- **After 3 attempts**:
+  - Score >= 50: Merge anyway
+  - Score < 50: Close PR, mark as failed
+
+---
+
+## Key Principles
+
+1. **Decoupled**: Each library runs independently (no single point of failure)
+2. **Partial OK**: A specification does not need every library implemented (e.g., 6 of 9 done is fine)
+3. **No merge conflicts**: Per-library metadata files
+4. **Auto-sync**: Database updated on every merge to main
+5. **GCS flow**: Images are promoted from staging to production only after merge
+
+---
+
+## Workflow Files
+
+Located in `.github/workflows/`:
+
+| Workflow | Purpose |
+|----------|---------|
+| `spec-create.yml` | Creates new specifications |
+| `spec-update.yml` | Updates existing specifications |
+| `impl-generate.yml` | Generates a single implementation |
+| `impl-review.yml` | AI quality review |
+| `impl-repair.yml` | Fixes rejected implementations |
+| `impl-merge.yml` | Merges approved PRs |
+| `bulk-generate.yml` | Batch implementation generation |
+| `sync-postgres.yml` | Syncs plots/ to database |
+
+---
+
+## Bulk Operations
+
+```bash
+# All libraries for one spec:
+gh workflow run bulk-generate.yml -f specification_id=scatter-basic -f library=all
+
+# One library across all specs:
+gh workflow run bulk-generate.yml -f specification_id=all -f library=matplotlib
+```
+
+**Concurrency limit**: Max 3 parallel implementations globally.
diff --git a/plots/alluvial-basic/specification.yaml b/plots/alluvial-basic/specification.yaml
index 8432b133ed..2010e2bfde 100644
--- a/plots/alluvial-basic/specification.yaml
+++ b/plots/alluvial-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 1878
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - alluvial
diff --git a/plots/andrews-curves/specification.yaml b/plots/andrews-curves/specification.yaml
index 39da086d23..3dac679b76 100644
--- a/plots/andrews-curves/specification.yaml
+++ b/plots/andrews-curves/specification.yaml
@@ -11,7 +11,7 @@ issue: 2859
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/area-stacked/specification.yaml b/plots/area-stacked/specification.yaml
index f07236c4f5..ee4bef11dc 100644
--- a/plots/area-stacked/specification.yaml
+++ b/plots/area-stacked/specification.yaml
@@ -11,7 +11,7 @@ issue: 2022
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - area
diff --git a/plots/bar-3d/specification.yaml b/plots/bar-3d/specification.yaml
index 8cbc3038d0..ff1844d7b1 100644
--- a/plots/bar-3d/specification.yaml
+++ b/plots/bar-3d/specification.yaml
@@ -11,7 +11,7 @@ issue: 2857
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-diverging/specification.yaml b/plots/bar-diverging/specification.yaml
index 3c7b074c93..f0174fcc2a 100644
--- a/plots/bar-diverging/specification.yaml
+++ b/plots/bar-diverging/specification.yaml
@@ -11,7 +11,7 @@ issue: 2009
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-error/specification.yaml b/plots/bar-error/specification.yaml
index d226e7a7be..98a3175409 100644
--- a/plots/bar-error/specification.yaml
+++ b/plots/bar-error/specification.yaml
@@ -11,7 +11,7 @@ issue: 2376
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-feature-importance/specification.yaml b/plots/bar-feature-importance/specification.yaml
index 93261be854..7a41fd9c83 100644
--- a/plots/bar-feature-importance/specification.yaml
+++ b/plots/bar-feature-importance/specification.yaml
@@ -11,7 +11,7 @@ issue: 2276
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-grouped/specification.yaml b/plots/bar-grouped/specification.yaml
index 31e655999a..ef8eea729a 100644
--- a/plots/bar-grouped/specification.yaml
+++ b/plots/bar-grouped/specification.yaml
@@ -11,7 +11,7 @@ issue: 1822
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-horizontal/specification.yaml b/plots/bar-horizontal/specification.yaml
index d9c0ad9a90..7abbffe0c5 100644
--- a/plots/bar-horizontal/specification.yaml
+++ b/plots/bar-horizontal/specification.yaml
@@ -11,7 +11,7 @@ issue: 1946
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-permutation-importance/specification.yaml b/plots/bar-permutation-importance/specification.yaml
index 5ba10dc348..7de1b64081 100644
--- a/plots/bar-permutation-importance/specification.yaml
+++ b/plots/bar-permutation-importance/specification.yaml
@@ -11,7 +11,7 @@ issue: 2998
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-stacked-percent/specification.yaml b/plots/bar-stacked-percent/specification.yaml
index b70fb9650c..24506f6a99 100644
--- a/plots/bar-stacked-percent/specification.yaml
+++ b/plots/bar-stacked-percent/specification.yaml
@@ -11,7 +11,7 @@ issue: 2008
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bar-stacked/specification.yaml b/plots/bar-stacked/specification.yaml
index 0894439dd5..a9c6013ee3 100644
--- a/plots/bar-stacked/specification.yaml
+++ b/plots/bar-stacked/specification.yaml
@@ -11,7 +11,7 @@ issue: 1947
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/bland-altman-basic/specification.yaml b/plots/bland-altman-basic/specification.yaml
index 54e13cf88f..4f0b72b92a 100644
--- a/plots/bland-altman-basic/specification.yaml
+++ b/plots/bland-altman-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2032
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bland-altman
diff --git a/plots/box-grouped/specification.yaml b/plots/box-grouped/specification.yaml
index 69bbefd815..1efde8c473 100644
--- a/plots/box-grouped/specification.yaml
+++ b/plots/box-grouped/specification.yaml
@@ -11,7 +11,7 @@ issue: 2017
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - box
diff --git a/plots/box-notched/specification.yaml b/plots/box-notched/specification.yaml
index c52a757a84..a69e48ce50 100644
--- a/plots/box-notched/specification.yaml
+++ b/plots/box-notched/specification.yaml
@@ -11,7 +11,7 @@ issue: 2019
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - box
diff --git a/plots/calibration-curve/specification.yaml b/plots/calibration-curve/specification.yaml
index bc6b0cb4d5..2494e72d0e 100644
--- a/plots/calibration-curve/specification.yaml
+++ b/plots/calibration-curve/specification.yaml
@@ -11,7 +11,7 @@ issue: 2331
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - calibration
diff --git a/plots/candlestick-volume/specification.yaml b/plots/candlestick-volume/specification.yaml
index d7a02a0bb5..d438372aad 100644
--- a/plots/candlestick-volume/specification.yaml
+++ b/plots/candlestick-volume/specification.yaml
@@ -11,7 +11,7 @@ issue: 3068
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - candlestick
diff --git a/plots/chernoff-basic/specification.yaml b/plots/chernoff-basic/specification.yaml
index 354b40b0f1..b956b165bd 100644
--- a/plots/chernoff-basic/specification.yaml
+++ b/plots/chernoff-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 3003
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - chernoff
diff --git a/plots/choropleth-basic/specification.yaml b/plots/choropleth-basic/specification.yaml
index 4af133ecd8..208fa1afbe 100644
--- a/plots/choropleth-basic/specification.yaml
+++ b/plots/choropleth-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 3069
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - choropleth
diff --git a/plots/circlepacking-basic/specification.yaml b/plots/circlepacking-basic/specification.yaml
index 8df1a1a1a7..c3903cf02c 100644
--- a/plots/circlepacking-basic/specification.yaml
+++ b/plots/circlepacking-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2498
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - circle-packing
diff --git a/plots/circos-basic/specification.yaml b/plots/circos-basic/specification.yaml
index 20371ed253..37e35ccdd5 100644
--- a/plots/circos-basic/specification.yaml
+++ b/plots/circos-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 3005
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - circos
diff --git a/plots/confusion-matrix/specification.yaml b/plots/confusion-matrix/specification.yaml
index fdaad95a8b..11d03de4e9 100644
--- a/plots/confusion-matrix/specification.yaml
+++ b/plots/confusion-matrix/specification.yaml
@@ -11,7 +11,7 @@ issue: 2272
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - heatmap
diff --git a/plots/contour-decision-boundary/specification.yaml b/plots/contour-decision-boundary/specification.yaml
index 7e6bfa7cb2..243c4de52a 100644
--- a/plots/contour-decision-boundary/specification.yaml
+++ b/plots/contour-decision-boundary/specification.yaml
@@ -11,7 +11,7 @@ issue: 2921
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - contour
diff --git a/plots/contour-filled/specification.yaml b/plots/contour-filled/specification.yaml
index c56a94c08f..9a1120637d 100644
--- a/plots/contour-filled/specification.yaml
+++ b/plots/contour-filled/specification.yaml
@@ -11,7 +11,7 @@ issue: 2500
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - contour
diff --git a/plots/count-basic/specification.yaml b/plots/count-basic/specification.yaml
index 7a932da34c..77081e1c5d 100644
--- a/plots/count-basic/specification.yaml
+++ b/plots/count-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2033
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - bar
diff --git a/plots/donut-nested/specification.yaml b/plots/donut-nested/specification.yaml
index 821decb846..8b8e2053b1 100644
--- a/plots/donut-nested/specification.yaml
+++ b/plots/donut-nested/specification.yaml
@@ -11,7 +11,7 @@ issue: 2015
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - donut
diff --git a/plots/elbow-curve/specification.yaml b/plots/elbow-curve/specification.yaml
index 7b98df8d41..1a26e43bc4 100644
--- a/plots/elbow-curve/specification.yaml
+++ b/plots/elbow-curve/specification.yaml
@@ -11,7 +11,7 @@ issue: 2333
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/errorbar-asymmetric/specification.yaml b/plots/errorbar-asymmetric/specification.yaml
index 6899cf39da..123c23e78b 100644
--- a/plots/errorbar-asymmetric/specification.yaml
+++ b/plots/errorbar-asymmetric/specification.yaml
@@ -11,7 +11,7 @@ issue: 2781
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - errorbar
diff --git a/plots/forest-basic/specification.yaml b/plots/forest-basic/specification.yaml
index 61028c60b5..a8e770d22a 100644
--- a/plots/forest-basic/specification.yaml
+++ b/plots/forest-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2378
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - forest
diff --git a/plots/gain-curve/specification.yaml b/plots/gain-curve/specification.yaml
index 9b7d1c0de0..3c245a049a 100644
--- a/plots/gain-curve/specification.yaml
+++ b/plots/gain-curve/specification.yaml
@@ -11,7 +11,7 @@ issue: 2440
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/gantt-basic/specification.yaml b/plots/gantt-basic/specification.yaml
index 64f4a2c86b..6b3e169e67 100644
--- a/plots/gantt-basic/specification.yaml
+++ b/plots/gantt-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2377
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - gantt
diff --git a/plots/heatmap-annotated/specification.yaml b/plots/heatmap-annotated/specification.yaml
index 86420f44d4..a5969451e8 100644
--- a/plots/heatmap-annotated/specification.yaml
+++ b/plots/heatmap-annotated/specification.yaml
@@ -11,7 +11,7 @@ issue: 1824
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - heatmap
diff --git a/plots/heatmap-clustered/specification.yaml b/plots/heatmap-clustered/specification.yaml
index fe59a092dc..ffe0d5dff2 100644
--- a/plots/heatmap-clustered/specification.yaml
+++ b/plots/heatmap-clustered/specification.yaml
@@ -11,7 +11,7 @@ issue: 2021
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - heatmap
diff --git a/plots/heatmap-correlation/specification.yaml b/plots/heatmap-correlation/specification.yaml
index f34eea4b54..2624442031 100644
--- a/plots/heatmap-correlation/specification.yaml
+++ b/plots/heatmap-correlation/specification.yaml
@@ -11,7 +11,7 @@ issue: 1948
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - heatmap
diff --git a/plots/histogram-2d/specification.yaml b/plots/histogram-2d/specification.yaml
index 98d4d49f52..122c0081f7 100644
--- a/plots/histogram-2d/specification.yaml
+++ b/plots/histogram-2d/specification.yaml
@@ -11,7 +11,7 @@ issue: 2012
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - histogram
diff --git a/plots/histogram-density/specification.yaml b/plots/histogram-density/specification.yaml
index 541849fca4..130805b58a 100644
--- a/plots/histogram-density/specification.yaml
+++ b/plots/histogram-density/specification.yaml
@@ -11,7 +11,7 @@ issue: 2442
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - histogram
diff --git a/plots/histogram-kde/specification.yaml b/plots/histogram-kde/specification.yaml
index bd8b10902a..5466c3431f 100644
--- a/plots/histogram-kde/specification.yaml
+++ b/plots/histogram-kde/specification.yaml
@@ -11,7 +11,7 @@ issue: 1823
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - histogram
diff --git a/plots/histogram-overlapping/specification.yaml b/plots/histogram-overlapping/specification.yaml
index 5cf51fa52a..fe010dc930 100644
--- a/plots/histogram-overlapping/specification.yaml
+++ b/plots/histogram-overlapping/specification.yaml
@@ -11,7 +11,7 @@ issue: 2010
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - histogram
diff --git a/plots/hive-basic/specification.yaml b/plots/hive-basic/specification.yaml
index 13e66fea2b..0374419016 100644
--- a/plots/hive-basic/specification.yaml
+++ b/plots/hive-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 1879
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - hive
diff --git a/plots/horizon-basic/specification.yaml b/plots/horizon-basic/specification.yaml
index 9c733faba5..980976bee1 100644
--- a/plots/horizon-basic/specification.yaml
+++ b/plots/horizon-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 1877
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - horizon
diff --git a/plots/learning-curve-basic/specification.yaml b/plots/learning-curve-basic/specification.yaml
index 367cc8f7bf..8a0c3cf6c3 100644
--- a/plots/learning-curve-basic/specification.yaml
+++ b/plots/learning-curve-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2275
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/lift-curve/specification.yaml b/plots/lift-curve/specification.yaml
index 83d97fe775..24c2c5ba9e 100644
--- a/plots/lift-curve/specification.yaml
+++ b/plots/lift-curve/specification.yaml
@@ -11,7 +11,7 @@ issue: 2379
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - lift
diff --git a/plots/line-annotated-events/specification.yaml b/plots/line-annotated-events/specification.yaml
index 40fa6346f4..09182ff94c 100644
--- a/plots/line-annotated-events/specification.yaml
+++ b/plots/line-annotated-events/specification.yaml
@@ -11,7 +11,7 @@ issue: 2997
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/line-confidence/specification.yaml b/plots/line-confidence/specification.yaml
index f24539483f..3a4def987b 100644
--- a/plots/line-confidence/specification.yaml
+++ b/plots/line-confidence/specification.yaml
@@ -11,7 +11,7 @@ issue: 2007
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/line-interactive/specification.yaml b/plots/line-interactive/specification.yaml
index dbc2ba5366..b96f6e1a8c 100644
--- a/plots/line-interactive/specification.yaml
+++ b/plots/line-interactive/specification.yaml
@@ -11,7 +11,7 @@ issue: 2787
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/line-loss-training/specification.yaml b/plots/line-loss-training/specification.yaml
index 879a360003..4350aa2803 100644
--- a/plots/line-loss-training/specification.yaml
+++ b/plots/line-loss-training/specification.yaml
@@ -11,7 +11,7 @@ issue: 2860
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/line-multi/specification.yaml b/plots/line-multi/specification.yaml
index c6430acb56..aaaf0b78f2 100644
--- a/plots/line-multi/specification.yaml
+++ b/plots/line-multi/specification.yaml
@@ -11,7 +11,7 @@ issue: 1825
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/line-realtime/specification.yaml b/plots/line-realtime/specification.yaml
index dd7459aabf..068a2eb7f4 100644
--- a/plots/line-realtime/specification.yaml
+++ b/plots/line-realtime/specification.yaml
@@ -11,7 +11,7 @@ issue: 3073
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/line-timeseries-rolling/specification.yaml b/plots/line-timeseries-rolling/specification.yaml
index 8bbb1da47a..e87fa74374 100644
--- a/plots/line-timeseries-rolling/specification.yaml
+++ b/plots/line-timeseries-rolling/specification.yaml
@@ -11,7 +11,7 @@ issue: 2786
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/line-timeseries/specification.yaml b/plots/line-timeseries/specification.yaml
index b169884ae0..f13f7c9e29 100644
--- a/plots/line-timeseries/specification.yaml
+++ b/plots/line-timeseries/specification.yaml
@@ -11,7 +11,7 @@ issue: 2006
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/manhattan-gwas/specification.yaml b/plots/manhattan-gwas/specification.yaml
index a9d70eccb0..6cb64b9844 100644
--- a/plots/manhattan-gwas/specification.yaml
+++ b/plots/manhattan-gwas/specification.yaml
@@ -11,7 +11,7 @@ issue: 2925
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/network-directed/specification.yaml b/plots/network-directed/specification.yaml
index 54aeb72888..a1c216e6bb 100644
--- a/plots/network-directed/specification.yaml
+++ b/plots/network-directed/specification.yaml
@@ -11,7 +11,7 @@ issue: 2858
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - network
diff --git a/plots/parallel-categories-basic/specification.yaml b/plots/parallel-categories-basic/specification.yaml
index 4866541453..2083488dee 100644
--- a/plots/parallel-categories-basic/specification.yaml
+++ b/plots/parallel-categories-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2501
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - parallel-categories
diff --git a/plots/parliament-basic/specification.yaml b/plots/parliament-basic/specification.yaml
index a7f74a207a..1f9c5c0064 100644
--- a/plots/parliament-basic/specification.yaml
+++ b/plots/parliament-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2499
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - parliament
diff --git a/plots/pdp-basic/specification.yaml b/plots/pdp-basic/specification.yaml
index f4fc5fb529..4001404c3e 100644
--- a/plots/pdp-basic/specification.yaml
+++ b/plots/pdp-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2922
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/phase-diagram/specification.yaml b/plots/phase-diagram/specification.yaml
index 2c384e51c8..78a557f135 100644
--- a/plots/phase-diagram/specification.yaml
+++ b/plots/phase-diagram/specification.yaml
@@ -11,7 +11,7 @@ issue: 3004
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/pie-drilldown/specification.yaml b/plots/pie-drilldown/specification.yaml
index 83df54ab11..110f65ae41 100644
--- a/plots/pie-drilldown/specification.yaml
+++ b/plots/pie-drilldown/specification.yaml
@@ -11,7 +11,7 @@ issue: 3072
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - pie
diff --git a/plots/pie-exploded/specification.yaml b/plots/pie-exploded/specification.yaml
index b065326ac0..72eb34565f 100644
--- a/plots/pie-exploded/specification.yaml
+++ b/plots/pie-exploded/specification.yaml
@@ -11,7 +11,7 @@ issue: 2013
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - pie
diff --git a/plots/precision-recall/specification.yaml b/plots/precision-recall/specification.yaml
index da7b912afe..d89e161aaf 100644
--- a/plots/precision-recall/specification.yaml
+++ b/plots/precision-recall/specification.yaml
@@ -11,7 +11,7 @@ issue: 2274
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/radar-multi/specification.yaml b/plots/radar-multi/specification.yaml
index dfa7eb6852..fc918f9d0f 100644
--- a/plots/radar-multi/specification.yaml
+++ b/plots/radar-multi/specification.yaml
@@ -11,7 +11,7 @@ issue: 2026
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - radar
diff --git a/plots/raincloud-basic/specification.yaml b/plots/raincloud-basic/specification.yaml
index b3018aae4e..96687986db 100644
--- a/plots/raincloud-basic/specification.yaml
+++ b/plots/raincloud-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 1876
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - raincloud
diff --git a/plots/residual-basic/specification.yaml b/plots/residual-basic/specification.yaml
index de2c70d4f7..8b1035f263 100644
--- a/plots/residual-basic/specification.yaml
+++ b/plots/residual-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2030
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - residual
diff --git a/plots/residual-plot/specification.yaml b/plots/residual-plot/specification.yaml
index 49eadcfe1d..92ec2598d1 100644
--- a/plots/residual-plot/specification.yaml
+++ b/plots/residual-plot/specification.yaml
@@ -11,7 +11,7 @@ issue: 2332
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/roc-curve/specification.yaml b/plots/roc-curve/specification.yaml
index e2c43141d5..8277df6b6b 100644
--- a/plots/roc-curve/specification.yaml
+++ b/plots/roc-curve/specification.yaml
@@ -11,7 +11,7 @@ issue: 2273
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/scatter-animated-controls/specification.yaml b/plots/scatter-animated-controls/specification.yaml
index 4f2c232599..197fc1c71b 100644
--- a/plots/scatter-animated-controls/specification.yaml
+++ b/plots/scatter-animated-controls/specification.yaml
@@ -11,7 +11,7 @@ issue: 3067
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/scatter-annotated/specification.yaml b/plots/scatter-annotated/specification.yaml
index a12273f8f4..5434044c5c 100644
--- a/plots/scatter-annotated/specification.yaml
+++ b/plots/scatter-annotated/specification.yaml
@@ -11,7 +11,7 @@ issue: 2790
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/scatter-color-mapped/specification.yaml b/plots/scatter-color-mapped/specification.yaml
index e3a1fd40d3..593cef7d8b 100644
--- a/plots/scatter-color-mapped/specification.yaml
+++ b/plots/scatter-color-mapped/specification.yaml
@@ -11,7 +11,7 @@ issue: 2004
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/scatter-marginal/specification.yaml b/plots/scatter-marginal/specification.yaml
index 7c17c2bf94..a2fe7fab46 100644
--- a/plots/scatter-marginal/specification.yaml
+++ b/plots/scatter-marginal/specification.yaml
@@ -11,7 +11,7 @@ issue: 2005
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/scatter-matrix/specification.yaml b/plots/scatter-matrix/specification.yaml
index 0579b84ed5..124e239419 100644
--- a/plots/scatter-matrix/specification.yaml
+++ b/plots/scatter-matrix/specification.yaml
@@ -11,7 +11,7 @@ issue: 2035
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/scatter-regression-linear/specification.yaml b/plots/scatter-regression-linear/specification.yaml
index aef3800b08..72f68d0b6e 100644
--- a/plots/scatter-regression-linear/specification.yaml
+++ b/plots/scatter-regression-linear/specification.yaml
@@ -11,7 +11,7 @@ issue: 1821
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/scatter-regression-lowess/specification.yaml b/plots/scatter-regression-lowess/specification.yaml
index f02736f4a2..67f692ce5e 100644
--- a/plots/scatter-regression-lowess/specification.yaml
+++ b/plots/scatter-regression-lowess/specification.yaml
@@ -11,7 +11,7 @@ issue: 2855
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/scatter-regression-polynomial/specification.yaml b/plots/scatter-regression-polynomial/specification.yaml
index 6c97dd97a1..94d56ead4b 100644
--- a/plots/scatter-regression-polynomial/specification.yaml
+++ b/plots/scatter-regression-polynomial/specification.yaml
@@ -11,7 +11,7 @@ issue: 2028
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/shap-summary/specification.yaml b/plots/shap-summary/specification.yaml
index d889f9c781..8bb7ecea12 100644
--- a/plots/shap-summary/specification.yaml
+++ b/plots/shap-summary/specification.yaml
@@ -11,7 +11,7 @@ issue: 2923
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - shap
diff --git a/plots/silhouette-basic/specification.yaml b/plots/silhouette-basic/specification.yaml
index 24095118a7..575e64cd00 100644
--- a/plots/silhouette-basic/specification.yaml
+++ b/plots/silhouette-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2334
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - silhouette
diff --git a/plots/slider-control-basic/specification.yaml b/plots/slider-control-basic/specification.yaml
index 1e3fbbd94b..b93f749036 100644
--- a/plots/slider-control-basic/specification.yaml
+++ b/plots/slider-control-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 3071
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - scatter
diff --git a/plots/spectrogram-basic/specification.yaml b/plots/spectrogram-basic/specification.yaml
index b898b1fee9..8a95c38ff5 100644
--- a/plots/spectrogram-basic/specification.yaml
+++ b/plots/spectrogram-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2927
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - spectrogram
diff --git a/plots/spectrum-basic/specification.yaml b/plots/spectrum-basic/specification.yaml
index ec2110beb1..0d6a1a16ac 100644
--- a/plots/spectrum-basic/specification.yaml
+++ b/plots/spectrum-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2926
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - spectrum
diff --git a/plots/streamline-basic/specification.yaml b/plots/streamline-basic/specification.yaml
index 2bbef536c4..6ea31ad617 100644
--- a/plots/streamline-basic/specification.yaml
+++ b/plots/streamline-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2861
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - streamline
diff --git a/plots/subplot-grid-custom/specification.yaml b/plots/subplot-grid-custom/specification.yaml
index 19a6c52952..97722a8ddc 100644
--- a/plots/subplot-grid-custom/specification.yaml
+++ b/plots/subplot-grid-custom/specification.yaml
@@ -11,7 +11,7 @@ issue: 2856
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - subplot
diff --git a/plots/subplot-grid/specification.yaml b/plots/subplot-grid/specification.yaml
index 6ce44fcc0a..d69678db56 100644
--- a/plots/subplot-grid/specification.yaml
+++ b/plots/subplot-grid/specification.yaml
@@ -11,7 +11,7 @@ issue: 2782
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - subplot
diff --git a/plots/subplot-mosaic/specification.yaml b/plots/subplot-mosaic/specification.yaml
index 9fa8523cb3..34d1965705 100644
--- a/plots/subplot-mosaic/specification.yaml
+++ b/plots/subplot-mosaic/specification.yaml
@@ -11,7 +11,7 @@ issue: 3002
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - subplot
diff --git a/plots/sudoku-basic/specification.yaml b/plots/sudoku-basic/specification.yaml
index 88260f4eb4..4f86d1463e 100644
--- a/plots/sudoku-basic/specification.yaml
+++ b/plots/sudoku-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 1311
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - grid
diff --git a/plots/survival-kaplan-meier/specification.yaml b/plots/survival-kaplan-meier/specification.yaml
index 9a197e02dc..3e0793f248 100644
--- a/plots/survival-kaplan-meier/specification.yaml
+++ b/plots/survival-kaplan-meier/specification.yaml
@@ -11,7 +11,7 @@ issue: 2441
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - survival
diff --git a/plots/timeline-basic/specification.yaml b/plots/timeline-basic/specification.yaml
index eed3c6f92b..b76b7f0db3 100644
--- a/plots/timeline-basic/specification.yaml
+++ b/plots/timeline-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2443
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - timeline
diff --git a/plots/timeseries-decomposition/specification.yaml b/plots/timeseries-decomposition/specification.yaml
index a79b51d176..5dd40c9b02 100644
--- a/plots/timeseries-decomposition/specification.yaml
+++ b/plots/timeseries-decomposition/specification.yaml
@@ -11,7 +11,7 @@ issue: 2992
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - line
diff --git a/plots/tree-phylogenetic/specification.yaml b/plots/tree-phylogenetic/specification.yaml
index 709fa2fa74..3b30afdcae 100644
--- a/plots/tree-phylogenetic/specification.yaml
+++ b/plots/tree-phylogenetic/specification.yaml
@@ -11,7 +11,7 @@ issue: 3070
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - tree
diff --git a/plots/venn-basic/specification.yaml b/plots/venn-basic/specification.yaml
index 52393d51dd..b45f0b3267 100644
--- a/plots/venn-basic/specification.yaml
+++ b/plots/venn-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2444
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - venn
diff --git a/plots/violin-split/specification.yaml b/plots/violin-split/specification.yaml
index b45810b60d..21a4e34a58 100644
--- a/plots/violin-split/specification.yaml
+++ b/plots/violin-split/specification.yaml
@@ -11,7 +11,7 @@ issue: 1949
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - violin
diff --git a/plots/volcano-basic/specification.yaml b/plots/volcano-basic/specification.yaml
index fc14b6ea48..133baa5b7a 100644
--- a/plots/volcano-basic/specification.yaml
+++ b/plots/volcano-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 2924
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - volcano
diff --git a/plots/windrose-basic/specification.yaml b/plots/windrose-basic/specification.yaml
index ccc27f94e2..c2e743710f 100644
--- a/plots/windrose-basic/specification.yaml
+++ b/plots/windrose-basic/specification.yaml
@@ -11,7 +11,7 @@ issue: 1880
 suggested: MarkusNeusinger
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - windrose
diff --git a/prompts/README.md b/prompts/README.md
index f85be07129..6d27cc3941 100644
--- a/prompts/README.md
+++ b/prompts/README.md
@@ -12,10 +12,9 @@ Git history shows all changes (`git log -p prompts/plot-generator.md`).
 | File | Agent | Task |
 |------|-------|------|
 | `plot-generator.md` | Plot Generator | Base rules for all plot implementations |
-| `library/*.md` | Plot Generator | Library-specific rules (8 files) |
+| `library/*.md` | Plot Generator | Library-specific rules (9 files) |
 | `quality-criteria.md` | All | Definition of what "good code" means |
-| `quality-evaluator.md` | Quality Checker | Multi-LLM evaluation (Claude/Gemini/GPT) |
-| `auto-tagger.md` | Auto-Tagger | Automatic tagging across 5 dimensions |
+| `quality-evaluator.md` | Quality Checker | AI quality evaluation |
 | `spec-validator.md` | Spec Validator | Validates plot request issues |
 | `spec-id-generator.md` | Spec ID Generator | Assigns unique spec IDs |
 | `workflow-prompts/*.md` | GitHub Actions | Workflow-specific prompts (see below) |
@@ -26,9 +25,9 @@ Located in `workflow-prompts/` - templates for GitHub Actions workflows:
 
 | File | Workflow | Purpose |
 |------|----------|---------|
-| `generate-implementation.md` | gen-library-impl.yml | Initial code generation |
-| `improve-from-feedback.md` | gen-update-plot.yml | Code improvement after rejection |
-| `ai-quality-review.md` | bot-ai-review.yml | Quality evaluation |
+| `generate-implementation.md` | impl-generate.yml | Initial code generation |
+| `improve-from-feedback.md` | impl-repair.yml | Code improvement after rejection |
+| `ai-quality-review.md` | impl-review.yml | Quality evaluation |
 
 See `workflow-prompts/README.md` for variable reference and usage.
 
@@ -46,7 +45,7 @@ See `workflow-prompts/README.md` for variable reference and usage.
     ${PROMPT_LIB}
 
     ## Spec
-    $(cat plots/${{ inputs.spec_id }}/spec.md)"
+    $(cat plots/${{ inputs.spec_id }}/specification.md)"
 ```
 
 ## Prompt Structure
diff --git a/prompts/templates/specification.yaml b/prompts/templates/specification.yaml
index 63213962e3..a08c0008c0 100644
--- a/prompts/templates/specification.yaml
+++ b/prompts/templates/specification.yaml
@@ -11,7 +11,7 @@ issue: null            # GitHub issue number
 suggested: null        # GitHub username or 'pyplots' for seed plots
 
 # Classification tags (applies to all library implementations)
-# See docs/concepts/tagging-system.md for detailed guidelines
+# See docs/reference/tagging-system.md for detailed guidelines
 tags:
   plot_type:
     - {type}           # Primary plot type (scatter, bar, line, heatmap, etc.)