ClickHouse · Blargian · May 21, 2026
diff --git a/.claude/skills/migrate-docusaurus-to-mintlify/SKILL.md b/.claude/skills/migrate-docusaurus-to-mintlify/SKILL.md
@@ -7,7 +7,7 @@ description: Use when migrating ClickHouse docs pages from Docusaurus (clickhous
 
 This skill describes the deterministic rules for converting a Docusaurus `.md`/`.mdx` page (source: `~/Desktop/clickhouse-docs/docs/**`) into a Mintlify page in this repo. The reference implementation lives at `~/Desktop/clickhouse-main` (a Mintlify-mapped snapshot Mintlify produced) — when in doubt, diff against it.
 
-The migration is driven by a script at `scripts/migrate.py` (see the "Script" section). When the script can't decide, it leaves the original content with a `<!-- MIGRATE: ... -->` marker; resolve those by hand using the rules below.
+The migration is driven by a script at `_migration/migrate.py` (see the "Script" section). When the script can't decide, it leaves the original content with a `<!-- MIGRATE: ... -->` marker; resolve those by hand using the rules below.
 
 ## Hard rules
 
@@ -176,7 +176,7 @@ Snippet partials live at `docs/**/_snippets/*.md` in Docusaurus and migrate to `
 
 ## 9. Slug map CSV (QA aid)
 
-`scripts/generate-slug-map.py` writes `slug-map.csv` at the repo root. It pairs every Docusaurus slug with its Mintlify URL so a reviewer can open both pages side-by-side.
+`_migration/generate-slug-map.py` writes `_migration/slug-map.csv`. It pairs every Docusaurus slug with its Mintlify URL so a reviewer can open both pages side-by-side.
 
 How it builds rows:
 1. Walk the Docusaurus repo (`--docusaurus`, default `~/Desktop/clickhouse-docs`) and collect every `slug:`.
@@ -197,14 +197,14 @@ Tracking columns:
 - `migrated_at` — UTC ISO timestamp of the last migration. Diagnostic only.
 - `manually_checked` (default `false`) — flip to `true` once a human has opened `old_url` and `new_url` side-by-side and confirmed parity. Never written by tools.
 
-**Staleness rule:** a page is up-to-date iff `migrated == true` AND `migrated_hash == source_hash`. Any drift means the Docusaurus source has changed since the last migration → the page should be re-migrated. `scripts/migrate.py` enforces this by default; pass `--force` to override.
+**Staleness rule:** a page is up-to-date iff `migrated == true` AND `migrated_hash == source_hash`. Any drift means the Docusaurus source has changed since the last migration → the page should be re-migrated. `_migration/migrate.py` enforces this by default; pass `--force` to override.
 
 The generator preserves all tracking columns (`migrated`, `migrated_hash`, `migrated_at`, `manually_checked`) when re-run, so it's safe to regenerate at any time without losing progress.
 
 Regenerate any time pages move or slugs change:
 ```
-python scripts/generate-slug-map.py
-python scripts/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs \
+python _migration/generate-slug-map.py
+python _migration/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs \
     --mintlify-base https://private-7c7dfe99.mintlify.app
 ```
 
@@ -220,19 +220,19 @@ Do not change these inside the migration pass:
 
 ## Script
 
-The migration script is `scripts/migrate.py`. Invocation:
+The migration script is `_migration/migrate.py`. Invocation:
 
 ```
-python scripts/migrate.py <path>          # one file or dir
-python scripts/migrate.py --all           # whole repo
-python scripts/migrate.py <path> --dry-run
-python scripts/migrate.py --all --force   # re-migrate even up-to-date pages
+python _migration/migrate.py <path>          # one file or dir
+python _migration/migrate.py --all           # whole repo
+python _migration/migrate.py <path> --dry-run
+python _migration/migrate.py --all --force   # re-migrate even up-to-date pages
 ```
 
 **Standard workflow (incremental):**
 ```
-python scripts/generate-slug-map.py       # refresh source_hash for all pages
-python scripts/migrate.py --all           # process only pages whose source changed
+python _migration/generate-slug-map.py       # refresh source_hash for all pages
+python _migration/migrate.py --all           # process only pages whose source changed
 ```
 
 `generate-slug-map.py` recomputes every page's `source_hash`; `migrate.py` skips any page where `migrated=true` AND `migrated_hash == source_hash`. After a Docusaurus repo pull, run both in sequence and only the changed pages are re-touched.

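The staleness rule described in SKILL.md can be sketched as a small predicate over slug-map rows. This is a minimal illustration, not the code in `_migration/migrate.py`: it assumes the CSV stores booleans as the strings `true`/`false`, and the helper names `is_up_to_date` and `pages_to_migrate` are hypothetical.

```python
def is_up_to_date(row: dict[str, str]) -> bool:
    """A page is up-to-date iff migrated == true AND migrated_hash == source_hash."""
    # Assumption: booleans are serialized as "true"/"false" in slug-map.csv.
    migrated = row.get("migrated", "").strip().lower() == "true"
    return migrated and row.get("migrated_hash", "") == row.get("source_hash", "")


def pages_to_migrate(rows: list[dict[str, str]], force: bool = False) -> list[dict[str, str]]:
    """Return the rows that need (re-)migration; force=True mirrors --force."""
    return [row for row in rows if force or not is_up_to_date(row)]
```

Under these assumptions, a row whose `source_hash` changed after a Docusaurus pull is selected again, while an unchanged migrated row is skipped unless `force` is set.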
diff --git a/AGENTS.md b/AGENTS.md
@@ -1,10 +1,9 @@
 ## ClickHouse documentation
 
-ClickHouse docs are stored in the `/docs` folder and use [Mintlify](https://www.mintlify.com/) to host the docs.
+ClickHouse docs are stored in this folder. The documentation site at clickhouse.com/docs is hosted using the [Mintlify](https://www.mintlify.com/) documentation platform.
 
-- Always make updates only to the English documentation in `/docs`. There is no need to update files in the `i18n` folder as an agent will handle the translation separately.
-- Documentation inside of `{/*AUTOGENERATED_START*/}` and `{/*AUTOGENERATED_END*/}` tags is generated from ClickHouse system tables.
-  It should not be edited directly, but rather the source file should be updated in one of the following locations:
+- Make updates only to the English documentation in `/docs`. Ignore the `/i18n` folder: an agent will handle the translation separately.
+- Documentation inside of `{/*AUTOGENERATED_START*/}` and `{/*AUTOGENERATED_END*/}` tags is generated from ClickHouse system tables. Do not edit it directly; update the source file instead:
   - `/src/Functions`
 
 ## PR instructions

diff --git a/_migration/README.md b/_migration/README.md
@@ -0,0 +1,26 @@
+# Migration
+
+One-shot tooling and data used to migrate the ClickHouse docs from Docusaurus
+to Mintlify. **This entire folder can be removed once the migration is
+complete** — nothing in the live site depends on it at runtime.
+
+## Contents
+
+- `slug-map.csv` — pairing of every Docusaurus slug → Mintlify file/URL, with
+  source/migrated hashes for staleness tracking.
+- `slug-aliases.csv` — manually-reviewed aliases for slugs that couldn't be
+  resolved automatically.
+- `migrate.py` — main migration script. Reads `slug-map.csv`, applies
+  Mintlify-necessary transforms, writes the result.
+- `generate-slug-map.py` — regenerates `slug-map.csv` from upstream Docusaurus
+  + this repo's current layout.
+- `apply-slug-aliases.py` — rewrites `<!-- MIGRATE: unknown slug -->` markers
+  using `slug-aliases.csv`.
+- `suggest-slug-aliases.py` — proposes alias candidates for unresolved slugs.
+- `match_slugless.py` — finds Mintlify pages with no upstream slug match.
+- `verify_mapping.py` — sanity-checks the slug map for duplicates/drift.
+- `find_dup_imports.py` — detects MDX import conflicts caused by Mintlify
+  hoisting snippet imports into the parent bundle.
+
+See `.claude/skills/migrate-docusaurus-to-mintlify/SKILL.md` for the
+migration workflow.
diff --git a/scripts/apply-slug-aliases.py → _migration/apply-slug-aliases.py b/scripts/apply-slug-aliases.py → _migration/apply-slug-aliases.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""Rewrite `<!-- MIGRATE: unknown slug -->` markers using scripts/slug-aliases.csv.
+"""Rewrite `<!-- MIGRATE: unknown slug -->` markers using _migration/slug-aliases.csv.
 
 Reads the alias CSV produced by suggest-slug-aliases.py and rewrites every
 marker whose old_slug appears with a non-empty `suggested_target`.
@@ -11,9 +11,9 @@
 Fragments (`#anchor`) on the original link are preserved.
 
 Usage:
-    python scripts/apply-slug-aliases.py
-    python scripts/apply-slug-aliases.py --dry-run
-    python scripts/apply-slug-aliases.py --include-ambiguous
+    python _migration/apply-slug-aliases.py
+    python _migration/apply-slug-aliases.py --dry-run
+    python _migration/apply-slug-aliases.py --include-ambiguous
 """
 from __future__ import annotations
 
@@ -47,7 +47,7 @@ def split_frag(href: str) -> tuple[str, str]:
 
 def main():
     ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
-    ap.add_argument("--aliases", type=Path, default=THIS_REPO / "scripts" / "slug-aliases.csv")
+    ap.add_argument("--aliases", type=Path, default=THIS_REPO / "_migration" / "slug-aliases.csv")
     ap.add_argument("--dry-run", action="store_true")
     ap.add_argument("--include-ambiguous", action="store_true",
                     help="also apply basename-ambiguous suggestions")

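The docstring notes that fragments (`#anchor`) on the original link are preserved when a marker is rewritten. A minimal sketch of that behaviour follows. The name and signature of `split_frag` come from the hunk header above, but its body here is an assumption, and `retarget` and the `aliases` dict are hypothetical stand-ins for the CSV lookup.

```python
def split_frag(href: str) -> tuple[str, str]:
    """Split a link into (path, fragment); the fragment keeps its leading '#'."""
    path, sep, frag = href.partition("#")
    return path, sep + frag


def retarget(href: str, aliases: dict[str, str]) -> str:
    """Swap the path for its alias, if any, and re-attach the original fragment."""
    path, frag = split_frag(href)
    target = aliases.get(path)
    # Links without a known alias are returned unchanged.
    return href if target is None else target + frag
```

With `aliases = {"/old/page": "/new/page"}`, the link `/old/page#setup` would be rewritten with the same `#setup` anchor on the new path.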
diff --git a/scripts/find_dup_imports.py → _migration/find_dup_imports.py b/scripts/find_dup_imports.py → _migration/find_dup_imports.py
@@ -116,7 +116,7 @@ def main():
     pages = []
     for p in REPO.rglob("*.mdx"):
         rel = p.relative_to(REPO).as_posix()
-        if rel.startswith((".claude/", "node_modules/", "scripts/")):
+        if rel.startswith((".claude/", "node_modules/", "_site/", "_migration/")):
             continue
         pages.append(p)
 

diff --git a/_migration/find_orphans.py b/_migration/find_orphans.py
@@ -0,0 +1,106 @@
+#!/usr/bin/env python3
+"""Find .mdx pages on disk that have no entry in docs.json."""
+import json
+from collections import defaultdict
+from pathlib import Path
+
+REPO = Path(__file__).resolve().parent.parent  # repo root; this script lives in _migration/
+SKIP_DIRS = {
+    ".git", "node_modules", ".playwright-mcp", ".idea", ".mintlify", ".claude",
+    "snippets", "_snippets", "_site", "_migration",
+    "i18n",  # localization, not in default nav
+}
+SKIP_NAMES = {"AGENTS.md", "README.md", "changelog_entry_guidelines.mdx"}
+# Path prefixes whose pages are intentionally outside docs.json (e.g. wired
+# via a dynamic explorer component instead of the sidebar nav).
+SKIP_PREFIXES = (
+    "core/get-started/quickstarts/",
+)
+
+
+def collect_disk_pages() -> set[str]:
+    pages = set()
+    for path in REPO.rglob("*"):
+        if not path.is_file():
+            continue
+        if path.suffix not in {".mdx", ".md"}:
+            continue
+        parts = path.relative_to(REPO).parts
+        if set(parts) & SKIP_DIRS:
+            continue
+        # Skip partials (underscore-prefixed file or any underscore-prefixed dir)
+        if any(p.startswith("_") for p in parts):
+            continue
+        rel = path.relative_to(REPO).as_posix()
+        if rel in SKIP_NAMES or path.name == "README.md":
+            continue
+        ref = rel[:-4] if rel.endswith(".mdx") else rel[:-3]
+        if any(ref.startswith(p) for p in SKIP_PREFIXES):
+            continue
+        pages.add(ref)
+    return pages
+
+
+def collect_docs_json_refs(node) -> set[str]:
+    refs = set()
+
+    def visit(obj):
+        if isinstance(obj, list):
+            for item in obj:
+                visit(item)
+        elif isinstance(obj, dict):
+            # Follow $ref includes (e.g. products/kubernetes-operator/navigation.json)
+            if "$ref" in obj:
+                ref_path = obj["$ref"]
+                ref_file = REPO / ref_path
+                if ref_file.exists():
+                    visit(json.loads(ref_file.read_text()))
+                return
+            for k, v in obj.items():
+                if k in ("pages", "groups"):
+                    visit(v)
+                elif k == "root" and isinstance(v, str):
+                    refs.add(v)
+                elif isinstance(v, (dict, list)):
+                    visit(v)
+        elif isinstance(obj, str):
+            if obj.startswith(("http://", "https://", "/")):
+                return
+            if obj.endswith((".json", ".js", ".css", ".svg", ".png", ".jpg", ".ico")):
+                return
+            refs.add(obj)
+
+    visit(node)
+    return refs
+
+
+def main():
+    docs_json = json.loads((REPO / "docs.json").read_text())
+    disk = collect_disk_pages()
+    referenced = collect_docs_json_refs(docs_json)
+
+    # A page referenced as `X` also covers `X/index` (Mintlify auto-routes both)
+    expanded = set(referenced)
+    for r in referenced:
+        expanded.add(r + "/index")
+
+    orphans = sorted(disk - expanded)
+
+    # Group by top-level section
+    by_section = defaultdict(list)
+    for o in orphans:
+        section = o.split("/", 1)[0]
+        by_section[section].append(o)
+
+    print(f"Disk pages: {len(disk)}")
+    print(f"docs.json refs: {len(referenced)}")
+    print(f"Orphans: {len(orphans)}\n")
+    for section in sorted(by_section):
+        print(f"=== {section} ({len(by_section[section])}) ===")
+        for p in by_section[section]:
+            print(f"  {p}")
+        print()
+
+
+if __name__ == "__main__":
+    main()
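The core of the orphan check in `find_orphans.py` is a set difference after expanding each `docs.json` reference to also cover its `/index` page. The sketch below isolates that step on in-memory sets; the function name `find_orphans` and the sample page names are hypothetical.

```python
def find_orphans(disk: set[str], referenced: set[str]) -> list[str]:
    """Return disk pages that docs.json does not reference, sorted."""
    # A reference to `X` is treated as also covering `X/index`.
    expanded = referenced | {ref + "/index" for ref in referenced}
    return sorted(disk - expanded)


disk_pages = {"guides/intro", "guides/setup/index", "guides/unlisted"}
nav_refs = {"guides/intro", "guides/setup"}
orphans = find_orphans(disk_pages, nav_refs)
```

Here only `guides/unlisted` is reported, because `guides/setup/index` is covered by the `guides/setup` reference.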
diff --git a/scripts/generate-slug-map.py → _migration/generate-slug-map.py b/scripts/generate-slug-map.py → _migration/generate-slug-map.py
@@ -17,8 +17,8 @@
 slug field.
 
 Usage:
-    python scripts/generate-slug-map.py
-    python scripts/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs
+    python _migration/generate-slug-map.py
+    python _migration/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs
 """
 from __future__ import annotations
 
@@ -117,7 +117,7 @@ def main():
                     help="path to clickhouse-docs (Docusaurus) repo")
     ap.add_argument("--mintlify", type=Path, default=THIS_REPO,
                     help="path to Mintlify repo (default: this repo)")
-    ap.add_argument("--out", type=Path, default=THIS_REPO / "slug-map.csv")
+    ap.add_argument("--out", type=Path, default=THIS_REPO / "_migration" / "slug-map.csv")
     ap.add_argument("--old-base", default="https://clickhouse.com/docs")
     ap.add_argument("--mintlify-base", default="https://private-7c7dfe99.mintlify.app")
     args = ap.parse_args()

diff --git a/scripts/match_slugless.py → _migration/match_slugless.py b/scripts/match_slugless.py → _migration/match_slugless.py