24 changes: 12 additions & 12 deletions .claude/skills/migrate-docusaurus-to-mintlify/SKILL.md
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ description: Use when migrating ClickHouse docs pages from Docusaurus (clickhous

This skill describes the deterministic rules for converting a Docusaurus `.md`/`.mdx` page (source: `~/Desktop/clickhouse-docs/docs/**`) into a Mintlify page in this repo. The reference implementation lives at `~/Desktop/clickhouse-main` (a Mintlify-mapped snapshot Mintlify produced) — when in doubt, diff against it.

- The migration is driven by a script at `scripts/migrate.py` (see the "Script" section). When the script can't decide, it leaves the original content with a `<!-- MIGRATE: ... -->` marker; resolve those by hand using the rules below.
+ The migration is driven by a script at `_migration/migrate.py` (see the "Script" section). When the script can't decide, it leaves the original content with a `<!-- MIGRATE: ... -->` marker; resolve those by hand using the rules below.

## Hard rules

@@ -176,7 +176,7 @@ Snippet partials live at `docs/**/_snippets/*.md` in Docusaurus and migrate to `

## 9. Slug map CSV (QA aid)

- `scripts/generate-slug-map.py` writes `slug-map.csv` at the repo root. It pairs every Docusaurus slug with its Mintlify URL so a reviewer can open both pages side-by-side.
+ `_migration/generate-slug-map.py` writes `_migration/slug-map.csv`. It pairs every Docusaurus slug with its Mintlify URL so a reviewer can open both pages side-by-side.

How it builds rows:
1. Walk the Docusaurus repo (`--docusaurus`, default `~/Desktop/clickhouse-docs`) and collect every `slug:`.
@@ -197,14 +197,14 @@ Tracking columns:
- `migrated_at` — UTC ISO timestamp of the last migration. Diagnostic only.
- `manually_checked` (default `false`) — flip to `true` once a human has opened `old_url` and `new_url` side-by-side and confirmed parity. Never written by tools.

- **Staleness rule:** a page is up-to-date iff `migrated == true` AND `migrated_hash == source_hash`. Any drift means the Docusaurus source has changed since the last migration → the page should be re-migrated. `scripts/migrate.py` enforces this by default; pass `--force` to override.
+ **Staleness rule:** a page is up-to-date iff `migrated == true` AND `migrated_hash == source_hash`. Any drift means the Docusaurus source has changed since the last migration → the page should be re-migrated. `_migration/migrate.py` enforces this by default; pass `--force` to override.

The generator preserves all tracking columns (`migrated`, `migrated_hash`, `migrated_at`, `manually_checked`) when re-run, so it's safe to regenerate at any time without losing progress.
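
As a rough illustration of what preserving the tracking columns on regeneration could look like, the sketch below carries the four tracking columns over from the previous CSV rows onto freshly generated rows. It is not taken from `generate-slug-map.py`: the function name `carry_over_tracking`, the `slug` key column, and the `false` default for `migrated` are assumptions made for the example.

```python
# Hypothetical sketch: names and the "slug" key column are assumptions,
# not the actual implementation in generate-slug-map.py.
TRACKING_COLUMNS = ("migrated", "migrated_hash", "migrated_at", "manually_checked")
BOOLEAN_COLUMNS = ("migrated", "manually_checked")


def carry_over_tracking(old_rows, new_rows, key="slug"):
    """Copy tracking columns from previous rows onto regenerated rows."""
    previous = {row[key]: row for row in old_rows}
    merged = []
    for row in new_rows:
        out = dict(row)  # regenerated fields (e.g. source_hash) win
        old = previous.get(row[key], {})
        for col in TRACKING_COLUMNS:
            # Keep the earlier value when one exists; otherwise start
            # from "false" (boolean columns) or an empty string.
            default = "false" if col in BOOLEAN_COLUMNS else ""
            out[col] = old.get(col, default)
        merged.append(out)
    return merged
```

Under these assumptions, a page whose source changed keeps its old `migrated_hash` while picking up the new `source_hash`, which is exactly the drift the staleness rule looks for.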

Regenerate any time pages move or slugs change:
```
- python scripts/generate-slug-map.py
- python scripts/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs \
+ python _migration/generate-slug-map.py
+ python _migration/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs \
    --mintlify-base https://private-7c7dfe99.mintlify.app
```

@@ -220,19 +220,19 @@ Do not change these inside the migration pass:

## Script

- The migration script is `scripts/migrate.py`. Invocation:
+ The migration script is `_migration/migrate.py`. Invocation:

```
- python scripts/migrate.py <path> # one file or dir
- python scripts/migrate.py --all # whole repo
- python scripts/migrate.py <path> --dry-run
- python scripts/migrate.py --all --force # re-migrate even up-to-date pages
+ python _migration/migrate.py <path> # one file or dir
+ python _migration/migrate.py --all # whole repo
+ python _migration/migrate.py <path> --dry-run
+ python _migration/migrate.py --all --force # re-migrate even up-to-date pages
```

**Standard workflow (incremental):**
```
- python scripts/generate-slug-map.py # refresh source_hash for all pages
- python scripts/migrate.py --all # process only pages whose source changed
+ python _migration/generate-slug-map.py # refresh source_hash for all pages
+ python _migration/migrate.py --all # process only pages whose source changed
```

`generate-slug-map.py` recomputes every page's `source_hash`; `migrate.py` skips any page where `migrated=true` AND `migrated_hash == source_hash`. After a Docusaurus repo pull, run both in sequence and only the changed pages are re-touched.
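
The skip rule above can be sketched as a small predicate over slug-map rows. This is an illustrative sketch, not the actual logic in `migrate.py`: the helper names are hypothetical, rows are assumed to be dicts of strings as produced by `csv.DictReader`, and treating an empty hash as stale is an added assumption.

```python
# Hypothetical helpers; the real migrate.py may structure this differently.
def is_up_to_date(row):
    """True when the row is marked migrated and its hashes agree."""
    migrated = str(row.get("migrated", "")).strip().lower() == "true"
    migrated_hash = row.get("migrated_hash") or ""
    source_hash = row.get("source_hash") or ""
    # Assumption: a missing/empty hash never counts as up to date.
    return migrated and bool(migrated_hash) and migrated_hash == source_hash


def pages_to_migrate(rows, force=False):
    """Select rows to process; force=True mirrors passing --force."""
    return [row for row in rows if force or not is_up_to_date(row)]
```

With this shape, the incremental run only touches rows whose `source_hash` drifted or that were never migrated, and forcing the run returns every row.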
7 changes: 3 additions & 4 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,9 @@
## ClickHouse documentation

- ClickHouse docs are stored in the `/docs` folder and use [Mintlify](https://www.mintlify.com/) to host the docs.
+ ClickHouse docs are stored in this folder. The documentation site at clickhouse.com/docs is hosted using the [Mintlify](https://www.mintlify.com/) documentation platform.

- - Always make updates only to the English documentation in `/docs`. There is no need to update files in the `i18n` folder as an agent will handle the translation separately.
- - Documentation inside of `{/*AUTOGENERATED_START*/}` and `{/*AUTOGENERATED_END*/}` tags is generated from ClickHouse system tables.
-   It should not be edited directly, but rather the source file should be updated in one of the following locations:
+ - Always make updates only to the English documentation in `/docs` (ignore `/i18n`). There is no need to update files in the `i18n` folder as an agent will handle the translation separately.
+ - Documentation inside of `{/*AUTOGENERATED_START*/}` and `{/*AUTOGENERATED_END*/}` tags is generated from ClickHouse system tables. It should not be edited directly, but rather the source file should be updated:
  - `/src/Functions`

## PR instructions
26 changes: 26 additions & 0 deletions _migration/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# Migration

One-shot tooling and data used to migrate the ClickHouse docs from Docusaurus
to Mintlify. **This entire folder can be removed once the migration is
complete** — nothing in the live site depends on it at runtime.

## Contents

- `slug-map.csv` — pairing of every Docusaurus slug → Mintlify file/URL, with
  source/migrated hashes for staleness tracking.
- `slug-aliases.csv` — manually-reviewed aliases for slugs that couldn't be
  resolved automatically.
- `migrate.py` — main migration script. Reads `slug-map.csv`, applies
  Mintlify-necessary transforms, writes the result.
- `generate-slug-map.py` — regenerates `slug-map.csv` from upstream Docusaurus
  + this repo's current layout.
- `apply-slug-aliases.py` — rewrites `<!-- MIGRATE: unknown slug -->` markers
  using `slug-aliases.csv`.
- `suggest-slug-aliases.py` — proposes alias candidates for unresolved slugs.
- `match_slugless.py` — finds Mintlify pages with no upstream slug match.
- `verify_mapping.py` — sanity-checks the slug map for duplicates/drift.
- `find_dup_imports.py` — detects MDX import conflicts caused by Mintlify
  hoisting snippet imports into the parent bundle.

See `.claude/skills/migrate-docusaurus-to-mintlify/SKILL.md` for the
migration workflow.
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/usr/bin/env python3
- """Rewrite `<!-- MIGRATE: unknown slug -->` markers using scripts/slug-aliases.csv.
+ """Rewrite `<!-- MIGRATE: unknown slug -->` markers using _migration/slug-aliases.csv.

Reads the alias CSV produced by suggest-slug-aliases.py and rewrites every
marker whose old_slug appears with a non-empty `suggested_target`.
@@ -11,9 +11,9 @@
Fragments (`#anchor`) on the original link are preserved.

Usage:
-     python scripts/apply-slug-aliases.py
-     python scripts/apply-slug-aliases.py --dry-run
-     python scripts/apply-slug-aliases.py --include-ambiguous
+     python _migration/apply-slug-aliases.py
+     python _migration/apply-slug-aliases.py --dry-run
+     python _migration/apply-slug-aliases.py --include-ambiguous
"""
from __future__ import annotations

@@ -47,7 +47,7 @@ def split_frag(href: str) -> tuple[str, str]:

def main():
    ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
-     ap.add_argument("--aliases", type=Path, default=THIS_REPO / "scripts" / "slug-aliases.csv")
+     ap.add_argument("--aliases", type=Path, default=THIS_REPO / "_migration" / "slug-aliases.csv")
    ap.add_argument("--dry-run", action="store_true")
    ap.add_argument("--include-ambiguous", action="store_true",
                    help="also apply basename-ambiguous suggestions")
Original file line number Diff line number Diff line change
@@ -116,7 +116,7 @@ def main():
    pages = []
    for p in REPO.rglob("*.mdx"):
        rel = p.relative_to(REPO).as_posix()
-         if rel.startswith((".claude/", "node_modules/", "scripts/")):
+         if rel.startswith((".claude/", "node_modules/", "_site/", "_migration/")):
            continue
        pages.append(p)

106 changes: 106 additions & 0 deletions _migration/find_orphans.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
#!/usr/bin/env python3
"""Find .mdx pages on disk that have no entry in docs.json."""
import json
from collections import defaultdict
from pathlib import Path

REPO = Path("/Users/sstruw/Desktop/mintlify-docs-dev")
SKIP_DIRS = {
    ".git", "node_modules", ".playwright-mcp", ".idea", ".mintlify", ".claude",
    "snippets", "_snippets", "_site", "_migration",
    "i18n",  # localization, not in default nav
}
SKIP_NAMES = {"AGENTS.md", "README.md", "changelog_entry_guidelines.mdx"}
# Path prefixes whose pages are intentionally outside docs.json (e.g. wired
# via a dynamic explorer component instead of the sidebar nav).
SKIP_PREFIXES = (
    "core/get-started/quickstarts/",
)


def collect_disk_pages() -> set[str]:
    pages = set()
    for path in REPO.rglob("*"):
        if not path.is_file():
            continue
        if path.suffix not in {".mdx", ".md"}:
            continue
        parts = path.relative_to(REPO).parts
        if set(parts) & SKIP_DIRS:
            continue
        # Skip partials (underscore-prefixed file or any underscore-prefixed dir)
        if any(p.startswith("_") for p in parts):
            continue
        rel = path.relative_to(REPO).as_posix()
        if rel in SKIP_NAMES or path.name == "README.md":
            continue
        ref = rel[:-4] if rel.endswith(".mdx") else rel[:-3]
        if any(ref.startswith(p) for p in SKIP_PREFIXES):
            continue
        pages.add(ref)
    return pages


def collect_docs_json_refs(node) -> set[str]:
    refs = set()

    def visit(obj):
        if isinstance(obj, list):
            for item in obj:
                visit(item)
        elif isinstance(obj, dict):
            # Follow $ref includes (e.g. products/kubernetes-operator/navigation.json)
            if "$ref" in obj:
                ref_path = obj["$ref"]
                ref_file = REPO / ref_path
                if ref_file.exists():
                    visit(json.loads(ref_file.read_text()))
                return
            for k, v in obj.items():
                if k in ("pages", "groups"):
                    visit(v)
                elif k == "root" and isinstance(v, str):
                    refs.add(v)
                elif isinstance(v, (dict, list)):
                    visit(v)
        elif isinstance(obj, str):
            if obj.startswith(("http://", "https://", "/")):
                return
            if obj.endswith((".json", ".js", ".css", ".svg", ".png", ".jpg", ".ico")):
                return
            refs.add(obj)

    visit(node)
    return refs


def main():
    docs_json = json.loads((REPO / "docs.json").read_text())
    disk = collect_disk_pages()
    referenced = collect_docs_json_refs(docs_json)

    # A page referenced as `X` also covers `X/index` (Mintlify auto-routes both)
    expanded = set(referenced)
    for r in referenced:
        expanded.add(r + "/index")

    orphans = sorted(disk - expanded)

    # Group by top-level section
    by_section = defaultdict(list)
    for o in orphans:
        section = o.split("/", 1)[0]
        by_section[section].append(o)

    print(f"Disk pages: {len(disk)}")
    print(f"docs.json refs: {len(referenced)}")
    print(f"Orphans: {len(orphans)}\n")
    for section in sorted(by_section):
        print(f"=== {section} ({len(by_section[section])}) ===")
        for p in by_section[section]:
            print(f" {p}")
        print()


if __name__ == "__main__":
    main()
Original file line number Diff line number Diff line change
@@ -17,8 +17,8 @@
slug field.

Usage:
-     python scripts/generate-slug-map.py
-     python scripts/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs
+     python _migration/generate-slug-map.py
+     python _migration/generate-slug-map.py --docusaurus ~/Desktop/clickhouse-docs
"""
from __future__ import annotations

@@ -117,7 +117,7 @@ def main():
                    help="path to clickhouse-docs (Docusaurus) repo")
    ap.add_argument("--mintlify", type=Path, default=THIS_REPO,
                    help="path to Mintlify repo (default: this repo)")
-     ap.add_argument("--out", type=Path, default=THIS_REPO / "slug-map.csv")
+     ap.add_argument("--out", type=Path, default=THIS_REPO / "_migration" / "slug-map.csv")
    ap.add_argument("--old-base", default="https://clickhouse.com/docs")
    ap.add_argument("--mintlify-base", default="https://private-7c7dfe99.mintlify.app")
    args = ap.parse_args()
File renamed without changes.