This issue is the home for the design discussion that came out of PR #1196's review.
Background
PR #1196 introduces the corpus-mutation extension system with a single entry point: a script defines a transform_corpus(corpus) global, and the host calls it. This is the simplest possible shape (let's call it Option 1) and it's what shipped in that PR.
During review, Alan raised the question of whether this shape is appropriate, particularly as we add:
- More lifecycle hooks (e.g. before_extract, after_render).
- Extensions that register multiple capabilities from one file (a corpus transform and a Handlebars helper).
- Extensions that bundle non-code files (templates, assets, configuration).
- Extension enabling/disabling.
The PR ships Option 1. This issue captures the trade-off analysis and lays out the ladder of richer options we can add as needs arise.
The four entry-point patterns
Option 1: Reserved function names
Script defines a function with a known name (transform_corpus); the host introspects.
Examples: pytest, Sphinx.
Trade-off: minimal syntax, but each new hook reserves another global. Doesn't scale past a small fixed set.
My evaluation: right starting point for Mr. Docs. The "reserves a global per hook" weakness only bites once the hook count grows; we have one hook today. pytest and Sphinx are mature systems using this pattern successfully.
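To make the shape concrete, here is a minimal sketch of how a host could discover a reserved name. It is written in Python purely for illustration and is not the code from PR #1196: the corpus is modeled as a plain dict, and RESERVED_HOOKS, load_hooks, and run_transform are hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical extension script; a real host would read this from a file.
EXTENSION_SOURCE = '''
def transform_corpus(corpus):
    # Drop symbols whose names start with an underscore.
    corpus["symbols"] = [s for s in corpus["symbols"] if not s.startswith("_")]
    return corpus
'''

# Assumption: the host keeps a fixed list of reserved names.
# Every new hook would add another entry (and another reserved global).
RESERVED_HOOKS = ("transform_corpus",)


def load_hooks(source: str) -> Dict[str, Callable[..., Any]]:
    """Run the script, then collect callables defined under reserved names."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    return {
        name: namespace[name]
        for name in RESERVED_HOOKS
        if callable(namespace.get(name))
    }


def run_transform(source: str, corpus: dict) -> dict:
    """Call transform_corpus if the script defines it; otherwise pass through."""
    hook = load_hooks(source).get("transform_corpus")
    return hook(corpus) if hook else corpus


result = run_transform(EXTENSION_SOURCE, {"symbols": ["foo", "_bar", "baz"]})
print(result["symbols"])  # ['foo', 'baz']
```

The extension author writes one function and nothing else, which is the appeal; the cost shows up in RESERVED_HOOKS growing by one name per hook.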
Option 2: Top-level registration calls
Script calls host.register_*(fn) in top-level code; the host stores the registration and invokes the callback at the right time.
Examples: Darktable, LLVM/Clang plugins.
Trade-off: explicit, scalable, one file can register multiple things; syntactically heavier.
My evaluation: necessary eventually; not yet. Becomes the right move when we want paired helpers (one file registering both a corpus transform and a Handlebars helper) or more lifecycle hooks. IMHO pre-paying for it before there's a concrete need is overkill.
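For comparison, a rough sketch of the registration-call shape, again in Python for illustration only. The Host class and its register_corpus_transform / register_helper methods are hypothetical stand-ins for whatever host.register_* would expand to, and the "helper" is a plain function rather than a real Handlebars helper.

```python
from typing import Callable, Dict, List


class Host:
    """Hypothetical host object exposed to extension scripts."""

    def __init__(self) -> None:
        self.corpus_transforms: List[Callable[[dict], dict]] = []
        self.helpers: Dict[str, Callable[..., str]] = {}

    def register_corpus_transform(self, fn: Callable[[dict], dict]) -> None:
        self.corpus_transforms.append(fn)

    def register_helper(self, name: str, fn: Callable[..., str]) -> None:
        self.helpers[name] = fn


# Hypothetical extension script: one file registers two capabilities.
EXTENSION_SOURCE = '''
def strip_private(corpus):
    corpus["symbols"] = [s for s in corpus["symbols"] if not s.startswith("_")]
    return corpus

def shout(text):
    return text.upper()

host.register_corpus_transform(strip_private)
host.register_helper("shout", shout)
'''

host = Host()
# Running the script's top-level code performs the registrations.
exec(EXTENSION_SOURCE, {"host": host})

# Later, at the right point in the build, the host invokes what was stored.
corpus = {"symbols": ["foo", "_bar"]}
for transform in host.corpus_transforms:
    corpus = transform(corpus)

print(corpus["symbols"])              # ['foo']
print(host.helpers["shout"]("done"))  # DONE
```

No function name is reserved here, so adding a hook means adding a register_* method rather than claiming another global; the script pays for that with two extra lines of ceremony.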
Option 3: Reserved register function + event emitter
Scripts export one reserved name (register); inside, it subscribes to host events.
Example: Antora.
Trade-off: single reserved name + familiar event pattern; adds an emitter abstraction layer.
My evaluation: probably not the right rung for Mr. Docs. Antora's pattern fits a pipeline with many extension points throughout the build; we have fewer. The emitter abstraction is overhead for our shape. We could skip Option 3 and jump from Option 2 straight to Option 4 if/when needed.
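For reference, the register-plus-emitter shape looks roughly like this. This is an illustrative Python sketch, not Antora's API: the Emitter class, its on/emit methods, and the ctx dict are hypothetical, and the event names reuse the before_extract / after_render examples from above.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class Emitter:
    """Hypothetical event emitter owned by the host."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[dict], Any]]] = defaultdict(list)

    def on(self, event: str, listener: Callable[[dict], Any]) -> None:
        self._listeners[event].append(listener)

    def emit(self, event: str, ctx: dict) -> None:
        # Listeners run in subscription order; unknown events are a no-op.
        for listener in self._listeners[event]:
            listener(ctx)


# Hypothetical extension script: the only reserved name is `register`.
EXTENSION_SOURCE = '''
def register(events):
    events.on("before_extract", lambda ctx: ctx["log"].append("before_extract seen"))
    events.on("after_render", lambda ctx: ctx["log"].append("after_render seen"))
'''


def load_extension(source: str, emitter: Emitter) -> None:
    """Run the script, then call its `register` function with the emitter."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    register = namespace.get("register")
    if callable(register):
        register(emitter)


emitter = Emitter()
load_extension(EXTENSION_SOURCE, emitter)

ctx = {"log": []}
emitter.emit("before_extract", ctx)
emitter.emit("after_render", ctx)
print(ctx["log"])  # ['before_extract seen', 'after_render seen']
```

The Emitter class is the extra abstraction layer the trade-off refers to: it earns its keep when there are many events, and is mostly overhead when there are one or two.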
Option 4: Manifest + accompanying code
An extension is a directory: a manifest file (JSON/YAML) declares the extension name and capabilities; one or more accompanying files contain the actual logic.
Example: Claude Code skills (Markdown frontmatter + body).
Trade-off: most expressive; supports paired helpers, auxiliary files, enable/disable, configurable extensions. Requires the most infrastructure.
My evaluation: the right answer once we want enable/disable, named extensions, auxiliary files, or configurable extensions. Heaviest but most expressive. The natural top of the ladder.
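A rough sketch of what a manifest-driven loader could look like, in Python for illustration. Everything about the manifest here is an assumption: the manifest.json file name, the name and capabilities keys, the file/function fields, and the disabled set are hypothetical, not a proposed schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Set, Tuple

Loaded = Tuple[str, Dict[str, Callable[..., Any]]]


def load_extension(directory: Path, disabled: Set[str]) -> Optional[Loaded]:
    """Read the manifest, honor enable/disable by name, then load each capability."""
    manifest = json.loads((directory / "manifest.json").read_text(encoding="utf-8"))
    if manifest["name"] in disabled:
        return None  # the user opted this extension out by name
    capabilities: Dict[str, Callable[..., Any]] = {}
    for kind, entry in manifest["capabilities"].items():
        namespace: Dict[str, Any] = {}
        exec((directory / entry["file"]).read_text(encoding="utf-8"), namespace)
        capabilities[kind] = namespace[entry["function"]]
    return manifest["name"], capabilities


# Build a throwaway extension directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    ext_dir = Path(tmp)
    (ext_dir / "manifest.json").write_text(json.dumps({
        "name": "strip-private",
        "capabilities": {
            "corpus_transform": {"file": "transform.py", "function": "strip_private"},
        },
    }), encoding="utf-8")
    (ext_dir / "transform.py").write_text(
        "def strip_private(corpus):\n"
        "    corpus['symbols'] = [s for s in corpus['symbols'] if not s.startswith('_')]\n"
        "    return corpus\n",
        encoding="utf-8",
    )
    loaded = load_extension(ext_dir, disabled=set())
    skipped = load_extension(ext_dir, disabled={"strip-private"})

name, caps = loaded
print(name, caps["corpus_transform"]({"symbols": ["foo", "_bar"]})["symbols"])
print(skipped)  # None
```

The point of the sketch is that the manifest gives the extension a name before any code runs, which is what makes enable/disable and per-extension configuration possible; the price is the directory layout and the loader itself.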
The ladder
The options aren't mutually exclusive. They form a complexity ladder:
| Rung | Pattern | What you get |
|---|---|---|
| 1 | Reserved name (Option 1) | Simplest case: one capability, one file, no ceremony |
| 2 | Registration calls (Option 2) | Shared data: one file registers multiple capabilities (e.g., a corpus transform alongside a Handlebars helper) |
| 3 | Manifest + code (Option 4) | Shared files: an extension is a directory bundling code, helpers, and assets |
| ... | ... | enable/disable, configuration schemas, ... |
PR #1196 ships rung 1. Higher rungs land as concrete use cases surface.
Future questions to settle here
These came up in the PR review. They are not blocking PR #1196 but should inform the ladder above.
- Paired helpers: should one extension file be able to register both a corpus transform and a Handlebars helper? This forces rung 2+.
- Auxiliary files: should an extension be a directory with assets/templates/config, not just a script? This forces rung 3.
- Enable/disable: how do users opt individual extensions in or out? Likely needs a config-side knob and probably an extension name (which forces a manifest).
- Registering generators: should extensions be able to add new output formats (e.g., a Markdown generator)? Forces rung 3 and a richer registry.
- Invariant safety: we all seem to agree that extensions should not break invariants, but some features require breaking them. As real use cases land, this tension will need a concrete resolution (tighter allowlist, opt-in unsafe mutations, post-hoc validation, etc.).