docs: graduate plugins out of experimental mode #603
johnnygreco wants to merge 17 commits into main
Docs preview: https://cac89582.dd-docs-preview.pages.dev
|
Review: PR #603 —
|
Greptile Summary
This PR graduates the plugin system out of experimental status by consolidating the plugin authoring guides into

| Filename | Overview |
|---|---|
| mkdocs.yml | Nav restructured from flat Code Reference list into Config/Engine/Interface groups; deleted plugin pages replaced with build_your_own and models; url_download: true added to pymdownx.snippets to enable remote catalog embedding. |
| docs/plugins/build_your_own.md | New consolidated plugin authoring guide covering column generators, seed readers, and processors with correct import paths and base classes verified against engine source. |
| docs/plugins/available.md | Replaced placeholder with live plugin catalog; embeds remote GitHub raw URL via pymdownx snippets slice syntax (:5:) requiring the url_download: true flag added in mkdocs.yml. |
| docs/plugins/models.md | New guide for model-backed plugin patterns; import paths and base class usage (ColumnGeneratorWithModel, ColumnGeneratorWithModelRegistry) match actual engine classes. |
| packages/data-designer-config/src/data_designer/plugins/plugin.py | Added class-level docstrings to PluginType and Plugin; fixed typo in config_qualified_name field description. |
| packages/data-designer-config/src/data_designer/config/processors.py | Corrected docstring directory names to match actual artifact storage constants: dropped-columns-parquet-files and processors-files, both verified in artifact_storage.py. |
| packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py | Added class docstrings to ColumnGeneratorCellByCell and ColumnGeneratorFullColumn, and abstract generate() method docstrings; no behavioral changes. |
| packages/data-designer-engine/src/data_designer/engine/processing/__init__.py | New empty `__init__.py` to make the processing subpackage importable by mkdocstrings/griffe for doc generation. |
| packages/data-designer-config/src/data_designer/config/seed_source.py | Added class docstring to FileSystemSeedSource with accurate field descriptions for path, file_pattern, and recursive. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Plugin package published] --> B[Entry point registered\ndata_designer.plugins group]
B --> C{import data_designer}
C --> D[Discovery scans installed\nentry points]
D --> E{DISABLE_DATA_DESIGNER_PLUGINS?}
E -- true --> F[Discovery skipped]
E -- false --> G[Load Plugin objects]
G --> H{PluginType?}
H -- COLUMN_GENERATOR --> I[Register config discriminator\nin column_type union]
H -- SEED_READER --> J[Register config discriminator\nin seed_type union]
H -- PROCESSOR --> K[Register config discriminator\nin processor_type union]
I --> L[Config builder add_column]
J --> M[Seed source resolution]
K --> N[Config builder add_processor]
Reviews (6): Last reviewed commit: "docs: clarify plugin model usage" | Re-trigger Greptile
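The discovery flow in the flowchart can be sketched with the standard library's `importlib.metadata`. This is a minimal illustration, not Data Designer's actual `PluginRegistry`: the helper name `discover_plugins` and the set of values treated as "true" are assumptions, and only the entry point group and the environment variable name come from the flowchart.

```python
import os
from importlib.metadata import entry_points

PLUGIN_GROUP = "data_designer.plugins"  # entry point group named in the flowchart
DISABLE_ENV_VAR = "DISABLE_DATA_DESIGNER_PLUGINS"


def discover_plugins(environ=None):
    """Return {entry point name: loaded object} for installed plugins.

    Hypothetical helper for illustration only. The `group=` keyword of
    `entry_points` requires Python 3.10 or newer.
    """
    environ = os.environ if environ is None else environ
    # Assumption: "1" or "true" (case-insensitive) disables discovery.
    if environ.get(DISABLE_ENV_VAR, "").strip().lower() in {"1", "true"}:
        return {}
    # Scan installed distributions and load every object in the group.
    return {ep.name: ep.load() for ep in entry_points(group=PLUGIN_GROUP)}
```

Passing a plain dict as `environ` keeps the sketch easy to exercise without touching the real process environment.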
### 3. Install and Test Locally
- Install your plugin locally with `uv pip install -e .` from the plugin package directory.
- No publishing is required. Your plugin is usable immediately after a local install.
Codex caught this: "usable immediately after a local install" is only true in a fresh Python process. `PluginRegistry` snapshots entry points on first init (registry.py:33), and `column_types`/`processor_types`/`seed_source_types`/`DEFAULT_SEED_READERS` all inject plugins at module-import time. Notebook authors who run `uv pip install -e .` after `import data_designer` will get no discovery without restarting the kernel. wdyt about a one-line restart caveat?
Good catch. The overview no longer has the old local-install walkthrough, but the current "Use an Installed Plugin" paragraph still needed the same cache caveat. I added a line noting that if `data_designer` has already been imported, users should restart the Python process so discovery rebuilds from the new entry points.
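A toy model of the caching behavior described in this thread may help show why a restart is needed. `SnapshotRegistry` is a hypothetical stand-in, not the real `PluginRegistry`; it only assumes what the comment states, namely that discovery runs once and the result is reused afterwards.

```python
class SnapshotRegistry:
    """Hypothetical registry that scans once and caches the result."""

    _cache = None  # shared across instances, filled on first init

    def __init__(self, scan):
        # `scan` is any callable returning the currently installed plugin names.
        if SnapshotRegistry._cache is None:
            SnapshotRegistry._cache = tuple(scan())

    @property
    def plugins(self):
        return SnapshotRegistry._cache


installed = ["plugin-a"]
first = SnapshotRegistry(lambda: installed)

installed.append("plugin-b")  # simulates `uv pip install -e .` after import
second = SnapshotRegistry(lambda: installed)
```

Both `first.plugins` and `second.plugins` still report only `plugin-a`, because the second construction never rescans. In this model, only a fresh process (which starts with an empty cache) would pick up `plugin-b`.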
```
uv pip install -e .
```

The editable install registers the `data_designer.plugins` entry point so Data Designer can discover the plugin.
Same caveat as on overview.md:58: "the editable install registers the entry point so Data Designer can discover the plugin" is true at first import only. `PluginRegistry` discovers once and caches; the column/processor/seed-source unions are built at import. Worth a one-liner about kernel restart since iterative install in a notebook is the obvious workflow. (Also applies to processor.md:136, which has the same line.)
This is covered now in the install section directly below this sentence: it notes that Data Designer caches the plugin registry on first import, and tells notebook users to restart the kernel or interpreter after `uv pip install -e .`. The old processor.md page was folded into build_your_own.md, so there is no second copy left to update.
The PR description says these As shipped, only column generators get an actual mkdocstrings-rendered API reference (via Either land |
- Add an Implementation base section to code_reference/processors.md rendering the engine-side Processor class. This justifies the engine/processing/__init__.py files added earlier and gives processor plugin authors an auto-rendered API reference, matching the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)` pattern in the regex-filter processor in favor of the idiomatic `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable install instructions. PluginRegistry caches discovery on first import, so notebooks need a fresh kernel to pick up freshly installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin` checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code reference alongside generators and seed_readers.
|
@andreatgretel Re: #603 (comment) This is addressed in the current PR state. The branch now has |
Griffe (used by mkdocstrings) skips directories without __init__.py when resolving module paths, which prevented the new plugins code reference from rendering SeedReader, FileSystemSeedReader, and Processor. Adding empty __init__.py files in engine/resources/, engine/processing/, and engine/processing/processors/ aligns with the convention already used in engine/mcp/, engine/models/, etc.
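A quick way to spot directories affected by the behavior described in this commit message is to scan a source tree for folders that hold Python modules but no `__init__.py`. The helper below is a hypothetical sketch (the name `packages_missing_init` is not part of the repository), and it deliberately only flags directories that directly contain `.py` files.

```python
from pathlib import Path


def packages_missing_init(root):
    """List directories under `root` that contain .py files but no __init__.py.

    Paths are returned relative to `root`, in sorted order. Directories that
    only contain subdirectories are not reported by this simple check.
    """
    root = Path(root)
    missing = []
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        has_python = any(directory.glob("*.py"))
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory.relative_to(root).as_posix())
    return missing
```

Running it against a package's `src` directory before a docs build would give a short list of candidates to check by hand.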
Plugin authors now see meaningful descriptions for every field and method on the bases rendered in the plugins code reference:
- Plugin and PluginType: class docstrings + Attributes tables for fields and enum members; fix typo in config_qualified_name field description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add class docstrings explaining when to use each base, plus method docstrings on the abstract generate() implementations.
Restructures plugin documentation around the now-stable extension
points (column generator, seed reader, processor) and treats plugins
as a first-class story for customizing Data Designer.
- Add code_reference/plugins.md: single-stop reference for the Plugin
object and the config + implementation base classes used by all
three plugin types.
- Add code_reference/generators.md: column generator implementation
base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation
instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the
internal-helpers note (PluginRegistry / PluginManager), and focus
the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and
plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and
concepts/deployment-options.md.
Replace the overview's how-to walkthrough and the per-type plugin guides with a single Build Your Own page that covers all three plugin types side-by-side. Add a dedicated Using Models in Plugins guide and a seed_readers code reference, and trim the overview down to what the plugin types are, how to use one, and how discovery works.
- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md (their content is now in build_your_own.md and the per-type code references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to the relevant code reference, drop the multi-step "How do you create plugins" walkthrough in favor of a single Build a Plugin pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the DataDesignerPlugins catalog and explain how to request a community listing.
- Update cross-page links in concepts/processors.md, concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md, code_reference/plugins.md, and code_reference/generators.md to match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models, add seed_readers code reference.
Code-heavy reference tables (plugin bases, column generators, etc.) were wrapping aggressively on narrow viewports, breaking long identifiers across multiple lines. Switch the table container to horizontal overflow and prevent code cells from wrapping so identifiers stay readable.
08d153d to 1e93465
All contributors have signed the DCO ✍️ ✅
@nabinchha could you take a close look at the Using Models in Plugins section? We want to establish a good pattern for using models in plugins, especially the recommended base class split for single-model vs multi-model generators and the alias validation / health-check behavior.
b84bae9 to a88c702
@johnnygreco — took a close pass at Things I'd change in this PR
Engine-side limitation surfaced by the review The reason point 3 needs to exist at all is that secondary aliases on a packaged plugin config can't be opted into the standard startup health check today — only I filed #606 to propose a small fix: a Verdict on the docs alone: the substance is right, the four items above are polish, and the multi-model alias story will be much cleaner once #606 lands. |
```
assert_valid_plugin(plugin)
```

`assert_valid_plugin` checks that the plugin's config inherits from `ConfigBase` and that the implementation class inherits from the appropriate base for its plugin type (`ConfigurableTask` for column generators, `SeedReader` for seed readers).
Codex caught this: `assert_valid_plugin` doesn't actually have a `PROCESSOR` branch in engine/testing/utils.py, so for processor plugins it only checks that the config inherits `ConfigBase`. The impl class isn't type-checked at all. With the processor tab right above, it reads like processors are covered. Maybe add a `PROCESSOR` branch checking `issubclass(impl_cls, Processor)`, or call it out here? wdyt
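The branch proposed in this comment could look roughly like the sketch below. Everything here is a stand-in written for illustration: the base classes are empty placeholders for the real Data Designer classes, the enum values are assumed, and `check_plugin` is a hypothetical name rather than the actual `assert_valid_plugin` in engine/testing/utils.py.

```python
from enum import Enum


class PluginType(Enum):
    # Member names follow the PR discussion; the string values are assumptions.
    COLUMN_GENERATOR = "column-generator"
    SEED_READER = "seed-reader"
    PROCESSOR = "processor"


# Placeholder bases; the real ones live in the Data Designer packages.
class ConfigBase: ...
class ConfigurableTask: ...
class SeedReader: ...
class Processor: ...


EXPECTED_IMPL_BASE = {
    PluginType.COLUMN_GENERATOR: ConfigurableTask,
    PluginType.SEED_READER: SeedReader,
    PluginType.PROCESSOR: Processor,  # the branch the comment proposes adding
}


def check_plugin(plugin_type, config_cls, impl_cls):
    """Assert the config and implementation classes use the expected bases."""
    assert issubclass(config_cls, ConfigBase), "config must inherit ConfigBase"
    expected = EXPECTED_IMPL_BASE[plugin_type]
    assert issubclass(impl_cls, expected), f"impl must inherit {expected.__name__}"
```

Using a lookup table keeps the three plugin types symmetric, so a missing branch shows up as a `KeyError` instead of a silently skipped check.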
NVIDIA-maintained plugin packages live in the [DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins) repository. The catalog below is embedded from the generated [DataDesignerPlugins catalog](https://github.com/NVIDIA-NeMo/DataDesignerPlugins/blob/main/docs/catalog.md).

--8<-- "https://raw.githubusercontent.com/NVIDIA-NeMo/DataDesignerPlugins/main/docs/catalog.md:5:"
nit: worth double-checking that catalog URL resolves before merge, since pymdownx.snippets will hard-fail the build on a 404. also slightly nervous that every docs build now depends on DataDesignerPlugins/main being healthy. pinning to a tag or vendoring a snapshot might be safer, wdyt?
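To make the pinning suggestion concrete, the snippet directive can be generated from a ref instead of hard-coding `main`. This is only a sketch: the helper names are hypothetical, and `v0.0.0` is a placeholder tag, not a known release of DataDesignerPlugins.

```python
def raw_github_url(owner, repo, ref, path):
    """Build a raw.githubusercontent.com URL for a file at a branch, tag, or SHA."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"


def snippet_directive(url, start_line=None):
    """Format a pymdownx.snippets include, optionally starting at `start_line`."""
    suffix = f":{start_line}:" if start_line is not None else ""
    return f'--8<-- "{url}{suffix}"'


# Placeholder tag for illustration; substitute a real tag or commit SHA.
pinned = snippet_directive(
    raw_github_url("NVIDIA-NeMo", "DataDesignerPlugins", "v0.0.0", "docs/catalog.md"),
    start_line=5,
)
```

With `ref="main"` the helpers reproduce the directive quoted in the diff, so swapping in a tag changes only the part of the URL that decides which snapshot the docs build depends on.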
|
Bundling the code reference reorg with plugins-graduation makes sense given plugin authors now need to navigate |
Summary
Updates the docs around two related areas: plugin authoring now that the extension points are no longer experimental, and the code reference section so APIs are grouped by package/layer and have enough context to be useful from the docs site.
Changes
Added
- `docs/plugins/build_your_own.md` as the consolidated guide for building column generator, seed reader, and processor plugins.
- `docs/plugins/models.md` for model-backed plugin patterns and model registry usage.
- `docs/code_reference/config/`, `docs/code_reference/engine/`, and `docs/code_reference/interface/`, with overview pages for each group.
- `__init__.py` files in engine `resources` and `processing` subpackages so mkdocstrings/griffe can discover seed reader and processor classes.
Changed
- `docs/plugins/available.md`.
- `mkdocs.yml` by Config, Engine, and Interface, with updated cross-links from concepts and recipes.
Removed
- `example.md`, `filesystem_seed_reader.md`, `processor.md`) with the consolidated Build Your Own guide and targeted reference pages.
Attention Areas
- `mkdocs.yml` - navigation moved from flat reference pages to package groups, and remote markdown snippets are now enabled for the plugin catalog table.
- `docs/plugins/available.md` - the NVIDIA plugin table is pulled from the raw DataDesignerPlugins catalog during docs builds.
- `docs/code_reference/` - page paths and anchors changed as part of the Config / Engine / Interface split.
- `packages/data-designer-*/src/data_designer/` docstrings and discovery shims - these are mostly documentation-facing changes, but they affect what mkdocstrings exposes in the generated docs.
Validation
- `uv run --group docs mkdocs build`
Description updated with AI