
docs: graduate plugins out of experimental mode#603

Open
johnnygreco wants to merge 17 commits intomainfrom
johnny/docs/plugins-out-of-experimental-mode

@johnnygreco
Contributor

@johnnygreco johnnygreco commented May 4, 2026

Summary

Updates the docs around two related areas: plugin authoring now that the extension points are no longer experimental, and the code reference section so APIs are grouped by package/layer and have enough context to be useful from the docs site.

Changes

Added

  • docs/plugins/build_your_own.md as the consolidated guide for building column generator, seed reader, and processor plugins.
  • docs/plugins/models.md for model-backed plugin patterns and model registry usage.
  • Package/layer-oriented code reference pages under docs/code_reference/config/, docs/code_reference/engine/, and docs/code_reference/interface/, with overview pages for each group.
  • __init__.py files in engine resources and processing subpackages so mkdocstrings/griffe can discover seed reader and processor classes.

Changed

  • Reworked the Plugins nav and overview around Overview, Build Your Own, Using Models, and Available Plugins.
  • Embedded the NVIDIA-maintained plugin catalog table from the DataDesignerPlugins repo in docs/plugins/available.md.
  • Reorganized the Code Reference nav in mkdocs.yml by Config, Engine, and Interface, with updated cross-links from concepts and recipes.
  • Expanded and corrected docstrings for plugin extension points, config objects, generators, seed readers, processors, interface classes, and analysis/config references so generated docs render with useful field and method descriptions.
  • Improved code reference table styling so wide generated tables remain readable on narrower viewports.

Removed

  • Replaced the older plugin example pages (example.md, filesystem_seed_reader.md, processor.md) with the consolidated Build Your Own guide and targeted reference pages.
  • Replaced the older flat code reference pages with package-grouped code reference pages.

Attention Areas

Reviewers: Please pay special attention to the following:

  • mkdocs.yml - navigation moved from flat reference pages to package groups, and remote markdown snippets are now enabled for the plugin catalog table.
  • docs/plugins/available.md - the NVIDIA plugin table is pulled from the raw DataDesignerPlugins catalog during docs builds.
  • docs/code_reference/ - page paths and anchors changed as part of the Config / Engine / Interface split.
  • packages/data-designer-*/src/data_designer/ docstrings and discovery shims - these are mostly documentation-facing changes, but they affect what mkdocstrings exposes in the generated docs.

Validation

  • uv run --group docs mkdocs build

Description updated with AI

@johnnygreco johnnygreco requested a review from a team as a code owner May 4, 2026 18:50
@github-actions
Contributor

github-actions Bot commented May 4, 2026

Docs preview: https://cac89582.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

@github-actions
Contributor

github-actions Bot commented May 4, 2026

Review: PR #603 (docs: graduate plugins out of experimental mode)

Summary

This PR reorganizes plugin documentation now that the three plugin extension points (column generator, seed reader, processor) are stable. It removes the "Experimental Feature" banners, replaces the single plugins/example.md walkthrough with per-type implementation guides (implement.md, processor.md, expanded filesystem_seed_reader.md), and adds proper API reference pages (code_reference/plugins.md, code_reference/generators.md). The plugins/available.md page becomes a real "Catalog" that points to the NVIDIA-maintained DataDesignerPlugins repo.

Code changes are minimal and non-functional:

  • Docstrings added to Plugin, PluginType, SingleColumnConfig.allow_resize, ProcessorConfig.processor_type, SeedSource, FileSystemSeedSource, ColumnGeneratorCellByCell, and ColumnGeneratorFullColumn.
  • A typo fix in Plugin.config_qualified_name's description ("name o the" → "name of the").
  • Three empty __init__.py files under data_designer.engine so griffe/mkdocstrings can resolve SeedReader, FileSystemSeedReader, and Processor for the new reference pages.
  • mkdocs.yml nav reshuffle + alphabetizing of Code Reference pages.

Scope matches what the PR description claims.

Findings

Correctness

  • __init__.py additions respect the namespace-package invariant. AGENTS.md pins the PEP 420 rule at the top-level data_designer namespace. The three new files live under data_designer.engine.resources, data_designer.engine.processing, and data_designer.engine.processing.processors, all inside a single distribution package — this is the normal way to expose subpackages for griffe and does not break the cross-distribution namespace merge. No concern.
  • Nav + anchors cross-link correctly. Spot-checked docs/code_reference/plugins.md {#data_designer.plugins.plugin.Plugin} anchor against the references in docs/plugins/overview.md, docs/plugins/implement.md, docs/code_reference/generators.md, and the column_configs.md addition for SingleColumnConfig. All the page-relative links (../plugins/implement.md, ../code_reference/generators.md, etc.) match the new filenames in this PR.
  • Plugin description typo fix matches the docstring fix. Both say "name of the …" now. Good.
  • Env-var documentation. overview.md states DISABLE_DATA_DESIGNER_PLUGINS=true disables entry point discovery. Verify the name matches the actual variable in the discovery code — docs of this kind rot silently if the flag is renamed.

Example code quality

  • Processor example uses astype(str).apply(lambda …) in both implement.md and processor.md. Idiomatic pandas would be data[self.config.column].astype(str).str.contains(self.config.pattern, regex=True); pre-compiling the pattern is unnecessary with the Series accessor. As a "minimum working example" it's fine, but a short note that vectorized .str.contains is preferable for real workloads would help new plugin authors.
  • get_column_emoji() returns "x" in implement.md where the old example.md used "✖️". Intentional simplification is fine, but x looks like a placeholder — consider a real emoji so readers don't copy a bare letter into their log output.
  • Import style is consistent across the three tab examples: from __future__ import annotations + TYPE_CHECKING for pandas when only used in annotations. Good — this mirrors the style guide's fast-import guidance.
  • Multiple-plugins-per-package section dropped the tests_e2e reference. The removed example.md pointed at tests_e2e/ as a concrete example of this pattern; implement.md's "Multiple plugins in one package" section just shows a TOML snippet. If that e2e directory is still a working example, add the link back — it's a cheap pointer that saves plugin authors from guessing.
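For reference, a minimal sketch of the two filter styles compared above, assuming a processor that keeps rows whose column matches a regex. The `column` and `pattern` arguments stand in for `self.config.column` and `self.config.pattern`; the two helper names are hypothetical and not part of Data Designer.

```python
from __future__ import annotations

import re

import pandas as pd


def filter_with_apply(data: pd.DataFrame, column: str, pattern: str) -> pd.DataFrame:
    # Row-by-row: compile once, then call the Python regex engine for every cell.
    regex = re.compile(pattern)
    mask = data[column].astype(str).apply(lambda value: bool(regex.search(value)))
    return data[mask]


def filter_vectorized(data: pd.DataFrame, column: str, pattern: str) -> pd.DataFrame:
    # Series.str.contains applies re.search semantics; no manual compile is needed.
    mask = data[column].astype(str).str.contains(pattern, regex=True)
    return data[mask]


df = pd.DataFrame({"text": ["error: disk full", "ok", "ERROR: timeout", None]})
kept = filter_vectorized(df, "text", r"error")
```

Both helpers keep the same rows; note that `astype(str)` turns a missing value into the string "None", which the pattern then simply fails to match.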

Documentation accuracy

  • assert_valid_plugin coverage. implement.md says "Data Designer provides a testing utility for common plugin structure checks" and shows a single example. The deleted example.md explicitly listed what it validates ("config is subclass of ConfigBase", etc.). The new, terser wording is fine for most readers, but the deleted enumeration was genuinely useful for the "what will this catch?" question. Worth preserving one sentence about it.
  • Discovery troubleshooting bullets in overview.md are good and concrete (discriminator must be a string, regex-filter → REGEX_FILTER, etc.). This replaces an entire "Experimental" framing with something actionable — nice improvement.
  • Processor callback table in processor.md accurately lines up with the three process_* methods. The async-engine caveat note about row-count-changing pre/post-batch processors under DATA_DESIGNER_ASYNC_ENGINE=1 is a useful landmine callout.

Style / conventions

  • Docstring additions follow the existing Attributes-block format used elsewhere in base.py and plugin.py. No drift.
  • New __init__.py files use the right SPDX header (2026, Apache-2.0). Consistent with other engine packages.
  • mkdocs.yml alphabetization applies to Code Reference only, not to the top-level nav, which matches the comment "# Keep code reference pages ordered alphabetically by nav label". Confirm whether "Plugins" in the top-level nav should similarly list Catalog before Build Your Own (it currently goes Overview → Build Your Own → Catalog, which reads as a natural user journey and is probably preferable to alphabetical).

Risks

  • Low. This is almost entirely docs plus docstrings. The only runtime-observable change is the three new __init__.py files; because they are beneath a single installable package and do not introduce a data_designer/__init__.py, they cannot break the cross-package namespace merge.
  • One residual risk: if any tooling elsewhere in the repo relied on those three directories being implicit namespace packages (unlikely but worth a grep for pkgutil/find_namespace_packages usage around data_designer.engine.processing), it should still work — explicit subpackages are a strict superset of namespace behavior inside a single distribution.
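Acting on the grep suggestion, a small stdlib-only sketch can list every directory under a package root that holds Python modules (directly or in subdirectories) but has no `__init__.py`, i.e. the directories that would still be implicit namespace packages and that griffe may skip. The helper name is hypothetical; point it at a subtree such as the engine package, never at the top-level `data_designer` namespace, which must stay without `__init__.py`.

```python
from __future__ import annotations

from pathlib import Path


def find_implicit_namespace_dirs(root: Path) -> list[Path]:
    """Return directories under `root` that contain .py files (at any depth)
    but have no __init__.py of their own."""
    missing: list[Path] = []
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        if "__pycache__" in directory.parts:
            continue
        has_modules = any(directory.rglob("*.py"))
        if has_modules and not (directory / "__init__.py").exists():
            missing.append(directory)
    return missing
```

Usage from a checkout might look like `find_implicit_namespace_dirs(Path("packages/data-designer-engine/src/data_designer/engine"))`; an empty list means every module-bearing subdirectory is an explicit package.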

Suggestions (non-blocking)

  1. Confirm DISABLE_DATA_DESIGNER_PLUGINS matches the env var the discovery code actually reads.
  2. Replace the "x" emoji placeholder in implement.md's column generator example with a real emoji.
  3. Prefer Series.str.contains(...) over astype(str).apply(lambda …) in the processor examples, or add a one-line note that the .apply form is for illustration.
  4. Restore the tests_e2e/ pointer in the "Multiple plugins in one package" section.
  5. Optional: keep one sentence listing what assert_valid_plugin actually checks, since the deleted page had that and it's useful signal.

Verdict

Looks good to merge. The restructure is a clear improvement over the old single-example layout, the docstring additions are well-scoped and match existing style, and the __init__.py additions are the correct fix for mkdocstrings discovery without breaking the namespace-package invariant. The suggestions above are small polish items and none of them should block this PR.

@greptile-apps
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR graduates the plugin system out of experimental status by consolidating the plugin authoring guides into build_your_own.md and models.md, restructuring the Code Reference nav into Config / Engine / Interface groups, and correcting docstrings throughout the config and engine packages.

  • Docs reorganization: Removed the three older experimental plugin pages, added build_your_own.md (covers column generators, seed readers, and processors) and models.md (model registry access patterns); restructured Code Reference from a flat list into config/, engine/, and interface/ subdirectories with corresponding nav entries in mkdocs.yml.
  • Python source changes: Added empty __init__.py files to engine/processing/ and engine/resources/ subpackages so mkdocstrings can discover the classes; corrected docstrings across plugin.py, base.py, processors.py, column_generators/base.py, and related config files, including fixing the actual artifact directory names (dropped-columns-parquet-files, processors-files).
  • Remote snippet: docs/plugins/available.md now embeds the NVIDIA plugin catalog from a raw GitHub URL via pymdownx.snippets with url_download: true; this adds a network dependency to docs builds.

Confidence Score: 5/5

This is a documentation-only restructuring with no behavioral changes to library code; safe to merge.

All Python source changes are limited to docstring corrections and empty __init__.py discovery shims. No logic paths, data flows, or runtime behavior changed. Link updates and nav restructuring are internally consistent. Docstring corrections align with the actual artifact directory names found in the engine code.

No files require special attention.

Important Files Changed

  • mkdocs.yml: Nav restructured from flat Code Reference list into Config/Engine/Interface groups; deleted plugin pages replaced with build_your_own and models; url_download: true added to pymdownx.snippets to enable remote catalog embedding.
  • docs/plugins/build_your_own.md: New consolidated plugin authoring guide covering column generators, seed readers, and processors with correct import paths and base classes verified against engine source.
  • docs/plugins/available.md: Replaced placeholder with live plugin catalog; embeds remote GitHub raw URL via pymdownx snippets slice syntax (:5:) requiring the url_download: true flag added in mkdocs.yml.
  • docs/plugins/models.md: New guide for model-backed plugin patterns; import paths and base class usage (ColumnGeneratorWithModel, ColumnGeneratorWithModelRegistry) match actual engine classes.
  • packages/data-designer-config/src/data_designer/plugins/plugin.py: Added class-level docstrings to PluginType and Plugin; fixed typo in config_qualified_name field description.
  • packages/data-designer-config/src/data_designer/config/processors.py: Corrected docstring directory names to match actual artifact storage constants: dropped-columns-parquet-files and processors-files, both verified in artifact_storage.py.
  • packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py: Added class docstrings to ColumnGeneratorCellByCell and ColumnGeneratorFullColumn, and abstract generate() method docstrings; no behavioral changes.
  • packages/data-designer-engine/src/data_designer/engine/processing/__init__.py: New empty __init__.py to make the processing subpackage importable by mkdocstrings/griffe for doc generation.
  • packages/data-designer-config/src/data_designer/config/seed_source.py: Added class docstring to FileSystemSeedSource with accurate field descriptions for path, file_pattern, and recursive.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Plugin package published] --> B[Entry point registered\ndata_designer.plugins group]
    B --> C{import data_designer}
    C --> D[Discovery scans installed\nentry points]
    D --> E{DISABLE_DATA_DESIGNER_PLUGINS?}
    E -- true --> F[Discovery skipped]
    E -- false --> G[Load Plugin objects]
    G --> H{PluginType?}
    H -- COLUMN_GENERATOR --> I[Register config discriminator\nin column_type union]
    H -- SEED_READER --> J[Register config discriminator\nin seed_type union]
    H -- PROCESSOR --> K[Register config discriminator\nin processor_type union]
    I --> L[Config builder add_column]
    J --> M[Seed source resolution]
    K --> N[Config builder add_processor]
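The flowchart can be read as the control flow below. This is a simplified sketch, not the actual PluginRegistry code: it assumes only what the chart and review show (the DISABLE_DATA_DESIGNER_PLUGINS switch, the string-discriminator rule, and one discriminator union per plugin type). `FakePlugin`, `discover_plugins`, and the exact env-var value that counts as "true" are hypothetical; entry point loading is left out so the sketch runs on its own.

```python
from __future__ import annotations

import os
from collections.abc import Iterable
from dataclasses import dataclass
from enum import Enum, auto


class PluginType(Enum):
    COLUMN_GENERATOR = auto()
    SEED_READER = auto()
    PROCESSOR = auto()


@dataclass
class FakePlugin:
    # Hypothetical stand-in for a Plugin object loaded from an entry point.
    plugin_type: PluginType
    discriminator: object


def discover_plugins(loaded: Iterable[FakePlugin]) -> dict[PluginType, list[str]]:
    """Gate on the env var, then route each discriminator into the union for its
    plugin type (column_type, seed_type, or processor_type in the chart)."""
    union_members: dict[PluginType, list[str]] = {t: [] for t in PluginType}
    # Assumption: the value "true" (any case) disables discovery.
    if os.environ.get("DISABLE_DATA_DESIGNER_PLUGINS", "").lower() == "true":
        return union_members  # discovery skipped
    for plugin in loaded:
        if not isinstance(plugin.discriminator, str):
            raise TypeError(f"discriminator must be a string, got {plugin.discriminator!r}")
        union_members[plugin.plugin_type].append(plugin.discriminator)
    return union_members
```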

Reviews (6): Last reviewed commit: "docs: clarify plugin model usage"

Comment thread mkdocs.yml Outdated
Comment thread docs/plugins/overview.md Outdated

### 3. Install and Test Locally
- Install your plugin locally with `uv pip install -e .` from the plugin package directory.
- No publishing is required. Your plugin is usable immediately after a local install.
Contributor


Codex caught this: "usable immediately after a local install" is only true in a fresh Python process. PluginRegistry snapshots entry points on first init (registry.py:33), and column_types/processor_types/seed_source_types/DEFAULT_SEED_READERS all inject plugins at module-import time. Notebook authors who uv pip install -e . after import data_designer will get no discovery without restarting the kernel. wdyt about a one-line restart caveat?

Copy link
Copy Markdown
Contributor Author


Good catch. The overview no longer has the old local-install walkthrough, but the current “Use an Installed Plugin” paragraph still needed the same cache caveat. I added a line noting that if data_designer has already been imported, users should restart the Python process so discovery rebuilds from the new entry points.

```
uv pip install -e .
```

The editable install registers the `data_designer.plugins` entry point so Data Designer can discover the plugin.
Contributor


same caveat as on overview.md:58 — "the editable install registers the entry point so Data Designer can discover the plugin" is true at first import only. PluginRegistry discovers once and caches; the column/processor/seed-source unions are built at import. Worth a one-liner about kernel restart since iterative install in a notebook is the obvious workflow. (Also applies to processor.md:136, which has the same line.)

Copy link
Copy Markdown
Contributor Author


This is covered now in the install section directly below this sentence: it notes that Data Designer caches the plugin registry on first import, and tells notebook users to restart the kernel or interpreter after running `uv pip install -e .`. The old processor.md page was folded into build_your_own.md, so there is no second copy left to update.

@andreatgretel
Contributor

docs/code_reference/plugins.md:5 plus the three new __init__.py files (engine/resources/, engine/processing/, engine/processing/processors/)

The PR description says these __init__.py files exist "so griffe (mkdocstrings) can discover SeedReader, FileSystemSeedReader, and Processor for the new code reference." But I couldn't find any ::: directive in docs/ that targets data_designer.engine.resources.* or data_designer.engine.processing.* — those classes only appear inside code-block from … import … statements, which mkdocstrings doesn't process. Codex flagged the same thing.

As shipped, only column generators get an actual mkdocstrings-rendered API reference (via code_reference/generators.md). Processor and seed-reader authors still have prose examples but no auto-rendered base-class reference like the PR description promises.

Either land code_reference/seed_readers.md and code_reference/processors.md (engine-side) here to mirror the generators.md pattern, or drop the __init__.py files and the docstring-only churn on Processor/SeedReader/FileSystemSeedReader/SeedSource/FileSystemSeedSource/ProcessorConfig until a follow-up PR delivers the rendered pages. wdyt?

johnnygreco added a commit that referenced this pull request May 5, 2026
- Add an Implementation base section to code_reference/processors.md
  rendering the engine-side Processor class. This justifies the
  engine/processing/__init__.py files added earlier and gives
  processor plugin authors an auto-rendered API reference, matching
  the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the
  IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)`
  pattern in the regex-filter processor in favor of the idiomatic
  `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable
  install instructions — PluginRegistry caches discovery on first
  import, so notebooks need a fresh kernel to pick up freshly
  installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin`
  checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code
  reference alongside generators and seed_readers.
@johnnygreco
Contributor Author

@andreatgretel Re: #603 (comment)

This is addressed in the current PR state. The branch now has docs/code_reference/engine/seed_readers.md and docs/code_reference/engine/processors.md, both included under the Engine code-reference nav, and both pages contain mkdocstrings ::: directives for the relevant engine-side classes. I kept the __init__.py files and docstring work because those rendered reference pages now exist.

johnnygreco added 14 commits May 5, 2026 16:30
Griffe (used by mkdocstrings) skips directories without __init__.py
when resolving module paths, which prevented the new plugins code
reference from rendering SeedReader, FileSystemSeedReader, and
Processor. Adding empty __init__.py files in engine/resources/,
engine/processing/, and engine/processing/processors/ aligns with
the convention already used in engine/mcp/, engine/models/, etc.

Plugin authors now see meaningful descriptions for every field and
method on the bases rendered in the plugins code reference:

- Plugin and PluginType: class docstrings + Attributes tables for
  fields and enum members; fix typo in config_qualified_name field
  description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for
  path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add
  class docstrings explaining when to use each base, plus method
  docstrings on the abstract generate() implementations.
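To make the cell-by-cell versus full-column distinction concrete, here is a plain-Python sketch of the two shapes. The real ColumnGeneratorCellByCell and ColumnGeneratorFullColumn bases operate on Data Designer's own types; the stand-in classes, the list-of-dicts batch type, and the `index * multiplier` behavior assumed for the IndexMultiplier example are hypothetical simplifications, not the library's signatures.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

Row = dict[str, object]


class CellByCellSketch(ABC):
    """Stand-in: generate() sees one row at a time (independent, per-record work)."""

    @abstractmethod
    def generate(self, data: Row) -> Row: ...

    def run(self, batch: list[Row]) -> list[Row]:
        return [self.generate(dict(row)) for row in batch]


class FullColumnSketch(ABC):
    """Stand-in: generate() sees the whole batch at once (vectorized or cross-row work)."""

    @abstractmethod
    def generate(self, data: list[Row]) -> list[Row]: ...

    def run(self, batch: list[Row]) -> list[Row]:
        return self.generate([dict(row) for row in batch])


class IndexMultiplierCells(CellByCellSketch):
    def __init__(self, column: str, multiplier: int) -> None:
        self.column = column
        self.multiplier = multiplier

    def generate(self, data: Row) -> Row:
        data[self.column] = int(data["index"]) * self.multiplier
        return data


class IndexMultiplierColumn(FullColumnSketch):
    def __init__(self, column: str, multiplier: int) -> None:
        self.column = column
        self.multiplier = multiplier

    def generate(self, data: list[Row]) -> list[Row]:
        # One call per batch; a real implementation would typically vectorize here.
        for row in data:
            row[self.column] = int(row["index"]) * self.multiplier
        return data
```

Both produce the same values; the choice between the bases is about call granularity, not output.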

Restructures plugin documentation around the now-stable extension
points (column generator, seed reader, processor) and treats plugins
as a first-class story for customizing Data Designer.

- Add code_reference/plugins.md: single-stop reference for the Plugin
  object and the config + implementation base classes used by all
  three plugin types.
- Add code_reference/generators.md: column generator implementation
  base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation
  instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the
  internal-helpers note (PluginRegistry / PluginManager), and focus
  the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and
  plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and
  concepts/deployment-options.md.

Replace the overview's how-to walkthrough and the per-type plugin
guides with a single Build Your Own page that covers all three
plugin types side-by-side. Add a dedicated Using Models in Plugins
guide and a seed_readers code reference, and trim the overview down
to what the plugin types are, how to use one, and how discovery
works.

- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md
  (their content is now in build_your_own.md and the per-type code
  references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation
  base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to
  the relevant code reference, drop the multi-step "How do you
  create plugins" walkthrough in favor of a single Build a Plugin
  pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the
  DataDesignerPlugins catalog and explain how to request a community
  listing.
- Update cross-page links in concepts/processors.md,
  concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md,
  code_reference/plugins.md, and code_reference/generators.md to
  match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models,
  add seed_readers code reference.

Code-heavy reference tables (plugin bases, column generators, etc.)
were wrapping aggressively on narrow viewports, breaking long
identifiers across multiple lines. Switch the table container to
horizontal overflow and prevent code cells from wrapping so
identifiers stay readable.
@johnnygreco johnnygreco force-pushed the johnny/docs/plugins-out-of-experimental-mode branch from 08d153d to 1e93465 May 5, 2026 16:30
@github-actions
Contributor

github-actions Bot commented May 5, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@johnnygreco
Contributor Author

@nabinchha could you take a close look at the Using Models in Plugins section?

Link: https://github.com/NVIDIA-NeMo/DataDesigner/blob/johnny/docs/plugins-out-of-experimental-mode/docs/plugins/models.md

We want to establish a good pattern for using models in plugins, especially the recommended base class split for single-model vs multi-model generators and the alias validation / health-check behavior.

@nabinchha
Contributor

@johnnygreco — took a close pass at docs/plugins/models.md. Cross-checked it against the engine code; the base-class split and the discovery → health-check chain are described correctly. A few things worth tightening, plus one engine-side limitation that came out of the review.

Things I'd change in this PR

  1. Tighten the wording around model_alias being required. The doc currently says "the config should keep a primary model_alias field because startup health checks collect that field…". Because _run_model_health_check_if_needed does model_aliases.add(config.model_alias) unconditionally for any column type whose impl inherits ColumnGeneratorWithModelRegistry, it's effectively required, not advisory — without it, plugin users get an AttributeError from inside the health-check loop before the friendlier registry "alias not found" error ever runs. Suggest something like "The config must include a model_alias: str field — startup health checks read it directly off any column config whose generator inherits from ColumnGeneratorWithModelRegistry (including via ColumnGeneratorWithModel)."

  2. Show the PairwiseJudgeColumnConfig alongside the multi-model generator example. The single-model example shows both halves; the multi-model one only imports PairwiseJudgeColumnConfig from data_designer_pairwise_judge.config, which makes it harder for readers to see that the config defines both model_alias and judge_model_alias. A small config snippet (or an inline comment) closes the loop with point 1 and makes it visually obvious which alias gets the standard health check vs which one only gets the _validate() resolution.

  3. Sharpen the alias-validation note. "Validate additional alias fields in _validate()… so missing aliases fail before generation starts" is true, but readers may infer a model health check happens. Something like "get_model_config(alias) only verifies the alias is registered; it does not call the endpoint. Endpoint reachability is only exercised for the primary model_alias collected by the standard startup health check."

  4. Tiny copy nit: "The engine already builds a ResourceProvider for each generator" reads as one-per-generator; in practice it's one ResourceProvider per builder shared with each generator. Easy fix: "…builds a ResourceProvider and exposes its model registry to every generator at:".

Engine-side limitation surfaced by the review

The reason point 3 needs to exist at all is that secondary aliases on a packaged plugin config can't be opted into the standard startup health check today — only CustomColumnConfig.model_aliases (plural) is rolled in via an isinstance branch in the builder. For a packaged plugin with model_alias + judge_model_alias, only the primary alias gets the endpoint ping; the secondary alias's reachability and credentials only surface at first generation call.

I filed #606 to propose a small fix: a get_model_aliases() accessor on SingleColumnConfig that defaults to [self.model_alias] (preserving current behavior) and that plugin configs override to declare every alias they depend on. The builder's isinstance(config, CustomColumnConfig) branch collapses into the same path, and the docs in this PR can switch from "validate manually in _validate()" to "override get_model_aliases()". Happy to do that as a follow-up to #603 once the docs land, or fold it in if you'd rather ship them together.
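A minimal sketch of the #606 proposal as described here, using dataclasses instead of the real pydantic configs. `get_model_aliases()` is the proposed, not yet merged, accessor; `ColumnConfigSketch` and `PairwiseJudgeConfigSketch` are hypothetical stand-ins for SingleColumnConfig and the PairwiseJudgeColumnConfig example, and `collect_health_check_aliases` is a simplification of the builder's collection loop, not the engine's `_run_model_health_check_if_needed`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnConfigSketch:
    # Stand-in for SingleColumnConfig (the real class is a pydantic model).
    name: str
    model_alias: str

    def get_model_aliases(self) -> list[str]:
        # Proposed default: only the primary alias, matching today's health check.
        return [self.model_alias]


@dataclass
class PairwiseJudgeConfigSketch(ColumnConfigSketch):
    # Stand-in for a packaged plugin config with a secondary alias.
    judge_model_alias: str

    def get_model_aliases(self) -> list[str]:
        # Plugin override: declare every alias the generator depends on.
        return [self.model_alias, self.judge_model_alias]


def collect_health_check_aliases(configs: list[ColumnConfigSketch]) -> set[str]:
    # One path for every config, replacing a CustomColumnConfig isinstance branch.
    aliases: set[str] = set()
    for config in configs:
        aliases.update(config.get_model_aliases())
    return aliases
```

With this shape, the secondary judge alias would be pinged at startup alongside the primary one instead of surfacing only at the first generation call.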

Verdict on the docs alone: the substance is right, the four items above are polish, and the multi-model alias story will be much cleaner once #606 lands.

```
assert_valid_plugin(plugin)
```

`assert_valid_plugin` checks that the plugin's config inherits from `ConfigBase` and that the implementation class inherits from the appropriate base for its plugin type (`ConfigurableTask` for column generators, `SeedReader` for seed readers).
Contributor


Codex caught this: assert_valid_plugin doesn't actually have a PROCESSOR branch in engine/testing/utils.py, so for processor plugins it only checks the config inherits ConfigBase. The impl class isn't type-checked at all. With the processor tab right above, it reads like processors are covered. Maybe add a PROCESSOR branch checking issubclass(impl_cls, Processor), or call it out here? wdyt
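One way to picture the suggested fix is a type-to-base mapping that includes a PROCESSOR entry, so every plugin type gets an implementation check. The classes below are local stand-ins so the sketch runs on its own; in the real helper in engine/testing/utils.py the bases would be ConfigBase, ConfigurableTask, SeedReader, and Processor. `PluginSketch` and `assert_valid_plugin_sketch` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


# Local stand-ins for the real Data Designer base classes.
class ConfigBase: ...
class ConfigurableTask: ...
class SeedReader: ...
class Processor: ...


class PluginType(Enum):
    COLUMN_GENERATOR = auto()
    SEED_READER = auto()
    PROCESSOR = auto()


# One required implementation base per plugin type, including PROCESSOR.
IMPL_BASES: dict[PluginType, type] = {
    PluginType.COLUMN_GENERATOR: ConfigurableTask,
    PluginType.SEED_READER: SeedReader,
    PluginType.PROCESSOR: Processor,
}


@dataclass
class PluginSketch:
    plugin_type: PluginType
    config_cls: type
    impl_cls: type


def assert_valid_plugin_sketch(plugin: PluginSketch) -> None:
    if not issubclass(plugin.config_cls, ConfigBase):
        raise AssertionError(f"{plugin.config_cls.__name__} must inherit ConfigBase")
    expected = IMPL_BASES[plugin.plugin_type]
    if not issubclass(plugin.impl_cls, expected):
        raise AssertionError(f"{plugin.impl_cls.__name__} must inherit {expected.__name__}")
```

Driving the check from a mapping means a new plugin type without an entry fails loudly with a KeyError instead of silently skipping the implementation check.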

Comment thread docs/plugins/available.md

NVIDIA-maintained plugin packages live in the [DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins) repository. The catalog below is embedded from the generated [DataDesignerPlugins catalog](https://github.com/NVIDIA-NeMo/DataDesignerPlugins/blob/main/docs/catalog.md).

--8<-- "https://raw.githubusercontent.com/NVIDIA-NeMo/DataDesignerPlugins/main/docs/catalog.md:5:"
Contributor


nit: worth double-checking that catalog URL resolves before merge, since pymdownx.snippets will hard-fail the build on a 404. also slightly nervous that every docs build now depends on DataDesignerPlugins/main being healthy. pinning to a tag or vendoring a snapshot might be safer, wdyt?
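A pre-build check along these lines could turn a broken catalog URL into a clear error before mkdocs runs. It is a hypothetical helper, not part of the repo; it uses `urllib.request` from the standard library and takes the opener as a parameter so it can be exercised without network access. Pinning, as suggested, would mean replacing `main` in the URL with a release tag of the repo's choosing.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from collections.abc import Callable
from typing import Any

CATALOG_URL = (
    "https://raw.githubusercontent.com/NVIDIA-NeMo/DataDesignerPlugins/main/docs/catalog.md"
)


def catalog_is_reachable(
    url: str = CATALOG_URL,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 10.0,
) -> bool:
    """Return True if `url` answers with HTTP 200; False on HTTP, network, or timeout errors."""
    try:
        with opener(url, timeout=timeout) as response:
            return getattr(response, "status", None) == 200
    except (urllib.error.URLError, TimeoutError):
        # HTTPError (e.g. a 404) is a subclass of URLError.
        return False
```

A CI step could call `catalog_is_reachable()` and exit non-zero with a message naming the URL, which is easier to diagnose than a snippet failure mid-build.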

@andreatgretel
Contributor

Bundling the code reference reorg with plugins-graduation makes sense given plugin authors now need to navigate engine.column_generators, engine.processing.processors, etc. as a real public surface. The thing I'd still flag: the reorg landed across ~10 commits and grew the diff from 19 files to 72, which makes reviewing harder and links the two for reverts. Not blocking, just wanted to surface it. A sentence in the PR description tying the two together would help future readers.


Labels

None yet

Projects

None yet


3 participants