Is your feature request related to a problem? Please describe.
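Plugin column generators that inherit from ColumnGeneratorWithModelRegistry and depend on more than one model alias (e.g. a generator + judge pattern) cannot opt their secondary aliases into the standard startup model health check. When the builder collects aliases for the health check, it adds config.model_alias for every model-generated column type and, through an isinstance check, the model_aliases of a CustomColumnConfig. No other field on the config is consulted.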
This has two practical consequences for plugin authors:
Only the primary model_alias is endpoint-checked at startup. Secondary aliases (e.g. judge_model_alias, critic_model_alias) on a plugin config are never passed to ModelRegistry.run_health_check. A typo, missing API key, or unreachable endpoint on a secondary model only surfaces at first generation call, after partial work has potentially already happened.
config.model_alias is implicitly required. Any column type whose impl inherits from ColumnGeneratorWithModelRegistry triggers model_aliases.add(config.model_alias) unconditionally. A plugin config without a model_alias field raises AttributeError from inside _run_model_health_check_if_needed rather than a friendly registry error.
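Both consequences can be reproduced with a small stand-in for the collection step. The sketch below is reconstructed from the description in this issue and is not the actual builder source: collect_aliases_today and the two plugin config classes are hypothetical names, and the else branch stands in for the "impl inherits from ColumnGeneratorWithModelRegistry" gate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomColumnConfig:
    # Stand-in for the real class; only the field named in this issue is modeled.
    model_aliases: Optional[list[str]] = None


@dataclass
class JudgedPluginConfig:
    # Hypothetical plugin config with a primary and a secondary alias.
    model_alias: str
    judge_model_alias: str


@dataclass
class NoPrimaryAliasPluginConfig:
    # Hypothetical plugin config that has no `model_alias` field at all.
    judge_model_alias: str


def collect_aliases_today(configs: list[object]) -> set[str]:
    """Assumed shape of the current collection logic (simplified)."""
    model_aliases: set[str] = set()
    for config in configs:
        if isinstance(config, CustomColumnConfig):
            # The existing special case: a list of aliases is rolled in.
            model_aliases.update(config.model_aliases or [])
        else:
            # Treated as model-generated: only the primary alias is read,
            # and the attribute access is unconditional.
            model_aliases.add(config.model_alias)
    return model_aliases


# Consequence 1: the judge alias never reaches the health-check set.
print(sorted(collect_aliases_today([JudgedPluginConfig("gen", "judge")])))  # ['gen']

# Consequence 2: a config without `model_alias` fails with AttributeError.
try:
    collect_aliases_today([NoPrimaryAliasPluginConfig("judge")])
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```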
The current workaround in the new plugin docs (docs/plugins/models.md on #603) tells plugin authors to call self.get_model_config(alias) from _validate() for each secondary alias. That only verifies the alias is registered; it does not exercise the endpoint.
The CustomColumnConfig.model_aliases: list[str] field already proves the engine knows how to roll multiple aliases into the central health check — packaged plugins just don't have the same hook.
Describe the solution you'd like
Add a single overridable accessor on SingleColumnConfig:
```python
# packages/data-designer-config/src/data_designer/config/column_configs.py
class SingleColumnConfig(...):
    def get_model_aliases(self) -> list[str]:
        """Return every model alias this column depends on.

        The startup health check uses this to decide which endpoints to ping.
        Override on configs that depend on more than one model.
        """
        alias = getattr(self, "model_alias", None)
        return [alias] if alias else []
```
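Plugin configs that depend on more than one model override it to return every alias they use. CustomColumnConfig overrides it as well, which absorbs the existing isinstance special case in the builder, and the builder collapses to one loop over get_model_aliases().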
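The override pattern and the simplified builder loop could look roughly like this. It is a minimal sketch under stated assumptions: plain Python classes stand in for the real Pydantic configs, JudgedPluginConfig and collect_aliases are hypothetical names, and the body of the CustomColumnConfig override is a guess that it simply returns the existing model_aliases field.

```python
from typing import Optional


class SingleColumnConfig:
    """Stand-in base class carrying the proposed default accessor."""

    def get_model_aliases(self) -> list[str]:
        alias = getattr(self, "model_alias", None)
        return [alias] if alias else []


class JudgedPluginConfig(SingleColumnConfig):
    """Hypothetical plugin config: a generator model plus a judge model."""

    def __init__(self, model_alias: str, judge_model_alias: str) -> None:
        self.model_alias = model_alias
        self.judge_model_alias = judge_model_alias

    def get_model_aliases(self) -> list[str]:
        # Report the secondary alias alongside the primary one.
        return [self.model_alias, self.judge_model_alias]


class CustomColumnConfig(SingleColumnConfig):
    """Stand-in; assumes the override returns the existing list field."""

    def __init__(self, model_aliases: Optional[list[str]] = None) -> None:
        self.model_aliases = model_aliases

    def get_model_aliases(self) -> list[str]:
        return list(self.model_aliases or [])


def collect_aliases(configs: list[SingleColumnConfig]) -> set[str]:
    """Assumed builder loop: no isinstance branch, no direct attribute access."""
    model_aliases: set[str] = set()
    for config in configs:
        model_aliases.update(config.get_model_aliases())
    return model_aliases


configs = [
    JudgedPluginConfig("gen", "judge"),
    CustomColumnConfig(["a"]),
    SingleColumnConfig(),  # no model_alias at all: contributes nothing, no crash
]
print(sorted(collect_aliases(configs)))  # ['a', 'gen', 'judge']
```

Because the default accessor reads model_alias through getattr, a config with no such field yields an empty list instead of raising.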
Behavior after the change:
All built-in model-backed configs continue to work unchanged: the default get_model_aliases() reads model_alias exactly like today.
Plugin authors with multi-model configs get the same endpoint-level health check as the primary alias, with no manual _validate() workaround.
Plugin configs without a model_alias field can override get_model_aliases() and stop crashing the health-check loop.
The isinstance(config, CustomColumnConfig) branch in _run_model_health_check_if_needed is removed.
The plugin docs (docs/plugins/models.md) update to recommend overriding get_model_aliases() instead of validating manually in _validate().
Describe alternatives you've considered
Pydantic field annotation — mark fields with Annotated[str, ModelAlias()] and have the builder walk model_fields. More declarative, but more machinery and a one-way door. Reasonable to revisit if a second consumer (fingerprinting, secret resolution, redaction) wants to enumerate aliases.
Classmethod on the generator impl — e.g. ColumnGeneratorWithModelRegistry.get_required_model_aliases(config). Indirected: alias values still come from config, so the impl ends up reading config fields anyway. Useful if dynamic alias selection (e.g. "only include judge_model_alias if enable_critic=True") becomes a real requirement.
Convention-based scan — auto-collect any field ending in _alias. Conflicts with tool_alias (MCP), surprises plugin authors who name fields differently, and fails the explicit-over-implicit smell test. Skipping.
Status quo: validate manually in _validate() — the current docs recommendation. Works for "is the alias registered?" but does not exercise the endpoint, so a bad credential on a secondary model surfaces only at generation time.
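To make the trade-off between the annotation-based and convention-based alternatives concrete, the sketch below enumerates aliases both ways on the same hypothetical config. It uses a plain dataclass with typing.get_type_hints(include_extras=True) rather than Pydantic's model_fields, and ModelAlias, JudgedToolConfig, aliases_from_annotations, and aliases_from_suffix are illustrative names, not existing APIs.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


class ModelAlias:
    """Hypothetical marker meaning "this field holds a model alias"."""


@dataclass
class JudgedToolConfig:
    model_alias: Annotated[str, ModelAlias()]
    judge_model_alias: Annotated[str, ModelAlias()]
    tool_alias: str  # an MCP tool alias, not a model


def aliases_from_annotations(config: object) -> list[str]:
    """Collect values of fields whose annotation carries a ModelAlias marker."""
    hints = get_type_hints(type(config), include_extras=True)
    found = []
    for name, hint in hints.items():
        # Annotated[T, meta...] exposes its metadata after the first argument.
        if get_origin(hint) is Annotated and any(
            isinstance(meta, ModelAlias) for meta in get_args(hint)[1:]
        ):
            found.append(getattr(config, name))
    return found


def aliases_from_suffix(config: object) -> list[str]:
    """Convention-based scan: collect every field whose name ends in `_alias`."""
    return [value for name, value in vars(config).items() if name.endswith("_alias")]


config = JudgedToolConfig(model_alias="gen", judge_model_alias="judge", tool_alias="search")
print(aliases_from_annotations(config))  # ['gen', 'judge']
print(aliases_from_suffix(config))       # ['gen', 'judge', 'search']
```

The suffix scan picks up the tool alias as if it were a model, which is the conflict noted above. The marker approach avoids that but needs the extra annotation-walking machinery.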
Agent Investigation
Current findings from the codebase:
Health check site: _run_model_health_check_if_needed in packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py:749 runs in build() (line 208) and build_preview() (line 245), before _initialize_generators_and_graph(). The health check therefore cannot rely on generator instances; alias enumeration has to live on the config.
Plugin discovery: column_type_is_model_generated in packages/data-designer-engine/src/data_designer/engine/column_generators/utils/generator_classification.py:32 flags any plugin whose impl_cls inherits from ColumnGeneratorWithModelRegistry as model-generated. This is the gate that turns on the config.model_alias access.
Existing parallel: CustomColumnConfig.model_aliases: list[str] | None (packages/data-designer-config/src/data_designer/config/custom_column.py:38) is already collected into the same health-check set via an isinstance check, which the proposed accessor cleanly subsumes.
Plugin docs context: surfaced in #603, specifically docs/plugins/models.md lines 83–89 and 155–159.
Additional context
This is a small, additive API change: one new method on SingleColumnConfig, one updated method on CustomColumnConfig, and a one-line change in the dataset builder.
It does not change any existing CLI, builder, or plugin entry-point surface.
It improves the docs story for the multi-model plugin pattern that #603 (docs: graduate plugins out of experimental mode) introduces: the models.md guidance can switch from "manually validate aliases in _validate()" to "override get_model_aliases() and the health check pings every alias".
Checklist
I've reviewed existing issues and the documentation
This is a design proposal, not a "please build this" request
Priority Level
Medium (Nice to have)