Description
Priority Level
High (Major functionality broken)
Describe the bug
Affected code: data_designer/config/utils/visualization.py, lines 200–206
Description:
The display_sample_record function builds the "Generated Columns" table by querying a hardcoded list of built-in column types:
    non_code_columns = (
        config_builder.get_columns_of_type(DataDesignerColumnType.SAMPLER)
        + config_builder.get_columns_of_type(DataDesignerColumnType.EXPRESSION)
        + config_builder.get_columns_of_type(DataDesignerColumnType.LLM_TEXT)
        + config_builder.get_columns_of_type(DataDesignerColumnType.LLM_STRUCTURED)
        + config_builder.get_columns_of_type(DataDesignerColumnType.EMBEDDING)
        + config_builder.get_columns_of_type(DataDesignerColumnType.CUSTOM)
    )
Plugin column types (registered via PluginManager / entry points) are not included in this list. As a result, any column generated by a plugin is silently skipped in the rich display output, even though the data is present in the underlying DataFrame.
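The failure mode can be shown in isolation. The sketch below does not use the real data_designer classes: `Column`, `FakeBuilder`, the string type names, and the `"my-plugin"` type are hypothetical stand-ins for the config builder and `DataDesignerColumnType`. It only illustrates how a hardcoded allow-list drops a column whose type was registered later.

```python
from dataclasses import dataclass

# Hypothetical stand-ins: plain strings instead of DataDesignerColumnType members.
BUILT_IN_TYPES = ["sampler", "expression", "llm-text", "llm-structured", "embedding", "custom"]


@dataclass
class Column:
    name: str
    column_type: str


class FakeBuilder:
    """Toy replacement for the config builder; it only mimics get_columns_of_type."""

    def __init__(self, columns):
        self._columns = list(columns)

    def get_columns_of_type(self, column_type):
        return [c for c in self._columns if c.column_type == column_type]


builder = FakeBuilder([
    Column("age", "sampler"),
    Column("summary", "llm-text"),
    Column("score", "my-plugin"),  # type contributed by a (hypothetical) plugin
])

# Same shape as the hardcoded query: only the listed built-in types are asked for.
displayed = []
for column_type in BUILT_IN_TYPES:
    displayed.extend(builder.get_columns_of_type(column_type))

displayed_names = [c.name for c in displayed]
print(displayed_names)  # the "score" column is never queried, so it is missing
```

Nothing raises and nothing is logged, which matches the "silently skipped" behavior described above.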
Inconsistency: The codebase already accounts for plugin column types elsewhere. In data_designer/config/column_types.py, get_column_display_order() correctly appends plugin types:
    def get_column_display_order() -> list[DataDesignerColumnType]:
        display_order = [
            DataDesignerColumnType.SEED_DATASET,
            DataDesignerColumnType.SAMPLER,
            # ... built-in types ...
            DataDesignerColumnType.CUSTOM,
        ]
        display_order.extend(plugin_manager.get_plugin_column_types(DataDesignerColumnType))
        return display_order
But display_sample_record in visualization.py does not use get_column_display_order() — it maintains its own separate, incomplete list.
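Suggested fix: Replace the hardcoded list in display_sample_record with a call to get_column_display_order(), filtering out SEED_DATASET and the other types that do not belong in this table (LLM_CODE, VALIDATION, and LLM_JUDGE in the snippet below). At minimum, extend the non_code_columns query with the columns of each type returned by plugin_manager.get_plugin_column_types(DataDesignerColumnType). Something like: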
    from data_designer.config.column_types import get_column_display_order, DataDesignerColumnType

    display_types = [
        t for t in get_column_display_order()
        if t not in (
            DataDesignerColumnType.SEED_DATASET,
            DataDesignerColumnType.LLM_CODE,
            DataDesignerColumnType.VALIDATION,
            DataDesignerColumnType.LLM_JUDGE,
        )
    ]
    non_code_columns = []
    for col_type in display_types:
        non_code_columns.extend(config_builder.get_columns_of_type(col_type))
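One way to make the fix easy to cover with a unit test is to move the collection logic into a small helper that receives the display order and the exclusions as arguments. This is only a sketch: the helper name `collect_display_columns`, the `_StubBuilder` class, and the string type names are assumptions for illustration and do not exist in the codebase. In the real module the arguments would presumably come from `get_column_display_order()` and `DataDesignerColumnType`.

```python
def collect_display_columns(builder, display_order, excluded):
    """Gather columns in display order, skipping excluded column types.

    `builder` only needs a get_columns_of_type(column_type) method, so a
    stub can stand in for the real config builder in tests.
    """
    excluded_set = set(excluded)
    columns = []
    for column_type in display_order:
        if column_type in excluded_set:
            continue
        columns.extend(builder.get_columns_of_type(column_type))
    return columns


class _StubBuilder:
    """Hypothetical test double keyed by column type."""

    def __init__(self, columns_by_type):
        self._columns_by_type = columns_by_type

    def get_columns_of_type(self, column_type):
        return list(self._columns_by_type.get(column_type, []))


stub = _StubBuilder({
    "seed-dataset": ["seed_col"],
    "sampler": ["age"],
    "llm-code": ["snippet"],
    "my-plugin": ["score"],  # stands in for a plugin-registered type
})
# Plugin types are appended after the built-ins, as get_column_display_order() does.
order = ["seed-dataset", "sampler", "llm-code", "custom", "my-plugin"]

result = collect_display_columns(stub, order, excluded=["seed-dataset", "llm-code"])
print(result)
```

Because the plugin type is part of the order passed in, its column is collected without the helper knowing anything about plugins.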
Steps/Code to reproduce bug
Register any plugin column type via entry points, add it to a config builder, run preview(), and call preview_result.display_sample_record(). The plugin column will be absent from the rich output despite being present in preview_result.dataset.
Expected behavior
Plugin-generated columns should appear in the "Generated Columns" table of the sample record display.