display_sample_record() silently omits plugin-generated columns #345

@3mei

Priority Level

High (Major functionality broken)

Describe the bug

Affected code: data_designer/config/utils/visualization.py, lines 200–206

Description:
The display_sample_record function builds the "Generated Columns" table by querying a hardcoded list of built-in column types:

non_code_columns = (
    config_builder.get_columns_of_type(DataDesignerColumnType.SAMPLER)
    + config_builder.get_columns_of_type(DataDesignerColumnType.EXPRESSION)
    + config_builder.get_columns_of_type(DataDesignerColumnType.LLM_TEXT)
    + config_builder.get_columns_of_type(DataDesignerColumnType.LLM_STRUCTURED)
    + config_builder.get_columns_of_type(DataDesignerColumnType.EMBEDDING)
    + config_builder.get_columns_of_type(DataDesignerColumnType.CUSTOM)
)

Plugin column types (registered via PluginManager / entry points) are not included in this list. As a result, any column generated by a plugin is silently skipped in the rich display output, even though the data is present in the underlying DataFrame.
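The omission can be illustrated with a small standalone sketch. Everything in it is a stand-in for illustration (`FakeColumn`, `FakeConfigBuilder`, the string type names, and the `my-plugin` type are hypothetical and are not the real data_designer classes); it only mirrors the pattern of querying a fixed list of types:

```python
from dataclasses import dataclass, field

# Stand-ins for illustration only; these are NOT the real data_designer types.
BUILT_IN_TYPES = ["sampler", "expression", "llm-text", "llm-structured", "embedding", "custom"]


@dataclass
class FakeColumn:
    name: str
    column_type: str


@dataclass
class FakeConfigBuilder:
    columns: list[FakeColumn] = field(default_factory=list)

    def get_columns_of_type(self, column_type: str) -> list[FakeColumn]:
        # Same idea as the real query: return only columns of the requested type.
        return [c for c in self.columns if c.column_type == column_type]


def columns_shown(builder: FakeConfigBuilder, types: list[str]) -> list[str]:
    """Names of the columns a display routine renders when it only queries `types`."""
    shown: list[FakeColumn] = []
    for column_type in types:
        shown.extend(builder.get_columns_of_type(column_type))
    return [c.name for c in shown]


builder = FakeConfigBuilder(
    columns=[
        FakeColumn("age", "sampler"),
        FakeColumn("bio", "llm-text"),
        FakeColumn("sentiment", "my-plugin"),  # hypothetical plugin column type
    ]
)

# Querying only the fixed list never reaches the plugin column.
hardcoded = columns_shown(builder, BUILT_IN_TYPES)
# Appending the plugin type to the query makes the column visible again.
with_plugins = columns_shown(builder, BUILT_IN_TYPES + ["my-plugin"])
```

In this sketch no error is raised for the unqueried type, which matches the "silent" nature of the bug: the column is simply never asked for.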

Inconsistency: The codebase already accounts for plugin column types elsewhere. In data_designer/config/column_types.py, get_column_display_order() correctly appends plugin types:

def get_column_display_order() -> list[DataDesignerColumnType]:
    display_order = [
        DataDesignerColumnType.SEED_DATASET,
        DataDesignerColumnType.SAMPLER,
        # ... built-in types ...
        DataDesignerColumnType.CUSTOM,
    ]
    display_order.extend(plugin_manager.get_plugin_column_types(DataDesignerColumnType))
    return display_order

But display_sample_record in visualization.py does not use get_column_display_order() — it maintains its own separate, incomplete list.

Suggested fix: Replace the hardcoded list in display_sample_record with a call to get_column_display_order() (filtering out SEED_DATASET), or at minimum append plugin_manager.get_plugin_column_types(DataDesignerColumnType) to the non_code_columns query. Something like:

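from data_designer.config.column_types import DataDesignerColumnType, get_column_display_order

# get_column_display_order() already includes plugin types, so filtering by
# exclusion picks them up; the excluded types are absent from the current hardcoded list.
display_types = [
    t for t in get_column_display_order()
    if t not in (DataDesignerColumnType.SEED_DATASET,
                 DataDesignerColumnType.LLM_CODE,
                 DataDesignerColumnType.VALIDATION,
                 DataDesignerColumnType.LLM_JUDGE)
]
# Collect the columns for every remaining type, in display order.
non_code_columns = []
for col_type in display_types:
    non_code_columns.extend(config_builder.get_columns_of_type(col_type))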

Steps/Code to reproduce bug

Register any plugin column type via entry points, add it to a config builder, run preview(), and call preview_result.display_sample_record(). The plugin column will be absent from the rich output despite being present in preview_result.dataset.
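The symptom described in these steps could be checked with a small helper along the following lines. This is only a sketch: `missing_from_display` is a hypothetical name, the sample values are made up, and how the displayed column names would be collected from the rich output is left open.

```python
def missing_from_display(dataset_columns, displayed_columns):
    """Return dataset columns absent from the display, in dataset order."""
    displayed = set(displayed_columns)
    return [name for name in dataset_columns if name not in displayed]


# Hypothetical values. In a real check, dataset_columns might come from
# list(preview_result.dataset.columns), assuming the dataset is a pandas DataFrame.
dataset_columns = ["age", "bio", "sentiment"]
displayed_columns = ["age", "bio"]  # names gathered from the rendered tables

omitted = missing_from_display(dataset_columns, displayed_columns)
print(omitted)
```

A regression test built on this idea would need to count columns rendered in every table of the output, not only "Generated Columns", so that code, validation, and judge columns are not reported as missing.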

Expected behavior

Plugin-generated columns should appear in the output of display_sample_record(), alongside the built-in column types.

Labels: bug (Something isn't working)
