Unused unnested column is not pruned #20118

@simonvandel

Describe the bug

When a column produced by an UnnestExec is not used further up the plan, it is not pruned.

To Reproduce

Run the following query in datafusion-cli:

CREATE TABLE test_table (
    id INT,
    value INT
) AS VALUES (1, 100), (2, 200), (3, 300);

EXPLAIN SELECT id
FROM (
    SELECT id, UNNEST(make_array(1, 2, 3)) as elem
    FROM test_table
)
GROUP BY id;

This currently generates the following plan:

+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │        group_by: id       │ |
|               | │                           │ |
|               | │           mode:           │ |
|               | │      FinalPartitioned     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │ |
|               | │    --------------------   │ |
|               | │ partition_count(in->out): │ |
|               | │          14 -> 14         │ |
|               | │                           │ |
|               | │    partitioning_scheme:   │ |
|               | │      Hash([id@0], 14)     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       AggregateExec       │ |
|               | │    --------------------   │ |
|               | │        group_by: id       │ |
|               | │       mode: Partial       │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │ |
|               | │    --------------------   │ |
|               | │ partition_count(in->out): │ |
|               | │          1 -> 14          │ |
|               | │                           │ |
|               | │    partitioning_scheme:   │ |
|               | │    RoundRobinBatch(14)    │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │           id: id          │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │         UnnestExec        │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │    __unnest_placeholder   │ |
|               | │    (make_array(Int64(1    │ |
|               | │   ),Int64(2),Int64(3))):  │ |
|               | │         [1, 2, 3]         │ |
|               | │                           │ |
|               | │           id: id          │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 224        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+

Expected behavior

I would expect the UnnestExec to be removed entirely, since its output is not used further up the plan.
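For comparison, here is a sketch of a query that should return the same rows without any unnest. This equivalence is an assumption that holds for this particular reproduction only: the array literal is non-empty, so the UNNEST just repeats each row three times, and GROUP BY id collapses those duplicates again. In general, removing an unnest changes the row count, so this is not a claim about how the optimizer should handle every case.

```sql
-- Reuses test_table from the reproduction above.
-- Assumption: the unused `elem` column only multiplies rows here,
-- and GROUP BY id removes the duplicates, so the results match.
EXPLAIN SELECT id
FROM test_table
GROUP BY id;
```

Comparing the plan of this query with the one above shows which operators exist only to serve the unused unnested column.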
