Skip to content

Unused Window functions experssion is wrongly removed from LogicalPlan during optimalization #2542

@mateuszkj

Description

@mateuszkj

Describe the bug
Unused window function is wrongly removed from Logical Plan during optimization. It leaves WindowAgg empty, which causes Impossibly got empty window expression error.

To Reproduce

RUST_LOG=debug datafusion-cli

WITH _sample_data AS (
  SELECT 1 as a, 'aa' AS b
  UNION ALL
  SELECT 3 as a, 'aa' AS b
  UNION ALL
  SELECT 5 as a, 'bb' AS b
  UNION ALL
  SELECT 7 as a, 'bb' AS b
  
), _data2 AS (
  SELECT
    row_number() OVER (PARTITION BY s.b ORDER BY s.a) AS seq,
    s.a,
    s.b
  FROM _sample_data s
)
SELECT d.b, MAX(d.a) AS max_a
FROM _data2 d
GROUP BY d.b
ORDER BY d.b;

-- Result
-- Error: Impossibly got empty window expression

Expected behavior

With above SQL i should got:

-- Result
| b  | max_a |
+----+-------+
| aa | 3     | 
| bb | 7     | 
+----+-------+

Additional context

Input logical plan:

    Sort: #d.b ASC NULLS LAST
      Projection: #d.b, #MAX(d.a) AS max_a
        Aggregate: groupBy=[[#d.b]], aggr=[[MAX(#d.a)]]
          Projection: #_data2.seq, #_data2.a, #_data2.b, alias=d
            Projection: #ROW_NUMBER() PARTITION BY [#s.b] ORDER BY [#s.a ASC NULLS LAST] AS seq, #s.a, #s.b, alias=_data2
              WindowAggr: windowExpr=[[ROW_NUMBER() PARTITION BY [#s.b] ORDER BY [#s.a ASC NULLS LAST]]]
                Projection: #_sample_data.a, #_sample_data.b, alias=s
                  Union
                    Projection: Int64(1) AS a, Utf8("aa") AS b
                      EmptyRelation
                    Projection: Int64(3) AS a, Utf8("aa") AS b
                      EmptyRelation
                    Projection: Int64(5) AS a, Utf8("bb") AS b
                      EmptyRelation
                    Projection: Int64(7) AS a, Utf8("bb") AS b
                      EmptyRelation

Optimized logical plan:

    Sort: #d.b ASC NULLS LAST
      Projection: #d.b, #MAX(d.a) AS max_a
        Aggregate: groupBy=[[#d.b]], aggr=[[MAX(#d.a)]]
          Projection: #_data2.a, #_data2.b, alias=d
            Projection: #s.a, #s.b, alias=_data2
              WindowAggr: windowExpr=[[]]  <-------- Here is empty expression
                Projection: #_sample_data.a, #_sample_data.b, alias=s
                  Union
                    Projection: Int64(1) AS a, Utf8("aa") AS b
                      EmptyRelation
                    Projection: Int64(3) AS a, Utf8("aa") AS b
                      EmptyRelation
                    Projection: Int64(5) AS a, Utf8("bb") AS b
                      EmptyRelation
                    Projection: Int64(7) AS a, Utf8("bb") AS b
                      EmptyRelation

We can see windowExpr is empty after optimization. I have tested this case on master ('a825891').

I don't use seq column in last CTE and this causes error. WIth below SQL it's work

WITH _sample_data AS (
  SELECT 1 as a, 'aa' AS b
  UNION ALL
  SELECT 3 as a, 'aa' AS b
  UNION ALL
  SELECT 5 as a, 'bb' AS b
  UNION ALL
  SELECT 7 as a, 'bb' AS b
  
), _data2 AS (
  SELECT
    row_number() OVER (PARTITION BY s.b ORDER BY s.a) AS seq,
    s.a,
    s.b
  FROM _sample_data s
)
SELECT d.b, MAX(d.a) AS max_a, max(d.seq)
FROM _data2 d
GROUP BY d.b
ORDER BY d.b;

-- Resut:
| b  | max_a | MAX(d.seq) |
+----+-------+------------+
| aa | 3     | 2          |
| bb | 7     | 2          |
+----+-------+------------+

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions