Aliased Window Expr enters unreachable code #14108

@berkaysynnada

Describe the bug

There is an unreachable!() call in the PushDownFilter rule, in the branch that handles LogicalPlan::Window.

The following query reaches it:

    pub(crate) const QUERY2: &str = "
    -- Define the subquery
    WITH SubQuery AS (
        SELECT
            a.column1,
            a.column2 AS ts_column,
            a.column3,
            SUM(a.column3) OVER (
                PARTITION BY a.column1
                ORDER BY a.column2 RANGE BETWEEN INTERVAL '10 minutes' PRECEDING AND CURRENT ROW
            ) AS moving_sum
        FROM source_table a
    )
    SELECT
        column1,
        ts_column,
        moving_sum
    FROM SubQuery
    WHERE moving_sum > 100
    ";

To Reproduce

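    // Imports assume the test is built against the `datafusion` and `tokio` crates.
    use std::sync::Arc;

    use datafusion::arrow::array::{Float64Array, StringArray, TimestampNanosecondArray};
    use datafusion::arrow::datatypes::{DataType, Field, Schema, TimeUnit};
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::error::Result;
    use datafusion::prelude::SessionContext;

    #[tokio::test]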
    async fn test_window_alias_in_subquery() -> Result<()> {
        let schema = Arc::new(Schema::new(vec![
            Field::new("column1", DataType::Utf8, false),
            Field::new(
                "column2",
                DataType::Timestamp(TimeUnit::Nanosecond, None),
                false,
            ),
            Field::new("column3", DataType::Float64, false),
        ]));

        let column1 = Arc::new(StringArray::from(vec!["item1", "item2", "item1"]));
        let column2 = Arc::new(TimestampNanosecondArray::from(vec![
            1_000_000_000,
            2_000_000_000,
            3_000_000_000,
        ]));
        let column3 = Arc::new(Float64Array::from(vec![50.0, 30.0, 25.0]));
        let source_table =
            RecordBatch::try_new(schema.clone(), vec![column1, column2, column3])?;

        let ctx = SessionContext::new();
        ctx.register_batch("source_table", source_table)?;

        let df = ctx.sql(QUERY2).await?;
        let results = df.collect().await?;

        for batch in results {
            println!("{:?}", batch);
        }

        Ok(())
    }

Expected behavior

The query should run without error and return the rows whose moving_sum exceeds 100.

Additional context

This query works successfully:

        SELECT
            a.column1,
            a.column2 AS ts_column,
            a.column3,
            SUM(a.column3) OVER (
                PARTITION BY a.column1
                ORDER BY a.column2 RANGE BETWEEN INTERVAL '10 minutes' PRECEDING AND CURRENT ROW
            ) AS moving_sum
        FROM source_table a

However, once it is wrapped in a subquery and the outer query filters on moving_sum, it fails.

Handling the aliased expression in the if let branch solves the problem, but I am not sure it is the correct fix.
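
To make the idea concrete, here is a minimal, self-contained sketch of "look through the alias before matching on the window expression". It uses a toy Expr enum, not DataFusion's real types; the names unalias and partition_keys are hypothetical helpers for illustration only and are not taken from the PushDownFilter source.

```rust
// Toy stand-in for a planner expression type; NOT DataFusion's real `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    WindowFunction { partition_by: Vec<Expr> },
    Alias(Box<Expr>, String),
}

/// Strip any number of nested aliases and return the underlying expression.
/// (Hypothetical helper for this sketch.)
fn unalias(expr: &Expr) -> &Expr {
    let mut current = expr;
    while let Expr::Alias(inner, _) = current {
        current = &**inner;
    }
    current
}

/// Return the partition keys of a window expression, looking through aliases.
/// Without the `unalias` call, an aliased window expression would fall into
/// the catch-all arm, which is the shape of the failure described above.
fn partition_keys(expr: &Expr) -> Option<&[Expr]> {
    match unalias(expr) {
        Expr::WindowFunction { partition_by } => Some(partition_by.as_slice()),
        _ => None,
    }
}
```

With this shape, a window expression wrapped as Alias(.., "moving_sum") yields the same partition keys as the bare window expression, so the caller never needs a separate unreachable arm for the aliased case. Whether the real rule should unwrap the alias at this point, or whether the alias should not be present in the Window node at all, is still the open question.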

Labels: bug