first_value doesn't work when applied to window function output #1300

@ntjohnson1

Describe the bug
If I generate a column with a window function and then apply first_value (or a filtered last_value) to that column, collect() fails with a NotImplemented error.

To Reproduce
Steps to reproduce the behavior:

import datafusion as dfn
from datafusion import lit, col, functions as F
from datafusion.expr import Window, WindowFrame

def main() -> None:
    ctx = dfn.SessionContext()
    df = ctx.from_pydict(
        {"any_row": list(range(10))},
    )
    df = df.select(
        "any_row",
        lit(1).alias("ones"),
    )
    df = df.select(
        "any_row",
        F.sum(col("ones"))\
            .over(Window(window_frame=WindowFrame("rows", None, 0), order_by=col("any_row").sort(ascending=True))) \
            .alias("forward_row_sum"),
        F.sum(col("ones"))\
            .over(Window(window_frame=WindowFrame("rows", None, 0), order_by=col("any_row").sort(ascending=False))) \
            .alias("reverse_row_sum"),
    )
    df.collect()
    df.select(
        F.first_value(col("forward_row_sum"), order_by=col("any_row"))
    ).collect()

    df.select(
        F.last_value(col("reverse_row_sum"), filter=col("reverse_row_sum") >= 5, order_by=col("any_row").sort(ascending=True))
    ).collect()

if __name__ == "__main__":
    main()

Traceback (most recent call last):
  File "/Users/nick/repos/bug.py", line 39, in <module>
    main()
    ~~~~^^
  File "/Users/nick/repos/bug.py", line 26, in main
    ).collect()
      ~~~~~~~^^
  File "/Users/nick/repos/.venv/lib/python3.13/site-packages/datafusion/dataframe.py", line 681, in collect
    return self.df.collect()
           ~~~~~~~~~~~~~~~^^
Exception: DataFusion error: NotImplemented("Physical plan does not support logical expression AggregateFunction(AggregateFunction { func: AggregateUDF { inner: FirstValue { name: \"first_value\", signature: Signature { type_signature: Any(1), volatility: Immutable }, accumulator: \"<FUNC>\" } }, params: AggregateFunctionParams { args: [Column(Column { relation: None, name: \"sum(ones) ORDER BY [c19e557aec20e49b985bb070e969ba68f.any_row ASC NULLS FIRST] ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING\" })], distinct: false, filter: None, order_by: [Sort { expr: Column(Column { relation: Some(Bare { table: \"c19e557aec20e49b985bb070e969ba68f\" }), name: \"any_row\" }), asc: true, nulls_first: true }], null_treatment: Some(RespectNulls) } })")

Expected behavior
I expected to get the first (or last) value of the window-function column.
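
To make the expected values concrete, here is a plain-Python sketch of what the two failing selects should return for the data in the reproduction. It does not use the datafusion API at all. It assumes that WindowFrame("rows", None, 0) means a running sum from the start of the ordering up to the current row, which matches the "ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING" text in the error message.

```python
from itertools import accumulate

# Same input as the reproduction: any_row = 0..9, ones = 1 for every row.
any_row = list(range(10))
ones = [1] * len(any_row)

# Assumed frame semantics: running sum from the first row up to the
# current row, with rows ordered by any_row ascending.
forward_row_sum = dict(zip(any_row, accumulate(ones)))

# Same running sum, but with rows ordered by any_row descending.
descending = sorted(any_row, reverse=True)
reverse_row_sum = dict(zip(descending, accumulate(ones)))

# first_value(forward_row_sum), ordered by any_row ascending.
first_forward = forward_row_sum[min(any_row)]

# last_value(reverse_row_sum) with filter reverse_row_sum >= 5,
# ordered by any_row ascending.
kept = [r for r in sorted(any_row) if reverse_row_sum[r] >= 5]
last_reverse = reverse_row_sum[kept[-1]]

print(first_forward, last_reverse)  # prints: 1 5
```

Under that assumption, the first select should yield 1 and the second should yield 5.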

Additional context

>>> import datafusion as dfn
>>> dfn.__version__
'50.1.0'

Labels: bug