Skip to content

Add any_value aggregate function #22799

@haohuaijin

Description

@haohuaijin

Is your feature request related to a problem or challenge?

A common pattern is to GROUP BY a unique id and then get one arbitrary value of another column per group (for example, any tag value for each id). Which value comes back doesn't matter, so ordering is irrelevant.

DataFusion has no function for this today, and the workarounds are heavier than needed:

  • first_value(x) / last_value(x) can do this, but they do extra work any_value doesn't need.
  • min(x) / max(x) are often used instead, but they don't really mean "any value", and they still compare every row of every batch.

What's missing is a function whose state is just "one value, captured once", so that once a group has a value, later rows for that group cost nothing.

ANY_VALUE is also a standard function (SQL:2023) and is already supported by spark, duckdb, postgres, and others.

Describe the solution you'd like

Add a built-in any_value(expression) aggregate that returns an arbitrary value from each group, ignoring NULLs (it returns NULL only when every value in the group is NULL).

Describe alternatives you've considered

No response

Additional context

ANY_VALUE is a standard aggregate (SQL:2023) and is widely supported, all returning an arbitrary non-NULL value from the group:

NULL handling varies slightly across engines; this proposal follows the ignore-NULLs behavior shared by the above (a NULL is returned only when every value in the group is NULL). RESPECT NULLS | IGNORE NULLS could be added later if needed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions