@mustafasrepo
Contributor

@mustafasrepo mustafasrepo commented May 25, 2023

Which issue does this PR close?

Closes #6444.

Rationale for this change

What changes are included in this PR?

This PR adds support for the ordering-sensitive aggregate functions FIRST_VALUE and LAST_VALUE. With the changes in this PR, we can now run the query below:

SELECT a, FIRST_VALUE(c ORDER BY b DESC) as first_c
FROM table
GROUP BY a

This query will return the first c value, according to the reverse b order, for every a group.
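
To illustrate the intended semantics, here is a minimal Rust sketch that is independent of DataFusion's actual implementation. The `Row` struct and the `first_value_by_b_desc` function are hypothetical names made up for this illustration; the integer column types and the tie-breaking rule (the first row seen wins when two rows share the same b) are assumptions as well.

```rust
use std::collections::BTreeMap;

/// One input row with the three columns used by the example query.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy)]
struct Row {
    a: i32,
    b: i32,
    c: i32,
}

/// For each group `a`, return the `c` of the row that would come first
/// if the group were sorted by `b DESC`, i.e. the row with the largest `b`.
fn first_value_by_b_desc(rows: &[Row]) -> BTreeMap<i32, i32> {
    // Maps group key -> (largest b seen so far, c taken from that row).
    let mut best: BTreeMap<i32, (i32, i32)> = BTreeMap::new();
    for row in rows {
        best.entry(row.a)
            .and_modify(|(b, c)| {
                // Strict comparison: on ties the earlier row is kept (assumption).
                if row.b > *b {
                    *b = row.b;
                    *c = row.c;
                }
            })
            .or_insert((row.b, row.c));
    }
    // Drop the helper `b` column and keep only group -> first c.
    best.into_iter().map(|(a, (_, c))| (a, c)).collect()
}
```

Calling `first_value_by_b_desc` on rows `(a=1, b=10, c=100)` and `(a=1, b=20, c=200)` picks c = 200 for group 1, because b = 20 sorts first in descending order.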

Are these changes tested?

Yes, new tests for the new aggregate functions are added to the groupby.slt file.

Are there any user-facing changes?

mustafasrepo and others added 30 commits May 3, 2023 14:46
@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels May 25, 2023
@ozankabak
Contributor

ozankabak commented May 25, 2023

Note that this change makes the FIRST_VALUE and LAST_VALUE functions, which were previously only available as window functions, available in an aggregation context. With this change, one can choose first/last values within a group according to a given ordering. In aggregation contexts, these functions are sometimes called FIRST and LAST in other query engines/DBs, but we chose to use the same names as their window function counterparts (to be consistent with the case of SUM, COUNT, etc.).

Having said that, there are a few minor issues here that we will fix shortly. I think the example in the PR body should read something like:

SELECT a, FIRST_VALUE(c ORDER BY b DESC) as first_c
FROM table
GROUP BY a

This will return the first c value, according to the reverse b order, for every a group.

Also, some tests are failing. We will fix the tests, make the PR body clearer, and possibly add another test that looks like this.

@alamb
Contributor

alamb commented May 25, 2023

Marking as draft while the PR is worked on

@alamb alamb marked this pull request as draft May 25, 2023 20:03
@mustafasrepo mustafasrepo marked this pull request as ready for review May 26, 2023 07:22
"+----------------------+",
"| 10 |",
"+----------------------+",
"+-----------------------+",
Contributor Author

@mustafasrepo mustafasrepo May 26, 2023

As part of this PR, I have added the convert_camel_to_upper_snake util for displaying aggregate functions. Their display names are now calculated automatically from the struct name. Hence, the display names of some of the existing aggregators have changed (for example, APPROXMEDIAN became APPROX_MEDIAN, ARRAYAGG became ARRAY_AGG, etc.).
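
For readers unfamiliar with this kind of conversion, here is a minimal sketch of what it might look like. Only the function name comes from this PR; the body below is an assumption written for illustration and is not the code that was merged. It also assumes struct names such as `ApproxMedian` and `ArrayAgg`, with one capital letter per word.

```rust
/// Hypothetical sketch: convert a CamelCase struct name such as
/// `ApproxMedian` into an upper snake case display name such as
/// `APPROX_MEDIAN`. Not the implementation from the PR.
fn convert_camel_to_upper_snake(name: &str) -> String {
    let mut out = String::with_capacity(name.len() + 4);
    for (i, ch) in name.chars().enumerate() {
        // Start a new word at every uppercase letter except the first character.
        if ch.is_uppercase() && i > 0 {
            out.push('_');
        }
        // `to_uppercase` yields an iterator of chars, so extend the string with it.
        out.extend(ch.to_uppercase());
    }
    out
}
```

A scheme like this treats every capital letter as a word boundary, so a name containing an acronym (several capitals in a row) would be split letter by letter. That is one reason a display name decoupled from the struct name can be attractive.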

@mustafasrepo
Contributor Author

Marking as draft while the PR is worked on

I have resolved the CI issue and updated the PR body to better reflect the intent. This PR is ready for further review.

Contributor

@alamb alamb left a comment

Thank you @mustafasrepo -- I think the feature looks great and really nicely coded 👌 .

I think a different approach for the display name, one that is not so tightly bound to the Rust struct name, would be better, but that is a stylistic opinion and something we can fix in a follow-on PR. I left some more specific suggestions inline.

"+-------------------------------------------+",
"| 10 |",
"+-------------------------------------------+",
"+---------------------------------------------+",
Contributor

I think this is nice -- it makes the aggregates consistent with the (very) recent change from @2010YOUY01 for scalar functions: #6448

@alamb alamb merged commit f54f514 into apache:main May 26, 2023
@alamb
Contributor

alamb commented May 26, 2023

Thanks again @mustafasrepo !

BoolOr,
}

impl AggregateFunction {
Contributor

👍

@mustafasrepo mustafasrepo deleted the feature/first_last_aggregate2 branch May 29, 2023 07:47
Development

Successfully merging this pull request may close these issues.

Add support for FIRST_VALUE, LAST_VALUE aggregators

4 participants