
Implementing Filter Clause for aggregates #1309

Merged — 11 commits merged into duckdb:master on Jan 26, 2021

Conversation

@pdet (Member) commented Jan 20, 2021

This PR provides an implementation of the filter clause for aggregates (#896). The binding process is potentially over-complicated, so please check that part.

@Mytherin (Collaborator) left a comment

Thanks for the PR! It looks good, except indeed I think the binding process is a bit overcomplicated. I have added some suggestions on how to simplify it.

Resolved review threads:

  • src/execution/aggregate_hashtable.cpp
  • src/execution/perfect_aggregate_hashtable.cpp (outdated)
  • src/execution/physical_plan/plan_aggregate.cpp (outdated)
  • src/optimizer/remove_unused_columns.cpp (outdated)
@@ -0,0 +1,273 @@
# name: test/sql/filter/test_filter_clause.test
# description: Test aggregation with filter clause
Review comment (Collaborator):

I would like some more test cases:

  • Query with many different filter clauses (e.g. 5 aggregates, 5 different filters)
  • Filter with some more complex aggregates: COVAR_POP (multiple input columns), STRING_AGG (strings) and ARRAY_AGG (lists)
  • DISTINCT aggregates

Also, looking at these tests I would not be surprised if all of them use the perfect hash aggregate. You can force the regular hash aggregate to be used by using very spaced-out groups (e.g. [0, 10000000, 20000000, ...]).
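The semantics the requested tests would pin down can be sketched outside SQL as well. Below is a hedged, illustrative Python simulation (not DuckDB test code) of what `SUM(v) FILTER (WHERE pred)` computes per group: only qualifying rows are summed, and a group with no qualifying rows yields SQL NULL (modeled here as `None`). The spaced-out group keys mimic the values suggested for forcing the regular hash aggregate.

```python
# Illustrative simulation of per-group SUM(v) FILTER (WHERE pred(v)).
# Names and data here are invented for the example, not from the PR.

def filtered_sum(rows, pred):
    """Per-group filtered sum; a group with no qualifying rows stays None (SQL NULL)."""
    sums = {g: None for g, _ in rows}  # every group is present in the result
    for g, v in rows:
        if pred(v):
            sums[g] = v if sums[g] is None else sums[g] + v
    return sums

# groups spaced far apart, values 1..5 in each group
rows = [(g, v) for g in (0, 10_000_000, 20_000_000) for v in range(1, 6)]

# several different filters over the same input, like the requested test case
even_sums = filtered_sum(rows, lambda v: v % 2 == 0)  # 2 + 4 = 6 per group
odd_sums = filtered_sum(rows, lambda v: v % 2 == 1)   # 1 + 3 + 5 = 9 per group
no_match = filtered_sum(rows, lambda v: v > 100)      # None (NULL) per group
```

The point of the `no_match` case is that FILTER does not drop groups: every group still appears in the output, with a NULL aggregate value.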

@Mytherin (Collaborator) commented:
Thanks, this looks great now!

@Mytherin Mytherin merged commit dd73011 into duckdb:master Jan 26, 2021
@pdet pdet deleted the filterclause branch March 10, 2021 18:34
@hannes (Member) commented May 6, 2021

So this ht that maps the filter's bound_ref_expr.index around in the physical aggregates is blowing up horribly with many threads and small vectors. Another reason to default to multithreading.

auto &bound_ref_expr = (BoundReferenceExpression &)*aggr.filter;
// hash-table lookup of the filter's bound column, repeated per input chunk
auto it = ht.find(aggr.filter.get());
if (it == ht.end()) {
	aggregate_input_chunk.data[aggregate_input_idx].Reference(input.data[bound_ref_expr.index]);
}
Review comment (Member):
This thing

@hannes (Member) commented May 6, 2021

Will fix; just thought I'd note it for future reference.
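The pattern hannes flags can be sketched abstractly: looking up the filter's bound column through a hash table on every input chunk, versus binding the column index once before processing. With many threads and small vectors the per-chunk lookup is pure repeated overhead. The sketch below uses invented names and plain Python containers, not DuckDB's actual API.

```python
# Illustrative sketch: per-chunk hash lookup vs. index bound once up front.
# "ht" maps a filter expression to its column index, as in the quoted snippet.

def process_chunk_via_ht(chunk, aggregates, ht):
    # the flagged pattern: the hash lookup repeats for every chunk
    return [chunk[ht[aggr["filter"]]] for aggr in aggregates]

def bind_filter_indexes(aggregates, ht):
    # resolve each filter's column index once, at plan/bind time
    return [ht[aggr["filter"]] for aggr in aggregates]

def process_chunk_bound(chunk, bound_indexes):
    # chunk processing becomes a plain index, no hashing
    return [chunk[i] for i in bound_indexes]

aggregates = [{"filter": "f0"}, {"filter": "f1"}]
ht = {"f0": 0, "f1": 2}
chunk = ["col0", "col1", "col2"]

bound = bind_filter_indexes(aggregates, ht)
assert process_chunk_via_ht(chunk, aggregates, ht) == process_chunk_bound(chunk, bound)
```

Both paths produce the same columns; the difference is only where the lookup cost is paid, which is what makes the per-chunk version degrade with many small chunks.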
