
Implementing Filter Clause for aggregates #1309

Merged — 11 commits merged into duckdb:master on Jan 26, 2021

Conversation

@pdet (Member) commented Jan 20, 2021

This PR provides an implementation of the filter clause for aggregates (#896). The binding process is potentially over-complicated, so please check that part.

@Mytherin (Collaborator) left a comment

Thanks for the PR! It looks good, except indeed I think the binding process is a bit overcomplicated. I have added some suggestions on how to simplify it.

Resolved review threads:

  • src/execution/aggregate_hashtable.cpp
  • src/execution/perfect_aggregate_hashtable.cpp (outdated)
  • src/execution/physical_plan/plan_aggregate.cpp (outdated)
  • src/optimizer/remove_unused_columns.cpp (outdated)
@@ -0,0 +1,273 @@
# name: test/sql/filter/test_filter_clause.test
# description: Test aggregation with filter clause
Review comment (Collaborator):

I would like some more test cases:

  • Query with many different filter clauses (e.g. 5 aggregates, 5 different filters)
  • Filter with some more complex aggregates: COVAR_POP (multiple input columns), STRING_AGG (strings) and ARRAY_AGG (lists)
  • DISTINCT aggregates

Also, looking at these tests I would not be surprised if all of them use the perfect hash aggregate. You can force the regular hash aggregate to be used by using very spaced-out groups (e.g. [0, 10000000, 20000000, ...]).
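The semantics the requested tests would pin down can be sketched outside SQL as well. Below is a hedged, illustrative Python simulation (not DuckDB test code) of what `SUM(v) FILTER (WHERE pred)` computes per group: only qualifying rows are summed, and a group with no qualifying rows yields SQL NULL (modeled here as `None`). The spaced-out group keys mimic the values suggested for forcing the regular hash aggregate.

```python
# Illustrative simulation of per-group SUM(v) FILTER (WHERE pred(v)).
# Names and data here are invented for the example, not from the PR.

def filtered_sum(rows, pred):
    """Per-group filtered sum; a group with no qualifying rows stays None (SQL NULL)."""
    sums = {g: None for g, _ in rows}  # every group is present in the result
    for g, v in rows:
        if pred(v):
            sums[g] = v if sums[g] is None else sums[g] + v
    return sums

# groups spaced far apart, values 1..5 in each group
rows = [(g, v) for g in (0, 10_000_000, 20_000_000) for v in range(1, 6)]

# several different filters over the same input, like the requested test case
even_sums = filtered_sum(rows, lambda v: v % 2 == 0)  # 2 + 4 = 6 per group
odd_sums = filtered_sum(rows, lambda v: v % 2 == 1)   # 1 + 3 + 5 = 9 per group
no_match = filtered_sum(rows, lambda v: v > 100)      # None (NULL) per group
```

The point of the `no_match` case is that FILTER does not drop groups: every group still appears in the output, with a NULL aggregate value.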

@Mytherin (Collaborator) commented:
Thanks, this looks great now!

@Mytherin Mytherin merged commit dd73011 into duckdb:master Jan 26, 2021
@pdet pdet deleted the filterclause branch March 10, 2021 18:34
@hannes (Member) commented May 6, 2021

So this ht that maps the filter's bound_ref_expr.index around in the physical aggregates is blowing up horribly with many threads and small vectors. Another reason to default to multithreading.

auto &bound_ref_expr = (BoundReferenceExpression &)*aggr.filter;
// hash-table lookup of the filter's bound column, repeated per input chunk
auto it = ht.find(aggr.filter.get());
if (it == ht.end()) {
	aggregate_input_chunk.data[aggregate_input_idx].Reference(input.data[bound_ref_expr.index]);
}
Review comment (Member):
This thing

@hannes (Member) commented May 6, 2021

Will fix; just thought I'd note it for future reference.
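The pattern hannes flags can be sketched abstractly: looking up the filter's bound column through a hash table on every input chunk, versus binding the column index once before processing. With many threads and small vectors the per-chunk lookup is pure repeated overhead. The sketch below uses invented names and plain Python containers, not DuckDB's actual API.

```python
# Illustrative sketch: per-chunk hash lookup vs. index bound once up front.
# "ht" maps a filter expression to its column index, as in the quoted snippet.

def process_chunk_via_ht(chunk, aggregates, ht):
    # the flagged pattern: the hash lookup repeats for every chunk
    return [chunk[ht[aggr["filter"]]] for aggr in aggregates]

def bind_filter_indexes(aggregates, ht):
    # resolve each filter's column index once, at plan/bind time
    return [ht[aggr["filter"]] for aggr in aggregates]

def process_chunk_bound(chunk, bound_indexes):
    # chunk processing becomes a plain index, no hashing
    return [chunk[i] for i in bound_indexes]

aggregates = [{"filter": "f0"}, {"filter": "f1"}]
ht = {"f0": 0, "f1": 2}
chunk = ["col0", "col1", "col2"]

bound = bind_filter_indexes(aggregates, ht)
assert process_chunk_via_ht(chunk, aggregates, ht) == process_chunk_bound(chunk, bound)
```

Both paths produce the same columns; the difference is only where the lookup cost is paid, which is what makes the per-chunk version degrade with many small chunks.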
