Support projection after sorting in SQL #5788
final Sort sort = call.rel(1);
final Aggregate aggregate = call.rel(2);

return aggregate != null && sort != null && project != null;
I don't think these can be null, so it should be safe to remove the entire matches method. The rule will then fire for any Project -> Sort -> Aggregate -> DruidRel.
Removed.
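The reviewer's point can be illustrated with a small, self-contained model. As the reviewer notes, once a rule's operand pattern has matched, each call.rel(i) refers to a node that satisfied its operand, so a separate null check adds nothing. The sketch below does not use Calcite: Node, PATTERN, and matchChain are hypothetical stand-ins that only show the structural-matching idea for the Project -> Sort -> Aggregate -> DruidRel shape.

```java
import java.util.List;
import java.util.Optional;

final class OperandMatchSketch {
  // Hypothetical stand-in for a relational operator: a type name plus at most one input.
  static final class Node {
    final String type;
    final Node input; // null for a leaf

    Node(String type, Node input) {
      this.type = type;
      this.input = input;
    }
  }

  // Hypothetical operand pattern, top to bottom: Project -> Sort -> Aggregate -> DruidRel.
  static final List<String> PATTERN = List.of("Project", "Sort", "Aggregate", "DruidRel");

  // Returns the matched nodes (analogous to call.rel(0) .. call.rel(3)) only when the whole
  // chain matches. Every slot is filled before a match is reported, so no matched node is null.
  static Optional<Node[]> matchChain(Node root) {
    Node[] rels = new Node[PATTERN.size()];
    Node current = root;
    for (int i = 0; i < PATTERN.size(); i++) {
      if (current == null || !current.type.equals(PATTERN.get(i))) {
        return Optional.empty();
      }
      rels[i] = current;
      current = current.input;
    }
    return Optional.of(rels);
  }

  public static void main(String[] args) {
    Node plan = new Node("Project", new Node("Sort", new Node("Aggregate", new Node("DruidRel", null))));
    Node noSort = new Node("Project", new Node("Aggregate", new Node("DruidRel", null)));
    System.out.println(matchChain(plan).isPresent());   // true
    System.out.println(matchChain(noSort).isPresent()); // false
  }
}
```

A plan without the Sort node never reaches the rule body at all, which is why the explicit null checks in matches were redundant.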
@@ -227,6 +233,42 @@ public void onMatch(final RelOptRuleCall call)
}
};

public static RelOptRule AGGREGATE_SORT_PROJECT = new DruidOuterQueryRule(
Can you create a test that hits this rule? It should be a nested groupby where the outer query has the sort + project combo.
Added a test.
}
}

private static RowOrderAndPostAggregations computePostAggregations(
How about calling this method simply create?
Could you explain your thinking in more detail? The name create sounds too broad and less intuitive.
I was thinking that RowOrderAndPostAggregations.create would still make it pretty obvious what it does.
}
}

private static class RowOrderAndPostAggregations
A better name would be ProjectRowOrderAndPostAggregations. It's longer, but it has the word "Project" in it, and this class really is meant to represent a projection.
Renamed.

SortProject(
    RowSignature inputRowSignature,
    List<Aggregation> postAggregators,
If these are meant to be only post-aggregators, I think it'd be better to pass in a List<PostAggregator>. The idea of an Aggregation is that it can bundle together aggregators and post-aggregators, but here we never want regular aggregators.
Good point. Changed.
private static class RowOrderAndPostAggregations
{
  private final List<String> rowOrder;
  private final List<Aggregation> postAggregations;
Since this is meant to hold only PostAggregators, why not make it a List<PostAggregator>?
Changed.
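A minimal sketch of where the rename and the type changes lead, assuming the holder needs only the output row order and the post-aggregators. PostAggregator here is a hypothetical one-method interface standing in for Druid's real PostAggregator, the create factory follows the reviewer's naming suggestion, and the "d0"/"p0" names are illustrative. This is not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ProjectionSketch {
  // Hypothetical minimal stand-in for Druid's PostAggregator; only the output name is modeled.
  interface PostAggregator {
    String getName();
  }

  // Holds what a projection contributes to the query: the output row order and the
  // post-aggregators computing projected expressions. Typed as List<PostAggregator> rather
  // than List<Aggregation>, so regular aggregators (and their virtual columns) cannot appear.
  static final class ProjectRowOrderAndPostAggregations {
    private final List<String> rowOrder;
    private final List<PostAggregator> postAggregators;

    private ProjectRowOrderAndPostAggregations(List<String> rowOrder, List<PostAggregator> postAggregators) {
      // Defensive, unmodifiable copies.
      this.rowOrder = Collections.unmodifiableList(new ArrayList<>(rowOrder));
      this.postAggregators = Collections.unmodifiableList(new ArrayList<>(postAggregators));
    }

    // Static factory; the call site reads ProjectRowOrderAndPostAggregations.create(...).
    static ProjectRowOrderAndPostAggregations create(List<String> rowOrder, List<PostAggregator> postAggregators) {
      return new ProjectRowOrderAndPostAggregations(rowOrder, postAggregators);
    }

    List<String> getRowOrder() {
      return rowOrder;
    }

    List<PostAggregator> getPostAggregators() {
      return postAggregators;
    }
  }

  public static void main(String[] args) {
    // Hypothetical post-aggregator named "p0"; a lambda works because PostAggregator has one method.
    PostAggregator p0 = () -> "p0";
    ProjectRowOrderAndPostAggregations projection =
        ProjectRowOrderAndPostAggregations.create(List.of("d0", "p0"), List.of(p0));
    System.out.println(projection.getRowOrder());                         // [d0, p0]
    System.out.println(projection.getPostAggregators().get(0).getName()); // p0
  }
}
```

With the field typed this way, any caller that walks the post-aggregators no longer has an Aggregation-level API (such as virtual columns) to consult.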
final Set<String> seen = new HashSet<>();
inputRowSignature.getRowOrder().forEach(field -> {
  if (!seen.add(field)) {
    throw new ISE("Duplicate field name: %s", field);
I don't think this anti-collision verification is necessary. It may even be a bug. It checks that the input row signature has no duplicate output field names, but the signature might legitimately have some: if the input is select a, a from tbl group by a, a, then the input row order will be something like ["d0","d0"], and that would be okay.
I believe Calcite can remove duplicate columns automatically. Please check testProjectAfterSort3().
Hmm, okay, let's leave it in then and if it's too aggressive we can remove it later.
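The check under discussion relies on Set.add returning false when the element is already present. The standalone sketch below reproduces it; IllegalStateException stands in for Druid's ISE, and verifyNoDuplicates is a hypothetical helper name. It also makes the reviewer's concern concrete: a row order such as ["d0","d0"] is rejected, which is harmless only if, as the author expects, Calcite has already removed duplicate columns before this point.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DuplicateFieldCheck {
  // Throws if any field name appears more than once in the row order.
  // IllegalStateException is used here in place of Druid's ISE.
  static void verifyNoDuplicates(List<String> rowOrder) {
    final Set<String> seen = new HashSet<>();
    rowOrder.forEach(field -> {
      // Set.add returns false when the element was already present.
      if (!seen.add(field)) {
        throw new IllegalStateException(String.format("Duplicate field name: %s", field));
      }
    });
  }

  public static void main(String[] args) {
    verifyNoDuplicates(List.of("d0", "a0")); // passes silently
    try {
      verifyNoDuplicates(List.of("d0", "d0"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Duplicate field name: d0
    }
  }
}
```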
} else {
  if (sortProject != null) {
    for (Aggregation aggregation : sortProject.getPostAggregators()) {
      retVal.addAll(aggregation.getVirtualColumns());
There will never be any virtual columns added by post-aggregators (virtual columns can only be added by aggregators that read the input data). This code would be removed naturally if getPostAggregators were changed to return List<PostAggregator> rather than List<Aggregation>, so that's a point in favor of changing those types.
Removed.
LGTM, thanks @jihoonson!
* Add sort project
* add more test
* address comments
In SQL, an additional projection can be applied after sorting, which means the projections after sorting can differ from the projections before sorting.
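To make the distinction concrete, the sketch below models the semantics with plain Java collections rather than Druid or Calcite: grouped rows are sorted by one column, and then a different projection is applied that drops the sort key and adds a derived column. The column names (dim, cnt, doubled) and the helper names are hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ProjectAfterSortSketch {
  // Builds one grouped input row with hypothetical columns "dim" and "cnt".
  static Map<String, Object> row(String dim, long cnt) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put("dim", dim);
    r.put("cnt", cnt);
    return r;
  }

  // Sorts rows by "cnt" descending, then applies a projection that differs from the
  // pre-sort columns: "dim" is kept, the sort key "cnt" is dropped, and a derived
  // column "doubled" (cnt * 2) is added.
  static List<Map<String, Object>> sortThenProject(List<Map<String, Object>> rows) {
    return rows.stream()
        .sorted(Comparator.comparingLong((Map<String, Object> r) -> (Long) r.get("cnt")).reversed())
        .map(r -> {
          Map<String, Object> projected = new LinkedHashMap<>();
          projected.put("dim", r.get("dim"));
          projected.put("doubled", (Long) r.get("cnt") * 2);
          return projected;
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map<String, Object>> out = sortThenProject(List.of(row("a", 1L), row("b", 3L)));
    System.out.println(out); // [{dim=b, doubled=6}, {dim=a, doubled=2}]
  }
}
```

The order is decided by a column that no longer exists in the output, which is exactly the case where the projection after sorting cannot simply reuse the projection from before sorting.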
An example SQL is