Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support projection after sorting in SQL #5788

Merged
merged 3 commits into from
Jun 11, 2018

Conversation

jihoonson
Copy link
Contributor

@jihoonson jihoonson commented May 20, 2018

In SQL, an additional projection can be added after sorting, which means, the projections after sorting can be different from the projections before sorting.

An example SQL is

SELECT dim1
FROM (
  SELECT dim1, dim2, count(*) cnt 
  FROM druid.foo 
  GROUP BY dim1, dim2 
  ORDER BY cnt
) t

This change is Reviewable

final Sort sort = call.rel(1);
final Aggregate aggregate = call.rel(2);

return aggregate != null && sort != null && project != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these can be null. So it should be safe to remove the entire matches method, in which case the rule will fire for any project -> sort -> aggregate -> druidrel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -227,6 +233,42 @@ public void onMatch(final RelOptRuleCall call)
}
};

public static RelOptRule AGGREGATE_SORT_PROJECT = new DruidOuterQueryRule(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a test that hits this rule? It should be a nested groupby where the outer query has the sort + project combo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test.

}
}

private static RowOrderAndPostAggregations computePostAggregations(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this method simply create?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you tell me your thoughts in more detail? create sounds too broad and less intuitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that RowOrderAndPostAggregations.create is still pretty obvious as to what it does.

}
}

private static class RowOrderAndPostAggregations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better name would be ProjectRowOrderAndPostAggregations. It's longer but it has the word "Project" in there, and this is something that really is meant to represent a projection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.


SortProject(
RowSignature inputRowSignature,
List<Aggregation> postAggregators,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are meant to only be post-aggregators, I think it'd be better to pass in a List<PostAggregator>. The idea of an Aggregation is that it can bundle together aggregators and post-aggregators. But here, we never want regular aggregators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Changed.

private static class RowOrderAndPostAggregations
{
private final List<String> rowOrder;
private final List<Aggregation> postAggregations;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is meant to only be PostAggregators, why not have this be a List<PostAggregator>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

final Set<String> seen = new HashSet<>();
inputRowSignature.getRowOrder().forEach(field -> {
if (!seen.add(field)) {
throw new ISE("Duplicate field name: %s", field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this anti-collision verification is necessary. It may even be a bug.

It's checking that the input row signature has no duplicate output field names, but, it might (if the input is select a, a from tbl group by a, a then the input row order will be something like ["d0","d0"]). And that would be okay.

Copy link
Contributor Author

@jihoonson jihoonson Jun 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Calcite can remove duplicate columns automatically. Please check the testProjectAfterSort3().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, okay, let's leave it in then and if it's too aggressive we can remove it later.

} else {
if (sortProject != null) {
for (Aggregation aggregation : sortProject.getPostAggregators()) {
retVal.addAll(aggregation.getVirtualColumns());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will never be any virtual columns added by post-aggregators (virtual columns can only be added by aggregators that read the input data).

This code would be removed naturally if getPostAggregators was changed to return List<PostAggregator> rather than List<Aggregation>, so that's a point in favor of changing those types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jihoonson!

@gianm gianm merged commit fe4d678 into apache:master Jun 11, 2018
gianm pushed a commit to implydata/druid-public that referenced this pull request Jun 18, 2018
* Add sort project

* add more test

* address comments
@jihoonson jihoonson added this to the 0.12.3 milestone Aug 22, 2018
gianm pushed a commit to gianm/druid that referenced this pull request Aug 25, 2018
* Add sort project

* add more test

* address comments
fjy pushed a commit that referenced this pull request Aug 26, 2018
* Add sort project

* add more test

* address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants