-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6214] Remove DISTINCT in aggregate function if field is unique #3641
[CALCITE-6214] Remove DISTINCT in aggregate function if field is unique #3641
Conversation
import java.util.List; | ||
|
||
/** | ||
* Planner rule that removes a distinct in count for {@link Aggregate}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@julianhyde asked in JIRA to handle other aggregates too. Why just count?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked it before.
Now we can handle other aggregates.
/** Test case for | ||
* <a href="https://issues.apache.org/jira/browse/CALCITE-6214">[CALCITE-6214] | ||
* Remove `DISTINCT` in `COUNT` if field is unique</a>. */ | ||
@Test void testAggregateDistinctRemove3() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you handle other aggregates you should probably add tests like SELECT COUNT(x), SUM(x) FROM SELECT DISTINCT ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now we can handle other aggregate functions after new commits.
@@ -6562,6 +6562,67 @@ private HepProgram getTransitiveProgram() { | |||
.check(); | |||
} | |||
|
|||
/** Test case for | |||
* <a href="https://issues.apache.org/jira/browse/CALCITE-6214">[CALCITE-6214] | |||
* Remove `DISTINCT` in `COUNT` if field is unique</a>. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use backticks in jira summary or javadoc. They will be rendered as backticks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed them.
* <a href="https://issues.apache.org/jira/browse/CALCITE-6214">[CALCITE-6214] | ||
* Remove `DISTINCT` in `COUNT` if field is unique</a>. */ | ||
@Test void testAggregateDistinctRemove2() { | ||
final String sql = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add at least one test where the outer query has a GROUP BY
? The following query should benefit from the simplification:
SELECT deptno, COUNT(DISTINCT sal),
FROM (
SELECT DISTINCT deptno, sal
FROM emp)
GROUP BY deptno
Note that sal
is not distinct but it is distinct for each deptno
(because (deptno, sal)
is a key).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it.
DISTINCT
in COUNT
if field is unique9a9584e
to
a4419bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question I have is whether this should be as you wrote it in RelBuilder, or it should be a rewrite rule in the optimizer. It is a matter of design choice rather than of correctness.
@@ -2525,6 +2529,29 @@ private RelBuilder aggregate_(GroupKeyImpl groupKey, | |||
return project(projects.transform((i, name) -> aliasMaybe(field(i), name))); | |||
} | |||
|
|||
/** | |||
* Removed redundant distinct if an input is already unique. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would document that this is specifically for aggregates.
And perhaps a better function name would be removeRedundantAggregateDistinct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to removeRedundantAggregateDistinct
.
/** Whether to save the distinct if we know that the input is | ||
* already unique; default true. */ | ||
@Value.Default | ||
default boolean redundantDistinct() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag is a little unintuitive, since it inhibits the optimization rather than enabling it.
All the other similar flags are in the opposite way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
I changed default value, now we can set true
to optimize. (Default is false
)
]]> | ||
</Resource> | ||
</TestCase> | ||
<TestCase name="testRemoveDistinctIfUnique"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the corresponding test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to delete.
Removed it.
</Resource> | ||
<Resource name="plan"> | ||
<![CDATA[ | ||
LogicalAggregate(group=[{}], EXPR$0=[COUNT($0)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the DISTINCT removed from this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can know empno
is primary key by using RelMetadataQuery#areColumnsUnique
.
Here is metadata:
calcite/testkit/src/main/java/org/apache/calcite/test/catalog/MockCatalogReaderSimple.java
Line 80 in f837ffa
empTable.addColumn("EMPNO", fixture.intType, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are so many empno references in the codebase that I couldn't figure out that this is a primary key.
There are also multiple definitions of this column in multiple files, and I didn't know which one is being used here.
Can you please add a comment explaining this in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Thank you for your review.
We need other rules to remove same function.
|
6cbf205
to
8f33be6
Compare
|
Here is JIRA ticket: https://issues.apache.org/jira/browse/CALCITE-6214