[CALCITE-6214] Remove DISTINCT in aggregate function if field is unique #3641

JiajunBernoulli · 2024-01-21T08:21:40Z

Here is JIRA ticket: https://issues.apache.org/jira/browse/CALCITE-6214

mihaibudiu · 2024-01-21T18:52:50Z

core/src/main/java/org/apache/calcite/rel/rules/AggregateRemoveDistinctRule.java

+import java.util.List;
+
+/**
+ * Planner rule that removes a distinct in count for {@link Aggregate}.


@julianhyde asked in JIRA to handle other aggregates too. Why just count?

I overlooked it before.

Now we can handle other aggregates.

mihaibudiu · 2024-01-21T18:58:15Z

core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java

+  /** Test case for
+   * <a href="https://issues.apache.org/jira/browse/CALCITE-6214">[CALCITE-6214]
+   * Remove `DISTINCT` in `COUNT` if field is unique</a>. */
+  @Test void testAggregateDistinctRemove3() {


If you handle other aggregates you should probably add tests like SELECT COUNT(x), SUM(x) FROM SELECT DISTINCT ...

Yes, now we can handle other aggregate functions after new commits.

julianhyde · 2024-01-21T19:16:10Z

core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java

@@ -6562,6 +6562,67 @@ private HepProgram getTransitiveProgram() {
        .check();
  }

+  /** Test case for
+   * <a href="https://issues.apache.org/jira/browse/CALCITE-6214">[CALCITE-6214]
+   * Remove `DISTINCT` in `COUNT` if field is unique</a>. */


Don't use backticks in jira summary or javadoc. They will be rendered as backticks.

Removed them.

julianhyde · 2024-01-21T19:23:36Z

core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java

+   * <a href="https://issues.apache.org/jira/browse/CALCITE-6214">[CALCITE-6214]
+   * Remove `DISTINCT` in `COUNT` if field is unique</a>. */
+  @Test void testAggregateDistinctRemove2() {
+    final String sql = ""


Can you add at least one test where the outer query has a GROUP BY? The following query should benefit from the simplification:

SELECT deptno, COUNT(DISTINCT sal) FROM (SELECT DISTINCT deptno, sal FROM emp) GROUP BY deptno

Note that sal is not distinct but it is distinct for each deptno (because (deptno, sal) is a key).
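
To make the reasoning concrete, here is a minimal plain-Java sketch that emulates the query on a few made-up rows. It does not use Calcite; the class name, helper methods, and sample data are all hypothetical. It only illustrates that once the inner query has made (deptno, sal) unique, COUNT(DISTINCT sal) and COUNT(sal) agree for every deptno, even though sal repeats across departments.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DistinctPerGroupSketch {
  /** Emulates SELECT DISTINCT deptno, sal; each row is List.of(deptno, sal). */
  static List<List<Integer>> selectDistinct(List<List<Integer>> rows) {
    return new ArrayList<>(new LinkedHashSet<>(rows));
  }

  /** Emulates SELECT deptno, COUNT([DISTINCT] sal) ... GROUP BY deptno. */
  static Map<Integer, Integer> countSalByDept(List<List<Integer>> rows,
      boolean distinct) {
    // Collect the sal values seen for each deptno.
    Map<Integer, List<Integer>> salsByDept = new TreeMap<>();
    for (List<Integer> row : rows) {
      salsByDept.computeIfAbsent(row.get(0), k -> new ArrayList<>())
          .add(row.get(1));
    }
    // Count them, de-duplicating first only when DISTINCT is requested.
    Map<Integer, Integer> counts = new TreeMap<>();
    salsByDept.forEach((deptno, sals) ->
        counts.put(deptno, distinct ? new HashSet<>(sals).size() : sals.size()));
    return counts;
  }

  public static void main(String[] args) {
    // Made-up sample data: sal 1000 occurs in both departments.
    List<List<Integer>> emp = List.of(
        List.of(10, 1000), List.of(10, 1000), List.of(10, 2000),
        List.of(20, 1000), List.of(20, 3000));
    List<List<Integer>> inner = selectDistinct(emp);
    // Both calls give the same per-department counts on the de-duplicated input.
    System.out.println(countSalByDept(inner, true));
    System.out.println(countSalByDept(inner, false));
  }
}
```

On the raw emp rows (without the inner SELECT DISTINCT) the two counts can differ, which is why the simplification depends on the uniqueness of the input.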

mihaibudiu

Another question I have is whether this should be as you wrote it in RelBuilder, or it should be a rewrite rule in the optimizer. It is a matter of design choice rather than of correctness.

mihaibudiu · 2024-02-06T19:13:41Z

core/src/main/java/org/apache/calcite/tools/RelBuilder.java

@@ -2525,6 +2529,29 @@ private RelBuilder aggregate_(GroupKeyImpl groupKey,
    return project(projects.transform((i, name) -> aliasMaybe(field(i), name)));
  }

+  /**
+   * Removed redundant distinct if an input is already unique.


I would document that this is specifically for aggregates.
And perhaps a better function name would be removeRedundantAggregateDistinct.

Renamed to removeRedundantAggregateDistinct.

mihaibudiu · 2024-02-06T19:15:19Z

core/src/main/java/org/apache/calcite/tools/RelBuilder.java

+    /** Whether to save the distinct if we know that the input is
+     * already unique; default true. */
+    @Value.Default
+    default boolean redundantDistinct() {


This flag is a little unintuitive, since it inhibits the optimization rather than enabling it.
All the other similar flags work the opposite way.

Good catch.

I changed the default value: now we set the flag to true to enable the optimization. (The default is false.)

mihaibudiu · 2024-02-06T19:16:51Z

core/src/test/resources/org/apache/calcite/test/SqlToRelConverterTest.xml

+]]>
+    </Resource>
+  </TestCase>
+  <TestCase name="testRemoveDistinctIfUnique">


Where is the corresponding test?

Forgot to delete.

Removed it.

mihaibudiu · 2024-02-06T19:23:38Z

core/src/test/resources/org/apache/calcite/test/SqlToRelConverterTest.xml

+    </Resource>
+    <Resource name="plan">
+      <![CDATA[
+LogicalAggregate(group=[{}], EXPR$0=[COUNT($0)])


Why is the DISTINCT removed from this case?

We know that empno is a primary key from RelMetadataQuery#areColumnsUnique.

Here is the metadata:

calcite/testkit/src/main/java/org/apache/calcite/test/catalog/MockCatalogReaderSimple.java

Line 80 in f837ffa

empTable.addColumn("EMPNO", fixture.intType, true);

There are so many empno references in the codebase that I couldn't figure out that this is a primary key.
There are also multiple definitions of this column in multiple files, and I didn't know which one is being used here.
Can you please add a comment explaining this in the code?
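
For readers following the thread, the decision being discussed can be sketched in plain Java. This is an illustration under stated assumptions, not the code in this PR: it assumes DISTINCT is treated as redundant when the group columns together with the aggregate's argument columns contain a known unique key of the input. The class, the helper names, and the list of known keys are hypothetical stand-ins; the areColumnsUnique method here only mimics the role of the metadata check mentioned above.

```java
import java.util.BitSet;
import java.util.List;

public class RedundantDistinctSketch {
  /** Builds a column set from column indexes. */
  static BitSet bits(int... columns) {
    BitSet set = new BitSet();
    for (int column : columns) {
      set.set(column);
    }
    return set;
  }

  /** Hypothetical stand-in for a uniqueness check: the columns are unique
   * if they contain every column of at least one known key. */
  static boolean areColumnsUnique(List<BitSet> knownKeys, BitSet columns) {
    for (BitSet key : knownKeys) {
      BitSet missing = (BitSet) key.clone();
      missing.andNot(columns);
      if (missing.isEmpty()) {
        return true;
      }
    }
    return false;
  }

  /** Assumed rule: DISTINCT adds nothing if group columns plus argument
   * columns are unique in the input. */
  static boolean isDistinctRedundant(List<BitSet> knownKeys, BitSet groupSet,
      BitSet argList) {
    BitSet columns = (BitSet) groupSet.clone();
    columns.or(argList);
    return areColumnsUnique(knownKeys, columns);
  }

  public static void main(String[] args) {
    // EMP with column 0 (EMPNO) as its key: COUNT(DISTINCT empno), no GROUP BY.
    System.out.println(isDistinctRedundant(List.of(bits(0)), bits(), bits(0)));
    // Input keyed on (deptno, sal): COUNT(DISTINCT sal) GROUP BY deptno.
    System.out.println(isDistinctRedundant(List.of(bits(0, 1)), bits(0), bits(1)));
    // Same input without GROUP BY: sal alone is not known to be unique.
    System.out.println(isDistinctRedundant(List.of(bits(0, 1)), bits(), bits(1)));
  }
}
```

Under these assumptions the first two calls report the DISTINCT as redundant and the third does not, which matches the EMPNO case in this plan and the GROUP BY deptno case raised earlier in the review.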

JiajunBernoulli · 2024-02-08T02:52:06Z

> Another question I have is whether this should be as you wrote it in RelBuilder, or it should be a rewrite rule in the optimizer. It is a matter of design choice rather than of correctness.

Thank you for your review.

Doing the optimization in RelBuilder lets it reuse common subexpressions.

Using RelBuilder

LogicalProject(DEPTNO=[$0], CDS=[$1], CS=[$2], SDS=[$3], SS=[$3]) -- SDS is same as SS
  LogicalAggregate(group=[{0}], CDS=[COUNT($1)], CS=[COUNT()], SDS=[SUM($1)])
    LogicalAggregate(group=[{0, 1}])
      LogicalProject(DEPTNO=[$7], SAL=[$5])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])

Using Rule

LogicalProject(DEPTNO=[$0], CDS=[$1], CS=[$2], SDS=[$3], SS=[$4]) -- SDS is same as SS
  LogicalAggregate(group=[{0}], CDS=[COUNT($1)], CS=[COUNT()], SDS=[SUM($1)], SS=[SUM($1)])
    LogicalAggregate(group=[{0, 1}])
      LogicalProject(DEPTNO=[$7], SAL=[$5])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])

With a rule, we would need other rules to remove the duplicate aggregate function.

RelBuilder is also easier to use than a rule.

RelBuilder: call withRedundantDistinct(flag) to enable or disable the optimization.
Rule: add the rule to, or remove it from, the programs.

sonarcloud · 2024-02-11T06:02:08Z

Quality Gate passed

Issues
3 New issues

Measures
0 Security Hotspots
96.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

mihaibudiu reviewed Jan 21, 2024

julianhyde reviewed Jan 21, 2024

JiajunBernoulli changed the title ~~[CALCITE-6214] Remove DISTINCT in COUNT if field is unique~~ [CALCITE-6214] Remove DISTINCT in aggregate function if field is unique Jan 22, 2024

JiajunBernoulli force-pushed the remove-distinct-if-uniq branch 3 times, most recently from 9a9584e to a4419bb Compare January 28, 2024 09:50

JiajunBernoulli requested a review from mihaibudiu February 6, 2024 11:28

mihaibudiu reviewed Feb 6, 2024

mihaibudiu approved these changes Feb 8, 2024

[CALCITE-6214] Remove DISTINCT in aggregate function if field is unique

8f33be6

JiajunBernoulli force-pushed the remove-distinct-if-uniq branch from 6cbf205 to 8f33be6 Compare February 11, 2024 05:43

JiajunBernoulli merged commit ec0dc3c into apache:main Feb 12, 2024
17 checks passed
