[CALCITE-7266] Optimize the "well-known count bug" correction #4614

Merged
rubenada merged 1 commit into apache:main from rubenada:CALCITE-7266 on Nov 12, 2025

@rubenada
Contributor

@rubenada rubenada commented Nov 4, 2025

@rubenada rubenada changed the title: [CALCITE-7266] Optimize the "well-known count bug" fix → [CALCITE-7266] Optimize the "well-known count bug" correction (Nov 4, 2025)
@suibianwanwank
Contributor

I believe there's room for optimization here, but considering only LEFT and COUNT doesn't seem sufficient to me.

Test in sub-query.iq:

SELECT deptno, (SELECT CASE WHEN SUM(sal) > 10 then 'VIP' else 'Regular' END expr
                   FROM emp e
                   WHERE d.deptno = e.deptno) a
FROM dept d;

!ok

Currently, this query is decorrelated in advance by HepPlanner. Suppose we disable HepPlanner:

--- a/core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java
+++ b/core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java
@@ -247,9 +247,7 @@ public static RelNode decorrelateQuery(RelNode rootRel,
         new RelDecorrelator(corelMap,
             cluster.getPlanner().getContext(), relBuilder);

-    RelNode newRootRel = decorrelationRules == null
-        ? decorrelator.removeCorrelationViaRule(rootRel)
-        : decorrelator.removeCorrelationViaRule(rootRel, decorrelationRules);
+    RelNode newRootRel = rootRel;

     if (SQL2REL_LOGGER.isDebugEnabled()) {
       SQL2REL_LOGGER.debug(
@@ -324,7 +322,7 @@ protected RelNode decorrelate(RelNode root) {
     HepPlanner planner = createPlanner(program);

     planner.setRoot(root);
-    root = planner.findBestExp();
+//    root = planner.findBestExp();
     if (SQL2REL_LOGGER.isDebugEnabled()) {
       SQL2REL_LOGGER.debug("Plan before extracting correlated computations:\n"
           + RelOptUtil.toString(root));

With this PR applied, the decorrelator then returns incorrect results:

DEPTNO | A
--------+---------
      10 | VIP
      20 | VIP
      30 | VIP
      40 | NULL
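
For readers unfamiliar with why department 40 should not be NULL, here is a minimal sketch in plain Java (not Calcite code). It assumes made-up rows in which department 40 has no employees; the class and method names are hypothetical. It compares the correlated semantics, where the scalar aggregate always produces one row, with a naive GROUP BY plus LEFT JOIN rewrite, where the missing group is padded with NULL after the CASE has already been applied.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the counter-example; the table contents are made up for illustration. */
class CountBugSketch {
  // Hypothetical data: department 40 has no employees.
  static final List<Integer> DEPT = List.of(10, 20, 30, 40);
  static final List<int[]> EMP = List.of( // each row is {deptno, sal}
      new int[] {10, 100}, new int[] {20, 200}, new int[] {30, 300});

  /** SUM(sal) over the employees of one department; null when there are none. */
  static Integer sumSal(int deptno) {
    Integer sum = null;
    for (int[] e : EMP) {
      if (e[0] == deptno) {
        sum = (sum == null ? 0 : sum) + e[1];
      }
    }
    return sum;
  }

  /** CASE WHEN sum > 10 THEN 'VIP' ELSE 'Regular' END; a NULL comparison falls to ELSE. */
  static String label(Integer sum) {
    return sum != null && sum > 10 ? "VIP" : "Regular";
  }

  /** Correlated semantics: the scalar aggregate yields exactly one row per department. */
  static Map<Integer, String> correlated() {
    Map<Integer, String> out = new LinkedHashMap<>();
    for (int d : DEPT) {
      out.put(d, label(sumSal(d)));
    }
    return out;
  }

  /** Naive rewrite: GROUP BY deptno, apply the CASE, then LEFT JOIN. */
  static Map<Integer, String> naiveLeftJoin() {
    Map<Integer, String> grouped = new LinkedHashMap<>();
    for (int[] e : EMP) {
      grouped.put(e[0], label(sumSal(e[0])));
    }
    Map<Integer, String> out = new LinkedHashMap<>();
    for (int d : DEPT) {
      out.put(d, grouped.get(d)); // null when the department has no group
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(correlated());    // {10=VIP, 20=VIP, 30=VIP, 40=Regular}
    System.out.println(naiveLeftJoin()); // {10=VIP, 20=VIP, 30=VIP, 40=null}
  }
}
```

The difference arises only for the department without employees: the CASE expression turns the NULL sum into 'Regular', but the LEFT JOIN padding happens after the CASE and so leaves NULL.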

@rubenada
Contributor Author

rubenada commented Nov 4, 2025

Thanks for taking a look, @suibianwanwank. Maybe I'm doing something wrong, but I get the same results for your sample query in all three cases: a) RelDecorrelator disabled, b) RelDecorrelator enabled with the current code, c) RelDecorrelator with this PR's code:

+--------+---------+
| DEPTNO | A       |
+--------+---------+
|     10 | VIP     |
|     20 | VIP     |
|     30 | VIP     |
|     40 | Regular |
+--------+---------+

PS: I have pushed the test, just to double-check.

@suibianwanwank
Contributor

As mentioned above, this happens because the RBO decorrelates such patterns in advance. However, I believe the RelDecorrelator framework itself should ensure correctness, rather than relying on rules to pre-handle certain bad cases. After all, pattern-based approaches in decorrelation are inherently limited in what they can cover.

@rubenada
Contributor Author

rubenada commented Nov 5, 2025

OK, I understand now what you mean, @suibianwanwank: there is indeed a regression with the proposed patch (observable only if we deactivate the decorrelation-via-rules step).
Thanks for the feedback, @iwanttobepowerful.
It's clear that this idea needs some rework... WIP

BTW, it seems there was a similar PR in Spark: apache/spark#43341 👀
UPDATE: after a closer look, that Spark PR avoids the extra join when there are several Aggregates on the same subtree, whereas the idea in the current PR is to avoid it in the case of a LEFT Correlate.

@rubenada
Contributor Author

rubenada commented Nov 5, 2025

@iwanttobepowerful I haven't looked in detail, but it seems that Spark uses a slightly different approach to deal with the count bug (at least in some cases), using this "alwaysTrue" value. Note that some of the manipulations done by Spark may be done (more or less) in Calcite not by the RelDecorrelator itself, but by certain auxiliary rules called via HepPlanner inside the RelDecorrelator. So in Calcite this process is split between rule transformations and the pure decorrelation algorithm itself (which might not be ideal, as stated by @suibianwanwank above).

I'm not entirely sure, but I have the impression that the "LEFT" approach might be valid if the Aggregate result is not further manipulated (as it is in the counter-example proposed by @suibianwanwank), i.e. we could avoid the rewrite if the Correlate is LEFT and the Aggregate is directly its right child (this seems to fix the counter-example).
I've just pushed a new commit with this idea.
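
The intuition that NULL is effectively the same as empty when the Aggregate is the direct right child can be sketched in plain Java (not Calcite code), here for SUM only. The rows and all names below are hypothetical, and the sketch makes no claim about how RelDecorrelator treats other aggregate functions such as COUNT.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy model, not Calcite code; the rows below are made up. */
class DirectAggregateSketch {
  static final List<Integer> DEPT = List.of(10, 40); // department 40 has no employees
  static final List<int[]> EMP = List.of(            // each row is {deptno, sal}
      new int[] {10, 100}, new int[] {10, 50});

  /** Correlated scalar SUM(sal): NULL when no employee matches. */
  static Integer correlatedSum(int deptno) {
    Integer sum = null;
    for (int[] e : EMP) {
      if (e[0] == deptno) {
        sum = (sum == null ? 0 : sum) + e[1];
      }
    }
    return sum;
  }

  /** Decorrelated form: GROUP BY deptno, then LEFT JOIN; a missing group is padded with NULL. */
  static Integer leftJoinSum(int deptno) {
    Map<Integer, Integer> grouped = new HashMap<>();
    for (int[] e : EMP) {
      grouped.merge(e[0], e[1], Integer::sum);
    }
    return grouped.get(deptno);
  }

  /** True when both forms agree for every department. */
  static boolean formsAgree() {
    for (int d : DEPT) {
      if (!Objects.equals(correlatedSum(d), leftJoinSum(d))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(formsAgree()); // prints "true"
  }
}
```

With no expression applied on top of the SUM, the NULL produced by the LEFT JOIN padding coincides with the NULL that SUM returns over an empty input, so both forms agree in this toy case.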

// Otherwise call except if this is a LEFT Correlate with the Aggregate being its RHS,
// in that case NULL is effectively the same as empty (which promotes NULL on the RHS)
(!parentPropagatesNullValues
&& requireNonNull(frameStack.peek()).left.getJoinType() != JoinRelType.LEFT))) {
Contributor

It seems like it could be optimized this way 🤔, in decorrelateRel(Join):

final Frame rightFrame = getInvoke(oldRight, true, rel, parentPropagatesNullValues);

//to:

final Frame rightFrame = getInvoke(oldRight, true, rel, true);

Contributor Author

@rubenada rubenada Nov 6, 2025

You mean in decorrelateRel(Correlate)?
That actually seems to do the trick in a much cleaner way. It keeps the plans adjusted in the PR, does not fail on the counter-example that you proposed against my initial commit, and it also works as expected on my downstream project's tests when I apply it.
I've pushed this change, cleaning up the previous modifications.

Contributor Author

I could also add the "counter-example" as a unit test in RelDecorrelatorTest, but it would require some minor adjustments in RelDecorrelator to allow running the decorrelation algorithm without first applying any rules. It's manageable, adds more flexibility, and can be done in a backward-compatible way.

Contributor

LGTM. An additional thought: would this also work for an inner join?

Contributor Author

Test pushed

@rubenada rubenada marked this pull request as ready for review November 11, 2025 08:06
@rubenada
Contributor Author

@suibianwanwank @iwanttobepowerful are there any other remarks on this change? Shall I squash the commits to prepare the merge?

Contributor

@suibianwanwank suibianwanwank left a comment

LGTM!

@rubenada rubenada added the "LGTM-will-merge-soon" label (Overall PR looks OK. Only minor things left.) Nov 11, 2025

@rubenada rubenada merged commit 0f148c7 into apache:main Nov 12, 2025
21 of 36 checks passed