[CALCITE-2166] Cumulative cost of RelSubset.best RelNode is increased… #1440

xndai · 2019-09-04T17:57:38Z

… after calling RelSubset.propagateCostImprovements() for input RelNodes

It's possible for a subset's best cost to increase when an input subset's best changes. In those cases, although the input subset's cost is reduced, its row count can increase, which increases the non-cumulative cost of the parent rel. As a result, the cumulative cost of the parent rel can increase. If the parent rel happens to be the best rel of its subset, we currently do nothing, which leaves the rel node costs inconsistent.

This PR fixes the problem by updating the best rel node's cost if it has increased. Although this approach won't guarantee an optimal plan, it at least keeps the memo consistent.

For more details, please refer to the JIRA: https://issues.apache.org/jira/browse/CALCITE-2166
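To make the scenario in the description concrete, here is a minimal sketch of the arithmetic. It is a toy model, not Calcite code: the class and method names are hypothetical, and it assumes the parent's non-cumulative cost is simply a per-row cost multiplied by the input row count.

```java
// Toy cost model for illustration only; none of these names are Calcite APIs.
final class CostIncreaseSketch {
  private CostIncreaseSketch() {}

  /**
   * Cumulative cost of a parent rel under the assumed model:
   * its own (non-cumulative) cost plus the cumulative cost of its input.
   * The parent's own cost is assumed to scale with the input row count.
   */
  static double cumulativeCost(
      double perRowCost, double inputRowCount, double inputCumulativeCost) {
    double nonCumulativeCost = perRowCost * inputRowCount;
    return nonCumulativeCost + inputCumulativeCost;
  }
}
```

With a per-row cost of 1.0, an input whose best has cost 100 and row count 10 gives the parent a cumulative cost of 110. If the input's best is replaced by a rel with cost 90 but row count 50, the parent's cumulative cost becomes 140, even though the input itself got cheaper.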


xndai · 2019-09-05T16:26:12Z

@vvysotskyi can you please take a look at this? The PR is opened based on our discussions in Jira.

vvysotskyi · 2019-09-05T16:32:52Z

@xndai, I'll try to review it by the end of this week.

vvysotskyi

@xndai, changes look good, thanks for the fix!

danny0405 · 2019-09-08T01:04:39Z

core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java

+          }
+
+          // Make sure bsetCost is accurate
+          RelOptCost bestCost = getCost(subset.best, subset.best.getCluster().getMetadataQuery());


Typo in the comment: bsetCost should be bestCost.

Will fix it.

danny0405 · 2019-09-08T03:15:45Z

core/src/main/java/org/apache/calcite/plan/volcano/RelSubset.java

+
+      // Best rel's cost is increased, we need to search for new best rel
+      if (rel == best && bestCost.isLt(cost)) {
+        updateBest = true;


Why do we need the rel == best check? We should always update the best rel and cost if we find a better one. The question here is when and how this fix is triggered; can you explain in more detail?

We only need to update best when its own cost has increased. If this is any other rel node and its cost is larger than the current best, we simply ignore it (same behavior as today). I think I have explained the case in which this can happen; please check the description and the JIRA. Let me know if you have further questions. Thanks.

But in this for loop, you have replaced the best rel, right? The replacement rel may still not be the best one. I don't see the point of replacing it; if you want to fix the cost value, just update the cost.

In what case do you think the replaced rel may still not be the best one?

The loop guarantees that the new best is the rel with the cheapest cost in the current subset. It could be a different node if the cost of the original best has increased. As I mentioned in the JIRA, this approach doesn't guarantee an optimal result, but it does make sure the memo is consistent. To get an optimal result, we would either need to make sure the RelSubset row count doesn't change (in theory it shouldn't), or keep multiple best node candidates in a subset. But that change would be too drastic, and the runtime overhead would be non-trivial.
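The behavior discussed in this thread can be sketched roughly as follows. This is a simplified stand-in, not the actual RelSubset code: rels are plain strings, costs are doubles, and the class and method names are hypothetical. Only the condition that the changed rel is the current best and its cost went up mirrors the `rel == best && bestCost.isLt(cost)` check in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy subset; not the Calcite RelSubset implementation.
final class BestRelSketch {
  final Map<String, Double> costs = new LinkedHashMap<>();
  String best;
  double bestCost = Double.POSITIVE_INFINITY;

  /** Records a new cost for a rel and keeps best/bestCost consistent. */
  void onCostChanged(String rel, double cost) {
    costs.put(rel, cost);
    if (cost < bestCost) {
      // Usual case: a cheaper rel becomes the new best.
      best = rel;
      bestCost = cost;
      return;
    }
    if (rel.equals(best) && bestCost < cost) {
      // The current best got more expensive: rescan all rels in the
      // subset so that best is the cheapest one currently known.
      best = null;
      bestCost = Double.POSITIVE_INFINITY;
      for (Map.Entry<String, Double> e : costs.entrySet()) {
        if (e.getValue() < bestCost) {
          best = e.getKey();
          bestCost = e.getValue();
        }
      }
    }
    // Any other rel with a cost above bestCost is ignored, as before.
  }
}
```

In this sketch, if rel A (cost 110) is best and rel B costs 120, raising A's cost to 140 triggers the rescan and B becomes best. The stored best cost then always matches a real rel in the subset, which is the consistency property the fix targets; it does not by itself make the final plan optimal.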

hsyuan

+1


xndai added 2 commits September 4, 2019 10:44

Update failed test case

f723a73

xndai closed this Sep 4, 2019

xndai reopened this Sep 4, 2019

Search for new best rel when current best cost is increased

5155ff5

hsyuan closed this Sep 6, 2019

hsyuan reopened this Sep 6, 2019

vvysotskyi approved these changes Sep 7, 2019


danny0405 reviewed Sep 8, 2019


Fix typo in comment

6e1ce4c

hsyuan approved these changes Sep 13, 2019


hsyuan closed this in f95f74a Sep 13, 2019


