-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-2166] Cumulative cost of RelSubset.best RelNode is increased… #1440
Conversation
… after calling RelSubset.propagateCostImprovements() for input RelNodes It's possible that Subset's best cost increases when input subset's best is changed. In those cases, although input subset's cost is reduced, the row count can increase which causes the increase of non-cumulative cost of parent rel. As a result, the cost of parent rel can increase. If the parent rel happens to be the best rel of a given subset, we currently do nothing. And this would lead to the inconsistency of the rel node cost. Fixing this by updating the best rel node cost if it's increased. Although this approach won't garantee an optimal plan, at least it makes sure the memo is consistent. More details, please refer to JIRA - https://issues.apache.org/jira/browse/CALCITE-2166
@vvysotskyi can you please take a look at this? The PR is opened based on our discussions in Jira. |
@xndai, I'll try to review it by the end of this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xndai, changes look good, thanks for the fix!
} | ||
|
||
// Make sure bsetCost is accurate | ||
RelOptCost bestCost = getCost(subset.best, subset.best.getCluster().getMetadataQuery()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be bestCost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix it.
|
||
// Best rel's cost is increased, we need to search for new best rel | ||
if (rel == best && bestCost.isLt(cost)) { | ||
updateBest = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need a decision rel == best
? We should always update the best rel and cost if we found any. The problem here is that when and how this fix is triggered, can you give more explain ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to update best when its own cost is increased. If this is any other rel node and its cost is larger than current best, we simply ignore (same behavior as today). I think I have explained the case when this could happen. Please check the descriptions and JIRA. Let me know if you have further question. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in this for loop, you have replaced the best rel
right ? This replaced rel may still not be the best one, i didn't find the meaningness to replace it, if you want to fix the cost val, just update the cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case do you think the replaced rel may still not be the best one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop is to guarantee the new best would be the one with cheapest cost in current sub set. It could be a different node if the cost of the original best has increased. As I mentioned in the JIRA, this approach doesn't guarantee an optimal result but does make sure the memo is consistent. To get optimal result, we either need to make sure RelSubset row count doesn't change (it shouldn't change in theory), or we have multiple best node candidates in a subset. But the change would be too drastic and the runtime overhead is non trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
… after calling RelSubset.propagateCostImprovements() for input RelNodes (Xiening Dai) It's possible that Subset's best cost increases when input subset's best is changed. In those cases, although input subset's cost is reduced, the row count can increase which causes the increase of non-cumulative cost of parent rel. As a result, the cost of parent rel can increase. If the parent rel happens to be the best rel of a given subset, we currently do nothing. And this would lead to the inconsistency of the rel node cost. Fixing this by updating the best rel node cost if it's increased. Although this approach won't garantee an optimal plan, at least it makes sure the memo is consistent. More details, please refer to JIRA - https://issues.apache.org/jira/browse/CALCITE-2166 Close apache#1440
… after calling RelSubset.propagateCostImprovements() for input RelNodes (Xiening Dai) It's possible that Subset's best cost increases when input subset's best is changed. In those cases, although input subset's cost is reduced, the row count can increase which causes the increase of non-cumulative cost of parent rel. As a result, the cost of parent rel can increase. If the parent rel happens to be the best rel of a given subset, we currently do nothing. And this would lead to the inconsistency of the rel node cost. Fixing this by updating the best rel node cost if it's increased. Although this approach won't garantee an optimal plan, at least it makes sure the memo is consistent. More details, please refer to JIRA - https://issues.apache.org/jira/browse/CALCITE-2166 Close apache#1440 Change-Id: I3202652be58f4ecb3b9d0e5fe52cb0458a9f5c86
… after calling RelSubset.propagateCostImprovements() for input RelNodes
It's possible that Subset's best cost increases when input subset's best is changed. In those cases, although input subset's cost is reduced, the row count can increase which causes the increase of non-cumulative cost of parent rel. As a result, the cost of parent rel can increase. If the parent rel happens to be the best rel of a given subset, we currently do nothing. And this would lead to the inconsistency of the rel node cost.
Fixing this by updating the best rel node cost if it's increased. Although this approach won't garantee an optimal plan, at least it makes sure the memo is consistent.
More details, please refer to JIRA - https://issues.apache.org/jira/browse/CALCITE-2166