Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-2166] Cumulative cost of RelSubset.best RelNode is increased… #1440

Closed
wants to merge 4 commits into from

Conversation

xndai
Copy link
Contributor

@xndai xndai commented Sep 4, 2019

… after calling RelSubset.propagateCostImprovements() for input RelNodes

It's possible that Subset's best cost increases when input subset's best is changed. In those cases, although input subset's cost is reduced, the row count can increase which causes the increase of non-cumulative cost of parent rel. As a result, the cost of parent rel can increase. If the parent rel happens to be the best rel of a given subset, we currently do nothing. And this would lead to the inconsistency of the rel node cost.

Fixing this by updating the best rel node cost if it's increased. Although this approach won't garantee an optimal plan, at least it makes sure the memo is consistent.

More details, please refer to JIRA - https://issues.apache.org/jira/browse/CALCITE-2166

… after calling RelSubset.propagateCostImprovements() for input RelNodes

It's possible that Subset's best cost increases when input subset's best is changed. In those cases, although input subset's cost is reduced, the row count can increase which causes the increase of non-cumulative cost of parent rel. As a result, the cost of parent rel can increase. If the parent rel happens to be the best rel of a given subset, we currently do nothing. And this would lead to the inconsistency of the rel node cost.

Fixing this by updating the best rel node cost if it's increased. Although this approach won't garantee an optimal plan, at least it makes sure the memo is consistent.

More details, please refer to JIRA - https://issues.apache.org/jira/browse/CALCITE-2166
@xndai xndai closed this Sep 4, 2019
@xndai xndai reopened this Sep 4, 2019
@xndai
Copy link
Contributor Author

xndai commented Sep 5, 2019

@vvysotskyi can you please take a look at this? The PR is opened based on our discussions in Jira.

@vvysotskyi
Copy link
Member

@xndai, I'll try to review it by the end of this week.

@hsyuan hsyuan closed this Sep 6, 2019
@hsyuan hsyuan reopened this Sep 6, 2019
Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xndai, changes look good, thanks for the fix!

}

// Make sure bsetCost is accurate
RelOptCost bestCost = getCost(subset.best, subset.best.getCluster().getMetadataQuery());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be bestCost

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix it.


// Best rel's cost is increased, we need to search for new best rel
if (rel == best && bestCost.isLt(cost)) {
updateBest = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need a decision rel == best ? We should always update the best rel and cost if we found any. The problem here is that when and how this fix is triggered, can you give more explain ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to update best when its own cost is increased. If this is any other rel node and its cost is larger than current best, we simply ignore (same behavior as today). I think I have explained the case when this could happen. Please check the descriptions and JIRA. Let me know if you have further question. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in this for loop, you have replaced the best rel right ? This replaced rel may still not be the best one, i didn't find the meaningness to replace it, if you want to fix the cost val, just update the cost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case do you think the replaced rel may still not be the best one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop is to guarantee the new best would be the one with cheapest cost in current sub set. It could be a different node if the cost of the original best has increased. As I mentioned in the JIRA, this approach doesn't guarantee an optimal result but does make sure the memo is consistent. To get optimal result, we either need to make sure RelSubset row count doesn't change (it shouldn't change in theory), or we have multiple best node candidates in a subset. But the change would be too drastic and the runtime overhead is non trivial.

Copy link
Member

@hsyuan hsyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@hsyuan hsyuan closed this in f95f74a Sep 13, 2019
wangxlong pushed a commit to wangxlong/calcite that referenced this pull request Feb 13, 2020
… after calling RelSubset.propagateCostImprovements() for input RelNodes (Xiening Dai)

It's possible that Subset's best cost increases when input subset's best is
changed. In those cases, although input subset's cost is reduced, the row count
can increase which causes the increase of non-cumulative cost of parent rel. As
a result, the cost of parent rel can increase. If the parent rel happens to be
the best rel of a given subset, we currently do nothing. And this would lead to
the inconsistency of the rel node cost.

Fixing this by updating the best rel node cost if it's increased. Although this
approach won't garantee an optimal plan, at least it makes sure the memo is
consistent.

More details, please refer to JIRA -
https://issues.apache.org/jira/browse/CALCITE-2166

Close apache#1440
hbtoo pushed a commit to hbtoo/calcite that referenced this pull request Oct 2, 2020
… after calling RelSubset.propagateCostImprovements() for input RelNodes (Xiening Dai)

It's possible that Subset's best cost increases when input subset's best is
changed. In those cases, although input subset's cost is reduced, the row count
can increase which causes the increase of non-cumulative cost of parent rel. As
a result, the cost of parent rel can increase. If the parent rel happens to be
the best rel of a given subset, we currently do nothing. And this would lead to
the inconsistency of the rel node cost.

Fixing this by updating the best rel node cost if it's increased. Although this
approach won't garantee an optimal plan, at least it makes sure the memo is
consistent.

More details, please refer to JIRA -
https://issues.apache.org/jira/browse/CALCITE-2166

Close apache#1440

Change-Id: I3202652be58f4ecb3b9d0e5fe52cb0458a9f5c86
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants