Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-2018] Queries failed with AssertionError: rel has lower cost… #552

Closed
wants to merge 3 commits into from

Conversation

vvysotskyi
Copy link
Member

@vvysotskyi vvysotskyi commented Oct 23, 2017

… than best cost of subset.

  • Replaced Map usage by Table for query metadata caching. It allows finding cached metadata query for concrete RelNode instance more easily.
  • Made changes to use single RelMetadataQuery instance during planning between the first call of RelOptCluster.getMetadataQuery() and RelOptCluster.invalidateMetadataQuery() call.
  • Cached metadata values are removed for current RelSubset and all parent RelNodes when RelSubset.propagateCostImprovements() is called and best rel node was changed.
  • Made changes in RelSet.mergeWith() method to call RelSubset.propagateCostImprovements() when resulting best is known instead of just assigning RelSubset.best.

if (otherSubset.bestCost.isLt(subset.bestCost)) {
subset.bestCost = otherSubset.bestCost;
subset.best = otherSubset.best;
changedSubsets.put(subset, otherSubset.best);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should something like max(changedSubsets.get(subset), otherSubset.best) be used here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good assumption, but when registering new rel nodes into this RelSet from otherSubset, the best cost for subset may be changed, and therefore otherSubset.best should be passed, since it is better according to the check above. Also, otherSubset is not changed and therefore there is no need to try to update its best cost.

lincoln-lil pushed a commit to lincoln-lil/calcite that referenced this pull request Feb 18, 2019
… than best cost of subset. (Inspired from pr: apache#552 which is committed by vvysotskyi).

Summary:
1.Replaced Map usage by Table for query metadata caching. It allows finding cached metadata query for concrete RelNode instance more easily.
2. Cached metadata values are removed for current RelSubset and all parent RelNodes when RelSubset.propagateCostImprovements() is called and best rel node was changed.
3. Made changes in RelSet.mergeWith() method to call RelSubset.propagateCostImprovements() when resulting best is known instead of just assigning RelSubset.best.
Note:
1. Does not import following points from pr because it will increase the lifetime of RelMetadataQuery.
Made changes to use single RelMetadataQuery instance during planning between the first call of RelOptCluster.getMetadataQuery() and RelOptCluster.invalidateMetadataQuery() call.

Test Plan: NO

Reviewers: 鲁尼, 晓令, 玉兆

Reviewed By: 晓令

Subscribers: P577102

Differential Revision: https://aone.alibaba-inc.com/code/D642962
RelSubset relSubset = subsetBestPair.getKey();
relSubset.propagateCostImprovements(
planner, mq, subsetBestPair.getValue(),
activeSet);
Copy link
Contributor

@danny0405 danny0405 Mar 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey, this is definitely needed. We should reCompute the best rel based on the new cost instead of replace it directly with the best rel in the merged RelSubSet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this is one of the key changes in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason of recomputing otherSubset.best? Do you think its input can be changed at this point? Can you please give a scenario? Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required for the case when this RelSet contains RelSubset which matches to the trait sets of otherSubset, so best may be changed (otherSubset.bestCost.isLt(subset.bestCost) check) and therefore parent rel nodes should be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Now I see why. It looks to me that if we move these codes to be after line 363 (renaming parentRel), then we won't need line 373 to line 378. What do you think?

@vvysotskyi vvysotskyi force-pushed the CALCITE-2018 branch 2 times, most recently from 78b4def to dfa8317 Compare April 16, 2019 10:30
@xndai
Copy link
Contributor

xndai commented Sep 9, 2019

Also I notice under debugger that RelNode tree can have more than one RelOptCluster. I haven't dig into that. But if this is the case, your #2 bullet point in your change description may not be true.

@zabetak
Copy link
Contributor

zabetak commented Sep 9, 2019

Hi @xndai if you have two clusters in the same RelNode tree then it sounds like a bug. There are various checks in the VolcanoPlanner to avoid this situation.

@xndai
Copy link
Contributor

xndai commented Sep 10, 2019

@zabetak @vvysotskyi I look closer and find two tests cases in JdbcTest where I see rel nodes have different RelOptCluster instances - testLateralJoin and testExpandViewWithLateralJoin. It looks like during parsing when there's view table and ViewTable.expandView() would create a new SqlToRelConverter instance which then would use a different RelOptCluster instance for rel nodes it creates.

Is this expected?

@zabetak
Copy link
Contributor

zabetak commented Sep 12, 2019

@xndai thanks for looking into this further. I kind of understand why it happens but I think it should not.

@xndai
Copy link
Contributor

xndai commented Sep 15, 2019

I would also suggest we turn on isValid() check by default in UT (instead of doing that only under Trace logger level). We would expect all these kinds of issues are gone with this fix (and 2116). Also to prevent any future regressions.

@xndai
Copy link
Contributor

xndai commented Sep 24, 2019

Based on this change, I made a couple of modifications, now the isValid() check passes for all unit tests. You can find at my changes at - https://github.com/xndai/calcite/tree/2018

The key thing I add is that when a RelNode is renamed and it is subset's best RelNode, we propagate change improvements since the subset's best cost could be changed. I also turn on isValid() check under assert.

Also with this change, we run into some CyclicMetadataException when accessing meta. Right now I just catch and ignore, similar to how it is handled in other parts of the code. But in the long run, we will need to figure out better way to handle cyclic.

@vvysotskyi @zabetak

@danny0405
Copy link
Contributor

Thanks @xndai for the additional patch, the CyclicMetadataException already says that exception is thrown for signaling purposes, so the stack trace is not really important. Also i rechecked the code and make sure that the fix of @vvysotskyi is the right way to go, we should consider:

  1. Propagate the cost when 2 RelSet are merged.
  2. Clear the cached metadata for parents when propagating.

@hsyuan
Copy link
Member

hsyuan commented Sep 28, 2019

But the stack trace is misleading if we reuse the exception instance. On the other hand, how much time will be wasted if we construct new CyclicMetadataException instance without reusing the singleton? I don't think it is significant.

@xndai
Copy link
Contributor

xndai commented Sep 29, 2019

I agree with Haisheng. The exception needs to be fixed. Currently we could get a wrong call stack info when reusing CyclicMetadataException. This is really confusing and creates a lot of pains during debugging.

vvysotskyi and others added 3 commits October 29, 2019 11:13
The change completes the fix of CALCITE-2018. Below are a few key additions -

1. When a Rel is renamed and it is subset's best RelNode, we need to propagate change improvement since the subset's best cost could be changed.
2. Add codes to handle CyclicMetadataException. Right now we just ignore it. We will revisit once we have figured out a better way to handle CyclicMetadataException.
3. Enable isValid() check under assert. Also check validation after RelSet merge.
@xndai
Copy link
Contributor

xndai commented Oct 29, 2019

LGTM

Copy link
Contributor

@danny0405 danny0405 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@danny0405 danny0405 added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Oct 30, 2019
Copy link
Member

@hsyuan hsyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@danny0405 danny0405 closed this in e0d869a Oct 30, 2019
@hsyuan
Copy link
Member

hsyuan commented Oct 30, 2019

@danny0405 You didn't squash commits?

@danny0405
Copy link
Contributor

Yes, i want to keep the credit of every committer.

@hsyuan
Copy link
Member

hsyuan commented Oct 30, 2019

You don't need to. There is a way to keep multiple authors, like 955d4ea

@danny0405
Copy link
Contributor

Thanks, but i want to follow the CALCITE-1935 style because this fix lasts so long time and so many discussions, split it into multiple commits make the fix more clear.

XuQianJin-Stars pushed a commit to XuQianJin-Stars/calcite that referenced this pull request Nov 17, 2019
… than best cost of subset

* Replaced Map usage by Table for query metadata caching. It allows finding cached metadata query for concrete RelNode instance more easily.
* Made changes to use single RelMetadataQuery instance during planning between the first call of RelOptCluster.getMetadataQuery() and RelOptCluster.invalidateMetadataQuery() call.
* Cached metadata values are removed for current RelSubset and all parent RelNodes when RelSubset.propagateCostImprovements() is called and best rel node was changed.
* Made changes in RelSet.mergeWith() method to call RelSubset.propagateCostImprovements() when resulting best is known instead of just assigning RelSubset.best.

close apache#552
XuQianJin-Stars pushed a commit to XuQianJin-Stars/calcite that referenced this pull request Nov 20, 2019
… than best cost of subset

* Replaced Map usage by Table for query metadata caching. It allows finding cached metadata query for concrete RelNode instance more easily.
* Made changes to use single RelMetadataQuery instance during planning between the first call of RelOptCluster.getMetadataQuery() and RelOptCluster.invalidateMetadataQuery() call.
* Cached metadata values are removed for current RelSubset and all parent RelNodes when RelSubset.propagateCostImprovements() is called and best rel node was changed.
* Made changes in RelSet.mergeWith() method to call RelSubset.propagateCostImprovements() when resulting best is known instead of just assigning RelSubset.best.

close apache#552
XuQianJin-Stars pushed a commit to XuQianJin-Stars/calcite that referenced this pull request Nov 20, 2019
… than best cost of subset

* Replaced Map usage by Table for query metadata caching. It allows finding cached metadata query for concrete RelNode instance more easily.
* Made changes to use single RelMetadataQuery instance during planning between the first call of RelOptCluster.getMetadataQuery() and RelOptCluster.invalidateMetadataQuery() call.
* Cached metadata values are removed for current RelSubset and all parent RelNodes when RelSubset.propagateCostImprovements() is called and best rel node was changed.
* Made changes in RelSet.mergeWith() method to call RelSubset.propagateCostImprovements() when resulting best is known instead of just assigning RelSubset.best.

close apache#552
wangxlong pushed a commit to wangxlong/calcite that referenced this pull request Feb 13, 2020
… than best cost of subset

* Replaced Map usage by Table for query metadata caching. It allows finding cached metadata query for concrete RelNode instance more easily.
* Made changes to use single RelMetadataQuery instance during planning between the first call of RelOptCluster.getMetadataQuery() and RelOptCluster.invalidateMetadataQuery() call.
* Cached metadata values are removed for current RelSubset and all parent RelNodes when RelSubset.propagateCostImprovements() is called and best rel node was changed.
* Made changes in RelSet.mergeWith() method to call RelSubset.propagateCostImprovements() when resulting best is known instead of just assigning RelSubset.best.

close apache#552
hbtoo pushed a commit to hbtoo/calcite that referenced this pull request Oct 2, 2020
… than best cost of subset

* Replaced Map usage by Table for query metadata caching. It allows finding cached metadata query for concrete RelNode instance more easily.
* Made changes to use single RelMetadataQuery instance during planning between the first call of RelOptCluster.getMetadataQuery() and RelOptCluster.invalidateMetadataQuery() call.
* Cached metadata values are removed for current RelSubset and all parent RelNodes when RelSubset.propagateCostImprovements() is called and best rel node was changed.
* Made changes in RelSet.mergeWith() method to call RelSubset.propagateCostImprovements() when resulting best is known instead of just assigning RelSubset.best.

close apache#552

Change-Id: I3590638395bde98699980b73a891aef441ec7167
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
6 participants