HIVE-24167: TPC-DS query 14 fails while generating plan for the filter #5077
I added some attributes that could differentiate two HiveTableScans. We may not need these ones here.
There is one potential problem in my mind. HiveTableScan doesn't retain the equivalents of `TableScanDesc#getPredicateString` or `TableScanDesc#getRowLimit`. So, we can't unify ASTNodes or RelNodes of HiveTableScan based on signatures. Otherwise, Operators will be over-unified later.
Currently, we link only HiveFilter RelNodes using RelTreeSignatures (we link HiveTableScan with its ASTNode, but don't link it with the signature). So, the existence of `TableScanDesc#getPredicateString` doesn't matter to us.
Also, CBO is disabled when TABLESAMPLE is used. So, we can say `TableScanDesc#getRowLimit` doesn't cause an issue in the world of RelNodes. I expect we could push down the row count to HiveTableScan when we support TABLESAMPLE in CBO.
So, in my current understanding, this PR doesn't cause an immediate problem.
We link RelTreeSignature first so that we can safely unify multiple filters.
cool! :D
By setting this property the test fails with the same `RuntimeException: equivalence mapping violation`.
I'm checking the whole execution and what the flag does. When `hive.query.planmapper.link.relnodes` is disabled, no aux signature is created, and no merge across Operators happens. I feel that is not consistent with the description, "Whether to link Calcite nodes to runtime statistics".
I guess the direct problem will be resolved if we skip linking RelNodes with Operators when `hive.query.planmapper.link.relnodes=false` is configured. That sounds more consistent with the description.
I'm putting my additional notes here. I tried to write down some to-be solutions, but implementing them may not be very easy.
https://docs.google.com/document/d/1LCST23cSBZglBzjhnCqHlpcLv6xrXRBxU_wszSDAk9w/edit?usp=sharing
I guess `hive.query.planmapper.link.relnodes` can be implemented like this. @kgyrtkirk might have some more context.
okumin@cdab2c1
I found this fails if we disable `hive.query.planmapper.link.relnodes`.
@zabetak As a potential workaround, I'm wondering if it makes sense to relax the validation and disable the features that PlanMapper is involved in when "equivalence mapping violation" happens.
I presume PlanMapper succeeds in 99.9% of cases, as most qtests succeed. Currently, queries in the remaining 0.1% fail, which sounds like an excessive penalty. We can keep the validation for qtests so that we can minimize the risk of degradation.
My feeling is that it could be possible to decrease the 0.1% to 0.05%, but it could be tough to achieve 0% with a single patch.
okumin@dd5be15
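As a rough illustration of the "relax the validation" idea, the toy sketch below shows a mapper that throws in strict mode (as one might want for qtests) and otherwise just marks itself unusable, so that dependent features can be skipped. It is not the content of the commit above and does not use Hive's PlanMapper API: `LenientToyMapper`, `link`, and `isUsable` are hypothetical names, and the violation rule (linking two objects that already belong to different groups) is a simplifying assumption.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical toy mapper; this is NOT Hive's PlanMapper.
final class LenientToyMapper {
  private final boolean strict;
  // Maps each object (by identity) to the id of its equivalence group.
  private final Map<Object, Integer> groupOf = new IdentityHashMap<>();
  private int nextGroup = 0;
  private boolean usable = true;

  LenientToyMapper(boolean strict) {
    this.strict = strict;
  }

  // Links two objects into one group. In this toy model, a violation is an
  // attempt to link objects that already belong to different groups.
  void link(Object a, Object b) {
    if (!usable) {
      return; // mapping already abandoned; ignore further links
    }
    Integer ga = groupOf.get(a);
    Integer gb = groupOf.get(b);
    if (ga != null && gb != null && !ga.equals(gb)) {
      if (strict) {
        throw new RuntimeException("equivalence mapping violation");
      }
      // Lenient mode: give up on the mapping instead of failing the caller.
      usable = false;
      groupOf.clear();
      return;
    }
    Integer group = ga != null ? ga : (gb != null ? gb : Integer.valueOf(nextGroup++));
    groupOf.put(a, group);
    groupOf.put(b, group);
  }

  // Callers would check this before using any feature that depends on the mapping.
  boolean isUsable() {
    return usable;
  }
}
```

In this sketch a caller would construct the mapper with `strict = true` in tests and `strict = false` otherwise, then consult `isUsable()` before relying on the mapping.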