-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-26160: Materialized View rewrite does not check tables scanned in sub-query expressions #3229
Conversation
…n sub-query expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, left some small comments that may be worth addressing.
/** | ||
* Traverse the plan including sub-query expressions and collect table names from {@link TableScan} operators. | ||
*/ | ||
protected Set<TableName> getAllTablesUsed(RelNode plan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to distinguish if we are going to look in sub-queries or not? If yes can't simply add a boolean parameter with an intuitive name instead of adding a new method (e.g., includeSubqueries
, expandSubqueries
)?
*/ | ||
protected Set<TableName> getAllTablesUsed(RelNode plan) { | ||
Set<TableName> tablesUsed = new HashSet<>(); | ||
new HiveSubQueryVisitor() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating new visitors wouldn't make sense to apply the SubQueryRemoveRule
either inside this method or before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a good idea. Let's see how it works: #3246
import org.apache.calcite.rex.RexSubQuery; | ||
import org.apache.calcite.rex.RexVisitorImpl; | ||
|
||
public class HiveSubQueryVisitor extends RelVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this class? Can't we somehow exploit the SubQueryRemoveRule
? If we need this then probably we want to add some basic javadoc.
if (node instanceof Filter) { | ||
visit((Filter) node); | ||
} else if (node instanceof Project) { | ||
visit((Project) node); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to focus only on Filter/Project ? Why not subqueries in Join
or elsewhere? Can't we use the RelNode#accept(RexShuttle)
for more uniform access?
|
||
-- View can be used -> rewrite | ||
explain cbo | ||
select col0 from t2 where col0 in (select col0 from t1 where col0 = 1 union select col0 from t1 where col0 = 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to include also MV rewrite tests with subquery in project and join?
Closing this another solution was merged: #3246 |
What changes were proposed in this pull request?
Traverse the expressions in
Project
andFilter
operators in CBO plan when collecting tables used.Why are the changes needed?
For Materialized View rewrite based on exact sql text match Hive uses the initial CBO plan which may contains subquery expressions.
Does this PR introduce any user-facing change?
Yes. If a query plan was rewritten to scan an outdated MV the results can be different.
How was this patch tested?