Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-26160: Materialized View rewrite does not check tables scanned in sub-query expressions #3229

Closed

Conversation

kasakrisz
Copy link
Contributor

What changes were proposed in this pull request?

Traverse the expressions in Project and Filter operators in CBO plan when collecting tables used.

Why are the changes needed?

For Materialized View rewrite based on exact sql text match Hive uses the initial CBO plan which may contains subquery expressions.

Does this PR introduce any user-facing change?

Yes. If a query plan was rewritten to scan an outdated MV the results can be different.

How was this patch tested?

mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=materialized_view_rewrite_by_text_9.q -pl itests/qtest -Pitests

@kasakrisz kasakrisz self-assigned this Apr 21, 2022
Copy link
Contributor

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, left some small comments that may be worth addressing.

/**
* Traverse the plan including sub-query expressions and collect table names from {@link TableScan} operators.
*/
protected Set<TableName> getAllTablesUsed(RelNode plan) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to distinguish if we are going to look in sub-queries or not? If yes can't simply add a boolean parameter with an intuitive name instead of adding a new method (e.g., includeSubqueries, expandSubqueries)?

*/
protected Set<TableName> getAllTablesUsed(RelNode plan) {
Set<TableName> tablesUsed = new HashSet<>();
new HiveSubQueryVisitor() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating new visitors wouldn't make sense to apply the SubQueryRemoveRule either inside this method or before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a good idea. Let's see how it works: #3246

import org.apache.calcite.rex.RexSubQuery;
import org.apache.calcite.rex.RexVisitorImpl;

public class HiveSubQueryVisitor extends RelVisitor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this class? Can't we somehow exploit the SubQueryRemoveRule? If we need this then probably we want to add some basic javadoc.

Comment on lines +33 to +38
if (node instanceof Filter) {
visit((Filter) node);
} else if (node instanceof Project) {
visit((Project) node);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to focus only on Filter/Project ? Why not subqueries in Join or elsewhere? Can't we use the RelNode#accept(RexShuttle) for more uniform access?


-- View can be used -> rewrite
explain cbo
select col0 from t2 where col0 in (select col0 from t1 where col0 = 1 union select col0 from t1 where col0 = 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to include also MV rewrite tests with subquery in project and join?

@kasakrisz
Copy link
Contributor Author

Closing this another solution was merged: #3246

@kasakrisz kasakrisz closed this Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants