-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-17432: Enable join and aggregate materialized view rewriting #245
Conversation
b2338e3
to
a0ebb58
Compare
if (crtView.getSerde() == null) { | ||
if (storageHandler == null) { | ||
LOG.info("Default to LazySimpleSerDe for materialized view " + crtView.getViewName()); | ||
tbl.setSerializationLib(LazySimpleSerDe.class.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look safe to assume serialization to be LazySimple. My suggestion is to throw in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is similar to the 'create table' statement logic:
If user does not specifies a custom serde, we follow same logic. I will factor out the code in fact so the method is reused for both (and if default changes for tables, it changes for materialized views too).
tbl.setInputFormatClass(crtView.getInputFormat()); | ||
tbl.setOutputFormatClass(crtView.getOutputFormat()); | ||
if (crtView.getInputFormat() != null && !crtView.getInputFormat().isEmpty()) { | ||
tbl.getSd().setInputFormat(tbl.getInputFormatClass().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding a comment that we need to call setIF on both tbl and tbl.getSD()
List<String> parentTableQualifiedName = Lists.newArrayList(parentFullyQualifiedName); | ||
Table parentTab = null; | ||
try { | ||
parentTab = Hive.get().getTable(parentDatabaseName, parentTableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a cache for Table objects in SemanticAnalyzer::getTableObjectByName() We need to move that cache elsewhere and use it from places like this.
Could be a follow-up: Leave a TODO
ImmutableList.Builder<IntPair> keys = ImmutableList.builder(); | ||
for (ForeignKeyCol fkCol : fkCols) { | ||
int fkPos = -1; | ||
for (int i = 0; i < rowType.getFieldNames().size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (String fieldName : rowType.getFieldNames()) will be easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote it to make it more readable and avoid two counters, though I still needed at least one.
} | ||
int pkPos = -1; | ||
for (int i = 0; i < parentTab.getAllCols().size(); i++) { | ||
if (parentTab.getAllCols().get(i).getName().equals(fkCol.parentColName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be rewritten to make it easier to read.
@@ -1501,6 +1523,10 @@ public RelOptMaterialization apply(RelOptMaterialization materialization) { | |||
// Add view-based rewriting rules to planner | |||
planner.addRule(HiveMaterializedViewRule.INSTANCE_PROJECT_FILTER); | |||
planner.addRule(HiveMaterializedViewRule.INSTANCE_FILTER); | |||
planner.addRule(HiveMaterializedViewRule.INSTANCE_PROJECT_JOIN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding so many rules will slow down compilation. All these rules slightly vary in pattern they are looking for and roughly does same thing once they apply.
I suspect we have to create so many rules because Calcite's pattern matching on operator tree isnt powerful. We shall try to improve on this otherwise we traverse tree multiple times which could be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess that will depend on how the planner (Volcano in this case) handles internally the triggering of rules, we should aim at triggering them doing a single pass over the query plan.
Although currently Calcite has no way to indicate that a rule should not be triggered only once over a specific set of equivalent expressions, I think we should be safe on that end because we have only the MV rewriting specific rules in this group, thus they will only work over current plan.
SessionState.getSessionConf(), crtViewDesc.getStorageHandler()); | ||
} | ||
|
||
Class<? extends Deserializer> serdeClass = LazySimpleSerDe.class; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem like a good idea to use lazysimple if none is specified. Shall we throw in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is also similar to the one for the normal table descriptors, so we use the same strategy. I will consolidate the code here too.
a0ebb58
to
5c4f08e
Compare
@ashutoshc , I have addressed the comments in the first PR (sorry, I did not realized and I have pushed -f instead of creating a new commit on top of the old one). |
…sus Camacho Rodriguez, reviewed by Ashutosh Chauhan) Close apache#245
No description provided.