-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28409: Column lineage when creating view is missing if atlas HiveHook is set #5370
HIVE-28409: Column lineage when creating view is missing if atlas HiveHook is set #5370
Conversation
@@ -3865,6 +3865,8 @@ public static enum ConfVars { | |||
"get old behavior, if desired. See, test-case in patch for HIVE-6689."), | |||
HIVE_LINEAGE_INFO("hive.lineage.hook.info.enabled", false, | |||
"Whether Hive provides lineage information to hooks."), | |||
HIVE_LINEAGE_STATEMENT_FILTER("hive.lineage.statement.filter", "All", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we have a rule about it or not. But can the default value be lowercase?
I would love to avoid future confusion about if is that All, all or ALL.
Also, could you please put some description and and example values there that can help us to understand what that config value exactly does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comparison is not case sensitive to make this more felxible
if (ALL.equalsIgnoreCase(valueText)) {
if (NONE.equalsIgnoreCase(valueText)) {
HiveOperation enumValue = EnumUtils.getEnumIgnoreCase(HiveOperation.class, valueText);
I added an upper case default value because other values coming from HiveOperation are in upper case. There are other configs with upper case values so I think it is not an exception.
I also added the missing description.
@@ -6,7 +6,7 @@ | |||
* to you under the Apache License, Version 2.0 (the | |||
* "License"); you may not use this file except in compliance | |||
* with the License. You may obtain a copy of the License at | |||
* | |||
*f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accident typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -79,7 +79,7 @@ public void initialize(HiveConf hiveConf) { | |||
|| postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.PostExecutePrinter") | |||
|| postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.LineageLogger") | |||
|| postExecHooks.contains("org.apache.atlas.hive.hook.HiveHook")) { | |||
transformations.add(new Generator(postExecHooks)); | |||
transformations.add(Generator.fromConf(hiveConf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that before that change, we read the transformations from the postExecHooks that is as far I remember, calculated once. And from now, we read those again at every single query execution.
Am I correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
postExecHooks
also calculated at every query execution:
hive/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
Lines 75 to 77 in 09553fc
Set<String> postExecHooks = Sets.newHashSet( | |
Splitter.on(",").trimResults().omitEmptyStrings().split( | |
Strings.nullToEmpty(HiveConf.getVar(hiveConf, HiveConf.ConfVars.POST_EXEC_HOOKS)))); |
I think this should be addressed in a follow-up only if this is bottleneck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx
|
||
HiveOperation enumValue = EnumUtils.getEnumIgnoreCase(HiveOperation.class, valueText); | ||
if (enumValue == null) { | ||
throw new EnumConstantNotPresentException(HiveOperation.class, valueText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with that part: what will be the user experience if they add this value to the conf: "Hello" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exception is thrown at compile time.
Example:
java.lang.EnumConstantNotPresentException: org.apache.hadoop.hive.ql.plan.HiveOperation.Hello
at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.createFilterPredicateFromConf(Generator.java:106)
at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.fromConf(Generator.java:72)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.initialize(Optimizer.java:82)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13274)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx
operations.stream().anyMatch(hiveOperation -> filterMap.get(hiveOperation).apply(parseContext)); | ||
} | ||
|
||
private final Predicate<ParseContext> statementFilter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move that declaration and the constructor up so that the class can keep that pattern: fields -> constructors -> methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything before statementFilter
is static. Where should those static fields and methods go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really not add method declarations before field ones.
What about static fields, instance fields, static initialization block, constructor, public methods, private methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually move static members to the beginning of the file but I don't have strong opinion about this.
I changed the order as you suggested.
LOG.debug("Not evaluating lineage"); | ||
return pctx; | ||
} | ||
if (!statementFilter.test(pctx)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a fear that this change will break existing configs: Previously, we said that if the Atlas HiveHook is set, we will run that transformation. Now we are saying that if hive.lineage.statement.filter
is set, we do the transformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it breaks existing configs but previously Atlas HiveHook-related filters were hardcoded and impossible to test. Now hive.lineage.statement.filter
is independent of hooks and hence applies for any hook making it possible to test using LineageLogger
.
81c89e9
to
35a9c6a
Compare
35a9c6a
to
4fbab8e
Compare
"Whether Hive provides lineage information to hooks."), | ||
"Whether Hive provides lineage information to hooks." + | ||
"Deprecated: use hive.lineage.statement.filter instead."), | ||
HIVE_LINEAGE_STATEMENT_FILTER("hive.lineage.statement.filter", "ALL", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using NONE as default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NONE means that this functionality is never triggered. IMHO all functionality should be tested otherwise follow-up changes in the project may make it impossible to turn it on in the future. We have experienced this in the past a few times.
0e66ae2
to
b8e80c7
Compare
ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java
Outdated
Show resolved
Hide resolved
ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
b8e80c7
to
346f7ab
Compare
LOAD(parseContext -> !(parseContext.getLoadTableWork() == null || parseContext.getLoadTableWork().isEmpty())), | ||
QUERY(parseContext -> parseContext.getQueryProperties().isQuery()), | ||
ALL(parseContext -> true), | ||
NONE(parseContext -> false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still think that NONE is redundant, but I am fine to keep it
|
What changes were proposed in this pull request?
isView
property toQueryProperties
to indicate that this statement is a logical view creation.Why are the changes needed?
Lineage information was not generated in case of create view operation. See jira for details: https://issues.apache.org/jira/browse/HIVE-28409
Does this PR introduce any user-facing change?
No.
Is the change a dependency upgrade?
No.
How was this patch tested?