HIVE-28321 Support select alias in the having clause for CBO #5294
RowResolver inputRR = relToHiveRR.get(srcRel);
for (ASTNode astNode : exprToAlias.keySet()) {
  if (inputRR.getExpression(astNode) != null) {
    inputRR.put("", exprToAlias.get(astNode), inputRR.getExpression(astNode));
I see that tableAlias is an empty string here. It is also an empty string in the non-CBO path, but I'm not sure whether there is a better way to deal with this. I see several places in the code base where we pass an empty string to this method.
I agree with you. I think the better way to handle this is to pass the tableAlias field instead of "" and let it fall back to the empty string when it is empty. Since that is a much broader change than this PR, I would prefer to address it separately. Let me know what you think.
That is fine with me :)
@@ -1,4 +1,5 @@
-- Test HAVING clause
Since we are supporting this through CBO, it would be nice to include the CBO plan for one of the queries below.
Also, it would be good to add a more complex test with multiple column aliases in the HAVING clause.
Adding more tests and EXPLAIN CBO plans is a good idea.
LGTM. Let's add the extra tests and the small suggested improvements, and then we can get this in.
Map<ASTNode, String> exprToAlias = qbPI.getAllExprToColumnAlias();
RowResolver inputRR = relToHiveRR.get(srcRel);
for (ASTNode astNode : exprToAlias.keySet()) {
  if (inputRR.getExpression(astNode) != null) {
    inputRR.put("", exprToAlias.get(astNode), inputRR.getExpression(astNode));
  }
}
This part basically modifies the RowResolver by adding some kind of reverse mappings. Moreover, this logic is duplicated in SemanticAnalyzer#genHavingPlan. I think it makes sense to refactor this snippet into a new method in the RowResolver class with a proper name (and a javadoc if necessary) that clarifies what the code is supposed to do. The only parameter of the method should be a Map<ASTNode, String>.
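A minimal sketch of the suggested refactoring, under stated assumptions: Expr, ColumnInfo and ToyRowResolver are hypothetical stand-ins for Hive's ASTNode, ColumnInfo and RowResolver, and addAliasesForKnownExpressions is a placeholder name, not the method that ended up in the PR. It moves the loop behind a single method whose only parameter is the expression-to-alias map, so both call sites could share it, and it looks up each expression once instead of twice.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: Expr, ColumnInfo and ToyRowResolver are simplified
 * stand-ins for Hive's ASTNode, ColumnInfo and RowResolver, not the real classes.
 */
public class AliasRefactorSketch {

  /** Stand-in for ASTNode: an expression identified by its text. */
  public record Expr(String text) {}

  /** Stand-in for ColumnInfo: only the internal column name. */
  public record ColumnInfo(String internalName) {}

  public static class ToyRowResolver {
    // tableAlias -> (columnAlias -> ColumnInfo)
    private final Map<String, Map<String, ColumnInfo>> byAlias = new HashMap<>();
    // expression -> ColumnInfo
    private final Map<Expr, ColumnInfo> byExpr = new HashMap<>();

    public void putExpression(Expr expr, ColumnInfo info) {
      byExpr.put(expr, info);
    }

    public ColumnInfo getExpression(Expr expr) {
      return byExpr.get(expr);
    }

    public void put(String tableAlias, String columnAlias, ColumnInfo info) {
      byAlias.computeIfAbsent(tableAlias, k -> new HashMap<>()).put(columnAlias, info);
    }

    public ColumnInfo get(String tableAlias, String columnAlias) {
      Map<String, ColumnInfo> columns = byAlias.get(tableAlias);
      return columns == null ? null : columns.get(columnAlias);
    }

    /**
     * Hypothetical name. For every expression this resolver already knows,
     * registers its SELECT alias as an additional name for the same column,
     * using an empty table alias as the original snippet does. Unknown
     * expressions are skipped.
     */
    public void addAliasesForKnownExpressions(Map<Expr, String> exprToAlias) {
      for (Map.Entry<Expr, String> entry : exprToAlias.entrySet()) {
        ColumnInfo info = getExpression(entry.getKey()); // single lookup per expression
        if (info != null) {
          put("", entry.getValue(), info);
        }
      }
    }
  }

  public static void main(String[] args) {
    ToyRowResolver rr = new ToyRowResolver();
    rr.putExpression(new Expr("count(value)"), new ColumnInfo("_col1"));

    Map<Expr, String> exprToAlias = new LinkedHashMap<>();
    exprToAlias.put(new Expr("count(value)"), "c");
    exprToAlias.put(new Expr("max(key)"), "m"); // not known to this resolver

    rr.addAliasesForKnownExpressions(exprToAlias);
    System.out.println(rr.get("", "c")); // ColumnInfo[internalName=_col1]
    System.out.println(rr.get("", "m")); // null
  }
}
```

In this toy model, aliases for expressions the resolver does not know (max(key) above) are silently skipped, mirroring the null check in the original loop.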
 * @return
 * @throws SemanticException
 */
public void putAggregateAlias(Map<ASTNode, String> exprToColumnAlias) throws SemanticException {
The name of the method is unnecessarily tied to aggregates and aliases, while the implementation seems much more generic than that. Given that we just call put for every entry in the map, a better name might be putAll, or maybe replaceAll, since we only perform the action if an entry is already there.
Will this logic really add an entry to the resolver, or does it just change the alias of a ColumnInfo that was registered previously? Depending on the answer, another potential name would be replaceAliases.
A well-chosen name indicates what the method really does without requiring a look at the implementation.
EXPLAIN CBO SELECT count(value) as c, max(key) as m from src GROUP BY key HAVING c > 3 and m > 0;
SELECT count(value) as c, max(key) as m from src GROUP BY key HAVING c > 3 and m > 0;
Since we are talking about max(key), does the m > 0 condition filter anything? If not, then changing it to m > 400 would be a safer choice.
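To illustrate the point about max(key): with GROUP BY key, every row in a group has the same key, so max(key) in that group is the key itself, and a predicate on the alias m is effectively a predicate on the grouping key. Whether m > 0 removes anything therefore depends only on which keys appear in the data. The sketch below evaluates the HAVING condition through the SELECT aliases c and m on hypothetical rows with integer keys; it does not use the real src table, and the rows are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class HavingAliasSketch {

  /** Hypothetical input row: (key, value). */
  public record Row(int key, String value) {}

  /** One output row of: SELECT count(value) AS c, max(key) AS m ... GROUP BY key. */
  public record Group(int key, long c, int m) {}

  public static List<Group> aggregate(List<Row> rows) {
    // TreeMap keeps the groups ordered by key.
    Map<Integer, List<Row>> byKey = rows.stream()
        .collect(Collectors.groupingBy(Row::key, TreeMap::new, Collectors.toList()));
    return byKey.entrySet().stream()
        .map(e -> new Group(
            e.getKey(),
            e.getValue().stream().filter(r -> r.value() != null).count(), // count(value) skips NULLs
            e.getValue().stream().mapToInt(Row::key).max().getAsInt()))   // max(key) within the group
        .collect(Collectors.toList());
  }

  /** HAVING c > minCount AND m > minMax, evaluated through the SELECT aliases. */
  public static List<Group> having(List<Group> groups, long minCount, int minMax) {
    return groups.stream()
        .filter(g -> g.c() > minCount && g.m() > minMax)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row(0, "a"), new Row(0, "b"), new Row(0, "c"), new Row(0, "d"),
        new Row(5, "a"), new Row(5, "b"), new Row(5, "c"), new Row(5, "d"),
        new Row(450, "a"), new Row(450, "b"), new Row(450, "c"), new Row(450, "d"));
    List<Group> groups = aggregate(rows);
    // Within each group, m (= max(key)) equals the grouping key.
    groups.forEach(g -> System.out.println(g.key() + " -> m=" + g.m()));
    System.out.println(having(groups, 3, 0).size());   // 2: m > 0 drops only key 0
    System.out.println(having(groups, 3, 400).size()); // 1: only key 450 survives
  }
}
```

With a threshold above most keys, such as the suggested m > 400, the condition on m is guaranteed to remove some groups whenever smaller keys are present, which makes the test exercise the alias in a visible way.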
Quality Gate passed
Thank you @soumyakanti3578 and @zabetak for the review
What changes were proposed in this pull request?
In the CBO path, the HAVING clause of a query should be able to see the aliases of the SELECT expressions, as is already the case in the non-CBO path.
Why are the changes needed?
To make the CBO and non-CBO paths behave the same way with respect to the syntax and semantics of the HAVING clause.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Unit tests, specifically: mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=limit_pushdown_negative.q