[SPARK-13235] [SQL] Removed an Extra Distinct from the Plan when Using Union in SQL #11120
Test build #50938 has finished for PR 11120 at commit
This change is on the parser. @hvanhovell Could you please take a look? Thanks!
@gatorsmile This looks pretty solid. Could you add a test for this to
@@ -2358,34 +2358,8 @@ setOpSelectStatement[CommonTree t, boolean topLevel]
u=setOperator LPAREN b=simpleSelectStatement RPAREN
|
u=setOperator b=simpleSelectStatement)
-> {$setOpSelectStatement.tree != null && $u.tree.getType()==SparkSqlParser.TOK_UNIONDISTINCT}?
Yes, this is redundant. Originally TOK_UNIONALL was used here, so an additional distinct was necessary. Now that we use TOK_UNIONDISTINCT, we can skip it.
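To illustrate why the second distinct is redundant, here is a minimal sketch using a toy plan model. The classes `Relation`, `UnionAll`, and `Distinct` and the helpers `evaluate` and `count_distinct` are hypothetical names made up for this example; they are not Spark's logical plan classes or API. Under that assumption, it shows that wrapping an already-deduplicated union in another `Distinct` does not change the result:

```python
from dataclasses import dataclass
from typing import Tuple

Row = Tuple[int, ...]


# Toy plan nodes (hypothetical, for illustration only; not Spark classes).
@dataclass(frozen=True)
class Relation:
    rows: Tuple[Row, ...]


@dataclass(frozen=True)
class UnionAll:
    left: object
    right: object


@dataclass(frozen=True)
class Distinct:
    child: object


def evaluate(plan):
    """Evaluate a toy plan into a list of rows."""
    if isinstance(plan, Relation):
        return list(plan.rows)
    if isinstance(plan, UnionAll):
        # UNION ALL keeps duplicates from both sides.
        return evaluate(plan.left) + evaluate(plan.right)
    if isinstance(plan, Distinct):
        # dict.fromkeys drops duplicates and keeps first-seen order.
        return list(dict.fromkeys(evaluate(plan.child)))
    raise TypeError(f"unknown plan node: {plan!r}")


def count_distinct(plan):
    """Count Distinct nodes in a toy plan."""
    if isinstance(plan, Distinct):
        return 1 + count_distinct(plan.child)
    if isinstance(plan, UnionAll):
        return count_distinct(plan.left) + count_distinct(plan.right)
    return 0


a = Relation(((1,), (2,), (2,)))
b = Relation(((2,), (3,)))

# Shape described for the plan before the fix: two stacked Distinct nodes.
before = Distinct(Distinct(UnionAll(a, b)))
# Shape described for the plan after the fix: a single Distinct node.
after = Distinct(UnionAll(a, b))

print(count_distinct(before), count_distinct(after))
print(evaluate(before) == evaluate(after))
```

Because deduplication is idempotent, both toy plans evaluate to the same rows; the outer `Distinct` only adds work.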
@hvanhovell @viirya Thank you for your reviews! I just added two test cases for this, as suggested by @hvanhovell. Yes, I did check the original Hive JIRA: https://issues.apache.org/jira/browse/HIVE-9039 . The reason their parser added this is that they do not add another
Test build #50975 has finished for PR 11120 at commit
LGTM
LGTM
Merging to master. Thanks!
Currently, the parser adds two Distinct operators to the plan if we use Union or Union Distinct in SQL. This PR removes the extra Distinct from the plan.

For example, before the fix, the following query has a plan with two Distinct operators.

After the fix, the plan no longer contains the extra Distinct, as follows: