-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12719][SQL] SQL generation support for generators, including UDTF #11563
Conversation
cc @liancheng @gatorsmile |
Can one of the admins verify this patch? |
} | ||
|
||
Generate(generator, join = true, outer = outer, Some(alias.toLowerCase), attributes, child) | ||
Generate( | ||
generator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4-space Indentations are needed here.
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Thanks !! In the doc, 4-space indentation is applicable for function declaration. Is it also applicable here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, you are right. This is part of #11538
@liancheng already reviewed it. Thanks!
Overall, LGTM. Thanks! |
@@ -445,4 +461,86 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils { | |||
"f1", "b[0].f1", "f1", "c[foo]", "d[0]" | |||
) | |||
} | |||
|
|||
test("SQL generation for generate") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is getting pretty long. i'd separate it into multiple logically separate test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin ok Reynold. I will group them into different test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin I have split the tests into 5 groups. Pl. let me know if it looks ok to you.
Why do we need two separate cases here? |
@rxin. Hi Reynold, We have two cases to handle. SELECT explode(array(1,2,3)) FROM src
SELECT gentab2.* FROM t1 LATERAL VIEW explode(array(array(1,2,3))) gentab1 AS gencol1 LATERAL VIEW explode(gentab1.gencol1) gentab2 AS gencol2 Currently, I handle the first case in Lateral view also can refer to columns from tables before itself. So i felt it is safer to However, I went with the approach in this PR as it didn't seem too complex and also retained the layout of the original SQL. I could be easily overlooking something here and would appreciate your guidance. Please let me know. |
"as we wanted to generate SQLs which is closer to the original SQL" Why is this a goal? I worry about the fragility of this two cases, if we really only need one to satisfy correctness. |
@rxin Thanks for the input. Let me try to work on it and see if i encounter any issues |
Hi @dilipbiswal , could you describe the pattern of a logical plan containing |
I think making generated SQL closer to the original one is not our goal, instead the goal should be: make your approach as simple as possible, so that it's easy to reason about, verify the correctness, and review the code. |
@cloud-fan Thanks..I understand now from Reynold's and your feedback. I wrongly assumed the goal to make the best effort to retain the original sql layout. Regarding the pattern of logical plans between the two cases here is
In the above plan, we have two LATERAL VIEW clauses with qualifier gentab1 and gentab2 respectively.
In the above case, its a generator in the projection list. There is a test which uses generator but has no table in the FROM clause like following. select explode(array(1,2,3)) AS gencol I suspect this may cause problem if we need to express the generator as a LATERAL VIEW. Please let me know if you need any other info. |
Closing this in favor of |
What changes were proposed in this pull request?
This PR is to convert SQL from analyzed logical plans containing Generate operator.
Sample Plan :
Generated Query:
How was this patch tested?
Added test cases in LogicalPlanToSQLSuite
Have also run HiveCompatibilitySuite to look for any failure in generation of Generate plans.
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)