[SPARK-16947][SQL] Improve type coercion for inline tables. #14539
@hvanhovell, in case of a validation error, what will the error message look like? Will it mention inline tables in any way, or will it just complain about Union's requirements?
Test build #63358 has finished for PR 14539 at commit
@eyalfa It is currently going to complain about Union (if type coercion fails).
@hvanhovell Don't you think that, since you're already taking a stab at this, it would be better to introduce something like an UnresolvedInlineTable with its own resolution logic and type coercion? Once it's resolved, it can be transformed into a Union of simple Projects.
@eyalfa I am a bit hesitant to add yet another almost pointless … I do think your point has merit and that errors should be as concise as possible, but (if we were to change this) I would rather add some sort of alias that encodes this information, or just add a flag to Union.
Fair enough. I think it's worth adding a negative test just to see what we're dealing with.
Test build #63360 has finished for PR 14539 at commit
Test build #63366 has finished for PR 14539 at commit
Can we create a common function used by both Union and this? Doing this via Union seems to produce a pretty complicated plan.
Test build #63378 has finished for PR 14539 at commit
val numExpectedColumns = rows.head.size
val aliases = if (ctx.identifierList != null) {
val names = visitIdentifierList(ctx.identifierList)
assert(names.size == numExpectedColumns, |
This is an error case users can hit; should we throw a ParseException instead of using assert?
It uses a parser-only version of assert that throws a ParseException: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala#L81
Come to think of it, we might need to rename it because people expect that assert calls can be elided. That is for a different PR though.
+1
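The rename suggested above rests on a Scala detail: in Scala 2, Predef.assert is annotated @elidable, so it can be compiled away (for example with -Xelide-below), whereas a parser check must always run. A minimal sketch of an always-on validation helper follows; SketchParseException, ParserValidation.validate and AliasCheck are hypothetical names for illustration, not Spark's API.

```scala
// Hypothetical stand-in for Spark's ParseException (the real one also carries
// the position of the offending token in the SQL text).
final class SketchParseException(message: String) extends Exception(message)

object ParserValidation {
  // Unlike Predef.assert, which Scala 2 marks @elidable so that it can be
  // compiled away, this check always runs and always raises a parse error.
  def validate(condition: => Boolean, message: => String): Unit =
    if (!condition) throw new SketchParseException(message)
}

object AliasCheck {
  // Hypothetical check mirroring the snippet above: the alias list of an
  // inline table must name exactly one column per value in a row.
  def checkAliases(names: Seq[String], numExpectedColumns: Int): Unit =
    ParserValidation.validate(
      names.size == numExpectedColumns,
      s"Number of aliases (${names.size}) does not match " +
        s"number of columns ($numExpectedColumns)")
}
```

A name such as validate makes it clear that the call is part of the parser's error handling rather than a debugging assertion.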
* another LocalRelation.
*
* This is relatively simple as it currently handles only a single case: Project.
* Converts local operations (i.e. ones that don't require data exchange) on LocalRelation or
Update comment.
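The comment under review describes ConvertToLocalRelation as converting local operations (ones that need no data exchange) on a LocalRelation into another LocalRelation. A toy model of the simplest case, a Project over local rows, might look like the following; Expr, ColRef, Lit, AddInts, LocalRel, Proj and ConvertToLocalSketch are hypothetical simplifications, not Catalyst's classes.

```scala
// Hypothetical, heavily simplified stand-ins for Catalyst expressions.
sealed trait Expr { def eval(row: Seq[Any]): Any }
final case class ColRef(ordinal: Int) extends Expr {
  def eval(row: Seq[Any]): Any = row(ordinal)
}
final case class Lit(value: Any) extends Expr {
  def eval(row: Seq[Any]): Any = value
}
final case class AddInts(left: Expr, right: Expr) extends Expr {
  def eval(row: Seq[Any]): Any =
    left.eval(row).asInstanceOf[Int] + right.eval(row).asInstanceOf[Int]
}

// Hypothetical stand-ins for LocalRelation and Project.
sealed trait Plan
final case class LocalRel(rows: Seq[Seq[Any]]) extends Plan
final case class Proj(exprs: Seq[Expr], child: Plan) extends Plan

object ConvertToLocalSketch {
  // Evaluates a projection eagerly when its input is local data, producing a
  // new local relation. Nested projections are collapsed bottom-up.
  def apply(plan: Plan): Plan = plan match {
    case Proj(exprs, child) =>
      apply(child) match {
        case LocalRel(rows) => LocalRel(rows.map(row => exprs.map(_.eval(row))))
        case other          => Proj(exprs, other)
      }
    case other => other
  }
}
```

Because all rows are held in the plan itself, the projection can be evaluated during optimization, which is what allows an inline table to end up as a plain local relation.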
@hvanhovell, I may be missing something, but why do we create this new
Test build #63442 has finished for PR 14539 at commit
@cloud-fan I had an offline discussion with @rxin about this. His main point was that a large inline table would create an extremely unreadable plan, so I came up with this.
@hvanhovell How about we make
@hvanhovell Do you mind me taking a look at this? I am running into an issue in which I cannot use the array() function to construct an array in inline tables (only literals are allowed). I can try to fix the type coercion issue there too.
@petermaxlee Sure, go ahead!
// Create expressions.
val rows = ctx.expression.asScala.map { e => | ||
expression(e) match { | ||
case CreateStruct(children) => children |
@hvanhovell What's this about? Why do we need to expand the struct?
Actually I think I understand what's happening here now.
The parser creates rows by issuing CreateStruct commands. InlineTable takes a Seq[Expression] per row, so we need to extract the children from the CreateStruct.
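A minimal sketch of that unpacking step, using hypothetical stand-ins for Catalyst's Expression, Literal and CreateStruct. The fallback that wraps a non-struct expression as a one-column row is an assumption added for illustration; the snippet under review is truncated after the CreateStruct case.

```scala
// Hypothetical stand-ins for Catalyst's Expression, Literal and CreateStruct.
sealed trait Expression
final case class Literal(value: Any) extends Expression
final case class CreateStruct(children: Seq[Expression]) extends Expression

object InlineRows {
  // The parser hands us one expression per row. A multi-value row such as
  // (1, 'A') arrives as a CreateStruct, so we unpack its children to get the
  // Seq[Expression] that the inline table expects for that row.
  def toRows(parsed: Seq[Expression]): Seq[Seq[Expression]] =
    parsed.map {
      case CreateStruct(children) => children
      // Assumption for illustration: any other expression is a one-column row.
      case single => Seq(single)
    }
}
```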
How do we specify …? Is that supported?
@gatorsmile We should support this, but you might have to add an explicit cast.
Closing in favor of #14676.
What changes were proposed in this pull request?
Inline tables were added to Spark SQL in 2.0, e.g.:
select * from values (1, 'A'), (2, 'B') as tbl(a, b)
This is currently implemented using a LocalRelation, and this relation is created during parsing. This has a weakness: type coercion is based on the first row in the relation, and all subsequent values are cast into this type. The latter violates the principle of least surprise.
This PR fixes this by creating a dedicated InlineTable node. Type coercion now follows the rules for Union, which is similar to how other systems such as PostgreSQL behave. In order to retain optimal speed, I have extended the ConvertToLocalRelation rule, which makes sure the table gets rewritten into a LocalRelation during optimization.
The following SQL statement:
... now yields the following plan:
... and the following result:
How was this patch tested?
I have updated the PlanParserSuite to test the parser's new output, and I have added tests to the ConvertToLocalRelationSuite to test whether inline-table-like structures are converted into LocalRelations. I still need to add tests for the TypeCoercion and CheckAnalysis rules.
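The behaviour that such TypeCoercion tests would need to pin down can be sketched with a toy type model: under the old approach the first row fixes each column's type, while Union-style coercion widens each column to a type that fits every row. The types SqlInt, SqlLong and SqlDouble and the InlineTableCoercion object are hypothetical, and the widening here only ranks a few numeric types; Spark's real rules also cover decimals, strings and nulls.

```scala
// Hypothetical toy type model: a few numeric types ranked by width.
sealed abstract class SqlType(val rank: Int)
case object SqlInt extends SqlType(0)
case object SqlLong extends SqlType(1)
case object SqlDouble extends SqlType(2)

object InlineTableCoercion {
  // Old behaviour: the first row fixes the column types and later values are
  // cast to them, even if that means narrowing.
  def firstRowTypes(rows: Seq[Seq[SqlType]]): Seq[SqlType] = rows.head

  // Union-style behaviour: each column gets the widest type found in any row.
  // transpose turns rows into columns (it requires equally long rows).
  def widenedTypes(rows: Seq[Seq[SqlType]]): Seq[SqlType] =
    rows.transpose.map(column => column.maxBy(_.rank))
}
```

For rows typed (int, double) and (long, int), first-row coercion keeps (int, double) and would force the long value into an int column, whereas widening yields (long, double).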