[SPARK-16730][SQL] Implement function aliases for type casts #14364
@cloud-fan this is an alternative implementation using FunctionRegistry. Let me know if you prefer this one over #14362.
    expression[BitwiseXor]("^"),

    // Cast aliases (SPARK-16730)
    castAlias("boolean", BooleanType),
In Hive, if users create a UDF called `boolean`, will Hive throw an exception or override the type-casting one?
`boolean` is just a normal function in Hive (the same as, for example, `acos`), so it would behave the way any normal function does.
Test build #62872 has finished for PR 14364.
Test build #62902 has finished for PR 14364.
    castAlias("tinyint", ByteType),
    castAlias("smallint", ShortType),
    castAlias("int", IntegerType),
    castAlias("bigint", LongType),
Using `LongType.simpleString` instead of `bigint` looks better. Same for the others.
I think that's actually worse, because it makes it less clear what the function name is by looking at this source file. Also if for some reason we change LongType.simpleString in the future, these functions will subtly break.
ok agree
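The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration, not Spark code: `TypeInfo`, `derivedRegistry`, and `literalRegistry` are hypothetical names, and the registry is reduced to a set of function names.

```scala
object AliasNameSketch {
  // Hypothetical stand-in for a data type whose display name could change between releases.
  final case class TypeInfo(simpleString: String)

  // Function names derived from the type's display string: they follow any rename.
  def derivedRegistry(longType: TypeInfo): Set[String] = Set(longType.simpleString)

  // Function names written as literals: visible in the source file and stable across renames.
  val literalRegistry: Set[String] = Set("bigint")
}
```

If the display string ever changed, the derived registry would stop resolving `bigint` without any edit to the registry source, which is the subtle breakage the comment above warns about.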
Mostly LGTM, thanks for working on it!
Test build #62909 has finished for PR 14364.
    }

    /**
     * Creates a function lookup registry for cast aliases (SPARK-16730).
NIT: `...function lookup registry...` should that be `...function registry lookup entry...` or something similar?
A few minor comments. LGTM otherwise.
I've updated the pull request based on @hvanhovell's comment.
Test build #62928 has finished for PR 14364.
      }
      Cast(args.head, dataType)
    }
    (name, (expressionInfo[Cast](name), builder))
So whatever cast function we describe, we will always show `Cast`'s description, right? Is it the same in Hive?
Yes - this is a limitation. That's not what Hive does because Hive actually does not have a single cast expression. It has a cast expression for each target type. I think it's a pretty small detail and fixing it would require a lot of work.
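To make the shape of the reviewed helper and the limitation above concrete, here is a self-contained sketch. The types below are simplified, hypothetical stand-ins rather than Catalyst's real classes, and the arity check (including its exception type and message) is an assumption, since the diff excerpt only shows the tail of the builder.

```scala
object CastAliasSketch {
  // Hypothetical, simplified stand-ins for Catalyst types; not Spark's real classes.
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  sealed trait Expression
  final case class Literal(value: Any) extends Expression
  final case class Cast(child: Expression, dataType: DataType) extends Expression

  // Metadata shown when a function is described; reduced to two fields here.
  final case class ExpressionInfo(className: String, name: String)

  type FunctionBuilder = Seq[Expression] => Expression

  // Builds a registry entry that rewrites name(arg) into Cast(arg, dataType).
  def castAlias(name: String, dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = {
    val builder: FunctionBuilder = args => {
      // Assumed arity check; the exception type and message are illustrative only.
      if (args.size != 1) {
        throw new IllegalArgumentException(s"Function $name accepts only one argument")
      }
      Cast(args.head, dataType)
    }
    // Every alias reuses the metadata of the single Cast expression.
    (name, (ExpressionInfo("Cast", name), builder))
  }

  val registry: Map[String, (ExpressionInfo, FunctionBuilder)] = Map(
    castAlias("int", IntegerType),
    castAlias("string", StringType)
  )
}
```

In this sketch, looking up `int` and `string` yields different builders but metadata that both point at `Cast`, which mirrors why describing any alias shows the description of `Cast`.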
## What changes were proposed in this pull request?

Spark 1.x supports using the Hive type name as function names for doing casts, e.g.

```sql
SELECT int(1.0);
SELECT string(2.0);
```

The above queries would work in Spark 1.x because Spark 1.x fell back to Hive for unimplemented functions, and break in Spark 2.0 because the fallback was removed.

This patch implements function aliases using an analyzer rule for the following cast functions:

- boolean
- tinyint
- smallint
- int
- bigint
- float
- double
- decimal
- date
- timestamp
- binary
- string

## How was this patch tested?

Added end-to-end tests in SQLCompatibilityFunctionSuite.

Author: petermaxlee <petermaxlee@gmail.com>

Closes apache#14364 from petermaxlee/SPARK-16730-2.
The same commit message, with the additional trailer:
(cherry picked from commit 11d427c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks, merging to master and 2.0!
Great. Thanks for merging.
What changes were proposed in this pull request?
Spark 1.x supports using the Hive type name as function names for doing casts, e.g. `SELECT int(1.0);` and `SELECT string(2.0);`.
The above queries would work in Spark 1.x because Spark 1.x fell back to Hive for unimplemented functions, and break in Spark 2.0 because the fallback was removed.
This patch implements function aliases using an analyzer rule for the following cast functions:
- boolean
- tinyint
- smallint
- int
- bigint
- float
- double
- decimal
- date
- timestamp
- binary
- string
How was this patch tested?
Added end-to-end tests in SQLCompatibilityFunctionSuite.