-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type #14439
Conversation
This is a quick fix for both master and 2.0 branch. After this we can adopt #14389 to make the code of master branch cleaner. |
Test build #63079 has finished for PR 14439 at commit
|
Test build #63080 has finished for PR 14439 at commit
|
Let's be careful at here. I am not sure we can just use |
@yhuai do you have a concrete suggestion other than being careful here? |
It will be good to summarize the behaviors of other systems in the description. Let's also explain the behavioral change of this pr in the description. So, others can understand its implication. Also, for master, I am wondering if we can change the behavior of |
/** | ||
* Similar to [[findWiderCommonType]], but can't promote to string. | ||
*/ | ||
private def findWiderTypeWithoutStringPromotion(types: Seq[DataType]): Option[DataType] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is weird that its name is findWiderTypeWithoutStringPromotion
because findTightestCommonTypeOfTwo
is used inside. Also, let's add more docs to this method.
@yhuai , I checked the decimal truncation logic in hive, hive will truncate |
Test build #63111 has finished for PR 14439 at commit
|
operator(Literal(1.0).cast(DoubleType) | ||
:: Literal.create(null, DecimalType(10, 5)).cast(DoubleType) | ||
:: Literal(1).cast(DoubleType) | ||
:: Nil)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this test does not cover the logic of handling decimal types having different precisions and scales.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will push a new commit with two more tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@cloud-fan Thanks for the fix. The new logic looks good. I will merge it once jenkins passes. |
Test build #63177 has finished for PR 14439 at commit
|
OK. I am merging this PR to master and branch 2.0. |
… type coercion should handle decimal type ## What changes were proposed in this pull request? Here is a table about the behaviours of `array`/`map` and `greatest`/`least` in Hive, MySQL and Postgres: | |Hive|MySQL|Postgres| |---|---|---|---|---| |`array`/`map`|can find a wider type with decimal type arguments, and will truncate the wider decimal type if necessary|can find a wider type with decimal type arguments, no truncation problem|can find a wider type with decimal type arguments, no truncation problem| |`greatest`/`least`|can find a wider type with decimal type arguments, and truncate if necessary, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion| I think these behaviours makes sense and Spark SQL should follow them. This PR fixes `array` and `map` by using `findWiderCommonType` to get the wider type. This PR fixes `greatest` and `least` by add a `findWiderTypeWithoutStringPromotion`, which provides similar semantic of `findWiderCommonType`, but without string promotion. ## How was this patch tested? new tests in `TypeCoersionSuite` Author: Wenchen Fan <wenchen@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #14439 from cloud-fan/bug. (cherry picked from commit b55f343) Signed-off-by: Yin Huai <yhuai@databricks.com>
What changes were proposed in this pull request?
Here is a table about the behaviours of
array
/map
andgreatest
/least
in Hive, MySQL and Postgres:array
/map
greatest
/least
I think these behaviours makes sense and Spark SQL should follow them.
This PR fixes
array
andmap
by usingfindWiderCommonType
to get the wider type.This PR fixes
greatest
andleast
by add afindWiderTypeWithoutStringPromotion
, which provides similar semantic offindWiderCommonType
, but without string promotion.How was this patch tested?
new tests in
TypeCoersionSuite