
[SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type #14439

Closed
wants to merge 3 commits into from

Conversation

cloud-fan
Contributor

@cloud-fan cloud-fan commented Aug 1, 2016

What changes were proposed in this pull request?

Here is a table about the behaviours of array/map and greatest/least in Hive, MySQL and Postgres:

|    |Hive|MySQL|Postgres|
|---|---|---|---|
|`array`/`map`|can find a wider type with decimal type arguments, and will truncate the wider decimal type if necessary|can find a wider type with decimal type arguments, no truncation problem|can find a wider type with decimal type arguments, no truncation problem|
|`greatest`/`least`|can find a wider type with decimal type arguments, and truncate if necessary, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|

I think these behaviours make sense and Spark SQL should follow them.

This PR fixes `array` and `map` by using `findWiderCommonType` to get the wider type.
This PR fixes `greatest` and `least` by adding a `findWiderTypeWithoutStringPromotion`, which provides similar semantics to `findWiderCommonType`, but without string promotion.

How was this patch tested?

New tests in `TypeCoercionSuite`.
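Conceptually, the non-string-promoting widening described above can be sketched as follows. This is a self-contained toy model, not Spark's actual code: the type lattice, the 10-digit decimal stand-in for integers, and all names here are illustrative.

```scala
// Toy type lattice standing in for Spark's DataType hierarchy.
sealed trait DataType
case object IntType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType
final case class DecimalType(precision: Int, scale: Int) extends DataType

object WideningSketch {
  val MaxPrecision = 38

  // Wider of two decimals: keep the larger scale plus the larger
  // integral width, capping precision at 38.
  private def widerDecimal(a: DecimalType, b: DecimalType): DecimalType = {
    val scale = math.max(a.scale, b.scale)
    val integral = math.max(a.precision - a.scale, b.precision - b.scale)
    DecimalType(math.min(integral + scale, MaxPrecision), scale)
  }

  // Common type of two types, deliberately with NO promotion to string:
  // any pairing involving StringType fails.
  private def tightest(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y                              => Some(x)
    case (d1: DecimalType, d2: DecimalType)            => Some(widerDecimal(d1, d2))
    case (IntType, d: DecimalType)                     => Some(widerDecimal(DecimalType(10, 0), d))
    case (d: DecimalType, IntType)                     => Some(widerDecimal(d, DecimalType(10, 0)))
    case (IntType, DoubleType) | (DoubleType, IntType) => Some(DoubleType)
    case (_: DecimalType, DoubleType) | (DoubleType, _: DecimalType) => Some(DoubleType)
    case _                                             => None
  }

  // Fold the pairwise rule over the argument list.
  def findWiderTypeWithoutStringPromotion(types: Seq[DataType]): Option[DataType] =
    types.foldLeft(types.headOption)((acc, t) => acc.flatMap(tightest(_, t)))
}
```

With this sketch, a mixed list of decimals widens to a single decimal type, while any list containing a string yields no common type, mirroring the `greatest`/`least` behaviour in the table above.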

@cloud-fan
Contributor Author

This is a quick fix for both master and the 2.0 branch. After this we can adopt #14389 to make the master-branch code cleaner.

cc @petermaxlee @rxin @yhuai

@SparkQA

SparkQA commented Aug 1, 2016

Test build #63079 has finished for PR 14439 at commit 95f0866.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2016

Test build #63080 has finished for PR 14439 at commit d48590e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Aug 1, 2016

Let's be careful here. I am not sure we can just use DecimalPrecision.widerDecimalType, which produces Decimal(38, 38) when one decimal has the type Decimal(38, 0) and the other has the type Decimal(38, 38).
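To make the concern concrete, here is the arithmetic behind that example. This is a sketch of the widening rule, not Spark's actual DecimalPrecision code:

```scala
val MaxPrecision = 38

// Wider decimal type: the larger scale plus the larger integral width,
// with precision capped at 38 while the scale is kept as-is.
def widerDecimal(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val scale = math.max(s1, s2)
  val integral = math.max(p1 - s1, p2 - s2)
  (math.min(integral + scale, MaxPrecision), scale)
}

// Decimal(38, 0) and Decimal(38, 38): the ideal wider type is
// Decimal(76, 38), but capping the precision without reducing the
// scale yields Decimal(38, 38), leaving zero integral digits. That is
// the problem raised in this comment.
```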

@rxin
Contributor

rxin commented Aug 1, 2016

@yhuai do you have a concrete suggestion other than being careful here?

@yhuai
Contributor

yhuai commented Aug 1, 2016

It would be good to summarize the behaviors of other systems in the description. Let's also explain the behavioral change of this PR in the description, so others can understand its implications.

Also, for master, I am wondering if we can change the behavior of DecimalPrecision.widerDecimalType. Right now, widerDecimalType will truncate the integral part, which is not intuitive.

/**
* Similar to [[findWiderCommonType]], but can't promote to string.
*/
private def findWiderTypeWithoutStringPromotion(types: Seq[DataType]): Option[DataType] = {

The name findWiderTypeWithoutStringPromotion is a bit odd, given that findTightestCommonTypeOfTwo is used inside. Also, let's add more docs to this method.

@cloud-fan
Copy link
Contributor Author

@yhuai, I checked the decimal truncation logic in Hive: Hive truncates decimal(76, 38) to decimal(38, 0), which makes more sense than ours, as keeping the integral part makes the result more accurate.
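A sketch of the Hive-style adjustment described here (the name and exact rule are illustrative, not Hive's source): when the ideal wider type overflows the 38-digit cap, shrink the scale so the integral part survives.

```scala
val MaxPrecision = 38

// Truncate an over-wide decimal by sacrificing scale rather than
// integral digits.
def truncateKeepingIntegral(precision: Int, scale: Int): (Int, Int) =
  if (precision <= MaxPrecision) (precision, scale)
  else {
    val integral = precision - scale
    val newScale = math.max(MaxPrecision - integral, 0)
    (MaxPrecision, newScale)
  }

// decimal(76, 38) has 38 integral digits, so the scale drops all the
// way to 0 and the result is decimal(38, 0), matching the Hive
// behaviour mentioned above.
```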

@SparkQA

SparkQA commented Aug 2, 2016

Test build #63111 has finished for PR 14439 at commit 9def789.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

operator(Literal(1.0).cast(DoubleType)
:: Literal.create(null, DecimalType(10, 5)).cast(DoubleType)
:: Literal(1).cast(DoubleType)
:: Nil))

Seems this test does not cover the logic of handling decimal types having different precisions and scales.


I will push a new commit with two more tests.


Added

@yhuai
Contributor

yhuai commented Aug 3, 2016

@cloud-fan Thanks for the fix. The new logic looks good. I will merge it once jenkins passes.

@SparkQA

SparkQA commented Aug 3, 2016

Test build #63177 has finished for PR 14439 at commit 9f1e642.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Aug 3, 2016

OK. I am merging this PR to master and branch 2.0.

@asfgit asfgit closed this in b55f343 Aug 3, 2016
asfgit pushed a commit that referenced this pull request Aug 3, 2016
… type coercion should handle decimal type

## What changes were proposed in this pull request?

Here is a table about the behaviours of `array`/`map` and `greatest`/`least` in Hive, MySQL and Postgres:

|    |Hive|MySQL|Postgres|
|---|---|---|---|
|`array`/`map`|can find a wider type with decimal type arguments, and will truncate the wider decimal type if necessary|can find a wider type with decimal type arguments, no truncation problem|can find a wider type with decimal type arguments, no truncation problem|
|`greatest`/`least`|can find a wider type with decimal type arguments, and truncate if necessary, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|

I think these behaviours make sense and Spark SQL should follow them.

This PR fixes `array` and `map` by using `findWiderCommonType` to get the wider type.
This PR fixes `greatest` and `least` by adding a `findWiderTypeWithoutStringPromotion`, which provides similar semantics to `findWiderCommonType`, but without string promotion.

## How was this patch tested?

New tests in `TypeCoercionSuite`.

Author: Wenchen Fan <wenchen@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #14439 from cloud-fan/bug.

(cherry picked from commit b55f343)
Signed-off-by: Yin Huai <yhuai@databricks.com>