
[SPARK-16714][SPARK-16735][SPARK-16646] array, map, greatest, least's type coercion should handle decimal type #14439

Closed
wants to merge 3 commits into from

Conversation

cloud-fan
Contributor

@cloud-fan cloud-fan commented Aug 1, 2016

What changes were proposed in this pull request?

Here is a table about the behaviours of array/map and greatest/least in Hive, MySQL and Postgres:

|    |Hive|MySQL|Postgres|
|---|---|---|---|
|`array`/`map`|can find a wider type with decimal type arguments, and will truncate the wider decimal type if necessary|can find a wider type with decimal type arguments, no truncation problem|can find a wider type with decimal type arguments, no truncation problem|
|`greatest`/`least`|can find a wider type with decimal type arguments, and truncate if necessary, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|

I think these behaviours make sense and Spark SQL should follow them.

This PR fixes `array` and `map` by using `findWiderCommonType` to get the wider type.
This PR fixes `greatest` and `least` by adding a `findWiderTypeWithoutStringPromotion`, which provides similar semantics to `findWiderCommonType`, but without string promotion.

How was this patch tested?

New tests in `TypeCoercionSuite`.
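Conceptually, the non-string-promoting widening described above can be sketched as follows. This is a self-contained toy model, not Spark's actual code: the type lattice, the 10-digit decimal stand-in for integers, and all names here are illustrative.

```scala
// Toy type lattice standing in for Spark's DataType hierarchy.
sealed trait DataType
case object IntType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType
final case class DecimalType(precision: Int, scale: Int) extends DataType

object WideningSketch {
  val MaxPrecision = 38

  // Wider of two decimals: keep the larger scale plus the larger
  // integral width, capping precision at 38.
  private def widerDecimal(a: DecimalType, b: DecimalType): DecimalType = {
    val scale = math.max(a.scale, b.scale)
    val integral = math.max(a.precision - a.scale, b.precision - b.scale)
    DecimalType(math.min(integral + scale, MaxPrecision), scale)
  }

  // Common type of two types, deliberately with NO promotion to string:
  // any pairing involving StringType fails.
  private def tightest(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y                              => Some(x)
    case (d1: DecimalType, d2: DecimalType)            => Some(widerDecimal(d1, d2))
    case (IntType, d: DecimalType)                     => Some(widerDecimal(DecimalType(10, 0), d))
    case (d: DecimalType, IntType)                     => Some(widerDecimal(d, DecimalType(10, 0)))
    case (IntType, DoubleType) | (DoubleType, IntType) => Some(DoubleType)
    case (_: DecimalType, DoubleType) | (DoubleType, _: DecimalType) => Some(DoubleType)
    case _                                             => None
  }

  // Fold the pairwise rule over the argument list.
  def findWiderTypeWithoutStringPromotion(types: Seq[DataType]): Option[DataType] =
    types.foldLeft(types.headOption)((acc, t) => acc.flatMap(tightest(_, t)))
}
```

With this sketch, a mixed list of decimals widens to a single decimal type, while any list containing a string yields no common type, mirroring the `greatest`/`least` behaviour in the table above.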

@cloud-fan
Contributor Author

This is a quick fix for both master and the 2.0 branch. After this we can adopt #14389 to make the master-branch code cleaner.

cc @petermaxlee @rxin @yhuai

@SparkQA

SparkQA commented Aug 1, 2016

Test build #63079 has finished for PR 14439 at commit 95f0866.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2016

Test build #63080 has finished for PR 14439 at commit d48590e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Aug 1, 2016

Let's be careful here. I am not sure we can just use DecimalPrecision.widerDecimalType, which produces Decimal(38, 38) when one decimal has the type Decimal(38, 0) and the other has the type Decimal(38, 38).
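To make the concern concrete, here is the arithmetic behind that example. This is a sketch of the widening rule, not Spark's actual DecimalPrecision code:

```scala
val MaxPrecision = 38

// Wider decimal type: the larger scale plus the larger integral width,
// with precision capped at 38 while the scale is kept as-is.
def widerDecimal(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val scale = math.max(s1, s2)
  val integral = math.max(p1 - s1, p2 - s2)
  (math.min(integral + scale, MaxPrecision), scale)
}

// Decimal(38, 0) and Decimal(38, 38): the ideal wider type is
// Decimal(76, 38), but capping the precision without reducing the
// scale yields Decimal(38, 38), leaving zero integral digits. That is
// the problem raised in this comment.
```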

@rxin
Contributor

rxin commented Aug 1, 2016

@yhuai do you have a concrete suggestion other than being careful here?

@yhuai
Contributor

yhuai commented Aug 1, 2016

It would be good to summarize the behaviors of other systems in the description. Let's also explain the behavioral change of this PR in the description, so others can understand its implications.

Also, for master, I am wondering if we can change the behavior of DecimalPrecision.widerDecimalType. Right now, widerDecimalType will truncate the integral part, which is not intuitive.

/**
* Similar to [[findWiderCommonType]], but can't promote to string.
*/
private def findWiderTypeWithoutStringPromotion(types: Seq[DataType]): Option[DataType] = {

The name findWiderTypeWithoutStringPromotion is a bit odd, given that findTightestCommonTypeOfTwo is used inside. Also, let's add more docs to this method.

@cloud-fan
Copy link
Contributor Author

@yhuai, I checked the decimal truncation logic in Hive: Hive truncates decimal(76, 38) to decimal(38, 0), which makes more sense than ours, as keeping the integral part makes the result more accurate.
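A sketch of the Hive-style adjustment described here (the name and exact rule are illustrative, not Hive's source): when the ideal wider type overflows the 38-digit cap, shrink the scale so the integral part survives.

```scala
val MaxPrecision = 38

// Truncate an over-wide decimal by sacrificing scale rather than
// integral digits.
def truncateKeepingIntegral(precision: Int, scale: Int): (Int, Int) =
  if (precision <= MaxPrecision) (precision, scale)
  else {
    val integral = precision - scale
    val newScale = math.max(MaxPrecision - integral, 0)
    (MaxPrecision, newScale)
  }

// decimal(76, 38) has 38 integral digits, so the scale drops all the
// way to 0 and the result is decimal(38, 0), matching the Hive
// behaviour mentioned above.
```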

@SparkQA

SparkQA commented Aug 2, 2016

Test build #63111 has finished for PR 14439 at commit 9def789.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

operator(Literal(1.0).cast(DoubleType)
:: Literal.create(null, DecimalType(10, 5)).cast(DoubleType)
:: Literal(1).cast(DoubleType)
:: Nil))

Seems this test does not cover the logic of handling decimal types having different precisions and scales.


I will push a new commit with two more tests.


Added

@yhuai
Contributor

yhuai commented Aug 3, 2016

@cloud-fan Thanks for the fix. The new logic looks good. I will merge it once jenkins passes.

@SparkQA

SparkQA commented Aug 3, 2016

Test build #63177 has finished for PR 14439 at commit 9f1e642.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Aug 3, 2016

OK. I am merging this PR to master and branch 2.0.

@asfgit asfgit closed this in b55f343 Aug 3, 2016
asfgit pushed a commit that referenced this pull request Aug 3, 2016
… type coercion should handle decimal type

## What changes were proposed in this pull request?

Here is a table about the behaviours of `array`/`map` and `greatest`/`least` in Hive, MySQL and Postgres:

|    |Hive|MySQL|Postgres|
|---|---|---|---|
|`array`/`map`|can find a wider type with decimal type arguments, and will truncate the wider decimal type if necessary|can find a wider type with decimal type arguments, no truncation problem|can find a wider type with decimal type arguments, no truncation problem|
|`greatest`/`least`|can find a wider type with decimal type arguments, and truncate if necessary, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|can find a wider type with decimal type arguments, no truncation problem, but can't do string promotion|

I think these behaviours make sense and Spark SQL should follow them.

This PR fixes `array` and `map` by using `findWiderCommonType` to get the wider type.
This PR fixes `greatest` and `least` by adding a `findWiderTypeWithoutStringPromotion`, which provides similar semantics to `findWiderCommonType`, but without string promotion.

## How was this patch tested?

New tests in `TypeCoercionSuite`.

Author: Wenchen Fan <wenchen@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #14439 from cloud-fan/bug.

(cherry picked from commit b55f343)
Signed-off-by: Yin Huai <yhuai@databricks.com>