Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8052][SQL] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble #6645

Closed
wants to merge 3 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Jun 4, 2015

@JoshRosen
Copy link
Contributor

Regression test?

@@ -323,7 +323,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
private[this] def castToDecimal(from: DataType, target: DecimalType): Any => Any = from match {
case StringType =>
buildCast[UTF8String](_, s => try {
changePrecision(Decimal(s.toString.toDouble), target)
changePrecision(Decimal(new java.math.BigDecimal(s.toString)), target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just import this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@viirya viirya changed the title Use java.math.BigDecimal for casting String to Decimal instead of using toDouble [SPARK-8052][SQL] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble Jun 4, 2015
@viirya
Copy link
Member Author

viirya commented Jun 4, 2015

I will update the test later.

@SparkQA
Copy link

SparkQA commented Jun 4, 2015

Test build #34191 has finished for PR 6645 at commit 7ced9b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 5, 2015

Test build #34240 has finished for PR 6645 at commit e19c6a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jun 6, 2015

ping @srowen

@srowen
Copy link
Member

srowen commented Jun 6, 2015

I'm not qualified to review this, but, I'm wondering why this query involves a conversion to a decimal type at all? The target type is bigint. while this may patch the issue, there are deeper implications to converting a bunch of stuff to BigDecimal instead of double, so I'm not sure this is the source of the problem?

@srowen
Copy link
Member

srowen commented Jun 6, 2015

Also per the JIRA, is this even a Spark issue?

@viirya
Copy link
Member Author

viirya commented Jun 6, 2015

In HiveTypeCoercion, there is a StringToIntegralCasts which casts a string to an integral type. When the string represents a fractional number, jvm will throw a java.lang.NumberFormatException. In case to prevent this, it casts the string to DecimalType.Unlimited first, and then casts the decimal to the asked integral type later.

This problematic case here is when the given string representation is over the double range, toDouble will get wrong result.

@srowen
Copy link
Member

srowen commented Jun 6, 2015

Yeah, that's a problem then. The conversion to floating-point could be lossy as your test case indicates. I don't know that this is the fix though, since it has non-trivial side effects. Converting to a decimal type seems wrong. But is this specific to Hive on Spark?

@viirya
Copy link
Member Author

viirya commented Jun 7, 2015

As we can't know whether the string represents a fractional number or not, casting to decimal seems being the most feasible approach? Because typeCoercionRules is used in Analyzer, this casting might not only be specific to Hive on Spark.

@davies
Copy link
Contributor

davies commented Jun 9, 2015

@viirya This change make sense to me.

@viirya
Copy link
Member Author

viirya commented Jun 13, 2015

ping @rxin @marmbrus

@marmbrus
Copy link
Contributor

Thanks! Merging to master.

@asfgit asfgit closed this in ddec452 Jun 13, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…imal instead of using toDouble

JIRA: https://issues.apache.org/jira/browse/SPARK-8052

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#6645 from viirya/cast_string_integraltype and squashes the following commits:

e19c6a3 [Liang-Chi Hsieh] For comment.
c3e472a [Liang-Chi Hsieh] Add test.
7ced9b0 [Liang-Chi Hsieh] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble.
asfgit pushed a commit that referenced this pull request Jul 20, 2015
…imal instead of using toDouble

JIRA: https://issues.apache.org/jira/browse/SPARK-8052

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6645 from viirya/cast_string_integraltype and squashes the following commits:

e19c6a3 [Liang-Chi Hsieh] For comment.
c3e472a [Liang-Chi Hsieh] Add test.
7ced9b0 [Liang-Chi Hsieh] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble.

(cherry picked from commit ddec452)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@viirya viirya deleted the cast_string_integraltype branch December 27, 2023 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants