Skip to content

[SPARK-34850][SQL] Support multiply a day-time interval by a numeric#31951

Closed
MaxGekk wants to merge 8 commits intoapache:masterfrom
MaxGekk:mul-day-time-interval
Closed

[SPARK-34850][SQL] Support multiply a day-time interval by a numeric#31951
MaxGekk wants to merge 8 commits intoapache:masterfrom
MaxGekk:mul-day-time-interval

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Mar 24, 2021

What changes were proposed in this pull request?

  1. Add new expression MultiplyDTInterval which multiplies a DayTimeIntervalType expression by a NumericType expression including ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DecimalType.
  2. Extend binary arithmetic rules to support numeric * day-time interval and day-time interval * numeric.
  3. Invoke DoubleMath.roundToInt in double/float * year-month interval.

Why are the changes needed?

To conform the ANSI SQL standard which requires such operation over day-time intervals:
Screenshot 2021-03-22 at 16 33 16

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *IntervalExpressionsSuite"
$ build/sbt "test:testOnly *ColumnExpressionSuite"

@github-actions github-actions bot added the SQL label Mar 24, 2021
@SparkQA
Copy link

SparkQA commented Mar 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41031/

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 24, 2021

@yaooqinn @cloud-fan Could you review this PR, please.

@SparkQA
Copy link

SparkQA commented Mar 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41031/

s"((new Decimal()).set($m).$$times($n)).toJavaBigDecimal()" +
".setScale(0, java.math.RoundingMode.HALF_UP).longValueExact()")
case _: FractionalType =>
defineCodeGen(ctx, ev, (m, n) => s"java.lang.Math.round($m * (double)$n)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we should fail here too not just round it to the max long value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the SQL standard require such behavior? From my point of view, we can map Double.PositiveInfinity to Long.MaxValue as multiple double values can map to the same long value. We just should document such behavior, and borrow the text from java.lang.Math.round()

     <li>If the argument is negative infinity or any value less than or
       equal to the value of {@code Long.MIN_VALUE}, the result is
       equal to the value of {@code Long.MIN_VALUE}.
     <li>If the argument is positive infinity or any value greater than or
       equal to the value of {@code Long.MAX_VALUE}, the result is
       equal to the value of {@code Long.MAX_VALUE}.</ul>

@srielau @cloud-fan WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here. Shall we turn float/double to Decimal and do the calculation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the interval value expression's return type is an interval and it defines an interval itself. The expression behavior should respect the interval term rather than the multiplier or divisor. According to the standard, we should define and raise interval field overflow rather than numeric overflows when the number of significant digits exceeds the implementation-defined maximum number of significant digits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we turn float/double to Decimal and do the calculation?

Decimal (and Java BigDecimal) doesn't have representation for NaN, PositiveInfinity and NegativeInfinity, see:

scala> import org.apache.spark.sql.types.Decimal
import org.apache.spark.sql.types.Decimal

scala> Decimal(Double.NaN)
java.lang.NumberFormatException
  at java.math.BigDecimal.<init>(BigDecimal.java:497)

scala> Decimal(Float.PositiveInfinity)
java.lang.NumberFormatException
  at java.math.BigDecimal.<init>(BigDecimal.java:497)

scala> Decimal(Float.MinValue)
res2: org.apache.spark.sql.types.Decimal = -340282346638528860000000000000000000000

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found the roundToLong() from Guava satisfies our needs completely. I am going to invoke it.

@SparkQA
Copy link

SparkQA commented Mar 24, 2021

Test build #136447 has finished for PR 31951 at commit 58db4dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41065/

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41065/

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Test build #136481 has finished for PR 31951 at commit b80b489.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41085/

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41085/

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41084/

@MaxGekk
Copy link
Member Author

MaxGekk commented Mar 25, 2021

GA passed. Merging to master.
Thank you, @yaooqinn @cloud-fan for your review.

@MaxGekk MaxGekk closed this in a68d7ca Mar 25, 2021
@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41084/

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41087/

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41087/

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Test build #136495 has finished for PR 31951 at commit e16055d.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Test build #136500 has finished for PR 31951 at commit 1a35356.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 25, 2021

Test build #136502 has finished for PR 31951 at commit d844c54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants