New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29371][SQL] Support interval field values with fractional parts #26314
Conversation
@@ -36,7 +36,7 @@ object IntervalUtils { | |||
final val MICROS_PER_MINUTE: Long = | |||
DateTimeUtils.MILLIS_PER_MINUTE * DateTimeUtils.MICROS_PER_MILLIS | |||
final val DAYS_PER_MONTH: Byte = 30 | |||
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.SECONDS_PER_DAY | |||
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.MICROS_PER_DAY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good catch! do we have a test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, LGTM
@@ -36,7 +36,7 @@ object IntervalUtils { | |||
final val MICROS_PER_MINUTE: Long = | |||
DateTimeUtils.MILLIS_PER_MINUTE * DateTimeUtils.MICROS_PER_MILLIS | |||
final val DAYS_PER_MONTH: Byte = 30 | |||
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.SECONDS_PER_DAY | |||
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.MICROS_PER_DAY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh!
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
Show resolved
Hide resolved
This PR had better have the one-line fix on |
Test build #112906 has finished for PR 26314 at commit
|
Test build #112910 has finished for PR 26314 at commit
|
Test build #112903 has finished for PR 26314 at commit
|
With this pr, outside nanosecond will be omitted in silence as pg postgres=# select interval '.1111111111' second
postgres-# ;
interval
-----------------
00:00:00.111111
(1 row) |
Test build #112921 has finished for PR 26314 at commit
|
Test build #112982 has finished for PR 26314 at commit
|
@MaxGekk @cloud-fan @dongjoon-hyun rebased with master, please take a look again.thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is the big decimal ops are relatively expensive. If we will bulky load interval strings or cast from strings, throughput could be low. @yaooqinn Could you run IntervalBenchmark
to estimate performance.
@MaxGekk Yes, thanks, will do a benchmark comparison right now. |
Before[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
[info] cast strings to intervals: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] prepare string w/ interval 493 566 77 2.0 493.0 1.0X
[info] prepare string w/o interval 410 437 32 2.4 410.0 1.2X
[info] 1 units w/ interval 5237 5340 122 0.2 5236.6 0.1X
[info] 1 units w/o interval 5063 5136 123 0.2 5062.6 0.1X
[info] 2 units w/ interval 6964 6993 28 0.1 6964.0 0.1X
[info] 2 units w/o interval 6638 6665 38 0.2 6637.6 0.1X
[info] 3 units w/ interval 8344 8444 88 0.1 8343.9 0.1X
[info] 3 units w/o interval 7877 8034 166 0.1 7876.8 0.1X
[info] 4 units w/ interval 9306 9420 100 0.1 9305.7 0.1X
[info] 4 units w/o interval 8363 8635 239 0.1 8363.0 0.1X
[info] 5 units w/ interval 10037 10340 301 0.1 10036.6 0.0X
[info] 5 units w/o interval 10071 10136 107 0.1 10071.5 0.0X
[info] 6 units w/ interval 10818 11524 612 0.1 10818.5 0.0X
[info] 6 units w/o interval 10726 11289 490 0.1 10725.8 0.0X
[info] 7 units w/ interval 11498 11573 104 0.1 11498.1 0.0X
[info] 7 units w/o interval 11292 11311 28 0.1 11292.1 0.0X
[info] 8 units w/ interval 12792 12928 189 0.1 12792.4 0.0X
[info] 8 units w/o interval 12387 12399 12 0.1 12386.6 0.0X
[info] 9 units w/ interval 14307 14541 394 0.1 14307.5 0.0X
[info] 9 units w/o interval 13986 14015 49 0.1 13985.5 0.0X AFTER[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
[info] cast strings to intervals: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] prepare string w/ interval 643 726 108 1.6 643.0 1.0X
[info] prepare string w/o interval 468 501 35 2.1 468.4 1.4X
[info] 1 units w/ interval 5045 8939 NaN 0.2 5044.8 0.1X
[info] 1 units w/o interval 5591 5819 379 0.2 5590.6 0.1X
[info] 2 units w/ interval 6462 6938 765 0.2 6461.9 0.1X
[info] 2 units w/o interval 6022 6149 147 0.2 6021.7 0.1X
[info] 3 units w/ interval 7223 7319 90 0.1 7223.2 0.1X
[info] 3 units w/o interval 6972 7081 152 0.1 6972.3 0.1X
[info] 4 units w/ interval 8180 8333 262 0.1 8179.8 0.1X
[info] 4 units w/o interval 7817 7877 94 0.1 7817.5 0.1X
[info] 5 units w/ interval 9171 9201 29 0.1 9170.6 0.1X
[info] 5 units w/o interval 9119 9567 684 0.1 9118.7 0.1X
[info] 6 units w/ interval 10255 10538 488 0.1 10255.2 0.1X
[info] 6 units w/o interval 10122 10604 543 0.1 10121.9 0.1X
[info] 7 units w/ interval 12154 12334 222 0.1 12153.5 0.1X
[info] 7 units w/o interval 11352 11359 10 0.1 11352.1 0.1X
[info] 8 units w/ interval 12774 12778 6 0.1 12774.2 0.1X
[info] 8 units w/o interval 12627 12891 408 0.1 12627.2 0.1X
[info] 9 units w/ interval 14087 14209 184 0.1 14087.5 0.0X
[info] 9 units w/o interval 13856 14043 310 0.1 13855.9 0.0X |
@MaxGekk the performances look the same |
I modified slightly the benchmark and append fractions: diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala
index d75cb1040f..ec32340b2b 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala
@@ -83,8 +83,8 @@ object IntervalBenchmark extends SqlBasedBenchmark {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
val N = 1000000
val timeUnits = Seq(
- "13 months", "100 weeks", "9 days", "12 hours",
- "5 minutes", "45 seconds", "123 milliseconds", "567 microseconds")
+ "13.123 months", "100.123 weeks", "9.123 days", "12.123 hours",
+ "5.123 minutes", "45.123 seconds", "123.456 milliseconds", "567 microseconds")
val intervalToTest = ListBuffer[String]() and the results look not so nice:
The throughput is roughly ~10 interval strings with 8-9 units per second. |
the worst-case gives the worst result. also, my benchmark test shows no performance regression to existing cases. Personally, I am ok with the result. |
Test build #113007 has finished for PR 26314 at commit
|
retest this please |
Test build #113032 has finished for PR 26314 at commit
|
@MaxGekk is this per ms not per second? |
retest this please |
Test build #113068 has finished for PR 26314 at commit
|
What changes were proposed in this pull request?
Support fraction representation fro interval field values.
Why are the changes needed?
In PostgreSQL, interval field values can have fractional parts. See https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
Does this PR introduce any user-facing change?
yes, add fraction input support for interval values.
How was this patch tested?
add uts