Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-29371][SQL] Support interval field values with fractional parts #26314

Closed
wants to merge 3 commits into from
Closed

[SPARK-29371][SQL] Support interval field values with fractional parts #26314

wants to merge 3 commits into from

Conversation

yaooqinn
Copy link
Member

What changes were proposed in this pull request?

Support fraction representation fro interval field values.

  1. the fraction part of the year only affects the month. WE OBEY THAT.
  2. pg not support millisecond or microsecond. WE DONT FOLLOW.
postgres=# select interval '1.41 years 2.51 months 2.21 weeks 15.24 days 3.31 hours 5.38 minutes 12.3456789 seconds';
               interval
---------------------------------------
 1 year 6 mons 45 days 27:38:35.145679
(1 row)postgres=# select interval '1.42 years 2.51 months 2.21 weeks 15.24 days 3.31 hours 5.38 minutes 12.3456789 seconds';
               interval
---------------------------------------
 1 year 7 mons 45 days 27:38:35.145679
(1 row)postgres=# select interval '1.42 years 2.51 months 2.21 weeks 15.24 days 3.31 hours 5.38 minutes 12.3456789 seconds 2.1 milliseconds ';
ERROR:  invalid input syntax for type interval: "1.42 years 2.51 months 2.21 weeks 15.24 days 3.31 hours 5.38 minutes 12.3456789 seconds 2.1 milliseconds "
LINE 1: select interval '1.42 years 2.51 months 2.21 weeks 15.24 day...

Why are the changes needed?

In PostgreSQL, interval field values can have fractional parts. See https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT

Does this PR introduce any user-facing change?

yes, add fraction input support for interval values.

How was this patch tested?

add uts

@yaooqinn
Copy link
Member Author

cc @cloud-fan @MaxGekk

@@ -36,7 +36,7 @@ object IntervalUtils {
final val MICROS_PER_MINUTE: Long =
DateTimeUtils.MILLIS_PER_MINUTE * DateTimeUtils.MICROS_PER_MILLIS
final val DAYS_PER_MONTH: Byte = 30
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.SECONDS_PER_DAY
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.MICROS_PER_DAY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch! do we have a test case?

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, LGTM

@@ -36,7 +36,7 @@ object IntervalUtils {
final val MICROS_PER_MINUTE: Long =
DateTimeUtils.MILLIS_PER_MINUTE * DateTimeUtils.MICROS_PER_MILLIS
final val DAYS_PER_MONTH: Byte = 30
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.SECONDS_PER_DAY
final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.MICROS_PER_DAY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh!

@dongjoon-hyun
Copy link
Member

This PR had better have the one-line fix on final val MICROS_PER_MONTH: Long = DAYS_PER_MONTH * DateTimeUtils.MICROS_PER_DAY and test cases.

@SparkQA
Copy link

SparkQA commented Oct 30, 2019

Test build #112906 has finished for PR 26314 at commit 6231fb4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 30, 2019

Test build #112910 has finished for PR 26314 at commit bba0ad3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 30, 2019

Test build #112903 has finished for PR 26314 at commit 612bfca.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

With this pr, outside nanosecond will be omitted in silence as pg

postgres=# select interval '.1111111111' second
postgres-# ;
    interval
-----------------
 00:00:00.111111
(1 row)

@SparkQA
Copy link

SparkQA commented Oct 30, 2019

Test build #112921 has finished for PR 26314 at commit 25b78bb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 31, 2019

Test build #112982 has finished for PR 26314 at commit 652d01c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

@MaxGekk @cloud-fan @dongjoon-hyun rebased with master, please take a look again.thanks.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is the big decimal ops are relatively expensive. If we will bulky load interval strings or cast from strings, throughput could be low. @yaooqinn Could you run IntervalBenchmark to estimate performance.

@yaooqinn
Copy link
Member Author

@MaxGekk Yes, thanks, will do a benchmark comparison right now.

@yaooqinn
Copy link
Member Author

Before

[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
[info] cast strings to intervals:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] prepare string w/ interval                          493            566          77          2.0         493.0       1.0X
[info] prepare string w/o interval                         410            437          32          2.4         410.0       1.2X
[info] 1 units w/ interval                                5237           5340         122          0.2        5236.6       0.1X
[info] 1 units w/o interval                               5063           5136         123          0.2        5062.6       0.1X
[info] 2 units w/ interval                                6964           6993          28          0.1        6964.0       0.1X
[info] 2 units w/o interval                               6638           6665          38          0.2        6637.6       0.1X
[info] 3 units w/ interval                                8344           8444          88          0.1        8343.9       0.1X
[info] 3 units w/o interval                               7877           8034         166          0.1        7876.8       0.1X
[info] 4 units w/ interval                                9306           9420         100          0.1        9305.7       0.1X
[info] 4 units w/o interval                               8363           8635         239          0.1        8363.0       0.1X
[info] 5 units w/ interval                               10037          10340         301          0.1       10036.6       0.0X
[info] 5 units w/o interval                              10071          10136         107          0.1       10071.5       0.0X
[info] 6 units w/ interval                               10818          11524         612          0.1       10818.5       0.0X
[info] 6 units w/o interval                              10726          11289         490          0.1       10725.8       0.0X
[info] 7 units w/ interval                               11498          11573         104          0.1       11498.1       0.0X
[info] 7 units w/o interval                              11292          11311          28          0.1       11292.1       0.0X
[info] 8 units w/ interval                               12792          12928         189          0.1       12792.4       0.0X
[info] 8 units w/o interval                              12387          12399          12          0.1       12386.6       0.0X
[info] 9 units w/ interval                               14307          14541         394          0.1       14307.5       0.0X
[info] 9 units w/o interval                              13986          14015          49          0.1       13985.5       0.0X

AFTER

[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.14.6
[info] Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
[info] cast strings to intervals:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] prepare string w/ interval                          643            726         108          1.6         643.0       1.0X
[info] prepare string w/o interval                         468            501          35          2.1         468.4       1.4X
[info] 1 units w/ interval                                5045           8939         NaN          0.2        5044.8       0.1X
[info] 1 units w/o interval                               5591           5819         379          0.2        5590.6       0.1X
[info] 2 units w/ interval                                6462           6938         765          0.2        6461.9       0.1X
[info] 2 units w/o interval                               6022           6149         147          0.2        6021.7       0.1X
[info] 3 units w/ interval                                7223           7319          90          0.1        7223.2       0.1X
[info] 3 units w/o interval                               6972           7081         152          0.1        6972.3       0.1X
[info] 4 units w/ interval                                8180           8333         262          0.1        8179.8       0.1X
[info] 4 units w/o interval                               7817           7877          94          0.1        7817.5       0.1X
[info] 5 units w/ interval                                9171           9201          29          0.1        9170.6       0.1X
[info] 5 units w/o interval                               9119           9567         684          0.1        9118.7       0.1X
[info] 6 units w/ interval                               10255          10538         488          0.1       10255.2       0.1X
[info] 6 units w/o interval                              10122          10604         543          0.1       10121.9       0.1X
[info] 7 units w/ interval                               12154          12334         222          0.1       12153.5       0.1X
[info] 7 units w/o interval                              11352          11359          10          0.1       11352.1       0.1X
[info] 8 units w/ interval                               12774          12778           6          0.1       12774.2       0.1X
[info] 8 units w/o interval                              12627          12891         408          0.1       12627.2       0.1X
[info] 9 units w/ interval                               14087          14209         184          0.1       14087.5       0.0X
[info] 9 units w/o interval                              13856          14043         310          0.1       13855.9       0.0X

@yaooqinn
Copy link
Member Author

@MaxGekk the performances look the same

@MaxGekk
Copy link
Member

MaxGekk commented Oct 31, 2019

I modified slightly the benchmark and append fractions:

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala
index d75cb1040f..ec32340b2b 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/IntervalBenchmark.scala
@@ -83,8 +83,8 @@ object IntervalBenchmark extends SqlBasedBenchmark {
   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
     val N = 1000000
     val timeUnits = Seq(
-      "13 months", "100 weeks", "9 days", "12 hours",
-      "5 minutes", "45 seconds", "123 milliseconds", "567 microseconds")
+      "13.123 months", "100.123 weeks", "9.123 days", "12.123 hours",
+      "5.123 minutes", "45.123 seconds", "123.456 milliseconds", "567 microseconds")
     val intervalToTest = ListBuffer[String]()

and the results look not so nice:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
cast strings to intervals:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
prepare string w/ interval                          440            497          51          2.3         440.3       1.0X
prepare string w/o interval                         370            392          21          2.7         370.3       1.2X
1 units w/ interval                                5322           5414         142          0.2        5322.4       0.1X
1 units w/o interval                               5002           5038          35          0.2        5002.2       0.1X
2 units w/ interval                               18172          18285         189          0.1       18172.0       0.0X
2 units w/o interval                              17899          18278         511          0.1       17899.4       0.0X
3 units w/ interval                               30881          31481         665          0.0       30880.5       0.0X
3 units w/o interval                              30839          31600         664          0.0       30838.5       0.0X
4 units w/ interval                               44694          45302         547          0.0       44693.9       0.0X
4 units w/o interval                              42490          42599          99          0.0       42489.5       0.0X
5 units w/ interval                               54424          54469          67          0.0       54424.5       0.0X
5 units w/o interval                              54879          55015         161          0.0       54879.1       0.0X
6 units w/ interval                               69155          69281         140          0.0       69154.8       0.0X
6 units w/o interval                              68877          69018         156          0.0       68876.7       0.0X
7 units w/ interval                               82116          82204         112          0.0       82116.4       0.0X
7 units w/o interval                              81495          81641         144          0.0       81495.4       0.0X
8 units w/ interval                               94609          94630          22          0.0       94608.7       0.0X
8 units w/o interval                              93915          94099         180          0.0       93914.8       0.0X
9 units w/ interval                               93132          95566        2126          0.0       93132.2       0.0X
9 units w/o interval                              91703          94179         NaN          0.0       91703.1       0.0X

The throughput is roughly ~10 interval strings with 8-9 units per second.

@yaooqinn
Copy link
Member Author

the worst-case gives the worst result. also, my benchmark test shows no performance regression to existing cases. Personally, I am ok with the result.

@SparkQA
Copy link

SparkQA commented Oct 31, 2019

Test build #113007 has finished for PR 26314 at commit b071e09.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Oct 31, 2019

Test build #113032 has finished for PR 26314 at commit b071e09.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 1, 2019

The throughput is roughly ~10 interval strings with 8-9 units per second.

@MaxGekk is this per ms not per second?

@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 1, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Nov 1, 2019

Test build #113068 has finished for PR 26314 at commit b071e09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants