[SPARK-26656][SQL] Benchmarks for date and timestamp functions #23661
MaxGekk wants to merge 15 commits into apache:master
Test build #101715 has finished for PR 23661 at commit
dongjoon-hyun left a comment
Hi, @MaxGekk. This is a useful benchmark. I love this.
Can we have more complete coverage? For example, datediff and months_between are not covered here. Please review the full list and add the missing ones here, too.
cc @gatorsmile
Sure. I just benchmarked the functions that could be affected by one of the tickets, https://issues.apache.org/jira/browse/SPARK-26651, to check that there is no unusual performance degradation.
Test build #101720 has finished for PR 23661 at commit
| window: | Best/Avg Time(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|
| window wholestage off | 1636 / 1661 | 0.6 | 1635.9 | 1.0X |
| window wholestage on | 19997 / 20240 | 0.1 | 19997.4 | 0.1X |
This is interesting: performance drops ~12x with wholestage on. /cc @hvanhovell
It is, let's address this in a follow-up.
Do we have a JIRA to track this?
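The derived columns in the window rows above can be re-computed from the best time alone, which makes the size of the regression easy to check. The sketch below is a hypothetical helper, not Spark's `Benchmark` class; the row count of 1,000,000 is an assumption inferred from the table (1636 ms best time against 1635.9 ns per row), and the times are rounded to whole milliseconds as reported.

```scala
// Hypothetical helper, not part of Spark: re-derives the reported columns
// from the best time and an assumed row count.
object BenchmarkColumns {
  final case class Row(name: String, bestMs: Double, rows: Long) {
    // Nanoseconds per row: convert ms to ns, then divide by the row count.
    def perRowNs: Double = bestMs * 1e6 / rows
    // Millions of rows processed per second.
    def rateMPerSec: Double = rows / (bestMs / 1e3) / 1e6
    // Speed relative to a baseline row (the baseline itself yields 1.0).
    def relativeTo(baseline: Row): Double = baseline.bestMs / bestMs
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000L // assumption: inferred from the table, not stated in the PR
    val off = Row("window wholestage off", 1636, n)
    val on = Row("window wholestage on", 19997, n)
    println(f"${on.name}: ${on.perRowNs}%.1f ns/row, ${on.relativeTo(off)}%.2fX")
    println(f"slowdown: ${on.bestMs / off.bestMs}%.1fx")
  }
}
```

Dividing the two best times (19997 / 1636) gives the roughly 12x slowdown mentioned in the thread; the table's `0.1X` is the same ratio inverted and rounded to one decimal.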
| date_trunc QUARTER: | Best/Avg Time(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|
| date_trunc QUARTER wholestage off | 5261 / 5271 | 1.9 | 526.1 | 1.0X |
| date_trunc QUARTER wholestage on | 5145 / 5151 | 1.9 | 514.5 | 1.0X |
Truncation to QUARTER is comparatively slow. We need to re-run the benchmark with the new implementation: https://github.com/apache/spark/pull/23641/files#diff-da60f07e1826788aaeb07f295fae4b8aL747
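For readers unfamiliar with what quarter truncation computes, here is a minimal sketch using `java.time`. It is an illustration only, and it is not the implementation in Spark or in the linked PR; the object and method names are hypothetical.

```scala
import java.time.LocalDate

// Hypothetical illustration of quarter truncation; not Spark's code.
object QuarterTrunc {
  // Returns the first day of the calendar quarter that contains `d`.
  def truncToQuarter(d: LocalDate): LocalDate = {
    // Integer division maps months 1-3 to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
    val firstMonth = (d.getMonthValue - 1) / 3 * 3 + 1
    LocalDate.of(d.getYear, firstMonth, 1)
  }

  def main(args: Array[String]): Unit = {
    println(truncToQuarter(LocalDate.of(2019, 5, 17)))
  }
}
```

The arithmetic is a few integer operations per value, so a per-row cost well above the other truncation levels would point at overhead elsewhere in the implementation rather than at the calculation itself.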
Test build #101728 has finished for PR 23661 at commit
hvanhovell left a comment
LGTM - Merging to master.
Sorry, never mind. I missed the latest commits. LGTM.
## What changes were proposed in this pull request?

Added the following benchmarks:
- Extract components from a timestamp, such as year, month, and day
- Current date and time
- Date arithmetic like date_add, date_sub
- Format dates and timestamps
- Convert timestamps from/to UTC

Closes apache#23661 from MaxGekk/datetime-benchmark.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
What changes were proposed in this pull request?
Added the following benchmarks: