[SPARK-26656][SQL] Benchmarks for date and timestamp functions #23661

Closed
MaxGekk wants to merge 15 commits into apache:master from MaxGekk:datetime-benchmark

Conversation

Member

@MaxGekk MaxGekk commented Jan 26, 2019

What changes were proposed in this pull request?

Added the following benchmarks:

  • Extract components from timestamps, such as year, month, and day
  • Current date and time
  • Date arithmetic like date_add, date_sub
  • Format dates and timestamps
  • Convert timestamps from/to UTC
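
For readers who want a feel for the categories listed above, here is a minimal pure-Python analogy. It is not the Spark benchmark code from this PR: the `measure` helper, the row count, the iteration count, and the fixed UTC-8 offset are all assumptions chosen for illustration. It times one operation per category and reports best and average time in the same spirit as the `Best/Avg Time(ms)` column.

```python
import time
from datetime import datetime, timedelta, timezone


def measure(fn, rows, iters=3):
    """Apply fn to every row, `iters` times; return (best_ms, avg_ms)."""
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        for row in rows:
            fn(row)
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings), sum(timings) / len(timings)


# Hypothetical input: 10,000 UTC timestamps spaced one second apart.
base = datetime(2019, 1, 26, tzinfo=timezone.utc)
rows = [base + timedelta(seconds=i) for i in range(10_000)]
local_tz = timezone(timedelta(hours=-8))  # fixed offset, illustration only

# One stand-in operation per benchmark category from the list above.
cases = {
    "extract year": lambda ts: ts.year,
    "date_add 10 days": lambda ts: ts.date() + timedelta(days=10),
    "format timestamp": lambda ts: ts.strftime("%Y-%m-%d %H:%M:%S"),
    "convert from UTC": lambda ts: ts.astimezone(local_tz),
}

results = {name: measure(fn, rows) for name, fn in cases.items()}
for name, (best_ms, avg_ms) in results.items():
    print(f"{name:<18} {best_ms:8.2f} / {avg_ms:8.2f} ms")
```

Absolute numbers from a sketch like this say nothing about Spark itself; only the measuring pattern carries over.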

SparkQA commented Jan 26, 2019

Test build #101715 has finished for PR 23661 at commit 87c26dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Hi, @MaxGekk . This is a useful benchmark. I love this.
So, can we have more complete coverage? For example, datediff and months_between are not covered here. Please review the full list of functions and add the missing ones here, too.

cc @gatorsmile

Member Author

MaxGekk commented Jan 26, 2019

So, can we have a more complete coverage?

Sure. So far I have only benchmarked the functions that could be affected by one of the tickets, https://issues.apache.org/jira/browse/SPARK-26651, to check that there is no unexpected performance degradation.

SparkQA commented Jan 27, 2019

Test build #101720 has finished for PR 23661 at commit 8e4cb6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

window:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
window wholestage off                        1636 / 1661          0.6        1635.9       1.0X
window wholestage on                       19997 / 20240          0.1       19997.4       0.1X
Member Author

This is interesting: with whole-stage codegen on, window is about 12x slower. /cc @hvanhovell

Contributor

It is. Let's address this in a follow-up.

Member

Do we have a JIRA to track this?
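
The roughly 12x drop discussed in this thread can be checked against the window table. The sketch below recomputes the derived columns from the best times; the row count of 1,000,000 is an assumption inferred from 1636 ms corresponding to about 1635.9 ns per row, and the `benchmark_columns` helper is hypothetical, not part of Spark.

```python
def benchmark_columns(best_ms, rows, baseline_best_ms):
    """Derive Rate(M/s), Per Row(ns) and Relative from a best time in ms."""
    rate_m_per_s = rows / (best_ms / 1000.0) / 1e6   # million rows per second
    per_row_ns = best_ms * 1e6 / rows                # nanoseconds per row
    relative = baseline_best_ms / best_ms            # speed relative to baseline
    return rate_m_per_s, per_row_ns, relative


ROWS = 1_000_000          # assumption: inferred from the table, not stated in the PR
BEST_OFF_MS = 1636        # window wholestage off
BEST_ON_MS = 19997        # window wholestage on

off = benchmark_columns(BEST_OFF_MS, ROWS, BEST_OFF_MS)
on = benchmark_columns(BEST_ON_MS, ROWS, BEST_OFF_MS)
slowdown = BEST_ON_MS / BEST_OFF_MS

print("wholestage off:", off)
print("wholestage on: ", on)
print(f"slowdown with wholestage on: {slowdown:.1f}x")
```

The table reports the relative speed as 0.1X because it is rounded to one decimal place; dividing the two best times gives the finer figure of about 12.2x.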

date_trunc QUARTER:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
date_trunc QUARTER wholestage off            5261 / 5271          1.9         526.1       1.0X
date_trunc QUARTER wholestage on             5145 / 5151          1.9         514.5       1.0X
Member Author

Truncation to QUARTER is comparatively slow. I need to re-run the benchmark with the new implementation: https://github.com/apache/spark/pull/23641/files#diff-da60f07e1826788aaeb07f295fae4b8aL747
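
For reference, truncating to a quarter means snapping a timestamp back to midnight on the first day of its quarter. The sketch below shows that arithmetic with Python's standard library. It is an illustration only, not Spark's implementation (neither the old one nor the one in the linked PR), and the `trunc_to_quarter` helper is hypothetical and ignores time zones.

```python
from datetime import datetime


def trunc_to_quarter(ts: datetime) -> datetime:
    """Return midnight on the first day of the quarter containing ts."""
    # Months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
    first_month = ((ts.month - 1) // 3) * 3 + 1
    return ts.replace(month=first_month, day=1,
                      hour=0, minute=0, second=0, microsecond=0)


print(trunc_to_quarter(datetime(2019, 8, 17, 13, 36)))
```

The integer division by 3 is the only quarter-specific step; everything else is the same reset of lower-order fields that truncation to a month would do.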

SparkQA commented Jan 27, 2019

Test build #101728 has finished for PR 23661 at commit cb12aae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@hvanhovell hvanhovell left a comment

LGTM - Merging to master.

@asfgit asfgit closed this in bd027f6 Jan 28, 2019
Member

dongjoon-hyun commented Jan 28, 2019

Hi, @MaxGekk .
Could you make a followup as we discussed here?

Sorry, never mind. I missed the latest commits. LGTM.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Added the following benchmarks:
- Extract components from timestamps, such as year, month, and day
- Current date and time
- Date arithmetic like date_add, date_sub
- Format dates and timestamps
- Convert timestamps from/to UTC

Closes apache#23661 from MaxGekk/datetime-benchmark.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
@MaxGekk MaxGekk deleted the datetime-benchmark branch August 17, 2019 13:36