Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark #28981

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jul 2, 2020

What changes were proposed in this pull request?

Set the JSON option inferTimestamp to true for the cases that measure perf of timestamp inference.

Why are the changes needed?

The PR #28966 disabled timestamp inference by default. As a consequence, some benchmarks don't measure perf of timestamp inference from JSON fields. This PR explicitly enable such inference.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By re-generating results of JsonBenchmark.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jul 2, 2020

@dongjoon-hyun May I ask you to review this PR.

@dongjoon-hyun
Copy link
Member

Thank you for a quick follow-up~

read date from files 16583 16612 29 0.6 1658.3 0.2X
timestamp strings 3927 3963 40 2.5 392.7 0.7X
parse timestamps from Dataset[String] 52827 53004 243 0.2 5282.7 0.0X
infer timestamps from Dataset[String] 101108 101644 769 0.1 10110.8 0.0X
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

from_json(date) 43755 43782 27 0.2 4375.5 0.1X
read timestamp text from files 2616 2641 34 3.8 261.6 1.0X
read timestamps from files 37481 37517 58 0.3 3748.1 0.1X
infer timestamps from files 84774 84964 201 0.1 8477.4 0.0X
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. I checked both places in the benchmark code and test result.

dongjoon-hyun pushed a commit that referenced this pull request Jul 2, 2020
…mark

### What changes were proposed in this pull request?
Set the JSON option `inferTimestamp` to `true` for the cases that measure perf of timestamp inference.

### Why are the changes needed?
The PR #28966 disabled timestamp inference by default. As a consequence, some benchmarks don't measure perf of timestamp inference from JSON fields. This PR explicitly enable such inference.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By re-generating results of `JsonBenchmark`.

Closes #28981 from MaxGekk/json-inferTimestamps-disable-by-default-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 42f01e3)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Merged to master/3.0. This is irrelevant to Jenkins unit tests.

@MaxGekk MaxGekk deleted the json-inferTimestamps-disable-by-default-followup branch December 11, 2020 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants