[SPARK-31623][SQL][TESTS] Benchmark rebasing of INT96 and TIMESTAMP_MILLIS timestamps in read/write #28431

Closed
MaxGekk wants to merge 9 commits into apache:master from MaxGekk:parquet-timestamps-DateTimeRebaseBenchmark

MaxGekk commented May 1, 2020

What changes were proposed in this pull request?

Add new benchmarks to DateTimeRebaseBenchmark for reading/writing timestamps of the INT96 and TIMESTAMP_MILLIS column types. Here are the benchmark results for reading timestamps after the year 1582 with default settings (rebasing is off for TIMESTAMP_MICROS/TIMESTAMP_MILLIS, and rebasing is on for INT96):

| Timestamp type | Vectorized off (ns/row) | Vectorized on (ns/row) |
| ---- | ---- | ---- |
| TIMESTAMP_MICROS | 160.1 | 50.2 |
| INT96 | 215.6 | 117.8 |
| TIMESTAMP_MILLIS | 159.9 | 60.6 |
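
To make the comparison in these numbers easier to read, here is a minimal Python sketch that derives two ratios from the table: the speedup from turning the vectorized reader on, and the read cost of each type relative to the default TIMESTAMP_MICROS. The numbers are copied from the table above. The helper names (`vectorized_speedup`, `relative_to_default`) are hypothetical and are not part of DateTimeRebaseBenchmark.

```python
# Per-row read times in ns/row, copied from the results table above.
results = {
    "TIMESTAMP_MICROS": {"off": 160.1, "on": 50.2},
    "INT96": {"off": 215.6, "on": 117.8},
    "TIMESTAMP_MILLIS": {"off": 159.9, "on": 60.6},
}


def vectorized_speedup(row):
    """Return how many times faster the vectorized reader is for one type."""
    return row["off"] / row["on"]


def relative_to_default(table, mode, baseline="TIMESTAMP_MICROS"):
    """Return each type's read time divided by the baseline type's time."""
    base = table[baseline][mode]
    return {name: row[mode] / base for name, row in table.items()}


speedups = {name: vectorized_speedup(row) for name, row in results.items()}
cost_on = relative_to_default(results, "on")
cost_off = relative_to_default(results, "off")

for name in results:
    print(
        f"{name}: vectorized speedup {speedups[name]:.2f}x, "
        f"cost vs. default {cost_off[name]:.2f}x (off), "
        f"{cost_on[name]:.2f}x (on)"
    )
```

In this run, INT96 is the slowest type to read in both modes, and its gap to TIMESTAMP_MICROS is larger with the vectorized reader on than with it off.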

Why are the changes needed?

To compare the default timestamp type, TIMESTAMP_MICROS, with the other types in case a user decides to switch to them.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the benchmarks via:

SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.DateTimeRebaseBenchmark"

in the environment:

| Item | Description |
| ---- | ---- |
| Region | us-west-2 (Oregon) |
| Instance | r3.xlarge |
| AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) |
| Java | OpenJDK 64-Bit Server VM 1.8.0_252-8u252 and OpenJDK 64-Bit Server VM 11.0.7+10 |

SparkQA commented May 1, 2020

Test build #122174 has finished for PR 28431 at commit b98c3d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk commented May 2, 2020

@cloud-fan @HyukjinKwon Please take a look at this.

MaxGekk added 5 commits May 4, 2020 13:21
…estamps-DateTimeRebaseBenchmark

# Conflicts:
#	sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
#	sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
#	sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala

SparkQA commented May 4, 2020

Test build #122264 has finished for PR 28431 at commit 283b554.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 4, 2020

Test build #122263 has finished for PR 28431 at commit 07ac3d4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk commented May 4, 2020

jenkins, retest this, please

SparkQA commented May 4, 2020

Test build #122265 has finished for PR 28431 at commit 0c9f9d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 4, 2020

Test build #122276 has finished for PR 28431 at commit 0c9f9d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan commented

thanks, merging to master/3.0!

cloud-fan closed this in 735771e May 5, 2020
cloud-fan pushed a commit that referenced this pull request May 5, 2020
…ILLIS timestamps in read/write

Closes #28431 from MaxGekk/parquet-timestamps-DateTimeRebaseBenchmark.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 735771e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
MaxGekk deleted the parquet-timestamps-DateTimeRebaseBenchmark branch June 5, 2020 19:48