Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25665][SQL][TEST] Refactor ObjectHashAggregateExecBenchmark to… #22804

Closed
wants to merge 7 commits into from

Conversation

peter-toth
Copy link
Contributor

What changes were proposed in this pull request?

Refactor ObjectHashAggregateExecBenchmark to use main method

How was this patch tested?

Manually tested:

bin/spark-submit --class org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark --jars sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar,core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,sql/hive/target/spark-hive_2.11-3.0.0-SNAPSHOT.jar --packages org.spark-project.hive:hive-exec:1.2.1.spark2 sql/hive/target/spark-hive_2.11-3.0.0-SNAPSHOT-tests.jar

Generated results with:

SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark"

sparkSession.conf.set(SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key, "2")
df.groupBy($"id" / (N / 4) cast LongType).agg(percentile_approx($"id", 0.5)).collect()
/**
* Benchmark to measure read performance with Filter pushdown.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read performance with Filter pushdown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wangyum , fixed.

Change-Id: Ib9c6b80822013dce31c22baac2d6f6a5b9b730f2
@dongjoon-hyun
Copy link
Member

ok to test

val spark: SparkSession = TestHive.sparkSession

override def runBenchmarkSuite(): Unit = {
runBenchmark("Hive UDAF vs Spark AF") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @peter-toth . Thank you for making this PR.
Currently, runBenchmarkSuite is too long. Could you make a separate function for each test case? For example, ignore("Hive UDAF vs Spark AF") can be a single function. And runBenchmarkSuite will call a series of those functions.

@SparkQA
Copy link

SparkQA commented Oct 24, 2018

Test build #97940 has finished for PR 22804 at commit 2ed884b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 24, 2018

Test build #97972 has finished for PR 22804 at commit 37b40ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @peter-toth .
Thank you for updating. I made a PR to you. Could you review and merge that?

Change minor stuff and update result
@peter-toth
Copy link
Contributor Author

Thanks @dongjoon-hyun for the fixes. Merged.

@SparkQA
Copy link

SparkQA commented Oct 25, 2018

Test build #98012 has finished for PR 22804 at commit 6849a87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @peter-toth and @wangyum .
+1, LGTM.

Merged to master.

@asfgit asfgit closed this in ccd07b7 Oct 25, 2018
@peter-toth
Copy link
Contributor Author

Thanks @dongjoon-hyun , @wangyum for the review.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Refactor ObjectHashAggregateExecBenchmark to use main method

## How was this patch tested?

Manually tested:
```
bin/spark-submit --class org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark --jars sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar,core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,sql/hive/target/spark-hive_2.11-3.0.0-SNAPSHOT.jar --packages org.spark-project.hive:hive-exec:1.2.1.spark2 sql/hive/target/spark-hive_2.11-3.0.0-SNAPSHOT-tests.jar
```
Generated results with:
```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain org.apache.spark.sql.execution.benchmark.ObjectHashAggregateExecBenchmark"
```

Closes apache#22804 from peter-toth/SPARK-25665.

Lead-authored-by: Peter Toth <peter.toth@gmail.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants