Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25658][SQL][TEST] Refactor HashByteArrayBenchmark to use main method #22652

Closed
wants to merge 6 commits into from
Closed

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Oct 6, 2018

What changes were proposed in this pull request?

Refactor HashByteArrayBenchmark to use main method.

  1. use spark-submit:
bin/spark-submit --class  org.apache.spark.sql.HashByteArrayBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
  1. Generate benchmark result:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "catalyst/test:runMain org.apache.spark.sql.HashByteArrayBenchmark"

How was this patch tested?

manual tests

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97040 has finished for PR 22652 at commit 3e6a058.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Oct 6, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97043 has finished for PR 22652 at commit 3e6a058.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Oct 6, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97047 has finished for PR 22652 at commit 3e6a058.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* {{{
* 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
* 2. build/sbt "sql/test:runMain <this class>"
* 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sql/test -> catalyst/test.
If we use sql/test, the result will be generated in sql module instead of catalyst module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for both line 32 and 33.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please check line 31, too.

@dongjoon-hyun
Copy link
Member

Could you review and merge wangyum#17 ?

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97059 has finished for PR 22652 at commit bdb0549.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97060 has finished for PR 22652 at commit b5190d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97066 has finished for PR 22652 at commit cc268ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.sql.catalyst.expressions.{HiveHasher, XXH64}
import org.apache.spark.unsafe.Platform
import org.apache.spark.unsafe.hash.Murmur3_x86_32

/**
* Synthetic benchmark for MurMurHash 3 and xxHash64.
* To run this benchmark:
* {{{
* 1. without sbt: bin/spark-submit --class <this class> <spark catalyst test jar>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a correct guide? BenchmarkBase is in a different jar file, isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we missed this because we thought this is a legacy guide which has been worked before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right:

LM-SHC-16502798:spark yumwang$ bin/spark-submit --class  org.apache.spark.sql.HashByteArrayBenchmark ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar18/10/07 07:35:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/benchmark/BenchmarkBase
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
......

The correct usage should be:

bin/spark-submit --class  org.apache.spark.sql.HashByteArrayBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar

@SparkQA
Copy link

SparkQA commented Oct 7, 2018

Test build #97075 has finished for PR 22652 at commit 0a7741a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 7, 2018

Test build #97074 has finished for PR 22652 at commit 11f2bbe.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Oct 7, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Oct 7, 2018

Test build #97077 has finished for PR 22652 at commit 0a7741a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @wangyum .

+1, LGTM. Merged to master.

@asfgit asfgit closed this in b1328cc Oct 7, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…method

## What changes were proposed in this pull request?

Refactor `HashByteArrayBenchmark` to use main method.
1. use `spark-submit`:
```console
bin/spark-submit --class  org.apache.spark.sql.HashByteArrayBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
```

2. Generate benchmark result:
```console
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "catalyst/test:runMain org.apache.spark.sql.HashByteArrayBenchmark"
```

## How was this patch tested?

manual tests

Closes apache#22652 from wangyum/SPARK-25658.

Lead-authored-by: Yuming Wang <wgyumg@gmail.com>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants