Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25510][TEST] Create new trait replace BenchmarkWithCodegen #22522

Closed
wants to merge 3 commits into from
Closed

[SPARK-25510][TEST] Create new trait replace BenchmarkWithCodegen #22522

wants to merge 3 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Sep 21, 2018

What changes were proposed in this pull request?

We need create a new trait to replace BenchmarkWithCodegen as BenchmarkWithCodegen extends from SparkFunSuite.

For example. when doing AggregateBenchmark refactor.
Before this change:

object AggregateBenchmark extends BenchmarkBase {

  lazy val sparkSession = SparkSession.builder
    .master("local[1]")
    .appName(this.getClass.getSimpleName)
    .config("spark.sql.shuffle.partitions", 1)
    .config("spark.sql.autoBroadcastJoinThreshold", 1)
    .getOrCreate()

  /** Runs function `f` with whole stage codegen on and off. */
  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality, output = output)

    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
      f
    }

    benchmark.addCase(s"$name wholestage on", numIters = 5) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
      f
    }

    benchmark.run()
  }

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
...

After this change:

object AggregateBenchmark extends SqlBasedBenchmark {

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
...

All these benchmarks will use this trait:

AggregateBenchmark
BenchmarkWideTable
JoinBenchmark
MiscBenchmark
ObjectHashAggregateExecBenchmark
SortBenchmark
UnsafeArrayDataBenchmark

How was this patch tested?

manual tests

@wangyum
Copy link
Member Author

wangyum commented Sep 21, 2018

@SparkQA
Copy link

SparkQA commented Sep 21, 2018

Test build #96452 has finished for PR 22522 at commit 275cc6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Common base trait for micro benchmarks that are supposed to run standalone (i.e. not together
* with other benchmarks).
*/
private[benchmark] trait RunBenchmarkWithCodegen {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall this extend BenchmarkBase?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove private[benchmark] to be consistent with other benchmark classes.

Copy link
Member Author

@wangyum wangyum Sep 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extends BenchmarkBase and add getSparkSession function thus subclass can build their own SparkSession:

trait RunBenchmarkWithCodegen extends BenchmarkBase {
val spark: SparkSession = getSparkSession
/** Subclass can override this function to build their own SparkSession */
def getSparkSession: SparkSession = {
SparkSession.builder()
.master("local[1]")
.appName(this.getClass.getCanonicalName)
.config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
.config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
.getOrCreate()
}
/** Runs function `f` with whole stage codegen on and off. */
def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
val benchmark = new Benchmark(name, cardinality, output = output)
benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
spark.sqlContext.conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, value = false)
f
}
benchmark.addCase(s"$name wholestage on", numIters = 5) { iter =>
spark.sqlContext.conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, value = true)
f
}
benchmark.run()
}
}

@cloud-fan
Copy link
Contributor

I think this change is necessary, but I'd like to migrate one benchmark to use this new trait as an example. We can migrate others in follow up PRs.

@wangyum
Copy link
Member Author

wangyum commented Sep 22, 2018

Thanks @cloud-fan I have migrate AggregateBenchmark to use new trait.

@gengliangwang
Copy link
Member

@wangyum I have left my comment in #22484 .
Also, should we close this one and move to #22484 ?

@wangyum wangyum closed this Sep 22, 2018
@wangyum wangyum reopened this Sep 29, 2018
@wangyum
Copy link
Member Author

wangyum commented Sep 29, 2018

cc @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented Sep 29, 2018

Test build #96783 has finished for PR 22522 at commit 20668ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper

@wangyum wangyum closed this Oct 1, 2018
@wangyum wangyum deleted the SPARK-25510 branch October 1, 2018 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants