[SPARK-25510][TEST] Create new trait replace BenchmarkWithCodegen #22522

wangyum · 2018-09-21T17:53:02Z

What changes were proposed in this pull request?

We need to create a new trait to replace BenchmarkWithCodegen, because BenchmarkWithCodegen extends SparkFunSuite.

For example, when refactoring AggregateBenchmark:

Before this change:

object AggregateBenchmark extends BenchmarkBase {

  lazy val sparkSession = SparkSession.builder
    .master("local[1]")
    .appName(this.getClass.getSimpleName)
    .config("spark.sql.shuffle.partitions", 1)
    .config("spark.sql.autoBroadcastJoinThreshold", 1)
    .getOrCreate()

  /** Runs function `f` with whole stage codegen on and off. */
  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality, output = output)

    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
      f
    }

    benchmark.addCase(s"$name wholestage on", numIters = 5) { iter =>
      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
      f
    }

    benchmark.run()
  }

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
...

After this change:

object AggregateBenchmark extends SqlBasedBenchmark {

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmark("agg w/o group", N) {
        sparkSession.range(N).selectExpr("sum(id)").collect()
      }
    }
...
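The refactoring moves the shared session setup and the whole-stage codegen off/on loop into a reusable trait that does not depend on SparkFunSuite, so each benchmark object only declares its cases. Below is a Spark-free sketch of that layering. `MiniBenchmarkBase`, `MiniSqlBenchmark`, `MiniAggregateBenchmark`, the mutable `conf` map and the simple timing loop are hypothetical stand-ins for BenchmarkBase, SqlBasedBenchmark, AggregateBenchmark, the session conf and Spark's `Benchmark` class. They are not the actual implementations.

```scala
import scala.collection.mutable

// Hypothetical stand-in for BenchmarkBase: a plain trait with an entry point,
// so it does not depend on a test framework such as SparkFunSuite.
trait MiniBenchmarkBase {
  def benchmark(): Unit

  // Named group of cases, mirroring the single-argument runBenchmark overload.
  def runBenchmark(benchmarkName: String)(func: => Any): Unit = {
    println(s"Running benchmark: $benchmarkName")
    func
  }

  def main(args: Array[String]): Unit = benchmark()
}

// Hypothetical stand-in for SqlBasedBenchmark: owns the shared configuration
// and the whole-stage codegen off/on loop that every benchmark used to copy.
trait MiniSqlBenchmark extends MiniBenchmarkBase {
  // Stand-in for the session conf; the key mirrors spark.sql.codegen.wholeStage.
  val conf: mutable.Map[String, String] = mutable.Map.empty[String, String]
  // Records (case name, codegen flag seen during the iteration).
  val log: mutable.ArrayBuffer[(String, String)] = mutable.ArrayBuffer.empty[(String, String)]

  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val cases = Seq((s"$name wholestage off", "false", 2), (s"$name wholestage on", "true", 5))
    for ((caseName, flag, numIters) <- cases) {
      conf("spark.sql.codegen.wholeStage") = flag
      val start = System.nanoTime()
      for (_ <- 1 to numIters) {
        f // by-name parameter: re-evaluated on every iteration
        log += ((caseName, conf("spark.sql.codegen.wholeStage")))
      }
      val avg = "%.3f".format((System.nanoTime() - start) / 1e6 / numIters)
      println(s"$caseName: $avg ms/iter over $cardinality rows")
    }
  }
}

// Hypothetical benchmark object: after the refactoring it only declares its cases.
object MiniAggregateBenchmark extends MiniSqlBenchmark {
  var sum: Long = 0L

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val n = 1000L
      runBenchmark("agg w/o group", n) {
        sum = (0L until n).sum
      }
    }
  }
}
```

Calling `MiniAggregateBenchmark.main(Array.empty)` runs two iterations with the flag set to `"false"` and five with it set to `"true"`. These counts match the `numIters` values in the original helper.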

The following benchmarks will use this trait:

AggregateBenchmark
BenchmarkWideTable
JoinBenchmark
MiscBenchmark
ObjectHashAggregateExecBenchmark
SortBenchmark
UnsafeArrayDataBenchmark

How was this patch tested?

Manual tests.

wangyum · 2018-09-21T17:58:55Z

cc @cloud-fan @gengliangwang @dongjoon-hyun

SparkQA · 2018-09-21T22:00:46Z

Test build #96452 has finished for PR 22522 at commit 275cc6c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-09-22T13:44:44Z

sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala

+ * Common base trait for micro benchmarks that are supposed to run standalone (i.e. not together
+ * with other benchmarks).
+ */
+private[benchmark] trait RunBenchmarkWithCodegen {


Should this extend BenchmarkBase?

I'd remove private[benchmark] to be consistent with other benchmark classes.

Changed it to extend BenchmarkBase and added a getSparkSession function so that subclasses can build their own SparkSession:

spark/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala

Lines 28 to 58 in 42230b6

trait RunBenchmarkWithCodegen extends BenchmarkBase {

  val spark: SparkSession = getSparkSession

  /** Subclass can override this function to build their own SparkSession */
  def getSparkSession: SparkSession = {
    SparkSession.builder()
      .master("local[1]")
      .appName(this.getClass.getCanonicalName)
      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
      .getOrCreate()
  }

  /** Runs function `f` with whole stage codegen on and off. */
  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality, output = output)

    benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
      spark.sqlContext.conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, value = false)
      f
    }

    benchmark.addCase(s"$name wholestage on", numIters = 5) { iter =>
      spark.sqlContext.conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, value = true)
      f
    }

    benchmark.run()
  }
}
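Subclasses that override `getSparkSession` should note one consequence of initializing `spark` with a strict `val`. The trait's initializer runs before the subclass body, so an override that reads a field of the subclass sees that field's default value (`null` for references). Below is a Spark-free sketch of this Scala initialization-order behavior, plus a `lazy val` variant that defers construction until first access. `Session`, `EagerBase`, `LazyBase`, `EagerBench`, `LazyBench` and `appName` are hypothetical names used only for illustration. Whether a `lazy val` suits the real trait depends on how benchmarks use `spark`.

```scala
// Hypothetical stand-in for SparkSession: only records the app name it was built with.
final case class Session(appName: String)

// Same shape as the reviewed trait: a strict val initialized from an overridable factory.
trait EagerBase {
  val session: Session = getSession
  def getSession: Session = Session("default")
}

// Same shape, but construction is deferred until `session` is first used.
trait LazyBase {
  lazy val session: Session = getSession
  def getSession: Session = Session("default")
}

// Subclasses that build their session from one of their own fields.
class EagerBench extends EagerBase {
  val appName: String = Seq("custom", "app").mkString("-")
  override def getSession: Session = Session(appName)
}

class LazyBench extends LazyBase {
  val appName: String = Seq("custom", "app").mkString("-")
  override def getSession: Session = Session(appName)
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    // EagerBase's initializer runs before EagerBench's, so appName is still null here.
    println(new EagerBench().session)
    // The lazy val is forced after construction, so it sees the initialized field.
    println(new LazyBench().session)
  }
}
```

Running `InitOrderDemo.main(Array.empty)` prints `Session(null)` followed by `Session(custom-app)`.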

cloud-fan · 2018-09-22T13:46:02Z

I think this change is necessary, but I'd like to migrate one benchmark to use this new trait as an example. We can migrate others in follow up PRs.

wangyum · 2018-09-22T15:42:27Z

Thanks @cloud-fan. I have migrated AggregateBenchmark to use the new trait.

gengliangwang · 2018-09-22T16:36:10Z

@wangyum I have left my comment in #22484.
Also, should we close this one and move to #22484?

wangyum · 2018-09-29T00:50:33Z

cc @dongjoon-hyun

SparkQA · 2018-09-29T04:37:13Z

Test build #96783 has finished for PR 22522 at commit 20668ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper
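The version tested in build #96783 mixes SQLHelper into SqlBasedBenchmark. Assume that mixin provides a `withSQLConf`-style helper. Then configuration changes such as the codegen flag can be scoped to a block and restored afterwards, instead of staying set the way the `setConf` calls in the reviewed snippet leave them. Below is a Spark-free sketch of that save/set/restore pattern. `ConfScope`, `withConf`, `ConfScopeDemo` and the mutable map standing in for SQLConf are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a SQLHelper-style mixin: scoped configuration overrides.
trait ConfScope {
  // Plain mutable map standing in for the session's SQLConf.
  val conf: mutable.Map[String, String] =
    mutable.Map("spark.sql.codegen.wholeStage" -> "true")

  // Applies `pairs` for the duration of `body`, then restores the prior values
  // (removing keys that were previously unset), even if `body` throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }
}

object ConfScopeDemo extends ConfScope {
  def main(args: Array[String]): Unit = {
    val seen = withConf(
      "spark.sql.codegen.wholeStage" -> "false",
      "spark.sql.shuffle.partitions" -> "1") {
      (conf("spark.sql.codegen.wholeStage"), conf.get("spark.sql.shuffle.partitions"))
    }
    println(seen) // values visible inside the block
    println(conf) // original values restored afterwards
  }
}
```

Inside the block both overrides are visible. Afterwards the codegen flag is back to `"true"`, and the shuffle-partitions key, which was unset before, is removed again.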

Create new BenchmarkWithCodegen trait doesn't extends SparkFunSuite

275cc6c

cloud-fan reviewed Sep 22, 2018


wangyum closed this Sep 22, 2018

wangyum added 2 commits September 29, 2018 08:26

Merge remote-tracking branch 'upstream/master' into SPARK-25510

c62a3be

Rename RunBenchmarkWithCodegen to SqlBasedBenchmark

20668ad

wangyum reopened this Sep 29, 2018

gengliangwang mentioned this pull request Sep 29, 2018

[SPARK-25476][SPARK-25510][TEST] Refactor AggregateBenchmark and add a new trait to better support Dataset and DataFrame API #22484

Closed

wangyum closed this Oct 1, 2018

wangyum deleted the SPARK-25510 branch October 1, 2018 14:40
