Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26277][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled #23224

Closed
wants to merge 4 commits into from

Conversation

seancxmao
Copy link
Contributor

What changes were proposed in this pull request?

In org.apache.spark.sql.execution.metric.SQLMetricsSuite, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes this by enable whole-stage codegen for this test case.

How was this patch tested?

Tested locally using exiting test cases.

@SparkQA
Copy link

SparkQA commented Dec 5, 2018

Test build #99699 has finished for PR 23224 at commit 021728c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Copy link
Contributor

retest this please

@HyukjinKwon
Copy link
Member

Can we file a JIRA? I think it's not minor.

@seancxmao seancxmao changed the title [MINOR][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled [SPARK-26277][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled Dec 5, 2018
@seancxmao
Copy link
Contributor Author

@HyukjinKwon Thank you for your comments! I have filed a JIRA and updated the PR title accordingly.

@SparkQA
Copy link

SparkQA commented Dec 5, 2018

Test build #99711 has finished for PR 23224 at commit 021728c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to check whole-stage codegen enabled inside the test?

@seancxmao
Copy link
Contributor Author

@felixcheung Yes, that makes sense. I have added a commit to check that.

val df = spark.range(10).filter('id < 5).toDF()
testSparkPlanMetrics(df, 1, Map.empty, true)
df.queryExecution.executedPlan.find(_.isInstanceOf[WholeStageCodegenExec])
.getOrElse(assert(false))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems test Sort metric also has similar issue:

test("Sort metrics") {
  // Assume the execution plan is
  // WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
  val ds = spark.range(10).sort('id)
  testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @viirya. Very good suggestions.

After investigation, besides whole-stage codegen related issue, I found another issue. #20560/SPARK-23375 introduced an optimizer rule to eliminate redundant Sort. For a test case named "Sort metrics" in SQLMetricsSuite, because range is already sorted, sort is removed by the RemoveRedundantSorts, which makes the test case meaningless. This seems to be a pretty different issue, so I opened a new PR. See #23258 for details.

@SparkQA
Copy link

SparkQA commented Dec 7, 2018

Test build #99829 has finished for PR 23224 at commit 3da2279.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 8, 2018

Test build #99858 has finished for PR 23224 at commit 71e569b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Dec 8, 2018

Test build #99869 has finished for PR 23224 at commit 71e569b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

LGTM

@seancxmao
Copy link
Contributor Author

Because #23258 has got merged, so I made changes to use testSparkPlanMetricsWithPredicates introduced in #23258 to check WholeStageCodegen metrics in a more reasonable way.

@srowen @mgaido91 Would you please take a look at this if you have time?

@SparkQA
Copy link

SparkQA commented Dec 31, 2018

Test build #100596 has finished for PR 23224 at commit fd7c63b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Dec 31, 2018

LGTM

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, otherwise LGTM

*/
protected def testSparkPlanMetrics(
df: DataFrame,
expectedNumOfJobs: Int,
expectedMetrics: Map[Long, (String, Map[String, Any])]): Unit = {
expectedMetrics: Map[Long, (String, Map[String, Any])],
enableWholeStage: Boolean = false): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? IIUC it is never set to true...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's set to true in the new test that was introduced

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no testSparkPlanMetricsWithPredicates is used there, not testSparkPlanMetrics. So for testSparkPlanMetrics it is never used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes good point. Nothing calls this with a different value here. Yeah this should just pass false, not take a new arg?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not take any new value here and should pass nothing to testSparkPlanMetricsWithPredicates, as false is the default value there. So basically, no change in the testSparkPlanMetrics method is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently testSparkPlanMetrics is always used with whole-stage codegen disabled. I'd argue It's possible we want to use testSparkPlanMetrics when whole-stage codegen is enabled, just like testSparkPlanMetricsWithPredicates. Do you mean we should limit change scope as possible as we can? Or shall we do something for the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we should do only changes which are strictly needed. If in the future we will need this, we will add the flag. But until then, we shouldn't add something which is not needed. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I have removed the unused flag from testSparkPlanMetrics in the new commit.

@SparkQA
Copy link

SparkQA commented Jan 2, 2019

Test build #100642 has finished for PR 23224 at commit 00284b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

mgaido91 commented Jan 2, 2019

LGTM, thanks

@srowen
Copy link
Member

srowen commented Jan 2, 2019

Merged to master

@srowen srowen closed this in d406548 Jan 2, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ith whole-stage codegen enabled

## What changes were proposed in this pull request?
In `org.apache.spark.sql.execution.metric.SQLMetricsSuite`, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes this by enable whole-stage codegen for this test case.

## How was this patch tested?
Tested locally using exiting test cases.

Closes apache#23224 from seancxmao/codegen-metrics.

Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants