[SPARK-35332][SQL] Make cache plan disable configs configurable #32482

Closed
wants to merge 9 commits into from

ulysses-you
Contributor

What changes were proposed in this pull request?

Add a new config that lets users control which configs are disabled when building a cache plan.

Why are the changes needed?

Some configs are disabled when building the cache plan to avoid a performance regression, but not every query becomes slower when AQE or bucketed scan is enabled. A new config lets users decide whether those configs should be disabled while building the cache plan.

Does this PR introduce any user-facing change?

Yes, a new config.

How was this patch tested?

Added tests.

@github-actions github-actions bot added the SQL label May 9, 2021
SparkQA commented May 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42825/

SparkQA commented May 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42825/

SparkQA commented May 9, 2021

Test build #138303 has finished for PR 32482 at commit f44e0f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@maropu maropu left a comment

Comment on lines 1095 to 1096
.doc("Configurations needs to be turned off, to avoid regression for cached query, so that " +
"the outputPartitioning of the underlying cached query plan can be leveraged later.")
Member

@maropu maropu May 9, 2021

I think this comment is written for developers, so it is difficult for a user to understand. Could you brush it up a bit more?

.checkValue(_.forall(v => sqlConfEntries.containsKey(v) &&
sqlConfEntries.get(v).defaultValue.exists(_.isInstanceOf[Boolean])),
"config should be boolean type")
.createWithDefault(Seq(ADAPTIVE_EXECUTION_ENABLED.key, AUTO_BUCKETED_SCAN_ENABLED.key))
Member

Is there any use case for turning off these rules separately? I think a single boolean flag would be enough for this.

Contributor

+1, I agree with starting with a simpler config format if possible.
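
To make the trade-off in this thread concrete, here is a minimal sketch of the validation a list-valued config needs, modeled on the checkValue hunk above. The registry map, its entries, and the object name are hypothetical stand-ins for Spark's sqlConfEntries, and the default values are illustrative only, not Spark's real defaults.

```scala
// Hypothetical stand-in for Spark's config registry: key -> default value.
// The defaults here are illustrative only.
object DisableConfigsValidationSketch {
  val registry: Map[String, Any] = Map(
    "spark.sql.adaptive.enabled" -> false,
    "spark.sql.shuffle.partitions" -> 200
  )

  // Mirrors the checkValue in the hunk: every listed key must be registered
  // and must have a Boolean default. An empty list passes trivially.
  def isValid(keys: Seq[String]): Boolean =
    keys.forall(k => registry.get(k).exists(_.isInstanceOf[Boolean]))

  // Right(keys) when valid, otherwise the error message quoted in the hunk.
  def validate(keys: Seq[String]): Either[String, Seq[String]] =
    if (isValid(keys)) Right(keys) else Left("config should be boolean type")
}
```

A single boolean flag needs none of this validation, which is the simplification the reviewers are asking for.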

@@ -1554,4 +1554,39 @@ class CachedTableSuite extends QueryTest with SQLTestUtils
assert(!spark.catalog.isCached(viewName))
}
}

test("SPARK-35332: Make cache plan disable configs configurable") {
Member

Could you add tests for more patterns? e.g.,

sql("""SET spark.sql.cache.disableConfigs=spark.sql.adaptive.enabled""")
sql("CACHE TABLE test_table1 AS <query 1>")
spark.table("test_table1").explain(true)  // expect: AQE disabled

sql("""SET spark.sql.cache.disableConfigs=""")
sql("CACHE TABLE test_table2 AS <query 2>")
spark.table("test_table2").explain(true)  // expect: AQE enabled
spark.table("test_table1").explain(true)  // expect: AQE disabled
sql("CACHE TABLE test_table3 AS <query 1>")
spark.table("test_table3").explain(true)  // expect: AQE disabled

@@ -1090,6 +1090,18 @@ object SQLConf {
.booleanConf
.createWithDefault(true)

val CACHE_DISABLE_CONFIGS =
buildConf("spark.sql.cache.disableConfigs")
.doc("Configurations needs to be turned off, to avoid regression for cached query, so that " +
Member

Maybe, internal?

.internal()

Member

@dongjoon-hyun dongjoon-hyun left a comment

When we expose something to users, we have to consider the side effects when users make a mistake. What happens when they set this config incorrectly? For example, what if they remove ADAPTIVE_EXECUTION_ENABLED from the conf? Is it okay and safe to expose this?

Also, cc @sunchao since this is related to the bucketed scan too.

}.getMessage
assert(msg.contains("config should be boolean type"))

Seq(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "").foreach { disableConfig =>
Contributor

Could we also test auto bucketed scan? Thanks.

@ulysses-you
Contributor Author

Thank you @maropu @c21 @dongjoon-hyun.

Agreed, the current config seems like overkill for users; it's better to make it a simple enable/disable flag.

Refactored this PR to:

  • make the new config simpler and improve the doc.
  • improve the tests in two ways: 1) more patterns in the AQE test, 2) a bucketed scan test.

@@ -1175,7 +1175,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils
}

test("cache supports for intervals") {
withTable("interval_cache") {
withTable("interval_cache", "t1") {
Contributor Author

Not related to this PR, but it affected the newly added test that uses t1.

SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42838/

SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42838/

@cloud-fan
Contributor

cloud-fan commented May 10, 2021

In general, I think it's better to optimize the cached plan more aggressively for better performance, even though it may cause perf regression due to output partitioning change, which should be rare.

About the config name, how about spark.sql.optimizer.canChangeCachedPlanOutputPartitioning?

@c21
Contributor

c21 commented May 10, 2021

cc @viirya as well, since he found the bug and raised the concern about auto bucketed scan for cached queries.

Comment on lines 1096 to 1097
.doc(s"When true, some configs are disabled during executing cache plan that is to avoid " +
s"performance regression if other queries hit the cached plan. Currently, the disabled " +
Member

nit: don't need s"".

*/
private[sql] def getOrCloneSessionWithConfigsOff(
session: SparkSession,
configurations: Seq[ConfigEntry[Boolean]]): SparkSession = {
val configsEnabled = configurations.filter(session.sessionState.conf.getConf(_))
if (configsEnabled.isEmpty) {
if (!session.sessionState.conf.getConf(SQLConf.CACHE_DISABLE_CONFIGS_ENABLED)) {
Member

What if getOrCloneSessionWithConfigsOff is used for disabling other configs? Would those also automatically be controlled by CACHE_DISABLE_CONFIGS_ENABLED?

Contributor Author

Agreed, it could introduce issues in the future. Moved to CacheManager.

@@ -1069,21 +1069,26 @@ object SparkSession extends Logging {
}

/**
* Returns a cloned SparkSession with all specified configurations disabled, or
* the original SparkSession if all configurations are already disabled.
* When CACHE_DISABLE_CONFIGS_ENABLED is enabled, returns a cloned SparkSession with all
Member

getOrCloneSessionWithConfigsOff is not documented as being used only for cache plans, so it is logically a bit wrong to have a so-called cache disable config here.

@viirya
Member

viirya commented May 10, 2021

So it sounds like this adds an aggressive option for preparing the cache plan, right? The config name could be refined; at first glance it confused me.

SparkQA commented May 10, 2021

Test build #138316 has finished for PR 32482 at commit 7625677.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42869/

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42869/

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42871/

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42871/

SparkQA commented May 11, 2021

Test build #138349 has finished for PR 32482 at commit 30b0572.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42877/

sql("CACHE TABLE t1 as SELECT /*+ REPARTITION */ * FROM values(1) as t(c)")
assert(spark.table("t1").rdd.partitions.length == 2)

sql(s"SET ${SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key} = true")
Contributor

Let's turn each of these SET ... statements into withSQLConf.

SparkQA commented May 11, 2021

Test build #138355 has finished for PR 32482 at commit 708bb0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42894/

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42894/

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42896/

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42896/

SparkQA commented May 11, 2021

Test build #138371 has finished for PR 32482 at commit a83e9cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 11, 2021

Test build #138373 has finished for PR 32482 at commit 515aeba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@maropu maropu left a comment

Looks fine otherwise.

val CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING =
buildConf("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning")
.internal()
.doc(s"When false, some configs are disabled during executing cache plan that is to avoid " +
Member

How about this?

Whether to forcibly enable some optimization rules that can change the output partitioning
of a cached query when executing it for caching. If it is set to true, 
queries may need an extra shuffle to read the cached data. This configuration is disabled by default. 
Currently, the optimization rules enabled by this configuration are
${ADAPTIVE_EXECUTION_ENABLED.key}  and ${AUTO_BUCKETED_SCAN_ENABLED.key}.

SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {

withTempView("t1", "t2", "t3") {
withCache("t1", "t2", "t3") {
Member

withTempView drops caches, too?

Contributor Author

yes, removed it.

SparkQA commented May 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42970/

SparkQA commented May 12, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42970/

SparkQA commented May 12, 2021

Test build #138449 has finished for PR 32482 at commit 16b8a22.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sql("CACHE TABLE t2 as SELECT /*+ REPARTITION */ * FROM values(2) as t(c)")
assert(spark.table("t2").rdd.partitions.length == 1)

withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "false") {
Contributor

why is this one nested?

Contributor

I'm expecting something like

withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "false") {
  test1
}
withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "true") {
  test2
}
withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "false") {
  test3
}
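
For readers unfamiliar with the pattern, the sequential structure suggested above can be sketched in plain Scala as follows. This is only a model under stated assumptions: `WithConfSketch`, `conf`, `withConf`, and `observedValues` are hypothetical names, and the mutable map stands in for the session's SQL conf; it is not Spark's actual withSQLConf implementation.

```scala
import scala.collection.mutable

object WithConfSketch {
  // Hypothetical stand-in for the session's mutable SQL conf.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets the given pairs, runs the body, then restores the previous values,
  // so nothing leaks from one scope into the next.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }

  val Key = "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning"

  // Three sequential (not nested) scopes; each one sees only its own value.
  def observedValues(): Seq[String] =
    Seq("false", "true", "false").map { v =>
      withConf(Key -> v) { conf(Key) }
    }
}
```

Because each scope restores the conf on exit, the three tests stay independent, which is harder to see at a glance when the scopes are nested.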

Contributor Author

Updated it.

sql("CACHE TABLE t2")
assert(spark.table("t2").rdd.partitions.length == 1)

withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "false") {
Contributor

ditto

SparkQA commented May 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43024/

SparkQA commented May 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43024/

SparkQA commented May 13, 2021

Test build #138504 has finished for PR 32482 at commit 2e8492b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6f63057 May 13, 2021
@@ -328,4 +326,15 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
if (needToRefresh) fileIndex.refresh()
needToRefresh
}

/**
* If CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING is disabled, just return original session.
Member

"If CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING is disabled" -> do we mean enabled?

Contributor

good catch!

Contributor Author

Sorry, I forgot to fix the comment. Created follow-up #32543.
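
The corrected semantics discussed in this thread can be modeled with a small, self-contained sketch. It is not the merged CacheManager code: `SessionModel`, `getOrCloneWithConfigsOff`, and `sessionForCaching` are hypothetical names, a plain map stands in for the session conf, and the auto bucketed scan key is an assumption because it is not spelled out in this conversation.

```scala
object CachePlanSessionSketch {
  val CanChangeKey = "spark.sql.optimizer.canChangeCachedPlanOutputPartitioning"
  val AqeKey = "spark.sql.adaptive.enabled"
  // Assumed key for AUTO_BUCKETED_SCAN_ENABLED; not stated in this thread.
  val AutoBucketedScanKey = "spark.sql.sources.bucketing.autoBucketedScan.enabled"

  // Hypothetical stand-in for a SparkSession: only its boolean conf values.
  final case class SessionModel(conf: Map[String, Boolean]) {
    def isOn(key: String): Boolean = conf.getOrElse(key, false)
  }

  // Generic helper: clone with the given configs off, or reuse the session
  // when none of them is currently on. It knows nothing about caching.
  def getOrCloneWithConfigsOff(s: SessionModel, keys: Seq[String]): SessionModel = {
    val enabled = keys.filter(s.isOn)
    if (enabled.isEmpty) s else SessionModel(s.conf ++ enabled.map(_ -> false))
  }

  // Cache-specific gate: when the flag is enabled, keep the original session;
  // otherwise turn AQE and auto bucketed scan off for the cache plan.
  def sessionForCaching(s: SessionModel): SessionModel =
    if (s.isOn(CanChangeKey)) s
    else getOrCloneWithConfigsOff(s, Seq(AqeKey, AutoBucketedScanKey))
}
```

Keeping the flag check in the cache-specific function leaves the generic helper reusable for other callers, which matches the earlier review feedback about not tying it to a cache-only config.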

Member

@viirya viirya left a comment

lgtm with one minor comment.

@ulysses-you ulysses-you deleted the SPARK-35332 branch May 14, 2021 01:22