Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21092][SQL] Wire SQLConf in logical plan and expressions #18299

Closed
wants to merge 6 commits into from

Conversation

rxin
Copy link
Contributor

@rxin rxin commented Jun 14, 2017

What changes were proposed in this pull request?

It is really painful to not have configs in logical plan and expressions. We had to add all sorts of hacks (e.g. pass SQLConf explicitly in functions). This patch exposes SQLConf in logical plan, using a thread local variable and a getter closure that's set once there is an active SparkSession.

The implementation is a bit of a hack, since we didn't anticipate this need in the beginning (config was only exposed in physical plan). The implementation is described in SQLConf.get.

In terms of future work, we should follow up to clean up CBO (remove the need for passing in config).

How was this patch tested?

Updated relevant tests for constraint propagation.

@rxin rxin changed the title Spark 21092 [SPARK-21092][SQL] Wire SQLConf in logical plan and expressions Jun 14, 2017
@rxin
Copy link
Contributor Author

rxin commented Jun 14, 2017

Note that this patch is based on #18298. Once we merge that one the diff will become smaller.

Update: #18298 was merged. This is ready.

@rxin
Copy link
Contributor Author

rxin commented Jun 14, 2017

cc @wzhfy for review / CBO

cc @sameeragarwal this is what we discussed.

@SparkQA
Copy link

SparkQA commented Jun 14, 2017

Test build #78036 has finished for PR 18299 at commit ced6e9b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2017

Test build #78038 has finished for PR 18299 at commit f059522.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Copy link
Contributor

wzhfy commented Jun 14, 2017

Since we have a thread local variable in SQLConf, can we just provide getter/setter methods, get it in QueryPlan and set it in SparkSession?

@rxin
Copy link
Contributor Author

rxin commented Jun 14, 2017

The issue is that SparkSession might change the way they are wired and it's not always the case that when we create a new thread, we set the thread local conf.

.union(constructIsNotNullConstraints(validConstraints))
.filter { c =>
c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also bound their size as you suggested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That belongs in a separate patch.

@wzhfy
Copy link
Contributor

wzhfy commented Jun 14, 2017

Got it, thanks.

assert(aliasedRelation.analyze.constraints.nonEmpty)

SQLConf.get.setConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED, false)
assert(aliasedRelation.analyze.constraints.isEmpty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unset the config in the end

@wzhfy
Copy link
Contributor

wzhfy commented Jun 14, 2017

LGTM except one minor comment.

@SparkQA
Copy link

SparkQA commented Jun 14, 2017

Test build #78068 has finished for PR 18299 at commit ea09164.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 15, 2017

Test build #78070 has finished for PR 18299 at commit 14f2b41.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 15, 2017

Test build #78071 has finished for PR 18299 at commit a032106.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 15, 2017

Test build #78077 has finished for PR 18299 at commit cec78b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor Author

rxin commented Jun 15, 2017

Merging in master.

@asfgit asfgit closed this in fffeb6d Jun 15, 2017
@@ -87,6 +87,11 @@ class SparkSession private(

sparkContext.assertNotStopped()

// If there is no active SparkSession, uses the default SQL conf. Otherwise, use the session's.
SQLConf.setSQLConfGetter(() => {
SparkSession.getActiveSession.map(_.sessionState.conf).getOrElse(SQLConf.getFallbackConf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we move this to SparkSession.Builder.getOrCreate?

dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
## What changes were proposed in this pull request?
It is really painful to not have configs in logical plan and expressions. We had to add all sorts of hacks (e.g. pass SQLConf explicitly in functions). This patch exposes SQLConf in logical plan, using a thread local variable and a getter closure that's set once there is an active SparkSession.

The implementation is a bit of a hack, since we didn't anticipate this need in the beginning (config was only exposed in physical plan). The implementation is described in `SQLConf.get`.

In terms of future work, we should follow up to clean up CBO (remove the need for passing in config).

## How was this patch tested?
Updated relevant tests for constraint propagation.

Author: Reynold Xin <rxin@databricks.com>

Closes apache#18299 from rxin/SPARK-21092.
wangyum pushed a commit to 1haodian/spark that referenced this pull request Jun 23, 2017
… conf in LogicalPlan

## What changes were proposed in this pull request?

After wiring `SQLConf` in logical plan ([PR 18299](apache#18299)), we can remove the need of passing `conf` into `def stats` and `def computeStats`.

## How was this patch tested?

Covered by existing tests, plus some modified existing tests.

Author: wangzhenhua <wangzhenhua@huawei.com>
Author: Zhenhua Wang <wzh_zju@163.com>

Closes apache#18391 from wzhfy/removeConf.
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
… conf in LogicalPlan

## What changes were proposed in this pull request?

After wiring `SQLConf` in logical plan ([PR 18299](apache#18299)), we can remove the need of passing `conf` into `def stats` and `def computeStats`.

## How was this patch tested?

Covered by existing tests, plus some modified existing tests.

Author: wangzhenhua <wangzhenhua@huawei.com>
Author: Zhenhua Wang <wzh_zju@163.com>

Closes apache#18391 from wzhfy/removeConf.
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
It is really painful to not have configs in logical plan and expressions. We had to add all sorts of hacks (e.g. pass SQLConf explicitly in functions). This patch exposes SQLConf in logical plan, using a thread local variable and a getter closure that's set once there is an active SparkSession.

The implementation is a bit of a hack, since we didn't anticipate this need in the beginning (config was only exposed in physical plan). The implementation is described in `SQLConf.get`.

In terms of future work, we should follow up to clean up CBO (remove the need for passing in config).

Updated relevant tests for constraint propagation.

Author: Reynold Xin <rxin@databricks.com>

Closes apache#18299 from rxin/SPARK-21092.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants