[SPARK-19157][SQL] should be able to change spark.sql.runSQLOnFiles at runtime #16531

Closed
cloud-fan
Contributor

What changes were proposed in this pull request?

The analyzer rule that supports querying files directly is added to Analyzer.extendedResolutionRules when the SparkSession is created, depending on the spark.sql.runSQLOnFiles flag. If the flag is off when the SparkSession is created, the rule is not added, and files cannot be queried directly even if the flag is turned on later.

This PR fixes this bug by always adding that rule to Analyzer.extendedResolutionRules.
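
To illustrate the difference between the two registration strategies, here is a minimal toy model in plain Scala. It does not use Spark's real classes: `Conf`, `Rule`, `ResolveFiles`, `rulesBefore`, `rulesAfter`, and `analyze` are all hypothetical names, a "plan" is just a string, and the sketch assumes that the always-registered rule consults the flag each time it runs.

```scala
// Toy model only: every name here is hypothetical, not Spark's API.
object RuleRegistrationSketch {
  // Mutable stand-in for a session's runtime configuration.
  final class Conf(var runSQLOnFiles: Boolean)

  // A rule rewrites a plan; in this sketch a plan is just a string.
  trait Rule { def apply(plan: String): String }

  // Stand-in for the file-resolving rule. It reads the flag on every
  // invocation (an assumption of this sketch).
  final class ResolveFiles(conf: Conf) extends Rule {
    def apply(plan: String): String =
      if (conf.runSQLOnFiles && plan.startsWith("parquet.")) s"Resolved($plan)"
      else plan
  }

  // Buggy strategy: the flag is read once, when the rule list is built.
  def rulesBefore(conf: Conf): List[Rule] =
    if (conf.runSQLOnFiles) List(new ResolveFiles(conf)) else Nil

  // Fixed strategy: the rule is always registered and checks the flag itself.
  def rulesAfter(conf: Conf): List[Rule] = List(new ResolveFiles(conf))

  // Apply each rule in order to the plan.
  def analyze(rules: List[Rule], plan: String): String =
    rules.foldLeft(plan)((p, r) => r(p))
}
```

In this model, if both rule lists are built while `runSQLOnFiles` is `false` and the flag is then set to `true`, `analyze(rulesBefore(conf), ...)` still returns the plan unchanged because the list is empty, whereas the list from `rulesAfter(conf)` resolves it.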

How was this patch tested?

new regression test
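
The test added in the PR is not shown in this conversation. As a rough sketch of what such a check might look like, the snippet below toggles the flag on a live session and queries a Parquet directory both ways. It assumes Spark SQL is on the classpath; the object name `RunSqlOnFilesToggleSketch`, the helper `check`, and the temporary path are illustrative and not taken from the patch.

```scala
import java.nio.file.Files
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Illustrative sketch, not the PR's actual test.
object RunSqlOnFilesToggleSketch {
  // Returns (did the query succeed with the flag off, row count with the flag on).
  def check(spark: SparkSession): (Boolean, Long) = {
    // Write three rows to a fresh Parquet directory.
    val path = Files.createTempDirectory("sql-on-files").resolve("data").toString
    spark.range(3).write.parquet(path)
    val query = s"SELECT id FROM parquet.`$path`"

    // With the flag off, querying the files directly is expected to fail analysis.
    spark.conf.set("spark.sql.runSQLOnFiles", "false")
    val okWhenOff = Try(spark.sql(query).count()).isSuccess

    // Turning the flag on at runtime should make the same query work.
    spark.conf.set("spark.sql.runSQLOnFiles", "true")
    val rowsWhenOn = spark.sql(query).count()

    (okWhenOff, rowsWhenOn)
  }
}
```

On a build that includes this fix, one would expect the first element to be `false` and the second to equal the number of rows written, even though the flag was off earlier in the same session.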

@cloud-fan
Contributor Author

cc @gatorsmile

@@ -45,7 +45,7 @@ import org.apache.spark.unsafe.types.UTF8String
* Replaces generic operations with specific variants that are designed to work with Spark
* SQL Data Sources.
*/
case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
Contributor

I looked at other places in the diff, and it is not clear why you changed this to NOT be a case class.

Member

The same question. What is the reason why we made this change?

@SparkQA
SparkQA commented Jan 10, 2017

Test build #71136 has finished for PR 16531 at commit 963b66f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan]

@@ -116,8 +116,8 @@ private[sql] class SessionState(sparkSession: SparkSession) {
AnalyzeCreateTable(sparkSession) ::
PreprocessTableInsertion(conf) ::
new FindDataSourceTable(sparkSession) ::
DataSourceAnalysis(conf) ::
(if (conf.runSQLonFile) new ResolveDataSource(sparkSession) :: Nil else Nil)
Member

+1
This was the root cause of our being unable to change the conf at runtime.

@gatorsmile
Member

LGTM except the question

@gatorsmile
Member

LGTM pending test

@SparkQA
SparkQA commented Jan 11, 2017

Test build #71174 has finished for PR 16531 at commit 20b2d95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 3b19c74 Jan 11, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…t runtime

## What changes were proposed in this pull request?

The analyzer rule that supports querying files directly is added to `Analyzer.extendedResolutionRules` when the `SparkSession` is created, depending on the `spark.sql.runSQLOnFiles` flag. If the flag is off when the `SparkSession` is created, the rule is not added, and files cannot be queried directly even if the flag is turned on later.

This PR fixes this bug by always adding that rule to `Analyzer.extendedResolutionRules`.

## How was this patch tested?

new regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16531 from cloud-fan/sql-on-files.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017