[SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified #13200

Closed
wants to merge 11 commits into from

rxin
Contributor

@rxin rxin commented May 19, 2016

What changes were proposed in this pull request?

Currently SparkSession.Builder uses SQLContext.getOrCreate. It should probably be the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls that. This patch does that.

This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession.
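
As a rough illustration of the behavior described above, here is a minimal sketch. It is not Spark's code: `BuilderSketch`, `ToySession`, and `Builder` are hypothetical stand-ins, and the sketch only assumes a single process-wide default session plus builder options that are applied to whichever session `getOrCreate` returns.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.collection.mutable

// Toy model only: hypothetical stand-ins, not Spark's implementation.
object BuilderSketch {
  // Stand-in for a session that carries mutable runtime configuration.
  final class ToySession {
    val conf: mutable.Map[String, String] = mutable.Map.empty
  }

  // Process-wide default session shared by all builders.
  private val defaultSession = new AtomicReference[ToySession](null)

  final class Builder {
    private val options = mutable.LinkedHashMap.empty[String, String]

    // Record an option; nothing is applied until getOrCreate() runs.
    def config(key: String, value: String): Builder = {
      options(key) = value
      this
    }

    // Reuse the default session if one exists, otherwise create and register one.
    def getOrCreate(): ToySession = BuilderSketch.synchronized {
      var session = defaultSession.get()
      if (session == null) {
        session = new ToySession
        defaultSession.set(session)
      }
      // Options are applied whether the session is new or already existed.
      options.foreach { case (k, v) => session.conf(k) = v }
      session
    }
  }

  def builder(): Builder = new Builder

  def main(args: Array[String]): Unit = {
    val first = builder().config("spark.some.option", "a").getOrCreate()
    val second = builder().config("spark.some.option", "b").getOrCreate()
    println(first eq second)                  // same session instance is reused
    println(first.conf("spark.some.option"))  // value set by the second builder
  }
}
```

In this model the second builder call returns the existing session and overwrites the option on it, which is the propagation the description refers to.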

How was this patch tested?

Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches.

@rxin rxin changed the title [SPARK-15075][SQL] Cleanup dependencies between SQLContext and SparkS… [SPARK-15075][SQL] Cleanup dependencies between SQLContext and SparkSession May 19, 2016
@SparkQA

SparkQA commented May 19, 2016

Test build #58899 has finished for PR 13200 at commit b351024.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin changed the title [SPARK-15075][SQL] Cleanup dependencies between SQLContext and SparkSession [SPARK-15075][SQL] Clean up dependencies between SQLContext and SparkSession May 19, 2016
@SparkQA

SparkQA commented May 19, 2016

Test build #58907 has finished for PR 13200 at commit 55ef850.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin changed the title [SPARK-15075][SQL] Clean up dependencies between SQLContext and SparkSession [SPARK-15075][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified May 19, 2016
sparkContext.addSparkListener(new SparkListener {
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
    defaultSession.set(null)
    // TODO(rxin): Do we need to also clear SQL listener?
Contributor Author

cc @zsxwing any idea?

Contributor Author

actually @zsxwing might be good for you to review the entire pr.

Member

We need to clear it. Otherwise, after stopping the SparkContext, we leak it in object SQLContext.
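
The cleanup pattern discussed in this thread can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in the patch: `ListenerSketch`, `ToyContext`, `ToyListener`, `ToySession`, and `sharedSqlListener` are hypothetical names standing in for the context, its listener hook, and the globally held references that would otherwise outlive a stopped context.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.collection.mutable.ListBuffer

// Toy model only: hypothetical stand-ins, not Spark's implementation.
object ListenerSketch {
  trait ToyListener { def onApplicationEnd(): Unit }

  // Stand-in for a context that notifies listeners when it stops.
  final class ToyContext {
    private val listeners = ListBuffer.empty[ToyListener]
    def addListener(l: ToyListener): Unit = { listeners += l; () }
    def stop(): Unit = listeners.foreach(_.onApplicationEnd())
  }

  final class ToySession(val context: ToyContext)

  // Global references; if never cleared they keep pointing at a stopped context.
  val defaultSession = new AtomicReference[ToySession](null)
  val sharedSqlListener = new AtomicReference[AnyRef](null)

  def createSession(context: ToyContext): ToySession = {
    val session = new ToySession(context)
    defaultSession.set(session)
    sharedSqlListener.compareAndSet(null, new Object)
    // Clear both global references once the application ends.
    context.addListener(new ToyListener {
      override def onApplicationEnd(): Unit = {
        defaultSession.set(null)
        sharedSqlListener.set(null)
      }
    })
    session
  }
}
```

After `stop()` is called on the toy context, both global references are null again, so a later session would start from a clean state.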

@rxin
Contributor Author

rxin commented May 19, 2016

cc @andrewor14 too who wrote some of this

and cc @zjffdu since you were working on #13160

@rxin rxin changed the title [SPARK-15075][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified May 19, 2016
@SparkQA

SparkQA commented May 19, 2016

Test build #58911 has finished for PR 13200 at commit c665771.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented May 20, 2016

OK I've pushed a commit to handle sql listener properly.

@SparkQA

SparkQA commented May 20, 2016

Test build #58913 has finished for PR 13200 at commit 526896f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

We should fix session.py to use the new Scala thing as well. Also, tests are failing because Python is still trying to call wrapped.

@rxin
Contributor Author

rxin commented May 20, 2016

That's been updated?

@andrewor14
Contributor

Ah, yes.

@andrewor14
Contributor

but not the builder, which still uses SQLContext.getOrCreate: https://github.com/rxin/spark/blob/c475453c59e6f795ea9283271fe83b35ef71bee6/python/pyspark/sql/session.py#L147

@SparkQA

SparkQA commented May 20, 2016

Test build #58918 has finished for PR 13200 at commit af481d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 20, 2016

Test build #58919 has finished for PR 13200 at commit 918b47b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented May 20, 2016

I updated Python docs. The Python change seems slightly larger and since it is not user facing, I'm going to defer it to another pr.

@SparkQA

SparkQA commented May 20, 2016

Test build #58921 has finished for PR 13200 at commit c475453.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 20, 2016

Test build #58931 has finished for PR 13200 at commit 4173a72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented May 20, 2016

@marmbrus I know you were looking at this. Did you end up going through it?

@marmbrus
Contributor

LGTM

@rxin
Contributor Author

rxin commented May 20, 2016

Thanks - merging in master/2.0.

@asfgit asfgit closed this in f2ee0ed May 20, 2016
asfgit pushed a commit that referenced this pull request May 20, 2016
…pagate config options to existing sessions if specified

## What changes were proposed in this pull request?
Currently SparkSession.Builder uses SQLContext.getOrCreate. It should probably be the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls that. This patch does that.

This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession.

## How was this patch tested?
Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches.

Author: Reynold Xin <rxin@databricks.com>

Closes #13200 from rxin/SPARK-15075.

(cherry picked from commit f2ee0ed)
Signed-off-by: Reynold Xin <rxin@databricks.com>
}
session = new SparkSession(sparkContext)
options.foreach { case (k, v) => session.conf.set(k, v) }
defaultSession.set(session)
Contributor

@rxin Hi Reynold, I had a minor question just for my understanding. When users call new SQLContext(), we create an implicit SparkSession. Should this session be made the defaultSession? If we call 1) new SQLContext and then 2) builder.getOrCreate(), what is the expected behaviour?

Contributor Author

We would create a new one in that case ...

I'm not too worried about the legacy corner cases here though.
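
The corner case in this exchange can be modeled with a small sketch. It assumes, as the reply above states, that a session created implicitly by the legacy entry point is not registered as the default. `LegacySketch`, `ToyLegacyContext`, and `ToySession` are hypothetical names, not Spark classes.

```scala
import java.util.concurrent.atomic.AtomicReference

// Toy model only: hypothetical stand-ins, not Spark's implementation.
object LegacySketch {
  final class ToySession

  // Stand-in for the legacy entry point: it wraps its own implicit session
  // but does not register that session as the default.
  final class ToyLegacyContext {
    val session: ToySession = new ToySession
  }

  private val defaultSession = new AtomicReference[ToySession](null)

  // Stand-in for builder.getOrCreate(): only the registered default is reused.
  def getOrCreate(): ToySession = synchronized {
    var session = defaultSession.get()
    if (session == null) {
      session = new ToySession
      defaultSession.set(session)
    }
    session
  }
}
```

In this model, constructing a `ToyLegacyContext` first and then calling `getOrCreate()` yields a second, distinct session, while later `getOrCreate()` calls keep returning that second one.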

Contributor

@rxin Ok. Got it. Thank you.

@SparkQA

SparkQA commented May 20, 2016

Test build #58939 has finished for PR 13200 at commit e4a4bc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ConsoleSink(options: Map[String, String]) extends Sink with Logging
    • class ConsoleSinkProvider extends StreamSinkProvider with DataSourceRegister

asfgit pushed a commit that referenced this pull request May 25, 2016
…verriding confs of existing sessions

## What changes were proposed in this pull request?

This fixes the Python SparkSession builder to allow setting confs correctly. This was a leftover TODO from #13200.

## How was this patch tested?

Python doc tests.

cc andrewor14

Author: Eric Liang <ekl@databricks.com>

Closes #13289 from ericl/spark-15520.

(cherry picked from commit 8239fdc)
Signed-off-by: Andrew Or <andrew@databricks.com>