RussellSpitzer
Member

What changes were proposed in this pull request?

Previously PySpark used the private constructor for SparkSession when
building that object. This produced a SparkSession that never checked
the spark.sql.extensions parameter for additional session extensions. To fix
this, we instead use the SparkSession.builder() path, as SparkR does; this
loads the extensions and allows their use in PySpark.

How was this patch tested?

This was manually tested by passing a class to spark.sql.extensions and making sure its included strategies appeared in the spark._jsparkSession.sessionState.planner.strategies list. We could add an automated test, but I'm not very familiar with the PySpark testing framework. I would be glad to implement one if requested.
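
For readers who want to repeat the manual check, here is a minimal sketch of what it might look like from PySpark. It is not the test used in this PR. The extension class name `com.example.MyExtensions` and the helper names are hypothetical, and the Py4J traversal (`sessionState()`, `planner()`, `strategies()`, then `size()` and `apply(i)` on the returned Scala `Seq`) is an assumption about how the JVM objects are exposed through the private `_jsparkSession` handle.

```python
def build_session(extension_class):
    """Create a local session with one extension class registered.

    PySpark is imported lazily so the helpers below can be loaded without it.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[1]")
        .config("spark.sql.extensions", extension_class)
        .getOrCreate()
    )


def strategy_class_names(spark):
    """Return the JVM class names of the planner strategies on a session.

    Assumption: the strategies come back as a Scala Seq, which Py4J lets us
    walk with size() and apply(index).
    """
    strategies = spark._jsparkSession.sessionState().planner().strategies()
    return [
        strategies.apply(i).getClass().getName()
        for i in range(strategies.size())
    ]


def has_strategy(spark, name_fragment):
    """True if any planner strategy's class name contains name_fragment."""
    return any(name_fragment in name for name in strategy_class_names(spark))


if __name__ == "__main__":
    # Hypothetical extension class; it must be on the driver classpath.
    spark = build_session("com.example.MyExtensions")
    for name in strategy_class_names(spark):
        print(name)
    spark.stop()
```

With the fix applied, the strategies injected by the extension class should show up in the printed list; without it, a session built from PySpark would be expected to show only the built-in ones.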

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94144 has finished for PR 21988 at commit 9c6e067.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@RussellSpitzer
Member Author

The local PEP 8 check didn't seem to mind this code... I fixed up the indentation, so hopefully Jenkins will like it now.

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94153 has finished for PR 21988 at commit 64bcb6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

HyukjinKwon commented Aug 4, 2018

@RussellSpitzer, btw mind if I ask you to put [BRANCH-2.2] in the PR title, just to make it less confusing?

@viirya
Member

viirya commented Aug 4, 2018

And put [BRANCH-2.3] into the title of the PR #21989 too?

@felixcheung
Member

Why do we need multiple PRs? Typically we do that only when the change is non-trivial or cannot be backported by cherry-picking.

@RussellSpitzer RussellSpitzer changed the title from "[SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark" to "[SPARK-25003][PYSPARK][BRANCH-2.2] Use SessionExtensions in Pyspark" Aug 6, 2018
@RussellSpitzer
Member Author

@felixcheung I just didn't know what version to target, so I made a PR for each one. We can just close the ones that shouldn't be merged.

@HyukjinKwon
Member

Yea, let's just close all of them except the master one.

@felixcheung
Member

We always open PRs against master and backport if agreed upon.
This is documented here: https://spark.apache.org/contributing.html

@RussellSpitzer RussellSpitzer deleted the SPARK-25003-branch-2.2 branch August 7, 2018 13:03