Conversation


@jonmio jonmio commented Jan 15, 2026

What changes were proposed in this pull request?

Allow users to call SparkSessionBuilder.create from PySpark in Classic mode so that they can create a new session with the provided builder configs without mutating the configurations of any existing sessions.

Why are the changes needed?

Currently, create is not supported in PySpark Classic. Users can call getOrCreate to obtain a session with the provided configurations, but if a session already exists, getOrCreate will mutate that session's existing configurations. We would like to provide an API that guarantees creation of a new session and has no side effects on existing sessions.

This change also unifies the API between Classic and Connect mode since create is already supported in Connect.

Does this PR introduce any user-facing change?

Previously calling SparkSessionBuilder.create() would throw an exception saying that create is only supported in Connect mode.

After this change, SparkSessionBuilder.create() will create a new SparkSession with the specified builder configurations.
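
For illustration, here is a minimal sketch of the new behavior (the config key and values are arbitrary examples, not taken from the PR):

from pyspark.sql import SparkSession

existing = SparkSession.builder.master("local[1]").appName("first").getOrCreate()

# create() always returns a brand-new session; the builder configs apply
# to that session only, instead of being written into the existing session
# the way getOrCreate() would when one is already running.
fresh = (
    SparkSession.builder
    .config("spark.sql.shuffle.partitions", "4")
    .create()
)

assert fresh is not existing
assert fresh.conf.get("spark.sql.shuffle.partitions") == "4"
assert existing.conf.get("spark.sql.shuffle.partitions") != "4"  # untouched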

How was this patch tested?

Added the new test class SparkSessionBuilderCreateTests in python/pyspark/sql/tests/test_session.py

Was this patch authored or co-authored using generative AI tooling?

No


github-actions bot commented Jan 15, 2026

JIRA Issue Information

=== Improvement SPARK-55055 ===
Summary: Support SparkSession.Builder.create for PySpark Classic #53820
Assignee: Jon Mio
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@jonmio jonmio changed the title Support SparkSession.Builder.create for PySpark Classic [SPARK-55055] Support SparkSession.Builder.create for PySpark Classic Jan 15, 2026
Contributor

@sryza sryza left a comment


This LGTM. @cloud-fan – are you interested in taking a look too?


sc = (
    self._sc
    if getattr(self, "_sc", None) is not None and self._sc._jsc is not None
Member


In the Classic Builder, which function actually sets self._sc? Is it always None?

CC @HyukjinKwon and @cloud-fan

Member


It's set here:

def __init__(
    self,
    sparkContext: "SparkContext",
    jsparkSession: Optional["JavaObject"] = None,
    options: Dict[str, Any] = {},
):
    self._sc = sparkContext

in the constructor of SparkSession.

messageParameters={},
)

class SparkSessionBuilderCreateTests(unittest.TestCase, PySparkErrorTestUtils):
Member


Thank you for adding the tests!

Could you help clarify coverage for the following areas?

Error Handling Tests
• How does create() behave when called with invalid configurations?
• What is the expected behavior if SparkContext creation fails?
• How should conflicting SparkContext-level configurations be handled when a context already exists?

Thread Safety Tests
• How should the implementation behave when multiple threads call create() concurrently?
• Are there any expected race conditions between create() and getOrCreate() that we should guard against or test explicitly?

Default/Active Session Update Logic
The PR description states:

“This method will update the default and/or active session if they are not set.”

Could we clarify:
• What is the expected behavior for updating the default session, and how should this be tested?
• In what cases should the default session be updated or preserved?
• Are there additional scenarios around active session updates that should be validated?

Session Stop/Cleanup
• What should happen when stop() is called on a session created via create()?
• How can we ensure resources are properly cleaned up?
• Should stopping one session affect other sessions sharing the same SparkContext?

Connect Mode Fallback
• How do we want to validate that the existing Connect mode logic continues to work correctly after these changes?

appName Configuration
• Should we explicitly verify that appName() and other common builder configurations work correctly with create() in addition to master()?

Catalog/Database Operations
• What expectations do we have around catalog operations for sessions created via create()? For example:
• Creating databases or tables
• Cross-session visibility of catalogs

Duplicate Config Keys
• What should the expected behavior be if the same config key is set multiple times in the builder before calling create()?

Remote Mode Error Paths
• Are there specific error scenarios in the Connect/remote path (e.g., invalid remote URLs) that we should explicitly test?

Author


Thanks for the detailed review! I've added some tests to improve coverage, but I also wanted to address a few of your high-level concerns around Connect mode compatibility, error handling/failure modes, and behavior with multiple sessions. Importantly, this change introduces parity with Connect (create already exists there, and we lean on existing Connect test coverage to ensure we have not introduced new regressions), multiple sessions attached to the same context are already possible via the newSession API, and we reuse the existing primitives used by getOrCreate and newSession (e.g. session construction helpers and locking). Here are my thoughts on some of the more specific comments:

Connect mode fallback & invalid remote URLs

This PR doesn't touch connect mode at all, and we rely on existing connect test coverage

Are there race conditions between create() and getOrCreate()?

Both methods use the same self._lock (line 606) and rely on SparkContext.getOrCreate() which has its own locking via SparkContext._lock. No new race conditions introduced here - just reusing the existing thread-safe mechanisms.

Duplicate config keys

Standard SparkConf behavior - last write wins. Same as getOrCreate(), nothing specific to create().
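
For example, plain SparkConf shows the same last-write-wins rule that repeated .config() calls on the builder follow (key and values here are arbitrary):

from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.executor.memory", "1g")
conf.set("spark.executor.memory", "2g")
assert conf.get("spark.executor.memory") == "2g"  # last write wins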

Session stop/cleanup

Stopping a session stops the parent SparkContext. This stops all sessions attached to the SparkContext, but this is not new behavior in Classic.
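
A short sketch of that existing behavior (assuming a local session; newSession() is the pre-existing way to attach a second session to one context):

from pyspark.sql import SparkSession

a = SparkSession.builder.master("local[1]").getOrCreate()
b = a.newSession()  # separate session state, same SparkContext
assert a.sparkContext is b.sparkContext

a.stop()  # stops the shared SparkContext, so b cannot run jobs either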

Conflicting SparkContext configs

SparkContext.getOrCreate() handles this at line 615. When a SparkContext already exists, new configs apply to the session only, not the shared context. Same behavior as getOrCreate() - there's even a comment about this at line 558: "Do not update SparkConf for existing SparkContext, as it's shared by all sessions."
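
Illustrated with hypothetical values: once a SparkContext exists, later builder settings land on the new session rather than the shared context:

from pyspark.sql import SparkSession

first = SparkSession.builder.master("local[1]").appName("owner").getOrCreate()
second = SparkSession.builder.appName("too-late").create()

assert first.sparkContext is second.sparkContext  # one shared context
assert second.sparkContext.appName == "owner"     # context conf is not rewritten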

Let me know if you would like specific tests for any of these behaviors, though; happy to add them.

Author


Added some additional tests for error handling and the common confs.

@HyukjinKwon HyukjinKwon changed the title [SPARK-55055] Support SparkSession.Builder.create for PySpark Classic [SPARK-55055][PYTHON] Support SparkSession.Builder.create for PySpark Classic Jan 19, 2026

# Spark Connect-specific API
@remote_only
def create(self) -> "SparkSession":
Contributor


On the Scala side, def create and def getOrCreate share code via a forceCreate flag; shall we do the same on the Python side?
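
For reference, a rough Python rendering of that Scala pattern (purely illustrative; _build, _apply_options, and _new_session are hypothetical names, not the PR's actual helpers):

from pyspark.sql import SparkSession

class Builder:
    def _build(self, force_create: bool) -> "SparkSession":
        with self._lock:  # the same lock both entry points already share
            if not force_create:
                session = SparkSession.getActiveSession()
                if session is not None:
                    self._apply_options(session)  # reconfigure and reuse
                    return session
            return self._new_session()  # always construct a fresh session

    def getOrCreate(self) -> "SparkSession":
        return self._build(force_create=False)

    def create(self) -> "SparkSession":
        return self._build(force_create=True)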

Author


I think the build(forceCreate: Boolean) pattern works really well on the Scala side within Classic mode, but the Python code has to handle both Connect and Classic, and there is a fair amount of complex handling for each mode. IMO it would be even more confusing to put all of this under one shared function. Let me know if you have ideas on how to make this code easier to reason about, though.
