[SPARK-52687][PYTHON] Fix SPARK_CONNECT_MODE parsing#54800

Open
mukimasta wants to merge 1 commit into apache:master from mukimasta:master

@mukimasta

fix https://issues.apache.org/jira/browse/SPARK-52687

What changes were proposed in this pull request?

Replaced the stdout-based SPARK_CONNECT_MODE detection in bin/pyspark with an exit-code-based approach to avoid invisible characters (e.g. \r, spaces, tabs) causing false "unknown value" errors.

Changes:

Run Python to determine the mode and exit with 0 (classic), 1 (connect), or 2 (error) instead of printing to stdout
Use .strip() in Python to handle whitespace in the env var
Add try/except to handle import failures and invalid values
Update the bash logic to branch on exit code instead of parsing command output
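
The exit-code mapping listed above can be sketched in Python as follows. This is a minimal illustration, not the code from the patch: `mode_to_exit_code` is a hypothetical helper, and it takes the raw string directly instead of calling `pyspark.util.spark_connect_mode()`.

```python
import sys

# Exit codes as described in the PR: 0 = classic, 1 = connect, 2 = error.
EXIT_CLASSIC = 0
EXIT_CONNECT = 1
EXIT_ERROR = 2


def mode_to_exit_code(raw: str) -> int:
    """Map a raw SPARK_CONNECT_MODE value to an exit code (hypothetical helper)."""
    try:
        value = raw.strip()  # tolerate surrounding whitespace such as "\r" or " "
    except AttributeError:
        # Non-string input (for example None) is treated as an error.
        return EXIT_ERROR
    if value == "0":
        return EXIT_CLASSIC
    if value == "1":
        return EXIT_CONNECT
    return EXIT_ERROR


if __name__ == "__main__":
    # A launcher script would read this status (e.g. via $? in bash)
    # instead of parsing anything printed to stdout.
    sys.exit(mode_to_exit_code(sys.argv[1] if len(sys.argv) > 1 else ""))
```

Because the result travels in the process exit status, stray characters on stdout never reach the comparison in the shell script.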

Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-52687

The previous implementation used $(python -c "print(spark_connect_mode())") and compared the output to "0" or "1". This could fail when:

Python output included \r (e.g. Windows CRLF)
Any such character made the comparison fail and triggered the "unknown value" message even when the value was logically correct.
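
The failure mode described above can be reproduced with a small simulation. The sketch below assumes only that shell command substitution strips trailing newlines and nothing else; `command_substitution` is a hypothetical stand-in for `$(...)`, not part of Spark.

```python
def command_substitution(stdout: str) -> str:
    """Mimic shell $(...): only trailing newline characters are removed."""
    return stdout.rstrip("\n")


# Output ending in LF compares equal to "0" as expected.
unix_output = command_substitution("0\n")
print(unix_output == "0")

# Output ending in CRLF keeps the carriage return, so the comparison fails
# even though the value is logically "0".
windows_output = command_substitution("0\r\n")
print(windows_output == "0")
print(repr(windows_output))
```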

Does this PR introduce any user-facing change?

Yes. Behavior is unchanged for normal cases; the fix only affects previously broken scenarios.

Before: Users could see "The environment variable SPARK_CONNECT_MODE has unknown value or pyspark.util package is not available: 0" even when the value was valid.
After: Valid values (including with surrounding whitespace) are accepted; invalid values still produce an error.

How was this patch tested?

Manual tests for exit codes 0, 1, and 2
Manual test with SPARK_CONNECT_MODE=" 0 " (whitespace)

Was this patch authored or co-authored using generative AI tooling?

No

Thanks to @holdenk for guidance during the Spark Community Sprint

@holdenk
Contributor

holdenk commented Mar 18, 2026

Thanks for making this PR @mukimasta :) I'll take a look right now (sorry my week got away from me :))

@HyukjinKwon HyukjinKwon changed the title from "fix SPARK_CONNECT_MODE parsing" to "[SPARK-52687][PYTHON] Fix SPARK_CONNECT_MODE parsing" Mar 18, 2026
Contributor

@holdenk holdenk left a comment

This looks good to me :D Thank you so much for fixing this! Let's leave it open for a day and see if any other comments land :)

@HyukjinKwon
Member

> Manual test with SPARK_CONNECT_MODE=" 0 " (whitespace)

I would actually avoid such values to be allowed. Can we simply fix the error message better?

@holdenk
Contributor

holdenk commented Mar 18, 2026

> > Manual test with SPARK_CONNECT_MODE=" 0 " (whitespace)
>
> I would actually avoid such values to be allowed. Can we simply fix the error message better?

Oh yeah so this value is not allowed in the current version. The change here gets the return value from $? so we don't have to worry about whitespace being output from python/shell.
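
As a rough illustration of reading the result from the exit status instead of from printed output, the sketch below runs a child Python process and branches on its return code. The child snippets and the `detect_mode` helper are hypothetical; they stand in for the detection snippet in bin/pyspark and are not taken from the patch.

```python
import subprocess
import sys


def detect_mode(child_code: str) -> str:
    """Run child_code in a new interpreter and classify it by exit status only."""
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,  # stdout is captured but deliberately ignored
        text=True,
    )
    if result.returncode == 0:
        return "classic"
    if result.returncode == 1:
        return "connect"
    return "error"


# The child prints stray output containing a carriage return, yet the
# classification depends only on the status passed to sys.exit().
noisy_connect = 'import sys; print("noise\\r"); sys.exit(1)'
print(detect_mode(noisy_connect))
```

This is the Python analogue of checking `$?` in the shell: whatever whitespace the child writes to stdout cannot affect the branch taken.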

import sys
try:
    from pyspark.util import spark_connect_mode
    m = spark_connect_mode().strip()
Member

Hm .. actually wouldn't this make " 0 " work?

def spark_connect_mode() -> str:
    """
    Return the env var SPARK_CONNECT_MODE; otherwise "1" if `pyspark_connect` is available.
    """
    connect_by_default = os.environ.get("SPARK_CONNECT_MODE")
    if connect_by_default is not None:
        return connect_by_default
    try:

Member

I think we should probably raise a better error message here as well.
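
One possible shape for the stricter behavior suggested in this thread is sketched below: reject anything other than exactly "0" or "1" and report the offending value with `repr` so invisible characters show up in the message. This is an assumption about what a "better error message" could look like, not code from the PR or from `pyspark.util`; `parse_connect_mode` is a hypothetical name.

```python
def parse_connect_mode(raw: str) -> bool:
    """Return True for connect ("1") and False for classic ("0").

    Hypothetical strict parser: no stripping, so " 0 " is rejected.
    """
    if raw == "1":
        return True
    if raw == "0":
        return False
    # repr() makes whitespace and control characters such as "\r" visible.
    raise ValueError(
        f"SPARK_CONNECT_MODE must be exactly '0' or '1', got {raw!r}"
    )


try:
    parse_connect_mode("0\r")
except ValueError as exc:
    print(exc)
```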
