
[WIP][SPARK-40309][PYTHON][PS] Introduce sql_conf context manager for pyspark.sql#37777

Closed
xinrong-meng wants to merge 5 commits into apache:master from xinrong-meng:sql_conf

Conversation

@xinrong-meng
Member

@xinrong-meng xinrong-meng commented Sep 2, 2022

What changes were proposed in this pull request?

Introduce sql_conf context manager for pyspark.sql.

Why are the changes needed?

This simplifies control of the Spark SQL configuration for a code block, as shown below,

from

original_value = spark.conf.get("key")
spark.conf.set("key", "value")
...
spark.conf.set("key", original_value)

to

with sql_conf({"key": "value"}):
    ...
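The set-then-restore pattern above can be sketched as a generic context manager. This is a minimal sketch, not the PR's actual implementation: `_FakeConf` is a hypothetical stand-in for `spark.conf` so the example runs without a Spark session, and `sql_conf_sketch` is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional


class _FakeConf:
    """Hypothetical stand-in for spark.conf (get/set/unset on string keys)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._store.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = str(value)

    def unset(self, key: str) -> None:
        self._store.pop(key, None)


@contextmanager
def sql_conf_sketch(pairs: Dict[str, Any], conf: _FakeConf) -> Iterator[None]:
    """Set each key for the duration of the block, then restore the old values."""
    assert isinstance(pairs, dict), "pairs should be a dictionary."
    keys = list(pairs.keys())
    old_values = [conf.get(key) for key in keys]  # snapshot before mutating
    for key, value in pairs.items():
        conf.set(key, value)
    try:
        yield
    finally:
        # Restore previous values; unset keys that were not set before.
        for key, old in zip(keys, old_values):
            if old is None:
                conf.unset(key)
            else:
                conf.set(key, old)


conf = _FakeConf()
conf.set("key", "original")
with sql_conf_sketch({"key": "temporary"}, conf):
    assert conf.get("key") == "temporary"
assert conf.get("key") == "original"  # restored on exit
```

The `try`/`finally` matters: the original value is restored even if the block raises, which the manual set/restore pattern above does not guarantee.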

Such a context manager already exists in the Pandas API on Spark (pyspark.pandas.utils.sql_conf).

We should introduce one in pyspark.sql, and deduplicate code if possible.

Does this PR introduce any user-facing change?

Yes. Users may use the context manager to manage the Spark SQL configuration for a code block.

For example,

>>> from pyspark.sql.utils import sql_conf
>>> with sql_conf({"spark.sql.execution.arrow.pyspark.enabled": True}):
...    pdf = sdf.toPandas()

How was this patch tested?

Unit tests.

"""
from pyspark.sql.session import SparkSession

assert isinstance(pairs, dict), "pairs should be a dictionary."
Member Author


Retain the assertion to stay consistent with pyspark.pandas.utils.sql_conf.

@xinrong-meng xinrong-meng changed the title [SPARK-40309][PYTHON] Introduce sql_conf context manager for pyspark.sql [SPARK-40309][PYTHON][PS] Introduce sql_conf context manager for pyspark.sql Sep 2, 2022
@xinrong-meng xinrong-meng changed the title [SPARK-40309][PYTHON][PS] Introduce sql_conf context manager for pyspark.sql [WIP][SPARK-40309][PYTHON][PS] Introduce sql_conf context manager for pyspark.sql Sep 2, 2022
@AmplabJenkins
Copy link

Can one of the admins verify this patch?



@contextmanager
def sql_conf(pairs: Dict[str, Any], *, spark: Optional["SparkSession"] = None) -> Iterator[None]:
Member


We should probably name it sqlConf, since we follow camelCase in this API. In addition, it should be importable via something like from pyspark.sql import sqlConf. It should also be documented in the API reference at python/docs.
