Commit
[SPARK-48336][PS][CONNECT] Implement `ps.sql` in Spark Connect
### What changes were proposed in this pull request?
Implement `ps.sql` in Spark Connect

### Why are the changes needed?
feature parity in Spark Connect

### Does this PR introduce _any_ user-facing change?
yes:

```
In [4]: spark
Out[4]: <pyspark.sql.connect.session.SparkSession at 0x105136390>

In [5]: >>> ps.sql('''
   ...: ...   SELECT m1.a, m2.b
   ...: ...   FROM {table1} m1 INNER JOIN {table2} m2
   ...: ...   ON m1.key = m2.key
   ...: ...   ORDER BY m1.a, m2.b''',
   ...: ...   table1=ps.DataFrame({"a": [1,2], "key": ["a", "b"]}),
   ...: ...   table2=pd.DataFrame({"b": [3,4,5], "key": ["a", "b", "b"]}))
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1018: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1018: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   a  b
0  1  3
1  2  4
2  2  5
```

### How was this patch tested?
1. enabled UTs
2. also manually tested all the examples

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46658 from zhengruifeng/ps_sql.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>