Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update doc #29

Merged
merged 1 commit into from
Feb 25, 2021
Merged

Update doc #29

merged 1 commit into from
Feb 25, 2021

Conversation

dongjoon-hyun
Copy link
Collaborator

No description provided.

@github-actions github-actions bot added the DOCS label Feb 25, 2021
@HyukjinKwon
Copy link
Owner

Thanks!

@HyukjinKwon HyukjinKwon merged commit 923a71d into HyukjinKwon:SPARK-34531 Feb 25, 2021
@dongjoon-hyun dongjoon-hyun deleted the SPARK-PR-31640 branch February 25, 2021 02:14
HyukjinKwon pushed a commit that referenced this pull request May 15, 2023
…ized Python UDF

### What changes were proposed in this pull request?
The PR proposes to provide support for the registration of an Arrow-optimized Python UDF in both vanilla PySpark and Spark Connect.

### Why are the changes needed?
Currently, when users register an Arrow-optimized Python UDF, it will be registered as a pickled Python UDF and thus, executed without Arrow optimization.

We should support Arrow-optimized Python UDFs registration and execute them with Arrow optimization.

### Does this PR introduce _any_ user-facing change?
Yes. No API changes, but result differences are expected in some cases.

Previously, a registered Arrow-optimized Python UDF will be executed without Arrow optimization.
Now, it will be executed with Arrow optimization, as shown below.

```sh
>>> df = spark.range(2)
>>> df.createOrReplaceTempView("df")
>>> from pyspark.sql.functions import udf
>>> udf(useArrow=True)
... def f(x):
...     return str(x)
...

>>> spark.udf.register('str_f', f)
<pyspark.sql.udf.UserDefinedFunction object at 0x7fa1980c16a0>

>>> spark.sql("select str_f(id) from df").explain()  # Executed with Arrow optimization
== Physical Plan ==
*(2) Project [pythonUDF0#32 AS f(id)#30]
+- ArrowEvalPython [f(id#27L)#29], [pythonUDF0#32], 101
   +- *(1) Range (0, 2, step=1, splits=16)
```

Enabling or disabling Arrow optimization can produce result differences in some cases - we are working on minimizing the result differences though.

### How was this patch tested?
Unit test.

Closes apache#41125 from xinrong-meng/registerArrowPythonUDF.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants