Describe the bug
For a globally defined function, Cythonized code fails to serialize the function definition if a Python decorator is used to create a UDF, as in the following:
Code to reproduce the behaviour:
import pyspark.sql.functions as F
import pyspark.sql.types as T
@F.udf(returnType=T.StringType())
def func(x):
    return x
df = spark.createDataFrame(["one", "two", "three"], T.StringType())
df = df.withColumn("x", func(F.col("value")))
df.show()
Expected behaviour
The following code, which creates the UDF without using the decorator, does not give a pickle error:
import pyspark.sql.functions as F
import pyspark.sql.types as T
def func(x):
    return x
func_udf = F.udf(func, returnType=T.StringType())
df = spark.createDataFrame(["one", "two", "three"], T.StringType())
df = df.withColumn("x", func_udf(F.col("value")))
df.show()
OS
Linux
Python version
3.9.2
Cython version
3.0.10
Additional context
No response
Both Python functions and Cython functions are pickled by name by default. That will fail (in both cases) because with the decorator func is no longer directly accessible by name: the name is rebound to whatever the decorator returns, which hides the original function.
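To illustrate the by-name lookup described above, here is a minimal sketch using only the standard-library pickle and plain Python functions (no Cython or PySpark involved). `Wrapper`, `as_wrapper`, and `can_pickle` are hypothetical names made up for this example; `Wrapper` only stands in for the kind of object a decorator like `F.udf` returns and is not PySpark's actual implementation.

```python
import pickle


class Wrapper:
    """Hypothetical stand-in for the object a decorator such as F.udf returns."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args):
        return self.fn(*args)


def as_wrapper(fn):
    # Hypothetical decorator: returns a Wrapper instead of the function itself.
    return Wrapper(fn)


@as_wrapper
def decorated(x):
    # The module-level name "decorated" is now bound to the Wrapper,
    # so the underlying function can no longer be looked up by name.
    return x


def plain(x):
    return x


# Wrapping under a different name keeps "plain" reachable by name.
plain_wrapped = as_wrapper(plain)


def can_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    print("decorated:", can_pickle(decorated))
    print("plain_wrapped:", can_pickle(plain_wrapped))
```

When run as a script, the decorated case should fail to pickle while the separately wrapped case should succeed, which mirrors the difference between the two snippets in the report.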
PySpark appears to use cloudpickle, which has a bunch of special-casing for Python functions so that it can pickle them in a different way (by serializing their byte code). That isn't possible for Cython functions.
While I have an open PR to improve pickling of Cython functions, it wouldn't work here because simple functions are still pickled by name (and that fails as described above).