[SPARK-33268][SQL][PYTHON][3.0] Fix bugs for casting data from/to PythonUserDefinedType #30191
Thanks, @maropu!
NOTE: I'll open a PR soon to backport it into branch-2.4, too.
Kubernetes integration test starting
Kubernetes integration test status failure
+1, LGTM. Thank you, @maropu and @HyukjinKwon.
Merged to branch-3.0.
…honUserDefinedType

### What changes were proposed in this pull request?

This PR intends to fix bugs for casting data from/to PythonUserDefinedType. A sequence of queries to reproduce this issue is as follows:

```
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import col
>>> from pyspark.sql.types import *
>>> from pyspark.testing.sqlutils import *
>>>
>>> row = Row(point=ExamplePoint(1.0, 2.0))
>>> df = spark.createDataFrame([row])
>>> df.select(col("point").cast(PythonOnlyUDT()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
: java.lang.NullPointerException
	at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
	at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
	at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)
```

A root cause of this issue is that, since `PythonUserDefinedType#userClass` is always null, `isAssignableFrom` in `UserDefinedType#acceptsType` throws a null exception. To fix it, this PR defines `acceptsType` in `PythonUserDefinedType` and filters out the null case in `UserDefinedType#acceptsType`.

This backport comes from #30169.

### Why are the changes needed?

Bug fixes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #30191 from maropu/SPARK-33268-BRANCH3.0.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Thanks, all ~
Test build #130420 has finished for PR 30191 at commit
What changes were proposed in this pull request?
This PR intends to fix bugs for casting data from/to PythonUserDefinedType. The sequence of queries that reproduces the issue is shown in the commit message above.
A root cause of this issue is that, since `PythonUserDefinedType#userClass` is always null, `isAssignableFrom` in `UserDefinedType#acceptsType` throws a null exception. To fix it, this PR defines `acceptsType` in `PythonUserDefinedType` and filters out the null case in `UserDefinedType#acceptsType`.

This backport comes from #30169.
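For readers who want to see the shape of the fix without reading the Scala, here is a toy Python model of the null guard. It is only a sketch under assumptions: the class and attribute names (`user_class`, `py_udt`, `unguarded_accepts`) are hypothetical, `None` stands in for a null `userClass`, `issubclass` stands in for `isAssignableFrom`, and the rule that two Python UDTs match when their identifiers are equal is an assumption, not something taken from the patch.

```python
# Toy model of the null-guard idea. All names here are hypothetical and
# this is not Spark's Scala source.


class UserDefinedType:
    """Stand-in for a UDT backed by a user class (None models a null userClass)."""

    def __init__(self, user_class):
        self.user_class = user_class

    def accepts_type(self, other):
        if not isinstance(other, UserDefinedType):
            return False
        # Guard: without this check, issubclass() below raises TypeError
        # when a user class is None, much like the NullPointerException.
        if self.user_class is None or other.user_class is None:
            return False
        return type(self) is type(other) or issubclass(
            other.user_class, self.user_class
        )


class PythonUserDefinedType(UserDefinedType):
    """Stand-in for a Python-only UDT, which never has a user class."""

    def __init__(self, py_udt):
        super().__init__(user_class=None)
        self.py_udt = py_udt  # assumed identifier for the Python-side UDT

    def accepts_type(self, other):
        # Assumption: two Python UDTs match when their identifiers are equal.
        return (
            isinstance(other, PythonUserDefinedType)
            and self.py_udt == other.py_udt
        )


def unguarded_accepts(this, other):
    """Shape of the check with no None handling."""
    return issubclass(other.user_class, this.user_class)


class Point:
    """Placeholder user class for the JVM-style UDT."""


jvm_udt = UserDefinedType(Point)
py_udt = PythonUserDefinedType("PythonOnlyUDT")

try:
    unguarded_accepts(jvm_udt, py_udt)
    unguarded_failed = False
except TypeError:
    unguarded_failed = True

guarded_result = jvm_udt.accepts_type(py_udt)
same_python_udt = py_udt.accepts_type(PythonUserDefinedType("PythonOnlyUDT"))
print(unguarded_failed, guarded_result, same_python_udt)
```

In this model the unguarded check fails outright, while the guarded base-class check simply reports that the cast is not accepted, and the Python-only subclass answers the question without touching `user_class` at all.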
Why are the changes needed?
Bug fixes.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added tests.