[SPARK-46287][PYTHON][CONNECT] `DataFrame.isEmpty` should work with all datatypes #44209
Conversation
cc @HyukjinKwon

The Scala client doesn't have this issue: spark/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala, lines 371 to 373 in 8a4890d
+1, LGTM.
Merged to master. Thank you all!

Thank you @dongjoon-hyun @HyukjinKwon for the reviews.
### What changes were proposed in this pull request?

`DataFrame.isEmpty` should work with all datatypes. The schema may not be compatible with Arrow, so `collect`/`take` should not be used to check `isEmpty`.

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

Before:

```
In [1]: spark.sql("SELECT INTERVAL '10-8' YEAR TO MONTH AS interval").isEmpty()
23/12/06 20:39:58 WARN CheckAllocator: More than one DefaultAllocationManager on classpath. Choosing first found
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[1], line 1
----> 1 spark.sql("SELECT INTERVAL '10-8' YEAR TO MONTH AS interval").isEmpty()

File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:181, in DataFrame.isEmpty(self)
    180 def isEmpty(self) -> bool:
--> 181     return len(self.take(1)) == 0
...
File ~/.dev/miniconda3/envs/spark_dev_311/lib/python3.11/site-packages/pyarrow/public-api.pxi:208, in pyarrow.lib.pyarrow_wrap_array()

File ~/.dev/miniconda3/envs/spark_dev_311/lib/python3.11/site-packages/pyarrow/array.pxi:3659, in pyarrow.lib.get_array_class_from_type()

KeyError: 21
```

After:

```
In [1]: spark.sql("SELECT INTERVAL '10-8' YEAR TO MONTH AS interval").isEmpty()
23/12/06 20:40:26 WARN CheckAllocator: More than one DefaultAllocationManager on classpath. Choosing first found
Out[1]: False
```

### How was this patch tested?

Added UT.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44209 from zhengruifeng/py_connect_df_isempty.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
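The failure mode above can be illustrated without Spark or Arrow: when converting certain column values raises, an emptiness check that fetches a full row inherits the failure, while projecting all columns away before fetching avoids converting any value. The following is a minimal pure-Python sketch of that pattern; the `TinyFrame` class and its methods are hypothetical stand-ins, not the actual PySpark implementation.

```python
# Hypothetical mini "DataFrame" showing why an emptiness check should
# not convert column data: here the "interval" dtype cannot be
# converted, mimicking an Arrow-incompatible Spark type.

class TinyFrame:
    def __init__(self, columns, rows):
        self.columns = columns  # list of (name, dtype) pairs
        self.rows = rows        # list of tuples, aligned with columns

    def select(self, *names):
        """Project onto the given columns; no arguments drops all columns."""
        idx = [i for i, (n, _) in enumerate(self.columns) if n in names]
        cols = [self.columns[i] for i in idx]
        rows = [tuple(r[i] for i in idx) for r in self.rows]
        return TinyFrame(cols, rows)

    def take(self, n):
        """Fetch up to n rows, converting values; raises on 'interval'."""
        for _, dtype in self.columns:
            if dtype == "interval":
                raise KeyError(dtype)  # stand-in for pyarrow's KeyError: 21
        return self.rows[:n]

    def isEmpty(self):
        # A naive check, len(self.take(1)) == 0, would convert the row's
        # values and raise. Projecting away every column first means only
        # row *count* information flows through, so no value is converted.
        return len(self.select().take(1)) == 0

df = TinyFrame([("interval", "interval")], [("10-8",)])
print(df.isEmpty())  # False, even though df.take(1) itself would raise
```

The key design point is that emptiness only depends on whether any row exists, never on the row's contents, so a zero-column projection is always safe regardless of the schema.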