
[SPARK-45026][CONNECT] spark.sql should support datatypes not compatible with arrow#42754

Closed
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:connect_sql_types

@zhengruifeng
Contributor

What changes were proposed in this pull request?

Move the Arrow batch creation into the isCommand branch of handleSqlCommand, so that an Arrow batch is only built when the SQL statement is a command.

Why are the changes needed?

#42736 and #42743 introduced CalendarIntervalType in the Spark Connect Python client; however, the following query fails:

spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")

...

pyspark.errors.exceptions.connect.UnsupportedOperationException: [UNSUPPORTED_DATATYPE] Unsupported data type "INTERVAL".

The root cause is that handleSqlCommand always creates an Arrow batch, while ArrowUtils does not currently accept CalendarIntervalType.

This PR mainly focuses on supporting schemas that contain data types not compatible with Arrow.
In the future, we should make ArrowUtils accept CalendarIntervalType so that collect/toPandas work as well.
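To make the control-flow change concrete, here is a toy Python model of it. Everything in it is hypothetical: the real handler is handleSqlCommand in the Spark Connect server (written in Scala), and to_arrow_schema, handle_sql_before, handle_sql_after, UnsupportedDataType and the string type names are illustrative stand-ins, not Spark APIs. It only shows why building the Arrow batch unconditionally fails for an interval column, while building it only in the command branch does not.

```python
# Hypothetical, simplified model of the fix; not Spark code.
# Types are plain strings; "interval" plays the role of CalendarIntervalType.

ARROW_SUPPORTED = {"int", "string", "double", "boolean"}  # toy subset


class UnsupportedDataType(Exception):
    """Stand-in for the UNSUPPORTED_DATATYPE error raised by the Arrow conversion."""


def to_arrow_schema(schema):
    """Toy Arrow conversion that rejects types it does not know about."""
    for _name, dtype in schema:
        if dtype not in ARROW_SUPPORTED:
            raise UnsupportedDataType(f'Unsupported data type "{dtype.upper()}".')
    return list(schema)


def handle_sql_before(is_command, schema):
    # Before: the Arrow batch is built for every statement, even plain queries.
    batch = to_arrow_schema(schema)
    if is_command:
        return {"arrow_batch": batch}
    return {"relation_schema": schema}


def handle_sql_after(is_command, schema):
    # After: only the command branch builds an Arrow batch.
    if is_command:
        return {"arrow_batch": to_arrow_schema(schema)}
    # Queries are returned as a relation without any Arrow conversion.
    return {"relation_schema": schema}
```

In this model, handle_sql_after(False, [("i", "interval")]) returns the schema without touching the Arrow conversion. The conversion itself still rejects interval columns wherever it runs, which is why collect/toPandas stay unsupported until ArrowUtils accepts CalendarIntervalType.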

Does this PR introduce any user-facing change?

Yes.

After this PR:

In [1]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
Out[1]: DataFrame[make_interval(100, 11, 1, 1, 12, 30, 1.001001): interval]

In [2]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)").schema
Out[2]: StructType([StructField('make_interval(100, 11, 1, 1, 12, 30, 1.001001)', CalendarIntervalType(), True)])

How was this patch tested?

Enabled unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@zhengruifeng
Contributor Author

cc @HyukjinKwon @grundprinzip

init
@zhengruifeng
Contributor Author

also cc @hvanhovell

@hvanhovell (Contributor) left a comment


LGTM

@zhengruifeng
Contributor Author

thanks, merged to master

zhengruifeng deleted the connect_sql_types branch on September 1, 2023 03:13
@HyukjinKwon (Member) left a comment


LGTM2


3 participants