[SPARK-41264][CONNECT][PYTHON] Make Literal support more datatypes #38800
Conversation
Thanks for doing the tedious work!
```proto
message CalendarInterval {
  int32 months = 1;
  int32 days = 2;
  int64 microseconds = 3;
}

message IntervalYearToMonth {
```
If you're removing the reference above, please remove it here as well.
will update
```proto
// Date in units of days since the UNIX epoch.
int32 date = 16;
// Time in units of microseconds past midnight.
int64 time = 17;
```
Is this the `datetime.time`?
Yes. I didn't find a corresponding SQL type for `datetime.time`, so I removed it.
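Not part of the PR, just an illustration: the `int32 date` field above counts days since the UNIX epoch, which a Python `datetime.date` converts to directly (the helper name here is hypothetical):

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def date_to_days(d: date) -> int:
    """Days since the UNIX epoch, as the proto's `int32 date` field counts them."""
    return (d - EPOCH).days

print(date_to_days(date(1970, 1, 2)))  # 1 -- the day after the epoch is day 1
```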
```proto
DataType null = 29; // a typed null literal
List list = 30;
DataType.Array empty_array = 31;
DataType.Map empty_map = 32;
```
Please clean up the elements below as well, thank you!
got it!
Force-pushed from e3e5ccb to e271a41.
Also cc @HyukjinKwon @amaliujia @cloud-fan
Force-pushed from e271a41 to d1fc8e8.
```proto
@@ -40,37 +40,34 @@ message Expression {

message Literal {
  oneof literal_type {
```
In Catalyst, a `Literal` consists of a value and a type. The benefit is that there aren't many distinct physical representations (timestamp and timestamp_ntz are both int64), so when we add a new data type that reuses an existing physical representation (e.g. char and varchar), we don't need to update `Literal`.
This is not a design decision made by this PR, so I won't block this PR over it.
That only works in Scala because of its inheritance-based type hierarchy, the availability of an `Any` type, and deferring to the necessary `apply`/`unapply` methods.
Since proto does not have this concept of inheritance, we need to embed the physical type with the logical one.
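A minimal Python sketch (class and field names hypothetical, not from the PR) of the Catalyst-style modeling described above, where one physical representation backs several logical types:

```python
from dataclasses import dataclass

@dataclass
class CatalystStyleLiteral:
    # One physical value can back several logical types: TIMESTAMP and
    # TIMESTAMP_NTZ are both stored as int64 microseconds.
    value: object
    data_type: str  # the logical type tag

ts = CatalystStyleLiteral(value=1_669_766_400_000_000, data_type="timestamp")
ts_ntz = CatalystStyleLiteral(value=1_669_766_400_000_000, data_type="timestamp_ntz")

# Same physical value, distinguished only by the logical type tag.
assert ts.value == ts_ntz.value and ts.data_type != ts_ntz.data_type
```

A proto `oneof`, by contrast, has no shared type tag, so each logical type needs its own variant.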
```python
@@ -134,6 +138,37 @@ def test_list_to_literal(self):
        lit_list_plan = fun.lit([fun.lit(10), fun.lit("str")]).to_plan(None)
        self.assertIsNotNone(lit_list_plan)

    def test_tuple_to_literal(self):
```
Quick question: is this Python-specific?
Yes, the Scala `lit` does not support tuples or maps.
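Purely illustrative (not the PR's actual dispatch code, and the helper name is made up): the Python client can distinguish these container types at `lit` time, for which Scala's `lit` has no direct analogue:

```python
def classify_literal(value):
    # Hypothetical helper mirroring the Python-only distinctions discussed
    # above: tuples become struct literals, lists arrays, dicts maps.
    if isinstance(value, tuple):
        return "struct"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "map"
    return "scalar"

print(classify_literal((10, "str")))  # struct
```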
LGTM. A big piece of remaining work is test coverage over the literals. That is not caused by the current PR, so it is not a blocker.
```proto
// (ignoring precision) Always 16 bytes in length
bytes value = 1;
// the string representation.
string value = 1;
```
Why not `bytes`?
Refer to the constructors of `Decimal`: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L541-L564

I think we can use:

```scala
def apply(unscaled: Long, precision: Int, scale: Int): Decimal =
  new Decimal().set(unscaled, precision, scale)

def apply(value: String): Decimal = new Decimal().set(BigDecimal(value))
```

I chose `string` for two reasons:
1. it is convenient;
2. I'm not sure how to get the `unscaled` and `scale` values from a Python `decimal`.
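For what it's worth, Python's `decimal.Decimal.as_tuple()` does expose the pieces; a sketch (helper name is made up, and it ignores special values like NaN and positive exponents such as `Decimal("1E+2")`):

```python
from decimal import Decimal

def unscaled_and_scale(d: Decimal):
    # as_tuple() returns (sign, digits, exponent); the unscaled value is the
    # digit tuple read as an integer, and the scale is the negated exponent.
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(map(str, digits)))
    return (-unscaled if sign else unscaled), -exponent

print(unscaled_and_scale(Decimal("12.345")))  # (12345, 3)
```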
Force-pushed from d1fc8e8 to 85b6c9e.
@amaliujia I added a simple test for NaN and Inf. Will add more tests as follow-ups.
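As a reminder of why NaN and Inf deserve dedicated literal tests: both are representable as ordinary Python floats but behave unusually under comparison.

```python
import math

nan = float("nan")
inf = float("inf")

assert math.isnan(nan) and math.isinf(inf)
assert nan != nan   # NaN compares unequal even to itself
assert inf > 1e308  # Inf exceeds every finite float
```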
Since I have other PRs depending on this work, let me merge it into master now. Thanks for the reviews!
LGTM thanks!
### What changes were proposed in this pull request?

1. On the server side, try to match https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L63-L101, and use `CreateArray`, `CreateStruct`, `CreateMap` for complex inputs;
2. On the client side, try to match https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1335-L1349, but do not support `datetime.time`, since there is no corresponding SQL type for it.

### Why are the changes needed?

To support all datatypes in literals.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated tests.

Closes apache#38800 from zhengruifeng/connect_update_literal.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>