[SPARK-24148][SQL] Overloading array function to support typed empty arrays #21215
Conversation
ok to test
Test build #90073 has finished for PR 21215 at commit
Test build #90079 has finished for PR 21215 at commit
retest this please
Do you wanna do this?
Test build #90086 has finished for PR 21215 at commit
Hey @maropu, so we've encountered a number of issues with casting:
This code produces
How about this?
@maropu That would work if you had Scala case classes for all the types. In our case, we're working on a generic framework where we only have Spark schemas (and I'd rather not generate case classes at runtime). Can you suggest an existing way to do this using Spark's `DataType`, please?
Like this?
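The snippet referenced here is not preserved above. One way I know of to build a typed empty-array column directly from a runtime `DataType`, with no case classes, is a sketch like the following (assuming Spark 2.x; note that `Literal.create` lives in the internal Catalyst package, so this is a workaround, not a stable public API, and `emptyIntArray` is just an illustrative name):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.{ArrayType, IntegerType}

// Literal.create(value, dataType) takes an explicit DataType, so the
// element type does not have to be inferred from a Scala type parameter.
val emptyIntArray: Column =
  new Column(Literal.create(Seq.empty[Int], ArrayType(IntegerType)))

// df.withColumn("xs", emptyIntArray) should then yield array<int> rather
// than the array<string> produced by a bare functions.array().
```

Since the `DataType` is a plain value here, the same pattern works for any element type a generic framework carries in its schemas.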
@maropu Thanks! I didn't know about creating a literal this way. Don't you feel that the suggested change is way more elegant?
@maropu Really nice idea to create typed empty arrays via a literal. I've tailored the solution according to your suggestion, but I still think that a dedicated function should be introduced. What do you think?
Test build #90137 has finished for PR 21215 at commit
Test build #104578 has finished for PR 21215 at commit
What about Map types too -- same issue? I just wonder if it's worth a whole new API method, given there's a way to express it if really needed, or possibly a way to simply rewrite the code to avoid it?
@srowen Yep, Map types suffer from the same problem. Users could eventually create a column of empty typed maps using this function. This feature is definitely just nice to have, since users can directly use the literal-based workaround instead.
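For completeness, the same literal trick discussed above appears to extend to maps: an empty map column with an explicit `MapType`, instead of the `map<string,string>` that a bare `map()` would produce. As before, this is a sketch relying on the internal Catalyst `Literal` API (assumed Spark 2.x; `emptyIntMap` is an illustrative name):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.{IntegerType, MapType, StringType}

// An empty map whose key/value types come from an explicit MapType,
// so no Scala case classes or type parameters are needed.
val emptyIntMap: Column =
  new Column(Literal.create(Map.empty[String, Int], MapType(StringType, IntegerType)))
```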
Can one of the admins verify this patch?
I think we should rather have |
What changes were proposed in this pull request?
The PR proposes to overload the `array` function and allow users to specify the element type for empty arrays. Currently, empty arrays produced by the `array` function are of `StringType`, and there is no way to cast them to a different type. A perfect example of the use case is `when(cond, trueExp).otherwise(falseExp)`, which expects `trueExp` and `falseExp` to be of the same type. In a scenario where we want to produce an empty array in one of these cases, there is no other way than creating a UDF.
How was this patch tested?
Added test cases to `DataFrameComplexTypeSuite`.
Note
Eventually, I will add a wrapper for PySpark, but would like to discuss the idea first.
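The `when(...).otherwise(...)` scenario from the description can be sketched as follows. The cast-based line is the workaround discussed in the thread; it compiles and appears to work in simple cases, though the conversation above reports that casting breaks down in others (`flag` and `intArrayCol` are hypothetical column names):

```scala
import org.apache.spark.sql.functions.{array, col, when}

// array() with no arguments is typed array<string> by default.
val emptyStrings = array()

// This mixes array<string> with an array<int> column, so it fails
// analysis once evaluated against a DataFrame:
// when(col("flag"), emptyStrings).otherwise(col("intArrayCol"))

// Workaround expressible today: cast the empty array so both branches
// agree on the element type (string -> int is a valid element cast).
val fixed =
  when(col("flag"), array().cast("array<int>")).otherwise(col("intArrayCol"))
```

The proposed overload would let users state the element type directly instead of relying on a cast.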