[SPARK-25791][SQL] Datatype of serializers in RowEncoder should be accessible #22785
Test build #97674 has finished for PR 22785 at commit
|
|
Test build #97676 has finished for PR 22785 at commit
|
|
cc @cloud-fan
val schema = new StructType().add("pythonUDT", pythonUDT, true)
val encoder = RowEncoder(schema)
// scalastyle:off println
encoder.serializer.foreach(s => println(s.dataType))
We shouldn't print in a test. Can we use an assert instead?
Let me try it.
if (inputObject.nullable) {
  If(IsNull(inputObject),
    Literal.create(null, inputType),
    Literal.create(null, nonNullOutput.dataType),
What's the difference?
I see. In this case it makes no difference, but it makes the code clearer.
If(
  Invoke(inputObject, "isNullAt", BooleanType, Literal(index) :: Nil),
  Literal.create(null, field.dataType),
  Literal.create(null, fieldValue.dataType),
Can we add a comment to explain it? `field.dataType` can be different from `fieldValue.dataType`, because we strip the UDT.
Added a comment.
|
|
||
test("SPARK-25791: Datatype of serializers should be accessible") {
  val udtSQLType = new StructType().add("a", IntegerType)
  val pythonUDT = new PythonUserDefinedType(udtSQLType, "pyUDT", "serializedPyClass")
Can we reproduce the bug with a normal UDT?
For a normal UDT, we create an `Invoke` with the original UDT type, so it won't cause this problem.
|
Test build #97915 has finished for PR 22785 at commit
|
|
thanks, merging to master!
…cessible

## What changes were proposed in this pull request?

The serializers of `RowEncoder` use a few `If` Catalyst expressions, which inherit `ComplexTypeMergingExpression` and so check their input data types. It is possible to generate serializers that fail this check, which makes the data type of the serializers inaccessible. When producing an `If` expression, we should use the same data type for its input expressions.

## How was this patch tested?

Added test.

Closes apache#22785 from viirya/SPARK-25791.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
The serializers of `RowEncoder` use a few `If` Catalyst expressions, which inherit `ComplexTypeMergingExpression` and so check their input data types. It is possible to generate serializers that fail this check, which makes the data type of the serializers inaccessible. When producing an `If` expression, we should use the same data type for its input expressions.
How was this patch tested?
Added test.
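For readers unfamiliar with the check described above, here is a minimal plain-Scala sketch of the failure mode and the fix. It does not use Spark: every type in it (`DataType`, `UserDefinedType`, `NullLiteral`, `Serialized`, `If`) is a simplified stand-in invented for illustration, the predicate of `If` is omitted, and the `require`-based check only approximates what a type-merging expression does. The one idea taken from the review thread is that the declared field type can still be a UDT while the serialized value carries the stripped SQL type.

```scala
// Simplified stand-ins for Catalyst types; none of these are Spark's real classes.
sealed trait DataType
case object IntegerType extends DataType
final case class StructType(fields: List[(String, DataType)]) extends DataType
// A user-defined type wraps the SQL type it is stored as.
final case class UserDefinedType(name: String, sqlType: DataType) extends DataType

sealed trait Expression { def dataType: DataType }
final case class NullLiteral(dataType: DataType) extends Expression
final case class Serialized(dataType: DataType) extends Expression

// Toy conditional (predicate omitted): both branches must agree on a type
// before the expression can report its own data type.
final case class If(trueValue: Expression, falseValue: Expression) extends Expression {
  def dataType: DataType = {
    require(
      trueValue.dataType == falseValue.dataType,
      s"branch types differ: ${trueValue.dataType} vs ${falseValue.dataType}")
    trueValue.dataType
  }
}

object SerializerTypeDemo {
  val sqlType: DataType = StructType(List("a" -> IntegerType))
  // Declared type of the field: still the UDT.
  val fieldType: DataType = UserDefinedType("pyUDT", sqlType)
  // The serialized value carries the stripped SQL type, not the UDT.
  val fieldValue: Expression = Serialized(sqlType)

  // Before: the null branch is typed with the declared field type.
  val before: Expression = If(NullLiteral(fieldType), fieldValue)
  // After: the null branch is typed with the value expression's own type.
  val after: Expression = If(NullLiteral(fieldValue.dataType), fieldValue)

  def main(args: Array[String]): Unit = {
    val beforeFails = scala.util.Try(before.dataType).isFailure
    assert(beforeFails)
    assert(after.dataType == sqlType)
    println(s"before fails: $beforeFails, after type: ${after.dataType}")
  }
}
```

In this toy model, building the mismatched expression succeeds and the problem only surfaces when `dataType` is read, which is consistent with the PR title's point that the data type should be accessible.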