Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Oct 21, 2018

What changes were proposed in this pull request?

The serializers of RowEncoder use few If Catalyst expression which inherits ComplexTypeMergingExpression that will check input data types.

It is possible to generate serializers which fail the check and can't to access the data type of serializers. When producing If expression, we should use the same data type at its input expressions.

How was this patch tested?

Added test.

@SparkQA
Copy link

SparkQA commented Oct 21, 2018

Test build #97674 has finished for PR 22785 at commit d6ed4b7.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 21, 2018

Test build #97676 has finished for PR 22785 at commit 6d41fe0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Oct 21, 2018

cc @cloud-fan

val schema = new StructType().add("pythonUDT", pythonUDT, true)
val encoder = RowEncoder(schema)
// scalastyle:off println
encoder.serializer.foreach(s => println(s.dataType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't print in a test, can we use assert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try it.

if (inputObject.nullable) {
If(IsNull(inputObject),
Literal.create(null, inputType),
Literal.create(null, nonNullOutput.dataType),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, for this case it has no difference, but makes the code clearer

If(
Invoke(inputObject, "isNullAt", BooleanType, Literal(index) :: Nil),
Literal.create(null, field.dataType),
Literal.create(null, fieldValue.dataType),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment to explain it? field.dataType can be different from fieldValue.dataType, because we strip UDT

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.


test("SPARK-25791: Datatype of serializers should be accessible") {
val udtSQLType = new StructType().add("a", IntegerType)
val pythonUDT = new PythonUserDefinedType(udtSQLType, "pyUDT", "serializedPyClass")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reproduce the bug with normal UDT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For normal UDT, we create an Invoke with the original udt type. So it won't cause such problem.

@SparkQA
Copy link

SparkQA commented Oct 23, 2018

Test build #97915 has finished for PR 22785 at commit 61c8b2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 736fc03 Oct 23, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…cessible

## What changes were proposed in this pull request?

The serializers of `RowEncoder` use few `If` Catalyst expression which inherits `ComplexTypeMergingExpression` that will check input data types.

It is possible to generate serializers which fail the check and can't to access the data type of serializers. When producing If expression, we should use the same data type at its input expressions.

## How was this patch tested?

Added test.

Closes apache#22785 from viirya/SPARK-25791.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@viirya viirya deleted the SPARK-25791 branch December 27, 2023 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants