[SPARK-22500][SQL] Fix 64KB JVM bytecode limit problem with cast #19730
@@ -1015,7 +1015,9 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
val rowClass = classOf[GenericInternalRow].getName
val result = ctx.freshName("result")
ctx.addMutableState(s"$rowClass", result, "")
I think that since we are not assigning it, we could keep it as a local variable and pass it to the generated methods. That way we can avoid introducing new global variables. Have you tried that?
val tmpRow = ctx.freshName("tmpRow")
ctx.addMutableState("InternalRow", tmpRow, "")
ditto
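The suggestion in the two comments above can be illustrated with a small sketch. It is not Spark's code: `ParamPassingSketch`, `method`, and `call` are hypothetical names, and the generated Java is only rendered as text. The idea is that a helper method receives `result` and `tmpRow` as parameters instead of reading them from class-level fields.

```scala
object ParamPassingSketch {
  /** A (Java type, variable name) pair, the same shape as an argument list entry. */
  type Arg = (String, String)

  /** Renders a helper method whose state arrives through parameters, not class fields. */
  def method(name: String, args: Seq[Arg], body: String): String = {
    val params = args.map { case (tpe, v) => tpe + " " + v }.mkString(", ")
    "private void " + name + "(" + params + ") {\n  " + body + "\n}"
  }

  /** Renders the matching call site, forwarding the caller's local variables. */
  def call(name: String, args: Seq[Arg]): String = {
    val names = args.map(_._2).mkString(", ")
    name + "(" + names + ");"
  }
}
```

This only works because the helper mutates the row object it is handed and never reassigns the variable: Java passes object references by value, so a reassignment inside the helper would not be visible to the caller. That matches the reviewer's condition "since we are not assigning it".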
Test build #83750 has finished for PR 19730 at commit
@@ -1039,13 +1039,19 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
}
}
"""
}.mkString("\n")
}
val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
Shouldn't it be ctx.currentVars != null?
If ctx.currentVars != null, we need to use mkString("\n").
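A minimal sketch of the condition being discussed, under stated assumptions: `Ctx` is a hypothetical stand-in for the codegen context that models the nullable `INPUT_ROW` and `currentVars` as `Option`s, and `split` stands for whatever splitting routine is used. As I read the thread, when `currentVars` is set the input values live in local variables of the enclosing method, so the fragments are simply concatenated inline.

```scala
object SplitDecisionSketch {
  /** Hypothetical stand-in for the codegen context: only the two fields the condition reads. */
  final case class Ctx(inputRow: Option[String], currentVars: Option[Seq[String]])

  /** Splits only when there is an input row and no current variables; otherwise inlines. */
  def combine(ctx: Ctx, fieldCode: Seq[String])(split: Seq[String] => String): String =
    if (ctx.inputRow.isDefined && ctx.currentVars.isEmpty) split(fieldCode)
    else fieldCode.mkString("\n")
}
```

With `Ctx(Some("i"), None)` the `split` function is applied; with `currentVars` defined, or with no input row, the result is the fragments joined by newlines.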
Test build #83790 has finished for PR 19730 at commit
Test build #83792 has finished for PR 19730 at commit
}
val fieldsEvalCodes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
  ctx.splitExpressions(fieldsEvalCode, "castStruct",
    ("InternalRow", ctx.INPUT_ROW) :: (rowClass, result) :: ("InternalRow", tmpRow) :: Nil)
How about inner structs? We also need to pass in ctx.INPUT_ROW?
The inner struct case in the following code and the existing cases in CastSuite work well. Tomorrow, I will add a similar test case with many fields.
val struct = Literal.create(
  Row(
    UTF8String.fromString("123.4"),
    Seq("456", "true", "78.9"),
    Row(7)),
  StructType(Seq(
    StructField("i", StringType, nullable = true),
    StructField("a",
      ArrayType(StringType, containsNull = false), nullable = true),
    StructField("s",
      StructType(Seq(
        StructField("i", IntegerType, nullable = true)))))))
val ret = cast(struct, StructType(Seq(
  StructField("d", DoubleType, nullable = true),
  StructField("a",
    ArrayType(IntegerType, containsNull = true), nullable = true),
  StructField("s",
    StructType(Seq(
      StructField("l", LongType, nullable = true)))))))
assert(ret.resolved === true)
checkEvaluation(ret, Row(123.4, Seq(456, null, 78), Row(7L)))
I mean, we don't need to pass in ctx.INPUT_ROW to the split functions.
checkEvaluation(cast(Literal.create(input1, from1), to1), output1)

val from2 = new StructType(
  (1 to N).map(i => StructField(s"a$i", ArrayType(StringType, containsNull = false))).toArray)
I'd expect something like:
val from2 = new StructType(
  (1 to N).map(i => StructField(s"s$i", from1)).toArray)
or just test this case.
Test build #83970 has finished for PR 19730 at commit
ping @kiszk
Working on this. I ran into another problem: since this code creates a lot of global variables, I hit the limit of 64K constant pool entries.
Yeah. We can fix them in this PR. BTW, could you check all the other calls of
I think this is a different issue and should be fixed in another PR. @kiszk how about we change the test to cast int to long to avoid this issue?
@cloud-fan I see, I will create another PR to fix this global variable issue.
  (1 to M).map(i => StructField(s"s$i", toInner)).toArray)
val inputOuter = Row.fromSeq((1 to M).map(_ => inputInner))
val outputOuter = Row.fromSeq((1 to M).map(_ => outputInner))
checkEvaluation(cast(Literal.create(inputOuter, fromOuter), toOuter), outputOuter)
I think this case is good enough to cover all the above cases?
Test build #84064 has finished for PR 19730 at commit
Test build #84079 has finished for PR 19730 at commit
thanks, merging to master/2.2!
This PR changes `cast` code generation to place generated code for expression for fields of a structure into separated methods if these size could be large.
Added new test cases into `CastSuite`
Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Closes #19730 from kiszk/SPARK-22500.
(cherry picked from commit ac10171)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

… whole stage codegen
## What changes were proposed in this pull request?
A followup of #19730, we can split the code for casting struct even with whole stage codegen. This PR also has some renaming to make the code easier to read.
## How was this patch tested?
existing test
Author: Wenchen Fan <wenchen@databricks.com>
Closes #19891 from cloud-fan/cast.
What changes were proposed in this pull request?
This PR changes `cast` code generation to place the generated code for the expressions of a struct's fields into separate methods when that code could be large.
How was this patch tested?
Added new test cases to `CastSuite`.
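
To make the description concrete, here is a rough sketch of the general technique: group per-field code fragments into helper methods so that no single generated method grows past the 64KB limit named in the PR title. This is not the implementation in Spark. `SplitSketch`, `chunk`, `splitIntoMethods`, and the character-count threshold are all assumptions made for illustration, and the helpers take no parameters here, whereas the real change passes the row variables as arguments.

```scala
object SplitSketch {
  // Hypothetical threshold, measured in source characters as a crude
  // stand-in for an estimate of compiled bytecode size.
  val MaxChunkChars: Int = 1024

  /** Greedily groups consecutive fragments so each group stays within maxChars
    * (a single oversized fragment still gets a group of its own). */
  def chunk(fragments: Seq[String], maxChars: Int = MaxChunkChars): Seq[Seq[String]] = {
    val groups = scala.collection.mutable.ListBuffer.empty[Seq[String]]
    var current = Vector.empty[String]
    var size = 0
    for (f <- fragments) {
      if (current.nonEmpty && size + f.length > maxChars) {
        groups += current
        current = Vector.empty[String]
        size = 0
      }
      current = current :+ f
      size += f.length
    }
    if (current.nonEmpty) groups += current
    groups.toList
  }

  /** Returns (method definitions, call sequence) as Java source text. */
  def splitIntoMethods(
      fragments: Seq[String],
      baseName: String,
      maxChars: Int = MaxChunkChars): (String, String) = {
    val named = chunk(fragments, maxChars).zipWithIndex.map {
      case (body, i) => (baseName + "_" + i, body.mkString("\n"))
    }
    val methods = named.map { case (name, body) =>
      "private void " + name + "() {\n" + body + "\n}"
    }.mkString("\n\n")
    val calls = named.map { case (name, _) => name + "();" }.mkString("\n")
    (methods, calls)
  }
}
```

The caller emits the call sequence where the inline code used to be and registers the method definitions on the generated class. A struct with many fields then compiles to many small methods instead of one oversized one.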