-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-8935][SQL] Implement code generation for all casts #7365
Conversation
Not ready to be reviewed, just want to trigger test first. |
Jenkins, OK to test |
@YijieSHEN Just want to mention that you can easily test it locally by:
|
Test build #1061 has finished for PR 7365 at commit
|
case (BinaryType, StringType) => | ||
defineCodeGen (ctx, ev, c => | ||
s"${ctx.stringType}.fromBytes($c)") | ||
private[this] def castGen(from: DataType, to: DataType, ctx: CodeGenContext, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not remove this and put it into genCode directly?
@davies , thanks for the tip, it's just what I need to check any unconscious situations. :) @davies @rxin, the pr is almost done for the current implementation, I introduce |
@@ -421,48 +421,515 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w | |||
|
|||
protected override def nullSafeEval(input: Any): Any = cast(input) | |||
|
|||
private[this] class CodeHolder private() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It imitate the behaviour of defineCodeGen
and nullSafeCodeGen
, hold the calculation for either genCode
if it's a primitive type cast or to be later used in ComplexType's member evaluation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems a bit too much here to be honest
|
@cloud-fan Take a look at this also if you have time. |
@@ -371,7 +371,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w | |||
|
|||
private[this] def castArray(from: ArrayType, to: ArrayType): Any => Any = { | |||
val elementCast = cast(from.elementType, to.elementType) | |||
buildCast[Seq[Any]](_, _.map(v => if (v == null) null else elementCast(v))) | |||
buildCast[Seq[Any]](_, seq => seq.map(v => if (v == null) null else elementCast(v))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, change this by accident, will revert in following commit.
@cloud-fan , thanks for reviewing, will resolve the comment soon. |
2edc21c
to
5de0a95
Compare
Jenkins, ok to test. |
I'm pretty sure Jenkins hates you. |
Jenkins, ok to test. |
Jenkins, test this please. |
Test build #37752 has finished for PR 7365 at commit
|
Test build #37773 has finished for PR 7365 at commit
|
Test build #37778 has finished for PR 7365 at commit
|
Test build #37784 has finished for PR 7365 at commit
|
ctx: CodeGenContext): CastFunction = to match { | ||
|
||
case _ if from == NullType => (c, evPrim, evNull) => s"$evNull = true;" | ||
case _ if to == from => (c, evPrim, evNull) => s"$evPrim = $c;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimizer will remove unnecessary casts, so I think we don't need this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, let me check why optimizer didn't work for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this, checkEvaluation
in testSuite doesn't trigger optimiser at all, does it? So I think I should remove unnecessary cast introduced in DateExpressionSuite
, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the fact that we also do cast in interpreted version when from == to
, let's do this in codegen version too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess its reasonable to remove unnecessary cast introduced in DateExpressionSuite
and remove the interpreted version as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok to keep them. I don't really have a strong preference on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast expression will be inserted during physical operator transformation,for example Average
, in which case its possible we add a cast to self
expression.
Test build #37800 has finished for PR 7365 at commit
|
More comments on this? |
} | ||
|
||
private[this] def castCode(ctx: CodeGenContext, childPrim: String, childNull: String, | ||
resultPrim: String, resultNull: String, resultType: DataType, cast: CastFunction): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we can use result
as GeneratedExpressionCode for resultPrim, resultNull, child for childPrim and childNull. resultType is this.dataType
.
Right now, it has two many parameters, hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is short, I'd like to inline it into genCode()
, it will be easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
castCode is intendedly take all these parameters because we have to do cast for ComplexType
as well, so we need call castCode recursively in these casts.
I think I should add a comment in code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used by castStructCode
, never mind the above comments.
Test build #38013 has finished for PR 7365 at commit
|
It's a
|
@davies , could you review this again? thanks! |
$result.update($i, null); | ||
} else { | ||
$fromType $fromFieldPrim = | ||
${unboxPrimitive(ctx, from.fields(i).dataType, s"$tmpRow.apply($i)")}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use should use the special getter to access them, because UnsafeRow does not support generic getter for primitive types, see ctx.getColumn
, then we don't need to unboxPrimitive()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I get this, I will use ctx.getColumn
and ctx.setColumn
in new commit.
Test build #1153 has finished for PR 7365 at commit
|
Test build #38067 has finished for PR 7365 at commit
|
build error caused by : |
Test build #1183 has finished for PR 7365 at commit
|
LGTM |
Merging into master, thanks! |
s""" | ||
final int $size = $c.size(); | ||
final $arraySeqClass<Object> $result = new $arraySeqClass<Object>($size); | ||
for (int $j = 0; $j < $size; $j ++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason that we create fresh name for these temp variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose we are casting a structType that has two arrayType field, all the structType's filed will be expand one by one, therefore, we need fresh name for each of the arrayField cast to avoid redeclaring variables in the same scope.
JIRA: https://issues.apache.org/jira/browse/SPARK-8935