[SPARK-8935][SQL] Implement code generation for all casts #7365

Closed
wants to merge 11 commits into from

@yjshen yjshen commented Jul 13, 2015

@yjshen
Member Author

yjshen commented Jul 13, 2015

Not ready to be reviewed, just want to trigger test first.

@davies
Contributor

davies commented Jul 13, 2015

Jenkins, OK to test

@davies
Contributor

davies commented Jul 13, 2015

@YijieSHEN Just want to mention that you can easily test it locally by:

$ build/sbt
sbt> catalyst/test
sbt> sql/test

@SparkQA

SparkQA commented Jul 13, 2015

Test build #1061 has finished for PR 7365 at commit 6c00e5a.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

case (BinaryType, StringType) =>
defineCodeGen (ctx, ev, c =>
s"${ctx.stringType}.fromBytes($c)")
private[this] def castGen(from: DataType, to: DataType, ctx: CodeGenContext,
Contributor

why not remove this and put it into genCode directly?

@yjshen
Member Author

yjshen commented Jul 14, 2015

@davies, thanks for the tip; it's just what I need to catch any cases I overlooked. :)

@davies @rxin, the PR is almost done for the current implementation. I introduced CodeHolder in the latest commit to support casts of complex types; could you take a quick look at it? I'm not sure whether I'm using too much unnecessary abstraction again.

@@ -421,48 +421,515 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w

protected override def nullSafeEval(input: Any): Any = cast(input)

private[this] class CodeHolder private() {
Member Author

It imitates the behaviour of defineCodeGen and nullSafeCodeGen: it holds the calculation, either for genCode when it's a primitive type cast, or for later use when evaluating a ComplexType's members.

Contributor

seems a bit too much here to be honest

@yjshen
Member Author

yjshen commented Jul 15, 2015

CodeHolder is removed in the latest commit.

@rxin
Contributor

rxin commented Jul 15, 2015

@cloud-fan Take a look at this also if you have time.

@@ -371,7 +371,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w

private[this] def castArray(from: ArrayType, to: ArrayType): Any => Any = {
val elementCast = cast(from.elementType, to.elementType)
- buildCast[Seq[Any]](_, _.map(v => if (v == null) null else elementCast(v)))
+ buildCast[Seq[Any]](_, seq => seq.map(v => if (v == null) null else elementCast(v)))
Member Author

Ah, I changed this by accident; I will revert it in a following commit.

@yjshen
Member Author

yjshen commented Jul 18, 2015

@cloud-fan, thanks for reviewing; I will resolve the comments soon.

@yjshen yjshen force-pushed the cast_codegen branch 2 times, most recently from 2edc21c to 5de0a95 Compare July 18, 2015 14:19
@yjshen yjshen changed the title from "[SPARK-8935][WIP] Implement code generation for all casts" to "[SPARK-8935][SQL] Implement code generation for all casts" Jul 18, 2015
@yjshen
Member Author

yjshen commented Jul 19, 2015

@rxin, since #7488 is merged, can we trigger the tests now?

@rxin
Contributor

rxin commented Jul 19, 2015

Jenkins, ok to test.

@rxin
Contributor

rxin commented Jul 19, 2015

I'm pretty sure Jenkins hates you.

@rxin
Contributor

rxin commented Jul 19, 2015

Jenkins, ok to test.

@rxin
Contributor

rxin commented Jul 19, 2015

Jenkins, test this please.

@SparkQA

SparkQA commented Jul 19, 2015

Test build #37752 has finished for PR 7365 at commit 5de0a95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 19, 2015

Test build #37773 has finished for PR 7365 at commit 1683876.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 19, 2015

Test build #37778 has finished for PR 7365 at commit d01eada.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 20, 2015

Test build #37784 has finished for PR 7365 at commit 80378a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ConcatWs(children: Seq[Expression])

ctx: CodeGenContext): CastFunction = to match {

case _ if from == NullType => (c, evPrim, evNull) => s"$evNull = true;"
case _ if to == from => (c, evPrim, evNull) => s"$evPrim = $c;"
Contributor

Optimizer will remove unnecessary casts, so I think we don't need this case.

Member Author

Ah, let me check why the optimizer didn't work for this.

Member Author

I see now: checkEvaluation in the test suite doesn't trigger the optimizer at all, does it? So I think I should remove the unnecessary cast introduced in DateExpressionSuite. What do you think?

Contributor

Since the interpreted version also performs the cast when from == to, let's do the same in the codegen version.

Member Author

I guess it's reasonable to remove the unnecessary cast introduced in DateExpressionSuite and remove this case from the interpreted version as well?

Contributor

It's ok to keep them. I don't really have a strong preference on this one.

Member Author

A cast expression can be inserted during physical operator transformation, for example for Average, in which case it's possible that we add a cast of an expression to its own type.
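
For readers following this thread, the two cases under discussion can be sketched outside Spark. This is a minimal illustration, not the PR's code: the `DataType` objects are stand-ins, `castFunctionFor` is a hypothetical name, and the integer-to-long case is added only to show a non-trivial branch. Only the `(c, evPrim, evNull)` shape and the first two generated snippets come from the diff above.

```scala
object CastSketch {
  // Stand-in types for illustration only; Spark's real DataType hierarchy is much larger.
  sealed trait DataType
  case object NullType extends DataType
  case object IntegerType extends DataType
  case object LongType extends DataType

  // (input variable, result variable, result-null flag) => generated Java snippet
  type CastFunction = (String, String, String) => String

  // Hypothetical helper: picks the snippet generator for a (from, to) pair.
  def castFunctionFor(from: DataType, to: DataType): CastFunction = to match {
    // Casting from NullType always produces null.
    case _ if from == NullType => (c, evPrim, evNull) => s"$evNull = true;"
    // Identity cast: copy the input straight into the result.
    case _ if to == from => (c, evPrim, evNull) => s"$evPrim = $c;"
    // Assumed extra case, not taken from the thread.
    case LongType if from == IntegerType => (c, evPrim, evNull) => s"$evPrim = (long) $c;"
    case _ => throw new UnsupportedOperationException(s"cast $from -> $to is not sketched")
  }
}
```

Keeping the `to == from` branch means a redundant cast still yields valid generated code on any path where the optimizer has not removed it.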

@SparkQA

SparkQA commented Jul 20, 2015

Test build #37800 has finished for PR 7365 at commit fd7eba4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Jul 21, 2015

More comments on this?

}

private[this] def castCode(ctx: CodeGenContext, childPrim: String, childNull: String,
resultPrim: String, resultNull: String, resultType: DataType, cast: CastFunction): String = {
Contributor

Nit: we can pass result as a GeneratedExpressionCode in place of resultPrim and resultNull, and child in place of childPrim and childNull. resultType is this.dataType.

Right now it has too many parameters and is hard to read.

Contributor

This function is short; I'd like to inline it into genCode(), which would make it easier to understand.

Member Author

castCode intentionally takes all these parameters because we have to cast ComplexType values as well, so castCode needs to be called recursively in those casts.
I think I should add a comment in the code here.

Contributor

It's used by castStructCode, never mind the above comments.
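
Why castCode takes explicit variable names can be seen in a small sketch. The body below is an assumption for illustration (the thread only shows the signature): it replaces `ctx` and `resultType` with plain `javaType` and `defaultValue` strings, and wraps a cast snippet in a null check so that the same helper can be reused for each field of a struct under different variable names.

```scala
object CastCodeSketch {
  // (input variable, result variable, result-null flag) => generated Java snippet
  type CastFunction = (String, String, String) => String

  // Hypothetical simplification of castCode: javaType and defaultValue are
  // passed as strings here instead of being derived from a DataType.
  def castCode(
      childPrim: String,
      childNull: String,
      resultPrim: String,
      resultNull: String,
      javaType: String,
      defaultValue: String,
      cast: CastFunction): String =
    s"""
       |boolean $resultNull = $childNull;
       |$javaType $resultPrim = $defaultValue;
       |if (!$childNull) {
       |  ${cast(childPrim, resultPrim, resultNull)}
       |}
     """.stripMargin

  // A struct cast can call castCode once per field, each time with that
  // field's own variable names (the names are supplied by the caller).
  def castFields(fieldNames: Seq[String], cast: CastFunction): String =
    fieldNames
      .map(f => castCode(f, s"${f}Null", s"${f}Out", s"${f}OutNull", "long", "-1L", cast))
      .mkString("\n")
}
```

Because the variable names are parameters rather than fixed to the top-level expression, the helper serves both genCode and the per-field code of a struct cast.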

@SparkQA

SparkQA commented Jul 22, 2015

Test build #38013 has finished for PR 7365 at commit eaece18.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Jul 22, 2015

It's a run-tests.py error:

File "./dev/run-tests.py", line 66, in __main__.identify_changed_files_from_git_commits
Failed example:
    [x.name for x in determine_modules_for_files(             identify_changed_files_from_git_commits("fc0a1475ef", target_ref="5da21f07"))]
Exception raised:

@yjshen
Member Author

yjshen commented Jul 22, 2015

@davies , could you review this again? thanks!

$result.update($i, null);
} else {
$fromType $fromFieldPrim =
${unboxPrimitive(ctx, from.fields(i).dataType, s"$tmpRow.apply($i)")};
Contributor

You should use the specialized getters to access them, because UnsafeRow does not support the generic getter for primitive types (see ctx.getColumn); then we don't need unboxPrimitive().

Member Author

Thanks, I see. I will use ctx.getColumn and ctx.setColumn in a new commit.
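
The idea behind this suggestion can be sketched as a function that emits a typed getter call per data type instead of a generic `apply` followed by unboxing. This is an illustration under assumptions: the `DataType` objects are stand-ins, and `getColumn` here is a hypothetical helper written for this note, not the signature of Spark's `ctx.getColumn`.

```scala
object GetterSketch {
  // Stand-in types for illustration only.
  sealed trait DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object BooleanType extends DataType
  case object StringType extends DataType

  // Hypothetical helper: returns the Java expression that reads column
  // `ordinal` of `row`, using a typed getter for primitive types.
  def getColumn(row: String, dataType: DataType, ordinal: Int): String = dataType match {
    case IntegerType => s"$row.getInt($ordinal)"
    case LongType => s"$row.getLong($ordinal)"
    case DoubleType => s"$row.getDouble($ordinal)"
    case BooleanType => s"$row.getBoolean($ordinal)"
    // Assumed fallback: non-primitive values still go through the generic getter.
    case _ => s"$row.apply($ordinal)"
  }
}
```

With typed getters, primitive fields are read without boxing, so the generated code needs no separate unboxing step for them.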

@SparkQA

SparkQA commented Jul 22, 2015

Test build #1153 has finished for PR 7365 at commit eaece18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 22, 2015

Test build #38067 has finished for PR 7365 at commit ef6e8b5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Jul 23, 2015

The build error was caused by: Error: Invalid or corrupt jarfile build/sbt-launch-0.13.7.jar
Could you please trigger the test again?

@SparkQA

SparkQA commented Jul 23, 2015

Test build #1183 has finished for PR 7365 at commit ef6e8b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Jul 23, 2015

LGTM

@davies
Contributor

davies commented Jul 23, 2015

Merging into master, thanks!

@asfgit asfgit closed this in 6d0d8b4 Jul 23, 2015
@yjshen yjshen deleted the cast_codegen branch July 28, 2015 04:04
s"""
final int $size = $c.size();
final $arraySeqClass<Object> $result = new $arraySeqClass<Object>($size);
for (int $j = 0; $j < $size; $j ++) {
Contributor

Is there a reason that we create fresh names for these temp variables?

Member Author

Suppose we are casting a StructType that has two ArrayType fields. All of the struct's fields are expanded one by one in the same generated scope, so each array-field cast needs fresh names to avoid redeclaring variables in that scope.
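
The redeclaration problem described here can be reproduced with a small sketch. Everything in it is an assumption for illustration: `freshName` is a hypothetical counter-based generator standing in for the codegen context's naming facility, and `java.util.ArrayList` replaces the `$arraySeqClass` used in the diff.

```scala
object FreshNameSketch {
  private var counter = 0

  // Hypothetical fresh-name generator: appends an increasing suffix so that
  // no two calls ever return the same identifier.
  def freshName(prefix: String): String = {
    counter += 1
    s"$prefix$counter"
  }

  // Emits Java code that copies one array field; every temp variable gets a
  // fresh name, so two expansions can sit side by side in one scope.
  def castArrayCode(input: String): String = {
    val size = freshName("n")
    val j = freshName("j")
    val result = freshName("result")
    s"""
       |final int $size = $input.size();
       |final java.util.ArrayList<Object> $result = new java.util.ArrayList<Object>($size);
       |for (int $j = 0; $j < $size; $j++) {
       |  $result.add($input.get($j));
       |}
     """.stripMargin
  }
}
```

Calling `castArrayCode` once per array field produces blocks whose temp variables never collide, whereas fixed names such as `size` or `result` would be declared twice when both blocks land in the same generated method.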
