Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23546][SQL] Refactor stateless methods/values in CodegenContext #20700

Closed
wants to merge 11 commits into from

Conversation

kiszk
Copy link
Member

@kiszk kiszk commented Mar 1, 2018

What changes were proposed in this pull request?

A current CodegenContext class has immutable value or method without mutable state, too.
This refactoring moves them to CodeGenerator object class which can be accessed from anywhere without an instantiated CodegenContext in the program.

How was this patch tested?

Existing tests

@@ -773,7 +773,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
| ${if (i != 0) s"""$buffer.append(" ");""" else ""}
|
| // Append $i field into the string buffer
| ${ctx.javaType(ft)} $field = ${ctx.getValue(row, ft, s"$i")};
| ${ctx.javaType(ft)} $field = ${CodeGenerator.getValue(row, ft, s"$i")};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some methods that are very frequently used (e.g. ctx.javaType()) are not refactored yet. This is because this refactoring updates more files.

@SparkQA
Copy link

SparkQA commented Mar 1, 2018

Test build #87828 has finished for PR 20700 at commit 0b2b381.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2018

Test build #87832 has finished for PR 20700 at commit fc4c6ad.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 1, 2018

Test build #87835 has finished for PR 20700 at commit 276df59.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk kiszk changed the title [SPARK-23546][SQL] Refactor non-stateful methods/values in CodegenContext [SPARK-23546][SQL] Refactor stateless methods/values in CodegenContext Mar 1, 2018
@SparkQA
Copy link

SparkQA commented Mar 1, 2018

Test build #87837 has finished for PR 20700 at commit f8b478f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

mgaido91 commented Mar 1, 2018

I like this approach. Let's see others' opinion, but it seems a good thing to do up to me.

@kiszk
Copy link
Member Author

kiszk commented Mar 1, 2018

@mgaido91 thanks. Let me ping @cloud-fan and @viirya

@@ -1524,4 +1348,215 @@ object CodeGenerator extends Logging {
result
}
})

final val JAVA_BOOLEAN = "boolean"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this normally spaced?

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me.

final val JAVA_FLOAT = "float"
final val JAVA_DOUBLE = "double"
/**
* Returns true if the Java type has a special accessor and setter in [[InternalRow]].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is copied. But seems not totally correct. InternalRow also has special accessor for Decimal. But it is not a primitive type here.

val getValueMapKeyArray0 = CodeGenerator.getValue(s"$map.keyArray()", kt, "0")
val getValueMapValArray0 = CodeGenerator.getValue(s"$map.valueArray()", vt, "0")
val getValueMapKeyArray = CodeGenerator.getValue(s"$map.keyArray()", kt, loopIndex)
val getValueMapValArray = CodeGenerator.getValue(s"$map.valueArray()", vt, loopIndex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMapKeyArray and getMapValueArray

@@ -734,23 +734,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
val keyToStringFunc = dataToStringFunc("keyToString", kt)
val valueToStringFunc = dataToStringFunc("valueToString", vt)
val loopIndex = ctx.freshName("loopIndex")
val getValueMapKeyArray0 = CodeGenerator.getValue(s"$map.keyArray()", kt, "0")
val getValueMapValArray0 = CodeGenerator.getValue(s"$map.valueArray()", vt, "0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMapFirstKey and getMapFirstValue

${javaType(elementType)} $elementA = ${getValue("a", elementType, "i")};
${javaType(elementType)} $elementB = ${getValue("b", elementType, "i")};
$jt $elementA = ${CodeGenerator.getValue("a", elementType, "i")};
$jt $elementB = ${CodeGenerator.getValue("b", elementType, "i")};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside CodegenContext, I think we can just do import CodeGenerator._.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87881 has finished for PR 20700 at commit 9fed753.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rednaxelafx
Copy link
Contributor

Aha! Thanks @kiszk san for working on this! I really wanted the stateless methods to be extracted so that I can use more utils without having to pass around a CodegenContext for no good.

I was actually working on the exact same thing last week but got carried away on other tasks so my patch is still WIP.
For those curious, here's my WIP patch: rednaxelafx@249fb93

I was hoping we can cover the ones I extracted here: rednaxelafx@249fb93#diff-8bcc5aea39c73d4bf38aef6f6951d42cR1201
Your PR and my version extracted mostly the same fields/methods, which is great for me that I'll just help get this PR in instead of having to send mine out.

The approach is different, though. I'd like to have a discussion on the tradeoffs you had in mind when you picked your approach.
I initially did something similar, which is to extract stateless methods into companion object methods and change the original one to delegate to the companion one. But the old instance methods were a bad smell to me (passing in a instance when it shouldn't need the instance), so since we're at it, I thought why not just completely remove that bad smell once and for all.
So I deleted the original instance methods, and yes that made the diff huge because all use sites of those fields/methods have to be touched to switch to using the new version.

@kiszk WDYT? Were you intending to minimize the diff but still want to be able to access the stateless util methods as standalone functions?

@kiszk
Copy link
Member Author

kiszk commented Mar 2, 2018

@rednaxelafx Oh, very interesting since we are doing the similar thing in West coast and Japan!

I just say not refactored YET. Yeah, I absolutely love to delete old instance method if we totally agree. Is it OK to delete all of old method in this PR?
Some methods that very frequently used (e.g. javaType()) are not refactored yet.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87886 has finished for PR 20700 at commit f589a2a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87887 has finished for PR 20700 at commit ebce1f2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87889 has finished for PR 20700 at commit 7da3e6b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87894 has finished for PR 20700 at commit 4538ebc.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87897 has finished for PR 20700 at commit 3837eff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2018

Test build #87899 has finished for PR 20700 at commit 4069e1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Mar 3, 2018

@rednaxelafx thanks, I integrated some of your changes into PR.

@hvanhovell
Copy link
Contributor

@kiszk can you update the PR description?

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - as stated before update the description and then we should be gtg.

@kiszk
Copy link
Member Author

kiszk commented Mar 5, 2018

Good catch. I removed the old statement from the description.

@hvanhovell
Copy link
Contributor

Merging to master. Thanks!

@hvanhovell
Copy link
Contributor

I am having some problems with the merge script, give me a little bit of time.

@asfgit asfgit closed this in 2ce37b5 Mar 5, 2018
@cloud-fan
Copy link
Contributor

a late LGTM :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants