Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23582][SQL] StaticInvoke should support interpreted execution #20753

Closed
wants to merge 13 commits into from

Conversation

kiszk
Copy link
Member

@kiszk kiszk commented Mar 7, 2018

What changes were proposed in this pull request?

This pr added interpreted execution for StaticInvoke.

How was this patch tested?

Added tests in ObjectExpressionsSuite.

@SparkQA
Copy link

SparkQA commented Mar 7, 2018

Test build #88028 has finished for PR 20753 at commit 0790dea.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • throw new RuntimeException(\"The static class cannot be null.\")

val parms = arguments.map(e => e.eval(input).asInstanceOf[Object])
val method = staticObject.getDeclaredMethod(functionName, parmTypes : _*)
val ret = method.invoke(null, parms : _*)
val retClass = CallMethodViaReflection.typeMapping.getOrElse(dataType,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to support more types (e.g. ArrayType)

@kiszk
Copy link
Member Author

kiszk commented Mar 7, 2018

ping @hvanhovell

val method = staticObject.getDeclaredMethod(functionName, parmTypes : _*)
val ret = method.invoke(null, parms : _*)
val retClass = CallMethodViaReflection.typeMapping.getOrElse(dataType,
Seq(dataType.asInstanceOf[ObjectType].cls))(0)
Copy link
Member

@viirya viirya Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will dataType always be an ObjectType here? If returned data type is CalendarIntervalType?


val parmTypes = arguments.map(e =>
CallMethodViaReflection.typeMapping.getOrElse(e.dataType,
Seq(e.dataType.asInstanceOf[ObjectType].cls))(0))
Copy link
Member

@viirya viirya Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The external types of native types CalendarIntervalType and BinaryType are not ObjectType. If we have arguments in those types, we may not use ObjectType.

Copy link
Member Author

@kiszk kiszk Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I have to support other types before merging this change.
This is a small prototype for discussing whether we use reflection or not.

@SparkQA
Copy link

SparkQA commented Mar 7, 2018

Test build #88029 has finished for PR 20753 at commit f570692.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def eval(input: InternalRow): Any =
throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
override def eval(input: InternalRow): Any = {
if (staticObject == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this check?

Copy link
Contributor

@rednaxelafx rednaxelafx Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this check, for sure. See line 132,

val objectName = staticObject.getName.stripSuffix("$")

^^ this will run as a part of the constructor, which will throw an NPE if staticObject is null, so it's redundant to null check it here.

val parmTypes = arguments.map(e =>
CallMethodViaReflection.typeMapping.getOrElse(e.dataType,
Seq(e.dataType.asInstanceOf[ObjectType].cls))(0))
val parms = arguments.map(e => e.eval(input).asInstanceOf[Object])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need null checks here for inputs? Also, can we add a common function in InvokeLike to handle input arguments for other InvokeLike eprs? (I mean the interpreted version of InvokeLike.prepareArguments).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is good since this would be used in 'Invoke', too. After supporting other types, I will create an utility function.

To discuss design choice (reflection or method handle), I put earlier version.

@kiszk
Copy link
Member Author

kiszk commented Mar 7, 2018

@hvanhovell Is it better to implement method handler approach, too?

@hvanhovell
Copy link
Contributor

@kiszk I think we should benchmark it. My rational for considering method handles is that they seem to be made for this purpose, and they should become more performant with newer versions of java.

@kiszk
Copy link
Member Author

kiszk commented Mar 7, 2018

Which use case is typical? The same method is called many times. Are different method called?

If the same method is called, reflection can also improve performance by creating custom bytecode.

@hvanhovell
Copy link
Contributor

Ok, let's go with reflection then.

cc @rednaxelafx

@hvanhovell
Copy link
Contributor

and cc @viirya

@kiszk
Copy link
Member Author

kiszk commented Mar 7, 2018

Sure

CallMethodViaReflection.typeMapping.getOrElse(e.dataType,
Seq(e.dataType.asInstanceOf[ObjectType].cls))(0))
val parms = arguments.map(e => e.eval(input).asInstanceOf[Object])
val method = staticObject.getDeclaredMethod(functionName, parmTypes : _*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can also move getDeclaredMethod outside eval?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this movement caused failure of reflection.

Copy link
Contributor

@rednaxelafx rednaxelafx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment following @maropu 's.

BTW, on the choice between reflection and method handles, the latter might be slightly faster if the code pattern is right (that the underlying JVM likes...). But in general invoking via old school reflection and invoking via MethodHandles are not that different. We always have the option to put the reflection-based version in first and then see if we can further improve it with MethodHandles.

override def eval(input: InternalRow): Any =
throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
override def eval(input: InternalRow): Any = {
if (staticObject == null) {
Copy link
Contributor

@rednaxelafx rednaxelafx Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this check, for sure. See line 132,

val objectName = staticObject.getName.stripSuffix("$")

^^ this will run as a part of the constructor, which will throw an NPE if staticObject is null, so it's redundant to null check it here.

@rednaxelafx
Copy link
Contributor

@hvanhovell @kiszk Yes, the old school JDK reflection does perform bytecode generation to speed it up, since the JDK1.4 era. But the advantage of fully optimized MethodHandle versus reflection Method.invoke is that the former performs privilege checks at MethodHandle creation time, whereas the latter is required to perform privilege checks at every invocation (and that's not covered by the generated bytecode). So theoretically the MethodHandle version can be faster.

But note that even in OpenJDK8, the HotSpot VM is still exploring ways to optimize MethodHandle performance, and it's somewhat picky about the shape of the MethodHandles, so it's not like we will always get better performance just by sticking it in place of reflection. It'll need some tuning / trial-and-error. We don't have to get it right in the first cut.

@kiszk kiszk force-pushed the SPARK-23582 branch 2 times, most recently from f9f5701 to 9a36442 Compare March 8, 2018 13:35
@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88090 has finished for PR 20753 at commit 9a36442.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88091 has finished for PR 20753 at commit 7d107a6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88093 has finished for PR 20753 at commit 91a454c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88097 has finished for PR 20753 at commit eb6171a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88089 has finished for PR 20753 at commit f9f5701.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2018

Test build #88100 has finished for PR 20753 at commit f74d48e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ret
} else {
// cast a primitive value using Boxed class
val boxedClass = CallMethodViaReflection.typeBoxedJavaMapping(dataType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can directly do like this without the if.

val boxedClass = CallMethodViaReflection.typeBoxedJavaMapping.get(dataType)
boxedClass.map(_.cast(ret)).getOrElse(ret)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. When discussions related to typeJavaMapping, typeBoxedJavaMapping`, and others are fixed, I will address this.

case ObjectType(cls) => cls
case _ => typeJavaMapping.getOrElse(dt, classOf[java.lang.Object])
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above should be in CallMethodViaReflection or CodeGenerator?

@@ -95,6 +162,21 @@ class ObjectExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(createExternalRow, Row.fromSeq(Seq(1, "x")), InternalRow.fromSeq(Seq()))
}

// This is an alternative version of `checkEvaluation` to compare results
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is intentionally imported for testing from #20757

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this pr is merged first, I'll remove this in my pr.

@kiszk
Copy link
Member Author

kiszk commented Mar 19, 2018

ping @hvanhovell

1 similar comment
@kiszk
Copy link
Member Author

kiszk commented Mar 26, 2018

ping @hvanhovell

@SparkQA
Copy link

SparkQA commented Mar 28, 2018

Test build #88656 has finished for PR 20753 at commit 09cdf5e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Mar 28, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Mar 28, 2018

Test build #88663 has finished for PR 20753 at commit 09cdf5e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Mar 28, 2018

org.apache.spark.sql.execution.streaming.sources.RateSourceSuite.overflow is not a problem for this PR.
Other PRs also fails due to this test.

@kiszk
Copy link
Member Author

kiszk commented Mar 29, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Mar 29, 2018

Test build #88687 has finished for PR 20753 at commit 09cdf5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 2, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 2, 2018

Test build #88822 has finished for PR 20753 at commit 09cdf5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 4, 2018

Test build #88901 has finished for PR 20753 at commit 0c61e60.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 4, 2018

Test build #88905 has finished for PR 20753 at commit 984cee7.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88922 has finished for PR 20753 at commit 7b30c30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in 1822ecd Apr 5, 2018
robert3005 pushed a commit to palantir/spark that referenced this pull request Apr 7, 2018
## What changes were proposed in this pull request?

This pr added interpreted execution for `StaticInvoke`.

## How was this patch tested?

Added tests in `ObjectExpressionsSuite`.

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#20753 from kiszk/SPARK-23582.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants