[SPARK-23589][SQL] ExternalMapToCatalyst should support interpreted execution #20980

Closed
wants to merge 4 commits into from

Member

maropu commented Apr 5, 2018

What changes were proposed in this pull request?

This PR adds interpreted-mode support for ExternalMapToCatalyst.

How was this patch tested?

Added tests in ObjectExpressionsSuite.

override def eval(input: InternalRow): Any = {
  val result = child.eval(input)
  if (result != null) {
    val mapValue = result.asInstanceOf[Map[Any, Any]]
Member

The external map can be java.util.Map or scala.collection.Map.

Member Author

ok
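
For illustration, the point about the two external map types can be sketched without any Spark dependency. This is a minimal sketch, not the code from this patch: the names `ExternalMapArrays` and `toArrays` are hypothetical, and the function only splits a map into parallel key and value arrays.

```scala
object ExternalMapArrays {
  // Splits an external map into parallel key/value arrays.
  // Accepts either a java.util.Map or a scala.collection.Map (hypothetical helper).
  def toArrays(external: Any): (Array[Any], Array[Any]) = external match {
    case null =>
      throw new IllegalArgumentException("external map must not be null")

    case m: java.util.Map[_, _] =>
      val data = m.asInstanceOf[java.util.Map[Any, Any]]
      val keys = new Array[Any](data.size)
      val values = new Array[Any](data.size)
      val iter = data.entrySet().iterator()
      var i = 0
      while (iter.hasNext) {
        val entry = iter.next()
        keys(i) = entry.getKey
        values(i) = entry.getValue
        i += 1
      }
      (keys, values)

    case m: scala.collection.Map[_, _] =>
      val data = m.asInstanceOf[scala.collection.Map[Any, Any]]
      val keys = new Array[Any](data.size)
      val values = new Array[Any](data.size)
      var i = 0
      for ((key, value) <- data) {
        keys(i) = key
        values(i) = value
        i += 1
      }
      (keys, values)

    case other =>
      throw new IllegalArgumentException(
        s"Unsupported map type: ${other.getClass.getName}")
  }
}
```

A cast such as `result.asInstanceOf[Map[Any, Any]]` only covers the Scala case, which is why the sketch dispatches on the runtime type first.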

SparkQA commented Apr 5, 2018

Test build #88924 has finished for PR 20980 at commit 52e76d7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 6, 2018

Test build #88980 has finished for PR 20980 at commit 8783b2b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

maropu commented Apr 7, 2018

retest this please

SparkQA commented Apr 7, 2018

Test build #89005 has finished for PR 20980 at commit 8783b2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

maropu commented Apr 11, 2018

ping

SparkQA commented Apr 19, 2018

Test build #89549 has finished for PR 20980 at commit b13f893.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

maropu commented Apr 19, 2018

retest this please

val result = child.eval(input)
if (result != null) {
  val (keys, values) = mapCatalystConverter(result)
  new ArrayBasedMapData(ArrayData.toArrayData(keys), ArrayData.toArrayData(values))
Contributor

This is probably OK: ArrayData.toArrayData calls new GenericArrayData(any), which calls GenericArrayData.anyToSeq to convert the array into a Seq (WrappedArray), which is then converted back into an array. It might copy at the end, but I am not entirely sure.

Can we cut the red tape here and just create GenericArrayData objects on the arrays directly?

Member Author

ya, I also think you're right. I'll fix.
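
A sketch of what wrapping the arrays directly could look like. It assumes spark-catalyst is on the classpath and relies on the `GenericArrayData(Array[Any])` and `ArrayBasedMapData(keyArray, valueArray)` constructors. The names `DirectArrayWrap` and `toMapData` are hypothetical, and this is not necessarily the exact change made in the patch.

```scala
import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}

object DirectArrayWrap {
  // Wraps already-converted key/value arrays without going through
  // ArrayData.toArrayData, so no intermediate Seq conversion is involved.
  def toMapData(keys: Array[Any], values: Array[Any]): ArrayBasedMapData =
    new ArrayBasedMapData(new GenericArrayData(keys), new GenericArrayData(values))
}
```

Because both arrays are already `Array[Any]`, the primary `GenericArrayData` constructor applies and the arrays are used as given.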

}
}

test("SPARK-23589 ExternalMapToCatalyst should support interpreted execution") {
Contributor

Can you also add a test for a null key?

Member Author

done
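
A null-key test could follow roughly this shape. The sketch is self-contained and uses a hypothetical stand-in (`NullKeyCheck.convertKeys`) rather than the real expression. It assumes that a null key is rejected with a RuntimeException, and the message text is illustrative only.

```scala
import scala.util.Try

object NullKeyCheck {
  // Hypothetical stand-in for the key-conversion step: rejects null keys
  // and passes every other key through unchanged.
  def convertKey(key: Any): Any =
    if (key == null) throw new RuntimeException("Cannot use null as map key!")
    else key

  // Converts all keys of a map, failing on the first null key.
  def convertKeys(data: scala.collection.Map[Any, Any]): Array[Any] =
    data.keysIterator.map(convertKey).toArray

  // A valid map converts normally; a map with a null key fails.
  val ok: Try[Array[Any]] = Try(convertKeys(Map[Any, Any](1 -> "a")))
  val bad: Try[Array[Any]] = Try(convertKeys(Map[Any, Any]((null, "a"))))
}
```

In a real suite the failing case would typically be wrapped in the test framework's exception-checking helper instead of `Try`.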

@@ -357,7 +357,8 @@ object JavaTypeInference {
     }
   }
 
-  private def serializerFor(inputObject: Expression, typeToken: TypeToken[_]): Expression = {
+  private[catalyst] def serializerFor(
+      inputObject: Expression, typeToken: TypeToken[_]): Expression = {
Contributor

NIT: put typeToken on a new line?

SparkQA commented Apr 19, 2018

Test build #89564 has finished for PR 20980 at commit b13f893.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 20, 2018

Test build #89606 has finished for PR 20980 at commit ec508cf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

maropu commented Apr 20, 2018

retest this please

SparkQA commented Apr 20, 2018

Test build #89619 has finished for PR 20980 at commit ec508cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("SPARK-23589 ExternalMapToCatalyst should support interpreted execution") {
Contributor

One more thing: can you please use ExternalMapToCatalyst expressions directly here? Using javaSerializerFor for this is pretty confusing and might cause us to test a different code path at some point.

Member Author

ok

@hvanhovell
Contributor

@maropu one more thing, otherwise LGTM.

SparkQA commented Apr 23, 2018

Test build #89698 has finished for PR 20980 at commit eaef6b3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

maropu commented Apr 23, 2018

retest this please

SparkQA commented Apr 23, 2018

Test build #89709 has finished for PR 20980 at commit eaef6b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private def scalaMapSerializerFor[T: TypeTag, U: TypeTag](inputObject: Expression): Expression = {
  import org.apache.spark.sql.catalyst.ScalaReflection._

  val curId = new java.util.concurrent.atomic.AtomicInteger()
Contributor

What is this?

val entry = iter.next()
val (key, value) = (entry.getKey, entry.getValue)
keys(i) = if (key != null) {
  keyConverter.eval(InternalRow.fromSeq(key :: Nil))
Contributor

Please reuse the InternalRow.

var i = 0
for ((key, value) <- data) {
  keys(i) = if (key != null) {
    keyConverter.eval(InternalRow.fromSeq(key :: Nil))
Contributor

Same.
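
The row-reuse request can be illustrated with a Spark-free sketch. Everything here is a hypothetical stand-in: `SingleSlotRow` plays the role of a one-column mutable row and `convert` plays the role of a converter's `eval`. Neither is Spark's actual API. The only point is that the row is allocated once and overwritten per element, instead of building a new row for every key or value.

```scala
object RowReuse {
  // Hypothetical stand-in for a single-column mutable row (not Spark's InternalRow).
  final class SingleSlotRow { var value: Any = null }

  // Hypothetical converter that reads its input from a row, like an expression's eval.
  def convert(row: SingleSlotRow): Any = row.value match {
    case s: String => s.toUpperCase
    case other => other
  }

  def convertAll(inputs: Seq[Any]): Array[Any] = {
    val row = new SingleSlotRow // allocated once, outside the loop
    val out = new Array[Any](inputs.length)
    val iter = inputs.iterator
    var i = 0
    while (iter.hasNext) {
      row.value = iter.next() // overwrite the slot instead of creating a new row
      out(i) = convert(row)
      i += 1
    }
    out
  }
}
```

This is safe only as long as the converter does not keep a reference to the row after returning, since the next iteration overwrites its contents.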

Contributor

hvanhovell left a comment

LGTM - merging to master. Can you address the nits in a follow-up? Thanks!

asfgit closed this in afbdf42 Apr 23, 2018

Member Author

maropu commented Apr 23, 2018

ok, Thanks!

asfgit pushed a commit that referenced this pull request Apr 24, 2018
…yst eval

## What changes were proposed in this pull request?
This PR is a follow-up to #20980. It changes the code to reuse `InternalRow` when converting input keys/values in `ExternalMapToCatalyst` eval.

## How was this patch tested?
Existing tests.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #21137 from maropu/SPARK-23589-FOLLOWUP.