Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23588][SQL] CatalystToExternalMap should support interpreted execution #20979

Closed
wants to merge 2 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Apr 5, 2018

What changes were proposed in this pull request?

This pr supported interpreted mode for CatalystToExternalMap.

How was this patch tested?

Added tests in ObjectExpressionsSuite.

private lazy val toScalaValue: Any => Any = {
assert(inputData.dataType.isInstanceOf[MapType])
val mapType = inputData.dataType.asInstanceOf[MapType]
CatalystTypeConverters.createToScalaConverter(mapType)
Copy link
Member

@viirya viirya Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, don't we need to support different types of resulting collection specified by collClass?

Copy link
Member Author

@maropu maropu Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, you're right. I'll update soon.
I checked the related code; it seems the input type is always the same with the output type specified by collClass? ex.)

scala> Seq(Map(1 -> 1, 2 -> 2)).toDS.as[Map[Long, String]].map(y => y).explain(true)
== Physical Plan ==
*(1) SerializeFromObject [externalmaptocatalyst(ExternalMapToCatalyst_key34, false, LongType, lambdavariable(ExternalMapToCatalyst_key34, false, LongType, false), ExternalMapToCatalyst_value34, ExternalMapToCatalyst_value_isNull34, ObjectType(class java.lang.String), staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, lambdavariable(ExternalMapToCatalyst_value34, ExternalMapToCatalyst_value_isNull34, ObjectType(class java.lang.String), true), true, false), input[0, scala.collection.immutable.Map, true]) AS value#137]
+- *(1) MapElements <function1>, obj#136: scala.collection.immutable.Map
   +- *(1) DeserializeToObject catalysttoexternalmap(CatalystToExternalMap_keyLoopValue33, lambdavariable(CatalystToExternalMap_keyLoopValue33, , LongType, false), CatalystToExternalMap_valueLoopValue33, CatalystToExternalMap_valueLoopIsNull33, lambdavariable(CatalystToExternalMap_valueLoopValue33, CatalystToExternalMap_valueLoopIsNull33, StringType, true).toString, cast(value#130 as map<bigint,string>), interface scala.collection.immutable.Map, MapType(LongType,StringType,true)), obj#135: scala.collection.immutable.Map
      +- LocalTableScan [value#130]

deserializerFor adds casts implicitly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I buy that argument. This expression can produce a map type other than scala.collection.immutable.Map, so we should either implement this, or we should remove the ability to produce different map types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll consider again.

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88925 has finished for PR 20979 at commit 352777b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Apr 5, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 5, 2018

Test build #88935 has finished for PR 20979 at commit 352777b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 11, 2018

Test build #89201 has finished for PR 20979 at commit acfad49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89548 has finished for PR 20979 at commit 17a0bca.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private lazy val valueConverter =
CatalystTypeConverters.createToScalaConverter(inputMapType.valueType)

private def newMapBuilder(): Builder[AnyRef, AnyRef] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make sure the builder only get's resolved once per expression instead of once per row. Can you address this in a follow-up?

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hvanhovell
Copy link
Contributor

Disregard my earlier comment. Merging to master. Thanks! Can you address my last comment in a follow-up?

@asfgit asfgit closed this in e134165 Apr 19, 2018
@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89567 has finished for PR 20979 at commit 07f4c82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Apr 20, 2018

ok, thanks!

ghost pushed a commit to dbtsai/spark that referenced this pull request Apr 20, 2018
…ion in CatalystToExternalMap

## What changes were proposed in this pull request?
This pr is a follow-up pr of apache#20979 and fixes code to resolve a map builder method per execution instead of per row in `CatalystToExternalMap`.

## How was this patch tested?
Existing tests.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#21112 from maropu/SPARK-23588-FOLLOWUP.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants