[SPARK-33292][SQL] Make Literal ArrayBasedMapData string representation disambiguous #30190

dongjoon-hyun · 2020-10-29T23:00:43Z

What changes were proposed in this pull request?

This PR aims to wrap the string representation of ArrayBasedMapData literals with map(...).
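As a rough illustration, the change adds one case to the pattern match in Literal.toString. The sketch below is a simplified, self-contained model and not Spark's code. MiniMapData is a hypothetical stand-in for ArrayBasedMapData that only reproduces its "keys: ..., values: ..." toString format. literalToString is a hypothetical helper that mirrors the cases visible in the diff.

```scala
// Hypothetical stand-in for ArrayBasedMapData: only its toString format
// ("keys: [...], values: [...]") is modeled here.
final case class MiniMapData(keys: Seq[Any], values: Seq[Any]) {
  private def render(xs: Seq[Any]): String = xs.mkString("[", ", ", "]")
  override def toString: String = s"keys: ${render(keys)}, values: ${render(values)}"
}

// Hypothetical helper mirroring Literal.toString after this PR:
// map data is wrapped in map(...), other values are rendered as before.
def literalToString(value: Any): String = value match {
  case null                => "null"
  case binary: Array[Byte] => "0x" + binary.map(b => "%02X".format(b & 0xff)).mkString
  case d: MiniMapData      => s"map(${d.toString})"
  case other               => other.toString
}

println(literalToString(MiniMapData(Seq("key1"), Seq("value1"))))
```

In this sketch, the wrapper is applied only where a literal is rendered, so the data class's own toString stays unchanged. This matches the choice discussed in the review below.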

Why are the changes needed?

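A literal ArrayBasedMapData is rendered as map(key1, value1) in the parsed and analyzed logical plans, but as keys: [key1], values: [value1] in the optimized logical plan and the physical plan. The latter form is also ambiguous, as in [1 AS a#0, keys: [key1], values: [value1] AS b#1]. There, the commas inside the map literal look the same as the commas that separate project-list entries, so it is not clear where the second expression begins or what the alias AS b#1 applies to.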
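To make the ambiguity concrete, the sketch below splits a rendered project list on top-level commas, meaning commas that are not nested inside (...) or [...]. splitTopLevel is a hypothetical helper written only for this illustration and is not part of Spark. On the old string it produces three entries instead of two, because nothing groups the two halves of the map literal. With the map(...) wrapper, the two output columns are recovered.

```scala
// Hypothetical helper: split a project list on commas at nesting depth 0,
// treating both (...) and [...] as grouping delimiters.
def splitTopLevel(s: String): List[String] = {
  val parts = scala.collection.mutable.ListBuffer.empty[String]
  val current = new StringBuilder
  var depth = 0
  for (c <- s) c match {
    case '(' | '[' => depth += 1; current += c
    case ')' | ']' => depth -= 1; current += c
    case ',' if depth == 0 =>
      parts += current.toString.trim
      current.clear()
    case _ => current += c
  }
  parts += current.toString.trim
  parts.toList
}

val before = "1 AS a#0, keys: [key1], values: [value1] AS b#1"
val after  = "1 AS a#4, map(keys: [key1], values: [value1]) AS b#5"

println(splitTopLevel(before)) // three entries: the map literal is split apart
println(splitTopLevel(after))  // two entries, one per output column
```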

BEFORE

scala> spark.version
res0: String = 2.4.7

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#0, 'map(key1, value1) AS b#1]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#0, map(key1, value1) AS b#1]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- Scan OneRowRelation[]

AFTER

scala> spark.version
res0: String = 3.1.0-SNAPSHOT

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#4, 'map(key1, value1) AS b#5]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- *(1) Scan OneRowRelation[]

Does this PR introduce any user-facing change?

Yes. This changes the string representation of query plans in the output of the explain command and in the UI. However, this is a bug fix.

How was this patch tested?

Pass the CI with the newly added test case.


SparkQA · 2020-10-29T23:59:08Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35023/

SparkQA · 2020-10-30T00:21:53Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35023/

dongjoon-hyun · 2020-10-30T00:37:36Z

Could you review this please, @viirya and @maropu?

viirya · 2020-10-30T00:46:51Z

Isn't it still inconsistent between the analyzed logical plan and the optimized logical plan? It is better after this change, though.

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation

maropu · 2020-10-30T00:48:57Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala

@@ -297,6 +297,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression {
  override def toString: String = value match {
    case null => "null"
    case binary: Array[Byte] => s"0x" + DatatypeConverter.printHexBinary(binary)
+    case d: ArrayBasedMapData => s"map(${d.toString})"


Just a question; any reason not to update ArrayBasedMapData#toString instead?

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala

Lines 35 to 37 in cbd3fde

  override def toString: String = {
    s"keys: $keyArray, values: $valueArray"
  }

Yes. It's because that class is technically MapData, not a Map.

The value of a static literal map happens to be an ArrayBasedMapData for now, but that was just one design choice and could change later.

dongjoon-hyun · 2020-10-30T00:49:25Z

Yes, @viirya. That's intentional because the literal's value is an ArrayBasedMapData, and this PR does not aim to change ArrayBasedMapData itself. The main point is grouping the representation with map(...) to remove the ambiguity when the alias (such as AS b#5) is attached.

The consistency here means that map(...) appears in all plans.

maropu · 2020-10-30T01:35:32Z

> The main point is grouping with map(...) to remove disambiguity when we attached #5.

This update looks fine to me. But, as @viirya suggested above, a consistent string form for maps across all plans would look better, so I also think it might be worth fixing in future work.
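For comparison, a fully consistent form, as suggested above, would render the literal in the optimized plan the same way as in the analyzed plan, for example map(key1, value1). The sketch below is only a hypothetical illustration of that idea and is not what this PR implements. It interleaves the keys and values of a simplified map value. MiniMapData and toMapSyntax are names invented for this example.

```scala
// Hypothetical simplified map value holding parallel key and value arrays.
final case class MiniMapData(keys: Seq[Any], values: Seq[Any]) {
  require(keys.length == values.length, "keys and values must have the same length")
}

// Hypothetical renderer producing the analyzed-plan style: map(k1, v1, k2, v2, ...).
def toMapSyntax(d: MiniMapData): String =
  d.keys.zip(d.values)
    .flatMap { case (k, v) => Seq(k, v) }
    .mkString("map(", ", ", ")")

println(toMapSyntax(MiniMapData(Seq("key1"), Seq("value1"))))
```

In this form, keys and values are distinguished only by position, but the enclosing map(...) still keeps the expression boundary unambiguous.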

maropu

LGTM if the tests pass.

dongjoon-hyun · 2020-10-30T02:09:45Z

Thank you, @maropu and @viirya !

Closes #30190 from dongjoon-hyun/SPARK-33292.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 838791b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

dongjoon-hyun · 2020-10-30T02:11:45Z

Merged to master/3.0

SparkQA · 2020-10-30T04:06:06Z

Test build #130419 has finished for PR 30190 at commit 0fea5a9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

[SPARK-33292][SQL] Make Literal ArrayBasedMapData string representation disambiguous (0fea5a9)

maropu reviewed Oct 30, 2020


maropu approved these changes Oct 30, 2020


viirya approved these changes Oct 30, 2020


dongjoon-hyun closed this in 838791b Oct 30, 2020

dongjoon-hyun deleted the SPARK-33292 branch October 30, 2020 02:11
