Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33292][SQL] Make Literal ArrayBasedMapData string representation disambiguous #30190

Closed
wants to merge 1 commit into from
Closed

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Oct 29, 2020

What changes were proposed in this pull request?

This PR aims to wrap ArrayBasedMapData literal representation with map(...).

Why are the changes needed?

Literal ArrayBasedMapData has inconsistent string representation from LogicalPlan to Optimized Logical Plan/Physical Plan. Also, the representation at Optimized Logical Plan and Physical Plan is ambiguous like [1 AS a#0, keys: [key1], values: [value1] AS b#1].

BEFORE

scala> spark.version
res0: String = 2.4.7

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#0, 'map(key1, value1) AS b#1]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#0, map(key1, value1) AS b#1]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- Scan OneRowRelation[]

AFTER

scala> spark.version
res0: String = 3.1.0-SNAPSHOT

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#4, 'map(key1, value1) AS b#5]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- *(1) Scan OneRowRelation[]

Does this PR introduce any user-facing change?

Yes. This changes the query plan's string representation in explain command and UI. However, this is a bug fix.

How was this patch tested?

Pass the CI with the newly added test case.

@SparkQA
Copy link

SparkQA commented Oct 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35023/

@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35023/

@dongjoon-hyun
Copy link
Member Author

Could you review this please, @viirya and @maropu ?

@viirya
Copy link
Member

viirya commented Oct 30, 2020

Seems it is still inconsistent between Logical Plan and Optimized Logical Plan? Although it is better after this change.

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation

@@ -297,6 +297,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression {
override def toString: String = value match {
case null => "null"
case binary: Array[Byte] => s"0x" + DatatypeConverter.printHexBinary(binary)
case d: ArrayBasedMapData => s"map(${d.toString})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question; any reason not to update ArrayBasedMapData#toString instead?

override def toString: String = {
s"keys: $keyArray, values: $valueArray"
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's because that class is MapData technically, not Map.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static literal map's string representation happens to be ArrayBasedMapData for now, but it was just one of design choice and can be changed later.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Oct 30, 2020

Yes, @viirya . That's intentional because it's ArrayBasedMapData. This PR didn't want to change ArrayBasedMapData. The main point is grouping with map(...) to remove disambiguity when we attached #5.

The consistency here is keeping map(...) consistently across all plans.

@maropu
Copy link
Member

maropu commented Oct 30, 2020

The main point is grouping with map(...) to remove disambiguity when we attached #5.

This update seems fine to me. But, as @viirya suggested above, since consistent string forms for maps looks better, I also think it might be worth fixing it in future work.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if the tests pass.

@dongjoon-hyun
Copy link
Member Author

Thank you, @maropu and @viirya !

dongjoon-hyun added a commit that referenced this pull request Oct 30, 2020
…on disambiguous

### What changes were proposed in this pull request?

This PR aims to wrap `ArrayBasedMapData` literal representation with `map(...)`.

### Why are the changes needed?

Literal ArrayBasedMapData has inconsistent string representation from `LogicalPlan` to `Optimized Logical Plan/Physical Plan`. Also, the representation at `Optimized Logical Plan` and `Physical Plan` is ambiguous like `[1 AS a#0, keys: [key1], values: [value1] AS b#1]`.

**BEFORE**
```scala
scala> spark.version
res0: String = 2.4.7

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#0, 'map(key1, value1) AS b#1]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#0, map(key1, value1) AS b#1]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- Scan OneRowRelation[]
```

**AFTER**
```scala
scala> spark.version
res0: String = 3.1.0-SNAPSHOT

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#4, 'map(key1, value1) AS b#5]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- *(1) Scan OneRowRelation[]
```

### Does this PR introduce _any_ user-facing change?

Yes. This changes the query plan's string representation in `explain` command and UI. However, this is a bug fix.

### How was this patch tested?

Pass the CI with the newly added test case.

Closes #30190 from dongjoon-hyun/SPARK-33292.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 838791b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Oct 30, 2020
…on disambiguous

### What changes were proposed in this pull request?

This PR aims to wrap `ArrayBasedMapData` literal representation with `map(...)`.

### Why are the changes needed?

Literal ArrayBasedMapData has inconsistent string representation from `LogicalPlan` to `Optimized Logical Plan/Physical Plan`. Also, the representation at `Optimized Logical Plan` and `Physical Plan` is ambiguous like `[1 AS a#0, keys: [key1], values: [value1] AS b#1]`.

**BEFORE**
```scala
scala> spark.version
res0: String = 2.4.7

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#0, 'map(key1, value1) AS b#1]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#0, map(key1, value1) AS b#1]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- Scan OneRowRelation[]
```

**AFTER**
```scala
scala> spark.version
res0: String = 3.1.0-SNAPSHOT

scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#4, 'map(key1, value1) AS b#5]
+- OneRowRelation

== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- *(1) Scan OneRowRelation[]
```

### Does this PR introduce _any_ user-facing change?

Yes. This changes the query plan's string representation in `explain` command and UI. However, this is a bug fix.

### How was this patch tested?

Pass the CI with the newly added test case.

Closes #30190 from dongjoon-hyun/SPARK-33292.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 838791b)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Oct 30, 2020

Merged to master/3.0

@dongjoon-hyun dongjoon-hyun deleted the SPARK-33292 branch October 30, 2020 02:11
@SparkQA
Copy link

SparkQA commented Oct 30, 2020

Test build #130419 has finished for PR 30190 at commit 0fea5a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants