-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33292][SQL] Make Literal ArrayBasedMapData string representation disambiguous #30190
Conversation
Kubernetes integration test starting |
Kubernetes integration test status success |
Seems it is still inconsistent between Logical Plan and Optimized Logical Plan? Although it is better after this change. == Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation
== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation |
@@ -297,6 +297,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression { | |||
override def toString: String = value match { | |||
case null => "null" | |||
case binary: Array[Byte] => s"0x" + DatatypeConverter.printHexBinary(binary) | |||
case d: ArrayBasedMapData => s"map(${d.toString})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question; any reason not to update ArrayBasedMapData#toString
instead?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala
Lines 35 to 37 in cbd3fde
override def toString: String = { | |
s"keys: $keyArray, values: $valueArray" | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's because that class is MapData
technically, not Map
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The static literal map's string representation happens to be ArrayBasedMapData
for now, but it was just one of design choice and can be changed later.
Yes, @viirya . That's intentional because it's The consistency here is keeping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if the tests pass.
…on disambiguous ### What changes were proposed in this pull request? This PR aims to wrap `ArrayBasedMapData` literal representation with `map(...)`. ### Why are the changes needed? Literal ArrayBasedMapData has inconsistent string representation from `LogicalPlan` to `Optimized Logical Plan/Physical Plan`. Also, the representation at `Optimized Logical Plan` and `Physical Plan` is ambiguous like `[1 AS a#0, keys: [key1], values: [value1] AS b#1]`. **BEFORE** ```scala scala> spark.version res0: String = 2.4.7 scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true) == Parsed Logical Plan == 'Project [1 AS a#0, 'map(key1, value1) AS b#1] +- OneRowRelation == Analyzed Logical Plan == a: int, b: map<string,string> Project [1 AS a#0, map(key1, value1) AS b#1] +- OneRowRelation == Optimized Logical Plan == Project [1 AS a#0, keys: [key1], values: [value1] AS b#1] +- OneRowRelation == Physical Plan == *(1) Project [1 AS a#0, keys: [key1], values: [value1] AS b#1] +- Scan OneRowRelation[] ``` **AFTER** ```scala scala> spark.version res0: String = 3.1.0-SNAPSHOT scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true) == Parsed Logical Plan == 'Project [1 AS a#4, 'map(key1, value1) AS b#5] +- OneRowRelation == Analyzed Logical Plan == a: int, b: map<string,string> Project [1 AS a#4, map(key1, value1) AS b#5] +- OneRowRelation == Optimized Logical Plan == Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5] +- OneRowRelation == Physical Plan == *(1) Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5] +- *(1) Scan OneRowRelation[] ``` ### Does this PR introduce _any_ user-facing change? Yes. This changes the query plan's string representation in `explain` command and UI. However, this is a bug fix. ### How was this patch tested? Pass the CI with the newly added test case. Closes #30190 from dongjoon-hyun/SPARK-33292. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 838791b) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…on disambiguous ### What changes were proposed in this pull request? This PR aims to wrap `ArrayBasedMapData` literal representation with `map(...)`. ### Why are the changes needed? Literal ArrayBasedMapData has inconsistent string representation from `LogicalPlan` to `Optimized Logical Plan/Physical Plan`. Also, the representation at `Optimized Logical Plan` and `Physical Plan` is ambiguous like `[1 AS a#0, keys: [key1], values: [value1] AS b#1]`. **BEFORE** ```scala scala> spark.version res0: String = 2.4.7 scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true) == Parsed Logical Plan == 'Project [1 AS a#0, 'map(key1, value1) AS b#1] +- OneRowRelation == Analyzed Logical Plan == a: int, b: map<string,string> Project [1 AS a#0, map(key1, value1) AS b#1] +- OneRowRelation == Optimized Logical Plan == Project [1 AS a#0, keys: [key1], values: [value1] AS b#1] +- OneRowRelation == Physical Plan == *(1) Project [1 AS a#0, keys: [key1], values: [value1] AS b#1] +- Scan OneRowRelation[] ``` **AFTER** ```scala scala> spark.version res0: String = 3.1.0-SNAPSHOT scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true) == Parsed Logical Plan == 'Project [1 AS a#4, 'map(key1, value1) AS b#5] +- OneRowRelation == Analyzed Logical Plan == a: int, b: map<string,string> Project [1 AS a#4, map(key1, value1) AS b#5] +- OneRowRelation == Optimized Logical Plan == Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5] +- OneRowRelation == Physical Plan == *(1) Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5] +- *(1) Scan OneRowRelation[] ``` ### Does this PR introduce _any_ user-facing change? Yes. This changes the query plan's string representation in `explain` command and UI. However, this is a bug fix. ### How was this patch tested? Pass the CI with the newly added test case. Closes #30190 from dongjoon-hyun/SPARK-33292. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 838791b) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Merged to master/3.0 |
Test build #130419 has finished for PR 30190 at commit
|
What changes were proposed in this pull request?
This PR aims to wrap
ArrayBasedMapData
literal representation withmap(...)
.Why are the changes needed?
Literal ArrayBasedMapData has inconsistent string representation from
LogicalPlan
toOptimized Logical Plan/Physical Plan
. Also, the representation atOptimized Logical Plan
andPhysical Plan
is ambiguous like[1 AS a#0, keys: [key1], values: [value1] AS b#1]
.BEFORE
AFTER
Does this PR introduce any user-facing change?
Yes. This changes the query plan's string representation in
explain
command and UI. However, this is a bug fix.How was this patch tested?
Pass the CI with the newly added test case.