Skip to content

Conversation

maropu
Copy link
Member

@maropu maropu commented Jan 5, 2018

What changes were proposed in this pull request?

This pr fixed the issue when casting maps into strings;

scala> Seq(Map(1 -> "a", 2 -> "b")).toDF("a").write.saveAsTable("t")
scala> sql("SELECT cast(a as String) FROM t").show(false)
+----------------------------------------------------------------+
|a                                                               |
+----------------------------------------------------------------+
|org.apache.spark.sql.catalyst.expressions.UnsafeMapData@38bdd75d|
+----------------------------------------------------------------+

This pr modified the result into;

+----------------+
|a               |
+----------------+
|[1 -> a, 2 -> b]|
+----------------+

How was this patch tested?

Added tests in CastSuite.

@SparkQA
Copy link

SparkQA commented Jan 5, 2018

Test build #85726 has finished for PR 20166 at commit efc28e3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2018

Test build #85727 has finished for PR 20166 at commit be04e64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Jan 6, 2018

@cloud-fan

if (map.numElements > 0) {
val keyToUTF8String = castToString(kt)
val valueToUTF8String = castToString(vt)
builder.append(keyToUTF8String(map.keyArray().get(0, kt)).asInstanceOf[UTF8String])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map.keyArray() and map.valueArray appear many times, we can create 2 local variables for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@cloud-fan
Copy link
Contributor

LGTM except one minor comment

@maropu
Copy link
Member Author

maropu commented Jan 6, 2018

BTW, the current Datset.showString prints rows thru RowEncoder deserializers like;

scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
+------------------------------------------------------------+
|a                                                           |
+------------------------------------------------------------+
|[WrappedArray(1, 2), WrappedArray(3), WrappedArray(4, 5, 6)]|
+------------------------------------------------------------+

If we cast them before prints, we could get more simpler forms like;

scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
+------------------------+
|a                       |
+------------------------+
|[[1, 2], [3], [4, 5, 6]]|
+------------------------+

I'm not sure through, is this acceptable? (Probably, we might need to add a option to keep the old behaviour)

@SparkQA
Copy link

SparkQA commented Jan 6, 2018

Test build #85751 has finished for PR 20166 at commit fb11796.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

I think this is a bug, in Dataset.showString I see code like

case seq: Seq[_] => seq.mkString("[", ", ", "]")

Which means we do want to show strings like [[1, 2], [3], [4, 5, 6]]

Anyway let's fix in another PR, I'm merging this PR first

@cloud-fan
Copy link
Contributor

merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jan 7, 2018
## What changes were proposed in this pull request?
This pr fixed the issue when casting maps into strings;
```
scala> Seq(Map(1 -> "a", 2 -> "b")).toDF("a").write.saveAsTable("t")
scala> sql("SELECT cast(a as String) FROM t").show(false)
+----------------------------------------------------------------+
|a                                                               |
+----------------------------------------------------------------+
|org.apache.spark.sql.catalyst.expressions.UnsafeMapData38bdd75d|
+----------------------------------------------------------------+
```
This pr modified the result into;
```
+----------------+
|a               |
+----------------+
|[1 -> a, 2 -> b]|
+----------------+
```

## How was this patch tested?
Added tests in `CastSuite`.

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes #20166 from maropu/SPARK-22973.

(cherry picked from commit 18e9414)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in 18e9414 Jan 7, 2018
@maropu
Copy link
Member Author

maropu commented Jan 7, 2018

ok, I'll fix struct in a next following pr first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants