-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-43384][SQL] Make df.show
print a nice string for MapType
.
#41065
Conversation
cc @cloud-fan |
@@ -45,6 +45,10 @@ case class ToPrettyString(child: Expression, timeZoneId: Option[String] = None) | |||
override protected def leftBracket: String = "{" | |||
override protected def rightBracket: String = "}" | |||
|
|||
override def kvPairSeparator: String = ":" | |||
|
|||
override protected def elementSeparator: String = "," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
override protected def elementSeparator: String = "," | |
override protected def elementSeparator: String = ", " |
Add space between kv maybe better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think ,
is better, But spark-sql use ,
, Any suggestions? @cloud-fan
@@ -828,7 +828,7 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper { | |||
val ret2 = cast( | |||
Literal.create(Map("1" -> "a".getBytes, "2" -> null, "3" -> "c".getBytes)), | |||
StringType) | |||
checkEvaluation(ret2, s"${lb}1 -> a, 2 ->${if (legacyCast) "" else " null"}, 3 -> c$rb") | |||
checkEvaluation(ret2, s"${lb}1 -> a, 2 -> ${if (legacyCast) "" else "null"}, 3 -> c$rb") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid changing the cast behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a reasonable change, MapToString has three elements to print, key, value, and separator. that is k separator v
, this approach is more unified. For Cast, the separator is " -> "
, and for ToPrettyString, it is ":"
.
This affects only one behavior of the Cast, that is, when v of the first element is an empty string, it was printed as k ->
before and now is k ->
. I think k ->
is more reasonable, it is consistent with the other elements except the first element.
For example, map(k1,"",k2,v2), before it's k1 ->, k2 -> v2
, now it's k1 -> , k2 -> v2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks reasonable. @sadikovi what do you think?
cc @sadikovi FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a controversial issue. For example, I'm not sure why this is a nice string for MapType
, @yikf . To be honest, I prefer the original style because it resembles a Scala style.
Map(1 -> 2, 2 -> 3)
Is this PR aiming to change Spark for more Python-user-friendly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need this change, why don't we make this configurable? I believe the default should be the same with old Spark behavior while the user can use new output via enabling the configuration manually.
Thanks @dongjoon-hyun for your point : ) The intent of this PR is to change the way Adding configuration is also a good idea. |
Any update for adding config, @yikf ? |
Add a config to control style. Should we use the new output by default and restore the old output when the legacy configuration is enabled? |
Since this is not a bug fix, please use the existing behavior as the default. The new feature can be enabled and tested at least during Spark 3.5.0. Then, we can switch it at Spark 3.6.0. |
@dongjoon-hyun Thank you, that sounds reasonable. And i restored the default behavior of #40699. for new representations of null, mapType, or other future data types, we use the existing behavior as the default. the new representation can be enabled by configuration. |
@@ -5313,7 +5313,7 @@ class VectorAssembler( | |||
+---+---+----+-------------+ | |||
| a| b| c| features| | |||
+---+---+----+-------------+ | |||
|1.0|2.0|NULL|[1.0,2.0,NaN]| | |||
|1.0|2.0|null|[1.0,2.0,NaN]| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another difference from the existing behavior, @yikf .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing behavior should be null
, NULL
is changed in #40699, I think we should do the same with map type, default to null
instead of NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the pointer. I'm not sure I can agree with #40699 there. I'm tracking the comment on that PR.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This PR aims to make
df.show
print a nice string forMapType
.Let's say have an example like this:
Before, it print
Now, it prints as follows, that's consistent with
spark-sql
CLI.Why are the changes needed?
Make
df.show
print a nice string forMapType
.Does this PR introduce any user-facing change?
Yes, They will face better nice strings representation for MapType.
How was this patch tested?
Exist tests.