-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions #28420
Conversation
…aceable expressions
@@ -1206,7 +1207,7 @@ case class DatetimeSub( | |||
interval: Expression, | |||
child: Expression) extends RuntimeReplaceable { | |||
override def toString: String = s"$start - $interval" | |||
override def sql: String = s"${start.sql} - ${interval.sql}" | |||
override def sql: String = s"${toPrettySQL(start)} - ${toPrettySQL(interval)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means RuntimeReplaceable.sql
always use pretty sql, while other expressions only use pretty sql when the caller side needs to (by calling toPrettySQL
).
Is it possible to make them consistent? Like make start
and interval
as innerChildren
and handle them in usePrettyExpression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let me see
Test build #122122 has finished for PR 28420 at commit
|
Test build #122123 has finished for PR 28420 at commit
|
Test build #122142 has finished for PR 28420 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the followup, @yaooqinn
override def innerChildren: Seq[Expression] = sys.error("RuntimeReplaceable must implement" + | ||
" innerChildren with the original parameters") | ||
|
||
protected val sqlStrSeparator: String = ", " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this variable? It seems only Extract
uses this.
@@ -323,6 +323,20 @@ trait RuntimeReplaceable extends UnaryExpression with Unevaluable { | |||
// two `RuntimeReplaceable` are considered to be semantically equal if their "child" expressions | |||
// are semantically equal. | |||
override lazy val canonicalized: Expression = child.canonicalized | |||
|
|||
override def innerChildren: Seq[Expression] = sys.error("RuntimeReplaceable must implement" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since compilers cannot catch the case where one forget to override this, how about it like this?
// RuntimeReplaceable
def exprsReplaced: Seq[Expression]
override def innerChildren: Seq[Expression] = exprsReplaced
// e.g., ParseToTimestamp
override def exprsReplaced: Seq[Expression] = Seq(Some(left), format).flatten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Point!
override def sql: String = RuntimeReplaceable.this.prettyName + | ||
prettyChildren.map(_.sql).mkString("(", sqlStrSeparator, ")") | ||
|
||
protected var prettyChildren: Seq[Expression] = innerChildren |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its better not to use var
where possible, so how about it like this?
// RuntimeReplaceable
def newInstance(innerChildren: Seq[Expression]): RuntimeReplaceable
// e.g., ParseToTimestamp
override def newInstance(innerChildren: Seq[Expression]): ParseToTimestamp = {
innerChildren match {
case Seq(left) => ParseToTimestamp(left, None, child)
case Seq(left, format) => ParseToTimestamp(left, Some(format), child)
}
}
Then, update the toPrettySQL
in utils.package
;
def toPrettySQL(e: Expression): String = e match {
case r: RuntimeReplaceable =>
r.newInstance(r.innerChildren.map(usePrettyExpression)).sql
case _ => usePrettyExpression(e).sql
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The newInstance
seems not robust to make a clone with Seq[Expression]
I think we can just add def prettySQL: String
in RuntimeReplaceable for utils.package
to call, and override it wherever needed without creating new instances.
Seems a better and simpler idea that comes up based on your suggestions, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel having the 'pretty' functionality in RuntimeReplaceable
is a bit intrusive. Dropping my idea above is okay, but I think we need a more general solution for getting pretty strings outside RuntimeReplaceable
. WDYT? @cloud-fan
*/ | ||
def exprsReplaced: Seq[Expression] | ||
|
||
override def sql: String = RuntimeReplaceable.this.prettyName + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems we can remove RuntimeReplaceable.this.
?
override def prettyName: String = "extract" | ||
override def exprsReplaced: Seq[Expression] = Seq(field, source) | ||
|
||
override def sql: String = super.sql.replace(", ", " FROM ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is tricky and hard to read, can we just generate the sql string?
@@ -142,7 +142,10 @@ package object util extends Logging { | |||
"`" + name.replace("`", "``") + "`" | |||
} | |||
|
|||
def toPrettySQL(e: Expression): String = usePrettyExpression(e).sql | |||
def toPrettySQL(e: Expression): String = e match { | |||
case r: RuntimeReplaceable => r.prettySQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work if RuntimeReplaceable
is not the root node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. no..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c42a283#diff-6f4edc80e2cc973e748705e85a6053b4R786-R807
fixed and added some nested tests, thanks, good catch!
Test build #122160 has finished for PR 28420 at commit
|
Test build #122172 has finished for PR 28420 at commit
|
Test build #122176 has finished for PR 28420 at commit
|
Test build #122224 has finished for PR 28420 at commit
|
retest this please |
Test build #122225 has finished for PR 28420 at commit
|
retest this please |
Test build #122228 has finished for PR 28420 at commit
|
protected def prettySQL: String = | ||
prettyName + exprsReplaced.map(toPrettySQL).mkString("(", ", ", ")") | ||
|
||
def asPrettyAttribute(): PrettyAttribute = PrettyAttribute(prettySQL, dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add a def withPrettyAttributes: this.type
? The current way requires fewer changes as we can have a default implementation, but it's hacky to convert RuntimeReplaceable
to a single PrettyAttribute
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to move this to util.package
Test build #122315 has finished for PR 28420 at commit
|
*/ | ||
protected def exprsReplaced: Seq[Expression] | ||
|
||
override def sql: String = prettyName + exprsReplaced.map(_.sql).mkString("(", ", ", ")") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another idea: maybe we can define a mkString
method?
def exprsReplaced: Seq[Expression]
def mkString(childrenString: Seq[String]): String = childrenString.mkString("(", ", ", ")")
override def sql: String = prettyName + mkString(exprsReplaced.map(_.sql))
then in util
case r: RuntimeReplaceable => PrettyAttribute(r.mkString(r.exprsReplaced.map(toPrettySQL)), r.dataType)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, this way is much better. We should put the prettyName
in mkString
though.
Test build #122363 has finished for PR 28420 at commit
|
retest this please |
Test build #122375 has finished for PR 28420 at commit
|
We need to backport this into branch-2.4, too? I personally think printing debug info looks like a minor bug. @cloud-fan @dongjoon-hyun |
It's a minor bug, but the diff is not small. Maybe master-only is good enough? |
Looks okay to me. |
Thanks! Merged to master. |
@@ -245,54 +245,54 @@ true | |||
-- !query | |||
select to_date('2009-07-30 04:17:52') <= to_date('2009-07-30 04:17:52') | |||
-- !query schema | |||
struct<(to_date('2009-07-30 04:17:52') <= to_date('2009-07-30 04:17:52')):boolean> | |||
struct<(to_date(2009-07-30 04:17:52) <= to_date(2009-07-30 04:17:52)):boolean> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I happened to leave a comment late but why was this changed? Seems the previous one looks better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check this
+-- !query
+select date_format('2019-10-06', 'yyyy-MM-dd uuuu')
+-- !query schema
+struct<date_format(CAST(2019-10-06 AS TIMESTAMP), yyyy-MM-dd uuuu):string>
+-- !query output
+2019-10-06 Sunday
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we discuss about changing ^ case instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. This PR just makes the RuntimeReplaceable
-> PrettyAttribute
. I agree with you that the PrettyAttribute
is not beautiful in itself.
|
||
def mkString(childrenString: Seq[String]): String = { | ||
prettyName + childrenString.mkString("(", ", ", ")") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need two extra interface methods just to make the column name pretty? Seems the output isn't even completely pretty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about pretty or not. It's about consistency. If we call a function with its alias name, then it will not use PrettyAttribute
and the string representation is different from if we call the function with the original name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consistency is good but the readability here is the problem. See the implementation side:
override def sql: String = s"$prettyName(${field.sql} FROM ${source.sql})"
override def prettyName: String = "extract"
vs
override def exprsReplaced: Seq[Expression] = Seq(field, source)
override def mkString(childrenString: Seq[String]): String = {
prettyName + childrenString.mkString("(", " FROM ", ")")
}
Previously, it was fine to only know sql
and prettyName
but now you should know what exprsReplaced
and mkString
are. From the naming, it's confusing what that means too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just document that sql
and prettyName
should be properly overridden at RuntimeReplaceable
just like we have the restriction about the child
. We could have an util for it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller side decides if he wants pretty string or not, by calling Expression.sql
or toPrettySQL(Expression)
. So simply overridden sql
doesn't work, unless we add a boolean parameter.
@@ -134,6 +134,8 @@ package object util extends Logging { | |||
PrettyAttribute(usePrettyExpression(e.child).sql + "." + name, e.dataType) | |||
case e: GetArrayStructFields => | |||
PrettyAttribute(usePrettyExpression(e.child) + "." + e.field.name, e.dataType) | |||
case r: RuntimeReplaceable => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And strictly RuntimeReplaceable
itself isn't an attribute ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already did the same to GetStructField
:)
What changes were proposed in this pull request?
The RuntimeReplaceable ones are runtime replaceable, thus, their original parameters are not going to be resolved to PrettyAttribute and remain debug style string if we directly implement their
sql
methods with their parameters'sql
methods.This PR is raised with suggestions by @maropu and @cloud-fan https://github.com/apache/spark/pull/28402/files#r417656589. In this PR, we re-implement the
sql
methods of the RuntimeReplaceable ones with toPettySQLWhy are the changes needed?
Consistency of schema output between RuntimeReplaceable expressions and normal ones.
For example,
date_format
vsto_timestamp
, before this PR, they output differentlyBefore
After
Does this PR introduce any user-facing change?
Yes, the schema output style changed for the runtime replaceable expressions as shown in the above example
How was this patch tested?
regenerate all related tests