New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29237][SQL] Prevent real function names in expression example template #25924
Conversation
@@ -1732,9 +1732,9 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run | |||
Examples: | |||
> SELECT _FUNC_('Spark SQL '); | |||
10 | |||
> SELECT CHAR_LENGTH('Spark SQL '); | |||
> SELECT _FUNC_('Spark SQL '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better show examples for binary data instead of demonstrating synonyms of the function name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it would be nice to show to users all synonyms in a separate list somewhere in the description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Outdated
Show resolved
Hide resolved
Test build #111331 has finished for PR 25924 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
Outdated
Show resolved
Hide resolved
> SELECT CHARACTER_LENGTH('Spark SQL '); | ||
> SELECT _FUNC_('Spark SQL'); | ||
9 | ||
> SELECT _FUNC_(binary('Spark SQL\\000')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last two examples seems to show to users we also support CHAR_LENGTH
and CHARACTER_LENGTH
:
https://issues.apache.org/jira/browse/SPARK-20749
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I will revert this but #25924 (comment) instead of hard coded names in examples.
Test build #111334 has finished for PR 25924 at commit
|
@@ -56,6 +56,8 @@ public String getArguments() { | |||
return arguments; | |||
} | |||
|
|||
public String getOriginalExamples() { return examples; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that this is worth adding just for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or can we use @VisibleForTesting
at least?
Test build #111336 has finished for PR 25924 at commit
|
"org.apache.spark.sql.catalyst.expressions.Remainder", | ||
// Examples demonstrate alternative names, see SPARK-20749 | ||
"org.apache.spark.sql.catalyst.expressions.Length") | ||
spark.sessionState.functionRegistry.listFunction().foreach { funcId => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @MaxGekk . This is super useful.
Test build #111342 has finished for PR 25924 at commit
|
I do believe it is worth because it is added not only for particular test, this check will prevent mistakes in docs.
|
Test build #111352 has finished for PR 25924 at commit
|
jenkins, retest this, please |
Test build #111362 has finished for PR 25924 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changes were proposed in this pull request?
In the PR, I propose to replace function names in some expression examples by
_FUNC_
, and add a test to check that_FUNC_
always present in all examples.Why are the changes needed?
Binding of a function name to an expression is performed in
FunctionRegistry
which is single source of truth. Expression examples should avoid using function name directly because this can make the examples invalid in the future.Does this PR introduce any user-facing change?
No
How was this patch tested?
Added new test to
SQLQuerySuite
which analyses expression example, and check presence of_FUNC_
.