[SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation #29577
Conversation
…e/notes of expressions to the remarks field of GetFunctionsOperation
cc @cloud-fan @juliuszsompolski @wangyum @bogdanghit @maropu thanks a lot and truly sorry to bother you on weekends
Test build #128015 has finished for PR 29577 at commit
Test build #128017 has finished for PR 29577 at commit
@@ -91,7 +88,7 @@ private[hive] class SparkGetFunctionsOperation(
        DEFAULT_HIVE_CATALOG, // FUNCTION_CAT
        db, // FUNCTION_SCHEM
        funcIdentifier.funcName, // FUNCTION_NAME
-       info.getUsage, // REMARKS
+       " Usage:\n " + info.getUsage.trim + "\n" + info.getExtended, // REMARKS
s"Usage: ${info.getUsage}\nExtended Usage:${info.getExtended}"
? In order to match DescribeFunctionCommand:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
Lines 144 to 148 in 7048fff
      Row(s"Usage: ${info.getUsage}") :: Nil
    if (isExtended) {
      result :+
        Row(s"Extended Usage:${info.getExtended}")
LGTM, I will follow it.
Test build #128039 has finished for PR 29577 at commit
@@ -541,7 +541,8 @@ object Overlay {
       Spark ANSI SQL
      > SELECT _FUNC_(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4);
       Structured SQL
-  """)
+  """,
+  since = "3.0.0")
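To make the intent of the diff above concrete, here is a stand-alone sketch (not Spark's actual @ExpressionDescription annotation class; names like ExpressionDoc are illustrative stand-ins) of attaching a since version to an expression's documentation and checking it against the same version-format regex proposed later in this thread:

```scala
// Hypothetical stand-in for an expression's documentation metadata;
// the real mechanism in Spark is the @ExpressionDescription annotation.
case class ExpressionDoc(examples: String, since: String)

object SinceSketch {
  // Example text taken from the Overlay diff in this PR.
  val overlay = ExpressionDoc(
    examples =
      "> SELECT _FUNC_(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4);\n" +
      " Structured SQL",
    since = "3.0.0")

  // Same format check as the test proposed below: major.minor.patch digits.
  def hasValidSince(doc: ExpressionDoc): Boolean =
    doc.since.matches("[0-9]+\\.[0-9]+\\.[0-9]+")

  def main(args: Array[String]): Unit = {
    println(hasValidSince(overlay)) // prints true
  }
}
```

An expression left with the default empty since (or a placeholder like "unknown") fails this check, which is what surfaces the 67 missing tags reported below.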
Is this change related to this PR? btw, could we add tests to check if a since
field is defined in all the expressions? I think it is important to define this field when adding a new expr.
In master there are currently 67 expressions without the since tag:
test("Since has a valid value") {
  val badExpressions = spark.sessionState.functionRegistry.listFunction()
    .map(spark.sessionState.catalog.lookupFunctionInfo)
    .filter(funcInfo => !funcInfo.getSince.matches("[0-9]+\\.[0-9]+\\.[0-9]+"))
    .map(_.getClassName)
    .distinct
    .sorted
  if (badExpressions.nonEmpty) {
    fail(s"${badExpressions.length} expressions with invalid 'since':\n"
      + badExpressions.mkString("\n"))
  }
}
[info] - Since has a valid value *** FAILED *** (16 milliseconds)
[info] 67 expressions with invalid 'since':
[info] org.apache.spark.sql.catalyst.expressions.Abs
[info] org.apache.spark.sql.catalyst.expressions.Add
[info] org.apache.spark.sql.catalyst.expressions.And
[info] org.apache.spark.sql.catalyst.expressions.ArrayContains
[info] org.apache.spark.sql.catalyst.expressions.AssertTrue
[info] org.apache.spark.sql.catalyst.expressions.BitwiseAnd
[info] org.apache.spark.sql.catalyst.expressions.BitwiseNot
[info] org.apache.spark.sql.catalyst.expressions.BitwiseOr
[info] org.apache.spark.sql.catalyst.expressions.BitwiseXor
[info] org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
[info] org.apache.spark.sql.catalyst.expressions.CaseWhen
[info] org.apache.spark.sql.catalyst.expressions.Cast
[info] org.apache.spark.sql.catalyst.expressions.Concat
[info] org.apache.spark.sql.catalyst.expressions.Crc32
[info] org.apache.spark.sql.catalyst.expressions.CreateArray
[info] org.apache.spark.sql.catalyst.expressions.CreateMap
[info] org.apache.spark.sql.catalyst.expressions.CreateNamedStruct
[info] org.apache.spark.sql.catalyst.expressions.CurrentDatabase
[info] org.apache.spark.sql.catalyst.expressions.Divide
[info] org.apache.spark.sql.catalyst.expressions.EqualNullSafe
[info] org.apache.spark.sql.catalyst.expressions.EqualTo
[info] org.apache.spark.sql.catalyst.expressions.Explode
[info] org.apache.spark.sql.catalyst.expressions.GetJsonObject
[info] org.apache.spark.sql.catalyst.expressions.GreaterThan
[info] org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual
[info] org.apache.spark.sql.catalyst.expressions.Greatest
[info] org.apache.spark.sql.catalyst.expressions.If
[info] org.apache.spark.sql.catalyst.expressions.In
[info] org.apache.spark.sql.catalyst.expressions.Inline
[info] org.apache.spark.sql.catalyst.expressions.InputFileBlockLength
[info] org.apache.spark.sql.catalyst.expressions.InputFileBlockStart
[info] org.apache.spark.sql.catalyst.expressions.InputFileName
[info] org.apache.spark.sql.catalyst.expressions.JsonTuple
[info] org.apache.spark.sql.catalyst.expressions.Least
[info] org.apache.spark.sql.catalyst.expressions.LessThan
[info] org.apache.spark.sql.catalyst.expressions.LessThanOrEqual
[info] org.apache.spark.sql.catalyst.expressions.MapKeys
[info] org.apache.spark.sql.catalyst.expressions.MapValues
[info] org.apache.spark.sql.catalyst.expressions.Md5
[info] org.apache.spark.sql.catalyst.expressions.MonotonicallyIncreasingID
[info] org.apache.spark.sql.catalyst.expressions.Multiply
[info] org.apache.spark.sql.catalyst.expressions.Murmur3Hash
[info] org.apache.spark.sql.catalyst.expressions.Not
[info] org.apache.spark.sql.catalyst.expressions.Or
[info] org.apache.spark.sql.catalyst.expressions.Overlay
[info] org.apache.spark.sql.catalyst.expressions.Pmod
[info] org.apache.spark.sql.catalyst.expressions.PosExplode
[info] org.apache.spark.sql.catalyst.expressions.Remainder
[info] org.apache.spark.sql.catalyst.expressions.Sha1
[info] org.apache.spark.sql.catalyst.expressions.Sha2
[info] org.apache.spark.sql.catalyst.expressions.Size
[info] org.apache.spark.sql.catalyst.expressions.SortArray
[info] org.apache.spark.sql.catalyst.expressions.SparkPartitionID
[info] org.apache.spark.sql.catalyst.expressions.Stack
[info] org.apache.spark.sql.catalyst.expressions.Subtract
[info] org.apache.spark.sql.catalyst.expressions.TimeWindow
[info] org.apache.spark.sql.catalyst.expressions.UnaryMinus
[info] org.apache.spark.sql.catalyst.expressions.UnaryPositive
[info] org.apache.spark.sql.catalyst.expressions.Uuid
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathBoolean
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathDouble
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathFloat
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathInt
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathList
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathLong
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathShort
[info] org.apache.spark.sql.catalyst.expressions.xml.XPathString (ExpressionInfoSuite.scala:204)
Thanks for the check, @tanelk. Ah, I see. I think it's better to add a since tag for the expressions above, and the tests, in a separate PR.
thanks a bunch for the check~
@tanelk You're working on that? Could you file a jira for it?
Merged to master.
What changes were proposed in this pull request?
This PR adds a function's extended information (arguments, examples, notes, and the since field) to the remarks field returned by SparkGetFunctionsOperation.
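The REMARKS formatting settled on in review can be sketched as follows. This is a stand-alone illustration, not the merged Spark code: FunctionInfoStub is a hypothetical stand-in for the real function-metadata class, and the date_part text is illustrative, not the actual built-in docs.

```scala
// Hypothetical stand-in for the function metadata object whose
// getUsage/getExtended accessors the PR reads.
case class FunctionInfoStub(getUsage: String, getExtended: String)

object RemarksSketch {
  // The format agreed on in review, matching DescribeFunctionCommand's output.
  def remarks(info: FunctionInfoStub): String =
    s"Usage: ${info.getUsage}\nExtended Usage:${info.getExtended}"

  def main(args: Array[String]): Unit = {
    // Illustrative usage/extended text only.
    val info = FunctionInfoStub(
      "date_part(field, source) - Extracts a part of the date or timestamp.",
      "\n    Examples:\n      > SELECT date_part('YEAR', TIMESTAMP '2019-08-12');\n       2019\n")
    println(remarks(info))
  }
}
```

A JDBC client reading the REMARKS column of getFunctions() would then see both the one-line usage and the examples/notes/since block, instead of the usage line alone.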
Why are the changes needed?
Better user experience: it helps JDBC users gain a fuller understanding of our built-in functions.
Does this PR introduce any user-facing change?
Yes, BI tools and JDBC users will get full information on a Spark function instead of only fragmentary usage info.
e.g. date_part (before/after screenshots omitted)
How was this patch tested?
New tests