
[SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation #29577

Closed
wants to merge 9 commits

Conversation

yaooqinn (Member)

What changes were proposed in this pull request?

This PR adds a function's extended information (arguments, examples, notes, and the since field) to SparkGetFunctionsOperation.

Why are the changes needed?

Better user experience: it helps JDBC users gain a fuller understanding of our built-in functions.

Does this PR introduce any user-facing change?

Yes. BI tools and JDBC users will get the full information for a Spark function instead of only fragmentary usage info; a JDBC sketch of how clients read this follows the example below.

e.g. date_part

before

date_part(field, source) - Extracts a part of the date/timestamp or interval source.

after

    Usage:
      date_part(field, source) - Extracts a part of the date/timestamp or interval source.

    Arguments:
      * field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function `EXTRACT`.
      * source - a date/timestamp or interval column from which `field` should be extracted
  
    Examples:
      > SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
       2019
      > SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456');
       33
      > SELECT date_part('doy', DATE'2019-08-12');
       224
      > SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001');
       1.000001
      > SELECT date_part('days', interval 1 year 10 months 5 days);
       5
      > SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
       30.001001
  
    Note:
      The date_part function is equivalent to the SQL-standard function `EXTRACT(field FROM source)`

    Since: 3.0.0
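To make this concrete, here is a minimal, hedged sketch (not part of the PR) of how a JDBC client would read this remarks text through the standard metadata API; the connection URL is a placeholder for your Spark Thrift server:

    import java.sql.DriverManager

    // Placeholder URL; point this at your Spark Thrift server.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
    try {
      // DatabaseMetaData.getFunctions is backed by GetFunctionsOperation; after
      // this PR the REMARKS column carries the full usage/arguments/examples/
      // note/since text rather than only the usage line.
      val rs = conn.getMetaData.getFunctions(null, null, "date_part")
      while (rs.next()) {
        println(rs.getString("FUNCTION_NAME"))
        println(rs.getString("REMARKS"))
      }
    } finally {
      conn.close()
    }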

How was this patch tested?

New tests

@yaooqinn (Member Author)

cc @cloud-fan @juliuszsompolski @wangyum @bogdanghit @maropu. Thanks a lot, and truly sorry to bother you on the weekend.

@SparkQA

SparkQA commented Aug 29, 2020

Test build #128015 has finished for PR 29577 at commit fd94b09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 29, 2020

Test build #128017 has finished for PR 29577 at commit 980eaa2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    @@ -91,7 +88,7 @@ private[hive] class SparkGetFunctionsOperation(
          DEFAULT_HIVE_CATALOG, // FUNCTION_CAT
          db, // FUNCTION_SCHEM
          funcIdentifier.funcName, // FUNCTION_NAME
    -     info.getUsage, // REMARKS
    +     " Usage:\n " + info.getUsage.trim + "\n" + info.getExtended, // REMARKS
Member

s"Usage: ${info.getUsage}\nExtended Usage:${info.getExtended}"? In order to match DescribeFunctionCommand:

Row(s"Usage: ${info.getUsage}") :: Nil
if (isExtended) {
result :+
Row(s"Extended Usage:${info.getExtended}")

Member Author

LGTM, I will follow it.

@SparkQA

SparkQA commented Aug 30, 2020

Test build #128039 has finished for PR 29577 at commit 8ca2af7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    @@ -541,7 +541,8 @@ object Overlay {
           Spark ANSI SQL
          > SELECT _FUNC_(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4);
           Structured SQL
    -  """)
    +  """,
    +  since = "3.0.0")
Member
Is this change related to this PR? BTW, could we add tests to check that a `since` field is defined in all the expressions? I think it is important to define this field when adding a new expression.

Contributor (@tanelk)
In master there are currently 67 expressions without the `since` tag:

  test("Since has a valid value") {
    val badExpressions = spark.sessionState.functionRegistry.listFunction()
      .map(spark.sessionState.catalog.lookupFunctionInfo)
      .filter(funcInfo => !funcInfo.getSince.matches("[0-9]+\\.[0-9]+\\.[0-9]+"))
      .map(_.getClassName)
      .distinct
      .sorted

    if (badExpressions.nonEmpty) {
      fail(s"${badExpressions.length} expressions with invalid 'since':\n"
        + badExpressions.mkString("\n"))
    }
  }
[info] - Since has a valid value *** FAILED *** (16 milliseconds)
[info]   67 expressions with invalid 'since':
[info]   org.apache.spark.sql.catalyst.expressions.Abs
[info]   org.apache.spark.sql.catalyst.expressions.Add
[info]   org.apache.spark.sql.catalyst.expressions.And
[info]   org.apache.spark.sql.catalyst.expressions.ArrayContains
[info]   org.apache.spark.sql.catalyst.expressions.AssertTrue
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseAnd
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseNot
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseOr
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseXor
[info]   org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
[info]   org.apache.spark.sql.catalyst.expressions.CaseWhen
[info]   org.apache.spark.sql.catalyst.expressions.Cast
[info]   org.apache.spark.sql.catalyst.expressions.Concat
[info]   org.apache.spark.sql.catalyst.expressions.Crc32
[info]   org.apache.spark.sql.catalyst.expressions.CreateArray
[info]   org.apache.spark.sql.catalyst.expressions.CreateMap
[info]   org.apache.spark.sql.catalyst.expressions.CreateNamedStruct
[info]   org.apache.spark.sql.catalyst.expressions.CurrentDatabase
[info]   org.apache.spark.sql.catalyst.expressions.Divide
[info]   org.apache.spark.sql.catalyst.expressions.EqualNullSafe
[info]   org.apache.spark.sql.catalyst.expressions.EqualTo
[info]   org.apache.spark.sql.catalyst.expressions.Explode
[info]   org.apache.spark.sql.catalyst.expressions.GetJsonObject
[info]   org.apache.spark.sql.catalyst.expressions.GreaterThan
[info]   org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual
[info]   org.apache.spark.sql.catalyst.expressions.Greatest
[info]   org.apache.spark.sql.catalyst.expressions.If
[info]   org.apache.spark.sql.catalyst.expressions.In
[info]   org.apache.spark.sql.catalyst.expressions.Inline
[info]   org.apache.spark.sql.catalyst.expressions.InputFileBlockLength
[info]   org.apache.spark.sql.catalyst.expressions.InputFileBlockStart
[info]   org.apache.spark.sql.catalyst.expressions.InputFileName
[info]   org.apache.spark.sql.catalyst.expressions.JsonTuple
[info]   org.apache.spark.sql.catalyst.expressions.Least
[info]   org.apache.spark.sql.catalyst.expressions.LessThan
[info]   org.apache.spark.sql.catalyst.expressions.LessThanOrEqual
[info]   org.apache.spark.sql.catalyst.expressions.MapKeys
[info]   org.apache.spark.sql.catalyst.expressions.MapValues
[info]   org.apache.spark.sql.catalyst.expressions.Md5
[info]   org.apache.spark.sql.catalyst.expressions.MonotonicallyIncreasingID
[info]   org.apache.spark.sql.catalyst.expressions.Multiply
[info]   org.apache.spark.sql.catalyst.expressions.Murmur3Hash
[info]   org.apache.spark.sql.catalyst.expressions.Not
[info]   org.apache.spark.sql.catalyst.expressions.Or
[info]   org.apache.spark.sql.catalyst.expressions.Overlay
[info]   org.apache.spark.sql.catalyst.expressions.Pmod
[info]   org.apache.spark.sql.catalyst.expressions.PosExplode
[info]   org.apache.spark.sql.catalyst.expressions.Remainder
[info]   org.apache.spark.sql.catalyst.expressions.Sha1
[info]   org.apache.spark.sql.catalyst.expressions.Sha2
[info]   org.apache.spark.sql.catalyst.expressions.Size
[info]   org.apache.spark.sql.catalyst.expressions.SortArray
[info]   org.apache.spark.sql.catalyst.expressions.SparkPartitionID
[info]   org.apache.spark.sql.catalyst.expressions.Stack
[info]   org.apache.spark.sql.catalyst.expressions.Subtract
[info]   org.apache.spark.sql.catalyst.expressions.TimeWindow
[info]   org.apache.spark.sql.catalyst.expressions.UnaryMinus
[info]   org.apache.spark.sql.catalyst.expressions.UnaryPositive
[info]   org.apache.spark.sql.catalyst.expressions.Uuid
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathBoolean
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathDouble
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathFloat
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathInt
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathList
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathLong
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathShort
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathString (ExpressionInfoSuite.scala:204)

Member
Thanks for the check, @tanelk. Ah, I see. I think it's better to add a `since` tag for the expressions above, plus tests, in a separate PR.

Member Author

thanks a bunch for the check~

Member
@tanelk Are you working on that? Could you file a JIRA for it?

@HyukjinKwon (Member)

Merged to master.
