
[SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation #29577

Closed
wants to merge 9 commits

Conversation

yaooqinn (Member)

What changes were proposed in this pull request?

This PR adds a function's extended information (arguments, examples, notes, and the since field) to SparkGetFunctionsOperation.

Why are the changes needed?

Better user experience: it helps JDBC users gain a fuller understanding of our built-in functions.

Does this PR introduce any user-facing change?

Yes. BI tools and JDBC users will get the full information for a Spark function instead of only fragmentary usage info; a JDBC sketch of how clients read this follows the example below.

e.g. date_part

before

date_part(field, source) - Extracts a part of the date/timestamp or interval source.

after

    Usage:
      date_part(field, source) - Extracts a part of the date/timestamp or interval source.

    Arguments:
      * field - selects which part of the source should be extracted; the supported string values are the same as the fields of the equivalent function `EXTRACT`.
      * source - a date/timestamp or interval column from which `field` should be extracted
  
    Examples:
      > SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
       2019
      > SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456');
       33
      > SELECT date_part('doy', DATE'2019-08-12');
       224
      > SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001');
       1.000001
      > SELECT date_part('days', interval 1 year 10 months 5 days);
       5
      > SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
       30.001001
  
    Note:
      The date_part function is equivalent to the SQL-standard function `EXTRACT(field FROM source)`

    Since: 3.0.0
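To make this concrete, here is a minimal, hedged sketch (not part of the PR) of how a JDBC client would read this remarks text through the standard metadata API; the connection URL is a placeholder for your Spark Thrift server:

    import java.sql.DriverManager

    // Placeholder URL; point this at your Spark Thrift server.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
    try {
      // DatabaseMetaData.getFunctions is backed by GetFunctionsOperation; after
      // this PR the REMARKS column carries the full usage/arguments/examples/
      // note/since text rather than only the usage line.
      val rs = conn.getMetaData.getFunctions(null, null, "date_part")
      while (rs.next()) {
        println(rs.getString("FUNCTION_NAME"))
        println(rs.getString("REMARKS"))
      }
    } finally {
      conn.close()
    }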

How was this patch tested?

New tests

@yaooqinn (Member Author)

cc @cloud-fan @juliuszsompolski @wangyum @bogdanghit @maropu. Thanks a lot, and truly sorry to bother you on the weekend.

@SparkQA

SparkQA commented Aug 29, 2020

Test build #128015 has finished for PR 29577 at commit fd94b09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 29, 2020

Test build #128017 has finished for PR 29577 at commit 980eaa2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    @@ -91,7 +88,7 @@ private[hive] class SparkGetFunctionsOperation(
          DEFAULT_HIVE_CATALOG, // FUNCTION_CAT
          db, // FUNCTION_SCHEM
          funcIdentifier.funcName, // FUNCTION_NAME
    -     info.getUsage, // REMARKS
    +     " Usage:\n " + info.getUsage.trim + "\n" + info.getExtended, // REMARKS
Member

s"Usage: ${info.getUsage}\nExtended Usage:${info.getExtended}"? In order to match DescribeFunctionCommand:

Row(s"Usage: ${info.getUsage}") :: Nil
if (isExtended) {
result :+
Row(s"Extended Usage:${info.getExtended}")

Member Author

LGTM, I will follow it.

@SparkQA

SparkQA commented Aug 30, 2020

Test build #128039 has finished for PR 29577 at commit 8ca2af7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    @@ -541,7 +541,8 @@ object Overlay {
           Spark ANSI SQL
          > SELECT _FUNC_(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4);
           Structured SQL
    -  """)
    +  """,
    +  since = "3.0.0")
Member
Is this change related to this PR? BTW, could we add tests to check that a `since` field is defined in all the expressions? I think it is important to define this field when adding a new expression.

Contributor (@tanelk)
In master there are currently 67 expressions without the `since` tag:

  test("Since has a valid value") {
    val badExpressions = spark.sessionState.functionRegistry.listFunction()
      .map(spark.sessionState.catalog.lookupFunctionInfo)
      .filter(funcInfo => !funcInfo.getSince.matches("[0-9]+\\.[0-9]+\\.[0-9]+"))
      .map(_.getClassName)
      .distinct
      .sorted

    if (badExpressions.nonEmpty) {
      fail(s"${badExpressions.length} expressions with invalid 'since':\n"
        + badExpressions.mkString("\n"))
    }
  }
[info] - Since has a valid value *** FAILED *** (16 milliseconds)
[info]   67 expressions with invalid 'since':
[info]   org.apache.spark.sql.catalyst.expressions.Abs
[info]   org.apache.spark.sql.catalyst.expressions.Add
[info]   org.apache.spark.sql.catalyst.expressions.And
[info]   org.apache.spark.sql.catalyst.expressions.ArrayContains
[info]   org.apache.spark.sql.catalyst.expressions.AssertTrue
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseAnd
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseNot
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseOr
[info]   org.apache.spark.sql.catalyst.expressions.BitwiseXor
[info]   org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
[info]   org.apache.spark.sql.catalyst.expressions.CaseWhen
[info]   org.apache.spark.sql.catalyst.expressions.Cast
[info]   org.apache.spark.sql.catalyst.expressions.Concat
[info]   org.apache.spark.sql.catalyst.expressions.Crc32
[info]   org.apache.spark.sql.catalyst.expressions.CreateArray
[info]   org.apache.spark.sql.catalyst.expressions.CreateMap
[info]   org.apache.spark.sql.catalyst.expressions.CreateNamedStruct
[info]   org.apache.spark.sql.catalyst.expressions.CurrentDatabase
[info]   org.apache.spark.sql.catalyst.expressions.Divide
[info]   org.apache.spark.sql.catalyst.expressions.EqualNullSafe
[info]   org.apache.spark.sql.catalyst.expressions.EqualTo
[info]   org.apache.spark.sql.catalyst.expressions.Explode
[info]   org.apache.spark.sql.catalyst.expressions.GetJsonObject
[info]   org.apache.spark.sql.catalyst.expressions.GreaterThan
[info]   org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual
[info]   org.apache.spark.sql.catalyst.expressions.Greatest
[info]   org.apache.spark.sql.catalyst.expressions.If
[info]   org.apache.spark.sql.catalyst.expressions.In
[info]   org.apache.spark.sql.catalyst.expressions.Inline
[info]   org.apache.spark.sql.catalyst.expressions.InputFileBlockLength
[info]   org.apache.spark.sql.catalyst.expressions.InputFileBlockStart
[info]   org.apache.spark.sql.catalyst.expressions.InputFileName
[info]   org.apache.spark.sql.catalyst.expressions.JsonTuple
[info]   org.apache.spark.sql.catalyst.expressions.Least
[info]   org.apache.spark.sql.catalyst.expressions.LessThan
[info]   org.apache.spark.sql.catalyst.expressions.LessThanOrEqual
[info]   org.apache.spark.sql.catalyst.expressions.MapKeys
[info]   org.apache.spark.sql.catalyst.expressions.MapValues
[info]   org.apache.spark.sql.catalyst.expressions.Md5
[info]   org.apache.spark.sql.catalyst.expressions.MonotonicallyIncreasingID
[info]   org.apache.spark.sql.catalyst.expressions.Multiply
[info]   org.apache.spark.sql.catalyst.expressions.Murmur3Hash
[info]   org.apache.spark.sql.catalyst.expressions.Not
[info]   org.apache.spark.sql.catalyst.expressions.Or
[info]   org.apache.spark.sql.catalyst.expressions.Overlay
[info]   org.apache.spark.sql.catalyst.expressions.Pmod
[info]   org.apache.spark.sql.catalyst.expressions.PosExplode
[info]   org.apache.spark.sql.catalyst.expressions.Remainder
[info]   org.apache.spark.sql.catalyst.expressions.Sha1
[info]   org.apache.spark.sql.catalyst.expressions.Sha2
[info]   org.apache.spark.sql.catalyst.expressions.Size
[info]   org.apache.spark.sql.catalyst.expressions.SortArray
[info]   org.apache.spark.sql.catalyst.expressions.SparkPartitionID
[info]   org.apache.spark.sql.catalyst.expressions.Stack
[info]   org.apache.spark.sql.catalyst.expressions.Subtract
[info]   org.apache.spark.sql.catalyst.expressions.TimeWindow
[info]   org.apache.spark.sql.catalyst.expressions.UnaryMinus
[info]   org.apache.spark.sql.catalyst.expressions.UnaryPositive
[info]   org.apache.spark.sql.catalyst.expressions.Uuid
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathBoolean
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathDouble
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathFloat
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathInt
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathList
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathLong
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathShort
[info]   org.apache.spark.sql.catalyst.expressions.xml.XPathString (ExpressionInfoSuite.scala:204)

Member
Thanks for the check, @tanelk. Ah, I see. I think it's better to add a `since` tag for the expressions above, plus tests, in a separate PR.

Member Author

thanks a bunch for the check~

Member
@tanelk Are you working on that? Could you file a JIRA for it?

@HyukjinKwon (Member)

Merged to master.
