[SPARK-45232][SQL][DOCS] Add missing function groups to SQL references #43011
@@ -34,6 +34,8 @@
"math_funcs", "conditional_funcs", "generator_funcs",
"predicate_funcs", "string_funcs", "misc_funcs",
"bitwise_funcs", "conversion_funcs", "csv_funcs",
"xml_funcs", "lambda_funcs", "collection_funcs",
"url_funcs", "hash_funcs", "struct_funcs",
Check against spark/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java, lines 43 to 48 in 37ab190:
private static final Set<String> validGroups =
  new HashSet<>(Arrays.asList("agg_funcs", "array_funcs", "binary_funcs", "bitwise_funcs",
    "collection_funcs", "predicate_funcs", "conditional_funcs", "conversion_funcs",
    "csv_funcs", "datetime_funcs", "generator_funcs", "hash_funcs", "json_funcs",
    "lambda_funcs", "map_funcs", "math_funcs", "misc_funcs", "string_funcs", "struct_funcs",
    "window_funcs", "xml_funcs", "table_funcs", "url_funcs"));
Two differences:
1, table_funcs: not supported in gen-sql-functions-docs.py;
2, binary_funcs: I cannot find any function using this group.
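The comparison above can be reproduced with a small script. This is only a sketch: extract_groups and undocumented_groups are hypothetical helper names, not part of gen-sql-functions-docs.py, and the six groups marked as assumed below are not visible in the diff hunk; they are guessed to sit on the unchanged lines above it.

```python
import re

# Java snippet quoted in the review comment
# (ExpressionInfo.java, lines 43 to 48 in 37ab190).
JAVA_SNIPPET = '''
private static final Set<String> validGroups =
  new HashSet<>(Arrays.asList("agg_funcs", "array_funcs", "binary_funcs", "bitwise_funcs",
    "collection_funcs", "predicate_funcs", "conditional_funcs", "conversion_funcs",
    "csv_funcs", "datetime_funcs", "generator_funcs", "hash_funcs", "json_funcs",
    "lambda_funcs", "map_funcs", "math_funcs", "misc_funcs", "string_funcs", "struct_funcs",
    "window_funcs", "xml_funcs", "table_funcs", "url_funcs"));
'''

DOCUMENTED_GROUPS = {
    # ASSUMPTION: these six are not shown in the diff hunk; they are
    # presumed to be listed on the unchanged lines above it.
    "agg_funcs", "array_funcs", "datetime_funcs",
    "json_funcs", "map_funcs", "window_funcs",
    # Visible in the diff hunk of this PR.
    "math_funcs", "conditional_funcs", "generator_funcs",
    "predicate_funcs", "string_funcs", "misc_funcs",
    "bitwise_funcs", "conversion_funcs", "csv_funcs",
    "xml_funcs", "lambda_funcs", "collection_funcs",
    "url_funcs", "hash_funcs", "struct_funcs",
}


def extract_groups(java_source):
    """Collect every quoted '*_funcs' literal from the Java source text."""
    return set(re.findall(r'"(\w+_funcs)"', java_source))


def undocumented_groups(java_source, documented):
    """Return the valid groups that the docs list does not cover, sorted."""
    return sorted(extract_groups(java_source) - set(documented))


print(undocumented_groups(JAVA_SNIPPET, DOCUMENTED_GROUPS))
# ['binary_funcs', 'table_funcs']
```

Under those assumptions the only groups left uncovered are the two noted in the comment.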
QQ: For generator_funcs, do we have documentation for them when used in the FROM clause of a query? Functions like explode are typically considered table-valued generator functions.
@allisonwang-db I am not sure; I don't see a document for the FROM clause. You may check 3 places:
cc @srielau
We can check the documents built in the GA of this PR (https://github.com/zhengruifeng/spark/actions/runs/6249096629); however, it expires after one day.
What is the purpose of "lambda function"? All others are type-specific or "functionality"-specific.
lambda functions were already exposed to end users (e.g. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.array_sort.html#pyspark.sql.functions.array_sort). I think if we document other functions here, it is better to add lambda functions as well.
I think this could be an example: when a user tries to sort an array of structs by a specific order, they may refer to the document of
Just to be clear, this is automatic documentation based on the current documentation. If the grouping is wrong, or to be fixed, we should fix
@HyukjinKwon this page is not built from spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala Line 1313 in 6b747ab
yeah, I mean individual
If I try to find a function that sorts arrays, I will look for it under collection functions.
Got it.
e148681 to 91f66d0
How about having our cake and eating it too? Can a function be in more than one group?
Probably we can. I will try to map
I think putting a function in more than one group would be much more complex.
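For illustration only, one way a docs generator could list a function under more than one group is to keep a primary group per function plus an optional table of extra groups. Everything here is hypothetical: group_functions, FUNCTIONS and EXTRA_GROUPS are made-up names, the group assignments are illustrative rather than taken from the Spark source, and this is not how gen-sql-functions-docs.py works.

```python
from collections import defaultdict

# Hypothetical records: (function name, primary group).
# The assignments are illustrative, not copied from Spark.
FUNCTIONS = [
    ("array_sort", "collection_funcs"),
    ("transform", "lambda_funcs"),
    ("explode", "generator_funcs"),
]

# Hypothetical override table: extra groups a function should also appear in.
EXTRA_GROUPS = {"array_sort": ["lambda_funcs"]}


def group_functions(functions, extra_groups):
    """Map each group to the sorted names of the functions listed under it."""
    grouped = defaultdict(list)
    for name, primary in functions:
        # A function is listed under its primary group and any extra groups.
        for group in [primary, *extra_groups.get(name, [])]:
            if name not in grouped[group]:
                grouped[group].append(name)
    return {group: sorted(names) for group, names in grouped.items()}


for group, names in sorted(group_functions(FUNCTIONS, EXTRA_GROUPS).items()):
    print(group, names)
```

With this shape, array_sort shows up under both the collection and the lambda listing, at the cost of maintaining the override table by hand.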
91f66d0 to 55866b5
@srielau I have put
Thanks @srielau @allisonwang-db @HyukjinKwon, merged to master.
What changes were proposed in this pull request?
Add missing function groups to SQL references:
Note that this PR doesn't fix table_funcs:
1, gen-sql-functions-docs.py doesn't work properly with TableFunctionRegistry; I took a cursory look but failed to fix it;
2, table functions except range (e.g. explode) were already contained in Generator Functions, so I am not sure we need to show them twice.
Why are the changes needed?
When referring to the SQL references, I find many functions are missing: https://spark.apache.org/docs/latest/sql-ref-functions.html
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Manually checked
Was this patch authored or co-authored using generative AI tooling?
No