[SPARK-32552][SQL][DOCS]Complete the documentation for Table-valued Function #29355
```sql
function_name ( expression [ , ... ] ) [ table_alias ]
```
`[ table_alias ]` doesn't work with the 2nd type of TVF. A syntax section that only has `function_name ( expression [ , ... ] )` doesn't seem to add much value, so I deleted this section.
cc @maropu
Test build #127069 has finished for PR 29355 at commit
Thanks! I'll check it in a few hours.
Ah, could you file a new JIRA ticket just in case?
Here is the link for Hive Built-in Table-Generating Functions (UDTF). Should we also include `json_tuple` and `parse_url`?
|   1|   2|
|   3|null|
+----+----+
nit: remove the blank.
@@ -49,6 +31,13 @@ function_name ( expression [ , ... ] ) [ table_alias ]
|**range** ( *start, end* )|Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with step value 1.|
|**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value.|
|**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
|**explode** ( *expr* )|Expression|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
How about making a new table for these new entries?
|   1|   2|
|   3|null|
+----+----+

```

### Related Statements
How about adding a link to `sql-ref-syntax-qry-select-lateral-view.md`?
@@ -98,6 +87,39 @@ SELECT * FROM range(5, 8) AS test;
|  6|
|  7|
+---+

SELECT explode(array(10, 20));
How about adding examples with LATERAL VIEW?
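For illustration, here is a minimal sketch of what such a pair of examples could look like. The names `src`, `arr`, `exploded`, and `elem` are made up for this sketch and do not come from the PR; the statements are meant as a starting point rather than the final doc text.

```sql
-- Generator used directly in the SELECT list.
-- The output column takes the default name `col` for array input.
SELECT explode(array(10, 20));

-- The same generator used through LATERAL VIEW.
-- `src`, `arr`, `exploded`, and `elem` are hypothetical names for this sketch.
-- Each row of `src` is joined with the rows explode produces from its array.
SELECT src.id, exploded.elem
FROM (SELECT 1 AS id, array(10, 20) AS arr) AS src
LATERAL VIEW explode(src.arr) exploded AS elem;
```

The second form is the one to reach for when the exploded values need to sit next to other columns of the input row.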
Specifies a temporary name with an optional column name list.

**Syntax:** `[ AS ] table_name [ ( column_name [ , ... ] ) ]`
A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL: 1) A TVF that can be specified in a FROM clause, e.g. range; 2) A TVF that can be specified in a SELECT clause, e.g. explode.
nit: `A TVF` -> `a TVF`
nit: `a SELECT clause` -> `SELECT/LATERAL VIEW clauses`?
@@ -49,6 +31,13 @@ function_name ( expression [ , ... ] ) [ table_alias ]
|**range** ( *start, end* )|Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with step value 1.|
|**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value.|
|**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
|**explode** ( *expr* )|Expression|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
`Expression` -> `array or map`?
yea, I think we should.
@maropu Thanks for your review! I have addressed the comments. Could you please take another look?
Test build #127161 has finished for PR 29355 at commit
Thanks for the update, @huaxingao! Yea, I'll check it later (in hours).

**Syntax:** `[ AS ] table_name [ ( column_name [ , ... ] ) ]`
A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL:
1. a TVF that can be specified in a FROM clause, e.g. range;
How about simply saying 1. A function that can be ...
```

### Related Statements

* [SELECT](sql-ref-syntax-qry-select.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
Could you add a link in the LATERAL VIEW side to this page?
https://github.com/apache/spark/blame/master/docs/sql-ref-syntax-qry-select-lateral-view.md#L114
|Function|Argument Type(s)|Description|
|--------|----------------|-----------|
|**range** ( *end* )|Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from 0 to *end* (exclusive) with step value 1.|
|**range** ( *start, end* )|Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with step value 1.|
|**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value.|
|**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
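
As a quick illustration of the four signatures in the table above, the following sketch shows one call per row. The argument values are arbitrary choices for this example; the comments follow from the "exclusive end" and "step" wording in the descriptions.

```sql
-- One call per range signature; argument values are arbitrary for illustration.
SELECT * FROM range(3);            -- id: 0, 1, 2
SELECT * FROM range(5, 8);         -- id: 5, 6, 7
SELECT * FROM range(0, 10, 3);     -- id: 0, 3, 6, 9
SELECT * FROM range(0, 10, 3, 2);  -- same ids as above, generated with 2 partitions
```

Because `range` appears in the FROM clause, it can also take a table alias, as in the `SELECT * FROM range(5, 8) AS test;` example already in the page.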

#### TVFs that can be specified in SELECT/LATERAL VIEW clauses:
Could you add a link to this section in https://github.com/apache/spark/blame/master/docs/sql-ref-syntax-qry-select-lateral-view.md#L40 ?
|--------|----------------|-----------|
|**explode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
|**explode_outer** <br> ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
|**inline** ( *expr* )|Expression|Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.|
I think `Expression` is not a type, so we should put a suitable type name here, struct? https://github.com/apache/spark/blame/master/docs/sql-ref-datatypes.md#L59
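
To make the argument type concrete, here is a small sketch of `inline` being called with an array of structs, which is what the row description says it explodes. The literal values are chosen only for this example.

```sql
-- inline expects an array of structs and emits one row per struct,
-- with one output column per struct field (default names col1, col2, ...).
SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
```

Under the defaults described in the table row, this yields two rows with columns `col1` and `col2`.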

|Function|Argument Type(s)|Description|
|--------|----------------|-----------|
|**explode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
Btw, did you copy & paste it from `ExpressionDescription`? To be honest, it would be best to generate this table from it automatically, though...
Maybe delete this page and add table-valued functions under scalar functions in the `sql-ref-functions` page?
Ah, I see. The suggestion sounds reasonable. Could you try it? cc: @gatorsmile
@maropu I tried to generate the table for TVFs, but I have two questions:
1. `range` is not in the `expressions` package, so I can't generate a table for it, right? I think I will still keep this page and keep the current `range` table, but generate the table for `explode`.
2. `json_tuple` also belongs to `JSONFunctions`. Is there a way to put this function in two groups?
Ah, I see. Thanks for the check. I think it is difficult to automatically generate it now. So, it's okay as it is.
retest this please
…Function

### What changes were proposed in this pull request?
There are two types of TVF. We only documented one type. Adding the doc for the 2nd type.

### Why are the changes needed?
complete Table-valued Function doc

### Does this PR introduce _any_ user-facing change?
<img width="1099" alt="Screen Shot 2020-08-06 at 5 30 25 PM" src="https://user-images.githubusercontent.com/13592258/89595926-c5eae680-d80a-11ea-918b-0c3646f9930e.png">
<img width="1100" alt="Screen Shot 2020-08-06 at 5 30 49 PM" src="https://user-images.githubusercontent.com/13592258/89595929-c84d4080-d80a-11ea-9803-30eb502ccd05.png">
<img width="1101" alt="Screen Shot 2020-08-06 at 5 31 19 PM" src="https://user-images.githubusercontent.com/13592258/89595931-ca170400-d80a-11ea-8812-2f009746edac.png">
<img width="1100" alt="Screen Shot 2020-08-06 at 5 31 40 PM" src="https://user-images.githubusercontent.com/13592258/89595934-cb483100-d80a-11ea-9e18-9357aa9f2c5c.png">

### How was this patch tested?
Manually build and check

Closes #29355 from huaxingao/tvf.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit db74fd0)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Thanks, @huaxingao! Merged to master/branch-3.0. cc: @gatorsmile
Test build #127822 has finished for PR 29355 at commit
Thanks a lot! @maropu
What changes were proposed in this pull request?
There are two types of TVF. We only documented one type. Adding the doc for the 2nd type.
Why are the changes needed?
complete Table-valued Function doc
Does this PR introduce any user-facing change?
How was this patch tested?
Manually build and check