
[SPARK-32552][SQL][DOCS]Complete the documentation for Table-valued Function #29355

Closed
wants to merge 2 commits into from


huaxingao
Contributor

@huaxingao huaxingao commented Aug 4, 2020

What changes were proposed in this pull request?

There are two types of table-valued functions (TVFs), but only one type was documented. This PR adds the documentation for the second type.

Why are the changes needed?

To complete the Table-valued Function documentation.

Does this PR introduce any user-facing change?

[Four screenshots of the rendered documentation page, dated 2020-08-06]

How was this patch tested?

Manually built the docs and checked the rendered output.


```sql
function_name ( expression [ , ... ] ) [ table_alias ]
```
Contributor Author

`[ table_alias ]` doesn't work with the second type of TVF. A syntax section that contains only `function_name ( expression [ , ... ] )` doesn't seem to add much value, so I deleted this section.
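
To illustrate the distinction, here is a minimal sketch built from the two example queries that appear in this diff. It assumes a Spark SQL session, and the alias name `test` is only illustrative.

```sql
-- FROM-clause TVF: the relation can take a table alias.
-- Returns a single id column with the values 5, 6, 7 (end is exclusive).
SELECT * FROM range(5, 8) AS test;

-- SELECT-clause TVF: the function sits in the select list,
-- so, per the comment above, the [ table_alias ] part does not apply here.
SELECT explode(array(10, 20));
```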

@huaxingao
Contributor Author

cc @maropu

@SparkQA

SparkQA commented Aug 4, 2020

Test build #127069 has finished for PR 29355 at commit fda6ecf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Aug 6, 2020

Thanks! I'll check it within a few hours.

@maropu
Member

maropu commented Aug 6, 2020

Ah, could you file a new JIRA ticket just in case?

@huaxingao huaxingao changed the title [SPARK-31419][SQL][DOCS][FOLLOW-UP]Complete the documentation for Table-valued Function [SPARK-32552][SQL][DOCS]Complete the documentation for Table-valued Function Aug 6, 2020
@huaxingao
Contributor Author

Here is the link for Hive Built-in Table-Generating Functions (UDTF). Should we also include json_tuple and parse_url?
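
For reference, a hedged sketch of what calls to these two functions look like in Spark SQL. The literal inputs are made up for illustration, and whether both functions belong on this page is the open question above.

```sql
-- json_tuple extracts the requested top-level fields from a JSON string,
-- producing one output column per field (here the values 1 and 2).
SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');

-- parse_url extracts one part of a URL; 'HOST' yields spark.apache.org.
SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST');
```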

| 1| 2|
| 3|null|
+----+----+

Member

nit: remove the blank.

@@ -49,6 +31,13 @@ function_name ( expression [ , ... ] ) [ table_alias ]
|**range** ( *start, end* )|Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with step value 1.|
|**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value.|
|**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
|**explode** ( *expr* )|Expression|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
Member

How about making a new table for these new entries?

| 1| 2|
| 3|null|
+----+----+

```

### Related Statements
Member

How about adding a link to sql-ref-syntax-qry-select-lateral-view.md?

@@ -98,6 +87,39 @@ SELECT * FROM range(5, 8) AS test;
| 6|
| 7|
+---+

SELECT explode(array(10, 20));
Member

How about adding examples with LATERAL VIEW?

Specifies a temporary name with an optional column name list.

**Syntax:** `[ AS ] table_name [ ( column_name [ , ... ] ) ]`
A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL: 1) A TVF that can be specified in a FROM clause, e.g. range; 2) A TVF that can be specified in a SELECT clause, e.g. explode.
Member

nit: A TVF -> a TVF

Member

nit: a SELECT clause -> SELECT/LATERAL VIEW clauses?

@@ -49,6 +31,13 @@ function_name ( expression [ , ... ] ) [ table_alias ]
|**range** ( *start, end* )|Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with step value 1.|
|**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value.|
|**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
|**explode** ( *expr* )|Expression|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
Member

Expression -> array or map?

@maropu
Member

maropu commented Aug 6, 2020

Here is the link for Hive Built-in Table-Generating Functions (UDTF). Should we also include json_tuple and parse_url?

yea, I think we should.

@huaxingao
Contributor Author

@maropu Thanks for your review! I have addressed the comments. Could you please take another look?

@SparkQA

SparkQA commented Aug 7, 2020

Test build #127161 has finished for PR 29355 at commit 292eb02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Aug 7, 2020

@maropu Thanks for your review! I have addressed the comments. Could you please take another look?

Thanks for the update, @huaxingao! Yeah, I'll check it later (within a few hours).


**Syntax:** `[ AS ] table_name [ ( column_name [ , ... ] ) ]`
A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL:
1. a TVF that can be specified in a FROM clause, e.g. range;
Member

How about simply saying 1. A function that can be ...

```

### Related Statements

* [SELECT](sql-ref-syntax-qry-select.html)
* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
Member

|Function|Argument Type(s)|Description|
|--------|----------------|-----------|
|**range** ( *end* )|Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from 0 to *end* (exclusive) with step value 1.|
|**range** ( *start, end* )|Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with step value 1.|
|**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value.|
|**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, <br/> containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|

#### TVFs that can be specified in SELECT/LATERAL VIEW clauses:
Member

|--------|----------------|-----------|
|**explode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
|**explode_outer** <br> ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
|**inline** ( *expr* )|Expression|Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.|
Member

I think Expression is not a type, so we should put a suitable type name here, struct? https://github.com/apache/spark/blame/master/docs/sql-ref-datatypes.md#L59
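
A small sketch of the point about the argument type, assuming a Spark SQL session: `inline` is applied to an array of structs, and each struct becomes one row. The literal values are illustrative.

```sql
-- Two structs in the array produce two rows.
-- Default column names are col1 and col2, per the description in the table row above.
SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
```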


|Function|Argument Type(s)|Description|
|--------|----------------|-----------|
|**explode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
Member

By the way, did you copy and paste it from ExpressionDescription? To be honest, it would be best to generate this table automatically from it, though...

Contributor Author

Maybe delete this page and add table-valued functions under scalar functions in the sql-ref-functions page?

Member

Ah, I see. The suggestion sounds reasonable. Could you try it? cc: @gatorsmile

Contributor Author

@maropu
I tried to generate the table for TVFs, but I have two questions:

  1. range is not in the expressions package, so I can't generate a table for it, right? I think I will keep this page and the current range table, but generate the table for explode.
  2. json_tuple also belongs to JSONFunctions. Is there a way to put this function in two groups?

Member

Ah, I see. Thanks for checking. I think it is difficult to generate it automatically for now, so it's okay as it is.

@maropu
Member

maropu commented Aug 24, 2020

retest this please

@maropu maropu closed this in db74fd0 Aug 24, 2020
maropu pushed a commit that referenced this pull request Aug 24, 2020
…Function

### What changes were proposed in this pull request?
There are two types of TVF. We only documented one type. Adding the doc for the 2nd type.

### Why are the changes needed?
complete Table-valued Function doc

### Does this PR introduce _any_ user-facing change?
<img width="1099" alt="Screen Shot 2020-08-06 at 5 30 25 PM" src="https://user-images.githubusercontent.com/13592258/89595926-c5eae680-d80a-11ea-918b-0c3646f9930e.png">

<img width="1100" alt="Screen Shot 2020-08-06 at 5 30 49 PM" src="https://user-images.githubusercontent.com/13592258/89595929-c84d4080-d80a-11ea-9803-30eb502ccd05.png">

<img width="1101" alt="Screen Shot 2020-08-06 at 5 31 19 PM" src="https://user-images.githubusercontent.com/13592258/89595931-ca170400-d80a-11ea-8812-2f009746edac.png">

<img width="1100" alt="Screen Shot 2020-08-06 at 5 31 40 PM" src="https://user-images.githubusercontent.com/13592258/89595934-cb483100-d80a-11ea-9e18-9357aa9f2c5c.png">

### How was this patch tested?
Manually build and check

Closes #29355 from huaxingao/tvf.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit db74fd0)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu
Member

maropu commented Aug 24, 2020

Thanks, @huaxingao ! Merged to master/branch-3.0. cc: @gatorsmile

@SparkQA

SparkQA commented Aug 24, 2020

Test build #127822 has finished for PR 29355 at commit 292eb02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Contributor Author

Thanks a lot! @maropu
