Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list #30888

Closed
wants to merge 3 commits into from
Closed

[SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list #30888

wants to merge 3 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Dec 22, 2020

We support a column list of INSERT for Spark v3.1.0 (See: SPARK-32976 (#29893)). So, this PR targets at documenting it in the SQL documents.

What changes were proposed in this pull request?

improve doc

Why are the changes needed?

Does this PR introduce any user-facing change?

doc

How was this patch tested?

passing GA doc gen.

image
image

image
image

@yaooqinn
Copy link
Member Author

cc @maropu @HyukjinKwon @gatorsmile @cloud-fan thanks for reviewing, and thanks @maropu for the JIRA ticket for kindly reminding

@github-actions github-actions bot added the DOCS label Dec 22, 2020
@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Test build #133215 has finished for PR 30888 at commit bb7f271.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37813/

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating it, @yaooqinn !

@@ -26,7 +26,7 @@ The `INSERT INTO` statement inserts new rows into a table. The inserted rows can
### Syntax

```sql
INSERT INTO [ TABLE ] table_identifier [ partition_spec ]
INSERT INTO [ TABLE ] table_identifier [ partition_spec ] [ column_list ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ column_list ] -> [ ( column_list ) ]

@@ -45,6 +45,13 @@ INSERT INTO [ TABLE ] table_identifier [ partition_spec ]

**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

* **column_list**

An optional parameter that specifies a comma separated list of columns belong to the `table_identifier`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  • comma separated -> comma-separated
  • belong -> belonging
  • the table_identifier -> the table_identifier table

An optional parameter that specifies a comma separated list of columns belong to the `table_identifier`.
All specified columns should exist in the `table_identifier` and not be duplicated from each other. It includes all columns except the static partition columns.
The size of the column list should be exactly the size of the data from `VALUES` clause or query.
The order of the column list is alterable and determines how the data from `VALUES` clause or query to be inserted by position.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about organizing it like this (it seems some statements above describe the current limitations):

  * **colunn_list**
  <param description>
 
  **Note**
  The current behaviour has some limitations:
  1. The column list should contain all the column names in the `table_identifier` table.
  2. ...

(ref: https://github.com/apache/spark/blame/master/docs/sql-ref-syntax-qry-select-having.md#L43-L48)

+------------+----------------------+----------+
|Kent Yao Jr.|Hangzhou, China | 11215017|
+------------+----------------------+----------+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this blank.

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37813/

@yaooqinn
Copy link
Member Author

comments addressed thank you @maropu

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Test build #133218 has finished for PR 30888 at commit 5d22049.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37816/

@SparkQA
Copy link

SparkQA commented Dec 22, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37816/


```sql
INSERT INTO students PARTITION (student_id = 11215017) (address, name) VALUES
('Hangzhou, China', 'Kent Yao Jr.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 Kent Yao Jr.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu
Copy link
Member

maropu commented Dec 22, 2020

For reviews, could you put the screenshot of the updated doc in the PR description?

dongjoon-hyun pushed a commit that referenced this pull request Dec 23, 2020
We support a column list of INSERT for Spark v3.1.0 (See: SPARK-32976 (#29893)). So, this PR targets at documenting it in the SQL documents.

### What changes were proposed in this pull request?

improve doc
### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

doc
### How was this patch tested?

passing GA doc gen.

![image](https://user-images.githubusercontent.com/8326978/102954876-8994fa00-450f-11eb-81f9-931af6d1f69b.png)
![image](https://user-images.githubusercontent.com/8326978/102954900-99acd980-450f-11eb-9733-115ad37d2319.png)

![image](https://user-images.githubusercontent.com/8326978/102954935-af220380-450f-11eb-9aaa-fdae0725d41e.png)
![image](https://user-images.githubusercontent.com/8326978/102954949-bc3ef280-450f-11eb-8a0d-d7b688efa7bb.png)

Closes #30888 from yaooqinn/SPARK-33877.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit a3dd8da)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Copy link
Member

Merged to master/3.1. Thank you, @yaooqinn and @maropu !

@SparkQA
Copy link

SparkQA commented Dec 23, 2020

Test build #133258 has finished for PR 30888 at commit 8249fc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37856/

@SparkQA
Copy link

SparkQA commented Dec 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37856/

**Note:**The current behaviour has some limitations:
- All specified columns should exist in the table and not be duplicated from each other. It includes all columns except the static partition columns.
- The size of the column list should be exactly the size of the data from `VALUES` clause or query.
- The order of the column list is alterable and determines how the data from `VALUES` clause or query to be inserted by position.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a limitation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, It more sounds just like describing its behaviour instead of a limitation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about removing: The current behavior has some limitations:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the last point to the description?

An optional parameter that specifies .... Spark will reorder the columns of the input query to match the table schema according to the specified column list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made #30909

dongjoon-hyun pushed a commit that referenced this pull request Dec 23, 2020
…column list

### What changes were proposed in this pull request?

followup of a3dd8da via suggestion #30888 (comment)
### Why are the changes needed?

doc improvement
### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing GA doc

Closes #30909 from yaooqinn/SPARK-33877-F.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Dec 23, 2020
…column list

### What changes were proposed in this pull request?

followup of a3dd8da via suggestion #30888 (comment)
### Why are the changes needed?

doc improvement
### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing GA doc

Closes #30909 from yaooqinn/SPARK-33877-F.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 368a2c3)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021
We support a column list of INSERT for Spark v3.1.0 (See: SPARK-32976 (apache#29893)). So, this PR targets at documenting it in the SQL documents.

### What changes were proposed in this pull request?

improve doc
### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

doc
### How was this patch tested?

passing GA doc gen.

![image](https://user-images.githubusercontent.com/8326978/102954876-8994fa00-450f-11eb-81f9-931af6d1f69b.png)
![image](https://user-images.githubusercontent.com/8326978/102954900-99acd980-450f-11eb-9733-115ad37d2319.png)

![image](https://user-images.githubusercontent.com/8326978/102954935-af220380-450f-11eb-9aaa-fdae0725d41e.png)
![image](https://user-images.githubusercontent.com/8326978/102954949-bc3ef280-450f-11eb-8a0d-d7b688efa7bb.png)

Closes apache#30888 from yaooqinn/SPARK-33877.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021
…column list

### What changes were proposed in this pull request?

followup of apache@a3dd8da via suggestion apache#30888 (comment)
### Why are the changes needed?

doc improvement
### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

passing GA doc

Closes apache#30909 from yaooqinn/SPARK-33877-F.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants