Skip to content

[SPARK-31110][DOCS][SQL] refine sql doc for SELECT#27866

Closed
cloud-fan wants to merge 4 commits intoapache:masterfrom
cloud-fan:doc
Closed

[SPARK-31110][DOCS][SQL] refine sql doc for SELECT#27866
cloud-fan wants to merge 4 commits intoapache:masterfrom
cloud-fan:doc

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Mar 10, 2020

What changes were proposed in this pull request?

A few improvements to the sql ref SELECT doc:

  1. correct the syntax of SELECT query
  2. correct the default of null sort order
  3. correct the GROUP BY syntax
  4. several minor fixes

Why are the changes needed?

refine document

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

N/A

(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My editor removes trailing spaces automatically

{% highlight sql %}
GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [ , ... ] ]
[ ( WITH ROLLUP | WITH CUBE | GROUPING SETS grouping_sets ) ) ]
[ ( WITH ROLLUP | WITH CUBE | GROUPING SETS grouping_sets ) ]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the extra )

<code> ((warehouse, product), (warehouse), ())</code>.
Specifies multiple levels of aggregations in a single statement. This clause is used to compute aggregations
based on multiple grouping sets. <code>ROLLUP</code> is a shorthand for <code>GROUPING SETS</code>. For example,
<code>GROUP BY warehouse, product WITH ROLLUP</code> is equivalent to <code>GROUP BY GROUPING SETS
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GROUP BY GROUPING SETS ... is the same as GROUP BY a, b GROUPING SETS but shorter.

sort order is <code>DESC</code>.<br><br>
Optionally specifies whether NULL values are returned before/after non-NULL values. If
<code>null_sort_order</code> is not specified, then NULLs sort first if sort order is
<code>ASC</code> and NULLS sort last if sort order is <code>DESC</code>.<br><br>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to mention In Spark, NULL values are considered to be lower than any non-NULL values, as it's not related to the user-facing behavior.

<code>ASC</code> and NULLS sort last if sort order is <code>DESC</code>.<br><br>
<ol>
<li> If <code>NULLS FIRST</code> (the default) is specified, then NULL values are returned first
<li> If <code>NULLS FIRST</code> is specified, then NULL values are returned first
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default value depends on the sort order.

{% highlight sql %}
[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } select_statement, ... ]
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to our g4 file, a query can have multiple SELECT but only one ORDER BY/SORT BY/...

@cloud-fan
Copy link
Contributor Author

@SparkQA
Copy link

SparkQA commented Mar 10, 2020

Test build #119622 has finished for PR 27866 at commit 7ba555e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

### Syntax
{% highlight sql %}
[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } select_statement, ... ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ ALL | DISTINCT ] seems to be missed while transition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @cloud-fan . They are not the same. I mean UNION ALL, not SELECT ALL.

scala> sql("select 1 union all (select all 1)").show
+---+
|  1|
+---+
|  1|
|  1|
+---+

[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } select_statement, ... ]
[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
Copy link
Member

@dongjoon-hyun dongjoon-hyun Mar 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- SORT  BY
+ SORT BY

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (with two minor comments.)
Thank you, @cloud-fan .

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-31110][SQL] refine sql doc for SELECT [SPARK-31110][DOCS][SQL] refine sql doc for SELECT Mar 10, 2020
|Shone S| 16|
+-------+---+

-- A non-foldable expression as an input to limit is not allowed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: limit -> LIMIT?

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change can make the docs cleaner! LGTM except the existing comments.

<b>Syntax:</b>
<code>
( () | ( expression [ , ... ] ) )
( '()' | ( expression [ , ... ] ) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax here indicates one group set or more? In above the syntax is GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [ , ... ] ].

For example, group_expression here is for just one group expression.

@SparkQA
Copy link

SparkQA commented Mar 11, 2020

Test build #119646 has finished for PR 27866 at commit d40afe2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


### Syntax
{% highlight sql %}
GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression [ , ... ] ]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya can you help review this file? thanks!

Copy link
Member

@viirya viirya Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late. This change LGTM.

@SparkQA
Copy link

SparkQA commented Mar 11, 2020

Test build #119663 has finished for PR 27866 at commit 987034e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

@cloud-fan . Please don't forget the following

@SparkQA
Copy link

SparkQA commented Mar 11, 2020

Test build #119678 has finished for PR 27866 at commit 5742080.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master/3.0. Thank you all!

dongjoon-hyun pushed a commit that referenced this pull request Mar 11, 2020
### What changes were proposed in this pull request?

A few improvements to the sql ref SELECT doc:
1. correct the syntax of SELECT query
2. correct the default of null sort order
3. correct the GROUP BY syntax
4. several minor fixes

### Why are the changes needed?

refine document

### Does this PR introduce any user-facing change?

N/A

### How was this patch tested?

N/A

Closes #27866 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0f0ccda)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

A few improvements to the sql ref SELECT doc:
1. correct the syntax of SELECT query
2. correct the default of null sort order
3. correct the GROUP BY syntax
4. several minor fixes

### Why are the changes needed?

refine document

### Does this PR introduce any user-facing change?

N/A

### How was this patch tested?

N/A

Closes apache#27866 from cloud-fan/doc.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments