[SPARK-28801][DOC] Document SELECT statement in SQL Reference (Main page) #27216
Test build #116770 has finished for PR 27216 at commit
limitations under the License.
---

**This page is under construction**
Is there value in adding these placeholders, vs just adding/linking them when available?
@srowen Thanks. Actually, that's the approach we followed when we started this work a few months back. I guess we didn't want to have a broken link at any point. Also, it makes it easier for others to contribute without having to rebase?
Yeah I know, I'm not sure that was great as not all are filled out. Unless it would be hard to link them everywhere later, I wonder why they need to be here now? I don't see a difference w.r.t. merging. I don't feel strongly but think we are just going to end up with more dummy pages that nobody fills out.
@srowen OK. Let me remove the links and push.
docs/sql-ref-syntax-qry-select.md
Outdated
### Related clauses
- [FROM clause](sql-ref-syntax-qry-select-from.html)
- [WHERE clause](sql-ref-syntax-qry-select-where.html)
@srowen Are you okay with keeping these, or do we want to remove the Related clauses section and add links as we go? Please let me know.
docs/sql-ref-syntax-qry-select.md
Outdated
<dd>
Hints can be specified to help spark optimizer make better planning decisions. Currently spark supports hints
that influence selection of join strategies and repartitioning of the data. For a detailed explanation, please
refer to.
refer to where?
@maropu Thanks a lot for reviewing and catching this. I am going to remove that sentence for now. I wanted to refer to the Hints page from here. I will add it when I have it ready.
ok
</dd>
<dt><code><em>ORDER BY</em></code></dt>
<dd>
Specifies an ordering of the rows of the complete result set of the query. The output rows are ordered
How about writing the default behaviour here (e.g., direction and null order)?
@maropu I have a page written up for ORDER BY where I have explained the sort direction and null order in detail with examples. On this page, I wanted to just briefly introduce the parameters. What do you think?
ok
docs/sql-ref-syntax-qry-select.md
Outdated
@@ -18,8 +18,132 @@ license: |
See the License for the specific language governing permissions and
limitations under the License.
---
Spark supports `SELECT` statement and conforms to ANSI SQL standard. Queries are
SELECT statement => a SELECT statement?
ANSI SQL standard -> the ANSI SQL standard?
docs/sql-ref-syntax-qry-select.md
Outdated
@@ -18,8 +18,132 @@ license: |
See the License for the specific language governing permissions and
limitations under the License.
---
Spark supports `SELECT` statement and conforms to ANSI SQL standard. Queries are
used to retrieve result sets from one or more table. The following section
more table -> more tables?
docs/sql-ref-syntax-qry-select.md
Outdated
</dd>
<dt><code><em>boolean_expression</em></code></dt>
<dd>
Specifies a expression with a return type of boolean.
a expression -> an expression
docs/sql-ref-syntax-qry-select.md
Outdated
<dt><code><em>LIMIT</em></code></dt>
<dd>
Specifies the maximum number of rows that can be returned by a statement or subquery. This clause
is mostly used in the conjunction with <code>ORDER BY</code> to produce deterministic result.
deterministic result -> a deterministic result?
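For readers following this thread, the point about pairing `LIMIT` with `ORDER BY` can be sketched in a few lines. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in engine (an assumption made so the example runs anywhere; it is not Spark), and the `scores` table and its rows are hypothetical.

```python
import sqlite3

# Stand-in engine: SQLite from the standard library, not Spark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("carol", 70), ("alice", 90), ("bob", 80), ("dave", 60)],
)

# Without ORDER BY, which two rows LIMIT keeps is left to the engine.
# With ORDER BY, the two rows returned are well defined.
top_two = conn.execute(
    "SELECT name, points FROM scores ORDER BY points DESC LIMIT 2"
).fetchall()
print(top_two)
```

The same query text should carry over to Spark SQL, since it only relies on `ORDER BY ... DESC` and `LIMIT`, both of which appear in the page under review.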
docs/sql-ref-syntax-qry-select.md
Outdated
<dt><code><em>SORT BY</em></code></dt>
<dd>
Specifies an ordering by which the rows are ordered within each partition. This parameter is mutually
exclusive with <code>ORDER BY</code>, <code>CLUSTER BY</code> and can not be specified together.
exclusive with <code>ORDER BY</code>, <code>CLUSTER BY</code> and
=> exclusive with <code>ORDER BY</code> or <code>CLUSTER BY</code>, and?
@maropu Yeah, sounds good. Perhaps say "and" as opposed to "or"?
Yeah, it looks fine to me (but I'm not a good English writer, so better to follow the others, hahaha).
Force-pushed the branch from de627aa to bd4c0ae.
Test build #116903 has finished for PR 27216 at commit
Specifies a set of expressions that is used to repartition and sort the rows. Using this clause has
the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT BY</code> together.
</dd>
<dt><code><em>DISTRIBUTE BY</em></code></dt>
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy I think we also need a dedicated file for these clauses [SORT BY, CLUSTER BY and DISTRIBUTE BY].
These three clauses are very special. They come from Hive. Could we have a simple SELECT and then a full SELECT?
@gatorsmile The way I have it at the moment is to have separate links for each of the clauses, each with its own syntax, parameters and examples. So yes, I will have separate links for the clauses you have mentioned. What do you think?
Could you file jiras (sub-tickets) for planned dedicated files having these syntaxes?
@maropu Sure.
Thanks!
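As background for the planned dedicated pages, the relationship the quoted text describes (`CLUSTER BY` behaving like `DISTRIBUTE BY` plus `SORT BY`) can be modelled with a toy sketch in plain Python. This is not Spark's implementation: the function names are hypothetical, and hash partitioning on the key is an assumption of the sketch.

```python
from collections import defaultdict


def distribute_by(rows, key, num_partitions):
    """Route each row to a partition chosen from its key (toy hash partitioning)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return [partitions[i] for i in range(num_partitions)]


def sort_by(partitions, key):
    """Sort rows inside each partition only; nothing is ordered across partitions."""
    return [sorted(p, key=lambda r: r[key]) for p in partitions]


def cluster_by(rows, key, num_partitions):
    """Modelled here as distribute_by followed by sort_by on the same key."""
    return sort_by(distribute_by(rows, key, num_partitions), key)


rows = [{"age": a} for a in (30, 18, 25, 18, 41, 25)]
clustered = cluster_by(rows, "age", num_partitions=2)
for i, part in enumerate(clustered):
    print(i, [r["age"] for r in part])
```

In this model, rows with equal keys always land in the same partition and each partition is sorted on its own, which is why the result differs from a single global `ORDER BY`.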
Also, can you update the screenshot in the PR description? That looks stale.
docs/sql-ref-syntax-qry-select.md
Outdated
<dd>
Specifies the common table expressions (CTEs) before the main <code>SELECT</code> query block.
These table expressions are allowed to be referenced later in the main query. This is useful to abstract
out repeated sub query blocks in the main query and improves readability of the query.
I guess it is OK to use either "sub query" or "subquery", but it might be better to pick one and keep consistent.
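To make the quoted `WITH` description concrete, here is a minimal sketch of a CTE that is defined once and referenced twice in the main query. It runs against SQLite through Python's `sqlite3` module purely as a stand-in (an assumption; it is not Spark), and the `orders` table, the `totals` CTE name and the data are hypothetical.

```python
import sqlite3

# Stand-in engine: SQLite from the standard library, not Spark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10), ("bob", 5), ("alice", 20), ("bob", 1)],
)

# The CTE `totals` replaces a subquery that would otherwise be written twice.
query = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM totals
WHERE total = (SELECT MAX(total) FROM totals)
"""
best = conn.execute(query).fetchall()
print(best)
```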
docs/sql-ref-syntax-qry-select.md
Outdated
</dd>
<dt><code><em>from_item</em></code></dt>
<dd>
Specifies a source of input for the query. It can be one of the following.
":" instead of "." at the end of the sentence.
docs/sql-ref-syntax-qry-select.md
Outdated
<dt><code><em>GROUP BY</em></code></dt>
<dd>
Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
(MIN, MAX, COUNT, SUM, AVG) to group rows bsed on the grouping expressions.
bsed: typo
docs/sql-ref-syntax-qry-select.md
Outdated
<dt><code><em>HAVING</em></code></dt>
<dd>
Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
filter rows after the grouping is performed
Add "." at the end of the sentence.
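The `GROUP BY` and `HAVING` descriptions quoted in the two threads above can be illustrated together. The sketch below again uses SQLite via Python's `sqlite3` as a stand-in engine (an assumption, not Spark); the `sales` table and its values are hypothetical.

```python
import sqlite3

# Stand-in engine: SQLite from the standard library, not Spark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("Fremont", 100),
        ("Fremont", 250),
        ("Dublin", 50),
        ("Dublin", 40),
        ("San Jose", 300),
    ],
)

# GROUP BY forms one group per city; HAVING then filters whole groups
# using the aggregate, after the grouping has been performed.
big_cities = conn.execute(
    """
    SELECT city, SUM(amount) AS total
    FROM sales
    GROUP BY city
    HAVING SUM(amount) > 100
    ORDER BY city
    """
).fetchall()
print(big_cities)
```

A `WHERE` clause could not express this filter, because it is evaluated on individual rows before any group totals exist.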
docs/sql-ref-syntax-qry-select.md
Outdated
along with usage examples when applicable.
### Syntax
{% highlight sql %}
[WITH with_query [, ...]]
I guess we should always have a space between symbol and text, and between symbol and symbol?
[ WITH with_query [ , ... ] ]
Test build #117091 has finished for PR 27216 at commit
docs/sql-ref-syntax-qry-select.md
Outdated
</dd>
<dt><code><em>named_expression</em></code></dt>
<dd>
A expression with an assigned name. In general, it denotes a column expression.<br><br>
minor: An expression
Test build #117131 has finished for PR 27216 at commit
Merged to master
What changes were proposed in this pull request?
Document the SELECT statement in the SQL Reference Guide. This PR includes the main
entry page for SELECT. I will open follow-up PRs for different clauses.
Why are the changes needed?
Currently Spark lacks documentation on the supported SQL constructs, causing
confusion among users who sometimes have to look at the code to understand the
usage. This PR aims to address that.
Does this PR introduce any user-facing change?
Yes.
Before:
There was no documentation for this.
After:
![Screen Shot 2020-01-19 at 11 20 41 PM](https://user-images.githubusercontent.com/14225158/72706257-6c42f900-3b12-11ea-821a-171ff035443f.png)
![Screen Shot 2020-01-19 at 11 21 55 PM](https://user-images.githubusercontent.com/14225158/72706313-91d00280-3b12-11ea-90e4-be7174b4593d.png)
![Screen Shot 2020-01-19 at 11 22 16 PM](https://user-images.githubusercontent.com/14225158/72706323-97c5e380-3b12-11ea-99e5-e7aaa3b4df68.png)
How was this patch tested?
Tested using jekyll build --serve