Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31383][SQL][DOC] Clean up the SQL documents in docs/sql-ref* #28151

Closed
wants to merge 3 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Apr 8, 2020

What changes were proposed in this pull request?

This PR intends to clean up the SQL documents in doc/sql-ref*.
Main changes are as follows;

  • Fixes wrong syntaxes and capitalize sub-titles
  • Adds some DDL queries in Examples so that users can run examples there
  • Makes query output in Examples follows the Dataset.showString (right-aligned) format
  • Adds/Removes spaces, Indents, or blank lines to follow the format below;
---
license...
---

### Description

Writes what's the syntax is.

### Syntax

{% highlight sql %}
SELECT...
    WHERE... // 4 indents after the second line
    ...
{% endhighlight %}

### Parameters

<dl>
  <!-- 2 indents in this section -->
  <dt><code><em>Param Name</em></code></dt>
  <dd>
    Param Description
  </dd>
  ...
</dl>

### Examples

{% highlight sql %}
-- It is better that users are able to execute example queries here.
-- So, we prepare test data in the first section if possible.
CREATE TABLE t (key STRING, value DOUBLE);
INSERT INTO t VALUES
    ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0);

-- query output has 2 indents and it follows the `Dataset.showString`
-- format (right-aligned).
SELECT * FROM t;
  +---+-----+
  |key|value|
  +---+-----+
  |  a|  1.0|
  |  a|  2.0|
  |  b|  3.0|
  |  c|  4.0|
  +---+-----+

-- Query statements after the second line have 4 indents.
SELECT key, SUM(value)
    FROM t
    GROUP BY key;
  +---+----------+                                                                
  |key|sum(value)|
  +---+----------+
  |  c|       4.0|
  |  b|       3.0|
  |  a|       3.0|
  +---+----------+
...
{% endhighlight %}

### Related Statements

 * [XXX](xxx.html)
 * ...

Why are the changes needed?

The most changes of this PR are pretty minor, but I think the consistent formats/rules to write documents are important for long-term maintenance in our community

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Manually checked.

@maropu
Copy link
Member Author

maropu commented Apr 8, 2020

@gatorsmile @huaxingao @kevinyu98 @srowen I've roughly checked the current whole SQL docs and fixed minor&trivial issues for consistency across the docs. How about this update?

@SparkQA
Copy link

SparkQA commented Apr 8, 2020

Test build #120959 has finished for PR 28151 at commit 0c56949.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's just cleanup, fine. This is for 3.0?
I skimmed the changes and it looks indeed like just formatting, etc.

|Fred|50 |
|Dan |50 |
|Fred| 50|
| Da | 50|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Dan

| col_name| data_type|comment|
+--------------------+--------------------+-------+
| name| string| null|
| student_id| int| null|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null -> NULL?
Shall we change all the null to NULL in tables?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, null looks ok because Dataset.showString does so;

scala> Seq(Some(1), None).toDF("a").show()
+----+
|   a|
+----+
|   1|
|null|
+----+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are SQL examples. SQL uses NULL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not faimilar with that though, the output format looks implementation-specific. For example, PostgreSQL uses an empty cell for null by default. Anyway, I like the simple document rule like Query output in Examples should follow the Dataset.showString (right-aligned) format. Since we can easily replace it later, I leave it as it is in this PR. If necessary, please do follow-up later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

|name|rollno|age|
+----+------+---+
+----+------+---+
No rows selected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This line is a comment, right? put -- in front of it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. Probably, we can remove this. Thanks!

@huaxingao
Copy link
Contributor

@maropu Thanks a lot for doing this!

@huaxingao
Copy link
Contributor

This is for 3.0. @srowen

@maropu
Copy link
Member Author

maropu commented Apr 8, 2020

Yea, this is just a clean-up for 3.0.


#### Examples

{% highlight sql %}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huaxingao I moved the examples into the seb-sections of EXCEPT/INTERSECT/UNION for readability. Could you check this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@maropu
Copy link
Member Author

maropu commented Apr 8, 2020

@huaxingao @kevinyu98 If no problem, could you apply the same cleanup for your open PRs at your convenience?

@SparkQA
Copy link

SparkQA commented Apr 9, 2020

Test build #120988 has finished for PR 28151 at commit a7c5e23.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Copy link
Contributor

I will reformat my open PRs.

|-- name: string (nullable = true)
|-- grade: integer (nullable = true)
+--------+---------+-----------+--------------------------------------------------------------+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this output need to be right-aligned? also the following SHOW TABLE output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea right, but the information cell itself assumes to be left-aligned, so I left it as it is. Probably, this is an exception case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

| age=11|
| age=12|
| age=15|
+---------+

ALTER TABLE StudentInfo ADD IF NOT EXISTS PARTITION (age=18);

-- After adding a new partition to the table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to leave space after --? I saw some have one space, some didn't have.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, you're right. I'll remove that. thanks.

@SparkQA
Copy link

SparkQA commented Apr 9, 2020

Test build #121008 has finished for PR 28151 at commit 5f6677a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Apr 9, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Apr 9, 2020

Test build #121012 has finished for PR 28151 at commit 5f6677a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


--Default Output
{% highlight sql %}
-- Default Output

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need leave space here? also the following -- in this file.

@SparkQA
Copy link

SparkQA commented Apr 10, 2020

Test build #121058 has finished for PR 28151 at commit 945b28c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Apr 10, 2020

@srowen Could you check?

@huaxingao
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Apr 13, 2020

Test build #121156 has finished for PR 28151 at commit 658fc77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 13, 2020

Test build #121158 has finished for PR 28151 at commit 2022c42.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 179289f Apr 13, 2020
srowen pushed a commit that referenced this pull request Apr 13, 2020
### What changes were proposed in this pull request?

This PR intends to clean up the SQL documents in `doc/sql-ref*`.
Main changes are as follows;

 - Fixes wrong syntaxes and capitalize sub-titles
 - Adds some DDL queries in `Examples` so that users can run examples there
 - Makes query output in `Examples` follows the `Dataset.showString` (right-aligned) format
 - Adds/Removes spaces, Indents, or blank lines to follow the format below;

```
---
license...
---

### Description

Writes what's the syntax is.

### Syntax

{% highlight sql %}
SELECT...
    WHERE... // 4 indents after the second line
    ...
{% endhighlight %}

### Parameters

<dl>

  <dt><code><em>Param Name</em></code></dt>
  <dd>
    Param Description
  </dd>
  ...
</dl>

### Examples

{% highlight sql %}
-- It is better that users are able to execute example queries here.
-- So, we prepare test data in the first section if possible.
CREATE TABLE t (key STRING, value DOUBLE);
INSERT INTO t VALUES
    ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0);

-- query output has 2 indents and it follows the `Dataset.showString`
-- format (right-aligned).
SELECT * FROM t;
  +---+-----+
  |key|value|
  +---+-----+
  |  a|  1.0|
  |  a|  2.0|
  |  b|  3.0|
  |  c|  4.0|
  +---+-----+

-- Query statements after the second line have 4 indents.
SELECT key, SUM(value)
    FROM t
    GROUP BY key;
  +---+----------+
  |key|sum(value)|
  +---+----------+
  |  c|       4.0|
  |  b|       3.0|
  |  a|       3.0|
  +---+----------+
...
{% endhighlight %}

### Related Statements

 * [XXX](xxx.html)
 * ...
```

### Why are the changes needed?

The most changes of this PR are pretty minor, but I think the consistent formats/rules to write documents are important for long-term maintenance in our community

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Manually checked.

Closes #28151 from maropu/MakeRightAligned.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 179289f)
Signed-off-by: Sean Owen <srowen@gmail.com>
@srowen
Copy link
Member

srowen commented Apr 13, 2020

Merged to master/3.0

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

This PR intends to clean up the SQL documents in `doc/sql-ref*`.
Main changes are as follows;

 - Fixes wrong syntaxes and capitalize sub-titles
 - Adds some DDL queries in `Examples` so that users can run examples there
 - Makes query output in `Examples` follows the `Dataset.showString` (right-aligned) format
 - Adds/Removes spaces, Indents, or blank lines to follow the format below;

```
---
license...
---

### Description

Writes what's the syntax is.

### Syntax

{% highlight sql %}
SELECT...
    WHERE... // 4 indents after the second line
    ...
{% endhighlight %}

### Parameters

<dl>

  <dt><code><em>Param Name</em></code></dt>
  <dd>
    Param Description
  </dd>
  ...
</dl>

### Examples

{% highlight sql %}
-- It is better that users are able to execute example queries here.
-- So, we prepare test data in the first section if possible.
CREATE TABLE t (key STRING, value DOUBLE);
INSERT INTO t VALUES
    ('a', 1.0), ('a', 2.0), ('b', 3.0), ('c', 4.0);

-- query output has 2 indents and it follows the `Dataset.showString`
-- format (right-aligned).
SELECT * FROM t;
  +---+-----+
  |key|value|
  +---+-----+
  |  a|  1.0|
  |  a|  2.0|
  |  b|  3.0|
  |  c|  4.0|
  +---+-----+

-- Query statements after the second line have 4 indents.
SELECT key, SUM(value)
    FROM t
    GROUP BY key;
  +---+----------+
  |key|sum(value)|
  +---+----------+
  |  c|       4.0|
  |  b|       3.0|
  |  a|       3.0|
  +---+----------+
...
{% endhighlight %}

### Related Statements

 * [XXX](xxx.html)
 * ...
```

### Why are the changes needed?

The most changes of this PR are pretty minor, but I think the consistent formats/rules to write documents are important for long-term maintenance in our community

### Does this PR introduce any user-facing change?

Yes.

### How was this patch tested?

Manually checked.

Closes apache#28151 from maropu/MakeRightAligned.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants