[SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference #28451

huaxingao · 2020-05-04T18:54:26Z

What changes were proposed in this pull request?

Remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
Please see #28414

Why are the changes needed?

Make the doc cleaner and easily editable by MD editors.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually build and check

SparkQA · 2020-05-04T21:01:31Z

Test build #122290 has finished for PR 28451 at commit 0b463cf.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-05-04T22:02:42Z

Test build #122287 has finished for PR 28451 at commit d3033f1.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

huaxingao · 2020-05-04T22:40:39Z

docs/sql-ref-identifier.md

 -- This CREATE TABLE fails with ParseException because of the illegal identifier name a.b
 CREATE TABLE test (a.b int);
-org.apache.spark.sql.catalyst.parser.ParseException:
-no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20)
+  org.apache.spark.sql.catalyst.parser.ParseException:


Since we indent 2 spaces for error messages in other places, I will do the same here.

Rather, we should remove indents in the other places for following the result format?

SparkQA · 2020-05-04T22:52:56Z

Test build #122292 has finished for PR 28451 at commit b1c6a4a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

huaxingao · 2020-05-04T23:13:57Z

cc @dilipbiswal @maropu @gatorsmile

dilipbiswal · 2020-05-05T00:49:57Z

docs/sql-ref-ansi-compliance.md

@@ -66,7 +66,7 @@ This means that in case an operation causes overflows, the result is the same wi
 On the other hand, Spark SQL returns null for decimal overflows.
 When `spark.sql.ansi.enabled` is set to `true` and an overflow occurs in numeric and interval arithmetic operations, it throws an arithmetic exception at runtime.

-{% highlight sql %}
+```sql
 -- `spark.sql.ansi.enabled=true`


@huaxingao I know that it's not related to the format change that you are doing in this PR. But shouldn't we have a SET statement here, so users can cut and paste the command into their shell to see the behavior? Perhaps we discussed it in the PR that added this clause. Just a question :-)

I don't have a strong opinion on this. It seems to me that the comment is OK too.

maropu · 2020-05-04T23:39:05Z

docs/_data/menu-sql.yaml

                - text: JOIN
                  url: sql-ref-syntax-qry-select-join.html
                - text: Join Hints
                  url: sql-ref-syntax-qry-select-hints.html
+                - text: LIKE Predicate
+                  url: sql-ref-syntax-qry-select-like.html
                - text: Set Operators


Why do we need the changes in this file?

I didn't change the order of the first 8 clauses. I think these should be grouped together. But I changed the rest to put them in alphabetical order.
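The ordering rule described in the reply above (keep a leading group fixed, alphabetize the rest) could be sketched as follows. This is only an illustration in Python: the function name `reorder_menu`, the `pinned` parameter, and the sample entries are assumptions, not part of the PR or of `menu-sql.yaml` tooling.

```python
def reorder_menu(entries, pinned=8):
    """Keep the first `pinned` entries in place and sort the rest by title.

    `entries` is a list of dicts with a "text" key, mirroring the
    `- text: ... / url: ...` items in the menu file. Sorting is
    case-insensitive so that "JOIN" and "Join Hints" stay adjacent.
    """
    head = entries[:pinned]
    tail = sorted(entries[pinned:], key=lambda entry: entry["text"].lower())
    return head + tail


if __name__ == "__main__":
    # Hypothetical sample: two pinned clauses followed by unsorted entries.
    sample = [
        {"text": "WHERE"},
        {"text": "GROUP BY"},
        {"text": "Set Operators"},
        {"text": "JOIN"},
        {"text": "LIKE Predicate"},
        {"text": "Join Hints"},
    ]
    for entry in reorder_menu(sample, pinned=2):
        print(entry["text"])
```

With the default `pinned=8`, the first eight entries keep their hand-chosen order, which matches the grouping described in the reply.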

maropu · 2020-05-05T00:50:36Z

docs/sql-ref-functions-udf-aggregate.md

+--------------+
+|        3750.0|
+--------------+
+```
 </div>


We cannot avoid this tag, too?

I will take a look at this.

This is for examples <div class="codetabs">. I prefer to keep this since we use this format for all the examples.

maropu · 2020-05-05T00:52:43Z

docs/sql-ref-identifier.md

 -- This CREATE TABLE fails with ParseException because of the illegal identifier name a.b
 CREATE TABLE test (a.b int);
-org.apache.spark.sql.catalyst.parser.ParseException:
-no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20)
+  org.apache.spark.sql.catalyst.parser.ParseException:


Rather, we should remove indents in the other places for following the result format?

huaxingao · 2020-05-05T01:27:02Z

> Rather, we should remove indents in the other places for following the result format?

It's better to remove indents. Will spend some time to find all the error messages.

SparkQA · 2020-05-05T01:38:01Z

Test build #122296 has finished for PR 28451 at commit 66d82ca.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2020-05-05T02:00:29Z

docs/sql-ref-literals.md

@@ -35,22 +35,19 @@ A string literal is used to specify a character string value.

 #### Syntax

-{% highlight sql %}
+```sql
 'c [ ... ]' | "c [ ... ]"


@huaxingao the parameter `c` looks a bit odd, especially in the new format. What do you think of `character` or `any_char` or something like that?

changed to char

dilipbiswal · 2020-05-05T02:05:36Z

docs/sql-ref-literals.md


-<dl>
-  <dt><code><em>c</em></code></dt>
-  <dd>
    One character from the character set.


I believe there is a limitation on the characters that are allowed in the binary literal?
For example, I tried:
`SELECT X'zzzzzz' AS col` and got an exception.

It seems to be hexadecimal. Changed to the following:

#### Syntax

X { 'num [ ... ]' | "num [ ... ]" }

#### Parameters

* **num**

    Any hexadecimal number from 0 to F.

cc @yaooqinn
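To illustrate the revised syntax above, here is a minimal Python sketch that checks whether a string has the shape `X'...'` or `X"..."` with only hexadecimal digits inside. It is not Spark's parser: the helper name `is_hex_binary_literal` is hypothetical, and treating the `X` prefix and the digits as case-insensitive is an assumption.

```python
import re

# Assumption: one or more hexadecimal digits (0-9, A-F), case-insensitive.
HEX_DIGITS = "[0-9A-Fa-f]+"

# The body must be wrapped in matching single or double quotes.
BINARY_LITERAL = re.compile(f"[Xx](?:'{HEX_DIGITS}'|\"{HEX_DIGITS}\")")


def is_hex_binary_literal(text: str) -> bool:
    """Return True if `text` looks like X'<hex digits>' or X"<hex digits>"."""
    return BINARY_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["X'123456'", 'X"1F"', "X'zzzzzz'"]:
        print(candidate, is_hex_binary_literal(candidate))
```

The `X'zzzzzz'` input from the review comment is rejected by this check because `z` is not a hexadecimal digit.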

dilipbiswal · 2020-05-05T05:42:56Z

docs/sql-ref-identifier.md

 { letter | digit | '_' } [ , ... ]
-{% endhighlight %}
+```
 Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords cannot be used as identifiers. For more details, please refer to [ANSI Compliance](sql-ref-ansi-compliance.html).


@huaxingao Should we bold "Note"? I see that in other places we do bold it.

SparkQA · 2020-05-05T05:48:50Z

Test build #122301 has finished for PR 28451 at commit 289e5ae.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2020-05-05T06:09:50Z

docs/sql-ref-syntax-aux-show-partitions.md

-      </code>
-  </dd>
-</dl>
+    for partitions. When specified, the partitions that match the partition spec are returned.


@huaxingao just for consistency, let's change "partition spec" to "partition specification"?

SparkQA · 2020-05-05T06:14:46Z

Test build #122303 has finished for PR 28451 at commit 804d15d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2020-05-05T06:18:10Z

Nice, @huaxingao. LGTM - had some very minor comments.

SparkQA · 2020-05-05T06:59:23Z

Test build #122308 has finished for PR 28451 at commit 5726a1f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

huaxingao · 2020-05-05T07:01:43Z

cc @srowen

srowen

I generally like the HTML simplification to markdown. I can't think of a reason we need to keep the HTML form; maybe an early markdown renderer didn't support it. Does this still render OK as expected when built currently?

srowen · 2020-05-05T13:24:20Z

docs/sql-ref-ansi-compliance.md

@@ -64,15 +64,15 @@ On the other hand, `INSERT INTO` syntax throws an analysis exception when the AN
 Currently, the ANSI mode affects explicit casting and assignment casting only.
 In future releases, the behaviour of type coercion might change along with the other two type conversion rules.

-{% highlight sql %}
+```sql


Seems OK, is there any behavior difference?
I'm on the fence about whether it's worth changing across all files.

Both of them highlight SQL keywords the same way. The only difference I noticed is that ```sql doesn't indent the code block, but {% highlight sql %} does.
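The mechanical part of this change, swapping Liquid `{% highlight LANG %}` ... `{% endhighlight %}` blocks for Markdown fences, could be sketched roughly as below. This is an illustrative Python script, not the method used in the PR; the function name `liquid_to_fenced` is hypothetical, and it assumes the opening tag sits on its own line with no extra tag options.

```python
import re

# Built programmatically so this example does not embed a literal fence.
FENCE = "`" * 3

# Matches {% highlight LANG %}, a newline, the body, then {% endhighlight %}.
HIGHLIGHT_BLOCK = re.compile(
    r"\{%\s*highlight\s+(\w+)\s*%\}\n(.*?)\{%\s*endhighlight\s*%\}",
    re.DOTALL,
)


def liquid_to_fenced(markdown_text: str) -> str:
    """Replace each Liquid highlight block with a fenced code block."""

    def _to_fence(match):
        lang, body = match.group(1), match.group(2)
        # The body already ends with a newline before the closing tag.
        return f"{FENCE}{lang}\n{body}{FENCE}"

    return HIGHLIGHT_BLOCK.sub(_to_fence, markdown_text)


if __name__ == "__main__":
    source = "{% highlight sql %}\nSELECT 1;\n{% endhighlight %}\n"
    print(liquid_to_fenced(source))
```

Text outside highlight blocks is left untouched, so the script can be run over a whole Markdown file; any rendering difference such as the indentation noted above would still need a manual check of the built docs.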

kiszk · 2020-05-05T15:06:56Z

docs/sql-ref-literals.md

+
+* **L**
+
+    Case insensitive, indicates `BIGINT`, which is a 8-byte signed integer number.


nit: an 8-byte

kiszk · 2020-05-05T15:09:02Z

docs/sql-ref-literals.md

-#### Examples
+* **D**
+
+    Case insensitive, indicates `DOUBLE`, which is a 8-byte double-precision floating point number.


kiszk · 2020-05-05T15:28:38Z

docs/sql-ref-syntax-ddl-create-database.md


-    <dt><code><em>database_comment</em></code></dt>
-    <dd>Specifies the description for the database.</dd>
+    Creates a database with the given name if it doesn't exists. If a database with the same name already exists, nothing will happen.


nit: doesn't exists -> doesn't exist

kiszk · 2020-05-05T15:34:52Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+
+* **STORED AS**
+
+    File format for table storage, could be TEXTFILE, ORC, PARQUET,etc.


nit: ,etc. -> , etc.

kiszk · 2020-05-05T15:35:38Z

docs/sql-ref-syntax-ddl-create-table-like.md

+
+* **STORED AS**
+
+    File format for table storage, could be TEXTFILE, ORC, PARQUET,etc.


kiszk · 2020-05-05T15:37:44Z

docs/sql-ref-syntax-ddl-drop-table.md

-</dl>
+* **IF EXISTS**
+
+    If specified, no exception is thrown when the table does not exists.


nit: exists -> exist

kiszk · 2020-05-05T15:38:02Z

docs/sql-ref-syntax-ddl-drop-view.md

-</dl>
+* **IF EXISTS**
+
+    If specified, no exception is thrown when the view does not exists.


Fixed. Thanks for checking. @kiszk

huaxingao · 2020-05-05T15:39:37Z

> Does this still render OK as expected when built currently?

Yes. This still works OK as expected. @srowen

SparkQA · 2020-05-05T16:32:22Z

Test build #122319 has finished for PR 28451 at commit bd82fdd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2020-05-09T20:14:58Z

If there are no more comments, I'll merge tomorrow. This is for 3.1 only?

huaxingao · 2020-05-09T20:21:18Z

@srowen this is for 3.0.

### What changes were proposed in this pull request?

Remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
Please see #28414

### Why are the changes needed?

Make the doc cleaner and easily editable by MD editors.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually build and check

Closes #28451 from huaxingao/html_cleanup.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit a75dc80)
Signed-off-by: Sean Owen <srowen@gmail.com>

srowen · 2020-05-10T17:57:48Z

Merged to master/3.0

huaxingao · 2020-05-10T23:30:52Z

Thanks all!

maropu · 2020-05-11T01:11:01Z

Has all the documentation work for 3.0 been done? https://issues.apache.org/jira/browse/SPARK-28588

huaxingao · 2020-05-11T04:16:12Z

@gatorsmile @dilipbiswal Anything else you want to add in 3.0?

probot-autolabeler bot added the DOCS label May 4, 2020

huaxingao force-pushed the html_cleanup branch from d3033f1 to 0b463cf Compare May 4, 2020 20:35

huaxingao commented May 4, 2020

dilipbiswal reviewed May 5, 2020

maropu reviewed May 5, 2020

dilipbiswal reviewed May 5, 2020

maropu mentioned this pull request May 5, 2020

[SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions #28224

Closed

maropu mentioned this pull request May 5, 2020

[SPARK-31030][SQL][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table #28433

Closed

huaxingao added 8 commits May 4, 2020 23:37

rebase

5f5c0c4

fix

f7bf428

fix

3654928

fix

0af3791

more changes

37ebf02

address comments

0a8df51

bold Note

0575f06

address comments

5726a1f

huaxingao force-pushed the html_cleanup branch from 804d15d to 5726a1f Compare May 5, 2020 06:45

srowen reviewed May 5, 2020

kiszk reviewed May 5, 2020

a -> an before 8

bd0b900

kiszk reviewed May 5, 2020

fix errors

bd82fdd

srowen closed this in a75dc80 May 10, 2020

huaxingao deleted the html_cleanup branch May 10, 2020 23:30

