Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference #28451

Closed
wants to merge 10 commits into from

Conversation

huaxingao
Copy link
Contributor

What changes were proposed in this pull request?

Remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
Please see #28414

Why are the changes needed?

Make the doc cleaner and easily editable by MD editors.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually build and check

@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122290 has finished for PR 28451 at commit 0b463cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122287 has finished for PR 28451 at commit d3033f1.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

-- This CREATE TABLE fails with ParseException because of the illegal identifier name a.b
CREATE TABLE test (a.b int);
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20)
org.apache.spark.sql.catalyst.parser.ParseException:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we indent 2 spaces for error messages in other place, I will do the same here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather, we should remove indents in the other places for following the result format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@SparkQA
Copy link

SparkQA commented May 4, 2020

Test build #122292 has finished for PR 28451 at commit b1c6a4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Copy link
Contributor Author

cc @dilipbiswal @maropu @gatorsmile

@@ -66,7 +66,7 @@ This means that in case an operation causes overflows, the result is the same wi
On the other hand, Spark SQL returns null for decimal overflows.
When `spark.sql.ansi.enabled` is set to `true` and an overflow occurs in numeric and interval arithmetic operations, it throws an arithmetic exception at runtime.

{% highlight sql %}
```sql
-- `spark.sql.ansi.enabled=true`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huaxingao I know that it's not related to the format change that you r doing in this PR. But shouldn't we have a SET statement here, so users can cut-paste the command in their shell to see the behavior ? Perhaps we discussed it in the pr that added this clause. Just a question :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this. seems to me comment is OK too.

- text: JOIN
url: sql-ref-syntax-qry-select-join.html
- text: Join Hints
url: sql-ref-syntax-qry-select-hints.html
- text: LIKE Predicate
url: sql-ref-syntax-qry-select-like.html
- text: Set Operators
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the changes in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change the order of the first 8 clauses. I think these should be grouped together. But I changed the rest to make them alphabetical order.
Screen Shot 2020-05-04 at 6 22 05 PM

+--------------+
| 3750.0|
+--------------+
```
</div>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot avoid this tag, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look at this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for examples <div class="codetabs">. I prefer to keep this since we use this format for all the examples.

-- This CREATE TABLE fails with ParseException because of the illegal identifier name a.b
CREATE TABLE test (a.b int);
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CREATE TABLE test (a.'(line 1, pos 20)
org.apache.spark.sql.catalyst.parser.ParseException:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather, we should remove indents in the other places for following the result format?

@huaxingao
Copy link
Contributor Author

Rather, we should remove indents in the other places for following the result format?

It's better to remove indents. Will spend some time to find all the error messages.

@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122296 has finished for PR 28451 at commit 66d82ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -35,22 +35,19 @@ A string literal is used to specify a character string value.

#### Syntax

{% highlight sql %}
```sql
'c [ ... ]' | "c [ ... ]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huaxingao the parameter c kind of looks weird especially in new format ? What do you think of character or any_char or something like that ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to char


<dl>
<dt><code><em>c</em></code></dt>
<dd>
One character from the character set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is limitation on the chars that are allowed in the binary literal ?
for example, i tried :
SELECT X'zzzzzz' AS col and got an exception ?

Copy link
Contributor Author

@huaxingao huaxingao May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be hexadecimal. Changed to the following:

#### Syntax

X { 'num [ ... ]' | "num [ ... ]" }

#### Parameters

* **num**

    Any hexadecimal number from 0 to F.

cc @yaooqinn

{ letter | digit | '_' } [ , ... ]
{% endhighlight %}
```
Note: If `spark.sql.ansi.enabled` is set to true, ANSI SQL reserved keywords cannot be used as identifiers. For more details, please refer to [ANSI Compliance](sql-ref-ansi-compliance.html).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huaxingao Should we bold "Note" ? I see that in other places we do bold it.

@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122301 has finished for PR 28451 at commit 289e5ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

</code>
</dd>
</dl>
for partitions. When specified, the partitions that match the partition spec are returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huaxingao just for consistency lets change "partition spec" to "partition specification" ?

@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122303 has finished for PR 28451 at commit 804d15d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor

Nice @huaxingao . LGTM - had some very minor comments.

@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122308 has finished for PR 28451 at commit 5726a1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Copy link
Contributor Author

cc @srowen

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like the HTML simplification to markdown. I can't think of a reason we need to keep the HTML form; maybe an early markdown renderer didn't support it. This still render OK as expected when built currently?

@@ -64,15 +64,15 @@ On the other hand, `INSERT INTO` syntax throws an analysis exception when the AN
Currently, the ANSI mode affects explicit casting and assignment casting only.
In future releases, the behaviour of type coercion might change along with the other two type conversion rules.

{% highlight sql %}
```sql
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK, is there any behavior difference?
I'm on the fence about whether it's worth changing across all files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of them highlight SQL keywords the same way. The only difference I noticed is that ```sql doesn't indent the code block:
Screen Shot 2020-05-05 at 8 34 07 AM

but {% hightlight sql %} indents
Screen Shot 2020-05-05 at 8 35 18 AM


* **L**

Case insensitive, indicates `BIGINT`, which is a 8-byte signed integer number.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: an 8-byte

#### Examples
* **D**

Case insensitive, indicates `DOUBLE`, which is a 8-byte double-precision floating point number.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


<dt><code><em>database_comment</em></code></dt>
<dd>Specifies the description for the database.</dd>
Creates a database with the given name if it doesn't exists. If a database with the same name already exists, nothing will happen.
Copy link
Member

@kiszk kiszk May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: doesn't exists -> doesn't exist


* **STORED AS**

File format for table storage, could be TEXTFILE, ORC, PARQUET,etc.
Copy link
Member

@kiszk kiszk May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ,etc. -> , etc.


* **STORED AS**

File format for table storage, could be TEXTFILE, ORC, PARQUET,etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

</dl>
* **IF EXISTS**

If specified, no exception is thrown when the table does not exists.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: exists -> exist

</dl>
* **IF EXISTS**

If specified, no exception is thrown when the view does not exists.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks for checking. @kiszk

@huaxingao
Copy link
Contributor Author

This still render OK as expected when built currently?

Yes. This still works OK as expected. @srowen

@SparkQA
Copy link

SparkQA commented May 5, 2020

Test build #122319 has finished for PR 28451 at commit bd82fdd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented May 9, 2020

If there are no more comments, I'll merge tomorrow. This is for 3.1 only?

@huaxingao
Copy link
Contributor Author

@srowen this is for 3.0.

@srowen srowen closed this in a75dc80 May 10, 2020
srowen pushed a commit that referenced this pull request May 10, 2020
### What changes were proposed in this pull request?
Remove the unneeded embedded inline HTML markup by using the basic markdown syntax.
Please see #28414

### Why are the changes needed?
Make the doc cleaner and easily editable by MD editors.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually build and check

Closes #28451 from huaxingao/html_cleanup.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit a75dc80)
Signed-off-by: Sean Owen <srowen@gmail.com>
@srowen
Copy link
Member

srowen commented May 10, 2020

Merged to master/3.0

@huaxingao
Copy link
Contributor Author

Thanks all!

@huaxingao huaxingao deleted the html_cleanup branch May 10, 2020 23:30
@maropu
Copy link
Member

maropu commented May 11, 2020

All the document works for 3.0 have been done? https://issues.apache.org/jira/browse/SPARK-28588

@huaxingao
Copy link
Contributor Author

@gatorsmile @dilipbiswal Anything else you want to add in 3.0?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants