[SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference #28332

Closed
wants to merge 4 commits into from

Contributor

@huaxingao huaxingao commented Apr 24, 2020

What changes were proposed in this pull request?

Document LIKE clause in SQL Reference

Why are the changes needed?

To make SQL Reference complete

Does this PR introduce any user-facing change?

Yes

Screen Shot 2020-04-25 at 5 49 57 PM

Screen Shot 2020-04-25 at 5 50 24 PM

Screen Shot 2020-04-25 at 5 50 42 PM

How was this patch tested?

Built the docs manually and checked the rendered pages.

SparkQA commented Apr 24, 2020

Test build #121786 has finished for PR 28332 at commit 89fc5d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<dd>
Specifies a string pattern that is used to match the databases in the system. In
the specified string pattern <code>'*'</code> matches any number of characters.
Specifies a regular expression pattern that is used to limit the results of the
Member

nit: limit -> filter?

statement.
<ul>
<li>Only <code>*</code> and <code>|</code> are allowed as wildcard pattern.</li>
<li>Excluding <code>*</code> and <code>|</code> the remaining pattern follows the regex semantics.</li>
Member

nit:

  • Excluding <code>*</code> and <code>|</code> -> Excluding <code>*</code> and <code>|</code>,
  • regex -> regular expression

<ul>
<li>Only <code>*</code> and <code>|</code> are allowed as wildcard pattern.</li>
<li>Excluding <code>*</code> and <code>|</code> the remaining pattern follows the regex semantics.</li>
<li>The leading and trailing blanks are trimmed in the input pattern before processing.</li>
Member

How about case sensitivity?
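
For readers following the rules in the list above, here is a minimal Python sketch of how such a pattern could be interpreted. It is not Spark's implementation: the function name `matches_show_pattern` is hypothetical, and the case-insensitive flag is an assumption made only to show where the answer to the case-sensitivity question would plug in.

```python
import re


def matches_show_pattern(name: str, pattern: str) -> bool:
    """Illustrative only: apply the documented wildcard rules to one name."""
    # Leading and trailing blanks are trimmed before processing,
    # then '|' separates alternative sub-patterns.
    for sub_pattern in pattern.strip().split("|"):
        # '*' matches any number of characters; everything else is
        # left untouched and therefore keeps its regex meaning.
        regex = sub_pattern.replace("*", ".*")
        # ASSUMPTION: matching ignores case. Drop the flag to model
        # case-sensitive behaviour instead.
        if re.fullmatch(regex, name, flags=re.IGNORECASE):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical database names, used only for demonstration.
    for db in ["payroll_db", "Payments", "inventory"]:
        print(db, matches_show_pattern(db, " pay*|sales* "))
```

Whichever way the case-sensitivity question is settled, stating it as one more bullet in the list would remove the ambiguity this sketch has to paper over with an assumption.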

+---+----+---+
|200|Mary|null|
|300|Mike| 80|
+---+----+---+
Member

nit: remove the leading spaces.

Member

Could you add an example for LIKE ... ESCAPE ..., too?

<b>Syntax:</b><br>
<code>
NOT boolean_expression | EXISTS ( query ) | column_name LIKE regex_pattern | value_expression |<br>
boolean_expression AND boolean_expression | boolean_expression OR boolean_expression
Member

Could you add RLIKE and LIKE ... ESCAPE ... ?

<dd>
Specifies the regular expression pattern that is used to filter out unwanted views.
<ul>
<li> Except for `*` and `|` character, the pattern works like a regex.</li>
<li> `*` alone matches 0 or more characters and `|` is used to separate multiple different regexes,
<li> Except for <code>*</code> and <code>|</code> character, the pattern works like a regex.</li>
Member

nit: remove the space <li>Except...

### Related Statements

* [SELECT](sql-ref-syntax-qry-select.html)
* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
Contributor Author

I separated the LIKE predicate from the WHERE clause page. It seems better to me this way. It's also easier to add LIKE ANY and LIKE ALL later.

Member

@maropu maropu Apr 25, 2020

Did you refer to existing documentation in other systems? For example, PostgreSQL seems to have a single page for its pattern-matching syntaxes: https://www.postgresql.org/docs/current/functions-matching.html

Contributor Author

LIKE ALL and LIKE ANY can also be added to this page later. I don't want to include them now because this PR targets 3.0.

Member

Yea, we should not include them in this PR.

SparkQA commented Apr 25, 2020

Test build #121819 has finished for PR 28332 at commit 488684b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<dl>
<dt><code><em>esc_char</em></code></dt>
<dd>
Specifies the escape character.
Member

Could you describe the default escape char?
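
To make the role of `esc_char` concrete, here is a rough Python sketch that translates a LIKE pattern into a regular expression. It assumes `%` matches any run of characters, `_` matches exactly one character, and backslash is the escape character when none is given (the example query on this page relies on that). The helper names `like_to_regex` and `sql_like` and the sample values are hypothetical, and the sketch simply treats any character after the escape character as a literal, which may be more permissive than the real parser.

```python
import re


def like_to_regex(pattern: str, esc_char: str = "\\") -> str:
    """Illustrative translation of a LIKE pattern into a Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == esc_char:
            # The character after the escape character is matched literally.
            if i + 1 >= len(pattern):
                raise ValueError("pattern must not end with the escape character")
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")   # any number of characters
        elif ch == "_":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def sql_like(value: str, pattern: str, esc_char: str = "\\") -> bool:
    """Return True if value matches the LIKE pattern as a whole."""
    regex = like_to_regex(pattern, esc_char)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(sql_like("a_b", "%\\_%"))               # default escape character
    print(sql_like("a_b", "%$_%", esc_char="$"))  # custom escape character
    print(sql_like("Mary", "%\\_%"))              # no literal underscore
```

With a custom escape character, the same literal-underscore match can be written without any backslash, which is one reason a LIKE ... ESCAPE ... example is worth having in the docs.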

Member

maropu commented Apr 25, 2020

Looks fine.

|200|Mary|null|
+---+----+----+

SELECT * FROM person WHERE name LIKE '%\_%';
Contributor Author

I am not sure whether to use %\_% or %\\_% here.
spark-sql takes a single \,
while spark-shell requires two:

scala> sql("SELECT * FROM person WHERE name LIKE '%\_%'").show()
<console>:1: error: invalid escape character
       sql("SELECT * FROM person WHERE name LIKE '%\_%'").show()
                                                    ^

Member

Ah, I see... I think a single \ is OK for now. If we get user feedback, we can update it then.
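
The mismatch discussed above comes from the host language processing backslashes in its own string literals before the SQL text is ever parsed. A small Python analogue, offered only as an illustration (the thread itself concerns the Scala shell, and the variable names here are hypothetical), shows two spellings that deliver the same single-backslash pattern to the SQL layer:

```python
# The SQL engine should receive four characters: % \ _ %

# Option 1: double the backslash inside an ordinary string literal.
escaped_pattern = "%\\_%"

# Option 2: use a raw string so the backslash is taken literally.
raw_pattern = r"%\_%"

# Both spellings produce the same query text with exactly one backslash.
query = f"SELECT * FROM person WHERE name LIKE '{escaped_pattern}'"

if __name__ == "__main__":
    print(escaped_pattern == raw_pattern)
    print(query)
```

So the single-backslash form in the docs describes what the SQL parser sees; readers who embed the query in another language's string literal need that language's own escaping on top.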

SparkQA commented Apr 26, 2020

Test build #121820 has finished for PR 28332 at commit c5cbabc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Contributor Author

cc @srowen

@maropu maropu closed this in d34cb59 Apr 29, 2020
maropu pushed a commit that referenced this pull request Apr 29, 2020
### What changes were proposed in this pull request?
Document LIKE clause in SQL Reference

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

<img width="1050" alt="Screen Shot 2020-04-25 at 5 49 57 PM" src="https://user-images.githubusercontent.com/13592258/80294346-5babab80-871d-11ea-8ac9-51bbab0aca88.png">

<img width="1050" alt="Screen Shot 2020-04-25 at 5 50 24 PM" src="https://user-images.githubusercontent.com/13592258/80294347-5ea69c00-871d-11ea-8c51-7a90ee20f7da.png">

<img width="1050" alt="Screen Shot 2020-04-25 at 5 50 42 PM" src="https://user-images.githubusercontent.com/13592258/80294351-61a18c80-871d-11ea-9e75-e3345d2f52f5.png">

### How was this patch tested?
Manually build and check

Closes #28332 from huaxingao/where_clause.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit d34cb59)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Member

maropu commented Apr 29, 2020

ok, thanks. Merged to master/3.0.

@huaxingao
Contributor Author

Thanks! @maropu @srowen

@huaxingao huaxingao deleted the where_clause branch April 29, 2020 00:30