LIKE clause #67

ricardocchaves · 2022-03-01T23:25:28Z

Adds a LIKE function to the WHERE clause for easier matching. Closes #17

Some notes

Right now it can match not only with strings but also with everything else, as the values are stringified before matching. Let me know if this is not intended and if it should be for strings only.
It only supports the % wildcard operator.
Let me know if the commit messages should follow a specific format

Future work

Either in the scope of this PR or in a future one, it is still missing other basic matching functions, like _ for single character matching or [] to specify a range.

ricardocchaves · 2022-03-01T23:26:54Z

How can I get a syntax error like the following:

$ python spyql 'SELECT * from range(3) WHERE'
ERROR	could not compile FROM clause
  File "<from>", line 1
    range(3) WHERE
                 ^
SyntaxError: unexpected EOF while parsing

But for LIKE? Should I change this or can it be kept as is? Desired output / behavior:

$ python spyql 'SELECT * from range(3) WHERE col1 LIKE'
ERROR	could not compile WHERE clause
  File "<from>", line 1
    col1 LIKE
            ^
SyntaxError: unexpected EOF while parsing

Current:

$ python spyql 'SELECT * from range(3) WHERE col1 LIKE'
ERROR	col1 LIKE
SyntaxError: unexpected EOF while parsing

codecov · 2022-03-01T23:27:24Z

Codecov Report

Merging #67 (0e0a830) into master (c379afa) will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #67      +/-   ##
==========================================
+ Coverage   95.91%   95.98%   +0.07%     
==========================================
  Files          10       10              
  Lines        1076     1096      +20     
==========================================
+ Hits         1032     1052      +20     
  Misses         44       44

Impacted Files	Coverage Δ
spyql/cli.py	`99.06% <100.00%> (+0.08%)`	⬆️
spyql/quotes_handler.py	`97.29% <100.00%> (+0.15%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c379afa...0e0a830. Read the comment docs.

dcmoura

Thank you @ricardocchaves !

I think we need to change the approach... what you are trying to do seems to difficult... so, moving to a like function or a like pseudo-operator seems to be the way to go.

I also think that the LIKE function should be available in the SELECT and other clauses, and not limited to the WHERE clause.

Supporting _ is important, but I would stop here. PostgreSQL, for instance, only supports % and _. We have regex for fancier stuff :-)

Let me know what you think!

dcmoura · 2022-03-03T07:18:00Z

README.md

@@ -93,7 +93,7 @@ Right now, the focus is on building a command-line tool that follows these core
 SELECT [ DISTINCT | PARTIALS ] 
    [ * | python_expression [ AS output_column_name ] [, ...] ]
    [ FROM csv | spy | text | python_expression | json [ EXPLODE path ] ]
-    [ WHERE python_expression ]
+    [ WHERE python_expression [ [NOT] LIKE string] ]


I wouldn't change the description of the query structure. Let's assume we are adding a Python operator (and therefore it fits into a python_expression). We would then highlight this in the documentation.

I would broaden the use of the LIKE to any python_expression. Example of a use of LIKE outside of the WHERE clause: SELECT 'error' if msg like 'error%' else 'OK'

I would consider also adding ILIKE, which has the same behaviour as LIKE but it is case-insensitive

dcmoura · 2022-03-03T09:42:35Z

spyql/cli.py

+        return clause
+
+    # Supports words containing [a-zA-Z0-9_\-]
+    expr_pattern = re.compile(r"([\w-]+)(?:\s+(NOT))?\s+LIKE\s+([\w-]+)", re.IGNORECASE)


I think that this regex does not do the trick. It fails in cases like:

col1 + col2 LIKE 'constant%'

'constant%' LIKE col1

func(col1) LIKE col1 + '%'

I think we have one of the following options:

do not implement a LIKE operator, but simple make available a function like(a, b). Instead of col1 LIKE 'constant%' we would write like(col1, 'constant%')

detect occurrences of LIKE and replace them in the query string by something like col1 | like_op | 'constant%'. like_op would be a class that overloads the or operator that does the LIKE magic: given 2 strings, parses the strings for detecting '%' and '_' does the comparison and returns a boolean. This would be a little more trouble and requires particular attention to operator prioritisation and a compete test suite. We would need 4 operators: LIKE, NOT LIKE, ILIKE, NOT ILIKE. Look here: https://stackoverflow.com/a/56739916/9522280

I am fine with both approaches. I would be happy on having the first at short-term and the second at longer term.

dcmoura · 2022-03-03T09:52:04Z

spyql/cli.py

+        )
+
+    # Replacing SQL wildcard '%' for regex wildcard '.*' if not preceded by '\'
+    pattern = strings.put_strings_back(groups[2])


Does this mean that we only accept LIKE wildcards in the right side?
I think we can implement a single side LIKE because it might be simpler to implement. One option would be to test if the wildcards are on the left or right side and swap if needed, while raising an error when there are wildcards on both sides.

dcmoura · 2022-03-03T09:54:48Z

spyql/cli.py

+
+    # Replacing SQL wildcard '%' for regex wildcard '.*' if not preceded by '\'
+    pattern = strings.put_strings_back(groups[2])
+    pattern = re.compile(r"(?<!\\)%").sub(r".*" , pattern)


shouldn't we first escape any regex special character/command? For instance col1 like '123.456.%' would accept '1233456oops' because of the meaning of . in regex patterns...

dcmoura · 2022-03-03T21:02:30Z

spyql/quotes_handler.py

+    def __iter__(self):
+        return iter(self.strings)


ricardocchaves · 2022-03-04T15:37:57Z

As discussed offline:

I'll look into escaping regex characters that may appear innocently in the string
I'll implement the like function that takes two strings (whether they're constants, columns, etc.) and returns boolean. Will look into that stack overflow URL, overriding operators.

dcmoura · 2022-12-09T09:49:24Z

Archiving due to lack of activity. Happy to reopen when you are ready!

ricardocchaves added 5 commits March 1, 2022 21:29

added simple like function support

57432f5

make QuotesHandler iterable

9d9fee2

added support for SQL wildcard in LIKE statement

2152e20

add tests for like function

f7ccc90

add like function documentation

0e0a830

ricardocchaves added the enhancement New feature or request label Mar 1, 2022

ricardocchaves requested review from dcmoura and recharte March 1, 2022 23:25

ricardocchaves self-assigned this Mar 1, 2022

dcmoura requested changes Mar 3, 2022

View reviewed changes

dcmoura force-pushed the master branch from bbbac0b to bb8ff70 Compare October 30, 2022 20:44

dcmoura closed this Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LIKE clause #67

LIKE clause #67

ricardocchaves commented Mar 1, 2022 •

edited

ricardocchaves commented Mar 1, 2022 •

edited

codecov bot commented Mar 1, 2022

dcmoura left a comment

dcmoura Mar 3, 2022

dcmoura Mar 3, 2022

dcmoura Mar 3, 2022

dcmoura Mar 3, 2022

dcmoura Mar 3, 2022

dcmoura Mar 3, 2022

ricardocchaves commented Mar 4, 2022

dcmoura commented Dec 9, 2022

LIKE clause #67

LIKE clause #67

Conversation

ricardocchaves commented Mar 1, 2022 • edited

Some notes

Future work

ricardocchaves commented Mar 1, 2022 • edited

codecov bot commented Mar 1, 2022

Codecov Report

dcmoura left a comment

Choose a reason for hiding this comment

dcmoura Mar 3, 2022

Choose a reason for hiding this comment

dcmoura Mar 3, 2022

Choose a reason for hiding this comment

dcmoura Mar 3, 2022

Choose a reason for hiding this comment

dcmoura Mar 3, 2022

Choose a reason for hiding this comment

dcmoura Mar 3, 2022

Choose a reason for hiding this comment

dcmoura Mar 3, 2022

Choose a reason for hiding this comment

ricardocchaves commented Mar 4, 2022

dcmoura commented Dec 9, 2022

ricardocchaves commented Mar 1, 2022 •

edited

ricardocchaves commented Mar 1, 2022 •

edited