Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIKE clause #67

Closed
wants to merge 5 commits into from
Closed

LIKE clause #67

wants to merge 5 commits into from

Conversation

ricardocchaves
Copy link
Collaborator

@ricardocchaves ricardocchaves commented Mar 1, 2022

Adds a LIKE function to the WHERE clause for easier matching. Closes #17

Some notes

  • Right now it can match not only with strings but also with everything else, as the values are stringified before matching. Let me know if this is not intended and if it should be for strings only.
  • It only supports the % wildcard operator.
  • Let me know if the commit messages should follow a specific format

Future work

Either in the scope of this PR or in a future one, it is still missing other basic matching functions, like _ for single character matching or [] to specify a range.

@ricardocchaves ricardocchaves added the enhancement New feature or request label Mar 1, 2022
@ricardocchaves ricardocchaves self-assigned this Mar 1, 2022
@ricardocchaves
Copy link
Collaborator Author

ricardocchaves commented Mar 1, 2022

How can I get a syntax error like the following:

$ python spyql 'SELECT * from range(3) WHERE'
ERROR	could not compile FROM clause
  File "<from>", line 1
    range(3) WHERE
                 ^
SyntaxError: unexpected EOF while parsing

But for LIKE? Should I change this or can it be kept as is? Desired output / behavior:

$ python spyql 'SELECT * from range(3) WHERE col1 LIKE'
ERROR	could not compile WHERE clause
  File "<from>", line 1
    col1 LIKE
            ^
SyntaxError: unexpected EOF while parsing

Current:

$ python spyql 'SELECT * from range(3) WHERE col1 LIKE'
ERROR	col1 LIKE
SyntaxError: unexpected EOF while parsing

@codecov
Copy link

codecov bot commented Mar 1, 2022

Codecov Report

Merging #67 (0e0a830) into master (c379afa) will increase coverage by 0.07%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #67      +/-   ##
==========================================
+ Coverage   95.91%   95.98%   +0.07%     
==========================================
  Files          10       10              
  Lines        1076     1096      +20     
==========================================
+ Hits         1032     1052      +20     
  Misses         44       44              
Impacted Files Coverage Δ
spyql/cli.py 99.06% <100.00%> (+0.08%) ⬆️
spyql/quotes_handler.py 97.29% <100.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c379afa...0e0a830. Read the comment docs.

Copy link
Owner

@dcmoura dcmoura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ricardocchaves !

I think we need to change the approach... what you are trying to do seems to difficult... so, moving to a like function or a like pseudo-operator seems to be the way to go.

I also think that the LIKE function should be available in the SELECT and other clauses, and not limited to the WHERE clause.

Supporting _ is important, but I would stop here. PostgreSQL, for instance, only supports % and _. We have regex for fancier stuff :-)

Let me know what you think!

@@ -93,7 +93,7 @@ Right now, the focus is on building a command-line tool that follows these core
SELECT [ DISTINCT | PARTIALS ]
[ * | python_expression [ AS output_column_name ] [, ...] ]
[ FROM csv | spy | text | python_expression | json [ EXPLODE path ] ]
[ WHERE python_expression ]
[ WHERE python_expression [ [NOT] LIKE string] ]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I wouldn't change the description of the query structure. Let's assume we are adding a Python operator (and therefore it fits into a python_expression). We would then highlight this in the documentation.
  2. I would broaden the use of the LIKE to any python_expression. Example of a use of LIKE outside of the WHERE clause: SELECT 'error' if msg like 'error%' else 'OK'
  3. I would consider also adding ILIKE, which has the same behaviour as LIKE but it is case-insensitive

return clause

# Supports words containing [a-zA-Z0-9_\-]
expr_pattern = re.compile(r"([\w-]+)(?:\s+(NOT))?\s+LIKE\s+([\w-]+)", re.IGNORECASE)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this regex does not do the trick. It fails in cases like:

  • col1 + col2 LIKE 'constant%'
  • 'constant%' LIKE col1
  • func(col1) LIKE col1 + '%'

I think we have one of the following options:

  • do not implement a LIKE operator, but simple make available a function like(a, b). Instead of col1 LIKE 'constant%' we would write like(col1, 'constant%')
  • detect occurrences of LIKE and replace them in the query string by something like col1 | like_op | 'constant%'. like_op would be a class that overloads the or operator that does the LIKE magic: given 2 strings, parses the strings for detecting '%' and '_' does the comparison and returns a boolean. This would be a little more trouble and requires particular attention to operator prioritisation and a compete test suite. We would need 4 operators: LIKE, NOT LIKE, ILIKE, NOT ILIKE. Look here: https://stackoverflow.com/a/56739916/9522280

I am fine with both approaches. I would be happy on having the first at short-term and the second at longer term.

)

# Replacing SQL wildcard '%' for regex wildcard '.*' if not preceded by '\'
pattern = strings.put_strings_back(groups[2])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we only accept LIKE wildcards in the right side?
I think we can implement a single side LIKE because it might be simpler to implement. One option would be to test if the wildcards are on the left or right side and swap if needed, while raising an error when there are wildcards on both sides.


# Replacing SQL wildcard '%' for regex wildcard '.*' if not preceded by '\'
pattern = strings.put_strings_back(groups[2])
pattern = re.compile(r"(?<!\\)%").sub(r".*" , pattern)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice :-)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we first escape any regex special character/command? For instance col1 like '123.456.%' would accept '1233456oops' because of the meaning of . in regex patterns...

Comment on lines +12 to +13
def __iter__(self):
return iter(self.strings)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

@ricardocchaves
Copy link
Collaborator Author

As discussed offline:

  • I'll look into escaping regex characters that may appear innocently in the string
  • I'll implement the like function that takes two strings (whether they're constants, columns, etc.) and returns boolean. Will look into that stack overflow URL, overriding operators.

@dcmoura
Copy link
Owner

dcmoura commented Dec 9, 2022

Archiving due to lack of activity. Happy to reopen when you are ready!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

like function and regex function shortcut
2 participants