LIKE clause #67
How can I get a syntax error like the following:
But for LIKE? Should I change this or can it be kept as is? Desired output / behavior:
Current:
Codecov Report
@@ Coverage Diff @@
## master #67 +/- ##
==========================================
+ Coverage 95.91% 95.98% +0.07%
==========================================
Files 10 10
Lines 1076 1096 +20
==========================================
+ Hits 1032 1052 +20
Misses 44 44
Thank you @ricardocchaves !
I think we need to change the approach... what you are trying to do seems too difficult... so, moving to a like function or a like pseudo-operator seems to be the way to go.
I also think that the LIKE function should be available in the SELECT and other clauses, and not limited to the WHERE clause.
Supporting `_` is important, but I would stop here. PostgreSQL, for instance, only supports `%` and `_`. We have regex for fancier stuff :-)
Let me know what you think!
@@ -93,7 +93,7 @@ Right now, the focus is on building a command-line tool that follows these core
 SELECT [ DISTINCT | PARTIALS ]
 [ * | python_expression [ AS output_column_name ] [, ...] ]
 [ FROM csv | spy | text | python_expression | json [ EXPLODE path ] ]
-[ WHERE python_expression ]
+[ WHERE python_expression [ [NOT] LIKE string] ]
- I wouldn't change the description of the query structure. Let's assume we are adding a Python operator (and therefore it fits into a `python_expression`). We would then highlight this in the documentation.
- I would broaden the use of `LIKE` to any `python_expression`. Example of a use of `LIKE` outside of the WHERE clause: `SELECT 'error' if msg like 'error%' else 'OK'`
- I would also consider adding `ILIKE`, which has the same behaviour as `LIKE` but is case-insensitive.
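To make the suggestion above concrete, here is a minimal sketch of what `like` and `ilike` could look like as plain Python functions usable in any expression. The names `like`, `ilike` and `_like_to_regex` are hypothetical and not taken from the PR, and the sketch assumes only `%` and `_` are wildcards and does not handle backslash-escaped wildcards.

```python
import re


def _like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern into regex source (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def like(value: str, pattern: str) -> bool:
    """Case-sensitive match of the whole value against the pattern."""
    return re.fullmatch(_like_to_regex(pattern), value, re.DOTALL) is not None


def ilike(value: str, pattern: str) -> bool:
    """Same as like(), but case-insensitive."""
    flags = re.DOTALL | re.IGNORECASE
    return re.fullmatch(_like_to_regex(pattern), value, flags) is not None


# Usage outside of a WHERE clause, mirroring the example in the comment.
msg = "error: disk full"
label = "error" if like(msg, "error%") else "OK"
```

Because these are ordinary functions, they would work in SELECT, WHERE, or anywhere else a Python expression is accepted, without touching the query grammar.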
return clause

# Supports words containing [a-zA-Z0-9_\-]
expr_pattern = re.compile(r"([\w-]+)(?:\s+(NOT))?\s+LIKE\s+([\w-]+)", re.IGNORECASE)
I think that this regex does not do the trick. It fails in cases like:
col1 + col2 LIKE 'constant%'
'constant%' LIKE col1
func(col1) LIKE col1 + '%'
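The failure mode can be reproduced in isolation. The sketch below applies the regex from the diff with `re.search` to raw query fragments; it assumes string literals have already been replaced by a bare identifier (here the made-up name `pattern_var`), since the project's actual placeholder mechanism is not shown in this conversation.

```python
import re

# Regex copied from the diff under review.
expr_pattern = re.compile(
    r"([\w-]+)(?:\s+(NOT))?\s+LIKE\s+([\w-]+)", re.IGNORECASE
)

# A compound left operand: only the last token before LIKE is captured.
m = expr_pattern.search("col1 + col2 LIKE pattern_var")
left_operand = m.group(1)

# A function call on the left: ')' is not in [\w-], so nothing matches.
no_match = expr_pattern.search("func(col1) LIKE pattern_var")
```

In the first case the captured left operand is `col2` rather than `col1 + col2`, and in the second case the expression is not recognised at all.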
I think we have one of the following options:
- do not implement a LIKE operator, but simply make available a function `like(a, b)`. Instead of `col1 LIKE 'constant%'` we would write `like(col1, 'constant%')`
- detect occurrences of `LIKE` and replace them in the query string by something like `col1 | like_op | 'constant%'`. `like_op` would be a class that overloads the `or` operator and does the `LIKE` magic: given 2 strings, it parses them to detect `'%'` and `'_'`, does the comparison, and returns a boolean. This would be a little more trouble and requires particular attention to operator prioritisation and a complete test suite. We would need 4 operators: LIKE, NOT LIKE, ILIKE, NOT ILIKE. Look here: https://stackoverflow.com/a/56739916/9522280
I am fine with both approaches. I would be happy to have the first in the short term and the second in the longer term.
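The second option, the `col1 | like_op | 'constant%'` rewrite, could be sketched as follows. This is only an illustration under assumptions: the class names `InfixOp` and `_BoundLeft` and the helper `_matches` are hypothetical, backslash-escaped wildcards are not handled, and the step that rewrites `LIKE` in the query string is left out.

```python
import re


def _matches(value, pattern, flags=0):
    """Hypothetical helper: '%' is any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags | re.DOTALL) is not None


class _BoundLeft:
    """Result of `left | op`; waits for the right operand."""

    def __init__(self, func, left):
        self.func, self.left = func, left

    def __or__(self, right):
        return self.func(self.left, right)


class InfixOp:
    """Pseudo-operator used as `left | op | right`."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # str has no __or__, so Python falls back to this reflected method.
        return _BoundLeft(self.func, left)


like_op = InfixOp(lambda a, b: _matches(a, b))
not_like_op = InfixOp(lambda a, b: not _matches(a, b))
ilike_op = InfixOp(lambda a, b: _matches(a, b, re.IGNORECASE))
not_ilike_op = InfixOp(lambda a, b: not _matches(a, b, re.IGNORECASE))

col1, col2 = "const", "ant value"
# '+' binds tighter than '|', so this is (col1 + col2) | like_op | "constant%".
result = col1 + col2 | like_op | "constant%"
```

On prioritisation: `|` binds more loosely than arithmetic operators but more tightly than comparisons and `not`/`and`/`or`, so operands that contain comparisons or boolean logic would need parentheses added by the rewrite step.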
)

# Replacing SQL wildcard '%' for regex wildcard '.*' if not preceded by '\'
pattern = strings.put_strings_back(groups[2])
Does this mean that we only accept LIKE wildcards in the right side?
I think we can implement a single side LIKE because it might be simpler to implement. One option would be to test if the wildcards are on the left or right side and swap if needed, while raising an error when there are wildcards on both sides.
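The swap-or-fail idea could look roughly like this. The function name `split_value_and_pattern` is hypothetical, and the sketch assumes `%` is the only wildcard (as in the PR at this point) and ignores backslash-escaped `%`.

```python
def split_value_and_pattern(left: str, right: str):
    """Return (value, pattern), swapping operands if the wildcard is on the left."""
    left_has = "%" in left
    right_has = "%" in right
    if left_has and right_has:
        # Ambiguous: we cannot tell which side is the pattern.
        raise ValueError("LIKE wildcards found on both sides")
    if left_has:
        return right, left
    return left, right
```

For example, `split_value_and_pattern("error%", msg)` and `split_value_and_pattern(msg, "error%")` would both yield `msg` as the value and `"error%"` as the pattern, provided `msg` itself contains no `%`.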
# Replacing SQL wildcard '%' for regex wildcard '.*' if not preceded by '\'
pattern = strings.put_strings_back(groups[2])
pattern = re.compile(r"(?<!\\)%").sub(r".*", pattern)
nice :-)
shouldn't we first escape any regex special character/command? For instance `col1 like '123.456.%'` would accept `'1233456oops'` because of the meaning of `.` in regex patterns...
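A small reproduction of this concern, plus one possible fix. `naive_like_regex` mirrors the substitution in the diff; `escaped_like_regex` is a hypothetical alternative that escapes each literal piece before joining with `.*`. For brevity the alternative does not preserve the backslash-escape handling of `%` from the original code.

```python
import re


def naive_like_regex(pattern: str) -> str:
    # Same substitution as in the diff: only unescaped '%' is translated.
    return re.sub(r"(?<!\\)%", ".*", pattern)


def escaped_like_regex(pattern: str) -> str:
    # Escape every literal piece, then join the pieces with '.*'.
    return ".*".join(re.escape(piece) for piece in pattern.split("%"))


like_pattern = "123.456.%"

# '.' is still a regex wildcard here, so this unexpectedly matches.
naive = re.fullmatch(naive_like_regex(like_pattern), "1233456oops")

# With escaping, '.' only matches a literal dot.
strict = re.fullmatch(escaped_like_regex(like_pattern), "1233456oops")
expected = re.fullmatch(escaped_like_regex(like_pattern), "123.456.789")
```

Here `naive` is a match object while `strict` is `None`, and `expected` still matches as intended.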
def __iter__(self):
    return iter(self.strings)
cool!
As discussed offline:
Archiving due to lack of activity. Happy to reopen when you are ready!
Adds a LIKE function to the WHERE clause for easier matching. Closes #17
Some notes
`%` wildcard operator.
Future work
Either in the scope of this PR or in a future one, it is still missing other basic matching functions, like `_` for single character matching or `[]` to specify a range.