Conversation
This commit brings back the "|" symbol for pipeline separator while retaining the "|>" as an option. The latter is needed to exit a SQL expression back into a pipe when a SQL expression is immediately followed by a shortcut that begins with an identifier (implied-yield expression or agg function). For any valid query without shortcuts, there is never an ambiguity between the bitwise OR "|" and pipe separator because keywords cannot be used as identifiers. The argument that such ambiguity is present made in the Google SQL pipes paper presumably assumes a one-token lookahead parser, which is not the case here. Toward this end, we changed our grammar to disallow keywords as identifier and added the backtick-string syntax to escape any string as identifier. Furthermore, we arranged for the expression grammar to include "|" for bitwise OR in SQL expression but omit it in any non-SQL pipe operator. This means any valid SQL queries will continue to be valid SuperSQL queries (inclusive of bitwise OR). Moreover, expressions in pipeline operators do not have any pipe-character ambiguity with shortcuts because bitwise-OR is omitted. We can add bitwise functionality as named functions in place of the archaic C-style syntax (which can work in both SQL and pipeline operators). Best practice will be to use these functions over the old bitwise syntax. We also removed search syntax as a shortcut and now require a "?" (or "search" keyword) to signal the keyword search operator. This means typos that happen to compile into unintended searches are no longer accepted by the grammar thereby eliminating the so-called search rake.
Ok as disccused, I'm pushing these changes to remove bitwise-OR from the SQL expression syntax, thus removing the ambiguity with the SQL shortcuts. If/when users want to run SuperSQL queries on legacy SQL with bitwise OR, we can put this grammar back in place but under a flag to enable it only as needed. In the meantime, wew ill introduce bitwise_or(), bitwise_and(), etc named functions in subsequent PR.
nwt
reviewed
Nov 5, 2024
nwt
approved these changes
Nov 5, 2024
This was referenced Dec 21, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This commit brings back the "|" symbol for pipeline separator while retaining the "|>" as an option. The latter is needed to exit a SQL expression back into a pipe when a SQL expression is immediately followed by a shortcut that begins with an identifier (implied-yield expression or agg function).
For any valid query without shortcuts, there is never an ambiguity between the bitwise OR "|" and pipe separator because keywords cannot be used as identifiers. The argument that such ambiguity is present made in the Google SQL pipes paper presumably assumes a one-token lookahead parser, which is not the case here.
Toward this end, we changed our grammar to disallow keywords as identifier and added the backtick-string syntax to escape any string as identifier.
We also removed search syntax as a shortcut and now require a "?" (or "search" keyword) to signal the keyword search operator. This means typos that happen to compile into unintended searches are no longer accepted by the grammar thereby eliminating the so-called search rake.
Fixes #4585
Closes #4490