Skip to content

Conversation

@eddyxu
Copy link
Member

@eddyxu eddyxu commented Feb 25, 2023

Support filters like

SELECT * FROM t WHERE col1 == 2 AND col2 == 'tag';

@AugustoFKL
Copy link
Contributor

@eddyxu can you please add to the PR description:

  • An example of a query with the syntax
  • Which dialect (preferably with a link of your reference doc) are you implementing this for.

Thanks :)

@xudong963
Copy link
Member

+1, I'm curious about the usage scenario. Can't a single equal sign meet the demand?

@eddyxu
Copy link
Member Author

eddyxu commented Feb 28, 2023

Sure thing @AugustoFKL and @xudong963

We were trying to implement predicate pushdown from pyarrow to arrow-rs in Lance, and use DataFusion as the execution engine. From pyarrow side, the filter is implemented as a pyarrow.compute.Expression. Because there is no 1-to-1 mapping in the arrow-rs project of Expression, i was trying to serialize the filter to string, and pass them to the rust side.

python3
Python 3.10.10 (main, Feb 16 2023, 02:49:39) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.compute as pc
>>> filter_expr = pa.compute.field("a") == 10
>>> str(filter_expr)
'(a == 10)'

I dont have a particular Dialect in mind actually, but I saw that DoubleEq already exists in this project but there seems be no way to parse it (for example, i tried GenericDialect, PostgresDialect and HiveDialect (which seems to be in the same PR when DoubleEq was introduced from git blame). Open to suggestions about how to approach it .

@ankrgyl
Copy link
Contributor

ankrgyl commented Feb 28, 2023

@eddyxu @AugustoFKL the only database I'm aware of that supports == is DuckDB (which is generally quite malleable with syntax). That said, I think it'll be an uphill battle to use sqlparser to support all expressions from pyarrow.compute.Expression's string representation, and it's not technically a SQL dialect.

Have you thought about writing something like pyarrow.compute.Expression.to_sql() that walks through the expression tree and generates SQL-compatible expressions?

@eddyxu
Copy link
Member Author

eddyxu commented Mar 1, 2023

@ankrgyl that's fair. sqlparser-rs does have the support for == token (https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/tokenizer.rs#L75-L76). git blame seems to indicate it was introduced in #235 (Hive QL).

Putting aside of the dialect / pyarrow support, this project has the "==" token, and token => LogicalExpr conversion (https://github.com/sqlparser-rs/sqlparser-rs/blob/0c0d088ec21d7f3883165d36c34300745816c5c7/src/parser.rs#L1603), but it lacks the string to token parsing for this syntax. I wonder whether this could be useful for HiveQL

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

@ankrgyl
Copy link
Contributor

ankrgyl commented Mar 1, 2023

That's a good point — @AugustoFKL are you comfortable given that == is valid for Hive?

@coveralls
Copy link

Pull Request Test Coverage Report for Build 4268364167

  • 12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 86.151%

Totals Coverage Status
Change from base Build 4216963554: 0.01%
Covered Lines: 13412
Relevant Lines: 15568

💛 - Coveralls

@alamb
Copy link
Contributor

alamb commented Mar 1, 2023

Open to suggestions about how to approach it .

Another think you might do (it is a bit of a hack) might be able to do is to modify the raw token stream directly to replace == with =:

So like following the model of
https://docs.rs/sqlparser/0.30.0/src/sqlparser/parser.rs.html#284-289

Something like this (untested)

        let mut tokenizer = Tokenizer::new(dialect, sql);
        let tokens = tokenizer.tokenize()?
           .into_iterator()
           .map(|tok| { 
             if tok == Token::DoubleEq {
               Token::Eq
             } else {
               tok
             })
        .collect();
        
        let parser = Parser::new(dialect).with_tokens(tokens)

🤔

@alamb alamb marked this pull request as draft March 6, 2023 14:53
@alamb
Copy link
Contributor

alamb commented Mar 6, 2023

Marking as draft as this PR has gotten comments and I think is waiting for us to figure out what the right next steps are (it is not waiting for more reviews).

@eddyxu
Copy link
Member Author

eddyxu commented Mar 6, 2023

Another think you might do (it is a bit of a hack) might be able to do is to modify the raw token stream directly to replace == with =:

@alamb IIUC, this was meant to build the == parsing out of the source from sqlparser-rs, right? I could definitely hack on my side following the suggestion if the community thinks this PR is controversial.

@alamb
Copy link
Contributor

alamb commented Mar 7, 2023

IIUC, this was meant to build the == parsing out of the source from sqlparser-rs, right? I could definitely hack on my side following the suggestion if the community thinks this PR is controversial.

Yes, I think the intention was to build == parsing outside of sqlparser-rs

I think the concern about parsing == as equals is that it isn't SQL (at least not that we have seen) so it probably doesn't belong in a SQL parser crate

@eddyxu
Copy link
Member Author

eddyxu commented Mar 7, 2023

It is fair. i will close it.

@eddyxu eddyxu closed this Mar 7, 2023
@eddyxu
Copy link
Member Author

eddyxu commented Mar 24, 2023

@alamb tested the solution you suggested,

        let mut tokenizer = Tokenizer::new(dialect, sql);
        let tokens = tokenizer.tokenize()?
           .into_iterator()
           .map(|tok| { 
             if tok == Token::DoubleEq {
               Token::Eq
             } else {
               tok
             })
        .collect();

Which does not work because it lacks the capability to parse to Token ==. The token::DoubleEq does exist in the codebase and used in the Hive dialect (a Hive-related PR (#235) introduced the token DoubleEq in this repo) . It is just impossible to tokenize that from a SQL string at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants