Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete support for Expr --> String #9726

Closed
8 tasks done
alamb opened this issue Mar 21, 2024 · 29 comments · Fixed by #9805
Closed
8 tasks done

Complete support for Expr --> String #9726

alamb opened this issue Mar 21, 2024 · 29 comments · Fixed by #9805
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Mar 21, 2024

Part of #9494

Is your feature request related to a problem or challenge?

This ticket tracks the remaining work to complete #9495

As @devinjdangelo says in #9494 (comment)

We now have a solid foundation for converting Exprs --> SQL (see #9495 for why this is valuable).

It should now be straightforward to add support for the remaining Expr types and doing so is a great way to get more familiar with DataFusion's core data structures and optimization algorithms without already having expertise in database internals.

Describe the solution you'd like

The basic task is to:

  1. Pick one (or a few) of the expressions below
  2. Create a PR to Implement the Expr --> AST reverse code (in this match statement)
  3. Add a test (for example, here)

Here is the remaining list of exprs (just note on the ticket which you plan to work on)

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added enhancement New feature or request good first issue Good for newcomers labels Mar 21, 2024
@alamb alamb changed the title Implement support for Expr --> String Complete support for Expr --> String Mar 21, 2024
@Lordworms
Copy link
Contributor

take IsTrue, IsFalse, IsUnknown

@sebastian2296
Copy link
Contributor

I'd like to work on isNotNull, Not and Between

@yyy1000
Copy link
Contributor

yyy1000 commented Mar 23, 2024

take ScalarFunction and InList

@yyy1000
Copy link
Contributor

yyy1000 commented Mar 25, 2024

take Case

@Weijun-H
Copy link
Contributor

take Like

@kevinmingtarja
Copy link
Contributor

kevinmingtarja commented Mar 26, 2024

take Sort, Exists

@pratikpugalia
Copy link

pratikpugalia commented Mar 26, 2024

I'd like to work on Negative

@alamb alamb reopened this Apr 1, 2024
@alamb
Copy link
Contributor Author

alamb commented Apr 1, 2024

It looks like there are a few things not yet complete in https://github.com/apache/arrow-datafusion/blob/main/datafusion/sql/src/unparser/expr.rs#L52

  • Window functions
  • user defined functions
  • (maybe more)

@yyy1000
Copy link
Contributor

yyy1000 commented Apr 1, 2024

take SimilarTo, IsNotTrue, IsNotUnknown,Negative

@yyy1000
Copy link
Contributor

yyy1000 commented Apr 1, 2024

Here is a list that has not been done yet, for interested contributor to better view:

  • GetIndexedField
  • ScalarVariable
  • TryCast
  • Sort
  • Exists
  • Wildcard
  • GroupingSet
  • Placeholder
  • OuterReferenceColumn
  • Unnest
  • WindowFunction

@yyy1000
Copy link
Contributor

yyy1000 commented Apr 1, 2024

Also Aggregate Function only supports BuiltInAggregateFunction, and it didn't use filter field.
Maybe we should use AggregateExpressionWithFilter when there's a filter? See: https://github.com/sqlparser-rs/sqlparser-rs/blob/e747c9c2af08f4ea12e8d1692adf95998209e2a1/src/ast/mod.rs#L643-L649

@kevinmingtarja
Copy link
Contributor

I noticed that in the sqlparser-rs crate, OrderByExpr is not part of the sqlparser::ast::Expr enum (which is the return type for expr_to_sql), which is a problem when I was implementing this for Expr::Sort.

So I wanted to get some thoughts on how best to proceed, should we make a change in sqlparser-rs, or should we just create a new enum to encapsulate the two, or anything else.

@alamb
Copy link
Contributor Author

alamb commented Apr 2, 2024

I noticed that in the sqlparser-rs crate, OrderByExpr is not part of the sqlparser::ast::Expr enum (which is the return type for expr_to_sql), which is a problem when I was implementing this for Expr::Sort.

So I wanted to get some thoughts on how best to proceed, should we make a change in sqlparser-rs, or should we just create a new enum to encapsulate the two, or anything else.

I think when converting Expr::Sort to an sql::ast::Expr it should just return its inner Expr (and igore the ORDER BY information)

The ORDER BY information is required when turning the expression back into an entire query I think...

But at first just converting to sql::ast::Expr by ignoring the ORDER BY is probably correct in most cases

@devinjdangelo
Copy link
Contributor

Thank you everyone who has contributed so far 🙏. I just filed a PR to update datafusion-federation to point upstream for plan->SQL going forward datafusion-contrib/datafusion-federation#30. The additional expr implementations upstream here will enable datafusion-federation to handle many more queries than before.

@devanbenz
Copy link

devanbenz commented Apr 15, 2024

Here is a list that has not been done yet, for interested contributor to better view:

  • GetIndexedField
  • ScalarVariable
  • TryCast
  • Sort
  • Exists
  • Wildcard
  • GroupingSet
  • Placeholder
  • OuterReferenceColumn
  • Unnest
  • WindowFunction

Take ScalarVariable if this is still needed.

@alamb
Copy link
Contributor Author

alamb commented Apr 16, 2024

Thank you very much @devanbenz -- that would be awesome. Yes in general completing this list (or determining it is not possible) is very much needed

@devanbenz
Copy link

devanbenz commented Apr 23, 2024

Hi @devanbenz , I think ScalarVariable has not been done yet and you could have a try, though it may not be possible.🤔 I think maybe it should be converted to ScalarValue first, then call scalar_to_sql function.

Taking a look at this today. Will give it a try and update this thread with any thoughts, observations, or problems I run in to :)

@devanbenz
Copy link

Here are some thoughts to help people figure out what SQL matches to what expression:

  • GetIndexedField --> `SELECT col['field_name']
  • ScalarVariable --> Not sure (maybe @@foo@@?
  • TryCast * --> CAST (COL AS <datatype>>
  • Sort --> col ASC or col DESC NULLS FIRST
  • Exists --> col exists (SELECT 1 from foo)
  • Wildcard --> COUNT(*)
  • GroupingSet --> Not sure
  • Placeholder --> SELECT $1
  • OuterReferenceColumn --> not sure (I think this is the same as a column)
  • Unnest --> UNNEST(SELECT col FROM bar)
  • WindowFunction --> first_value(x ORDER BY y PARTITION BY z)

@alamb I beleive ScalarVariable would just be something like @variable.

@alamb
Copy link
Contributor Author

alamb commented Apr 23, 2024

Thanks @devanbenz !

@devanbenz
Copy link

devanbenz commented May 6, 2024

@alamb if you could please take a look at sqlparser-rs/sqlparser-rs#1260 -- it's linked to this issue :)

@alamb
Copy link
Contributor Author

alamb commented May 6, 2024

@alamb if you could please take a look at sqlparser-rs/sqlparser-rs#1260 -- it's linked to this issue :)

Thanks @devanbenz -- can you explain how this is related to this issue? (Sorry I am sure it is obvious but I can't keep everything in my head anymore )

@devanbenz
Copy link

@alamb if you could please take a look at sqlparser-rs/sqlparser-rs#1260 -- it's linked to this issue :)

Thanks @devanbenz -- can you explain how this is related to this issue? (Sorry I am sure it is obvious but I can't keep everything in my head anymore )

All good - this is for the Unparser expr_to_sql match statement

Expr::ScalarVariable(_, _) => {
I need to have a supported ast::Expr type for ScalarVariable from sql-parser. 👍

@alamb
Copy link
Contributor Author

alamb commented May 7, 2024

It seems to me that Expr::ScalarVariable currently is parsed from ast::Identifier (that starts with @)-- perhaps instead of adding a new ast node, you could just create an equivalent ast::Identifier 🤔

pub(super) fn sql_identifier_to_expr(
&self,
id: Ident,
schema: &DFSchema,
planner_context: &mut PlannerContext,
) -> Result<Expr> {
if id.value.starts_with('@') {
// TODO: figure out if ScalarVariables should be insensitive.
let var_names = vec![id.value];
let ty = self
.context_provider
.get_variable_type(&var_names)
.ok_or_else(|| {
plan_datafusion_err!("variable {var_names:?} has no type information")
})?;
Ok(Expr::ScalarVariable(ty, var_names))

@devanbenz
Copy link

It seems to me that Expr::ScalarVariable currently is parsed from ast::Identifier (that starts with @)-- perhaps instead of adding a new ast node, you could just create an equivalent ast::Identifier 🤔

pub(super) fn sql_identifier_to_expr(
&self,
id: Ident,
schema: &DFSchema,
planner_context: &mut PlannerContext,
) -> Result<Expr> {
if id.value.starts_with('@') {
// TODO: figure out if ScalarVariables should be insensitive.
let var_names = vec![id.value];
let ty = self
.context_provider
.get_variable_type(&var_names)
.ok_or_else(|| {
plan_datafusion_err!("variable {var_names:?} has no type information")
})?;
Ok(Expr::ScalarVariable(ty, var_names))

Sounds good - I'll have some time to come back to this in the next couple days! Will modify per your suggestion :)

@alamb
Copy link
Contributor Author

alamb commented May 15, 2024

I filed tickets for the remaining issues

These ones are likely straightforward.

The others are likely more involved as I don't know how to make a reproducer

@alamb
Copy link
Contributor Author

alamb commented May 20, 2024

With the completion of #10555 from @xinlifoobar I think this epic is now done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants