Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce convert Expr to SQL string API and basic feature #9517

Merged
merged 5 commits into from
Mar 11, 2024

Conversation

backkem
Copy link
Contributor

@backkem backkem commented Mar 9, 2024

Relates to #9495.

This is an initial attempt at porting over the SQL Unparser as described in the tracking ticket.

Open question: Do we need this to be feature complete before landing?

@github-actions github-actions bot added the sql label Mar 9, 2024
Copy link
Contributor

@devinjdangelo devinjdangelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is entirely new functionality, I think it is a good idea to merge this as a start so we can move further development to the DataFusion repo, where it will be easiest to collaborate with more interested parties.

@@ -0,0 +1,52 @@
pub trait Dialect {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is too bad we can't simply use the Dialect trait already in sqlparser... but it does appear that trait can't tell you the quote style directly.

Copy link
Contributor

@alamb alamb Mar 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we make a change upstream to sqlparser? I would be happy to review such a PR (or maybe just make it?) We could leave a comment in the PR to remove this trait when the new sqlparser cate was released

let r = self.expr_to_sql(right.as_ref())?;
let op = self.op_to_sql(op)?;

Ok(self.binary_op_to_sql(l, r, op))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found that we lose order of operations information here, but that wrapping the output in SQLExpr::Nested fixes it.

E.g.

x+1 / 2

is not the same as

(x+1)/2

I can open a follow up PR with some tests to show this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does DF represent this nesting explicitly as well? Or does it use the Expr hierarchy to capture the order?

If it's the latter, do we want to inject brackets for every operations or have a heuristic to only add them when mathematically needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on playing around with round trips (AST->Logical Plan->AST) I believe that Logical Plans encode operation ordering in the tree itself while AST represents the literal ordering in the SQL string. The easiest way to guarantee mathematical equivalence would be to always add SQLExpr::Nested. A heuristic could work but would be quite a bit more complex I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DF represents the nesting via Expr children

In the SQL parser, there are precident rules to resolve how a + b + c into a tree and then DataFusion simply gets the same tree structure

So the difference between (a + b) + c ad a + (b + c) is a function of what the left and right of the BinaryExpr are

I agree with @devinjdangelo that we should file a ticket to track this issue and work on in a subsequent PR

}
}

fn scalar_to_sql(&self, v: &ScalarValue) -> Result<ast::Value> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that some DataFusion Scalars will require mapping to a SQLExpr. For example for date scalars, I think the only way is to convert to something like CAST(quoted_string_val to DATE).

This is a private function though, so does not need to be resolved prior to merging imo.

#[test]
fn expr_to_sql_ok() -> Result<()> {
let tests: Vec<(Expr, &str)> = vec![
(col("a").gt(lit(4)), r#"a > 4"#),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tests look good to start. It might also be a good idea to have some tests that do a full round trip using SqltoRel, e.g. String -> AST -> LogicalPlan -> AST -> String and making sure the Strings and AST match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the round trip test would be good (and will make it quite mechanical to add coverage for the missing expressions)

Something like

round_trip("a"); // parse "a" to Expr and then turn back to string
round_trip("a + b");
...

I think we could do it as a follow on PR as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thank you @backkem and @devinjdangelo -- this looks great.

Open question: Do we need this to be feature complete before landing?

No

Since this is entirely new functionality, I think it is a good idea to merge this as a start so we can move further development to the DataFusion repo, where it will be easiest to collaborate with more interested parties.

I agree 100%

Typically what we do in DataFusion is incrementally develop features on main. The requirements are typically:

  1. The new code has test coverage
  2. It doesn't regress any existing functionality (aka make it impossible to do something that used to be possible)
  3. Any unimplemented features return DataFusionError::NotYetImplemented rather than, for example, panics

@alamb alamb changed the title feat: convert Expr to SQL string feat: Introduce convert Expr to SQL string API and basic eature Mar 10, 2024
@alamb alamb changed the title feat: Introduce convert Expr to SQL string API and basic eature feat: Introduce convert Expr to SQL string API and basic feature Mar 10, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @backkem and @devinjdangelo (again)!

I started the CI -- once that is clean I think this PR would be good to merge

Here are some follow on items that occur to me. How would you like to track them (perhaps I can file a ticket or we can put a list on the existing ticket)?

  • Upstream changes in datafusion/sql/src/unparser/dialect.rs to sqlparser-rs
  • Additional Expr variant support
  • Sorting out binary expr nesting
  • Add an example of how to use this API in expr_api.rs example
  • Round trip tests

@@ -0,0 +1,52 @@
pub trait Dialect {
Copy link
Contributor

@alamb alamb Mar 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we make a change upstream to sqlparser? I would be happy to review such a PR (or maybe just make it?) We could leave a comment in the PR to remove this trait when the new sqlparser cate was released

datafusion/sql/src/unparser/expr.rs Outdated Show resolved Hide resolved
datafusion/sql/src/unparser/expr.rs Outdated Show resolved Hide resolved
let r = self.expr_to_sql(right.as_ref())?;
let op = self.op_to_sql(op)?;

Ok(self.binary_op_to_sql(l, r, op))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DF represents the nesting via Expr children

In the SQL parser, there are precident rules to resolve how a + b + c into a tree and then DataFusion simply gets the same tree structure

So the difference between (a + b) + c ad a + (b + c) is a function of what the left and right of the BinaryExpr are

I agree with @devinjdangelo that we should file a ticket to track this issue and work on in a subsequent PR

#[test]
fn expr_to_sql_ok() -> Result<()> {
let tests: Vec<(Expr, &str)> = vec![
(col("a").gt(lit(4)), r#"a > 4"#),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the round trip test would be good (and will make it quite mechanical to add coverage for the missing expressions)

Something like

round_trip("a"); // parse "a" to Expr and then turn back to string
round_trip("a + b");
...

I think we could do it as a follow on PR as well

datafusion/sql/src/unparser/mod.rs Show resolved Hide resolved
@backkem
Copy link
Contributor Author

backkem commented Mar 11, 2024

I believe I addressed the points that we didn't decide to ticket out. The CI indicates that some workflow still requires approval from a maintainer

@backkem
Copy link
Contributor Author

backkem commented Mar 11, 2024

I see the CI fails on some dead code. I'll look into it. There is another CI failure but it seems related to a network timeout.

@alamb
Copy link
Contributor

alamb commented Mar 11, 2024

I added the follow on tasks to #9495. Thanks @backkem

@alamb alamb merged commit 02f7e1f into apache:main Mar 11, 2024
22 of 23 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 11, 2024

Onwards and upwards (and thank you for your first contribution @backkem )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants