Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL #10625

Merged
merged 5 commits into from
May 23, 2024

Conversation

phillipleblanc
Copy link
Contributor

@phillipleblanc phillipleblanc commented May 22, 2024

Which issue does this PR close?

Closes #10624

Rationale for this change

When unparsing LogicalPlans back to SQL statements for the MySQL dialect, any plan that includes a Sort node will produce a SQL statement that looks like:

SELECT "ta"."j1_id" FROM "j1" AS "ta" ORDER BY "ta"."j1_id" ASC NULLS LAST LIMIT 10
                                                                ^--------^              

The NULLS LAST causes issues when run against a MySQL server, as it doesn't recognize that syntax.

What changes are included in this PR?

Adds a new function to the Unparser Dialect called supports_nulls_first_in_sort which is default true, but set to false for the MySQL dialect. And in the relevant unparsing code, we set the AST node to None.

Are these changes tested?

Yes, added a new test to cover dialect-specific behavior.

Are there any user-facing changes?

No breaking changes.

@phillipleblanc phillipleblanc changed the title Phillip/240523 fix mysql order by Omit NULLS FIRST/LAST when deparsing ORDER BY clauses for MySQL May 22, 2024
@github-actions github-actions bot added the sql SQL Planner label May 22, 2024
@phillipleblanc
Copy link
Contributor Author

cc @backkem @devinjdangelo to take a look

@phillipleblanc phillipleblanc changed the title Omit NULLS FIRST/LAST when deparsing ORDER BY clauses for MySQL Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL May 22, 2024
Copy link
Contributor

@backkem backkem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we'll have to remember to upstream this to the SQLParser package.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phillipleblanc and @backkem

I wonder if we should update the comments in Dialect so that we can upstream the trait once we have some more experience with what features are needed in DataFusion 🤔

@phillipleblanc
Copy link
Contributor Author

phillipleblanc commented May 23, 2024

I wonder if we should update the comments in Dialect so that we can upstream the trait once we have some more experience with what features are needed in DataFusion 🤔

I can see the argument both for upstreaming and for keeping them separate. Although the argument for upstreaming is stronger imo.

The argument for upstreaming is: We want a single Dialect trait that handles concerns from both parsing and unparsing. That centralizes the logic and makes it so that there is just "one" place to encode dialect-specific differences. 👍 Makes perfect sense.

The argument for keeping them separate is: By putting in logic to sqlparser-rs that only DataFusion cares about (or more generally, systems that want to handle unparsing) - we set a precedent that "I need this for my use-case, so now everyone is forced to have it even though they don't use it", which I'm not 100% happy with (although I do think its very minor). It would make more sense if the Dialect was taken into consideration by sqlparser-rs when writing out the AST to String, but its only considered on parsing.

I'll update the comment for Dialect.

edit: I realized after posting this, it does make sense to throw an error when parsing a MySQL dialect if the string contains NULLS FIRST, so it does make sense to include this in the sqlparser-rs Dialect. I'll create a PR to sqlparser-rs for it.

edit 2: Here is the sqlparser-rs PR: sqlparser-rs/sqlparser-rs#1284

edit 3: I closed that PR, see below discussion

@backkem
Copy link
Contributor

backkem commented May 23, 2024

Also, I think it's the idea to move the entire unparser to the SQLParser package eventually.

@alamb
Copy link
Contributor

alamb commented May 23, 2024

Also, I think it's the idea to move the entire unparser to the SQLParser package eventually.

I think this may be hard, as the Unparser relies in Exprs (which are a DataFusion struct)

@phillipleblanc
Copy link
Contributor Author

phillipleblanc commented May 23, 2024

I closed the upstream sqlparser-rs PR for now, since it is more of a semantic issue vs a syntax issue (thanks @alamb for pointing that out). But now we're back to square one on my dilemma of whether it makes sense to upstream a dialect behavior that isn't used by the parser library itself. I'll think about this some more - also would like to get people's feedback.

So for now I'll just remove the comment on the Unparser Dialect linking to the now closed PR and we can revisit later.

@alamb alamb merged commit d2f6faf into apache:main May 23, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented May 23, 2024

Thanks @phillipleblanc and @backkem

phillipleblanc added a commit to spiceai/datafusion that referenced this pull request May 29, 2024
…he#10625)

* Omit `NULLS FIRST/LAST` when deparsing ORDER BY clauses for MySQL

* Add test and fix sort_to_sql

* Fix link for cargo doc

* Remove reference to sqlparser-rs PR
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…he#10625)

* Omit `NULLS FIRST/LAST` when deparsing ORDER BY clauses for MySQL

* Add test and fix sort_to_sql

* Fix link for cargo doc

* Remove reference to sqlparser-rs PR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MySQL doesn't support the NULLS FIRST/LAST clause in ORDER BY statements
3 participants