Fix wrong projection 'optimization' #268
Nice catch! Though looks like removing the optimisation breaks a crapload of logical plan text assertions in the tests that now contain an extra top-level projection node :(
Could we retain the optimisation but only apply it when expr consists entirely of simple column refs? Not sure if that'd patch up all the tests or not.
Edit: or we could just modify the test assertions, I suppose, if we want to keep the initial planning process simple and do this kind of optimisation later?
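A minimal sketch of the guard suggested above, assuming a toy expression model rather than DataFusion's actual types. The names `Column`, `Alias`, `Add`, and `is_trivial_projection` are hypothetical and exist only for illustration. The projection counts as removable only when every expression is a bare column reference that matches the input columns in order.

```python
from dataclasses import dataclass


# Toy expression nodes (hypothetical, not DataFusion's Expr enum).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Alias:
    expr: object
    name: str


def is_trivial_projection(exprs, input_columns):
    """Return True only if exprs are bare column refs equal to the input, in order."""
    return len(exprs) == len(input_columns) and all(
        isinstance(e, Column) and e.name == c
        for e, c in zip(exprs, input_columns)
    )


# SELECT a, b FROM t: nothing is computed, so the projection is redundant.
plain = [Column("a"), Column("b")]

# SELECT a + b AS a, b FROM t: same output names, but "a" is recomputed.
computed = [Alias(Add(Column("a"), Column("b")), "a"), Column("b")]
```

With this check, `plain` would qualify for removal against input columns `["a", "b"]`, while `computed` would not, even though both produce the same output field names.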
Thanks @returnString! I also think the optimization doesn't improve performance much, since a projection consisting of just the fields should be almost "free" in DataFusion. I will open an issue to add the optimization as a rule.
+1 for moving this to a separate optimization rule instead of doing it within the SQL planner.
Codecov Report
@@ Coverage Diff @@
## master #268 +/- ##
==========================================
- Coverage 76.21% 76.20% -0.01%
==========================================
Files 140 140
Lines 23553 23556 +3
==========================================
+ Hits 17950 17952 +2
- Misses 5603 5604 +1
Merging this, as it seems to be an uncontroversial change and a correctness improvement. I am adding an issue for bringing back an optimization that removes unnecessary projections.
Thank you for this @Dandandan 👍
Which issue does this PR close?
Closes #264
Rationale for this change
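The step that plans SQL into a logical plan had a wrong rule: whenever the schema field names of the projection and its input were equal, it applied an "optimization" that removed the projection, even if the projected expressions differed from the input columns.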
I found this while working on `SELECT DISTINCT`. If we want an optimization pass like this, it should check whether the expressions are equal instead, and I think it would be better to implement it as a real optimization rule rather than applying it while building the logical plan from the query.
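To illustrate why comparing field names is not enough, here is a small sketch under stated assumptions: rows are plain dictionaries, a projection is a list of `(output_name, function)` pairs, and `plan_with_name_check` is a hypothetical stand-in for the flawed rule, not the planner's real code. The query modeled is `SELECT a + 1 AS a FROM t`, chosen only as an example.

```python
def apply_projection(rows, projection):
    """Evaluate every (name, fn) pair against each input row."""
    return [{name: fn(row) for name, fn in projection} for row in rows]


def plan_with_name_check(rows, input_names, projection):
    """Flawed rule: drop the projection whenever output names equal input names."""
    output_names = [name for name, _ in projection]
    if output_names == input_names:
        return rows  # projection skipped, expressions never evaluated
    return apply_projection(rows, projection)


rows = [{"a": 1}, {"a": 2}]

# SELECT a + 1 AS a FROM t: the output field is still called "a".
projection = [("a", lambda row: row["a"] + 1)]

flawed = plan_with_name_check(rows, ["a"], projection)
correct = apply_projection(rows, projection)
```

Here `flawed` is the unchanged input, because the names match and the projection is dropped, while `correct` holds the incremented values. A rule that compared expressions rather than names would keep the projection in this case.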
What changes are included in this PR?
Remove the "optimization" from the SQL planner, so the projection is always kept in the logical plan.
Are there any user-facing changes?
No.