Allow DISTINCT with ORDER BY and an aliased select list #5307
    .distinct()
    .unwrap()
    // try to sort on some value not present in input to distinct
    .sort(vec![col("c2").sort(true, true)])
This is a newly added test in #5258
I think the answer is wrong in this case - in particular you can see there are duplicate values of c1 produced.
    let df_results = plan.clone().collect().await?;

    #[rustfmt::skip]
This is a newly added test in #5258
If you look at the previous answers, they appear clearly incorrect to me: there are duplicate values of c1 produced. I updated the test to sort by c1, which avoids an error, and added a test that sorts by c2 and shows the error.
    vec![
        Arc::new(Int32Array::from_slice([1, 10, 10, 100])),
        Arc::new(Int32Array::from_slice([2, 12, 12, 120])),
        Arc::new(Int32Array::from_slice([2, 3, 4, 5])),
This is also a newly added test. Because the values of (a, b) were distinct wherever (a) was, it didn't show wrong results. If you change this data, it does show incorrect results prior to this PR.
I fixed the sort to be on a valid column and added a test for incorrect columns
cc @xiaoyong-z and @liukun4515

I would appreciate a review on this sooner rather than later because #5293 is effectively a regression for our users. Do you have time to take a look @xiaoyong-z ?

sorry @alamb I'm busy with my own things recently, and I don't think I'll have free time in the short term. I'm very sorry about this.

Thank you for letting me know @xiaoyong-z
stuartcarnie left a comment
Looks good @alamb. Only notable edit is a println! in the sort function.
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Benchmark runs are scheduled for baseline = a853123 and contender = 554852e. 554852e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
* Allow DISTINCT with ORDER BY and an aliased select list
* fix: update tests
* Update datafusion/core/tests/sqllogictests/test_files/order.slt
  Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
* Update datafusion/expr/src/logical_plan/builder.rs
  Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
* update test
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Which issue does this PR close?
Close #5293
Rationale for this change
resolves #5293
Basically the check added in #5132 and #5258 is overly strict in some cases
What changes are included in this PR?
Perform the check in `LogicalPlanBuilder` at the site where the problem is introduced (where new columns are added) rather than trying to figure it out beforehand.
Are these changes tested?
Yes, new unit tests as well as integration tests
Are there any user-facing changes?
Some queries that use DISTINCT with ORDER BY and an aliased select list, which were previously rejected, now pass.
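
To illustrate the duplicate c1 values mentioned in the review comments, here is a minimal standard-library sketch. It does not use the DataFusion API; the rows and the function names `sample_rows`, `distinct_c1_order_by_c1`, and `distinct_c1_order_by_c2_naive` are hypothetical. It assumes, for illustration only, that a sort column missing from the select list gets added to the input of the distinct step, which is one plausible way such duplicates can arise.

```rust
use std::collections::BTreeSet;

/// Hypothetical (c1, c2) rows; not data from the PR's tests.
fn sample_rows() -> Vec<(i32, i32)> {
    vec![(1, 30), (1, 10), (2, 20)]
}

/// Models `SELECT DISTINCT c1 ... ORDER BY c1`: the sort key is part of
/// the distinct output, so the result is well defined.
fn distinct_c1_order_by_c1(rows: &[(i32, i32)]) -> Vec<i32> {
    // BTreeSet deduplicates and iterates in ascending order.
    let set: BTreeSet<i32> = rows.iter().map(|&(c1, _)| c1).collect();
    set.into_iter().collect()
}

/// Models `SELECT DISTINCT c1 ... ORDER BY c2` under the ASSUMPTION that c2
/// is added to the distinct input so the sort can see it. Deduplication then
/// happens on (c1, c2) pairs, and c1 alone may repeat in the output.
fn distinct_c1_order_by_c2_naive(rows: &[(i32, i32)]) -> Vec<i32> {
    let set: BTreeSet<(i32, i32)> = rows.iter().copied().collect();
    let mut pairs: Vec<(i32, i32)> = set.into_iter().collect();
    pairs.sort_by_key(|&(_, c2)| c2);
    // Project c2 away after sorting, as a final projection would.
    pairs.into_iter().map(|(c1, _)| c1).collect()
}

fn main() {
    let rows = sample_rows();
    println!("{:?}", distinct_c1_order_by_c1(&rows)); // [1, 2]
    println!("{:?}", distinct_c1_order_by_c2_naive(&rows)); // [1, 2, 1]
}
```

In the second function the value 1 appears twice, which is why sorting on a column absent from the distinct output should be rejected, while sorting on a select-list column (including one referred to by its alias) is safe.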