fix: Maintain SUM precision during two-phase aggregation
#17815
base: main
Conversation
Thank you for the fix. I have a question: the original error message (in the issue) is: Is it possible to make this comparison work, and keep the precision-increase behavior required by Spark? It looks simpler.
Hey @2010YOUY01 - yes, it's a valid operation. The issue in this case is that the optimizer rule inserts the new node after type coercion has already run. In your example that does not happen.
Yes, but in my opinion the precision should not increase in the first place. The precision should be equivalent under two-phase aggregation, so I think it's better not to change it. But happy to hear your thoughts.
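For context, the precision-increase behavior being debated follows the Spark-style rule that SUM over Decimal128(p, s) produces Decimal128(min(38, p + 10), s). The sketch below only illustrates that arithmetic; the helper name is made up for this example and is not part of the PR.

```rust
/// Illustrative helper (not part of this PR): the Spark-style widening rule
/// for decimal SUM, where Decimal128(p, s) sums into
/// Decimal128(min(38, p + 10), s).
fn sum_result_precision(input_precision: u8) -> u8 {
    // Decimal128 supports at most 38 digits of precision.
    std::cmp::min(38, input_precision.saturating_add(10))
}

fn main() {
    // A partial SUM over Decimal(10, 2) is emitted as Decimal(20, 2); the
    // final aggregation phase has to agree on that widened type, which is
    // the kind of mismatch this PR is about.
    assert_eq!(sum_result_precision(10), 20);
    // The widening is capped at the Decimal128 maximum of 38.
    assert_eq!(sum_result_precision(35), 38);
}
```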
The code that does the comparison is, I believe, in arrow-rs's cmp.rs. It requires exact type matching (I've seen issues here with Timestamp types as well).
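To make the exact-type-matching point concrete, here is a rough sketch (assuming arrow-rs's re-exported arrow::compute::kernels::cmp::eq kernel; the array values are made up): comparing two Decimal128 arrays whose precision differs is expected to fail even though the stored values are equal.

```rust
use arrow::array::Decimal128Array;
use arrow::compute::kernels::cmp::eq;

fn main() {
    // The same logical value (1.23) stored under two different decimal types:
    // Decimal128(10, 2) on one side, Decimal128(20, 2) on the other.
    let narrow = Decimal128Array::from(vec![123_i128])
        .with_precision_and_scale(10, 2)
        .unwrap();
    let wide = Decimal128Array::from(vec![123_i128])
        .with_precision_and_scale(20, 2)
        .unwrap();

    // The cmp kernels compare data types exactly, so this is expected to
    // return an error instead of evaluating to `true`.
    assert!(eq(&narrow, &wide).is_err());
}
```

If that holds, any place where the partial and final aggregate schemas disagree on decimal precision will surface as this kind of comparison or type error.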
Hi @2010YOUY01 - just bumping this PR!
A concern I have with this approach is that we might have to repeat this pattern for other functions that similarly alter the resultant decimal precision and scale. Off the top of my head,
Which issue does this PR close?
What changes are included in this PR?
Maintains the precision of SUMs during the logical optimizer rule SingleDistinctToGroupBy.
Are these changes tested?
Yes
Are there any user-facing changes?
Adds SumProperties