
Conversation

@rkrishn7
Contributor

Which issue does this PR close?

  • Closes: Unexpected failure in a subquery with a filter (SQLStorm)

What changes are included in this PR?

  • Allows configuration of the Sum aggregate UDF to maintain the input decimal precision (see the sketch below)
  • Rewrites SUMs during the SingleDistinctToGroupBy logical optimizer rule
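
As a minimal sketch of the idea (the field and function names below are illustrative assumptions, not the merged SumProperties API):

use arrow::datatypes::{DataType, DECIMAL128_MAX_PRECISION};

// Hypothetical shape of the new knob; the real SumProperties fields may differ.
#[derive(Debug, Clone, Copy, Default)]
pub struct SumProperties {
    // When true, sum(decimal(p, s)) keeps the input type instead of widening.
    pub preserve_decimal_precision: bool,
}

// Sketch of how Sum's return type could honor the property.
fn sum_return_type(input: &DataType, props: &SumProperties) -> DataType {
    match input {
        DataType::Decimal128(p, s) if props.preserve_decimal_precision => {
            DataType::Decimal128(*p, *s)
        }
        // Spark-style widening: precision grows by 10, capped at 38.
        DataType::Decimal128(p, s) => {
            DataType::Decimal128((*p + 10).min(DECIMAL128_MAX_PRECISION), *s)
        }
        other => other.clone(),
    }
}

fn main() {
    let input = DataType::Decimal128(25, 2);
    // Prints Decimal128(35, 2) (widened) and Decimal128(25, 2) (preserved).
    println!("{:?}", sum_return_type(&input, &SumProperties::default()));
    println!("{:?}", sum_return_type(&input, &SumProperties { preserve_decimal_precision: true }));
}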

Are these changes tested?

Yes

Are there any user-facing changes?

Adds SumProperties

@github-actions bot added the optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt)), and functions (Changes to functions implementation) labels on Sep 28, 2025
@2010YOUY01
Contributor

Thank you for the fix. I have a question:

The original error message (in the issue) is: Arrow error: Invalid argument error: Invalid comparison operation: Decimal128(35, 2) > Decimal128(25, 2)
This operation seems valid, though:

> CREATE TABLE orders (
    d1 decimal(25,2),
    d2 decimal(35,2));
0 row(s) fetched.
Elapsed 0.008 seconds.

> select d1 > d2 from orders;
+-----------------------+
| orders.d1 > orders.d2 |
+-----------------------+
+-----------------------+
0 row(s) fetched.
Elapsed 0.008 seconds.

Is it possible to make this comparison work, and keep the precision increase behavior required by Spark? It looks simpler.

@rkrishn7
Contributor Author

> The original error message (in the issue) is: Arrow error: Invalid argument error: Invalid comparison operation: Decimal128(35, 2) > Decimal128(25, 2). This operation seems valid.

Hey @2010YOUY01 - yes, it's a valid operation. The issue here is that the optimizer rule inserts the new comparison node after type coercion has already run, so the mismatched decimal types are never coerced to a common type. In your example the expression goes through normal planning, so coercion happens as expected.

> Is it possible to make this comparison work, and keep the precision increase behavior required by Spark? It looks simpler.

Yes, but in my opinion the precision should not increase in the first place. The result precision should be the same under two-phase aggregation, so I think it's better not to change it. But happy to hear your thoughts.
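
For a concrete illustration of the two-phase concern (assuming Spark-style widening, where sum over decimal(p, s) returns decimal(min(38, p + 10), s)): a single-phase sum over decimal(25, 2) yields decimal(35, 2), but in a two-phase plan the partial phase already produces decimal(35, 2), and a final phase that widened again would report decimal(38, 2). The output type would then depend on how the aggregate happens to be planned, which keeping the precision fixed avoids.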

@Omega359
Contributor

Omega359 commented Oct 9, 2025

I believe the code that does the comparison is in arrow-rs's cmp.rs. It requires exact type matching (I've seen similar issues with Timestamp types as well).
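
To illustrate, a minimal arrow-rs sketch of that exact-type requirement (the sample values and the explicit cast are ours for illustration, not code from this PR):

use arrow::array::{ArrayRef, Decimal128Array};
use arrow::compute::{cast, kernels::cmp::gt};
use arrow::datatypes::DataType;
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    // 1.00 and 2.50 stored as unscaled i128 values with scale 2.
    let d1 = Decimal128Array::from(vec![Some(100_i128)])
        .with_precision_and_scale(25, 2)?;
    let d2 = Decimal128Array::from(vec![Some(250_i128)])
        .with_precision_and_scale(35, 2)?;

    // The cmp kernels require exactly matching types, so this errors with
    // "Invalid comparison operation: Decimal128(35, 2) > Decimal128(25, 2)".
    assert!(gt(&d2, &d1).is_err());

    // Casting both sides to a common type first (what the type coercion
    // pass normally arranges in the plan) makes the comparison work.
    let d1_wide: ArrayRef = cast(&d1, &DataType::Decimal128(35, 2))?;
    let result = gt(&d2, &d1_wide)?;
    println!("{result:?}"); // BooleanArray [true]
    Ok(())
}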

@rkrishn7
Contributor Author

Hi @2010YOUY01 - just bumping this PR!

@Jefffrey
Contributor

Jefffrey commented Nov 5, 2025

A concern I have with this approach is that we might have to repeat this pattern for other functions that similarly alter the resulting decimal precision and scale. Off the top of my head, avg() might run into the same issue?
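
(If memory serves, Spark widens avg over decimal(p, s) to decimal(p + 4, s + 4), capped at 38, so avg's result type also differs from its input type and the same rewrite would presumably be needed there.)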

Development

Successfully merging this pull request may close these issues:

  • Unexpected failure in a subquery with a filter (SQLStorm)

4 participants