
Conversation

@rkrishn7
Contributor

Which issue does this PR close?

  • Closes: Unexpected failure in a subquery with a filter (SQLStorm)

What changes are included in this PR?

  • Allows configuration of the Sum aggregate UDF to maintain the input decimal precision (see the sketch below)
  • Rewrites SUMs during the SingleDistinctToGroupBy logical optimizer rule
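
As a minimal sketch of the idea (the field and function names below are illustrative assumptions, not the merged SumProperties API):

use arrow::datatypes::{DataType, DECIMAL128_MAX_PRECISION};

// Hypothetical shape of the new knob; the real SumProperties fields may differ.
#[derive(Debug, Clone, Copy, Default)]
pub struct SumProperties {
    // When true, sum(decimal(p, s)) keeps the input type instead of widening.
    pub preserve_decimal_precision: bool,
}

// Sketch of how Sum's return type could honor the property.
fn sum_return_type(input: &DataType, props: &SumProperties) -> DataType {
    match input {
        DataType::Decimal128(p, s) if props.preserve_decimal_precision => {
            DataType::Decimal128(*p, *s)
        }
        // Spark-style widening: precision grows by 10, capped at 38.
        DataType::Decimal128(p, s) => {
            DataType::Decimal128((*p + 10).min(DECIMAL128_MAX_PRECISION), *s)
        }
        other => other.clone(),
    }
}

fn main() {
    let input = DataType::Decimal128(25, 2);
    // Prints Decimal128(35, 2) (widened) and Decimal128(25, 2) (preserved).
    println!("{:?}", sum_return_type(&input, &SumProperties::default()));
    println!("{:?}", sum_return_type(&input, &SumProperties { preserve_decimal_precision: true }));
}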

Are these changes tested?

Yes

Are there any user-facing changes?

Adds SumProperties

@github-actions bot added the optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt)), and functions (Changes to functions implementation) labels on Sep 28, 2025
@2010YOUY01
Contributor

Thank you for the fix. I have a question:

The original error message (in the issue) is: Arrow error: Invalid argument error: Invalid comparison operation: Decimal128(35, 2) > Decimal128(25, 2)
This operation seems valid, though:

> CREATE TABLE orders (
    d1 decimal(25,2),
    d2 decimal(35,2));
0 row(s) fetched.
Elapsed 0.008 seconds.

> select d1 > d2 from orders;
+-----------------------+
| orders.d1 > orders.d2 |
+-----------------------+
+-----------------------+
0 row(s) fetched.
Elapsed 0.008 seconds.

Is it possible to make this comparison work, and keep the precision increase behavior required by Spark? It looks simpler.

@rkrishn7
Contributor Author

> The original error message (in the issue) is: Arrow error: Invalid argument error: Invalid comparison operation: Decimal128(35, 2) > Decimal128(25, 2). This operation seems valid.

Hey @2010YOUY01 - yes, it's a valid operation. The issue here is that the optimizer rule inserts the new comparison node after type coercion has already run, so the mismatched decimal types are never coerced to a common type. In your example the expression goes through normal planning, so coercion happens as expected.

> Is it possible to make this comparison work, and keep the precision increase behavior required by Spark? It looks simpler.

Yes, but in my opinion the precision should not increase in the first place. The result precision should be the same under two-phase aggregation, so I think it's better not to change it. But happy to hear your thoughts.
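
For a concrete illustration of the two-phase concern (assuming Spark-style widening, where sum over decimal(p, s) returns decimal(min(38, p + 10), s)): a single-phase sum over decimal(25, 2) yields decimal(35, 2), but in a two-phase plan the partial phase already produces decimal(35, 2), and a final phase that widened again would report decimal(38, 2). The output type would then depend on how the aggregate happens to be planned, which keeping the precision fixed avoids.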

@Omega359
Contributor

Omega359 commented Oct 9, 2025

I believe the code that does the comparison is in arrow-rs's cmp.rs. It requires exact type matching (I've seen similar issues with Timestamp types as well).
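
To illustrate, a minimal arrow-rs sketch of that exact-type requirement (the sample values and the explicit cast are ours for illustration, not code from this PR):

use arrow::array::{ArrayRef, Decimal128Array};
use arrow::compute::{cast, kernels::cmp::gt};
use arrow::datatypes::DataType;
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    // 1.00 and 2.50 stored as unscaled i128 values with scale 2.
    let d1 = Decimal128Array::from(vec![Some(100_i128)])
        .with_precision_and_scale(25, 2)?;
    let d2 = Decimal128Array::from(vec![Some(250_i128)])
        .with_precision_and_scale(35, 2)?;

    // The cmp kernels require exactly matching types, so this errors with
    // "Invalid comparison operation: Decimal128(35, 2) > Decimal128(25, 2)".
    assert!(gt(&d2, &d1).is_err());

    // Casting both sides to a common type first (what the type coercion
    // pass normally arranges in the plan) makes the comparison work.
    let d1_wide: ArrayRef = cast(&d1, &DataType::Decimal128(35, 2))?;
    let result = gt(&d2, &d1_wide)?;
    println!("{result:?}"); // BooleanArray [true]
    Ok(())
}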

@rkrishn7
Contributor Author

Hi @2010YOUY01 - just bumping this PR!

@Jefffrey
Contributor

Jefffrey commented Nov 5, 2025

A concern I have with this approach is that we might have to repeat this pattern for other functions that similarly alter the resulting decimal precision and scale. Off the top of my head, avg() might run into the same issue?
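
(If memory serves, Spark widens avg over decimal(p, s) to decimal(p + 4, s + 4), capped at 38, so avg's result type also differs from its input type and the same rewrite would presumably be needed there.)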

Development

Successfully merging this pull request may close these issues:

  • Unexpected failure in a subquery with a filter (SQLStorm)

4 participants