
[CALCITE-3866] "numeric field overflow" when running the generated SQL in PostgreSQL #1867

Merged: 1 commit merged into apache:master on May 9, 2020

Conversation

wenhuitang (Contributor)

Pull request for https://issues.apache.org/jira/browse/CALCITE-3866
As for the aggregate function SUM, its return type is usually equal to its operand's type, but if the operand's type has a precision, the precision of the result may be larger than the operand's. So maybe we can set the precision to the maximum value when the operand's type has a precision, especially for the DECIMAL type.
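
A minimal sketch of the idea (not the actual patch; the method name inferSumType and the nullability handling are illustrative), using Calcite's type factory to widen a DECIMAL operand to the type system's maximum numeric precision:

```java
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlTypeName;

// Illustrative only: when inferring SUM's return type, widen a DECIMAL
// operand to the type system's maximum numeric precision, so that SUM over
// a DECIMAL(p, s) column cannot overflow the declared result type.
static RelDataType inferSumType(RelDataTypeFactory typeFactory,
    RelDataType operandType) {
  if (operandType.getSqlTypeName() == SqlTypeName.DECIMAL) {
    int maxPrecision = typeFactory.getTypeSystem().getMaxNumericPrecision();
    RelDataType widened = typeFactory.createSqlType(
        SqlTypeName.DECIMAL, maxPrecision, operandType.getScale());
    return typeFactory.createTypeWithNullability(widened,
        operandType.isNullable());
  }
  return operandType; // non-decimal operands keep their own type
}
```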

XuQianJin-Stars (Contributor)

LGTM +1

hsyuan (Member) left a comment


+1

hsyuan added the LGTM-will-merge-soon label ("Overall PR looks OK. Only minor things left.") on May 4, 2020
DonnyZone (Contributor)

Hi @wenhuitang, could you please rebase and resolve the conflict? We can merge it into 1.23.0.
Moreover, I suggest simplifying the title and commit message to:
"numeric field overflow" when running the generated SQL in PostgreSQL

DonnyZone (Contributor) commented May 7, 2020

I investigated the return type of the SUM function in Hive [1] and Spark [2]. It seems they both compute the precision as Math.min(MAX_PRECISION, inputPrecision + 10). Maybe we can align with them.
@wenhuitang @XuQianJin-Stars @hsyuan WDYT?

[1] https://github.com/apache/hive/blob/54b87999fa0c23f9902faae609e7441e0693a22b/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java#L247

[2] https://github.com/apache/spark/blob/272d229005b7166ab83bbb8f44a4d5e9d89424a1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L56

wenhuitang changed the title from "[CALCITE-3866] ReturnTypes.AGG_SUM may cause "numeric field overflow" on PostgreSQL when generate the sql after using the rule AggregateJoinTransposeRule.EXTENDED" to "[CALCITE-3866] "numeric field overflow" when running the generated SQL in PostgreSQL" on May 9, 2020

wenhuitang (Contributor, Author)

Thanks a lot. I have rebased this PR. As for computing the precision as Math.min(MAX_PRECISION, inputPrecision + 10), I did not find any theory or standard behind it. To avoid the problem completely, it would be OK to use the maximum precision.
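
For comparison, a hedged sketch of the two candidate rules discussed above (MAX_PRECISION and inputPrecision are placeholder names; 38 is the DECIMAL precision limit in Hive and Spark, whereas Calcite's default type system uses a lower maximum):

```java
// Placeholder constant; in Hive/Spark MAX_PRECISION is 38 for DECIMAL.
static final int MAX_PRECISION = 38;

// Hive/Spark rule: widen by 10 digits, capped at the maximum precision.
// The extra 10 digits leave headroom for summing roughly 10^10 rows.
static int hiveSparkSumPrecision(int inputPrecision) {
  return Math.min(MAX_PRECISION, inputPrecision + 10);
}

// Rule adopted in this PR: always use the maximum precision, which rules
// out "numeric field overflow" entirely at the cost of a wider declared type.
static int maxSumPrecision(int inputPrecision) {
  return MAX_PRECISION;
}
```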

DonnyZone (Contributor)

LGTM!

DonnyZone merged commit e081c5b into apache:master on May 9, 2020