[SPARK-49095][SQL] Update DecimalType and decimal fields compatibility logic of Avro data source to avoid loss of decimal precision #47584
Conversation
Changed the title from "DecimalType compatible logic of Avro data source to avoid loss of decimal precision" to "DecimalType and Decimal compatible logic of Avro data source to avoid loss of decimal precision".
cc @cloud-fan @LuciferYang when you have time :-)
Can we put more information into …
Updated it.
@wayneguow This seems to be a breaking change, since previously working queries will now throw exceptions. I would recommend adding a config (disabled by default) and migration docs.
Changed the title from "DecimalType and Decimal compatible logic of Avro data source to avoid loss of decimal precision" to "DecimalType and decimal fields compatible logic of Avro data source to avoid loss of decimal precision".
@allisonwang-db Thanks for your suggestions. I added a new legacy config so that users can restore the old behavior. Also cc @cloud-fan
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR aims to enhance the comparison logic so that reads through the Avro data source do not lose precision in the decimal part of Decimal values. It follows the related logic for converting the Decimal type in the Parquet data source (#44513).

Before: as long as the integer part (precision - scale) fits, the value can be converted.

After: both the integer part (precision - scale) and the decimal part (scale) of the requested type must be at least as large as those of the Avro type, as illustrated in the sketch below.
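The stricter check can be summarized by the following Scala sketch; isCompatible and its parameter names are illustrative and not the actual Spark implementation.

```scala
import org.apache.spark.sql.types.DecimalType

// Illustrative sketch of the stricter compatibility check (names are hypothetical).
// `sparkType` is the requested Spark DecimalType; `avroPrecision`/`avroScale`
// describe the Avro decimal logical type being read.
def isCompatible(sparkType: DecimalType, avroPrecision: Int, avroScale: Int): Boolean = {
  // Before this PR, only the integer part was checked:
  //   sparkType.precision - sparkType.scale >= avroPrecision - avroScale
  // After this PR, the decimal part (scale) must also be wide enough.
  sparkType.scale >= avroScale &&
  sparkType.precision - sparkType.scale >= avroPrecision - avroScale
}

// Examples from this PR's description:
// isCompatible(DecimalType(5, 3), 12, 10)   => false (3 < 10), so the read now fails
// isCompatible(DecimalType(15, 13), 12, 10) => true  (13 >= 10 and 2 >= 2)
```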
Why are the changes needed?
This fixes an issue that could cause a silent loss of data precision.
Does this PR introduce any user-facing change?
Yes, there are stricter matching requirements for conversions between Spark DecimalType and the Avro decimal type.

Previously, decimal(12,10) -> decimal(5,3) was allowed, but there could be some loss of precision in the decimal part, such as 13.1234567890 -> 13.123.
After this change, the exception avroIncompatibleReadError will be thrown, because 3 is not greater than or equal to 10. The conversion is allowed only when both the integer and the decimal parts are greater than or equal to the source's; for example, decimal(15, 13) is OK because 13 >= 10 and 15 - 13 >= 12 - 10.

But users can restore the legacy behavior by setting
spark.sql.legacy.avro.allowIncompatibleDecimalType to true, as shown in the sketch below.
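For example, a minimal sketch assuming an active SparkSession named spark; the file path, column name, and user-specified schema are hypothetical:

```scala
// Restore the pre-change behavior that silently truncates the decimal part on read.
// (Path, column name, and schema below are hypothetical.)
spark.conf.set("spark.sql.legacy.avro.allowIncompatibleDecimalType", "true")

val df = spark.read
  .format("avro")
  .schema("amount DECIMAL(5, 3)")   // narrower than the Avro field's decimal(12, 10)
  .load("/tmp/avro/decimals")

// With the flag disabled (the new default), the same read fails with
// avroIncompatibleReadError; with it enabled, 13.1234567890 is read back as 13.123.
```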
How was this patch tested?
Passes GA; new test cases were added.
Was this patch authored or co-authored using generative AI tooling?
No.