[SPARK-49095][SQL] Update DecimalType and decimal fields compatibility logic of Avro data source to avoid loss of decimal precision #47584
Conversation
Changed the title from "DecimalType compatible logic of Avro data source to avoid loss of decimal precision" to "DecimalType and Decimal compatible logic of Avro data source to avoid loss of decimal precision".
cc @cloud-fan @LuciferYang when you have time :-)
Can we put more information into …
Updated it.
@wayneguow This seems to be a breaking change, since previously working queries will now throw exceptions. I would recommend adding a config (disabled by default) and migration docs.
Changed the title from "DecimalType and Decimal compatible logic of Avro data source to avoid loss of decimal precision" to "DecimalType and decimal fields compatible logic of Avro data source to avoid loss of decimal precision".
@allisonwang-db Thanks for your suggestions. I added a new legacy config so that users can restore the old behavior. Also cc @cloud-fan
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR aims to enhance the comparison logic so that reads through the Avro data source do not lose precision in the decimal part of Decimal values. It follows the related logic for converting the Decimal type in the Parquet data source (#44513).

Before: as long as the integer part (precision - scale) fits, the value can be converted.

After: both the integer part (precision - scale) and the decimal part (scale) of the requested type must be at least as large as those of the Avro type, as illustrated in the sketch below.
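The stricter check can be summarized by the following Scala sketch; isCompatible and its parameter names are illustrative and not the actual Spark implementation.

```scala
import org.apache.spark.sql.types.DecimalType

// Illustrative sketch of the stricter compatibility check (names are hypothetical).
// `sparkType` is the requested Spark DecimalType; `avroPrecision`/`avroScale`
// describe the Avro decimal logical type being read.
def isCompatible(sparkType: DecimalType, avroPrecision: Int, avroScale: Int): Boolean = {
  // Before this PR, only the integer part was checked:
  //   sparkType.precision - sparkType.scale >= avroPrecision - avroScale
  // After this PR, the decimal part (scale) must also be wide enough.
  sparkType.scale >= avroScale &&
  sparkType.precision - sparkType.scale >= avroPrecision - avroScale
}

// Examples from this PR's description:
// isCompatible(DecimalType(5, 3), 12, 10)   => false (3 < 10), so the read now fails
// isCompatible(DecimalType(15, 13), 12, 10) => true  (13 >= 10 and 2 >= 2)
```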
Why are the changes needed?
This fixes an issue that could cause a silent loss of data precision.
Does this PR introduce any user-facing change?
Yes, there are stricter matching requirements for conversions between Spark DecimalType and the Avro decimal type.

Previously, decimal(12,10) -> decimal(5,3) was allowed, but there could be some loss of precision in the decimal part, such as 13.1234567890 -> 13.123.
After this change, the exception avroIncompatibleReadError will be thrown, because 3 is not greater than or equal to 10. The conversion is allowed only when both the integer and the decimal parts are greater than or equal to the source's; for example, decimal(15, 13) is OK because 13 >= 10 and 15 - 13 >= 12 - 10.

But users can restore the legacy behavior by setting
spark.sql.legacy.avro.allowIncompatibleDecimalType to true, as shown in the sketch below.
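For example, a minimal sketch assuming an active SparkSession named spark; the file path, column name, and user-specified schema are hypothetical:

```scala
// Restore the pre-change behavior that silently truncates the decimal part on read.
// (Path, column name, and schema below are hypothetical.)
spark.conf.set("spark.sql.legacy.avro.allowIncompatibleDecimalType", "true")

val df = spark.read
  .format("avro")
  .schema("amount DECIMAL(5, 3)")   // narrower than the Avro field's decimal(12, 10)
  .load("/tmp/avro/decimals")

// With the flag disabled (the new default), the same read fails with
// avroIncompatibleReadError; with it enabled, 13.1234567890 is read back as 13.123.
```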
How was this patch tested?
Passes GA; new test cases were added.
Was this patch authored or co-authored using generative AI tooling?
No.