Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3779: [java] any big decimal conversion #2282

Merged
merged 4 commits into from Oct 9, 2023

Conversation

clesaec
Copy link
Contributor

@clesaec clesaec commented Jun 13, 2023

What is the purpose of the change

AVRO-3779 propose a new feature for big decimal conversion. Current decimal conversion require to anticipate precision and scale, and so, to know in advance approximative value of it.
This conversions, added to existing, allow to deal with completly random big decimal value. Value is store in byte buffer, completly non human readable, but keep the value.

Verifying this change

This change added tests 'TestBigDecimalConversion' and can be verified as follows:
Run junit new test, or run all unit test of java module

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not documented

@github-actions github-actions bot added the Java Pull Requests for Java binding label Jun 13, 2023
@opwvhk
Copy link
Contributor

opwvhk commented Jun 16, 2023

What I like: the scale is part of the value, giving greater flexibility when reading (schema resolution does not provide a widening conversion for logical types like decimal)

Personally, I would really like this to be documented in this PR.

@clesaec
Copy link
Contributor Author

clesaec commented Jun 16, 2023

thanks,
ok, i'll put it doc/content/en/docs/++version++/Specification/_index.md file

@clesaec
Copy link
Contributor Author

clesaec commented Jun 16, 2023

Doc added.

@martin-g
Copy link
Member

Just wondering: Could this be done with "type": "decimal" + "java-type": "java.math.BigDecimal ? Honestly I am not familiar with java-type.

I'm asking because the new "type": "big-decimal" looks too Java specific to me.
In Rust the closest I know of is https://crates.io/crates/rust_decimal. In Python I think it is https://docs.python.org/3/library/decimal.html, in C# - BigRational ?!, etc. If we are going to add something new to the Spec then we should try to make it as compatible as possible for most/all of the SDKs.

@clesaec
Copy link
Contributor Author

clesaec commented Jun 19, 2023

In rust, this big decimal definition is near the one of Java, as it contains constructor as
pub fn new(digits: BigInt, scale: i64) -> BigDecimal {

In C#, big rational seems even better, as it's directly a fraction of 2 big integer (so, support easily 10/3, which is not obvious in Java), but not directly compatible (but we can build seralizer / deserializer that can deal with this format).

i didn't look in other language, in any case, it should be possible to use the proposed stored format to a defined serialization (even if i first code it for Java).

For the moment, i didn't see logical type with "java-type" arguments; may be i could add "lang" arg ?
as "langs" : ["java", "rust"] (adding values when PR integrated) ? but, what to do if the logical type is not managed by the lang i used ?

@clesaec
Copy link
Contributor Author

clesaec commented Jun 21, 2023

this PR does same for rust BigDecimal


@Override
public String getLogicalTypeName() {
return "bigdecimal";
Copy link
Contributor

@opwvhk opwvhk Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name does not equals the one in the documentation and the tests. Can we please adjust so they're the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@clesaec clesaec requested a review from opwvhk September 18, 2023 06:24
@KalleOlaviNiemitalo
Copy link

For C#, the Apache.Avro library already defines an AvroDecimal type with a public AvroDecimal(BigInteger unscaledValue, int scale) constructor, so it doesn't need a new data type. AFAICT, the library does not support generating an Avro schema from C# types and thus would not need to recognize an attribute for controlling whether an property with type AvroDecimal maps to the fixed-scale "decimal" logical type or to a variable-scale decimal.

@clesaec
Copy link
Contributor Author

clesaec commented Oct 6, 2023

So, we can merge this and create a PR for C# that will map this big-decimal type with C# AvroDecimal type ?

@KalleOlaviNiemitalo
Copy link

Yes I think that would be okay.

However, I feel there should be a list of Avro spec features that are not yet supported in all languages. So that spec changes can be approved and merged without waiting for them to be implemented in all languages, but developers of schemas can avoid using features that might not be supported in languages used by their partners.

Comment on lines +813 to +826
**alternative**

As it's not always possible to fix scale and precision in advance for a decimal field, `big-decimal` is another `decimal` logical type restrict to Avro _bytes_.

_only available in Java_

```json
{
"type": "bytes",
"logicalType": "big-decimal"
}
```
Here, as scale property is stored in value itself it needs more bytes than preceding `decimal` type, but it allows more flexibility.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comments on the Java code, but I'd like to see the specification clarified a bit.

  • Separate the "big-decimal" description under its own ### Decimal (variable scale) heading, for these purposes:

    • Gives it an ID in HTML so that it can be linked to.
    • Makes it show up in the table of contents.
    • Makes it clear that the schema-resolution matching rule of "decimal" does not apply.
  • Describe how the decimal value is encoded. AFAICT, it goes like this:

    A big-decimal logical type annotates an Avro bytes type, which stores the binary encoding of the following record:

    {
        "type": "record",
        "name": "BigDecimal",
        "doc": "Avro 'big-decimal' value encoded within 'bytes'",
        "fields": [
            {
                "name": "unscaled",
                "type": "bytes",
                "doc": "The two’s-complement representation of the unscaled integer value in big-endian byte order"
            },
            {
                "name": "scale",
                "type": "int",
                "doc": "How many digits of the unscaled value lie at the right side of the decimal point.  Must not be negative."
            }
        ]
    }

    Or describe it in prose -- as long as the specification is clear enough that the logical type can be implemented without looking at how the Java code does it.

@KalleOlaviNiemitalo
Copy link

I don't intend this comment to delay the merging of this pull request.

So far, all Avro logical types annotate fixed, bytes, string, int, or long types. From #2282 (comment), it occurs to me that "big-decimal" might be reasonable to use with a record type, similar to a CBOR decimal fraction (tag number 4):

{
    "type": "record",
    "namespace": "apache.avro.test",
    "name": "RecordOfTwoBigDecimals",
    "fields": [
        {
            "name": "dec1",
            "type": {
                "type": "record",
                "name": "apache.avro.BigDecimal",
                "fields": [
                    {
                        "name": "unscaled",
                        "type": "bytes"
                    },
                    {
                        "name": "scale",
                        "type": "int"
                    }
                ],
                "logicalType": "big-decimal"
            }
        },
        {
            "name": "dec2",
            "type": "apache.avro.BigDecimal"
        }
    ]
}
  • Pro: Would make the binary encoding a little smaller, as the length field of the "bytes" value around the "big-decimal" logical value would be omitted.
  • Pro: This would cause the JSON encoding to have "unscaled" and "scale" as separate fields, making it easier to read.
  • Con: The value of the "unscaled" field in JSON could lose precision if a library reads it as a double (especially in JavaScript).
  • Con: Schemas using "big-decimal" would become more cumbersome to write, as the underlying record type would have to be defined. This could be mitigated by having the schema parser define "apache.avro.BigDecimal" implicitly if referenced but not locally defined; but if that causes the "apache.avro.BigDecimal" definition to become embedded in a schema that is uploaded to a schema registry service, then there is a risk that the schema might be considered a derivative work of Avro and require Apache legal notices to be included.
  • Con: Would require larger changes in the implementations. The C# library does not currently support a named schema having a logical type.

Overall, the difficulties of that approach seem to outweigh the benefits.

@clesaec
Copy link
Contributor Author

clesaec commented Oct 9, 2023

Java accept named schema to have logical type, but, specially with second Con, i agree with you.
I'll merge this PR and the rust one as there is no major opposition.

@KalleOlaviNiemitalo
Copy link

  • Con: The value of the "unscaled" field in JSON could lose precision if a library reads it as a double (especially in JavaScript).

That would not actually be a problem, because the type of the "unscaled" field would be "bytes", and its value would thus be encoded in JSON as a string rather than a number.

@clesaec clesaec merged commit 3ea027a into apache:main Oct 9, 2023
22 of 23 checks passed
@rotilho-bv
Copy link

Hey guys! Do you have an ETA for the release of this change by any chance?

@clesaec
Copy link
Contributor Author

clesaec commented Nov 14, 2023

I don't know when the 1.12.0 is planned for release, i can put this in 1.11 branch for the future 1.11.4 release

@rotilho-bv
Copy link

That would be awesome @clesaec !

@clesaec
Copy link
Contributor Author

clesaec commented Nov 15, 2023

PR ready to be reviewed @rotilho-bv

RanbirK pushed a commit to RanbirK/avro that referenced this pull request May 13, 2024
* AVRO-3779: any big decimal conversion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding website
Projects
None yet
5 participants