RFC: Simplify decimal (#2440) #2477

tustvold · 2022-08-17T10:46:39Z

Which issue does this PR close?

Rationale for this change

Whilst reviewing #2439 I felt something was a bit off; this is my quick attempt to simplify things. If we are going to do this, we should probably get it in before the release on Thursday.

What changes are included in this PR?

Rather than using const generics, we use named type structs. This is the same approach we use elsewhere in the codebase, e.g. with Int32Type.
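As a rough illustration of the named-type approach (not the code in this PR), the sketch below parameterises an array over marker structs instead of a byte-width constant. The trait members `BYTE_WIDTH` and `MAX_PRECISION`, the `from_bytes` constructor, and the simplified `DecimalArray` layout are assumptions made for the example.

```rust
use std::marker::PhantomData;

/// Hypothetical trait describing one decimal flavour.
/// The members here are assumptions for illustration.
pub trait DecimalType {
    const BYTE_WIDTH: usize;
    const MAX_PRECISION: u8;
}

/// Named marker types, analogous to `Int32Type`.
pub struct Decimal128Type;
pub struct Decimal256Type;

impl DecimalType for Decimal128Type {
    const BYTE_WIDTH: usize = 16;
    const MAX_PRECISION: u8 = 38;
}

impl DecimalType for Decimal256Type {
    const BYTE_WIDTH: usize = 32;
    const MAX_PRECISION: u8 = 76;
}

/// Simplified stand-in for the real array: a flat byte buffer
/// holding fixed-width values.
pub struct DecimalArray<T: DecimalType> {
    values: Vec<u8>,
    _phantom: PhantomData<T>,
}

impl<T: DecimalType> DecimalArray<T> {
    /// Hypothetical constructor: rejects buffers whose length is not
    /// a multiple of the value width.
    pub fn from_bytes(values: Vec<u8>) -> Option<Self> {
        if values.len() % T::BYTE_WIDTH != 0 {
            return None;
        }
        Some(Self { values, _phantom: PhantomData })
    }

    /// Number of decimal values stored in the buffer.
    pub fn len(&self) -> usize {
        self.values.len() / T::BYTE_WIDTH
    }
}

pub type Decimal128Array = DecimalArray<Decimal128Type>;
pub type Decimal256Array = DecimalArray<Decimal256Type>;
```

With this shape, only types that implement `DecimalType` can be plugged in, so an arbitrary width such as `1` cannot even be written down.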

Are there any user-facing changes?

Yes, this changes the low-level details of how decimal arrays are handled.

tustvold · 2022-08-17T10:48:26Z

arrow/src/array/iterator.rs

-    `DecimalIter` iterates `Decimal128` values as i128 values. \
-    This is kept mostly for back-compatibility purpose. Suggests to use `Decimal128Array.iter()` \
-    that returns `Decimal128Iter`.")]
-pub struct DecimalIter<'a> {


I wanted to reuse this name, as BasicDecimal is confusing. It isn't vital that we remove it, but I think it is better not to be stuck with a stale name.

tustvold · 2022-08-17T10:49:23Z

arrow/src/util/decimal.rs

 }

-impl<const BYTE_WIDTH: usize> BasicDecimal<BYTE_WIDTH> {
-    #[allow(clippy::type_complexity)]
-    const MAX_PRECISION_SCALE_CONSTRUCTOR_DEFAULT_TYPE: (


This was the main thing that felt a bit off: we constrain the permitted constants here, but then also seal the trait in #2439. The whole thing just felt a little esoteric and hard to follow.

The weird thing I find is that the Rust compiler evaluates these constants lazily. For example,

BasicDecimal::<1>::try_new(...);

causes a compiler error, "invalid byte length", because the constants defined in the impl BasicDecimal are used in the try_new function. However, a hacker can run

BasicDecimal::<1>::new(...);

with neither a compiler error nor a runtime error, because this method uses none of the constants, so the compiler never evaluates them.

In short, a hacker can successfully use an invalid decimal type as long as they never touch the constants defined in the impl BasicDecimal.
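The behaviour described above can be reproduced with a small stand-alone sketch. This is not the arrow code: the struct is reduced to a unit type, and `MAX_PRECISION`, `max_precision` and `build_invalid` are names invented for the example.

```rust
/// Reduced stand-in for the const-generic decimal type.
pub struct BasicDecimal<const BYTE_WIDTH: usize>;

impl<const BYTE_WIDTH: usize> BasicDecimal<BYTE_WIDTH> {
    /// Evaluated for a given BYTE_WIDTH only when code that is
    /// actually instantiated refers to it.
    const MAX_PRECISION: usize = match BYTE_WIDTH {
        16 => 38,
        32 => 76,
        _ => panic!("invalid byte length"),
    };

    /// Does not mention MAX_PRECISION, so it compiles for any width.
    pub fn new() -> Self {
        Self
    }

    /// Mentions MAX_PRECISION, which forces its evaluation for the
    /// width this function is instantiated with.
    pub fn max_precision() -> usize {
        Self::MAX_PRECISION
    }
}

/// Compiles and runs even though a width of 1 is meaningless,
/// because nothing here touches the constant.
pub fn build_invalid() -> BasicDecimal<1> {
    BasicDecimal::<1>::new()
}

// Uncommenting the function below turns the invalid width into a
// compile-time error, because the constant is now evaluated:
//
// pub fn rejected() -> usize {
//     BasicDecimal::<1>::max_precision()
// }
```

So the check guards only the methods that happen to reference the constant, not the type itself.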

Agreed, this is part of what makes me feel that perhaps using const generics in this way doesn't provide the best UX

I think the truth is that the Rust compiler compiles all generic types lazily.
It only compiles the types and methods that are actually used, to avoid large binary output.

HaoYang670 · 2022-08-17T13:12:41Z

Personally, I think what makes it hard to implement Decimal elegantly is that its behavior sits somewhere between what a trait bound models and what a generic type models:

Trait (Different structure, same behaviour).
/\
 |
\/
Decimal128/256  (very similar structure, but not similar behaviour)  
/\
 |
\/
Generic Type (similar structure, similar behaviour)

So far, the consensus is that whichever option we choose, we have to implement some part of Decimal128 and Decimal256 separately (although macros can help to simplify this), for example the validation function. Currently, Decimal128 is bound to i128 and Decimal256 is bound to [u8; 32] or BigInt.

tustvold · 2022-08-17T13:25:06Z

> we have to implement some part of Decimal128 and Decimal256 separately

Perhaps, although the lower you push the differences, the more code can be shared. To ground this concretely in what this PR does: the validation logic could be placed on DecimalType, with everything else generic. To put it another way, this PR doesn't lose any flexibility compared with const generics, but is simpler and more easily extensible.

> Currently, Decimal128 is bound to i128 and Decimal256 is bound to [u8; 32] or BigInt.

This is actually a perfect example of why concrete types are a better fit than const generics, imo: the concrete binding type could be expressed as an associated type. This PR doesn't currently do this, as it doesn't need to, but we could easily.
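A hedged sketch of what that could look like follows. As noted, the PR itself does not do this; the associated type `Native`, the `validate` method and the `check` helper are all hypothetical names, and the 256-bit validation is deliberately left incomplete.

```rust
/// Hypothetical trait: the native representation is an associated
/// type, and the type-specific validation lives on the trait.
pub trait DecimalType {
    type Native;
    const MAX_PRECISION: u8;

    fn validate(value: &Self::Native, precision: u8) -> Result<(), String>;
}

pub struct Decimal128Type;
pub struct Decimal256Type;

impl DecimalType for Decimal128Type {
    type Native = i128;
    const MAX_PRECISION: u8 = 38;

    fn validate(value: &i128, precision: u8) -> Result<(), String> {
        if precision == 0 || precision > Self::MAX_PRECISION {
            return Err(format!("invalid precision {precision}"));
        }
        // 10^38 still fits in an i128, so this cannot overflow.
        let bound = 10_i128.pow(u32::from(precision));
        if *value > -bound && *value < bound {
            Ok(())
        } else {
            Err(format!("{value} does not fit in precision {precision}"))
        }
    }
}

impl DecimalType for Decimal256Type {
    type Native = [u8; 32];
    const MAX_PRECISION: u8 = 76;

    fn validate(_value: &[u8; 32], precision: u8) -> Result<(), String> {
        // This sketch only checks the precision range; a real
        // implementation would also compare the value to 10^precision.
        if precision == 0 || precision > Self::MAX_PRECISION {
            return Err(format!("invalid precision {precision}"));
        }
        Ok(())
    }
}

/// Generic code can stay agnostic of the native representation.
pub fn check<T: DecimalType>(value: &T::Native, precision: u8) -> bool {
    T::validate(value, precision).is_ok()
}
```

Only `validate` differs per type here; anything written against `T: DecimalType` is shared.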

HaoYang670 · 2022-08-17T13:26:59Z

Thank you, @tustvold, I am studying your implementation carefully 😁.

alamb

I think this PR makes a lot of sense to me, but I haven't spent much time in the DecimalArray area and I don't really have a strong use case, personally or professionally, for this code.

Thus I would defer to @viirya, @liukun4515 and @HaoYang670, who have worked in this area more recently or have more direct use cases.

alamb · 2022-08-17T16:14:59Z

arrow/src/array/array_decimal.rs

@@ -68,24 +71,24 @@ use crate::util::decimal::{BasicDecimal, Decimal256};
 ///    assert_eq!(6, decimal_array.scale());
 /// ```
 ///
-pub type Decimal128Array = BasicDecimalArray<16>;
+pub type Decimal128Array = DecimalArray<Decimal128Type>;


This is certainly more consistent with the other Array types.

liukun4515 · 2022-08-18T09:19:01Z

I will review this later today.

liukun4515 · 2022-08-18T13:17:49Z

arrow/src/datatypes/types.rs

@@ -455,6 +459,68 @@ impl Date64Type {
    }
 }

+mod private {


👍, this method is used to seal the decimal data type.
Other types can't implement DecimalType.

Is this a suggestion or a statement?

Just a statement, not a suggestion.
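For readers unfamiliar with the pattern under discussion, here is a minimal sketch of sealing a trait through a private module. The `Sealed` trait name, the `BYTE_WIDTH` member and the `byte_width` helper are assumptions for the example; only the `mod private` opening is visible in the diff above.

```rust
mod private {
    /// `Sealed` is public, but its parent module is private, so code
    /// outside this crate cannot name it.
    pub trait Sealed {}
}

/// Downstream crates cannot implement this trait, because they
/// cannot implement the `Sealed` supertrait.
pub trait DecimalType: private::Sealed {
    const BYTE_WIDTH: usize;
}

pub struct Decimal128Type;
pub struct Decimal256Type;

impl private::Sealed for Decimal128Type {}
impl private::Sealed for Decimal256Type {}

impl DecimalType for Decimal128Type {
    const BYTE_WIDTH: usize = 16;
}

impl DecimalType for Decimal256Type {
    const BYTE_WIDTH: usize = 32;
}

/// Generic code can rely on the set of implementors being closed.
pub fn byte_width<T: DecimalType>() -> usize {
    T::BYTE_WIDTH
}
```

A downstream `impl DecimalType for MyType` fails to compile because `MyType` cannot satisfy the `Sealed` bound.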

tustvold · 2022-08-18T13:37:20Z

arrow/src/array/mod.rs

 pub use self::array_decimal::Decimal128Array;
 pub use self::array_decimal::Decimal256Array;
+pub use self::array_decimal::DecimalArray;


No, this is the generic representation of the array, akin to PrimitiveArray.

liukun4515

LGTM
It is a great PR.
Thanks, @tustvold

liukun4515 · 2022-08-18T13:47:23Z

Once this is merged, I can continue the work on #2357.

HaoYang670

LGTM.
I couldn't find a better way to implement Decimal, but you have done it, @tustvold. Amazing work.

Just a nit.
Maybe we could add more docs to explain the relationship between Decimal, DecimalType, Decimal128Type, Decimal256Type and NativeDecimalType, so that other developers can take less time to understand the code.

viirya

This looks like a more consistent approach, in line with existing types like PrimitiveArray. Nice move.

tustvold · 2022-08-18T16:42:52Z

Thank you all for reviewing 😄

ursabot · 2022-08-18T16:52:06Z

Benchmark runs are scheduled for baseline = e60eef3 and contender = 15f42b2. 15f42b2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Simplify decimal (apache#2440)

4d3ed30

tustvold requested review from viirya and liukun4515 and removed request for viirya August 17, 2022 10:46

tustvold mentioned this pull request Aug 17, 2022

Seal the decimal type. #2439

Closed

github-actions bot added the arrow Changes to the arrow crate label Aug 17, 2022

tustvold requested review from viirya and alamb August 17, 2022 10:47

tustvold commented Aug 17, 2022

Format

559f7c1

Fix doc

b151c14

tustvold requested a review from HaoYang670 August 17, 2022 14:28

alamb reviewed Aug 17, 2022

liukun4515 reviewed Aug 18, 2022

tustvold commented Aug 18, 2022

liukun4515 approved these changes Aug 18, 2022

HaoYang670 approved these changes Aug 18, 2022

viirya approved these changes Aug 18, 2022

viirya reviewed Aug 18, 2022

tustvold added 4 commits August 18, 2022 16:48

Review feedback

f4db98f

Merge remote-tracking branch 'upstream/master' into simplify-decimal

2917bd7

Add docs

dfa9092

Fix logical merge conflict

02d120c

tustvold merged commit 15f42b2 into apache:master Aug 18, 2022

tustvold mentioned this pull request Sep 2, 2022

Replace DecimalArray with PrimitiveArray #2637

Closed
