Replace DecimalArray with PrimitiveArray #2637
Comments
@tustvold Do you mean that you want to implement the
But what's the
|
Something like that, or we remove it. I suspect there will be some churn in making it work |
In arrow2, Decimal128 is based on |
use Do you have any thoughts about this, @viirya? I know you implemented most of the decimal256 features and refactored parts of decimal. |
What I found in arrow2 is that it defines its own Not sure if it is a good approach, but it gives us some inspiration. |
Imo it would make sense to implement the relevant arithmetic, etc... traits for Edit: oh wait that won't work as Decimal has additional fields... Darn... |
I had a play around implementing this and it appears to be very tractable. What I did is as follows:
This appears to work quite well, and I'm fairly happy this will achieve the stated aims of massively simplifying the decimal implementation, making it more consistent with other types such as timestamps, and providing an easy path forward to getting full kernel coverage. There are, however, a couple of major implications of this change:
If people are happy with this path forward, I will polish up this approach over the coming week, but I'm loath to work on this further if the consensus isn't there. Even better would be if someone is willing to help out with this effort 😄? Thoughts @alamb @liukun4515 @HaoYang670 @kmitchener @viirya? |
Thank you @tustvold -- from my perspective it sounds like a good plan and I would be willing to help update datafusion to use this new approach. That being said, I don't feel my vote should have a huge bearing on the matter as I don't use the decimal type in our projects. I would be very interested to hear @liukun4515's and @viirya's opinions on the matter |
In order to do unified processing, I suggest using [u8; 16] to represent the i128, as is done for the i256. We can implement operations such as comparison and arithmetic on the u8 array.
Using the The decimal data types are infinite and can't be enumerated. We can't enumerate all decimal types the way we can timestamps, which have a limited set of time units. How to determine the This is the key point for me.
I think we don't need to consider operations with different scales. If the
I personally still want to implement the decimal data type on its own and not combine it with the primitive data types. |
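As a rough illustration of the [u8; 16] suggestion above, the following is a minimal sketch of a signed comparison performed directly on the byte array. It assumes little-endian, two's-complement storage; `cmp_signed_le` and `cmp_native` are hypothetical names used only for this example and are not part of arrow-rs.

```rust
use std::cmp::Ordering;

/// Compare two 128-bit signed integers stored as little-endian,
/// two's-complement byte arrays, without converting to i128.
/// (Hypothetical helper, for illustration only.)
fn cmp_signed_le(a: &[u8; 16], b: &[u8; 16]) -> Ordering {
    // The sign bit lives in the most significant byte, which is the last one.
    let a_neg = a[15] & 0x80 != 0;
    let b_neg = b[15] & 0x80 != 0;
    match (a_neg, b_neg) {
        (true, false) => Ordering::Less,
        (false, true) => Ordering::Greater,
        // Same sign: an unsigned comparison from the most significant byte
        // down gives the correct two's-complement ordering.
        _ => a.iter().rev().cmp(b.iter().rev()),
    }
}

/// The same comparison via the native type, for reference.
fn cmp_native(a: &[u8; 16], b: &[u8; 16]) -> Ordering {
    i128::from_le_bytes(*a).cmp(&i128::from_le_bytes(*b))
}
```

Both functions should agree on every input; the byte-wise version simply shows what the comparison looks like when the native integer type is not used.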
Thank you, @tustvold, for your perseverance on the decimal implementation. Personally, I prefer the idea of implementing our own And, I agree with implementing |
Hi @liukun4515, why are you concerned about the countability of Decimal? How does it impact the implementation? |
Conceptually, I think DecimalArray should be treated as PrimitiveArray, so this sounds like a good idea to me. In C++ Arrow, Decimal128Array and Decimal256Array are both PrimitiveArray too, so such a change also makes for a more consistent Arrow implementation. Another big benefit is that this can simplify the decimal kernels, and from the user's perspective the APIs, including kernels, become more consistent. Overall I'd vote for this direction. As we will use decimal types, kernels, etc. in our project, I will be sure to help if we decide to move in this direction. Even if we don't go for this change, I have been planning to complement decimal support. The concern from my side, if any, is significant API change. There will be many breaking changes. It also seems we need to hold off on the effort to add decimal-related kernels to this crate (there seems to be a PR doing that). I think this is a good plan for the long term, although in the short term there will be pain from the change. |
I don't know enough to express an opinion about the implementation details, but I'm happy to see that it seems there's agreement on a path forward toward a better decimal implementation. |
The first part of this is in #2781, PTAL. I think this also serves to highlight why I don't wish to try to make decimal256 and decimal128 use the same generic implementation for arithmetic: the performance of BigInt is ~25x worse than using fixed-width native types such as i128, as it performs allocations and branching. I would instead propose only paying this cost for the rather esoteric decimal256, and leaving the door open to potentially optimising this in future. |
I think this is now complete. There is still work to flesh out support for decimals in the arithmetic, comparison, etc... kernels, but that can be tracked separately |
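To make the point about fixed-width native types concrete, here is a minimal sketch of checked decimal addition on plain i128 values. It is not the arrow-rs kernel: `max_for_precision` and `add_checked` are hypothetical helpers, and the sketch assumes both operands are unscaled integers sharing the same scale.

```rust
/// Largest unscaled value that fits in `precision` decimal digits.
/// Returns `None` if 10^precision does not fit in an i128.
/// (Hypothetical helper, for illustration only.)
fn max_for_precision(precision: u32) -> Option<i128> {
    10_i128.checked_pow(precision).map(|p| p - 1)
}

/// Add two unscaled decimal values that share the same scale.
/// Returns `None` on i128 overflow or if the result needs more
/// than `precision` digits. No heap allocation is involved.
fn add_checked(a: i128, b: i128, precision: u32) -> Option<i128> {
    let sum = a.checked_add(b)?;
    let max = max_for_precision(precision)?;
    if sum > max || sum < -max {
        None
    } else {
        Some(sum)
    }
}
```

Everything here is a handful of native integer operations, which is the property the comment above contrasts with an arbitrary-precision BigInt.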
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Following #2477 we now have `Decimal128Type` and `Decimal256Type`. I think it should be possible to make these `impl ArrowPrimitiveType` and consequently remove the need for `DecimalArray` entirely. Any logic for checked conversion (#2387) could be located and implemented in `Decimal`.
This would provide a quick and easy way to increase the kernel support for decimals, as we would effectively get implementations "for free".
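A minimal sketch of what "for free" could mean in practice is shown below. The `PrimitiveType` trait and `PrimitiveArray` struct are simplified, hypothetical stand-ins for the crate's `ArrowPrimitiveType` and `PrimitiveArray` (no null handling, no buffers), and `max` is an illustrative kernel rather than the real one.

```rust
/// Simplified stand-in for `ArrowPrimitiveType`: maps a logical type
/// to its native value type. (Hypothetical, not the crate's definition.)
trait PrimitiveType {
    type Native: Copy + PartialOrd;
}

/// Simplified stand-in for `PrimitiveArray<T>`, ignoring nulls and buffers.
struct PrimitiveArray<T: PrimitiveType> {
    values: Vec<T::Native>,
}

struct Int32Type;
impl PrimitiveType for Int32Type {
    type Native = i32;
}

/// At the storage level a 128-bit decimal is just an i128; precision and
/// scale would live in the data type rather than in each value.
struct Decimal128Type;
impl PrimitiveType for Decimal128Type {
    type Native = i128;
}

/// A kernel written once against the trait...
fn max<T: PrimitiveType>(array: &PrimitiveArray<T>) -> Option<T::Native> {
    array.values.iter().copied().fold(None, |acc, v| match acc {
        Some(m) if m >= v => Some(m),
        _ => Some(v),
    })
}

/// ...works for decimals without any decimal-specific code.
fn max_of_decimals() -> Option<i128> {
    // 1.23, 45.60 and -7.00 stored as unscaled integers with scale 2.
    let array = PrimitiveArray::<Decimal128Type> {
        values: vec![123, 4560, -700],
    };
    max(&array)
}
```

The same generic `max` serves both `Int32Type` and `Decimal128Type`, which is the kind of kernel reuse the proposal is after.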
Describe the solution you'd like
I would like to treat decimals the same way we treat other constant width types
Describe alternatives you've considered
We could not do this; it was just an idea that occurred to me and may help to reduce the second-class citizen status that decimals currently have.
Additional context
Was noticed whilst working on #2635
FYI @liukun4515 @HaoYang670