Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11741: [C++] Fix decimal casts on big endian platforms #9554

Closed

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Feb 23, 2021

No description provided.

@pitrou pitrou marked this pull request as ready for review February 23, 2021 10:49
@pitrou pitrou requested a review from bkietz February 23, 2021 10:49
@pitrou
Copy link
Member Author

pitrou commented Feb 23, 2021

cc @kiszk

@pitrou
Copy link
Member Author

pitrou commented Feb 23, 2021

Green Travis-CI build: https://travis-ci.com/github/pitrou/arrow/builds/217951780

@pitrou
Copy link
Member Author

pitrou commented Feb 23, 2021

Another possible fix would be to reorder Decimal fields based on endianness. That would only work for Decimal128, though (Decimal256 is based on a std::array).

@github-actions
Copy link

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes most sense to reorder BasicDecimal128::{low_bits_, high_bits_} based on endianness. Furthermore, BasicDecimal256 should (eventually) follow suit by replacing little_endian_array_ with native_endian_array_.

@pitrou
Copy link
Member Author

pitrou commented Feb 23, 2021

Note that one issue with the reinterpret_cast approach is forcing unaligned loads. There may be some platforms around where that's unallowed (SPARC?).

@bkietz
Copy link
Member

bkietz commented Feb 23, 2021

When would any of these elements not be aligned to a 16 byte boundary?

@pitrou
Copy link
Member Author

pitrou commented Feb 23, 2021

That would be quite unlikely indeed :-)

@bkietz
Copy link
Member

bkietz commented Feb 23, 2021

Isn't it impossible for conformant data?

@pitrou
Copy link
Member Author

pitrou commented Feb 23, 2021

No, data can be backed by a buffer that wasn't allocated by Arrow and that's not aligned naturally. That said, it's certainly extremely unlikely to happen in practice, and I don't think we actually care in other places, so I'm fine with ignoring that possibility.

@pitrou
Copy link
Member Author

pitrou commented Feb 23, 2021

As far as Decimal256 is concerned, though, adapting the code may be non-trivial to conform to a native-endian layout. And if we do it only for Decimal128, we get less orthogonality.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing else in the compute namespace assumes data may be unaligned, so supporting it only here is suspect to me. We should at least have warnings that buffers are unaligned; even on platforms where unaligned access is not an error it does decrease performance.

I've added https://issues.apache.org/jira/browse/ARROW-11748 to track refactoring Decimal* into native-endian ordering.

In the meantime, let's merge this with a slight modification to make the intent clearer:

cpp/src/arrow/compute/kernels/codegen_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/codegen_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/codegen_internal.h Outdated Show resolved Hide resolved
@pitrou pitrou force-pushed the ARROW-11741-cast-decimal-big-endian branch from fe7aa32 to d1f16aa Compare February 24, 2021 14:28
@pitrou
Copy link
Member Author

pitrou commented Feb 24, 2021

I applied your suggestions @bkietz , will merge. Thank you for the review!

@pitrou pitrou closed this in 66be4c2 Feb 24, 2021
@pitrou pitrou deleted the ARROW-11741-cast-decimal-big-endian branch February 24, 2021 17:34
@kiszk
Copy link
Member

kiszk commented Feb 24, 2021

Thank you, I will watch ttps://issues.apache.org/jira/browse/ARROW-11748

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Closes apache#9554 from pitrou/ARROW-11741-cast-decimal-big-endian

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
Closes apache#9554 from pitrou/ARROW-11741-cast-decimal-big-endian

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants