Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPC big endian offsets are not translated #859

Closed
alamb opened this issue Oct 24, 2021 · 2 comments · Fixed by #5301
Closed

IPC big endian offsets are not translated #859

alamb opened this issue Oct 24, 2021 · 2 comments · Fixed by #5301
Labels
bug good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Oct 24, 2021

Describe the bug
../arrow-ipc-stream/integration/1.0.0-bigendian/generated_dictionary.arrow_file contains a UTF8 Arrow array somewhere encoded in big endian.

When this is read in to the arrow-rs implementation, the offsets buffer remains big endian, even though the code assumes the offsets buffer has values in native endianness (e.g. the offsets of the created arrow-rs buffer incorrect on little endian machines like x86)

To Reproduce
See test read_dictionary_be_not_implemented #810

It fails with Length spanned by offsets in Utf8 (687865856) is larger than the values array size (41)

Expected behavior
The test should pass (likely by translating offsets from big endian to native endianness)

Additional context
Found while adding validation in #810

@tustvold
Copy link
Contributor

The reference here suggests it is acceptable for implementations to simply not support files with non-native byte order

At first we will return an error when trying to read a Schema with an endianness that does not match the underlying system. The reference implementation is focused on Little Endian and provides tests for it. Eventually we may provide automatic conversion via byte swapping.

I therefore think an acceptable approach would be to return an error if attempting to read a file with a non-native byte order

@tustvold tustvold added the good first issue Good for newcomers label Apr 16, 2022
@alamb
Copy link
Contributor Author

alamb commented Apr 17, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants