This issue was found by Coverity. The RleDecoder::NextCounts method has the following code to fetch the repeated literal in repeated runs:
bool result =
bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
reinterpret_cast<T*>(&current_value_));
Coverity says this:
Pointer "&this->current_value_" points to an object whose effective type is "unsigned long long" (64 bits, unsigned) but is dereferenced as a narrower "unsigned int" (32 bits, unsigned). This may lead to unexpected results depending on machine endianness.
In addition, it's not obvious whether current_value_ also needs byte-swapping (presumably, at least in the Parquet file format, it's supposed to be stored in little-endian format in the RLE bitstream).
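To make the endianness concern concrete, here is a minimal C++ sketch. The helper names (`DecodeLittleEndian`, `OverlayOnFirstBytes`, `HostIsLittleEndian`) are hypothetical and not part of Arrow, and the sketch assumes the run value is stored as little-endian bytes, which is only presumed above. It contrasts overlaying raw bytes on the start of a 64-bit object (the pattern Coverity flags, modeled here with `memcpy`) with an explicit shift-based decode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode `num_bytes` little-endian bytes into a 64-bit
// value using shifts. The result does not depend on the host byte order.
inline uint64_t DecodeLittleEndian(const uint8_t* bytes, int num_bytes) {
  assert(num_bytes >= 0 && num_bytes <= 8);
  uint64_t value = 0;
  for (int i = 0; i < num_bytes; ++i) {
    value |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  }
  return value;
}

// Hypothetical helper: copy `num_bytes` raw bytes over the lowest-addressed
// bytes of a zeroed 64-bit object. On a little-endian host those are the
// least significant bytes; on a big-endian host they are the most
// significant ones, so the numeric result differs.
inline uint64_t OverlayOnFirstBytes(const uint8_t* bytes, int num_bytes) {
  assert(num_bytes >= 0 && num_bytes <= 8);
  uint64_t value = 0;
  std::memcpy(&value, bytes, static_cast<std::size_t>(num_bytes));
  return value;
}

// Hypothetical helper: detect the host byte order at run time.
inline bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

For the two bytes `0x34, 0x12`, `DecodeLittleEndian` yields `0x1234` on any host, while `OverlayOnFirstBytes` yields `0x1234` only on a little-endian host.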
Micah Kornfield / @emkornfield:
Hmm, I think we've gone back and forth on endianness support. When the project started I thought it was important, because at the time it seemed like Spark intended to support both byte orders (I don't know if it still does).
Are we actually clean in terms of endianness in other places? I would need to investigate further, but it sounds strange to be slicing a long like Coverity describes. Have you looked to see if this is intended?
Are we actually clean in terms of endianness in other places?
Presumably no, because we're reinterpreting array bytes as larger types such as int64_t etc. And we're also serializing those bytes directly to disk or wire.
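As an illustration of what endianness-clean serialization could look like, here is a small C++ sketch. `StoreLittleEndian64` and `LoadLittleEndian64` are hypothetical helpers, not Arrow functions; they write and read an `int64_t` one byte at a time in little-endian order instead of reinterpreting or copying the in-memory representation, so the bytes on disk or on the wire are the same on every host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: write `value` to `out[0..7]`, least significant
// byte first, regardless of the host byte order.
inline void StoreLittleEndian64(int64_t value, uint8_t* out) {
  assert(out != nullptr);
  uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));  // same-size copy of the bit pattern
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>(bits >> (8 * i));
  }
}

// Hypothetical helper: read back a value written by StoreLittleEndian64.
inline int64_t LoadLittleEndian64(const uint8_t* in) {
  assert(in != nullptr);
  uint64_t bits = 0;
  for (int i = 0; i < 8; ++i) {
    bits |= static_cast<uint64_t>(in[i]) << (8 * i);
  }
  int64_t value = 0;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

The trade-off is a per-value loop instead of a bulk copy, which is why code that assumes a little-endian host tends to copy the bytes directly.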
it sounds strange to be slicing a long like Coverity describes. Have you looked to see if this is intended?
I think it's just a dirty implementation shortcut. Instead of doing:
bool result =
bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
reinterpret_cast<T*>(&current_value_));
The code could presumably be written as:
T value;
bool result =
bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)), &value);
current_value_ = static_cast<uint64_t>(value);
T value = 0;
bool result =
bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)), &value);
current_value_ = static_cast<uint64_t>(value);
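The rewritten pattern can be tried in isolation with the following self-contained C++ sketch. `MockBitReader` and `ReadRepeatedValue` are hypothetical stand-ins, not Arrow's classes; the mock assumes an aligned read simply copies `num_bytes` raw bytes into the output, which is an assumption for illustration. Under that assumption, initializing `value` to zero matters: a read narrower than `sizeof(T)` would otherwise leave the remaining bytes indeterminate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the bit reader; only aligned reads are modeled.
class MockBitReader {
 public:
  MockBitReader(const uint8_t* data, int size) : data_(data), size_(size), pos_(0) {}

  // Copies `num_bytes` raw bytes into `*out`. Fails if T is too small to
  // hold them or if the buffer does not have enough bytes left.
  template <typename T>
  bool GetAligned(int num_bytes, T* out) {
    if (num_bytes < 0 || num_bytes > static_cast<int>(sizeof(T))) return false;
    if (num_bytes > size_ - pos_) return false;
    std::memcpy(out, data_ + pos_, static_cast<std::size_t>(num_bytes));
    pos_ += num_bytes;
    return true;
  }

 private:
  const uint8_t* data_;
  int size_;
  int pos_;
};

// Hypothetical decoder fragment: read the repeated value into a local T and
// widen it with static_cast, instead of writing through a T* that aliases
// the 64-bit member.
template <typename T>
bool ReadRepeatedValue(MockBitReader* reader, int bit_width, uint64_t* current_value) {
  assert(reader != nullptr && current_value != nullptr);
  T value = 0;  // zero-init so bytes not covered by the read are defined
  const int num_bytes = (bit_width + 7) / 8;  // ceiling division by 8
  if (!reader->GetAligned<T>(num_bytes, &value)) return false;
  *current_value = static_cast<uint64_t>(value);
  return true;
}
```

Because the whole 64-bit destination is assigned, stale upper bits from a previous run cannot survive. The sketch only addresses the type-punning issue; since the mock copies raw bytes, byte order for multi-byte values still follows the host.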
Reporter: Antoine Pitrou / @pitrou
Assignee: Kazuaki Ishizaki / @kiszk
PRs and other links:
Note: This issue was originally created as ARROW-4018. Please see the migration documentation for further details.