Describe the bug, including details regarding any error messages, version, and platform.
`arrow::Decimal128::FromString` silently truncates (wraps mod 2¹²⁸) when the input string has more than 38 significant digits, producing a garbage unscaled value and returning `Status::OK`. The out-parameter `precision` does reflect the true parsed precision (> 38), but the `Decimal128` value itself is already corrupted, and most callers do not validate `precision` against `Decimal128::kMaxPrecision`.
The same issue applies symmetrically in `Decimal256::FromString` when precision > 76; both paths share `DecimalFromString<T>` in `cpp/src/arrow/util/decimal.cc`.
Why?
`DecimalFromString` feeds the digit string into `ShiftAndAdd`, which multiplies and adds into a fixed-size `uint64_t` array sized to the target decimal's bit width:
```cpp
std::array<uint64_t, Decimal::kBitWidth / 64> little_endian_array{};
ShiftAndAdd(dec.whole_digits, little_endian_array.data(),
            little_endian_array.size());
ShiftAndAdd(dec.fractional_digits, little_endian_array.data(),
            little_endian_array.size());
```
`ShiftAndAdd` carries high bits only through `out_size` limbs and drops the remaining carry on the floor; there is no overflow check. `DecimalFromString` then returns `Status::OK()` without validating `parsed_precision <= kMaxPrecision` (and also without validating `parsed_scale <= kMaxScale`).
Relevant code (paths as of apache/arrow `main`):
- `ShiftAndAdd` - `cpp/src/arrow/util/decimal.cc:747`
- `DecimalFromString` - `cpp/src/arrow/util/decimal.cc:857`
Repro
```cpp
// arrow_decimal_repro.cc
#include <arrow/util/decimal.h>

#include <iostream>
#include <string>

int main() {
  const std::string input =
      "1.55555555555555555555555555555555555555555555555555";
  arrow::Decimal128 dec;
  int32_t precision = 0, scale = 0;
  auto st = arrow::Decimal128::FromString(input, &dec, &precision, &scale);
  std::cout << "input: " << input << "\n"
            << "status: " << st.ToString() << "\n"
            << "precision: " << precision << "\n"
            << "scale: " << scale << "\n"
            << "dec (raw): " << dec.ToString(0) << "\n";
  return 0;
}
```
Observed output (Arrow main, macOS arm64, clang++ -std=c++17):
```
input: 1.55555555555555555555555555555555555555555555555555
status: OK
precision: 51
scale: 50
dec (raw): 151473969238762845885664163847966963939
```
That unscaled value is exactly the 51-digit integer `155555555555555555555555555555555555555555555555555` (the input with the decimal point removed) reduced mod 2¹²⁸:
```python
>>> 155555555555555555555555555555555555555555555555555 % (1 << 128)
151473969238762845885664163847966963939
```
Desired (?) behavior
Return `Status::Invalid` when the parsed precision exceeds `kMaxPrecision` (or the parsed scale exceeds `kMaxScale`) for the target decimal type, matching the error-reporting contract of `Decimal::Rescale` and friends.
Impact (breaking change?)
Any caller that trusts `FromString` to fail on invalid input silently produces wrong answers for user-supplied decimal strings longer than 38 significant digits. Downstream, Gandiva's `castDECIMAL_utf8` (and any compute kernel that routes VARCHAR-to-DECIMAL through `FromString` before rescaling via `Decimal128::Rescale` / `gandiva::decimalops::Convert`) returns numerically wrong results for perfectly legal SQL casts such as:
```sql
CAST('1.55555555555555555555555555555555555555555555555555' AS DECIMAL(38,37))
-- expected: 1.5555555555555555555555555555555555556 (HALF_UP)
-- observed: 0.0000000000015147396923876284588566416
```
Component(s)
C++, Gandiva