Implement check for sticky bit to fix rounding for all types and implement for `decimal32_t` #1030

mborland · 2025-09-04T12:46:17Z

Closes: #1027

morinmorin · 2025-09-04T13:25:11Z

That was quick!

Fails with:

cout << boost::decimal::decimal32_t(8400000U, -102) << endl; // => 8.4e-96 (actual: 0)

~~Maybe off-by-one error?~~

morinmorin · 2025-09-04T13:38:36Z

I ran a subset of Intel's testcases for decimal32. Dozens of them are failing, with expected results being small non-zero numbers.
For example,

add(-0, 8.4e-100) =
(expect) 8.4e-100
(actual) 8e-100

mul(1, 1e-101) =
(expect) 1e-101
(actual) 0

div(1e-101, 1) =
(expect) 1e-101
(actual) 0

div(5.24289e-96, 1) =
(expect) 5.24289e-96
(actual) 5.2429e-96

morinmorin · 2025-09-04T14:12:32Z

Diff for decimal32_t.hpp

@@ -626,11 +626,10 @@ constexpr decimal32_t::decimal32_t(T1 coeff_init, T2 exp, bool sign) noexcept //
     // If the coeff is not in range, make it so
     // Only count the number of digits if we absolutely have to
     int coeff_digits {-1};
-    if (coeff > detail::d32_max_significand_value)
+    if (coeff > detail::d32_max_significand_value || (exp + detail::bias < 0))
     {
         bool sticky_bit {detail::find_sticky_bit(coeff, exp, detail::bias_v<decimal32_t>)};
 
-        if (!sticky_bit)
         {
             coeff_digits = detail::num_digits(coeff);
             if (coeff_digits > detail::precision + 1)
@@ -660,12 +659,6 @@ constexpr decimal32_t::decimal32_t(T1 coeff_init, T2 exp, bool sign) noexcept //
                 exp += static_cast<T2>(detail::fenv_round(coeff, sign, sticky_bit));
             }
         }
-        else
-        {
-            // This should already be handled in find_sticky_bit
-            BOOST_DECIMAL_ASSERT((coeff >= 1'000'000U && coeff <= 9'999'999U) || coeff == 0U);
-            exp += static_cast<T2>(detail::fenv_round(coeff, sign, sticky_bit));
-        }
     }
 
     auto reduced_coeff {static_cast<significand_type>(coeff)};

Diff for fenv_rounding.hpp

@@ -147,24 +147,17 @@ BOOST_DECIMAL_FORCE_INLINE constexpr auto find_sticky_bit(T1& coeff, T2& exp, co
 
     if (biased_exp < 0)
     {
-        const auto shift {static_cast<unsigned>(-biased_exp)};
+        const auto shift {static_cast<unsigned>(-biased_exp) - 1};
         const auto shift_p10 {detail::pow10<T1>(shift)};
-        const auto shift_p10_min_1 {shift_p10 / 10U};
 
         // TODO(mborland): in the uint128_t case we should expose a div_mod since the mod is already available
         const auto q {coeff / shift_p10};
         const auto rem {coeff % shift_p10};
 
-        auto guard_digits {rem / shift_p10_min_1};
-        sticky = (rem % shift_p10) != 0U;
+        sticky = (rem != 0U);
 
         coeff = q;
         exp += static_cast<int>(shift);
-
-        if (guard_digits > 5U || (guard_digits == 5U && (sticky || (static_cast<std::uint64_t>(coeff) & 1U))))
-        {
-            ++coeff;
-        }
     }
 
     return sticky;

morinmorin · 2025-09-04T14:32:46Z

I ran a subset of Intel's testcases for decimal32. Dozens of them are failing, with expected results being small non-zero numbers.

After applying the diff, only one test failure remains:

div(1, 5.24289e-96) =
(expect) 1.907345e+95
(actual) 1.907344e+95

I haven't looked into the implementation of operator/, but I suspect it might be the cause.

mborland · 2025-09-04T14:43:02Z

1.907345e+95

This case the significands being divided yields 10000000000000 / 5242890 = 1907344 which is then passed with exp = 89 to the constructor

mborland · 2025-09-04T14:56:11Z

Since the maximum significand being divided is 10^14 which is less than 2^51 this could be done with doubles and then casted down to integer with proper rounding.

morinmorin · 2025-09-04T16:34:24Z

Thanks, confirmed that the fix (for decimal32) works fine.

Since the maximum significand being divided is 10^14 which is less than 2^51 this could be done with doubles and then casted down to integer with proper rounding.

Right. Or, in this specific testcase, changing detail::pow10(static_cast<std::uint64_t>(detail::precision)) (in div_impl.hpp) to detail::pow10(static_cast<std::uint64_t>(detail::precision)+1u)}; would resolve the failure.

About the if (!BOOST_DECIMAL_IS_CONSTANT_EVALUATED(coeff)) block in find_sticky_bit: find_sticky_bit is responsible for computing sticky AND shifting coeff. So even if sticky doesn't need to be computed, shifting still needs to be done. There are two approaches to fix it.

remove this block entirely; or
add code for updating coeff and exp inside the inner if block (i.e. if (BOOST_DECIMAL_UNLIKELY(…)).

mborland · 2025-09-04T16:49:20Z

Thanks, confirmed that the fix (for decimal32) works fine.

Since the maximum significand being divided is 10^14 which is less than 2^51 this could be done with doubles and then casted down to integer with proper rounding.

Right. Or, in this specific testcase, changing detail::pow10(static_cast<std::uint64_t>(detail::precision)) (in div_impl.hpp) to detail::pow10(static_cast<std::uint64_t>(detail::precision)+1u)}; would resolve the failure.

About the if (!BOOST_DECIMAL_IS_CONSTANT_EVALUATED(coeff)) block in find_sticky_bit: find_sticky_bit is responsible for computing sticky AND shifting coeff. So even if sticky doesn't need to be computed, shifting still needs to be done. There are two approaches to fix it.

remove this block entirely; or

add code for updating coeff and exp inside the inner if block (i.e. if (BOOST_DECIMAL_UNLIKELY(…)).

This is already handled in the constructor: https://github.com/cppalliance/decimal/blob/develop/include/boost/decimal/decimal32_t.hpp#L616. If there's no need to find the sticky bit we can divide and round like normal. The method of finding the sticky bit is slower than just dividing by a power of 10 looked up from a table. We could even remove the modulo added in this PR in that case.

morinmorin · 2025-09-05T07:08:17Z

Without removing if (!BOOST_DECIMAL_IS_CONSTANT_EVALUATED(coeff)) block, this fails:

fesetround(rounding_mode::fe_dec_to_nearest_from_zero);
cout << decimal32_t(100000, -105) << endl; // => 1e-100 (actual: 0)

But anyway, this is a simpler implementation of shifting and rounding (which doesn't need find_sticky_bit):

int coeff_digits {-1};
auto biased_exp {static_cast<int>(exp + detail::bias)};
if (coeff > detail::d32_max_significand_value || biased_exp < 0) {
    coeff_digits = detail::num_digits(coeff);

    // How many digits need to be shifted?
    auto shift_for_small_exp = (-biased_exp) - 1;
    auto shift_for_large_coeff = (coeff_digits - detail::precision) - 1;
    auto shift = std::max(shift_for_small_exp, shift_for_large_coeff);

    // Do shifting
    auto [shifted_coeff, trailing_digits] = div_mod_pow10(coeff, shift);
    coeff = shifted_coeff;
    auto sticky {trailing_digits != 0u};
    exp += shift;
    biased_exp += shift;
    coeff_digits -= shift;

    // Do rounding
    auto removed_digits = detail::fenv_round(coeff, sign, sticky);
    exp += removed_digits;
    biased_exp += removed_digits;
    coeff_digits -= removed_digits;
}

In this code, two shifts are merged into one. To further improve efficiency, the code uses division by constant (which is done by div_mod_pow10(x, k)), since that's optimized well by compilers. Function div_mod_pow10(x, k) computes x / (10**k) as x / CompileTimeConstant. Here is a draft code (this is a manual coding, but it can be avoided using meta-programming):

template<typename Uint, unsigned K>
consteval Uint pow10()
{
    Uint x = 1;
    for (unsigned i = 0; i < K; ++i) x *= 10;
    return x;
}
constexpr std::pair<std::uint32_t, std::uint32_t> div_mod_pow10(std::uint32_t x, unsigned k)
{
    using T = std::uint32_t;
    switch(k) {
        case (0): return {x / pow10<T, 0u>(), x % pow10<T, 0u>()};
        case (1): return {x / pow10<T, 1u>(), x % pow10<T, 1u>()};
        case (2): return {x / pow10<T, 2u>(), x % pow10<T, 2u>()};
        case (3): return {x / pow10<T, 3u>(), x % pow10<T, 3u>()};
        case (4): return {x / pow10<T, 4u>(), x % pow10<T, 4u>()};
        case (5): return {x / pow10<T, 5u>(), x % pow10<T, 5u>()};
        case (6): return {x / pow10<T, 6u>(), x % pow10<T, 6u>()};
        case (7): return {x / pow10<T, 7u>(), x % pow10<T, 7u>()};
        case (8): return {x / pow10<T, 8u>(), x % pow10<T, 8u>()};
        case (9): return {x / pow10<T, 9u>(), x % pow10<T, 9u>()};
        default: return {0, x};
    }
}

Another option is to provide only div_pow10 and to compute the remainder as x - quotient * (10**k).

mborland · 2025-09-05T10:05:02Z

Without removing if (!BOOST_DECIMAL_IS_CONSTANT_EVALUATED(coeff)) block, this fails:

fesetround(rounding_mode::fe_dec_to_nearest_from_zero);
cout << decimal32_t(100000, -105) << endl; // => 1e-100 (actual: 0)

But anyway, this is a simpler implementation of shifting and rounding (which doesn't need find_sticky_bit):

int coeff_digits {-1};
auto biased_exp {static_cast<int>(exp + detail::bias)};
if (coeff > detail::d32_max_significand_value || biased_exp < 0) {
    coeff_digits = detail::num_digits(coeff);

    // How many digits need to be shifted?
    auto shift_for_small_exp = (-biased_exp) - 1;
    auto shift_for_large_coeff = (coeff_digits - detail::precision) - 1;
    auto shift = std::max(shift_for_small_exp, shift_for_large_coeff);

    // Do shifting
    auto [shifted_coeff, trailing_digits] = div_mod_pow10(coeff, shift);
    coeff = shifted_coeff;
    auto sticky {trailing_digits != 0u};
    exp += shift;
    biased_exp += shift;
    coeff_digits -= shift;

    // Do rounding
    auto removed_digits = detail::fenv_round(coeff, sign, sticky);
    exp += removed_digits;
    biased_exp += removed_digits;
    coeff_digits -= removed_digits;
}

In this code, two shifts are merged into one. To further improve efficiency, the code uses division by constant (which is done by div_mod_pow10(x, k)), since that's optimized well by compilers. Function div_mod_pow10(x, k) computes x / (10**k) as x / CompileTimeConstant. Here is a draft code (this is a manual coding, but it can be avoided using meta-programming):

template<typename Uint, unsigned K>
consteval Uint pow10()
{
    Uint x = 1;
    for (unsigned i = 0; i < K; ++i) x *= 10;
    return x;
}
constexpr std::pair<std::uint32_t, std::uint32_t> div_mod_pow10(std::uint32_t x, unsigned k)
{
    using T = std::uint32_t;
    switch(k) {
        case (0): return {x / pow10<T, 0u>(), x % pow10<T, 0u>()};
        case (1): return {x / pow10<T, 1u>(), x % pow10<T, 1u>()};
        case (2): return {x / pow10<T, 2u>(), x % pow10<T, 2u>()};
        case (3): return {x / pow10<T, 3u>(), x % pow10<T, 3u>()};
        case (4): return {x / pow10<T, 4u>(), x % pow10<T, 4u>()};
        case (5): return {x / pow10<T, 5u>(), x % pow10<T, 5u>()};
        case (6): return {x / pow10<T, 6u>(), x % pow10<T, 6u>()};
        case (7): return {x / pow10<T, 7u>(), x % pow10<T, 7u>()};
        case (8): return {x / pow10<T, 8u>(), x % pow10<T, 8u>()};
        case (9): return {x / pow10<T, 9u>(), x % pow10<T, 9u>()};
        default: return {0, x};
    }
}

Another option is to provide only div_pow10 and to compute the remainder as x - quotient * (10**k).

Ok. I've implemented something similar using functions we already have and C++14. I think a division followed by modulo is fine in this case since the compiler can optimize those for builtin-types. Once we get to decimal128_t some more complex handling will need to occur, probably in the backed since it has a non-zero chance of being an emulated 128-bit type (e.g. Windows).

morinmorin · 2025-09-05T13:24:11Z

Thanks, the failures in the subset of Intel's testcases are now resolved!

After adding more Intel's testcases (specifically the tests with non-to_nearest rounding modes), I found some failures. The failing testcase is essentially this:

boost::decimal::fesetround(boost::decimal::rounding_mode::fe_dec_downward);
cout << "5e+50"_DF - "4e+40"_DF << endl;
// (expect) 4.999999e+50
// (actual) 5e+50

boost::decimal::fesetround(boost::decimal::rounding_mode::fe_dec_upward);
cout << "5e+50"_DF + "4e+40"_DF << endl;
// (expect) 5.000001e+50
// (actual) 5e+50

I'm wondering if a "sticky bit"-like flag is needed in addition and subtraction.

mborland · 2025-09-05T13:32:55Z

Thanks, the failures in the subset of Intel's testcases are now resolved!

After adding more Intel's testcases (specifically the tests with non-to_nearest rounding modes), I found some failures. The failing testcase is essentially this:
boost::decimal::fesetround(boost::decimal::rounding_mode::fe_dec_downward);
cout << "5e+50"_DF - "4e+40"_DF << endl;
// (expect) 4.999999e+50
// (actual) 5e+50

boost::decimal::fesetround(boost::decimal::rounding_mode::fe_dec_upward);
cout << "5e+50"_DF + "4e+40"_DF << endl;
// (expect) 5.000001e+50
// (actual) 5e+50
I'm wondering if a "sticky bit"-like flag is needed in addition and subtraction.

I believe these are in the same boat as division where additional offset can easily be applied internally. Division was extending the lhs argument to 14 integer digits, but since the type is std::uint64_t it really costs nothing to extend it to std::numeric_limits<std::uint64_t>::digits10. Same kind of shift based alignment is occurring in both add and sub.

mborland · 2025-09-05T13:49:28Z

I'm going to move the handling of that to: #1035. I think we've already exceeded the scope of what should be in here (I made changes to div that'll need to be ported to the other types). Mainly this is pretty modular so spreading across the library should be pretty easy.

codecov · 2025-09-05T16:06:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.4%. Comparing base (63d3cfb) to head (06f26ee).
⚠️ Report is 36 commits behind head on develop.

Additional details and impacted files

@@            Coverage Diff            @@
##           develop   #1030     +/-   ##
=========================================
- Coverage     98.5%   98.4%   -0.0%     
=========================================
  Files          247     248      +1     
  Lines        16459   16491     +32     
  Branches      1857    1860      +3     
=========================================
+ Hits         16201   16221     +20     
- Misses         258     270     +12

Files with missing lines	Coverage Δ
include/boost/decimal/decimal32_t.hpp	`97.8% <100.0%> (-<0.1%)`	⬇️
include/boost/decimal/detail/div_impl.hpp	`100.0% <100.0%> (ø)`
include/boost/decimal/detail/fenv_rounding.hpp	`100.0% <100.0%> (ø)`
...ude/boost/decimal/detail/remove_trailing_zeros.hpp	`100.0% <ø> (ø)`
test/github_issue_1026.cpp	`100.0% <100.0%> (ø)`
test/random_decimal32_fast_math.cpp	`100.0% <100.0%> (ø)`
test/random_decimal32_math.cpp	`100.0% <ø> (ø)`
test/test_cosh.cpp	`100.0% <100.0%> (ø)`
test/test_expm1.cpp	`100.0% <100.0%> (ø)`
test/test_fenv.cpp	`100.0% <100.0%> (ø)`
... and 3 more

... and 4 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 63d3cfb...06f26ee. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

mborland added this to the v6.0.0 milestone Sep 4, 2025

mborland self-assigned this Sep 4, 2025

mborland added the Bug Something isn't working label Sep 4, 2025

mborland mentioned this pull request Sep 4, 2025

[fenv_rounding.hpp] Rounding under roundTiesToEven needs sticky bit #1026

Closed

mborland added 14 commits September 5, 2025 15:43

Add test set from issue

b747b35

Add diffs from issue

26a7f75

Reduce constexpr code duplication

1870223

Move finding sticky bit to a function for reuse

d495292

Apply sticky bit to downward and upward rounding modes

af32aa0

Use 64 bit bitwise op

e5f7d1e

Significantly reduce branches

0363f70

Avoid overflow

0aea6d9

Skip another modulo step

5450338

Fix digit counting

e6d76fb

Collected test fixes

60202cf

Add TODO

e057198

Convert coeff to unsigned type

c84f4d7

Add fast path when we know we have a sticky bit

12146ed

mborland added 21 commits September 5, 2025 15:43

Refactor name

f783344

Add unsigned __int128 trailing zero removal function

70fc611

Add test of addition for near subnormal

ed3d6e2

Add testing at the denorms

9092fbc

Fix handling of denorms that can be represented

ae61fb6

Fix construction of subnormals

69db8cb

Apply diffs

1481375

Fix sign compare error

bf4e32a

Add additional test cases from issue

e3bdd58

Fix another sign compare error

bc5973d

Increase number of fractional digits retained from division

8481b07

Add test case

8146c65

Normalize sign handling

23421c0

Reduce complexity

3ae5be6

Delete now unused function

e2ed30f

Add additional test from discussion

2737e00

Move rounding method into its own function for easier resuse

0428801

Add bounds check resulting in a coefficient being shifted to 0

67342d6

Slightly loosen tol

0bd7dcd

Disable rounding mode dependent tests on platforms without consteval

89c91ae

Fix GCC conversion warnings

06f26ee

mborland force-pushed the 1026 branch from 8f13cb3 to 06f26ee Compare September 5, 2025 13:43

mborland merged commit 99625e8 into develop Sep 5, 2025
78 checks passed

mborland deleted the 1026 branch September 5, 2025 16:35

Implement check for sticky bit to fix rounding for all types and implement for decimal32_t #1030

Implement check for sticky bit to fix rounding for all types and implement for decimal32_t #1030

Uh oh!

Conversation

mborland commented Sep 4, 2025

Uh oh!

morinmorin commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

morinmorin commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

morinmorin commented Sep 4, 2025

Uh oh!

morinmorin commented Sep 4, 2025

Uh oh!

mborland commented Sep 4, 2025

Uh oh!

mborland commented Sep 4, 2025

Uh oh!

morinmorin commented Sep 4, 2025

Uh oh!

mborland commented Sep 4, 2025

Uh oh!

morinmorin commented Sep 5, 2025

Uh oh!

mborland commented Sep 5, 2025

Uh oh!

morinmorin commented Sep 5, 2025

Uh oh!

mborland commented Sep 5, 2025

Uh oh!

mborland commented Sep 5, 2025

Uh oh!

codecov bot commented Sep 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Implement check for sticky bit to fix rounding for all types and implement for `decimal32_t` #1030

Implement check for sticky bit to fix rounding for all types and implement for `decimal32_t` #1030

morinmorin commented Sep 4, 2025 •

edited

Loading

morinmorin commented Sep 4, 2025 •

edited

Loading

codecov bot commented Sep 5, 2025 •

edited

Loading