Change max_digits10 calculation to a more accurate method. #249
Conversation
More complicated, but much cooler. (@jzmaddock never fails to rise to a challenge!)
(I presume there will need to be/should be a change in some Boost.Multiprecision test of the results of this?)
```cpp
#ifndef BOOST_NO_CXX11_CONSTEXPR
static constexpr unsigned constexpr_ceil(double d)
{
   return static_cast<unsigned>(d) == d ? static_cast<unsigned>(d) : static_cast<unsigned>(d) + 1;
```
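For illustration, here is a minimal, self-contained sketch (my own, not the actual Boost source) of how a constexpr ceiling like this can feed the classic max_digits10 formula ⌈digits · log10(2)⌉ + 1:

```cpp
#include <iostream>

// Same idea as the snippet above: std::ceil is not constexpr in C++11,
// so emulate it for non-negative arguments.
static constexpr unsigned constexpr_ceil(double d)
{
    return static_cast<unsigned>(d) == d ? static_cast<unsigned>(d) : static_cast<unsigned>(d) + 1;
}

// max_digits10 = ceil(binary_digits * log10(2)) + 1
static constexpr unsigned max_digits10_for(unsigned binary_digits)
{
    return constexpr_ceil(binary_digits * 0.30102999566398120) + 1;
}

int main()
{
    std::cout << max_digits10_for(24) << "\n"   // float:         9
              << max_digits10_for(53) << "\n"   // double:        17
              << max_digits10_for(113) << "\n"; // quad/float128: 36
}
```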
About that `+ 1` at the end: I am now doing some measurements of how many extra digits are necessary to properly convert back and forth, and the results are quite surprising. The error consistently decreases. I thought that 3 or 4 would be a useful maximum. But so far, for example when calculating sin(…) in `long double` precision (BTW: I am testing all the precision types), I got the following maximum numbers of incorrect decimal digits (calculated as log10 of the error with respect to ULP, the unit of least precision, i.e. the last binary digit):

- for 1 extra string decimal digit the error is 3.1 decimal digits
- for 2 the error is 2.4 decimal digits
- for 3 the error is 0.9 decimal digits
- for 4 the error is 0.3 decimal digits
- for 5 the error is 0 decimal digits

(these are preliminary results; since I'm searching for the maximum error, it will only get larger)
EDIT: these measurements aren't finished yet. I am waiting for the calculations to complete. This will take a few days. Then I will add a section to yade documentation about that.
EDIT2: In these calculations I am performing the same calculation in the target type (e.g. `long double`) and comparing the results with MPFR (which has over 200 digits more).
I assume these are the additional digits required to get sin(x) correct to the last bit?
Which is somewhat different to what we have here: "how many decimal places do I need to print to be able to round-trip back to the binary" ?
You are correct. In principle I wanted to test the mathematical functions, to make sure that my mapping finds the C++ implementation with the correct precision. That's how I found boostorg/math#307.
Afterwards I realized how much the error depends on `extraStringDigits10`, and this forced me to make it configurable at runtime.
So yeah, it's a bit more than round-tripping. I am doing this here (for all math functions):
xᵇᵃˢᵉ = random(-4π,4π) // other functions have a different range
vᵇᵃˢᵉ = sinᵇᵃˢᵉ(xᵇᵃˢᵉ) // where ᵇᵃˢᵉ means that it's using the tested type, e.g. long double
vᵐᵖᶠʳ = sinᵐᵖᶠʳ(binaryᵐᵖᶠʳ(string(xᵇᵃˢᵉ)))
errorᵐᵖᶠʳ = |(binaryᵐᵖᶠʳ(string(vᵇᵃˢᵉ)) - vᵐᵖᶠʳ)/vᵐᵖᶠʳ|
then I express the error relative to ULP.
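For concreteness, a minimal sketch of that pipeline for `long double` against `mpfr_float_100` (my own illustration of the steps above; `via_string` and the choice of 4 extra digits are assumptions, not the actual yade test code):

```cpp
#include <boost/multiprecision/mpfr.hpp>
#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>

using mpfr100 = boost::multiprecision::mpfr_float_100;

// Convert through a decimal string with some extra digits, as the test does.
template <typename To, typename From> To via_string(const From& x, int extraStringDigits10)
{
    std::stringstream s;
    s << std::setprecision(std::numeric_limits<From>::digits10 + extraStringDigits10) << x;
    To r {};
    s >> r;
    return r;
}

int main()
{
    long double x      = 1.234L;                         // stand-in for random(-4π, 4π)
    long double v_base = std::sin(x);                    // sin in the tested type
    mpfr100     v_mpfr = sin(via_string<mpfr100>(x, 4)); // sin in high precision
    mpfr100     err    = abs((via_string<mpfr100>(v_base, 4) - v_mpfr) / v_mpfr);
    // express the relative error in ULPs of the tested type
    std::cout << "error in long double ULPs: "
              << err / std::numeric_limits<long double>::epsilon() << "\n";
}
```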
Yeah, I think it would be enough to test only the round-tripping part. However I am just curious how well the particular implementations of the math functions perform, and I wanted to check that :)

> Which is somewhat different to what we have here: "how many decimal places do I need to print to be able to round-trip back to the binary" ?

I will add a round-tripping-only test to the list of my tests :)
```diff
@@ -2386,7 +2387,7 @@ class numeric_limits<boost::multiprecision::number<boost::multiprecision::mpfr_f
    BOOST_STATIC_CONSTEXPR int digits = static_cast<int>((Digits10 * 1000L) / 301L + ((Digits10 * 1000L) % 301 ? 2 : 1));
    BOOST_STATIC_CONSTEXPR int digits10 = Digits10;
    // Is this really correct???
-   BOOST_STATIC_CONSTEXPR int max_digits10 = Digits10 + 3;
+   BOOST_STATIC_CONSTEXPR int max_digits10 = boost::multiprecision::detail::calc_max_digits10<digits>::value;
```
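As a worked example of this diff (my own arithmetic, assuming `calc_max_digits10` implements the ceil/trunc formula discussed below): for `mpfr_float_50`, `Digits10 = 50`, so `digits = (50 * 1000) / 301 + 2 = 166 + 2 = 168` bits (the remainder `50000 % 301 = 34` is nonzero), and `max_digits10 = trunc(168 * log10(2)) + 2 = 50 + 2 = 52`, whereas the old code gave `Digits10 + 3 = 53`.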
Sorry, in my previous comment I was in fact talking about this `+ 3`, and the `+1` in `max_digits_10` :)
BTW, that comment "`// Is this really correct???`" in this place is quite accurate :)
LOL, yes I know!
I wish I could find a definitive reference for this somewhere, but it eludes me :(
https://www.exploringbinary.com/number-of-digits-required-for-round-trip-conversions/ is helpful. And I believe there are later papers too, but I can't find them at present. I fear that there is no absolutely correct answer to this, so sorry if this is a wild goose chase. I just wanted the same max_digits10 for float128 and cpp_bin_float_quad.
Yeah, I know that I hijacked this thread a little. You were touching lines of code which are very relevant to what I am doing right now ;)
The table in this paper indicates that 3 digits are enough; my conclusions so far are more pessimistic :(
https://github.com/mdickinson/float-proofs/blob/master/misc/base_conversion_notes.pdf Would a brute force over all binary sizes, extrapolating the inequality used in example 8 of the above, provide a way to confirm which formula is correct for round-tripping (the bottom-line test)?
Thank you, this article is a good find. I am writing a brute force right now :) We will see pretty soon.
Add minimal test case.
I realized that my tests which involved MPFR round-tripping have higher fidelity requirements than a basic binary ↔ string round trip. They require that the MPFR constructed from a string has all the "extra" bits set to zero. Otherwise, going through the function would sometimes find a "spot" where the lower bits had a significant impact on the function value (an unlikely example: the …).

Simply put, I needed to recognize the distinction that when I want to create an MPFR_600 (later tests I was doing with 600) I need a lot more digits to make sure that all the less significant bits are created as zeros. Otherwise these lower bits reflect the "cut" decimal number it received. So that wasn't the test which we needed here for basic round-tripping.

Now, getting back on topic. I did the brute-force search by round-tripping a random number x∈[0,1) and 1/x∈[1,∞) 1e7 times. I did the search for two kinds: a round trip within the same type, and a round trip to MPFR_600 (the round trip which requires that all less significant bits are set to zero). Here are the results:
The first two rows in the table reflect the results from round-tripping and from creating MPFR_600 with all extra bits set to zeros. The third column says how many bits there are. The last column, "length of 2^-N", is the number of decimal digits after the point:
You can look up the details in the commit with the C++ function, and the pipeline which calculates this stuff for each type. I am now quite confident that using extraStringDigits10==4 is always enough for the purposes of this thread. You could squeeze this down to 3 and I would complain just a little ;) My findings also more or less agree with the table which you linked before.
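For reference, a minimal same-type round-trip checker in the spirit of that brute force (my own sketch; the real tests live in the commit and pipeline mentioned above):

```cpp
#include <iomanip>
#include <iostream>
#include <limits>
#include <random>
#include <sstream>

// true if x survives a binary -> decimal string -> binary round trip
template <typename T> bool roundtrips(T x, int extraStringDigits10)
{
    std::stringstream s;
    s << std::setprecision(std::numeric_limits<T>::digits10 + extraStringDigits10) << x;
    T back {};
    s >> back;
    return back == x;
}

int main()
{
    std::mt19937_64                        gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (int extra = 0; extra <= 4; ++extra) {
        long failures = 0;
        for (int i = 0; i < 1000000; ++i) {
            double x = dist(gen);
            if (x == 0) continue; // avoid 1/0 below
            if (!roundtrips(x, extra) || !roundtrips(1.0 / x, extra)) ++failures;
        }
        // for double (digits10 == 15), failures should drop to 0 at extra == 2,
        // i.e. at max_digits10 == 17
        std::cout << "extra digits " << extra << ": " << failures << " failures\n";
    }
}
```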
```cpp
// never be exactly an integer so we can replace by trunc(log10(2) * d) + 2
// and avoid the call to ceil:
//
return static_cast<unsigned>(0.301029995663981195213738894724493026768189881462108541310 * d) + 2;
```
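A quick brute-force confirmation of the comment's claim (my own sketch, checking that trunc + 2 and ceil + 1 agree, which holds as long as log10(2) * d never lands exactly on an integer):

```cpp
#include <cmath>
#include <iostream>

int main()
{
    const double log10_2 = 0.301029995663981195213738894724493026768189881462108541310;
    for (unsigned d = 1; d <= 100000; ++d) {
        unsigned via_ceil  = static_cast<unsigned>(std::ceil(log10_2 * d)) + 1;
        unsigned via_trunc = static_cast<unsigned>(log10_2 * d) + 2;
        if (via_ceil != via_trunc) std::cout << "mismatch at d = " << d << "\n";
    }
    std::cout << "done\n";
}
```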
How do you force the compiler to use all these digits in this log10(2) value? I had problems forcing constexpr to work with `double` alone.
Ah, my bad, those extra digits are wasted: it's a double precision constant. BTW I tried a reasonably exhaustive search comparing the double precision calculation to a 200-digit calculation and couldn't find a bit count value where they disagreed. float precision is a different matter, though.
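A sketch of what such a cross-check can look like (my own reconstruction; I'm assuming the comparison was between the formula in double and the same formula evaluated at much higher precision):

```cpp
#include <boost/multiprecision/mpfr.hpp>
#include <iostream>

using mp = boost::multiprecision::mpfr_float_50;

int main()
{
    const double d_log10_2 = 0.301029995663981195213738894724493026768189881462108541310;
    const mp     m_log10_2 = log10(mp(2));
    for (unsigned d = 1; d <= 1000000; ++d) {
        unsigned via_double = static_cast<unsigned>(d_log10_2 * d) + 2;
        mp       prod       = m_log10_2 * d;
        unsigned via_mp     = static_cast<unsigned>(prod) + 2;
        if (via_double != via_mp) std::cout << "disagreement at d = " << d << "\n";
    }
}
```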
Hmm, I think you could leave these wasted digits. Maybe someday the compilers will start picking them up :)
Since that's a double constant, with 15 (up to 17 :) digits, it means that this calculation will be incorrect for binary digits>1e15. That corresponds to 301029995663981 decimal digits. I didn't use MPFR_301029995663981 yet.
> float precision is a different matter though.

This calculation agrees for `float`: it has 24 bits and 6 digits10, and trunc(log10(2)*24)+2 == 9, just like in that table and on page 46 of the IEEE standard.
> BTW I tried a reasonably exhaustive search comparing double precision calculation to 200-digit calculation and couldn't find a bit count value where they disagreed.
Oh, you mean the kind of calculation which I talked about a lot in the earlier post? I had plenty of examples where they disagree. But now that I increased extraStringDigits10, I have a hard time finding them again :) It is too late today; I will check tomorrow. Thank you for trying this!
> That corresponds to 301029995663981 decimal digits

... provided that the `unsigned` type is large enough :)

About these errors: maybe the way in which I calculate the relative error is to blame. I will check this tomorrow.
Do you know a clean method of checking a particular bit value in (any) floating-point significand?
Comparing bits would be much better; I suspect that my calculation of the relative error has its own numerical inaccuracy problems :/
boost::math::float_distance? Only for two of the same type, though. For two different types, I would round the high precision type to the same number of bits as the low precision type, so the comparison is against a "correctly rounded" value.
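For reference, a tiny usage sketch of `boost::math::float_distance` (same type on both sides, as noted):

```cpp
#include <boost/math/special_functions/next.hpp>
#include <cmath>
#include <iostream>

int main()
{
    double a = 1.0;
    double b = std::nextafter(std::nextafter(a, 2.0), 2.0); // two ULPs above a
    // float_distance returns the signed number of representable values between its arguments
    std::cout << boost::math::float_distance(a, b) << "\n"; // prints 2
}
```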
I found a way with the `frexp` function:
```cpp
#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/float128.hpp>
#include <boost/multiprecision/mpfr.hpp>
#include <cassert>
#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>
#include <vector>

template <typename T> int sign(T val) { return (T(0) < val) - (val < T(0)); }

class DecomposedReal {
private:
    int                        sig;
    int                        exp;
    std::vector<unsigned char> bits;

public:
    template <typename Rr> DecomposedReal(Rr x)
    {
        using std::abs;   // let built-in types find std::abs / std::frexp;
        using std::frexp; // multiprecision types find theirs via ADL
        int ex   = 0;
        Rr  norm = frexp(abs(x), &ex);
        sig      = sign(x);
        exp      = ex - 1;
        ex       = 0;
        int pos  = 0;
        bits.resize(std::numeric_limits<Rr>::digits, 0);
        // peel the significand off one set bit at a time
        while (norm != 0) {
            pos -= ex;
            assert((ex <= 0) and (pos < int(bits.size())) and (pos >= 0));
            bits[pos] = 1;
            norm -= static_cast<Rr>(0.5);
            norm = frexp(norm, &ex);
        };
    }
    template <typename Rr> Rr rebuild()
    {
        using std::pow;
        Rr  ret = 0;
        int i   = 0;
        for (auto c : bits) {
            if (c != 0) { ret += pow(static_cast<Rr>(2), static_cast<Rr>(exp - i)); }
            ++i;
        }
        return ret * static_cast<Rr>(sig);
    }
    template <typename Rr = double> void print()
    {
        std::cout << "sign : " << sig << "\n";
        std::cout << "exp  : " << exp << "\n";
        std::cout << "bits : ";
        for (auto c : bits)
            std::cout << int(c);
        std::cout << "\nreconstructed number: " << rebuild<Rr>() << "\n\n";
    }
};

template <typename Rr> void test(const Rr& arg)
{
    std::cout << std::setprecision(std::numeric_limits<Rr>::digits10 + 3);
    std::cout << "original number = " << arg << "\n";
    DecomposedReal d(arg);
    d.print<Rr>();
    assert(arg == d.rebuild<Rr>());
};

int main()
{
    test<float>(-1);                                  // 2^0
    test<double>(0);                                  // 0
    test<long double>(-4);                            // -2^2
    test<boost::multiprecision::float128>(0.375);     // 2^-2+2^-3
    test<boost::multiprecision::mpfr_float_50>(1036); // 2^3+2^2+2^10
    test<boost::multiprecision::mpfr_float_100>(boost::math::constants::pi<boost::multiprecision::mpfr_float_100>());
}
```
gives this output:

```text
original number = -1
sign : -1
exp : 0
bits : 100000000000000000000000
reconstructed number: -1
original number = 0
sign : 0
exp : -1
bits : 00000000000000000000000000000000000000000000000000000
reconstructed number: 0
original number = -4
sign : -1
exp : 2
bits : 1000000000000000000000000000000000000000000000000000000000000000
reconstructed number: -4
original number = 0.375
sign : 1
exp : -2
bits : 11000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
reconstructed number: 0.375
original number = 1036
sign : 1
exp : 10
bits : 100000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
reconstructed number: 1036
original number = 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067951
sign : 1
exp : 1
bits : 1100100100001111110110101010001000100001011010001100001000110100110001001100011001100010100010111000000011011100000111001101000100101001000000100100111000001000100010100110011111001100011101000000001000001011101111101010011000111011000100111001101100100010010100010100101000001000011110011000111000110100000001001101110111101111100101
reconstructed number: 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067951
```
I suppose that now I can do some bit comparison stuff :)
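For example, here is a stand-alone sketch of the comparison itself (my own; it uses the same frexp bit-peeling trick as above, counting how many leading significand bits two doubles share):

```cpp
#include <cmath>
#include <iostream>
#include <limits>

// Number of identical leading significand bits of a and b (both normal, nonzero).
int matching_leading_bits(double a, double b)
{
    int    ea = 0, eb = 0;
    double na = std::frexp(std::fabs(a), &ea);
    double nb = std::frexp(std::fabs(b), &eb);
    if (ea != eb || std::signbit(a) != std::signbit(b)) return 0;
    for (int i = 0; i < std::numeric_limits<double>::digits; ++i) {
        bool ba = na >= 0.5, bb = nb >= 0.5; // current bit of each significand
        if (ba != bb) return i;
        if (ba) { na -= 0.5; nb -= 0.5; }
        na *= 2;
        nb *= 2;
    }
    return std::numeric_limits<double>::digits;
}

int main()
{
    double eps = std::numeric_limits<double>::epsilon();
    std::cout << matching_leading_bits(1.0, 1.0 + eps) << "\n"; // 52: only the last bit differs
}
```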
I'm not sure if this link will work for long, but I found the "IEEE Standard for Floating-Point Arithmetic 2019"; there on pages 44-46 you have a discussion of this conversion. Looks like your formula agrees with it.

EDIT: I open that site with javascript disabled. In fact I have JS disabled almost everywhere, except github & gitlab ;)
Well that's a relief then ;)