
Change max_digits10 calculation to more accurate method. #249

Merged · 5 commits · Jun 15, 2020

Conversation

jzmaddock (Collaborator)

No description provided.

@pabristow (Contributor) left a comment

More complicated, but much cooler. (@jzmaddock never fails to rise to a challenge!)

(I presume there will need to be/should be a change in some Boost.Multiprecision test of the results of this?)

#ifndef BOOST_NO_CXX11_CONSTEXPR
static constexpr unsigned constexpr_ceil(double d)
{
   return static_cast<unsigned>(d) == d ? static_cast<unsigned>(d) : static_cast<unsigned>(d) + 1;
}

@cosurgi (Jun 12, 2020)

about that + 1 at the end: I am currently measuring how many extra digits are needed to convert back and forth correctly, and the results are quite surprising. The error decreases consistently; I had thought 3 or 4 would be a useful maximum. But so far, for example when calculating sin(…) in long double precision (BTW: I am testing all the precision types), I got this maximum number of incorrect decimal digits (calculated as log10 of the error relative to the ULP, the unit of least precision, i.e. the last binary digit):

  • for 1 extra string decimal digit, the error is 3.1 decimal digits
  • for 2, the error is 2.4 decimal digits
  • for 3, the error is 0.9 decimal digits
  • for 4, the error is 0.3 decimal digits
  • for 5, the error is 0 decimal digits

(these are preliminary results; since I'm searching for the maximum error, it can only get larger)

EDIT: these measurements aren't finished yet. I am waiting for the calculations to complete, which will take a few days. Then I will add a section about this to the yade documentation.

EDIT2: in these calculations I perform the same computation in the target type (e.g. long double) and compare the results with MPFR, which has over 200 digits more.

jzmaddock (Collaborator, Author)

I assume these are the additional digits required to get sin(x) correct to the last bit?

Which is somewhat different to what we have here: "how many decimal places do I need to print to be able to round-trip back to the binary" ?

@cosurgi (Jun 13, 2020)

You are correct. In principle I wanted to test the mathematical functions, to make sure that my mapping finds the C++ implementation with the correct precision. That's how I found boostorg/math#307.

Afterwards I realized how much the error depends on extraStringDigits10, and this forced me to make it configurable at runtime.

So yes, it's a bit more than round-tripping. I am doing this (for all math functions):

xᵇᵃˢᵉ = random(-4π,4π) // other functions have a different range
vᵇᵃˢᵉ = sinᵇᵃˢᵉ(xᵇᵃˢᵉ) // where ᵇᵃˢᵉ means that it's using the tested type, eg. long double
vᵐᵖᶠʳ = sinᵐᵖᶠʳ(binaryᵐᵖᶠʳ(string(xᵇᵃˢᵉ)))
errorᵐᵖᶠʳ = |(binaryᵐᵖᶠʳ(string(vᵇᵃˢᵉ)) - vᵐᵖᶠʳ)/vᵐᵖᶠʳ|
then I express error relative to ULP.

Yeah, I think it would be enough to test only the round-tripping part. However, I am curious how well the particular implementations of the math functions perform, so I wanted to check that :)

Which is somewhat different to what we have here: "how many decimal places do I need to print to be able to round-trip back to the binary" ?

I will add a round-tripping-only test to my list of tests :)

@@ -2386,7 +2387,7 @@ class numeric_limits<boost::multiprecision::number<boost::multiprecision::mpfr_f
    BOOST_STATIC_CONSTEXPR int digits = static_cast<int>((Digits10 * 1000L) / 301L + ((Digits10 * 1000L) % 301 ? 2 : 1));
    BOOST_STATIC_CONSTEXPR int digits10 = Digits10;
    // Is this really correct???
-   BOOST_STATIC_CONSTEXPR int max_digits10 = Digits10 + 3;
+   BOOST_STATIC_CONSTEXPR int max_digits10 = boost::multiprecision::detail::calc_max_digits10<digits>::value;
@cosurgi

Sorry, in my previous comment I was in fact talking about this + 3, and about the + 1 in max_digits10 :)

@cosurgi

BTW, that comment "// Is this really correct???" in this place is quite accurate :)

jzmaddock (Collaborator, Author)

LOL, yes I know!

I wish I could find a definitive reference for this somewhere, but it eludes me :(

@pabristow (Contributor)

https://www.exploringbinary.com/number-of-digits-required-for-round-trip-conversions/ is helpful.

And I believe there are later papers too, but I can't find them at present.

I fear that there is no absolutely correct answer to this, so sorry if this is a wild goose chase.

I just wanted the same max_digits10 for float128 and cpp_bin_float_quad

@cosurgi (Jun 13, 2020)

I just wanted the same max_digits10 for float128 and cpp_bin_float_quad

yeah. I know that I hijacked this thread a little. You were touching the lines of code which are very relevant to what I am doing right now ;)

https://www.exploringbinary.com/number-of-digits-required-for-round-trip-conversions/ is helpful.

The table in this paper indicates that 3 digits are enough; my conclusions so far are more pessimistic :(

@pabristow (Contributor)

https://github.com/mdickinson/float-proofs/blob/master/misc/base_conversion_notes.pdf

Could a brute-force check over all binary sizes, extrapolating the inequality used in example 8 (above), confirm which formula is correct for round-tripping (the bottom-line test)?

@cosurgi (Jun 13, 2020)

Thank you, this article is a good find. I am writing a brute-force test right now :) We will see pretty soon.

@cosurgi (Jun 13, 2020)

I realized that my tests involving MPFR round-tripping have higher fidelity requirements than basic binary ↔ string round-tripping: they require that an MPFR constructed from a string has all the "extra" bits set to zero. Otherwise going through the function would sometimes hit a "spot" where the lower bits had a significant impact on the function value (an unlikely example: a tan(x) sign change). This is why 4 or 5 extra digits did better in my test: each extra digit sets more of the lower bits to zero. So in fact I should have been using a string with the number of digits required by MPFR_200 when feeding it a string from a long double. I made a quite specific mistake here :) A lesson learnt.

Simply put, when I want to create an MPFR_600 (I did the later tests with 600), I need a lot more digits to make sure that all the lower-significance bits are created as zeros; otherwise those lower bits reflect the "cut" decimal number it received.

So that wasn't the test we needed here for basic round-tripping.

Now, getting back on topic. I did the brute-force search by round-tripping random numbers x ∈ [0,1) and 1/x ∈ [1,∞), 1e7 times each, in two variants: a round trip within the same type, and a round trip to MPFR_600 (the variant which requires that all lower-significance bits are zero). Here are the results:

| type | sufficient extra digits | found maximum extra digits | N = significand bits | length of 2^-N |
|---|---|---|---|---|
| float | 3 | 34 | 24 | 24 |
| double | 2 | 56 | 53 | 53 |
| long double | 3 | 61 | 64 | 64 |
| float128 | 3 | 96 | 113 | 113 |
| MPFR 62dec 207bit | 2 | 161 | 207 | 207 |

The first two result columns come from the two searches: plain round-tripping, and creating an MPFR_600 with all the extra bits set to zero. The "N = significand bits" column says how many bits each type has.

The last column, "length of 2^-N", is the number of decimal digits after the point:

2^-24  = 0.000000059604644775390625
2^-53  = 0.00000000000000011102230246251565404236316680908203125
2^-64  = 0.0000000000000000000542101086242752217003726400434970855712890625
2^-113 = 0.00000000000000000000000000000000009629649721936179265279889712924636592690508241076940976199693977832794189453125
2^-207 = 0.000000000000000000000000000000000000000000000000000000000000004861730685829016958706300042015722062961134506813411822735247355304452214090143313424956893066963763247630148089939439159934408962726593017578125

You can look up the details in the commit with the C++ function, and in the pipeline which calculates this for each type.

I am now quite confident that extraStringDigits10 == 4 is always enough for the purposes of this thread. You could squeeze it down to 3 and I would complain only a little ;) My findings also more or less agree with the table you linked earlier.

// never be exactly an integer so we can replace by trunc(log10(2) * d) + 2
// and avoid the call to ceil:
//
return static_cast<unsigned>(0.301029995663981195213738894724493026768189881462108541310 * d) + 2;
@cosurgi

How do you force the compiler to use all these digits in this log10(2) value? I had problems forcing constexpr to work with double alone.

jzmaddock (Collaborator, Author)

Ah, my bad, those extra digits are wasted: it's a double-precision constant. BTW, I tried a reasonably exhaustive search comparing the double-precision calculation to a 200-digit calculation and couldn't find a bit-count value where they disagreed. float precision is a different matter, though.

@cosurgi (Jun 13, 2020)

Hmm, I think you could leave those wasted digits in. Maybe someday compilers will start picking them up :)

Since that's a double constant, with 15 (up to 17 :) digits, this calculation will become incorrect for bit counts above about 1e15. That corresponds to 301029995663981 decimal digits. I haven't used MPFR_301029995663981 yet.

float precision is a different matter though.

This calculation agrees for float, it has 24 bits and 6 digits10. And:

trunc(log10(2)*24)+2 == 9, just like in that table and on page 46 of IEEE standard.

BTW I tried a reasonably exhaustive search comparing double precision calculation to 200-digit calculation and couldn't find a bit count value where they disagreed.

oh, you mean the kind of calculation I talked about at length in my earlier post? I had plenty of examples where they disagree. But now that I have increased extraStringDigits10, I have a hard time finding them again :) It's too late today; I will check tomorrow. Thank you for trying this!

@cosurgi (Jun 13, 2020)

That corresponds to 301029995663981 decimal digits

... provided that the unsigned type is large enough :)

About these errors: maybe the way in which I calculate the relative error is to blame. I will check this tomorrow.

@cosurgi

Do you know a clean method of checking the value of a particular bit in (any) floating-point significand?

Comparing bits would be much better; I suspect that my calculation of the relative error has its own numerical-inaccuracy problems :/

jzmaddock (Collaborator, Author)

boost::math::float_distance ?

Only for two values of the same type, though. For two different types, I would round the high-precision type to the same number of bits as the low-precision type, so that the comparison is against a "correctly rounded" value.

@cosurgi (Jun 14, 2020)

I found a way using the frexp function:

#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/float128.hpp>
#include <boost/multiprecision/mpfr.hpp>
#include <cassert>  // assert
#include <cmath>    // std::abs, std::frexp, std::pow for builtin types
#include <iomanip>  // std::setprecision
#include <iostream>
#include <limits>
#include <vector>

template <typename T> int sign(T val) { return (T(0) < val) - (val < T(0)); }

class DecomposedReal {
private:
	int                        sig;
	int                        exp;
	std::vector<unsigned char> bits;

public:
	template <typename Rr> DecomposedReal(Rr x)
	{
		using std::abs;   // builtin types need std::; multiprecision types are found via ADL
		using std::frexp;
		int ex   = 0;
		Rr  norm = frexp(abs(x), &ex);
		sig      = sign(x);
		exp      = ex - 1;
		ex       = 0;
		int pos  = 0;
		bits.resize(std::numeric_limits<Rr>::digits, 0);
		while (norm != 0) {
			pos -= ex;
			assert((ex <= 0) and (pos < int(bits.size())) and (pos >= 0));
			bits[pos] = 1;
			norm -= static_cast<Rr>(0.5);
			norm = frexp(norm, &ex);
		};
	}
	template <typename Rr> Rr rebuild()
	{
		using std::pow; // same std::/ADL fallback as in the constructor
		Rr  ret = 0;
		int i   = 0;
		for (auto c : bits) {
			if (c != 0) {
				ret += pow(static_cast<Rr>(2), static_cast<Rr>(exp - i));
			}
			++i;
		}
		return ret * static_cast<Rr>(sig);
	}
	template <typename Rr = double> void print()
	{
		std::cout << "sign : " << sig << "\n";
		std::cout << "exp  : " << exp << "\n";
		std::cout << "bits : ";
		for (auto c : bits)
			std::cout << int(c);
		std::cout << "\nreconstructed number: " << rebuild<Rr>() << "\n\n";
	}
};

template <typename Rr> void test(const Rr& arg)
{
	std::cout << std::setprecision(std::numeric_limits<Rr>::digits10 + 3);
	std::cout << "original number     = " << arg << "\n";
	DecomposedReal d(arg);
	d.print<Rr>();
	assert(arg == d.rebuild<Rr>());
};

int main()
{
	test<float>(-1);                                  // 2^0
	test<double>(0);                                  // 0
	test<long double>(-4);                            // -2^2
	test<boost::multiprecision::float128>(0.375);     // 2^-2+2^-3
	test<boost::multiprecision::mpfr_float_50>(1036); // 2^3+2^2+2^10
	test<boost::multiprecision::mpfr_float_100>(boost::math::constants::pi<boost::multiprecision::mpfr_float_100>());
}

gives output:

original number     = -1
sign : -1
exp  : 0
bits : 100000000000000000000000
reconstructed number: -1

original number     = 0
sign : 0
exp  : -1
bits : 00000000000000000000000000000000000000000000000000000
reconstructed number: 0

original number     = -4
sign : -1
exp  : 2
bits : 1000000000000000000000000000000000000000000000000000000000000000
reconstructed number: -4

original number     = 0.375
sign : 1
exp  : -2
bits : 11000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
reconstructed number: 0.375

original number     = 1036
sign : 1
exp  : 10
bits : 100000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
reconstructed number: 1036

original number     = 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067951
sign : 1
exp  : 1
bits : 1100100100001111110110101010001000100001011010001100001000110100110001001100011001100010100010111000000011011100000111001101000100101001000000100100111000001000100010100110011111001100011101000000001000001011101111101010011000111011000100111001101100100010010100010100101000001000011110011000111000110100000001001101110111101111100101
reconstructed number: 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067951

I suppose that now I can do some bit comparison stuff :)

@cosurgi (Jun 13, 2020)

I'm not sure this link will stay up for long, but I found the "IEEE Standard for Floating-Point Arithmetic 2019"; on pages 44-46 there is a discussion of this conversion. It looks like your formula trunc(log10(2) * d) + 2 is exactly the same as the one on page 46 of that doc! :)

EDIT: I opened that site with JavaScript disabled. In fact I have JS disabled almost everywhere, except GitHub & GitLab ;)

@jzmaddock (Collaborator, Author)

Looks like your formula trunc(log10(2) * d) + 2 is exactly the same as on page 46 in that doc! :)

Well that's a relief then ;)
