Skip to content

Conversation

@leviska
Copy link

@leviska leviska commented May 26, 2021

in current implementation of integer sqrt there are lines saying, that it's slow and must be rewritten to karatsuba sqrt method

that's what I did

I've wrote a fast 64 bit sqrt using std::sqrt with correction, because it's simple but still faster, than karatsuba implementation
I've also wrote newton sqrt, but it's kinda the same on small numbers, and much longer on big numbers (probably because of kinda slow multiplication)

also I wrote some tests, because as far as I understood, there was only 1 simple test for checking actual result, not for interfaces, consts and such

a little about motivation:

I could use GMP, but on small (128-256 bits) integers, fixed size cpp_int is way faster, than gmp, so for this kind of numbers it is actually faster to use cpp_int, but the sqrt is too slow, and was taking too much time
on windows, the difference is even worse

some benchmarks, that I run are here (you can find code here https://github.com/leviska/boost-sqrt )
sqrt https://pastebin.com/WvNzVH0g (first number is the size of cpp_int, second is the size of the integer)
operations https://pastebin.com/Tk4mhAfq (you can see, that for 128-256 bit integers cpp_int is actually faster)

@ckormanyos
Copy link
Member

Hi @leviska good idea. We put some effort into this in last year's GSoC, but have not put it into Boost yet.

We will definitely take a look at your work. The timing looks impresive for hundreds and thousands of bits --- as expected. Very good.

I just fired up the CI process and approved your run on CI. So let's see how this goes. It might take a while to negotiate all the details of this thing, but it is in the direction that I think makes sense. Let's give it a whirl in CI.

@ckormanyos
Copy link
Member

I am also sure John will be interested in this activity, as we also are.

@jzmaddock
Copy link
Collaborator

This is really cool - many thanks!

I'll investigate in more detail when I've finished some bug quashing!

@ckormanyos
Copy link
Member

ckormanyos commented May 27, 2021

Hi @leviska you might find this hard to believe, but i think some of the failures in CI are due to a small artifact in your kara sqrt implementation.

In this line, you do not initialize the variable q in a compile-time constexpr context.

I think it might help to try to modify the line like:

Integer q { };

@leviska
Copy link
Author

leviska commented May 27, 2021

hi, yes, I saw that CI failed and will try to fix it in a few days

@leviska
Copy link
Author

leviska commented May 31, 2021

Ok, so it seems that I've fixed one of the bugs and some linting things, but there is one thing that I didn't fix

Some compilers (clang only???) doesn't have constexpr std::sqrt, so I wanted to replace it with just karatsuba for constexpr version

I found __builtin_is_constant_evaluated function, that can help me with that, but it needs to be put accurately, so I'll wait for CI/CD to finish, see what compilers and versions are missing this, and try to fix it for them

Also, if there is some define for this builtin I would be glad, if you pointed me to it :)

@jzmaddock
Copy link
Collaborator

Actually I think it's only gcc that has a constexpr sqrt at present.

There is a macro BOOST_MP_IS_CONST_EVALUATED(x) which you will find used round the place that returns true only if x is a constexpr variable. In C++20 it becomes std::is_constant_evaluated(), but for older compilers (mostly GCC) there are options for implementing this such as __builtin_constant_p. There is also BOOST_MP_NO_CONSTEXPR_DETECTION set when BOOST_MP_IS_CONST_EVALUATED doesn't work. If you look for examples you should get the idea, note that the argument to BOOST_MP_IS_CONST_EVALUATED probably needs to be a builtin integer, but just calling .size() on the cpp_int_backend should be OK for that.

@leviska
Copy link
Author

leviska commented Jun 1, 2021

It seems that I've fixed basic checks, can you run additional workflows, please? :)

Also if you have any review suggestions, I'm open to them

@jzmaddock
Copy link
Collaborator

Running GHA now.

@ckormanyos
Copy link
Member

ckormanyos commented Jun 1, 2021

It looks like you have added a base-case square root for small bit counts that is actually capable of C++20 constexpr-ness.

That is the right way to go.

As this progresses, there is probably a more efficient version thereof that handles more bits and might improve efficiency.

Let's see how this goes and we can as a next step get a fast basecase square root.

template <class Integer>
BOOST_MP_CXX14_CONSTEXPR typename std::enable_if<boost::multiprecision::detail::is_integral<Integer>::value, Integer>::type sqrt(const Integer& x, Integer& r)
{
if (x == Integer(0)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point, these casts shouldn't be needed (and might actually be expensive), you could probably rewrite as:

if(x.is_zero()){
   r = 0u;
   return 0u;
}

else
#endif
// we can calculate it faster with std::sqrt
if (bits <= 64) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you felt so inclined to do so... it might pay to check which is fastest here:

  • double precision sqrt and 53 bits.
  • long double and 64 bits, but now using the old x87 co-processor.
  • __float128 and 113 bits (when supported).

There is also an assumption here that long double has a 64-bit mantissa, but 53 and 113 bit mantissa's are also common as I'm sure you know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it make sense to check? :

if(bits <= std::numeric_limits<double>::digits)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and then use sqrt from <cmath> if that is your inclination for base-case square root?

--g;
} while (g >= 0);
// https://hal.inria.fr/file/index/docid/72854/filename/RR-3805.pdf
size_t b = bits / 4;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, your friendly curmudgeon here ;)

One of the things we found when implementing Karatsuba multiplication was that variable/temporary control was critical to performance. I count at least 7 temporaries here, I admit it would make the code less neat, but some variable reuse would be a good thing. This won't make much difference for fixed-precision cpp_int's, but for variable precision case, it saves a bunch of memory allocations.

The other thing we did - and this is very cpp_int specific - was to figure out up front how much memory would be required for all the variables used in the algorithm (right through all the recursions), and allocate that up front, then share it up between the variables in each recursion. See

// There are 2 entry point routines for Karatsuba multiplication:
and following code.

However, I do appreciate this is already a noticeable improvement, so if you've had enough for now, that's fair enough frankly!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also forgot to say.... THANK YOU for this, it's much needed for sure.

@leviska
Copy link
Author

leviska commented Jun 1, 2021

also, some checks failed, but I'm not sure, is this my fault, or not? there's something about bin_float, and I'm a little confused

@jzmaddock
Copy link
Collaborator

lso, some checks failed, but I'm not sure, is this my fault, or not?

Sigh... only sort of, something, maybe your changes or not, has pushed the size of those object files over the limit msvc can cope with, ignore those for now, I'll look at them tomorrow.

@leviska
Copy link
Author

leviska commented Jun 2, 2021

Ok, so I've sped up the code a bit
Comparison with previous version is here: https://pastebin.com/GBwWJSeY (5-20% faster)
What I did is replaced all the long arithmetic lines into a @= b ones, which is (I hope at least) doesn't allocate. And did a little here and there to reduce allocs.

Some more info:

  1. I tried to return value as argument (make func void), and it's actually kinda slowed the function for a little (for like 1%). So I reverted the changes. Seems like return value optimisations are doing good here :)
  2. There is a little slow down on 32-64 bits, because we now always create the temp variable. It can be sped up back to the original easily with something like if msb(x) < 64 { return fast_sqrt(x); }, I just forgot to do that, probably will fix it
  3. I tried to write function without additional allocs (basically with single Integer q for all the recursions), and it happens to be slower (than this new version, that was taken as base): https://pastebin.com/n3mLM9rz
    I assume, that I could do something horribly wrong, so here is the code https://pastebin.com/1p7Yg2TF , so may be you'll find the reason for it being slower.
    The idea for this is that we actually need new variable only to store x >> (b * 2), so why don't we do nothing with x, and in the place where we actually need to use it (it's actually three places: the 64bit sqrt, and two bit AND (that I use to get the a_i variables from the original paper)), we will adjust for offset, which will be equal to all b*2 from previous calls. I think, that the main reason is that because we keep full size of x, all the operations with it and t variable are slower (because the numbers are larger), so overall this slows us down, not speeds up

Also I have a question about t = 0 and t = x lines: does it allocate? How to do it the fastest way (reuse the buffer inside t, if it exists)?

I thought about float128, and I think I will skip it, because I don't know about it too much (is there even an instruction/function for sqrt(floa128)?). I left the 64 bits there, but I'll try to experiment with it, because as I just found out you can directly cast cpp_int to double, so this "while" algo can be good even for bigger numbers may be?

I can't write is_zero, because native numbers don't support it, and I didn't wanted to do a lot of template programming. May be in the future I'll do it :)

@leviska
Copy link
Author

leviska commented Jun 8, 2021

Hi, do you have any feedback yet? 👀

@ckormanyos
Copy link
Member

do you have any feedback yet?

Sorry about that. I did not notice that we need to approve the run in CI, which I have just approved and CI is running at the time of writing this post.

Thanks for pointing this out. Let's see how CI runs.

This is an interesting case involving a function that is very interesting for Multiprecision and sought after. Yet shakes the foundation of the architecture, as integer suare root is widely used in multiprecision integer, rational, float and complex.

We should see how CI runs, then see what John thinks. If CI passes, we tend to trust our CI testing rather well. If CI testing turns out to be incomplete, we improve CI testing. Maybe we could also run some dedicated user tests and ensure that everything is also working cleanly in a radically multithreaded environment. We also need to be wary if any problems might creep up on other architectures or in more massive parallelization.

As a first step, CI is off to the races...

@jzmaddock
Copy link
Collaborator

Other than my previous comments... do we have a Google benchmark before and after? Something like sqrt(2^n-1) for n < 100 < n < 1000. Should be O(N) I think before, and O(ln(N)) now?

@ckormanyos
Copy link
Member

ckormanyos commented Jun 9, 2021

do we have a Google benchmark before and after? Something like sqrt(2^n-1) for n < 100 < n < 1000

Actually for testing integer sqrt, I would recommend populating the binary digits of the integer fully with pseudo-random numbers when benchmarking. We could test branch vs. 1.76. When testing, one must ensure that the test candidates have enough binary digits to actually reach the expected range of Karatsuba benefit threshold, which also should be investigated where exactly that threshold is.

I could write up some benchmark code if that helps.

@leviska
Copy link
Author

leviska commented Jun 9, 2021

image
(on OX is the size of the int, on OY is the log(time) of execution in ns (sorry for Russian))

I have this graph, which is based on this benchmark results https://github.com/leviska/boost-sqrt/blob/main/fin/sqrt.json

If you want some other benchmark, I could do it, just tell me the details

@ckormanyos
Copy link
Member

ckormanyos commented Jun 9, 2021

I have this graph, which is based on this benchmark

This is very nice. I will study those details more closely. I will also run some spot benchmarks to compare with yours at just a few digit counts for fiexed and also report my findings.

If you want some other benchmark, I could do it, just tell me the details

Actually, I do want more --- for many many more digits to verify the sub-quadratic expectation of Karatsuba method. This should really kick in around 10,000 ... 100,000 decimal digits, probably becoming visible around 20,000 digits. I think could you add some points, maybe just for fixed, GMP, 1.76 at 2^15, 2^16, maybe even 2^17 binary digits? I would really like to see the original code rise quadratically and the Karatsuba remain non-quadratic. Maybe choose a table or a non-logarithmic scale for the bes exhibition of this behavior

@leviska
Copy link
Author

leviska commented Jun 9, 2021

Ok, so I did one more benchmark with 2^n scale https://pastebin.com/QmWN4weF
Input integers were random with set size

Some images:

sizes (6)
("honest" graph)

sizes (2)
(log time)

sizes (3)
(log time and log bits (except 96 bits))

(just noticed, that last size is cropped. It's "131072" bits)

@jzmaddock
Copy link
Collaborator

Just starting to work through this, some random observations:

  • It's nice and quick - well done!
  • I ran some random tests for about half a day and didn't find any errors, so that's good too :)
  • The current code is only used for native integers and isn't currently hooked up for multiprecision types at all - that goes via eval_sqrt. I'll take care of hooking that up.

I ran some tests using various std::sqrt implementations, what I'm basically seeing is:

  • 113-bit sqrtq is an order of magnitude slower than 80 or 64 bit sqrt and shouldn't be used as it's much slower than going via Karatsuba (this has implications where long double is a synthesised 128-bit type).
  • 80 and 64 bit sqrt's seem to take about the same amount of time to execute so we should definitely use 80-bit sqrt's when available. When it's not available we should be using 64-bit sqrt's (53 bit mantissa) to avoid to much bit twiddling in the correction code (at present the code assumes a 64-bit mantissa long double is always available which isn't true for say msvc).
  • I tried a "fast path" approach which leaves out all the Karatsuba code and just calls std::sqrt in a simple inline function when the Integer has fewer than 64 bits.... and.... it took exactly the same time!

More soon.

@leviska
Copy link
Author

leviska commented Jun 15, 2021

well done

thanks!

isn't currently hooked up for multiprecision types at all

oops 😰. Thanks for the fix :)

at present the code assumes a 64-bit mantissa long double is always available which isn't true for say msvc

it tries to convert to long double, but (afaik) without support it will be just double, and the correction code will fix the errors.
For 53 bit double the error after sqrt will be <=2 (<=1 from rounding into the double, and <=1 from sqrt and converting back), so cycles will not be long. I already mentioned this somewhere above :)

and.... it took exactly the same time

actually, it kinda could be faster, because there is this one line, which allocates unused variable for <64 bits

jzmaddock added a commit that referenced this pull request Jun 18, 2021
Hooks up backend-based version for cpp_int.
Also adds simple google-benchmark.
From #328.
@jzmaddock jzmaddock mentioned this pull request Jun 18, 2021
@jzmaddock
Copy link
Collaborator

There's a problem with the correction code at the end, but I don't quite see the issue at present, test case is:

   std::string s = "0x20000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000";
   s += "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000";

   typedef boost::multiprecision::::number<boost::multiprecision::cpp_int_backend<66454, 66454, boost::multiprecision::unsigned_magnitude>, boost::multiprecision::et_off>   bi;

   bi si1(s.c_str());
   bi r;
   bi j = sqrt_new(si1, r);

Which asserts inside the final subtraction because r < q.

If we use a signed type like cpp_int, then we get a different issue: then the remainder comes out larger than the sqrt.

Any ideas?

@leviska
Copy link
Author

leviska commented Jun 20, 2021

I've checked this test both with your typedef and with cpp_int using my implementation (that I've commited), and it's all working fine

I saw, that you did some stuff to the function in another PR, may be there is a typo somewhere?

Or is sqrt_new my commited function?

@jzmaddock
Copy link
Collaborator

OK, here's a self contained test case - I've just cut and pasted your code from this PR for the purpose of the test.

Tests fail with unsigned types only:

//  Copyright 2020 John Maddock. Distributed under the Boost
//  Software License, Version 1.0. (See accompanying file
//  LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt

#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/gmp.hpp>
#include <boost/multiprecision/integer.hpp>
#include <cmath>


using namespace boost::multiprecision;

namespace ksqrt {

   template <class Integer>
   Integer karatsuba_sqrt(const Integer& x, Integer& r, Integer& t, size_t bits)
   {
#ifndef BOOST_MP_NO_CONSTEXPR_DETECTION
      // std::sqrt is not constexpr by standard, so use this 
      if (BOOST_MP_IS_CONST_EVALUATED(bits)) {
         if (bits <= 4) {
            if (x == 0) {
               r = 0u;
               return 0u;
            }
            else if (x < 4) {
               r = x - 1;
               return 1u;
            }
            else if (x < 9) {
               r = x - 4;
               return 2u;
            }
            else {
               r = x - 9;
               return 3u;
            }
         }
      }
      else
#endif
         // we can calculate it faster with std::sqrt
         if (bits <= 64) {
            const uint64_t int32max = uint64_t((std::numeric_limits<uint32_t>::max)());
            uint64_t val = static_cast<uint64_t>(x);
            uint64_t s64 = static_cast<uint64_t>(std::sqrt(static_cast<long double>(val)));
            // converting to long double can loose some precision, and `sqrt` can give eps error, so we'll fix this
            // this is needed
            while (s64 > int32max || s64 * s64 > val) s64--;
            // in my tests this never fired, but theoretically this might be needed
            while (s64 < int32max && (s64 + 1) * (s64 + 1) <= val) s64++;
            r = val - s64 * s64;
            return s64;
         }
      // https://hal.inria.fr/file/index/docid/72854/filename/RR-3805.pdf
      size_t b = bits / 4;
      Integer q = x;
      q >>= b * 2;
      Integer s = karatsuba_sqrt(q, r, t, bits - b * 2);
      t = 0u;
      bit_set(t, b * 2);
      r <<= b;
      t--;
      t &= x;
      t >>= b;
      t += r;
      s <<= 1;
      divide_qr(t, s, q, r);
      r <<= b;
      t = 0u;
      bit_set(t, b);
      t--;
      t &= x;
      r += t;
      s <<= (b - 1); // we already <<1 it before
      s += q;
      q *= q;
      // we substract after, so it works for unsigned integers too
      if (r < q) {
         t = s;
         t <<= 1;
         t--;
         r += t;
         s--;
      }
      r -= q;
      return s;
   }

   template <class Integer>
   Integer sqrt(const Integer& x, Integer& r)
   {
      if (x == 0u) {
         r = 0u;
         return 0u;
      }
      Integer t{};
      return karatsuba_sqrt(x, r, t, msb(x) + 1);
   }

   template <class Integer>
   Integer sqrt(const Integer& x)
   {
      Integer r(0);
      return sqrt(x, r);
   }

} // namespace ksqrt


int main()
{
   typedef boost::multiprecision::number<boost::multiprecision::cpp_int_backend<66454, 66454, boost::multiprecision::unsigned_magnitude>, boost::multiprecision::et_off>   bi;
   //typedef boost::multiprecision::cpp_int   bi;

   bi one(1);

   for (unsigned i = 0; i < 66454; ++i)
   {
      bi value = one << i;
      bi rem;
      bi result = ksqrt::sqrt(value, rem);
      if (result * result > value)
      {
         std::cout << "Failed test at i = " << i << std::endl;
      }
      if (result * result + rem != value)
      {
         std::cout << "Failed test at i = " << i << std::endl;
      }
      if (rem < 0)
      {
         std::cout << "Failed test at i = " << i << std::endl;
      }
      if ((result + 1) * (result + 1) <= value)
      {
         std::cout << "Failed test at i = " << i << std::endl;
      }
   }


}

@jzmaddock
Copy link
Collaborator

Ooops, sorry about the copy write declaration on that snippet.... cut and past error :(

@jzmaddock
Copy link
Collaborator

I might have found the cause - there's a bug in our existing Karatsuba multiplication - it gets the right values in the limbs but breaks some of the classes internal invariants. Testing the fix now....

@jzmaddock jzmaddock merged commit 019a169 into boostorg:develop Jun 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants