# D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiedemann linear algebra over GF2 and implement large prime variants #119

Closed · opened Sep 8, 2015 · 22 comments

# Context

One of the approaches toward OpenDreamKit's aim of High Performance Mathematical Computing is to explore the addition of very fine grained parallelism (e.g. threading or SIMD) to some key computational components like Flint and MPIR. In this deliverable, we tackle two typical algorithms of computer algebra: integer factorization and block Wiedemann for finding kernel vectors of matrices over a finite field.

Whilst the threading of the quadratic sieve looks very promising, our conclusion is that the SIMD speedup we implemented for the block Wiedemann algorithm (and the threading experiments we ran) was not particularly useful for the quadratic sieve. We expect that only a number field sieve implementation, with truly massive matrices (millions of rows), would benefit from the kind of speedup we saw.

This doesn't mean that SIMD is not useful in general; it very much is: see for example D5.5 and D5.7, where SIMD proves to be a big win. SIMD just hasn't proved to be a major benefit for a project like the quadratic sieve, where only a small part of the runtime is spent in the linear algebra phase of the algorithm.

The other improvements to the quadratic sieve described below, such as threading the relation sieving (where we see a speedup of up to 3 on 4 cores) and various algorithmic improvements, turned out to be much more important in practice.
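As an illustration of how threaded relation sieving can be organised, here is a simplified, self-contained OpenMP sketch. This is a toy example under our own assumptions (the interval size, factor base and threshold are made up for the example) and is not the actual Flint sieving code:

```
/* Toy sketch of threaded relation sieving (not Flint's actual code).
 * Each OpenMP thread sieves its own cache-sized block of the interval
 * with a private scratch array, accumulating approximate logs of the
 * factor base primes; positions whose log sum exceeds a threshold are
 * candidates for smooth relations (real code would trial divide them). */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INTERVAL (1 << 20)   /* total sieve interval (toy value)      */
#define CHUNK    (1 << 16)   /* per-thread block, sized to fit cache  */
#define THRESHOLD 12         /* arbitrary cutoff for this toy example */

int main(void)
{
    /* toy factor base: a few small primes with rough log2 values */
    const int primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };
    const unsigned char logs[] = { 1, 2, 2, 3, 3, 4, 4, 4, 5, 5 };
    const int nprimes = sizeof(primes) / sizeof(primes[0]);

    long ncandidates = 0;

    #pragma omp parallel reduction(+:ncandidates)
    {
        unsigned char *block = malloc(CHUNK);

        #pragma omp for schedule(dynamic)
        for (int start = 0; start < INTERVAL; start += CHUNK)
        {
            memset(block, 0, CHUNK);

            for (int j = 0; j < nprimes; j++)
            {
                int p = primes[j];
                int first = start % p ? p - start % p : 0; /* first multiple of p in block */
                for (int i = first; i < CHUNK; i += p)
                    block[i] += logs[j];
            }

            for (int i = 0; i < CHUNK; i++)
                if (block[i] >= THRESHOLD)
                    ncandidates++;   /* real code: trial divide Q(start + i) */
        }

        free(block);
    }

    printf("candidate relations found: %ld\n", ncandidates);
    return 0;
}
```

The actual code additionally sieves values of self-initialised polynomials, handles large prime partials and writes relations out, but the basic pattern of giving each thread its own cache-sized block of the sieve interval to process independently is the idea being exploited.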

# Report on writing a parallel implementation of the quadratic sieve

## Problem description

The Quadratic Sieve is an algorithm for factoring integers n = pq, with n typically in the 15-90 decimal digit range; a brief sketch of the underlying idea is given after the feature list below. For this deliverable, we have implemented a quadratic sieve with the following features:

• Small prime variant
• Large prime variant with in-memory hash table and on-disk pseudorelation storage
• Multiple-polynomial self initialisation with Carrier-Wagstaff method
• Polynomial generation using Bradford-Monagan-Percival method
• Cache efficient sieving
• Block-Lanczos linear algebra (adapted from msieve)
• Knuth-Schroeppel multiplier
• Threaded relation sieving (using OpenMP)
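
For readers less familiar with the method, the following is a brief textbook-style sketch (in LaTeX; the notation is ours, not taken from the Flint sources) of why sieving and linear algebra over GF2 appear at all:

```
% Textbook sketch of the quadratic sieve idea (notation ours, not Flint's).
The sieve collects values $a$ near $\lceil\sqrt{n}\rceil$ for which
\[
  Q(a) = a^2 - n
\]
factors completely over a fixed set of small primes (the factor base); each such
relation yields an exponent vector over $\mathrm{GF}(2)$.  Once there are more
relations than factor base primes, a kernel vector of the relation matrix over
$\mathrm{GF}(2)$ selects a subset whose product is a square on both sides, giving
$x^2 \equiv y^2 \pmod{n}$; with probability at least $1/2$ we also have
$x \not\equiv \pm y \pmod{n}$, and then $\gcd(x - y, n)$ is a proper factor of $n$.
The large prime variants additionally keep ``partial'' relations whose cofactor is
a single prime (or a product of two primes) just above the factor base bound,
matching them up via a hash table to form extra full relations.
```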

### Results

Some partial progress towards this goal had already been made before the OpenDreamKit project began: a partial implementation of a single large prime variant of the quadratic sieve, written by a Google Summer of Code student and based on a prior version for small integers only. However, it was only a rough sketch: it was not parallelised, did not compile, and was neither correct nor complete.

We have completed the implementation and the code has already been merged into the `Flint` repository.

Our implementation is not competitive below 130 bits (~40 digits), due to the Carrier-Wagstaff/Bradford-Monagan-Percival combination. Here are some timings (in seconds) for larger factorisations, comparing `Pari/GP` with our implementation in `Flint` on one and four cores.

In our tests, no significant improvements were observed for larger numbers of cores, so we report only on the improvements for four cores here.

| n (bits) | Pari | Flint (1 core) | Flint (4 cores) |
|---------:|-----:|---------------:|----------------:|
| 130 | 0.11 | 0.24 | 0.28 |
| 140 | 0.22 | 0.33 | 0.30 |
| 150 | 0.43 | 0.55 | 0.39 |
| 160 | 0.79 | 0.99 | 0.56 |
| 170 | 1.8 | 1.4 | 0.83 |
| 180 | 3.8 | 3.0 | 1.7 |
| 190 | 8.6 | 4.8 | 2.4 |
| 200 | 17.6 | 10.2 | 6.0 |
| 210 | 38.4 | 20.9 | 13.5 |
| 220 | 67.7 | 33.2 | 15.8 |
| 230 | 199 | 78 | 34 |
| 240 | 257 | 140 | 52 |
| 250 | 444 | 248 | 92 |
| 260 | 1175 | 708 | 283 |
| 270 | 2199 | 1522 | 544 |

## Future possibilities

Possible future projects could include:

• Cache efficient handling of factor base and self-initialisation data structures
• PPMPQS method
• Parallelising the file handling
• Writing an efficient self-initialising MPQS for 20-128 bits.

## Testing the quadratic sieve code

The `Flint` repository is available here: https://github.com/wbhart/flint2

To build and test the code mentioned above, you must have `GMP/MPIR` installed on your machine (refer to your system documentation for how to do this). Then do:

```
git clone https://github.com/wbhart/flint2
cd flint2
./configure --with-mpir=/path/to/mpir --with-mpfr=/path/to/mpfr --enable-openmp
make
make check MOD=qsieve
```

Full instructions on how to build `Flint` are available in the `Flint` documentation, available at the `Flint` website.

The quadratic sieve functionality is made available by including `qsieve.h`, and the main interface is via the function:

```
void qsieve_factor(fmpz_factor_t factors, const fmpz_t n)
```
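
A minimal calling sketch in C follows. This is our own example, not taken from the Flint documentation: header paths and link flags may differ between installations, and the input used here is the 230-bit benchmark number quoted later in this thread, so expect a run time on the order of a minute on one core.

```
/* Minimal usage sketch for qsieve_factor (not from the Flint docs; adjust
 * include paths and link flags for your installation).  Build with e.g.:
 *   gcc qs_example.c -o qs_example -lflint -lmpir -lmpfr -fopenmp
 * With a --enable-openmp build, the number of sieving threads can typically
 * be controlled via the OMP_NUM_THREADS environment variable. */
#include <stdio.h>
#include "flint/fmpz.h"
#include "flint/fmpz_factor.h"
#include "flint/qsieve.h"

int main(void)
{
    fmpz_t n;
    fmpz_factor_t factors;

    fmpz_init(n);
    fmpz_factor_init(factors);

    /* 230-bit semiprime used in the benchmarks later in this thread */
    fmpz_set_str(n,
        "1033005553733597551486054475473301536956276424884162978347314111210331", 10);

    qsieve_factor(factors, n);

    /* print the factors found, with exponents */
    for (slong i = 0; i < factors->num; i++)
    {
        fmpz_print(factors->p + i);
        printf("^%lu ", (unsigned long) factors->exp[i]);
    }
    printf("\n");

    fmpz_factor_clear(factors);
    fmpz_clear(n);
    return 0;
}
```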

## Blog post

A blog post about the improvements to the quadratic sieve in Flint is available at https://wbhart.blogspot.de/2017/02/integer-factorisation-in-flint.html .

# Report on parallel block Wiedemann

## Problem description

The Block Wiedemann algorithm computes kernel vectors of a matrix over a finite field. Along with the Block Lanczos algorithm, it is often used for the linear algebra phase of the Quadratic Sieve and the Number Field Sieve over GF2, but it also has applications over GF(p) for a prime p.
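
As background, here is a standard textbook-style sketch (in LaTeX, with our own notation; it describes the generic method rather than our specific implementation) of the scalar Wiedemann algorithm that the block variant generalises:

```
% Scalar Wiedemann over GF(2) (textbook sketch, notation ours).
Given a sparse $n \times n$ matrix $B$ over $\mathrm{GF}(2)$, choose random
vectors $x$ and $y$ and compute the scalar sequence
\[
  s_i = x^{\mathsf T} B^{\,i} y, \qquad i = 0, 1, \ldots, 2n .
\]
With good probability the $s_i$ satisfy the linear recurrence given by the
minimal polynomial $f$ of $B$ acting on $y$, which the Berlekamp--Massey
algorithm recovers from the sequence.  If $B$ is singular then (again with good
probability) we can write $f(\lambda) = \lambda^{k} g(\lambda)$ with $k \ge 1$
and $g(0) \neq 0$, and the vector $w = g(B)\,B^{\,k-1} y$ is nonzero and
satisfies $Bw = 0$, i.e.\ it is a kernel vector.
```

The block variant replaces x and y by blocks of vectors, so the iterates become dense bit matrices whose arithmetic maps naturally onto machine words and SIMD registers, which is where a library such as `M4RI` comes in.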

## Work done

We investigated parallelising the Block Wiedemann algorithm using the external library `M4RI` for basic linear algebra over GF2. This required three components to be written.

• A sparse matrix library over GFp for `Flint`
• An implementation of the Block Wiedemann algorithm over GF2
• A naive quadratic sieve to generate realistic data

These were implemented by Anders Jensen and Alex Best, and are available as modules for Flint and as an external implementation, respectively.

## Conclusions

Our experiments showed that it is possible to use the highly optimised SIMD arithmetic in the external library `M4RI` as a form of parallelism to speed up the Block Wiedemann algorithm, and that this outperformed Block Wiedemann with `M4RI` in threaded mode, at least for the size and sparsity of matrices encountered in the quadratic sieve. The result was comparable to the Block Lanczos code already used in Flint for the quadratic sieve.

The sparse matrix library was not the focus of this deliverable, but will be expanded at a later date and included in Flint. Because of the additional external dependency on `M4RI`, the Block Wiedemann implementation isn't eligible for inclusion in Flint, but the code is available to members of the OpenDreamKit collaboration. Future work may see the removal of this external dependency so that it can be merged into Flint. However, we currently feel that this isn't something that should be made a priority.

In fact, the quadratic sieve implementation we now have, even in parallel mode, doesn't have linear algebra as a bottleneck. This is because of the extremely small factor base sizes that are possible with the particular implementation of the quadratic sieve that we completed as part of this project.

added this to the D5.6 milestone Sep 8, 2015
modified the milestones: Month 18: 2017-02-28, D5.6 Mar 22, 2016
assigned wbhart and unassigned ClementPernet Sep 9, 2016

### bpilorget commented Nov 21, 2016

@ClementPernet (WP leader) and @wbhart (lead beneficiary): this deliverable is due in February 2017.

### bpilorget commented Nov 21, 2016

Thank you for all the information. The problem is that Month 18, February 2017, is a very strict deadline for the Commission. I'll let @ClementPernet and @nthiery react if they have any suggestion to help you with this.

### wbhart commented Nov 21, 2016

My post was simply meant to be an up-to-date progress report. We are of course fully aware of all the issues locally and can work through potential solutions here.

If I were to request any changes at this stage, it would simply be to change the wording of the deliverable from triple to double large prime variant. Even with this change, we would still be delivering a parallelised, internationally competitive quadratic sieve. The two circumstances that led to this were that Anders got a permanent position (we were sad that he left, but happy that he got a permanent job!) and, secondly, that preliminary work on the double large prime variant that was supposed to be done before ODK started never got completed.

So, to put it more concretely as a question: is it possible to change the word "triple" to "double" in the deliverable for this project, and if we did that, would anyone object? It certainly wouldn't be a change that would affect any other deliverables for ODK, and would scarcely affect the outcome, but would potentially save a lot of unnecessary panic (and work).

### nthiery commented Nov 24, 2016

No strong opinion myself; so by default I will just trust your judgement. We can use the upcoming amendment to adjust the deliverable. Opinion anyone? Especially potential users of such features?

### JohnCremona commented Nov 24, 2016 via email

I think Bill's suggestion is eminently reasonable. From the outside, the difference between "double prime" and "triple prime" looks like a small incremental change, but of course implementing these is a lot of work and it is extremely hard to estimate in advance how long it would take.

### KBelabas commented Nov 24, 2016 via email

Triple large prime variation will not matter unless users run a major factoring effort (a few months of sieving first). IMHO it's a good research goal, but certainly not a critical feature for users. I'd release PPMPQS as a finished product on schedule, and (then or later) PPP as a prototype, for alpha-testing and feedback.

P.S. I'm interested in integer factoring benchmarks (esp. in the < 90 digits range :-).

### wbhart commented Nov 24, 2016 via email

Thanks for the support everyone! We had a brief discussion locally about this, and I've actually come up with an even better strategy and suggested it to Wolfram. It's closer to what Karim recommends, but is slightly more involved. Therefore I'll wait until Wolfram has had time to follow the whole thing up. Of course I'll update the ticket once we've decided exactly which changes we want to request, and hopefully people can chime in if they agree that it's a sound plan.

### ClementPernet commented Nov 25, 2016

I agree with the previous comments: having a strong double large prime variant to deliver is good enough, and potentially an alpha version of the triple variant later on would be even better. Keep us informed about your new strategy.

changed the title D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiederman linear algebra over GF2 and the triple large prime variant. D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiederman linear algebra over GF2 and implement large prime variants. Dec 6, 2016

### wbhart commented Jan 19, 2017 • edited

@KBelabas you said you were interested in timings in the < 90 digit range. Here is what I have so far. As you can see, this Carrier-Wagstaff/Bradford-Monagan-Percival approach we have taken is absolute rubbish below 170 bits. But it is working nicely from about 170 bits onwards.

In future, we might use the Pari strategy for smaller factorisations. I (re)read the Pari source code the other night to remind myself how that works. In the past we've been unable to make SIMPQS robust enough for small factorisations with the polynomial strategy we were previously using, but the polynomial generation in Pari seems to be robust enough. (The worry is always running out of polynomials or having too many duplicate relations, both of which we were regularly hitting with the old simpqs I wrote years ago.)

Timings are in seconds on a 2.2GHz Opteron server (single core only), using Pari-2.9.1 and the latest development version of Flint.

```
n (bits)   Pari      Flint
===========================
  30       0.00016    0.11
  40       0.00073    0.11
  50       0.0012     0.11
  60       0.0019     0.11
  70       0.0031     0.11
  80       0.0056     0.12
  90       0.0082     0.13
 100       0.013      0.15
 110       0.022      0.17
 120       0.051      0.19
 130       0.11       0.24
 140       0.22       0.33
 150       0.43       0.55
 160       0.79       0.99
 170       1.8        1.4
 180       3.8        3.0
 190       8.6        4.8
 200      17.6       10.2
 210      38.4       20.9
 220      67.7       33.2
 230     199         78
 240     257        140
 250     444        248
 260    1175        708
 270    2199       1522
===========================
```

### KBelabas commented Jan 20, 2017 via email

Interesting, thanks! Can you send me the benchmark code used for Pari?

### wbhart commented Jan 20, 2017 via email

It was very primitive @KBelabas. I just did something like the following in GP:

```
f(b,k)={for(i=1,k, s = nextprime(random(2^(b/2)))*nextprime(random(2^(b/2))); factorint(s, 14))}
# f(30,100000) for 30 bits, and so on.
```

Of course I divided by the number of iterations to get the average time. The number of iterations varied between the two systems, especially where Flint was so much slower. But for Pari I generally tried to keep the total time below about 60s, though in no case did I use fewer than 5 iterations below 230 bits.

The numbers I factored from 230 bits onward are:

```
230: 1033005553733597551486054475473301536956276424884162978347314111210331
240: 694421043016559099938956613657502234908133400914093615344502918125299503
250: 846521313169799911792277779197149004149926546076302314337091688511275034397
260: 1068467576946595043900454053298713863804327606923444414074010450337936633096519
270: 618784014304300826386461364940832504316398894596588623706570810918031692462246587
```

The numbers you end up with, the way I'm generating them, are probably all just slightly smaller than the stated number of bits. But both systems were factoring the same numbers from 230 bits onward.

### wbhart commented Jan 24, 2017

 The parallel version of the sieve is working. It maxes out at about 8 threads on my machine, with a roughly 6x speedup for sufficiently large factorisations.

changed the title D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiederman linear algebra over GF2 and implement large prime variants. D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiedemann linear algebra over GF2 and implement large prime variants. Jan 25, 2017

### nthiery commented Feb 6, 2017

Dear M18 deliverable leaders, just a reminder that reports are due for mid-February, to buy us some time for proofreading, feedback, and final submission before February 28th. See our README for details on the process. In practice, I'll be offline February 12-19, and the week right after will be pretty busy. Therefore, it would be helpful if a first draft could be available sometime this week, so that I can have a head start reviewing it. Thanks in advance!

added the FLINT label Feb 21, 2017

### nthiery commented Feb 24, 2017

 Note: I am now proofreading the issue description above.

mentioned this issue Feb 24, 2017

### nthiery commented Feb 24, 2017

 Proofreading done. The issue description is fair game again. Same as for D5.5: this is looking good; I just left a little TODO.

### wbhart commented Feb 27, 2017

 I've dealt with the TODO and added a link to the blog post.

### wbhart commented Feb 27, 2017

 @nthiery All done I believe.

### nthiery commented Feb 27, 2017 via email

Excellent, thanks! Submission planned for after lunch :-)

### nthiery commented Feb 27, 2017

 Submitted! Thanks a lot @wbhart :-)

closed this as completed Feb 27, 2017
mentioned this issue Mar 14, 2017